Self-Attention

Self-attention is an attention mechanism where each element in a sequence computes attention scores with every other element in the same sequence. It enables Transformers to capture long-range dependencies regardless of distance.

Understanding Self-Attention

Self-attention is the mechanism that allows each element in a sequence to compute relevance scores with every other element, enabling a model to capture long-range dependencies regardless of distance. It forms the backbone of the transformer architecture, replacing the sequential processing of recurrent neural networks with highly parallelizable computations. In a self-attention layer, each token generates query, key, and value vectors, and attention weights are computed through scaled dot-product operations. Multi-head attention extends this by running several attention computations in parallel, capturing different types of relationships simultaneously. Self-attention powers models like BERT and GPT, enabling breakthroughs in text generation, machine translation, and even computer vision through vision transformers.
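
To make the query/key/value description concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. The function name, projection matrices, and dimensions are illustrative assumptions, not taken from any particular model; multi-head attention would simply run several such computations in parallel and concatenate the results.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices (illustrative shapes)
    """
    q = x @ w_q                       # queries
    k = x @ w_k                       # keys
    v = x @ w_v                       # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)   # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ v                # weighted sum of values per token

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # (4, 8): one context-aware vector per token
```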

Category

Deep Learning

Related Deep Learning Terms

Activation Function

An activation function is a mathematical function applied to a neuron's output to introduce non-linearity into a neural network. Common activation functions include ReLU, sigmoid, and tanh, each with different properties for gradient flow.
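
A minimal sketch of the three activation functions named above, applied elementwise to example pre-activations; the inputs are arbitrary illustrative values.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)             # zero for negatives, identity for positives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # squashes to (0, 1); saturates for large |z|

def tanh(z):
    return np.tanh(z)                      # squashes to (-1, 1), zero-centered

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # example pre-activations
print(relu(z), sigmoid(z), tanh(z))
```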

Adam Optimizer

Adam (Adaptive Moment Estimation) is an optimization algorithm that combines the benefits of AdaGrad and RMSProp. It adapts learning rates for each parameter using estimates of first and second moments of gradients.
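
A minimal sketch of a single Adam update for one parameter array, showing the first- and second-moment estimates and the bias correction; the function name is hypothetical and the hyperparameter defaults shown are the commonly used values.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running estimates of the first and
    second moments of the gradient; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad            # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                  # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return param, m, v
```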

Adapter Layers

Adapter layers are small trainable modules inserted into a pre-trained model to enable parameter-efficient fine-tuning. They allow task adaptation while keeping the original model weights frozen.
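
A minimal sketch of one common adapter design, a bottleneck with a down-projection, nonlinearity, up-projection, and residual connection; the names, sizes, and zero initialization of the up-projection are illustrative assumptions.

```python
import numpy as np

def adapter(h, w_down, w_up):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add.
    Only w_down and w_up are trained; the surrounding pre-trained weights stay frozen."""
    z = np.maximum(0.0, h @ w_down)    # (d_model -> bottleneck), ReLU
    return h + z @ w_up                # (bottleneck -> d_model), residual connection

rng = np.random.default_rng(0)
d_model, bottleneck = 768, 64                      # illustrative sizes
w_down = rng.normal(scale=0.02, size=(d_model, bottleneck))
w_up = np.zeros((bottleneck, d_model))             # zero init: adapter starts as identity
h = rng.normal(size=(1, d_model))                  # a hidden state from the frozen model
out = adapter(h, w_down, w_up)
```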

Attention Mechanism

An attention mechanism allows neural networks to focus on the most relevant parts of the input when producing each element of the output. Attention is the foundational innovation behind the Transformer architecture and modern large language models.

Autoencoder

An autoencoder is a neural network trained to compress input data into a compact representation and then reconstruct it. Autoencoders are used for dimensionality reduction, denoising, and learning latent representations.
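
A minimal sketch of a linear autoencoder: the encoder compresses the input into a small latent code and the decoder reconstructs it, with reconstruction error as the training loss. The dimensions and weight initialization are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_latent = 32, 4                        # illustrative sizes
w_enc = rng.normal(scale=0.1, size=(d_in, d_latent))
w_dec = rng.normal(scale=0.1, size=(d_latent, d_in))

def encode(x):
    return x @ w_enc                          # compress to a 4-dim latent code

def decode(z):
    return z @ w_dec                          # reconstruct the 32-dim input

x = rng.normal(size=(8, d_in))                # a small batch of inputs
x_hat = decode(encode(x))
loss = np.mean((x - x_hat) ** 2)              # reconstruction error to minimize
```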

Backpropagation

Backpropagation is the algorithm used to train neural networks by computing gradients of the loss function with respect to each weight. It propagates error signals backward through the network to update weights and minimize prediction errors.
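
A minimal sketch of backpropagation through a tiny one-hidden-layer network, computing the gradients by hand with the chain rule and taking one gradient step; all shapes and values are illustrative.

```python
import numpy as np

# Forward pass of a tiny network: x -> ReLU(x @ w1) -> h @ w2 -> squared error.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))
y = np.array([[1.0]])
w1 = rng.normal(size=(3, 4))
w2 = rng.normal(size=(4, 1))

h_pre = x @ w1
h = np.maximum(0.0, h_pre)
y_hat = h @ w2
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: propagate the error signal layer by layer (chain rule).
d_y_hat = y_hat - y                    # dL/dy_hat
d_w2 = h.T @ d_y_hat                   # dL/dw2
d_h = d_y_hat @ w2.T                   # dL/dh
d_h_pre = d_h * (h_pre > 0)            # ReLU gradient gate
d_w1 = x.T @ d_h_pre                   # dL/dw1

# Gradient step to reduce the loss.
lr = 0.1
w1 -= lr * d_w1
w2 -= lr * d_w2
```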

Batch Normalization

Batch normalization is a technique that normalizes layer inputs across mini-batches during training to stabilize and accelerate neural network training. It reduces internal covariate shift and allows higher learning rates.
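
A minimal sketch of the training-time batch normalization transform for a fully connected layer; gamma and beta are the learnable scale and shift, and the running statistics used at inference time are omitted. Names and the epsilon value are illustrative.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature across the mini-batch, then scale and shift.
    x: (batch_size, num_features); gamma, beta: (num_features,)."""
    mean = x.mean(axis=0)                      # per-feature mean over the batch
    var = x.var(axis=0)                        # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta
```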

Batch Size

Batch size is the number of training examples used in one iteration of gradient descent. Larger batches provide more stable gradient estimates but require more memory, while smaller batches introduce gradient noise that can act as a regularizer and help generalization.
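
A minimal sketch of how a dataset is split into mini-batches, where each yielded batch corresponds to one gradient-descent iteration; the helper function and sizes are illustrative.

```python
import numpy as np

def minibatches(data, batch_size):
    """Yield consecutive mini-batches of the given size."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

data = np.arange(1000).reshape(1000, 1)        # 1000 toy examples
steps = sum(1 for _ in minibatches(data, batch_size=32))
print(steps)                                    # 32 iterations per epoch at batch size 32
```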