Transformer

The Transformer is a neural network architecture based on self-attention mechanisms that processes all input positions in parallel. Introduced in 2017, it became the foundation for virtually all modern large language models and many vision models.

Understanding Transformer

The transformer was introduced in the 2017 paper "Attention Is All You Need". It replaced recurrent processing with self-attention, in which every position in a sequence computes a weighted combination of all other positions. Because those computations do not depend on one another, the whole sequence can be processed in parallel. This removed the step-by-step bottleneck of RNNs and LSTMs and gave each position a direct path to every other position, which makes long-range dependencies easier to capture. The original design pairs an encoder stack with a decoder stack, each built from multi-head attention layers and feed-forward networks. Since attention by itself ignores token order, positional encodings are added to the inputs to supply sequence order information. Later models often keep only one of the two stacks: BERT is encoder-only, while GPT and LLaMA are decoder-only.

The architecture underlies virtually all modern large language models, including GPT, BERT, LLaMA, and Gemini, as well as vision transformers for image understanding and some diffusion models for image generation. Transformers also scale well: performance tends to keep improving as data and compute increase, a property that has driven the rapid advancement of generative AI capabilities over recent years.
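As a rough illustration of the two ingredients named above, the following sketch implements single-head scaled dot-product self-attention and sinusoidal positional encodings in plain Python. It is a minimal teaching example under stated assumptions, not a production implementation: the function names (`self_attention`, `sinusoidal_positions`), the toy embeddings, and the identity projection matrices are all hypothetical, and it omits multi-head splitting, masking, residual connections, and layer normalization.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encodings: sine on even dims, cosine on odd dims."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x is a (seq_len x d_model) matrix; w_q, w_k, w_v are projection matrices.
    Returns the attended output and the attention weight matrix.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Every query is compared with every key; no step depends on a previous one.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Toy input: 3 tokens with 4-dimensional embeddings (made-up values).
    tokens = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
    pe = sinusoidal_positions(len(tokens), 4)
    x = [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pe)]
    # Identity projections stand in for learned weights (an assumption).
    eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    out, weights = self_attention(x, eye, eye, eye)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of the printed weight matrix sums to 1 and shows how strongly one token attends to every token in the sequence, itself included. In a trained model the projection matrices are learned, and several such heads run side by side in each layer.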

Related Deep Learning Terms

Activation Function

Adam Optimizer

Adapter Layers

Attention Mechanism

Autoencoder

Backpropagation

Batch Normalization

Batch Size