Generative AI

LoRA

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that adds small trainable matrices to frozen pre-trained model weights. LoRA dramatically reduces the memory and compute required for fine-tuning large models.

Understanding LoRA

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that adapts large language models to new tasks by injecting small trainable matrices into existing model layers while keeping the original weights frozen. Instead of updating all billions of parameters during fine-tuning, LoRA decomposes weight updates into low-rank matrices, typically reducing trainable parameters by 10,000 times while maintaining performance comparable to full fine-tuning. This dramatically reduces GPU memory requirements and training time, making it feasible to customize large models on consumer hardware. LoRA adapters are small files that can be swapped and merged, enabling a single base model to serve many specialized applications. The technique has become the standard approach for fine-tuning open-source models like LLaMA and Mistral on platforms like Hugging Face. Variants such as QLoRA combine LoRA with quantization for even greater memory efficiency.

Is AI recommending your brand?

Find out if ChatGPT, Perplexity, and Gemini mention you when people search your industry.

Check your brand — $9

Loss Function

Back to full glossary

LoRA

Understanding LoRA

Is AI recommending your brand?

Related Generative AI Terms

Chain of Thought

ChatGPT

Claude

Diffusion Model

Discriminator

Few-Shot Prompting

Foundation Model

GAN