LoRA (Low-Rank Adaptation)
LoRA is a parameter-efficient fine-tuning technique that trains a small set of adapter weights instead of modifying all model parameters, making it possible to customise large AI models quickly and affordably.
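The core idea behind LoRA is that instead of learning a full weight update, it learns two small matrices B and A whose product has a low rank r, far smaller than the weight's dimensions. The following is a minimal numpy sketch of a LoRA forward pass; the dimensions, the alpha scaling value, and the initialisation are illustrative, not a specific library's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8                  # rank r is much smaller than d_in/d_out

W = rng.normal(size=(d_out, d_in))          # frozen base weight (not trained)
A = rng.normal(size=(r, d_in)) * 0.01       # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialised
alpha = 16                                  # common scaling hyperparameter

def lora_forward(x):
    # base output plus low-rank update: W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x)
# with B initialised to zero, the adapter starts as an exact no-op
assert np.allclose(y, W @ x)
```

Note the parameter savings: the adapter trains r * (d_in + d_out) = 1,024 values here, versus 4,096 for the full weight matrix, and the gap widens rapidly at transformer scale.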
Frequently asked questions
How much does LoRA fine-tuning cost?
LoRA fine-tuning can be done on a single consumer GPU (16-24 GB of VRAM) in a few hours for most tasks. Cloud compute costs are typically under 100 USD for a training run. The main investment is in data preparation rather than compute.
How does QLoRA differ from LoRA?
LoRA trains adapter weights on a full-precision base model. QLoRA first quantises the base model to 4-bit precision, then applies LoRA adapters on top. Because the frozen base weights take a quarter of the memory, QLoRA makes it possible to fine-tune larger models on smaller GPUs with comparable results.
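The distinction can be sketched in a few lines: the base weight is stored quantised and dequantised on the fly, while the adapters stay in full precision. This uses naive signed 4-bit absmax quantisation purely for illustration; real QLoRA uses the NF4 data type with double quantisation (via the bitsandbytes library), and the packing below into int8 stands in for true 4-bit storage:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 64, 8
W = rng.normal(size=(d, d))

# naive 4-bit absmax quantisation (illustration only; QLoRA itself uses NF4)
scale = np.abs(W).max() / 7.0                   # map weights into the signed range -8..7
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)

A = rng.normal(size=(r, d)) * 0.01              # adapters remain full precision
B = np.zeros((d, r))
alpha = 16

def qlora_forward(x):
    W_deq = W_q.astype(np.float32) * scale      # dequantise the base weight on the fly
    return W_deq @ x + (alpha / r) * (B @ (A @ x))
```

Only the quantised base weight and the small A and B matrices need to sit in GPU memory, which is where the savings over plain LoRA come from.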
Can multiple LoRA adapters be combined or swapped?
Yes. Multiple LoRA adapters can be merged together or switched between at inference time. This makes it practical to maintain a library of task-specific adapters and apply them to a single base model as needed, without reloading the entire model.
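Both patterns are easy to see in a small sketch: swapping selects a different (A, B) pair per request, and merging folds an adapter into the base weight so inference pays no extra cost. The task names and dimensions here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 64, 8
W = rng.normal(size=(d, d))                 # one shared base weight

def make_adapter():
    # random stand-ins for two trained adapters (hypothetical tasks)
    return rng.normal(size=(r, d)) * 0.1, rng.normal(size=(d, r)) * 0.1

adapters = {"summarise": make_adapter(), "classify": make_adapter()}
alpha = 16

def forward(x, task):
    A, B = adapters[task]                   # switch adapters at inference time
    return W @ x + (alpha / r) * (B @ (A @ x))

def merge(task):
    A, B = adapters[task]
    return W + (alpha / r) * (B @ A)        # fold the adapter into the weights

x = rng.normal(size=d)
# a merged weight matrix reproduces the adapter's output exactly
assert np.allclose(forward(x, "summarise"), merge("summarise") @ x)
```

Swapping keeps one copy of the base model resident and changes only the small adapter tensors, which is why serving many task-specific variants from one model is cheap.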