AI Hardware Calculator
Find out exactly what hardware you need to run AI models locally. Adjust the inputs below to match your use case.
Your Configuration
- Model Size
- Quantisation Level (lower-bit quantisation uses less VRAM, at a small cost in output quality)
- Priority
Recommended Build
- Required VRAM: 5.0 GB
- Recommended GPU: RTX 4060 (8 GB)
- System RAM: 32 GB
- CPU: AMD Ryzen 5 / Intel i5
- Storage: 1 TB NVMe SSD
- PSU: 550W
- Estimated Build Cost: £500 – £800
- Build Tier: Entry
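How does a 5.0 GB requirement turn into an 8 GB GPU recommendation? The sketch below shows one simple way to do that kind of lookup. The thresholds, GPU names, and tier labels are illustrative assumptions, not the calculator's actual rules.

```python
# Illustrative sketch: map a required-VRAM figure to the smallest GPU that covers it.
# The thresholds, GPU names, and tier labels are assumptions for demonstration only.

GPU_TIERS = [
    (8, "RTX 4060 (8 GB)", "Entry"),
    (16, "RTX 4080 (16 GB)", "Mid-range"),
    (24, "RTX 4090 (24 GB)", "High-end"),
    (48, "RTX 6000 Ada (48 GB)", "Workstation"),
]

def recommend_gpu(required_vram_gb: float):
    """Return the first GPU tier whose VRAM covers the requirement."""
    for vram_gb, gpu, tier in GPU_TIERS:
        if required_vram_gb <= vram_gb:
            return gpu, tier
    return "Multi-GPU / server hardware", "Enterprise"

print(recommend_gpu(5.0))  # -> ('RTX 4060 (8 GB)', 'Entry')
```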
What You Can Run
- 7B chatbot (Llama 3.2, Mistral 7B)
- Code completion assistant
- Document summariser
- Local embeddings for RAG
Reference
VRAM Requirements by Model & Quantisation
| Model | FP16 | Q8 | Q5 | Q4 | Q3 |
|---|---|---|---|---|---|
| 7B | 14.0 GB | 7.0 GB | 4.4 GB | 3.5 GB | 2.6 GB |
| 13B | 26.0 GB | 13.0 GB | 8.1 GB | 6.5 GB | 4.9 GB |
| 30B | 60.0 GB | 30.0 GB | 18.8 GB | 15.0 GB | 11.3 GB |
| 70B | 140.0 GB | 70.0 GB | 43.8 GB | 35.0 GB | 26.3 GB |
| 120B+ | 240.0 GB | 120.0 GB | 75.0 GB | 60.0 GB | 45.0 GB |
Base VRAM only. Add KV cache overhead for long contexts and multiple concurrent users.
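Each cell in the table is simply parameter count (in billions) multiplied by bits per weight, divided by 8 to convert to bytes, read as GB. A small sketch that reproduces the figures:

```python
# Each table cell is parameters (in billions) x bits per weight / 8,
# i.e. billions of bytes, read as GB.
BITS_PER_WEIGHT = {"FP16": 16, "Q8": 8, "Q5": 5, "Q4": 4, "Q3": 3}

def weight_vram_gb(params_billions: float, quant: str) -> float:
    """Base VRAM needed for the model weights alone, in GB."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8

print(weight_vram_gb(7, "Q4"))   # 3.5, as in the table
print(weight_vram_gb(70, "Q5"))  # 43.75, which the table shows as 43.8

# KV cache is extra and grows with context length and the number of
# concurrent users, so leave headroom beyond the base figure above.
```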
Software Stack
Recommended Software
- Ollama: The easiest way to download and run models. One-command setup with a built-in API (see the Python sketch after this list).
- llama.cpp: Maximum-performance inference engine; powers most local AI tools under the hood.
- vLLM: Production serving with batching and an OpenAI-compatible API. Best for multi-user setups.
- Open WebUI: ChatGPT-like interface for local models. Works with Ollama out of the box.
- LM Studio: GUI model manager for Mac and Windows. Great for non-technical users.
- text-generation-webui: Feature-rich web interface with model loading, fine-tuning, and extensions.
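To illustrate Ollama's built-in API, here is a minimal sketch of querying a local Ollama server from Python. It assumes Ollama is installed and running on its default port (11434) and that a model such as llama3.2 has already been pulled with `ollama pull llama3.2`.

```python
# Minimal sketch: query a locally running Ollama server over its HTTP API.
# Assumes Ollama is running on its default port and the "llama3.2" model
# has already been pulled (e.g. `ollama pull llama3.2`).
import json
import urllib.request

payload = {
    "model": "llama3.2",
    "prompt": "Summarise the trade-offs of Q4 quantisation in two sentences.",
    "stream": False,  # return a single JSON object instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["response"])
```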
Need Help With Your Local AI Setup?
We help UK businesses deploy AI on their own infrastructure — from hardware specification to production deployment.