Batch Size
Batch size is a training hyperparameter that determines how many data samples are processed before the model's weights are updated, affecting training speed, memory usage, and model quality.
FAQ
What batch size should I start with?
Common batch sizes are powers of 2 (32, 64, 128, 256), which map well onto GPU hardware. The optimal choice depends on dataset size, model architecture, and available memory; a reasonable approach is to start with 32 or 64 and adjust based on training behaviour.
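As a minimal sketch of what a batch size actually controls (plain Python, no ML framework assumed), the snippet below splits a toy dataset into mini-batches of 32; the function name and data are illustrative:

```python
# Split a dataset into mini-batches of a chosen size. The final batch
# may be smaller when the dataset size is not a multiple of batch_size.
def make_batches(data, batch_size):
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

samples = list(range(100))           # toy "dataset" of 100 samples
batches = make_batches(samples, 32)  # common power-of-2 starting point
print([len(b) for b in batches])     # → [32, 32, 32, 4]
```

During training, the model's weights would be updated once after processing each of these batches.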
Do larger batch sizes make training faster?
Larger batches process more data per step and can better utilise GPU parallelism, but they may need more steps, or a carefully adjusted learning rate, to reach the same accuracy. Higher per-step throughput therefore does not always translate into shorter wall-clock training time.
What is gradient accumulation?
Gradient accumulation simulates a larger batch size by processing several smaller micro-batches and accumulating their gradients before performing a single weight update. This allows training with an effectively large batch on hardware whose memory cannot hold that batch at once.
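The idea can be sketched in plain Python without a framework. The toy model below fits y = w·x with squared error; the function names and data are illustrative. Averaging gradients over four micro-batches of 2 produces exactly the same update as one full batch of 8:

```python
# Mean gradient of squared error for the toy model y = w * x
# over one micro-batch of (x, y) pairs.
def grad(w, batch):
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

# Accumulate gradients across micro-batches, then apply ONE weight
# update — as if a single larger batch had been processed.
def accumulate_and_step(w, micro_batches, lr):
    acc = 0.0
    for mb in micro_batches:
        acc += grad(w, mb) / len(micro_batches)  # average across micro-batches
    return w - lr * acc                          # single update at the end

data = [(x, 3.0 * x) for x in range(1, 9)]               # target weight: 3.0
micro = [data[i:i + 2] for i in range(0, len(data), 2)]  # 4 micro-batches of 2
accumulated = accumulate_and_step(0.0, micro, lr=0.01)
full_batch = 0.0 - 0.01 * grad(0.0, data)                # one full-batch step
print(abs(accumulated - full_batch) < 1e-9)              # → True
```

Real implementations (e.g. deferring `optimizer.step()` in a deep learning framework) follow the same pattern: only the update is postponed, so peak memory stays at the micro-batch size.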