ONNX Runtime
ONNX Runtime is an open-source inference engine that runs AI models in the ONNX (Open Neural Network Exchange) format, enabling optimised, cross-platform model deployment across CPUs, GPUs, and specialised hardware.
Frequently asked questions
What is the difference between ONNX and ONNX Runtime?
ONNX (Open Neural Network Exchange) is an open format for representing machine learning models. It defines a common set of operators and a file format that allows models to be transferred between different frameworks and tools. ONNX Runtime is the engine that runs ONNX models.
Can I convert my existing models to ONNX?
Most models from major frameworks (PyTorch, TensorFlow, scikit-learn) can be exported to ONNX. Some very new or custom operators may not have ONNX equivalents yet, but coverage is comprehensive for standard architectures.
Can ONNX Runtime run large language models (LLMs)?
Yes, with caveats. ONNX Runtime supports LLM inference and offers optimisations for transformer models. However, for very large LLMs, specialised serving solutions such as vLLM or TensorRT-LLM may deliver better performance in some deployment scenarios.