Local & on-device AI · Updated today

vLLM

By the vLLM project (originated at UC Berkeley's Sky Computing Lab)

Open-source high-throughput inference and serving engine using PagedAttention, supporting 200+ model architectures.

Best for

  • self-hosted serving
  • high-throughput inference
  • production deployment

Recent changes

  • Jun 20, 2026: Added. Open-source high-throughput inference and serving engine using PagedAttention.
