Local & on-device AI

Updated today

vLLM

By vLLM (UC Berkeley Sky Lab)

Open-source high-throughput inference and serving engine using PagedAttention, supporting 200+ model architectures.
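The idea behind PagedAttention is to store each request's key/value cache in fixed-size blocks drawn from a shared pool and to look them up through a per-request block table, much like virtual memory pages. The toy sketch below illustrates only that bookkeeping. It is not vLLM's implementation: `BlockAllocator`, `Sequence`, and the `BLOCK_SIZE` value are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value, an assumption)


class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool (hypothetical helper)."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks: List[int] = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("KV cache pool exhausted")
        return self.free_blocks.pop()

    def release(self, block_ids: List[int]) -> None:
        # Return a finished request's blocks to the pool for reuse.
        self.free_blocks.extend(block_ids)


@dataclass
class Sequence:
    """Tracks one request's logical-to-physical block mapping (hypothetical helper)."""

    num_tokens: int = 0
    block_table: List[int] = field(default_factory=list)

    def append_token(self, allocator: BlockAllocator) -> None:
        # A new physical block is needed only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def physical_slot(self, token_index: int) -> Tuple[int, int]:
        # Translate a logical token position into (physical block id, offset).
        return self.block_table[token_index // BLOCK_SIZE], token_index % BLOCK_SIZE


allocator = BlockAllocator(num_blocks=8)
seq_a, seq_b = Sequence(), Sequence()
for _ in range(20):
    seq_a.append_token(allocator)  # 20 tokens occupy 2 blocks
for _ in range(5):
    seq_b.append_token(allocator)  # 5 tokens occupy 1 block

print(len(seq_a.block_table), len(seq_b.block_table), len(allocator.free_blocks))
```

Because blocks are allocated on demand rather than reserved up front for a maximum sequence length, at most one partially filled block per request is wasted, which is what lets more requests share the same memory.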

Recent changes

Jun 20, 2026: Added. Open-source high-throughput inference and serving engine using PagedAttention.

Run open-weight LLMs locally with a single command. Bundles model weights, quantizations, and an OpenAI-compatible HTTP API into a clean CLI.

Desktop GUI for downloading and chatting with local LLMs. The friendly way to try open-weight models without touching a terminal.

C/C++ inference engine for LLaMA-family models. The library that quietly powers most local AI apps: fast, low-level, and able to run on almost anything.

Open-source ChatGPT alternative that runs entirely offline. Built on llama.cpp with a clean desktop UI and an OpenAI-compatible API.

Apple's array framework for Apple Silicon. Designed to run ML workloads natively on M-series Macs with unified memory between CPU and GPU.

Open-source desktop app for running LLMs locally with a chat UI, document RAG, and a browsable model catalog.

Self-hosted, extensible ChatGPT-style web interface for local and remote models, with offline operation and RAG.

All-in-one desktop/self-hosted app for document chat (RAG) and agents over local or cloud models.