llama.cpp
By ggml-org
C/C++ inference engine for LLaMA-family and other open-weight LLMs. The library behind many local AI apps: fast, low-level, and able to run on almost anything.
Best for
- embedding LLM inference into apps
- maximum performance on CPU and consumer GPUs
- quantization to tiny memory footprints
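To give a rough sense of what quantization buys, here is a minimal back-of-the-envelope sketch. It assumes weight size is simply parameter count times bits per weight. The function name and the 7-billion-parameter model are hypothetical, chosen only for illustration. Real quantized files also store scaling metadata, and inference needs extra memory for the context cache, so actual usage will be somewhat higher.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper: ignores quantization metadata, the KV cache,
    and activation memory.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 7e9  # assumed 7-billion-parameter model, for illustration only
    for bits in (16, 8, 4):
        size = weight_footprint_gb(n_params, bits)
        print(f"{bits:>2} bits/weight: about {size:.1f} GB of weights")
```

Under these assumptions, going from 16 to 4 bits per weight cuts the weight footprint to a quarter, which is often the difference between a model that fits in consumer RAM or VRAM and one that does not.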
Other Local & on-device AI
Ollama
Run open-weight LLMs locally with a single command. Bundles model weights, quantizations, and an OpenAI-compatible HTTP API into a clean CLI.
LM Studio
Desktop GUI for downloading and chatting with local LLMs. The friendly way to try open-weight models without touching a terminal.
Jan
Open-source ChatGPT alternative that runs entirely offline. Built on llama.cpp with a clean desktop UI and an OpenAI-compatible API.
MLX
Apple's array framework for Apple Silicon. Designed to run ML workloads natively on M-series Macs with unified memory between CPU and GPU.
GPT4All
Open-source desktop app for running LLMs locally with a chat UI, document RAG, and a browsable model catalog.
Open WebUI
Self-hosted, extensible ChatGPT-style web interface for local and remote models, with offline operation and RAG.
AnythingLLM
All-in-one desktop/self-hosted app for document chat (RAG) and agents over local or cloud models.
Msty
Private desktop AI workspace that runs local and cloud models side by side, with personas and automations and no setup required.