Local & on-device AI

llama.cpp

By ggml-org

C/C++ inference engine for LLaMA-family models. The library that quietly powers most local AI apps — fast, low-level, runs on almost anything.

Best for

Run open-weight LLMs locally with a single command. Bundles model weights, quantizations, and an OpenAI-compatible HTTP API into a clean CLI.

Desktop GUI for downloading and chatting with local LLMs. The friendly way to try open-weight models without touching a terminal.

Open-source ChatGPT alternative that runs entirely offline. Built on llama.cpp with a clean desktop UI and an OpenAI-compatible API.

Apple's array framework for Apple Silicon. Designed to run ML workloads natively on M-series Macs with unified memory between CPU and GPU.

Open-source desktop app for running LLMs locally with a chat UI, document RAG, and a browsable model catalog.

Self-hosted, extensible ChatGPT-style web interface for local and remote models, with offline operation and RAG.

All-in-one desktop/self-hosted app for document chat (RAG) and agents over local or cloud models.

Private desktop AI workspace that runs local and cloud models side by side with personas and automations, no setup.