Local & on-device AI

llamafile

By Mozilla

Packages an LLM and its runtime into a single executable file that runs locally across operating systems.

Best for

Run open-weight LLMs locally with a single command. Bundles model weights, quantizations, and an OpenAI-compatible HTTP API into a clean CLI.

Desktop GUI for downloading and chatting with local LLMs. The friendly way to try open-weight models without touching a terminal.

C/C++ inference engine for LLaMA-family models. The library that quietly powers most local AI apps — fast, low-level, runs on almost anything.

Open-source ChatGPT alternative that runs entirely offline. Built on llama.cpp with a clean desktop UI and an OpenAI-compatible API.

Apple's array framework for Apple Silicon. Designed to run ML workloads natively on M-series Macs with unified memory between CPU and GPU.

Open-source desktop app for running LLMs locally with a chat UI, document RAG, and a browsable model catalog.

Self-hosted, extensible ChatGPT-style web interface for local and remote models, with offline operation and RAG.

All-in-one desktop/self-hosted app for document chat (RAG) and agents over local or cloud models.