llamafile
By Mozilla
Packages an LLM and its runtime into a single executable file that runs locally across operating systems.
Best for
- single-file local LLM
- portable inference
- zero-install
Other Local & on-device AI
Ollama
Run open-weight LLMs locally with a single command. Bundles model weights, quantizations, and an OpenAI-compatible HTTP API into a clean CLI.
LM Studio
Desktop GUI for downloading and chatting with local LLMs. The friendly way to try open-weight models without touching a terminal.
llama.cpp
C/C++ inference engine for LLaMA-family models. The library that quietly powers most local AI apps — fast, low-level, runs on almost anything.
Jan
Open-source ChatGPT alternative that runs entirely offline. Built on llama.cpp with a clean desktop UI and an OpenAI-compatible API.
MLX
Apple's array framework for Apple Silicon. Designed to run ML workloads natively on M-series Macs with unified memory between CPU and GPU.
GPT4All
Open-source desktop app for running LLMs locally with a chat UI, document RAG, and a browsable model catalog.
Open WebUI
Self-hosted, extensible ChatGPT-style web interface for local and remote models, with offline operation and RAG.
AnythingLLM
All-in-one desktop/self-hosted app for document chat (RAG) and agents over local or cloud models.