MLX
By Apple
Apple's array framework for Apple Silicon. Designed to run ML workloads natively on M-series Macs with unified memory between CPU and GPU.
Best for
- running models on Apple Silicon
- fine-tuning on a MacBook
- unified-memory inference without GPU copies
Other Local & on-device AI
Ollama
Run open-weight LLMs locally with a single command. Bundles model weights, quantizations, and an OpenAI-compatible HTTP API into a clean CLI.
LM Studio
Desktop GUI for downloading and chatting with local LLMs. The friendly way to try open-weight models without touching a terminal.
llama.cpp
C/C++ inference engine for LLaMA-family models. The library that quietly powers most local AI apps — fast, low-level, runs on almost anything.
Jan
Open-source ChatGPT alternative that runs entirely offline. Built on llama.cpp with a clean desktop UI and an OpenAI-compatible API.
GPT4All
Open-source desktop app for running LLMs locally with a chat UI, document RAG, and a browsable model catalog.
Open WebUI
Self-hosted, extensible ChatGPT-style web interface for local and remote models, with offline operation and RAG.
AnythingLLM
All-in-one desktop/self-hosted app for document chat (RAG) and agents over local or cloud models.
Msty
Private desktop AI workspace that runs local and cloud models side by side with personas and automations, no setup.