llama.cpp

By ggml-org

C/C++ inference engine for LLaMA-family models. It is the library underneath many local AI apps: fast, low-level, and able to run on a wide range of hardware.

Best for

  • embedding LLM inference into apps
  • maximum performance on CPU and consumer GPUs
  • quantization to tiny memory footprints
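To give a rough sense of what "tiny memory footprints" means, here is a back-of-the-envelope sketch of weight memory at different bit widths. It is plain arithmetic, not llama.cpp code: the function name and the 7-billion-parameter model size are assumptions for illustration, and real quantized formats store extra per-block scale data, so actual files are somewhat larger than these figures.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB.

    Ignores the KV cache, activations, and any per-block metadata
    that a real quantization format adds on top of the raw weights.
    """
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    n_params = 7e9  # hypothetical model size, chosen only for illustration
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_memory_gib(n_params, bits):.1f} GiB")
```

Halving the bits per weight halves the weight memory, which is why a 4-bit model can fit on hardware that could not hold the 16-bit original.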
