
# Llama.cpp API

## Prerequisites

- Set up Tailscale on your devices
- Clone Georgi Gerganov's llama.cpp repository from GitHub
- Download a `.gguf` model (`mistral-7b-instruct-v0.2.Q4_K_M.gguf` is recommended)
- Place it inside `./models`
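
The prerequisite steps above can be sketched as shell commands. The upstream repository URL is the well-known one; the model file itself must be downloaded separately (e.g. from Hugging Face), so that step is shown only as a comment:

```shell
# Clone upstream llama.cpp
git clone https://github.com/ggerganov/llama.cpp

# Make sure the models directory exists
mkdir -p llama.cpp/models

# Move your separately downloaded .gguf model into place, e.g.:
# mv mistral-7b-instruct-v0.2.Q4_K_M.gguf llama.cpp/models/
```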

## Starting the project

- Clone this repository inside the llama.cpp one: `git clone git@github.com:MathieuDubart/Llama-cpp-api.git`
- Open `server.py` and change the model path to match your model's file name
- Run `python3 server.py` from the root directory
- You can now reach the API routes from any device linked to your Tailscale network, using the hosting device's Tailscale IP on port 5000 (e.g. `100.x.x.x:5000/generate`)
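
Once the server is running, a minimal Python client for the `/generate` route could look like the sketch below. The JSON field name `prompt` is an assumption, so check `server.py` for the exact payload the route expects:

```python
import json
import urllib.request


def generate_url(host: str, port: int = 5000) -> str:
    """Build the URL for the /generate route served by server.py."""
    return f"http://{host}:{port}/generate"


def generate(host: str, prompt: str, port: int = 5000) -> str:
    """POST a prompt to the API and return the raw response body.

    NOTE: the {"prompt": ...} payload shape is an assumption -- adapt it
    to whatever server.py actually parses.
    """
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        generate_url(host, port),
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Replace the host with your hosting device's Tailscale IP (the `100.x.x.x` address) when calling `generate`.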