The official API server for Exllama. OAI compatible, lightweight, and fast.
  • Python 95.4%
  • Jupyter Notebook 2.5%
  • Shell 0.7%
  • Batchfile 0.7%
  • Dockerfile 0.4%
  • Other 0.3%
Find a file
Repository files (latest commit first)
Filename Latest commit message Latest commit date
turboderp 2e50555d37
Some checks failed
Build and publish a Docker image / build-and-push-image (push) Has been cancelled
ruff / ruff (3.10) (push) Has been cancelled
Deploy to Wiki / deploy-wiki (push) Has been cancelled
Deploy OpenAPI docs to Pages / deploy (push) Has been cancelled
Dependencies: Update exllamav3
2026-06-06 23:15:43 +02:00
.github Actions: Update and add Wiki publish 2025-02-17 23:47:38 -05:00
backends exllamav3: Include stop conditions from backend tokenizer 2026-06-02 15:28:56 +02:00
colab Start: Migrate options from cu121/118 to cu12 2025-08-19 22:56:58 -04:00
common Dependencies: Don't try to import exllamav2 2026-06-05 01:58:47 +02:00
docker Docker: Update compose service 2026-05-24 20:33:03 +02:00
docs Tools: Add Step 3.7 tool format alias (qwen3_coder compatible) 2026-06-02 17:21:53 +02:00
endpoints Tools: Add Step 3.7 tool format alias (qwen3_coder compatible) 2026-06-02 17:21:53 +02:00
loras Implement lora support (#24) 2023-12-08 23:38:08 -05:00
models Tree: Update documentation and configs 2023-11-16 02:30:33 -05:00
sampler_overrides Sampling: Add adaptive-P params 2026-01-20 19:09:54 +01:00
templates Rework tool calls and OAI chat completions 2026-03-30 00:22:55 +02:00
tests Dependencies: Add cu13 install option and Dockerfile (exllamav3 only) 2026-05-23 01:21:18 +02:00
tools Logging: Add comprehensive request logging option 2026-05-27 00:33:45 +02:00
update_scripts Start: Make linux scripts executable 2024-08-03 15:19:31 -04:00
.dockerignore debloat docker build 2024-09-08 00:02:00 +01:00
.gitignore OAI: Log raw requests 2026-03-30 01:23:16 +02:00
api_tokens_sample.yml Improve docker deployment configuration (#163) 2024-08-18 15:19:18 -04:00
config_sample.yml Logging: Add comprehensive request logging option 2026-05-27 00:33:45 +02:00
formatting.bat feat: workflows for formatting/linting (#35) 2023-12-22 16:20:35 +00:00
formatting.sh feat: workflows for formatting/linting (#35) 2023-12-22 16:20:35 +00:00
LICENSE Create LICENSE 2023-11-16 17:43:23 -05:00
main.py Dependencies: Add cu13 install option and Dockerfile (exllamav3 only) 2026-05-23 01:21:18 +02:00
pyproject.toml Dependencies: Update exllamav3 2026-06-06 23:15:43 +02:00
README.md Add docker instructions to README.md 2026-05-10 11:26:53 +02:00
start.bat Start: Add check for uv 2025-08-19 22:57:03 -04:00
start.py Dependencies: Add cu13 install option and Dockerfile (exllamav3 only) 2026-05-23 01:21:18 +02:00
start.sh Start: Add check for uv 2025-08-19 22:57:03 -04:00

TabbyAPI

Python 3.10, 3.11, and 3.12 License: AGPL v3 Discord Server

Developer facing API documentation

Support on Ko-Fi

Important

In addition to the README, please read the Wiki page for information about getting started!

Note

Need help? Join the Discord Server and get the Tabby role. Please be nice when asking questions.

Note

Tool calling support has been revamped and now no longer relies on modified Jinja templates. See the docs for more.

Note

Want to run GGUF models? Take a look at YALS, TabbyAPI's sister project.

A FastAPI based application that allows for generating text using an LLM (large language model) using the Exllamav2 and Exllamav3 backends.

TabbyAPI is also the official API backend server for ExllamaV2 and V3.

Disclaimer

This project is marked as rolling release. There may be bugs and changes down the line. Please be aware that you might need to reinstall dependencies if needed.

TabbyAPI is a hobby project made for a small amount of users. It is not meant to run on production servers. For that, please look at other solutions that support those workloads.

Getting Started

Important

Looking for more information? Check out the Wiki.

For a step-by-step guide, choose the format that works best for you:

📖 Read the Wiki Covers installation, configuration, API usage, and more.

🎥 Watch the Video Guide A hands-on walkthrough to get you up and running quickly.

Docker

TabbyAPI publishes a CUDA image to GitHub Container Registry. Install Docker and the NVIDIA container toolkit, then start the published image with your models directory mounted into the container:

docker pull ghcr.io/theroyallab/tabbyapi:latest
docker run --gpus all --name tabbyapi -p 5000:5000 -v /path/to/models:/app/models ghcr.io/theroyallab/tabbyapi:latest

Replace /path/to/models with the folder that contains your local model directories. The API is exposed on http://localhost:5000.

For Docker Compose, custom config mounts, or building the image locally, see the Docker instructions.

Features

  • OpenAI compatible API
  • Loading/unloading models
  • HuggingFace model downloading
  • Embedding model support
  • JSON schema + Regex + EBNF support
  • AI Horde support
  • Speculative decoding via draft models
  • Multi-lora with independent scaling (ex. a weight of 0.9)
  • Inbuilt proxy to override client request parameters/samplers
  • Flexible Jinja2 template engine for chat completions that conforms to HuggingFace
  • Concurrent inference with asyncio
  • Utilizes modern python paradigms
  • Continuous batching engine using paged attention
  • Fast classifier-free guidance
  • OAI style tool/function calling

And much more. If something is missing here, PR it in!

Supported Model Types

TabbyAPI uses Exllama as a powerful and fast backend for model inference, loading, etc. Therefore, the following types of models are supported:

  • Exl2/GPTQ (deprecated, will be removed in the near future)

  • Exl3 (Highly recommended)

  • FP16/BF16

In addition, TabbyAPI supports parallel batching using paged attention for Nvidia Ampere GPUs and higher.

Contributing

Use the template when creating issues or pull requests, otherwise the developers may not look at your post.

If you have issues with the project:

  • Describe the issue in detail

  • If you have a feature request, please indicate it as such.

If you have a Pull Request

  • Describe the pull request in detail, what, and why you are changing something

Acknowldgements

TabbyAPI would not exist without the work of other contributors and FOSS projects:

Developers and Permissions

Creators/Developers: