Large Language Models

- Download Models
- LLM Leaderboards
- LLM Search Tools
- LLM Inference Software

8-bit System Requirements

| Model | VRAM Used | Minimum Total VRAM | Card Examples | RAM/Swap to Load* |
|-------|-----------|--------------------|----------------------|--------|
| 7B    | 9.2GB     | 10GB               | 3060 12GB, 3080 10GB | 24 GB  |
| 13B   | 16.3GB    | 20GB               | 3090, 3090 Ti, 4090  | 32 GB  |
| 30B   | 36GB      | 40GB               | A6000 48GB, A100 40GB | 64 GB |
| 65B   | 74GB      | 80GB               | A100 80GB            | 128 GB |

4-bit System Requirements

| Model | Minimum Total VRAM | Card Examples | RAM/Swap to Load* |
|-------|------|----------------------------------------------------------|-------|
| 7B    | 6GB  | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060              | 6 GB  |
| 13B   | 10GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000       | 12 GB |
| 30B   | 20GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 32 GB |
| 65B   | 40GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000          | 64 GB |

*System RAM (not VRAM) is used to initially load a model. You can use swap space if you do not have enough RAM to load your LLM.
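As a rough rule of thumb, the VRAM figures above track the model's parameter count times the bytes per weight, plus overhead for activations, KV cache, and runtime buffers. A minimal sketch of that arithmetic (the 1.2 overhead factor is an illustrative assumption, not a measured constant; real usage varies by backend and context length):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight size times an overhead factor.

    `overhead` (activations, KV cache, CUDA buffers) is an assumed
    multiplier for illustration only.
    """
    weight_gb = params_billions * bits_per_weight / 8  # GB for weights alone
    return weight_gb * overhead

# 7B at 8-bit: ~7 GB of weights, ~8.4 GB with overhead, in the same
# ballpark as the 9.2 GB measured in the table above.
print(round(estimate_vram_gb(7, 8), 1))
print(round(estimate_vram_gb(13, 4), 1))
```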

text-generation-webui

text-generation-webui - a big community favorite: a Gradio web UI by oobabooga designed for running almost any free, open-source large language model downloaded from HuggingFace, including (but not limited to) LLaMA, llama.cpp-compatible models, GPT-J, Pythia, OPT, and many others. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation, and it is highly compatible with many model formats.

Exllama

A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs.

gpt4all

Open-source, assistant-style large language models that run locally on your CPU. GPT4All is an ecosystem for training and deploying powerful, customized LLMs on consumer-grade processors.

TavernAI

The original project that SillyTavern was forked from. This chat interface offers very similar functionality but has fewer cross-client compatibility options with other chat and API interfaces than SillyTavern.

SillyTavern

Developer-friendly and multi-API (KoboldAI/CPP, Horde, NovelAI, Ooba, OpenAI + proxies, Poe, WindowAI (Claude!)), with Horde SD, system TTS, WorldInfo (lorebooks), a customizable UI, auto-translate, and more prompt options than you'd ever want or need. An optional Extras server adds more SD/TTS options plus ChromaDB/Summarize. Based on a fork of TavernAI 1.2.8.

Koboldcpp

A self-contained distributable from Concedo that exposes llama.cpp function bindings, allowing it to be used through a simulated Kobold API endpoint. What does that mean? You get llama.cpp with a fancy UI, persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios, and everything Kobold and Kobold Lite have to offer, in a tiny package around 20 MB in size, excluding model weights.
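Because Koboldcpp simulates the Kobold API, any client that can POST JSON can drive it. A minimal sketch using only the standard library (the endpoint path, response shape, and default port 5001 follow Koboldcpp's usual defaults and are assumptions here; adjust for your setup):

```python
import json
import urllib.request

# Koboldcpp's usual default port; an assumption, change to match your instance.
KOBOLD_URL = "http://localhost:5001/api/v1/generate"

def build_generate_request(prompt: str, max_length: int = 80) -> urllib.request.Request:
    """Build a POST request for the simulated Kobold /api/v1/generate endpoint."""
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        KOBOLD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt: str) -> str:
    """Send the request to a running Koboldcpp instance and return the text."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        return json.loads(resp.read())["results"][0]["text"]

# generate("Once upon a time,")  # requires Koboldcpp running locally
```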

KoboldAI-Client

This is a browser-based front-end for AI-assisted writing with multiple local & remote AI models. It offers the standard array of tools, including Memory, Author's Note, World Info, Save & Load, adjustable AI settings, formatting options, and the ability to import existing AI Dungeon adventures. You can also turn on Adventure mode and play the game like AI Dungeon Unleashed.

h2oGPT

h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document question-answering capabilities. Documents help ground LLMs against hallucinations by providing context relevant to the instruction. h2oGPT is a fully permissive, Apache-2.0-licensed open-source project for 100% private and secure use of LLMs and document embeddings for document question-answering.
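The document-grounding idea can be illustrated with a toy retriever: embed the documents and the question, pick the most similar document, and prepend it to the prompt as context. This bag-of-words sketch only illustrates the pattern; it is not h2oGPT's actual embedding pipeline:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ground_prompt(question: str, docs: list[str]) -> str:
    """Pick the document most similar to the question, prepend it as context."""
    q = Counter(question.lower().split())
    best = max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))
    return f"Context: {best}\n\nQuestion: {question}\nAnswer:"

docs = [
    "The h2oGPT license is Apache 2.0.",
    "LLaMA models come in 7B, 13B, 30B and 65B sizes.",
]
print(ground_prompt("What license does h2oGPT use?", docs))
```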


The Bloke

The Bloke is a developer who frequently releases quantized (GPTQ) and optimized (GGML) open-source, user-friendly versions of AI Large Language Models (LLMs).

These conversions of popular models can be configured and installed on personal (or professional) hardware, bringing bleeding-edge AI to the comfort of your home.
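What "quantized" means in practice: the model's 16-bit weights are mapped onto a small integer grid (4-bit gives 16 levels) plus a shared scale, shrinking the file roughly 4x at a small accuracy cost. A naive round-to-nearest sketch for illustration; GPTQ itself uses a more sophisticated error-minimizing scheme:

```python
def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Naive symmetric 4-bit quantization: map floats to integers in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7  # one scale per weight group
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Reconstruct approximate float weights from the integer codes."""
    return [v * scale for v in q]

w = [0.12, -0.40, 0.07, 0.33]
q, s = quantize_4bit(w)
approx = dequantize(q, s)
# Each reconstructed weight is within half a quantization step of the original.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, approx))
```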

Support TheBloke here.




