Large Language Models

- Download Models
- LLM Leaderboards
- LLM Search Tools
- LLM Inference Software

8-bit System Requirements

| Model | VRAM Used | Minimum Total VRAM | Card Examples | RAM/Swap to Load* |
|-------|-----------|--------------------|----------------------|--------|
| 7B    | 9.2GB     | 10GB               | 3060 12GB, 3080 10GB | 24 GB  |
| 13B   | 16.3GB    | 20GB               | 3090, 3090 Ti, 4090  | 32 GB  |
| 30B   | 36GB      | 40GB               | A6000 48GB, A100 40GB | 64 GB |
| 65B   | 74GB      | 80GB               | A100 80GB            | 128 GB |

4-bit System Requirements

| Model | Minimum Total VRAM | Card Examples | RAM/Swap to Load* |
|-------|------|----------------------------------------------------------|-------|
| 7B    | 6GB  | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060              | 6 GB  |
| 13B   | 10GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000       | 12 GB |
| 30B   | 20GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 32 GB |
| 65B   | 40GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000          | 64 GB |

*System RAM (not VRAM) is used to initially load a model. You can use swap space if you do not have enough RAM to load your LLM.
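As a rough rule of thumb, the VRAM figures above track the model's parameter count times the bytes per weight, plus overhead for activations, KV cache, and runtime buffers. A minimal sketch of that arithmetic (the 1.2 overhead factor is an illustrative assumption, not a measured constant; real usage varies by backend and context length):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight size times an overhead factor.

    `overhead` (activations, KV cache, CUDA buffers) is an assumed
    multiplier for illustration only.
    """
    weight_gb = params_billions * bits_per_weight / 8  # GB for weights alone
    return weight_gb * overhead

# 7B at 8-bit: ~7 GB of weights, ~8.4 GB with overhead, in the same
# ballpark as the 9.2 GB measured in the table above.
print(round(estimate_vram_gb(7, 8), 1))
print(round(estimate_vram_gb(13, 4), 1))
```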

text-generation-webui

text-generation-webui - a big community favorite: a Gradio web UI by oobabooga designed for running almost any free, open-source large language model downloaded from HuggingFace, including (but not limited to) LLaMA, llama.cpp-compatible models, GPT-J, Pythia, OPT, and many others. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation, and it is highly compatible with many model formats.

Exllama

A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs.

gpt4all

Open-source, assistant-style large language models that run locally on your CPU. GPT4All is an ecosystem for training and deploying powerful, customized LLMs on consumer-grade processors.

TavernAI

The original project that SillyTavern was forked from. This chat interface offers very similar functionality but has fewer cross-client compatibility options with other chat and API interfaces than SillyTavern.

SillyTavern

Developer-friendly and multi-API (KoboldAI/CPP, Horde, NovelAI, Ooba, OpenAI + proxies, Poe, WindowAI (Claude!)), with Horde SD, system TTS, WorldInfo (lorebooks), a customizable UI, auto-translate, and more prompt options than you'd ever want or need. An optional Extras server adds more SD/TTS options plus ChromaDB/Summarize. Based on a fork of TavernAI 1.2.8.

Koboldcpp

A self-contained distributable from Concedo that exposes llama.cpp function bindings, allowing it to be used through a simulated Kobold API endpoint. What does that mean? You get llama.cpp with a fancy UI, persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios, and everything Kobold and Kobold Lite have to offer, in a tiny package around 20 MB in size, excluding model weights.
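Because Koboldcpp simulates the Kobold API, any client that can POST JSON can drive it. A minimal sketch using only the standard library (the endpoint path, response shape, and default port 5001 follow Koboldcpp's usual defaults and are assumptions here; adjust for your setup):

```python
import json
import urllib.request

# Koboldcpp's usual default port; an assumption, change to match your instance.
KOBOLD_URL = "http://localhost:5001/api/v1/generate"

def build_generate_request(prompt: str, max_length: int = 80) -> urllib.request.Request:
    """Build a POST request for the simulated Kobold /api/v1/generate endpoint."""
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        KOBOLD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt: str) -> str:
    """Send the request to a running Koboldcpp instance and return the text."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        return json.loads(resp.read())["results"][0]["text"]

# generate("Once upon a time,")  # requires Koboldcpp running locally
```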

KoboldAI-Client

This is a browser-based front-end for AI-assisted writing with multiple local & remote AI models. It offers the standard array of tools, including Memory, Author's Note, World Info, Save & Load, adjustable AI settings, formatting options, and the ability to import existing AI Dungeon adventures. You can also turn on Adventure mode and play the game like AI Dungeon Unleashed.

h2oGPT

h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document question-answering capabilities. Documents help ground LLMs against hallucinations by providing context relevant to the instruction. h2oGPT is a fully permissive, Apache-2.0-licensed open-source project for 100% private and secure use of LLMs and document embeddings for document question-answering.
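The document-grounding idea can be illustrated with a toy retriever: embed the documents and the question, pick the most similar document, and prepend it to the prompt as context. This bag-of-words sketch only illustrates the pattern; it is not h2oGPT's actual embedding pipeline:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ground_prompt(question: str, docs: list[str]) -> str:
    """Pick the document most similar to the question, prepend it as context."""
    q = Counter(question.lower().split())
    best = max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))
    return f"Context: {best}\n\nQuestion: {question}\nAnswer:"

docs = [
    "The h2oGPT license is Apache 2.0.",
    "LLaMA models come in 7B, 13B, 30B and 65B sizes.",
]
print(ground_prompt("What license does h2oGPT use?", docs))
```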


The Bloke

The Bloke is a developer who frequently releases quantized (GPTQ) and optimized (GGML) open-source, user-friendly versions of AI Large Language Models (LLMs).

These conversions of popular models can be configured and installed on personal (or professional) hardware, bringing bleeding-edge AI to the comfort of your home.
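What "quantized" means in practice: the model's 16-bit weights are mapped onto a small integer grid (4-bit gives 16 levels) plus a shared scale, shrinking the file roughly 4x at a small accuracy cost. A naive round-to-nearest sketch for illustration; GPTQ itself uses a more sophisticated error-minimizing scheme:

```python
def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Naive symmetric 4-bit quantization: map floats to integers in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7  # one scale per weight group
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Reconstruct approximate float weights from the integer codes."""
    return [v * scale for v in q]

w = [0.12, -0.40, 0.07, 0.33]
q, s = quantize_4bit(w)
approx = dequantize(q, s)
# Each reconstructed weight is within half a quantization step of the original.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, approx))
```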

Support TheBloke here.




