NanoChat

This model was released on {release_date} and added to Hugging Face Transformers on 2025-11-27.

NanoChat is a compact decoder-only transformer model designed for educational purposes and efficient training. It uses several architectural features that are common in modern transformer models, which makes it a good starting point for learning how those models work. NanoChat is a variant of the Llama architecture with a simplified attention mechanism and simplified normalization layers.
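The description above does not say what the simplified normalization looks like. One plausible reading, taken from the upstream nanochat project rather than from this page, is RMS normalization with no learnable scale or bias. The sketch below illustrates that idea in plain Python; `rms_norm` is a hypothetical helper written for this example and is not the Transformers implementation.

```python
import math


def rms_norm(x, eps=1e-6):
    """Rescale a vector to (approximately) unit root-mean-square.

    Assumption: no learnable weight or bias, unlike the RMSNorm layers
    found in many Llama-style models.
    """
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale for v in x]


# Toy hidden state, chosen only for illustration.
hidden = [3.0, -4.0, 0.0, 5.0]
normed = rms_norm(hidden)

# After normalization the root-mean-square of the vector is close to 1.
rms_after = math.sqrt(sum(v * v for v in normed) / len(normed))
print(normed)
print(round(rms_after, 4))
```

Dropping the learnable scale removes one parameter tensor per normalization layer, which keeps the model definition shorter and easier to read.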

The architecture is based on nanochat by Andrej Karpathy, adapted for the Hugging Face Transformers library by Ben Burtenshaw.

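The examples below show how to generate text from a chat-formatted conversation with NanoChat in three ways: with the pipeline API, with AutoModelForCausalLM and the tokenizer's chat template, and from the command line.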

import torch
from transformers import pipeline

chatbot = pipeline(
    task="text-generation",
    model="karpathy/nanochat-d32",
    dtype=torch.bfloat16,
    device=0
)

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

outputs = chatbot(conversation, max_new_tokens=64)
print(outputs[0]["generated_text"][-1]["content"])
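The indexing in the final `print` call works because, for chat input, the text-generation pipeline returns the whole conversation with the assistant reply appended as the last message. The sketch below reproduces that structure with a hand-written mock, so it runs without downloading the model. The reply text and the follow-up question are placeholders, not real model output.

```python
# Hand-written stand-in for the pipeline's return value for chat input:
# one dict per input conversation, whose "generated_text" holds the full
# message list with the assistant reply appended at the end.
mock_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "(placeholder reply)"},
        ]
    }
]

messages = mock_outputs[0]["generated_text"]
reply = messages[-1]  # the last message is the newly generated one
print(reply["role"], "->", reply["content"])

# To continue the chat, extend the returned history with a new user turn
# and pass the resulting list to the pipeline again.
follow_up = messages + [{"role": "user", "content": "And of Italy?"}]
print(len(follow_up))
```

Passing the extended list back to the pipeline keeps the earlier turns in context for the next reply.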

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "karpathy/nanochat-d32"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" lets Accelerate decide where the weights are placed
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

# Apply the chat template, then move the tensors to the device the model was loaded on
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
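The slice in the decoding step is needed because, for a decoder-only model, `generate` returns the prompt tokens followed by the new tokens. The sketch below shows the same slicing on plain Python lists. The token ids are made up for illustration; real ids come from the tokenizer and the model.

```python
# Hypothetical token ids, used only to illustrate the slicing.
prompt_ids = [101, 7, 42, 9]              # stands in for inputs["input_ids"][0]
full_sequence = prompt_ids + [55, 66, 2]  # stands in for outputs[0]

# Plays the same role as inputs["input_ids"].shape[1] in the example above.
prompt_length = len(prompt_ids)

# Everything after the prompt is what the model generated.
generated_ids = full_sequence[prompt_length:]
print(generated_ids)
```

Decoding the full sequence instead would repeat the templated prompt in the printed text.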

echo -e '{"role": "user", "content": "What is the capital of France?"}' | transformers run --task text-generation --model karpathy/nanochat-d32 --device 0

NanoChatConfig

[[autodoc]] NanoChatConfig

NanoChatModel

[[autodoc]] NanoChatModel
    - forward

NanoChatForCausalLM

[[autodoc]] NanoChatForCausalLM
    - forward