EXAONE 4
This model was released on 2025-07-15 and added to Hugging Face Transformers on 2025-07-26.
Overview
EXAONE 4.0 is a language model that integrates a non-reasoning mode and a reasoning mode, combining the excellent usability of EXAONE 3.5 with the advanced reasoning abilities of EXAONE Deep. To pave the way for the agentic AI era, EXAONE 4.0 incorporates essential features such as agentic tool use, and its multilingual capabilities are extended to support Spanish in addition to English and Korean.
The EXAONE 4.0 model series consists of two sizes: a mid-size 32B model optimized for high performance, and a small-size 1.2B model designed for on-device applications.
The EXAONE 4.0 architecture introduces the following changes compared to previous EXAONE models:
- Hybrid Attention: For the 32B model, we adopt a hybrid attention scheme that combines local attention (sliding window attention) with global attention (full attention) in a 3:1 ratio. We do not use RoPE (Rotary Positional Embedding) for global attention, for better global context understanding.
- QK-Reorder-Norm: We move the LayerNorm from its traditional Pre-LN position, applying it directly to the attention and MLP outputs instead, and we add RMS normalization right after the Q and K projections. This yields better performance on downstream tasks despite the additional computation.
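The QK-Reorder-Norm step above can be sketched in a few lines. This is an illustrative NumPy toy, not the actual modeling code: the `rms_norm` helper, the tiny dimensions, and the random weights are all assumptions made for the example.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMS normalization over the last dimension (no mean subtraction, no bias).
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

d_model, head_size = 8, 4  # toy sizes, not the real model configuration
rng = np.random.default_rng(0)
W_q = rng.normal(size=(d_model, head_size))
W_k = rng.normal(size=(d_model, head_size))

hidden = rng.normal(size=(2, d_model))  # (seq_len, d_model)

# QK-Reorder-Norm: RMS-normalize right after the Q and K projections,
# before the attention scores are computed.
q = rms_norm(hidden @ W_q)
k = rms_norm(hidden @ W_k)

scores = q @ k.T / np.sqrt(head_size)
```

After normalization each query/key row has unit root-mean-square, which keeps the attention logits in a stable range regardless of the projection weights.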
For more details, please refer to our technical report, HuggingFace paper, blog, and GitHub.
All model weights, including quantized versions, are available in our Hugging Face Collections.
Model Details
Model Specifications
| Model Configuration | 32B | 1.2B |
|---|---|---|
| d_model | 5,120 | 2,048 |
| Number of layers | 64 | 30 |
| Normalization | QK-Reorder-LN | QK-Reorder-LN |
| Non-linearity | SwiGLU | SwiGLU |
| Feedforward dimension | 27,392 | 4,096 |
| Attention type | Hybrid (3:1 Local-Global) | Global |
| Head type | GQA | GQA |
| Number of heads | 40 | 32 |
| Number of KV heads | 8 | 8 |
| Head size | 128 | 64 |
| Max sequence length | 131,072 | 65,536 |
| RoPE theta | 1,000,000 | 1,000,000 |
| Tokenizer | BBPE | BBPE |
| Vocab size | 102,400 | 102,400 |
| Tied word embedding | False | True |
| Knowledge cut-off | Nov. 2024 | Nov. 2024 |
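As a quick sanity check on the table, the per-token KV-cache footprint implied by the GQA configuration can be computed directly. This is illustrative arithmetic only: it assumes bf16 (2 bytes per value) and that every layer caches full-length keys and values, whereas the 32B model's local-attention layers actually use a sliding window, so real usage is lower.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_size, bytes_per_value=2):
    # Factor of 2 accounts for keys and values; bf16 is 2 bytes per value.
    return 2 * num_layers * num_kv_heads * head_size * bytes_per_value

# EXAONE-4.0-32B configuration from the table above.
per_token_32b = kv_cache_bytes_per_token(num_layers=64, num_kv_heads=8, head_size=128)
print(per_token_32b // 1024, "KiB per token")  # 256 KiB per token

# EXAONE-4.0-1.2B configuration.
per_token_1p2b = kv_cache_bytes_per_token(num_layers=30, num_kv_heads=8, head_size=64)
print(per_token_1p2b // 1024, "KiB per token")  # 60 KiB per token
```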
Usage tips
Non-reasoning mode
For general use, you can use the EXAONE 4.0 models with the following example:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-4.0-32B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# choose your prompt
prompt = "Explain how wonderful you are"
prompt = "Explica lo increíble que eres"
prompt = "너가 얼마나 대단한지 설명해 봐"

messages = [
    {"role": "user", "content": prompt},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
```
Reasoning mode
The EXAONE 4.0 models have reasoning capabilities for handling complex problems. You can activate reasoning mode by passing the enable_thinking=True argument to the tokenizer, which opens a reasoning block that starts with a <think> tag without closing it.
```python
messages = [
    {"role": "user", "content": "Which one is bigger, 3.12 vs 3.9?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True,
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output[0]))
```
Agentic tool use
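Because reasoning mode opens a `<think>` block that the model later closes with `</think>`, you may want to separate the reasoning trace from the final answer when post-processing the decoded output. A minimal sketch (the `split_reasoning` helper is illustrative, not part of the library; only the tag names come from the description above):

```python
def split_reasoning(decoded: str):
    """Split decoded output into (reasoning, answer).

    Assumes the output contains a <think>...</think> block followed by the
    final answer; returns ("", decoded) when no complete block is found.
    """
    start, end = "<think>", "</think>"
    if start in decoded and end in decoded:
        before, _, rest = decoded.partition(start)
        reasoning, _, answer = rest.partition(end)
        return reasoning.strip(), (before + answer).strip()
    return "", decoded.strip()

reasoning, answer = split_reasoning(
    "<think>Compare digit by digit: 3.9 > 3.12.</think>3.9 is bigger."
)
```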
The EXAONE 4.0 models can be used as agents with their tool calling capabilities. You can provide tool schemas to the model for effective tool calling.
```python
import random

def roll_dice(max_num: int):
    return random.randint(1, max_num)

tools = [
    {
        "type": "function",
        "function": {
            "name": "roll_dice",
            "description": "Roll a dice with the number 1 to N. User can select the number N.",
            "parameters": {
                "type": "object",
                "required": ["max_num"],
                "properties": {
                    "max_num": {
                        "type": "int",
                        "description": "Max number of the dice"
                    }
                }
            }
        }
    }
]

messages = [
    {"role": "user", "content": "Roll D6 dice twice!"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    tools=tools,
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output[0]))
```
Exaone4Config
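Once the model emits a tool call, your application is responsible for executing it and feeding the result back. The exact tool-call output format is model-specific (check the model card and chat template); assuming you have parsed a call into JSON with "name" and "arguments" fields, a minimal dispatcher might look like this (the helper names and the JSON shape are assumptions for illustration):

```python
import json
import random

def roll_dice(max_num: int) -> int:
    return random.randint(1, max_num)

# Map tool names from the schema to local implementations.
TOOL_REGISTRY = {"roll_dice": roll_dice}

def execute_tool_call(tool_call_json: str):
    """Run a parsed call like {"name": "roll_dice", "arguments": {"max_num": 6}}."""
    call = json.loads(tool_call_json)
    fn = TOOL_REGISTRY[call["name"]]
    return fn(**call["arguments"])

result = execute_tool_call('{"name": "roll_dice", "arguments": {"max_num": 6}}')
assert 1 <= result <= 6
```

The result would then typically be appended to `messages` as a tool-response turn before calling `apply_chat_template` and `generate` again.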
[[autodoc]] Exaone4Config
Exaone4Model
[[autodoc]] Exaone4Model - forward
Exaone4ForCausalLM
[[autodoc]] Exaone4ForCausalLM - forward
Exaone4ForSequenceClassification
[[autodoc]] Exaone4ForSequenceClassification - forward
Exaone4ForTokenClassification
[[autodoc]] Exaone4ForTokenClassification - forward
Exaone4ForQuestionAnswering
[[autodoc]] Exaone4ForQuestionAnswering - forward