GptOss

This model was released on 2025-08-05 and added to Hugging Face Transformers on 2025-08-05.

PyTorch FlashAttention SDPA

The GptOss model was introduced by OpenAI in the gpt-oss release blog post. It is a Mixture-of-Experts (MoE) causal language model that alternates between full-attention and sliding-window attention layers and uses learned attention sinks.
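
The snippet below is a minimal usage sketch; the checkpoint id `openai/gpt-oss-20b` and the prompt are illustrative assumptions, and any GptOss checkpoint on the Hub should work the same way.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed checkpoint id, for illustration only

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain attention sinks in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```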

Tips:

  • Attention Sinks with Flex Attention: When using flex attention, attention sinks require special handling. Unlike standard attention implementations, where sinks can be added directly to the attention scores, flex attention's `score_mod` function operates on individual score elements rather than on the full attention matrix. The sink renormalization therefore has to be applied after the flex attention computation, by rescaling the outputs with the log-sum-exp (LSE) values that flex attention returns, as sketched below.
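
A minimal sketch of this renormalization, assuming a learned per-head sink logit of shape `(num_heads,)` (the actual parameter name and shape in the modeling code may differ, and masking is omitted for brevity):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def flex_attention_with_sink_renorm(query, key, value, sink_logits):
    """query/key/value: (batch, num_heads, seq_len, head_dim); sink_logits: (num_heads,)."""
    # return_lse=True makes flex attention also return the per-query log-sum-exp of the
    # attention scores, with shape (batch, num_heads, seq_len).
    out, lse = flex_attention(query, key, value, return_lse=True)

    # A sink adds exp(sink_logit) to the softmax denominator but contributes no value
    # vector, so the already-normalized output only needs to be rescaled by
    # exp(lse) / (exp(lse) + exp(sink)), i.e. sigmoid(lse - sink).
    renorm = torch.sigmoid(lse - sink_logits.view(1, -1, 1))  # (batch, num_heads, seq_len)
    return out * renorm.unsqueeze(-1)
```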

The original code can be found in the [openai/gpt-oss](https://github.com/openai/gpt-oss) repository.

[[autodoc]] GptOssConfig

[[autodoc]] GptOssModel
    - forward

[[autodoc]] GptOssForCausalLM
    - forward

[[autodoc]] GptOssForSequenceClassification
    - forward

[[autodoc]] GptOssForTokenClassification
    - forward