GptOss
This model was released on 2025-08-05 and added to Hugging Face Transformers on 2025-08-05.
Overview
The GptOss model was proposed in the blog post [Introducing gpt-oss](https://openai.com/index/introducing-gpt-oss/) by OpenAI.
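The model can be used through the standard Transformers API. A minimal generation sketch, assuming the `openai/gpt-oss-20b` checkpoint name (check the Hub for the available checkpoints):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

inputs = tokenizer(
    "Explain attention sinks in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```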
Tips:
- Attention sinks with flex attention: when using flex attention, attention sinks require special handling. Unlike standard attention implementations, where the sinks can be added directly to the attention scores, flex attention's `score_mod` function operates on individual score elements rather than on the full attention matrix. The sink renormalization therefore has to be applied after the flex attention computation, by rescaling the outputs with the log-sum-exp (LSE) values returned by flex attention (see the sketch after this list).
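A minimal sketch of that post-hoc renormalization, assuming PyTorch's `torch.nn.attention.flex_attention` and a per-head tensor of learnable sink logits; the `sinks` name and shapes are illustrative, not the exact Transformers implementation:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def sink_attention(query, key, value, sinks):
    # query/key/value: (batch, num_heads, seq_len, head_dim)
    # sinks: (num_heads,) learnable per-head sink logits (illustrative)
    out, lse = flex_attention(query, key, value, return_lse=True)

    # With a sink logit s, the true softmax denominator is exp(lse) + exp(s),
    # so the sink-free flex attention output must be rescaled by
    #   exp(lse) / (exp(lse) + exp(s)) = sigmoid(lse - s).
    gate = torch.sigmoid(lse - sinks[None, :, None])  # (batch, num_heads, seq_len)
    return out * gate.unsqueeze(-1).to(out.dtype)
```

Because the rescaling only needs the LSE, the sink token never has to be materialized inside the attention kernel itself.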
GptOssConfig
[[autodoc]] GptOssConfig
GptOssModel
[[autodoc]] GptOssModel
    - forward
GptOssForCausalLM
[[autodoc]] GptOssForCausalLM
    - forward
GptOssForSequenceClassification
[[autodoc]] GptOssForSequenceClassification
    - forward
GptOssForTokenClassification
[[autodoc]] GptOssForTokenClassification
    - forward