StableLM
This model was released on 2023-09-05 and added to Hugging Face Transformers on 2024-02-14.
Overview
StableLM 3B 4E1T (blog post) was proposed in StableLM 3B 4E1T: Technical Report by Stability AI and is the first model in a series of multi-epoch pre-trained language models.
Model Details
StableLM 3B 4E1T is a decoder-only base language model pre-trained on 1 trillion tokens of diverse English and code datasets for four epochs. The model architecture is transformer-based with partial Rotary Position Embeddings, SwiGLU activation, LayerNorm, etc.
We also provide StableLM Zephyr 3B, an instruction fine-tuned version of the model that can be used for chat-based applications.
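For chat-based use of StableLM Zephyr 3B, a minimal sketch is shown below. It assumes the stabilityai/stablelm-zephyr-3b checkpoint on the Hub and that its tokenizer ships a chat template; check the model card for the exact prompt format.

```python
# Sketch: chat-style inference with StableLM Zephyr 3B (checkpoint name and
# chat template are assumptions; see the model card on the Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-zephyr-3b")
model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-zephyr-3b", dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Give me one tip for writing clear documentation."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(inputs, max_new_tokens=64, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```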
Usage Tips
- The architecture is similar to LLaMA but with RoPE applied to 25% of head embedding dimensions, LayerNorm instead of RMSNorm, and optional QKV bias terms (see the configuration sketch after this list).
- StableLM 3B 4E1T-based models use the same tokenizer as GPTNeoXTokenizerFast.
- StableLM 3B 4E1T and StableLM Zephyr 3B can be found on the Hugging Face Hub.
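The points above surface as fields of StableLmConfig. The sketch below inspects them on the released checkpoint; attribute names follow the Transformers StableLm implementation, and the values noted in comments are assumptions that may differ by checkpoint.

```python
# Sketch: inspect the architecture choices listed above on the released checkpoint.
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("stabilityai/stablelm-3b-4e1t")
print(config.partial_rotary_factor)  # fraction of head dimensions that receive RoPE (expected 0.25)
print(config.use_qkv_bias)           # whether bias terms are added to the QKV projections
print(config.layer_norm_eps)         # LayerNorm (not RMSNorm) epsilon

# The tokenizer resolves to the GPT-NeoX tokenizer mentioned above.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
print(type(tokenizer).__name__)      # expected: GPTNeoXTokenizerFast
```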
The following code snippet demonstrates how to use StableLM 3B 4E1T for inference:
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from accelerate import Accelerator, set_seed

>>> device = Accelerator().device  # the device to load the model onto

>>> set_seed(0)

>>> tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
>>> model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-3b-4e1t")
>>> model.to(device)  # doctest: +IGNORE_RESULT

>>> model_inputs = tokenizer("The weather is always wonderful in", return_tensors="pt").to(model.device)

>>> generated_ids = model.generate(**model_inputs, max_length=32, do_sample=True)
>>> responses = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
>>> responses
['The weather is always wonderful in Costa Rica, which makes it a prime destination for retirees. That’s where the Pensionado program comes in, offering']
```

Combining StableLM and Flash Attention 2
First, make sure to install the latest version of Flash Attention v2.
```bash
pip install -U flash-attn --no-build-isolation
```

Also make sure that your hardware is compatible with Flash-Attention 2. Read more about it in the official documentation of the flash-attn repository. Note: you must load your model in half-precision (e.g. torch.bfloat16).
Now, to run the model with Flash Attention 2, refer to the snippet below:
```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from accelerate import Accelerator, set_seed

>>> device = Accelerator().device  # the device to load the model onto

>>> set_seed(0)

>>> tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
>>> model = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-3b-4e1t", dtype=torch.bfloat16, attn_implementation="flash_attention_2")  # doctest: +SKIP
>>> model.to(device)  # doctest: +SKIP

>>> model_inputs = tokenizer("The weather is always wonderful in", return_tensors="pt").to(model.device)

>>> generated_ids = model.generate(**model_inputs, max_length=32, do_sample=True)  # doctest: +SKIP
>>> responses = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)  # doctest: +SKIP
>>> responses  # doctest: +SKIP
['The weather is always wonderful in Costa Rica, which makes it a prime destination for retirees. That’s where the Pensionado program comes in, offering']
```

StableLmConfig
[[autodoc]] StableLmConfig
StableLmModel
[[autodoc]] StableLmModel
    - forward
StableLmForCausalLM
[[autodoc]] StableLmForCausalLM
    - forward
StableLmForSequenceClassification
[[autodoc]] StableLmForSequenceClassification
    - forward
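As a brief illustration of the classification head, the hypothetical sketch below is not from the original documentation: the num_labels value and pad-token choice are assumptions, and the head is randomly initialized until fine-tuned.

```python
# Hypothetical sketch: sequence classification with a StableLM backbone.
import torch
from transformers import AutoTokenizer, StableLmForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
model = StableLmForSequenceClassification.from_pretrained(
    "stabilityai/stablelm-3b-4e1t", num_labels=2  # classification head is untrained until fine-tuned
)
model.config.pad_token_id = tokenizer.eos_token_id  # reuse EOS as padding for batched inputs

inputs = tokenizer("The weather is always wonderful in Costa Rica.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```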
StableLmForTokenClassification
[[autodoc]] StableLmForTokenClassification
    - forward