Pinkstack committed · verified · commit 96ea5c8 · parent f6f8725

Update README.md

Files changed (1): README.md (+13, -1)
@@ -11,7 +11,7 @@ tags:
 - distillation
 - math
 ---
-
+This is the bf16 safetensors variant
 ![Distil gpt oss logo](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/RxoOFH7vZmoyyKOUlB_oX.png)
 
 # What it is
@@ -45,6 +45,18 @@ So the sequence starts:
 
 As you can see, you set the reasoning effort via the system prompt. We recommend going **2** lines down in the system prompt and only then writing "Reasoning effort: [low, medium, high]". For reference, the output above was generated by our model.
 
+# Examples
+
+1) "Is a banana an animal?" Reasoning was set to **high**.
+![Is a banana an animal?](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/f1N8knMusup4dugZ2WREB.png)
+2) "Write an HTML website about yourself." Reasoning was set to **medium**.
+![Write an HTML website about yourself](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/azInLvZ1KGpT5DXT2zCyV.png)
+3) "Translate this to Chinese: Hello! I am ChatGPT, a large language model by OpenAI." Reasoning was set to **low**.
+![Translate this to Chinese](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/YH4Q0UY3aqeHRNhOgWv_V.png)
+
+As you can see, the model thinks for a different amount of time depending on the reasoning effort and your prompt.
+Keep in mind that these tests were done in LM Studio with the GGUF q8_0 quant on a single consumer card (RTX 3080), where we got 80-95 tokens/second at 8192 context.
+
 # Additional information
 
 The model was trained using Unsloth on a mix of private and public datasets.
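The system-prompt convention described above (the reasoning-effort directive placed two lines below the rest of the system prompt) can be sketched as a small helper. This is a minimal illustration of the README's recommendation; the base system text and helper name are our own, not part of the model card.

```python
def build_system_prompt(base: str, effort: str) -> str:
    """Place 'Reasoning effort: ...' two lines below the base system text,
    as the model card recommends."""
    if effort not in ("low", "medium", "high"):
        raise ValueError("effort must be low, medium, or high")
    # Two newlines = the directive starts 2 lines down from the base text.
    return f"{base}\n\nReasoning effort: {effort}"

# Hypothetical chat payload using the helper; the base instruction is an assumption.
messages = [
    {"role": "system", "content": build_system_prompt("You are a helpful assistant.", "high")},
    {"role": "user", "content": "Is a banana an animal?"},
]
print(messages[0]["content"])
```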
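The reported 80-95 tokens/second figure translates directly into wall-clock generation time. The arithmetic below is purely illustrative (derived from the numbers quoted above, not an additional benchmark):

```python
def generation_time_s(tokens: int, low_tps: float = 80.0, high_tps: float = 95.0):
    """Return (worst-case, best-case) seconds to generate `tokens` tokens
    at the quoted 80-95 tok/s throughput range."""
    return tokens / low_tps, tokens / high_tps

# A 1024-token reply at the reported speeds:
worst, best = generation_time_s(1024)
print(f"1024 tokens: {best:.1f}-{worst:.1f} s")  # prints "1024 tokens: 10.8-12.8 s"
```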