codewithdark committed
Commit ab64bcc · verified · 1 Parent(s): a18d68c

Upload model via QuantLLM

Files changed (3)
  1. .gitattributes +1 -0
  2. Llama-3.2-3B.Q5_K_M.gguf +3 -0
  3. README.md +127 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Llama-3.2-3B.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.2-3B.Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3d3b7993207a6cb6f53b1624a574ae3a43eb7d53c70aa3a9d833c79ece1283e
+ size 2322149696
README.md ADDED
@@ -0,0 +1,127 @@
+ ---
+ license: apache-2.0
+ base_model: meta-llama/Llama-3.2-3B
+ library_name: gguf
+ language:
+ - en
+ tags:
+ - quantllm
+ - gguf
+ - llama-cpp
+ - quantized
+ - transformers
+ - q5_k_m
+ ---
+
+ # Llama-3.2-3B-5bit-gguf
+ ![Format](https://img.shields.io/badge/format-GGUF-orange) ![Quantization](https://img.shields.io/badge/quantization-Q5_K_M-blue) ![QuantLLM](https://img.shields.io/badge/made%20with-QuantLLM-green)
+
+ ## Description
+
+ This is **meta-llama/Llama-3.2-3B** converted to GGUF format for use with llama.cpp, Ollama, LM Studio, and other compatible tools.
+
+ - **Base Model**: [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)
+ - **Format**: GGUF
+ - **Quantization**: Q5_K_M
+ - **Created with**: [QuantLLM](https://github.com/codewithdark-git/QuantLLM)
+
+ ## Usage
+
+ ### With llama.cpp
+
+ ```bash
+ # Download the model (the GGUF file in this repo is named Llama-3.2-3B.Q5_K_M.gguf)
+ huggingface-cli download QuantLLM/Llama-3.2-3B-5bit-gguf Llama-3.2-3B.Q5_K_M.gguf --local-dir .
+
+ # Run with llama.cpp
+ ./llama-cli -m Llama-3.2-3B.Q5_K_M.gguf -p "Hello, how are you?" -n 128
+ ```
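+
+ If you would rather fetch the file from Python, a minimal sketch using `huggingface_hub` (assuming `pip install huggingface_hub`; repo and file names are taken from this commit):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download into the local HF cache and return the resolved path
+ path = hf_hub_download(
+     repo_id="QuantLLM/Llama-3.2-3B-5bit-gguf",
+     filename="Llama-3.2-3B.Q5_K_M.gguf",
+ )
+ print(path)  # pass this path to llama-cli with -m
+ ```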
+
+ ### With Ollama
+
+ ```bash
+ # Create a Modelfile pointing at the downloaded GGUF file
+ echo 'FROM ./Llama-3.2-3B.Q5_K_M.gguf' > Modelfile
+
+ # Create the model
+ ollama create llama-3.2-3b-5bit-gguf -f Modelfile
+
+ # Run
+ ollama run llama-3.2-3b-5bit-gguf
+ ```
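+
+ After `ollama create`, the model can also be called from Python through the `ollama` client library; a minimal sketch, assuming `pip install ollama`, a running Ollama server, and the model name created above:
+
+ ```python
+ import ollama
+
+ # Generate a completion from the locally registered model
+ result = ollama.generate(
+     model="llama-3.2-3b-5bit-gguf",
+     prompt="Hello, how are you?",
+ )
+ print(result["response"])  # the generated text lives under the "response" key
+ ```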
+
+ ### With LM Studio
+
+ 1. Download the `.gguf` file from this repository
+ 2. Open LM Studio and go to the Models tab
+ 3. Click "Add Model" and select the downloaded file
+ 4. Start chatting!
+
+ ### With Python (llama-cpp-python)
+
+ ```python
+ from llama_cpp import Llama
+
+ llm = Llama.from_pretrained(
+     repo_id="QuantLLM/Llama-3.2-3B-5bit-gguf",
+     filename="Llama-3.2-3B.Q5_K_M.gguf",
+ )
+
+ output = llm(
+     "Write a story about a robot:",
+     max_tokens=256,
+     echo=True,
+ )
+ print(output["choices"][0]["text"])
+ ```
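+
+ `Llama.from_pretrained` forwards extra keyword arguments to the `Llama` constructor, so llama.cpp runtime options can be set at load time; a sketch with illustrative (not tuned) values, assuming a llama-cpp-python build with GPU offload support:
+
+ ```python
+ from llama_cpp import Llama
+
+ llm = Llama.from_pretrained(
+     repo_id="QuantLLM/Llama-3.2-3B-5bit-gguf",
+     filename="Llama-3.2-3B.Q5_K_M.gguf",
+     n_ctx=4096,       # context window size in tokens
+     n_gpu_layers=-1,  # offload all layers to the GPU when available
+ )
+ ```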
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | Base Model | [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) |
+ | Format | GGUF |
+ | Quantization | Q5_K_M |
+ | License | apache-2.0 |
+ | Created | 2025-12-20 |
+
+ ## Quantization Details
+
+ - **Type**: Q5_K_M
+ - **Bits**: 5-bit
+ - **Description**: Good size-quality trade-off
+
+ ### Common GGUF Quantizations
+
+ This repository ships only the Q5_K_M file; the table below summarizes the usual GGUF quantization levels for comparison, and the snippet after it shows how to check which files a repo actually contains.
+
+ | Quantization | Bits | Use Case |
+ |--------------|------|----------|
+ | Q2_K | 2-bit | Minimum size, experimental |
+ | Q3_K_M | 3-bit | Very constrained environments |
+ | Q4_K_M | 4-bit | **Recommended** for most users |
+ | Q5_K_M | 5-bit | Higher quality, more memory |
+ | Q6_K | 6-bit | Near-original quality |
+ | Q8_0 | 8-bit | Best quality, largest size |
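+
+ A short sketch using `huggingface_hub` to list what is actually published (assumes the repo is public, so no token is needed):
+
+ ```python
+ from huggingface_hub import list_repo_files
+
+ # List repo contents and keep only the GGUF files
+ files = list_repo_files("QuantLLM/Llama-3.2-3B-5bit-gguf")
+ print([f for f in files if f.endswith(".gguf")])
+ ```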
+
+ ---
+
+ ## About QuantLLM
+
+ This model was converted with [QuantLLM](https://github.com/codewithdark-git/QuantLLM), the ultra-fast LLM quantization and export library.
+
+ ```python
+ from quantllm import turbo
+
+ # Load and quantize any model
+ model = turbo("meta-llama/Llama-3.2-3B")
+
+ # Export to any format
+ model.export("gguf", quantization="Q5_K_M")
+ ```
+
+ ⭐ Star us on [GitHub](https://github.com/codewithdark-git/QuantLLM)!