---
base_model: agentrl/ReSearch-Qwen-7B
datasets:
- RUC-NLPIR/FlashRAG_datasets
language:
- en
library_name: transformers
license: mit
quantized_by: mradermacher
pipeline_tag: text-generation
tags:
- function-calling
- tool-calling
- codex
- local-llm
- gguf
- 6gb-vram
- ollama
- code-assistant
- api-tools
- openai-alternative
---

This is a packaged Q8_0-only model from https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF that runs on 9-12 GB of VRAM with effectively no quality loss.

<!-- ### quantize_version: 2 -->
<!-- ### output_tensor_quantised: 1 -->
<!-- ### convert_type: hf -->
<!-- ### vocab_type: -->
<!-- ### tags: -->

<!-- provided-files -->
Weighted/imatrix quants are available at https://huggingface.co/mradermacher/ReSearch-Qwen-7B-i1-GGUF

<p align="center">
    <img src="https://huggingface.co/agentrl/ReSearch-Qwen-7B/resolve/main/assets/intro_bar.png" width="70%" alt="Intro" />
    <br>
    <img src="https://huggingface.co/agentrl/ReSearch-Qwen-7B/resolve/main/assets/method.png" width="70%" alt="Method" />
</p>

Since this is a base model, do not apply a chat-completion template.

## Setup

Install Ollama:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```
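
Before pulling the model, you can sanity-check that the Ollama server is reachable. This is a small helper using `requests` (an assumption here, though it is also used in the examples below); the default port 11434 matches the rest of this README:

```python
import requests

def ollama_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers on base_url."""
    try:
        r = requests.get(f"{base_url}/api/tags", timeout=timeout)
        return r.status_code == 200
    except requests.exceptions.RequestException:
        return False

if ollama_reachable():
    print("Ollama is running")
else:
    print("Start Ollama first (e.g. `ollama serve`)")
```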

Change into your preferred folder:

```bash
# make sure you have Python 3.8+
# apt-get update && apt-get install -y libcurl4-openssl-dev build-essential curl
pip install huggingface-hub ollama
huggingface-cli download Manojb/Qwen-7B-toolcalling-ReSearch-gguf-Q8_0 --local-dir Qwen-7B-toolcalling-ReSearch-gguf-Q8_0
cd "$(find . -type d -iname '*Qwen-7B-toolcalling-ReSearch-gguf-Q8_0*' | head -n 1)"
source run_model.sh
```

Or build and run directly with Ollama:

```bash
# build the model from the bundled ModelFile and run it
ollama create qwen-7b:toolcall -f ModelFile
ollama run qwen-7b:toolcall  # base model: no chat template applied
```
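
If you need to write the ModelFile yourself, a minimal sketch might look like the following. The GGUF filename and parameter values here are assumptions for illustration; check the actual ModelFile shipped in the repository:

```
FROM ./ReSearch-Qwen-7B.Q8_0.gguf
# base model: no chat TEMPLATE, the raw prompt is passed through
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```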


### Basic Function Calling

```python
# Query the model through Ollama's generate endpoint
import requests

response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'qwen-7b:toolcall',
    'prompt': 'Get the current weather in San Francisco and convert to Celsius',
    'stream': False
})

print(response.json()['response'])
```
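
The model's tool calls typically come back as JSON embedded in the generated text; the exact shape depends on how you prompt it. A minimal, hypothetical parser (the `get_weather` name and output format below are illustrative assumptions, not the model's fixed schema):

```python
import json
import re

def extract_tool_call(text):
    """Pull a JSON object out of model output (first '{' to last '}')."""
    match = re.search(r'\{.*\}', text, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

# hypothetical model output containing a tool call
output = ('I will check the weather. '
          '{"name": "get_weather", "arguments": {"city": "San Francisco", "unit": "celsius"}}')
call = extract_tool_call(output)
if call:
    print(call["name"], call["arguments"])  # → get_weather {'city': 'San Francisco', 'unit': 'celsius'}
```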

For instruct models, use the chat endpoint instead (shown here with `llama3.2` as an example):

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "stream": false,
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"}
  ]
}'
```

Or from Python with the `ollama` package:

```python
from ollama import chat

# your custom model name here
model_name = "qwen-7b:toolcall"

messages = [
    {"role": "system", "content": "You are an instruct model."},
    {"role": "user", "content": "Explain how to use this custom model in Python."}
]

response = chat(model=model_name, messages=messages)
print(response.message.content)
```


***ReSearch*** is a novel framework that trains LLMs to ***Re***ason with ***Search*** via reinforcement learning, without using any supervised data on reasoning steps. It treats search operations as integral components of the reasoning chain: when and how to perform searches is guided by text-based thinking, and search results subsequently influence further reasoning.
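
The reason-then-search loop described above can be sketched roughly as follows. The `<search>`/`<answer>` tag names and the `generate`/`search` callables are illustrative assumptions, not the framework's actual API:

```python
import re

def research_loop(generate, search, prompt, max_steps=8):
    """Alternate model reasoning with search until an <answer> tag appears."""
    context = prompt
    for _ in range(max_steps):
        text = generate(context)  # model may emit a <search> query or an <answer>
        context += text
        answer = re.search(r'<answer>(.*?)</answer>', context, re.DOTALL)
        if answer:
            return answer.group(1).strip()
        query = re.search(r'<search>(.*?)</search>', text, re.DOTALL)
        if query:
            # retrieved results are fed back into the reasoning chain
            context += f"\n<result>{search(query.group(1).strip())}</result>\n"
    return None

# tiny stub demo: the "model" searches once, then answers
steps = iter(["<search>capital of France</search>", "<answer>Paris</answer>"])
answer = research_loop(lambda ctx: next(steps),
                       lambda q: "Paris is the capital of France.",
                       "Question: What is the capital of France?")
print(answer)  # → Paris
```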


## Usage

If you are unsure how to use GGUF files, refer to one of [TheBloke's
READMEs](https://huggingface.co/TheBloke/KafkaLM-70B-German-V0.1-GGUF) for
more details, including on how to concatenate multi-part files.


## Provided Quants

(sorted by size, not necessarily quality; IQ-quants are often preferable to similar-sized non-IQ quants)

| Link | Type | Size/GB | Notes |
|:-----|:-----|--------:|:------|
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.Q2_K.gguf) | Q2_K | 3.1 | |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.Q3_K_S.gguf) | Q3_K_S | 3.6 | |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.Q3_K_M.gguf) | Q3_K_M | 3.9 | lower quality |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.Q3_K_L.gguf) | Q3_K_L | 4.2 | |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.IQ4_XS.gguf) | IQ4_XS | 4.4 | |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.Q4_K_S.gguf) | Q4_K_S | 4.6 | fast, recommended |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.Q4_K_M.gguf) | Q4_K_M | 4.8 | fast, recommended |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.Q5_K_S.gguf) | Q5_K_S | 5.4 | |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.Q5_K_M.gguf) | Q5_K_M | 5.5 | |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.Q6_K.gguf) | Q6_K | 6.4 | very good quality |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.Q8_0.gguf) | Q8_0 | 8.2 | fast, best quality |
| [GGUF](https://huggingface.co/mradermacher/ReSearch-Qwen-7B-GGUF/resolve/main/ReSearch-Qwen-7B.f16.gguf) | f16 | 15.3 | 16 bpw, overkill |
151
+ Here is a handy graph by ikawrakow comparing some lower-quality quant
152
+ types (lower is better):
153
+
154
+ ![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)
155
+