coolAI committed on
Commit 5bed6c0 · verified · 1 Parent(s): ee53ad6

Update README.md

Files changed (1)
  1. README.md +165 -10
README.md CHANGED
@@ -1,19 +1,174 @@
  ---
  tags:
- - gguf
- - llama.cpp
  - unsloth

  ---

- # precis-gguf - GGUF

- This model was finetuned and converted to GGUF format using [Unsloth](https://github.com/unslothai/unsloth).

- **Example usage**:
- - For text only LLMs: **llama-cli** **--hf** repo_id/model_name **-p** "why is the sky blue?"
- - For multimodal models: **llama-mtmd-cli** **-m** model_name.gguf **--mmproj** mmproj_file.gguf

- ## Available Model files:
- - `granite-4.0-h-micro.Q8_0.gguf`
- - `granite-4.0-h-micro.Q4_K_M.gguf`
  ---
+ base_model: unsloth/granite-4.0-h-micro
  tags:
+ - text-generation-inference
+ - transformers
  - unsloth
+ - granitemoehybrid
+ - trl
+ license: apache-2.0
+ language:
+ - en
+ ---
+
+ # Precis: Document Summarization
+
+ ## Model Overview
+
+ **Precis** is a specialized document summarization model fine-tuned from IBM's Granite 4.0-H-Micro (3.2B parameters) using efficient LoRA adapters. It generates comprehensive ~300-word summaries optimized for downstream question answering, while keeping documents private through local, on-premises processing.
+
+ **Key Features:**
+ - 🔒 **Privacy-First**: Process sensitive documents entirely on your own infrastructure
+ - ⚡ **Fast**: ~0.5s inference time (typically 5-10x faster than cloud APIs)
+ - 💰 **Cost-Effective**: Zero per-document API fees
+ - 📚 **Long Context**: 128K tokens ≈ 320-380 book pages
+ - 🎯 **Specialized**: Trained on 5,500+ document-summary pairs covering millions of tokens
+
+ ## 🚀 Quick Start
+
+ ### Using with Transformers + PEFT
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+ import torch
+
+ # Load base model
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "unsloth/granite-4.0-h-micro",
+     torch_dtype=torch.float16,
+     device_map="auto"
+ )
+
+ # Load LoRA adapters
+ model = PeftModel.from_pretrained(base_model, "cernis-intelligence/precis")
+ tokenizer = AutoTokenizer.from_pretrained("cernis-intelligence/precis")
+
+ # Generate summary
+ document = """Your long document here..."""
+
+ messages = [
+     {"role": "user", "content": f"Summarize the following document in around 300 words:\n\n{document}"}
+ ]
+
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(
+     inputs,
+     max_new_tokens=512,
+     temperature=0.3,
+     top_p=0.9,
+     do_sample=True
+ )
+
+ summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(summary)
+ ```
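+
+ If you plan to run many requests, you can optionally merge the LoRA weights into the base model after loading; this is standard PEFT usage rather than anything specific to this adapter:
+
+ ```python
+ # Fold the adapter into the base weights; returns a plain transformers
+ # model, removing the per-layer LoRA overhead at inference time.
+ model = model.merge_and_unload()
+ ```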
+
+ ### Using with Unsloth (Recommended)
+
+ ```python
+ from unsloth import FastLanguageModel
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="cernis-intelligence/precis",
+     max_seq_length=2048,  # increase for long documents (model supports up to 128K)
+     load_in_4bit=True,    # for lower memory usage
+ )
+
+ FastLanguageModel.for_inference(model)
+
+ messages = [
+     {"role": "user", "content": f"Summarize the following document in around 300 words:\n\n{document}"}
+ ]
+
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_tensors="pt"
+ ).to("cuda")
+
+ outputs = model.generate(inputs, max_new_tokens=512, temperature=0.3, do_sample=True)
+ summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ ```
+
+ ### Using with vLLM (Production)
+
+ ```python
+ from vllm import LLM, SamplingParams
+ from vllm.lora.request import LoRARequest
+
+ # Initialize vLLM with the base model and LoRA support enabled
+ llm = LLM(
+     model="unsloth/granite-4.0-h-micro",
+     enable_lora=True,
+     max_lora_rank=32,
+     gpu_memory_utilization=0.9
+ )
+
+ # Create LoRA request: adapter name, unique integer ID, adapter repo/path
+ lora_request = LoRARequest(
+     "precis-granite",
+     1,
+     "cernis-intelligence/precis"
+ )
+
+ # Sampling parameters
+ sampling_params = SamplingParams(
+     temperature=0.3,
+     top_p=0.9,
+     max_tokens=512
+ )
+
+ # Generate (raw prompt; apply the chat template for best results)
+ prompts = ["Summarize the following document in around 300 words:\n\n" + document]
+ outputs = llm.generate(prompts, sampling_params, lora_request=lora_request)
+
+ print(outputs[0].outputs[0].text)
+ ```
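+
+ For production serving, the same adapter can also be registered with vLLM's OpenAI-compatible server, e.g. `vllm serve unsloth/granite-4.0-h-micro --enable-lora --lora-modules precis-granite=cernis-intelligence/precis`. A minimal client sketch follows; the server URL, port, and the `precis-granite` module name are illustrative assumptions, not part of this repo:
+
+ ```python
+ from openai import OpenAI
+
+ # Assumes a local vLLM server started with --enable-lora and the adapter
+ # registered under the (hypothetical) module name "precis-granite".
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+
+ response = client.chat.completions.create(
+     model="precis-granite",  # the LoRA module name, not the base model id
+     messages=[{
+         "role": "user",
+         "content": f"Summarize the following document in around 300 words:\n\n{document}",
+     }],
+     temperature=0.3,
+     max_tokens=512,
+ )
+ print(response.choices[0].message.content)
+ ```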

  ---

+ ## 📊 Training Details
+
+ ### Base Model
+ - **Architecture**: IBM Granite 4.0-H-Micro
+ - **Parameters**: 3.2B (38.4M trainable via LoRA)
+ - **Context Length**: 128K tokens
+ - **License**: Apache 2.0
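+
+ The 38.4M figure refers to the LoRA adapter weights. If you load the adapter with PEFT as in the Quick Start, you can verify it with PEFT's built-in helper:
+
+ ```python
+ # Prints trainable vs. total parameter counts for the loaded PeftModel
+ model.print_trainable_parameters()
+ ```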
+
+ ## 🎯 Use Cases
+
+ ### ✅ Perfect For:
+ - 📄 **Legal Document Review**: Summarize contracts while maintaining confidentiality
+ - 🏥 **Medical Records**: HIPAA-compliant summarization of patient notes
+ - 💼 **Financial Reports**: Analyze earnings reports without exposing sensitive data
+ - 📚 **Research Papers**: Quick digests of academic literature
+ - 📧 **Email Threads**: Comprehensive summaries of long conversations
+
+ ### ⚠️ Considerations:
+ - Works best with documents under ~380 pages (the 128K-token context limit; see the length-check sketch after this list)
+ - Optimized for English text (multilingual support planned)
+ - May miss some deeply nested structured data (tables, forms)
+ - For specialized needs, consider fine-tuning on domain-specific data
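+
+ To stay inside the context window, you can count tokens before summarizing. A minimal sketch, reusing `tokenizer` and `document` from the Quick Start examples (128,000 is the advertised context length):
+
+ ```python
+ # Count tokens before summarizing; very long documents should be chunked.
+ n_tokens = len(tokenizer(document)["input_ids"])
+ if n_tokens > 128_000:
+     print(f"Document is {n_tokens:,} tokens; split it into chunks first.")
+ ```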
+
+ ## 📄 License
+
+ This model is released under the **Apache 2.0 License**, the same as the base IBM Granite 4.0 model.
+
+ ```
+ Copyright 2025
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+ ```