# 🔍 Timeout vs Memory Diagnostic Tools
## Overview
When working with heavy models in HF Spaces, you may encounter failures caused by:
1. **Timeout**: The model takes too long to load (>5 minutes)
2. **Memory**: The system runs out of RAM
3. **Both**: A combination of both issues
This toolkit helps you identify and fix the exact problem.
## 📁 Files Added
### 1. `diagnostic_tool.py`
**Purpose**: Identify if the problem is timeout or memory
**Usage**:
```bash
python hf-spaces/diagnostic_tool.py
```
**What it does**:
- Monitors system memory in real time
- Tracks model loading time
- Detects the exact failure point
- Provides specific recommendations
**Output**:
```
🔍 MODEL LOADING DIAGNOSTIC: meta-llama/Llama-3.2-1B
📊 INITIAL SYSTEM STATE:
- Available memory: 12.50 GB
- Used memory: 3.45 GB (21.6%)
⏳ Starting model loading (timeout: 300s)...
[1/2] Loading tokenizer...
✅ Tokenizer loaded in 2.31s
[2/2] Loading model...
✅ Model loaded in 45.67s
✅ LOADING SUCCESSFUL in 47.98s
💡 RECOMMENDATIONS
✅ Model loaded successfully.
```
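If you want to adapt the check to your own Space, the core pattern is small: sample memory with `psutil`, time the load, and compare against the same 5-minute budget. Below is a minimal sketch of that idea; it is illustrative, not the actual internals of `diagnostic_tool.py`:

```python
# Minimal timeout-vs-memory diagnostic sketch (illustrative; the real
# diagnostic_tool.py may differ). Requires: pip install psutil transformers
import time

import psutil
from transformers import AutoModel, AutoTokenizer

TIMEOUT_S = 300  # same 5-minute budget used by the Space

def diagnose(model_name: str) -> None:
    mem = psutil.virtual_memory()
    print(f"Available memory: {mem.available / 1e9:.2f} GB")

    start = time.time()
    try:
        AutoTokenizer.from_pretrained(model_name)
        print(f"Tokenizer loaded in {time.time() - start:.2f}s")
        AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True)
    except MemoryError:
        print("MEMORY_ERROR: ran out of RAM while loading")
        return
    elapsed = time.time() - start
    status = "TIMEOUT_ERROR" if elapsed > TIMEOUT_S else "LOADING SUCCESSFUL"
    print(f"{status}: finished in {elapsed:.2f}s")

diagnose("meta-llama/Llama-3.2-1B")
```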
### 2. `config_optimized.py`
**Purpose**: Smart configuration based on model size
**Features**:
- Auto-detects model size category (small/medium/large)
- Provides optimized timeout settings
- Recommends appropriate HF Spaces tier
- Warns about memory issues before loading
**Usage**:
```python
from config_optimized import HFSpacesConfig, get_optimized_request_config
# Get optimal timeout for a model
timeout = HFSpacesConfig.get_timeout_for_model("meta-llama/Llama-3.2-1B")
# Get full request config
config = get_optimized_request_config("meta-llama/Llama-3.2-1B")
response = requests.post(url, json=payload, **config)
# Check if model is recommended for your tier
is_ok = HFSpacesConfig.is_model_recommended("meta-llama/Llama-3.2-1B", tier="free")
```
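Internally, the mapping boils down to a lookup from model size category to timeout. If you only need the idea, here is a hedged sketch; the thresholds and values are illustrative, not the actual contents of `config_optimized.py`:

```python
# Illustrative size-category lookup; the real values live in config_optimized.py.
def get_timeout_for_model(model_name: str) -> int:
    name = model_name.lower()
    if "1b" in name or "1.7b" in name:  # small: loads in well under a minute
        return 120
    if "3b" in name:                    # medium: may need the full 5 minutes
        return 300
    return 600                          # large: give it 10 minutes

config = {"timeout": get_timeout_for_model("meta-llama/Llama-3.2-1B")}  # {'timeout': 120}
```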
### 3. `DIAGNOSTIC_README.md`
**Purpose**: Complete guide with solutions
**Contents**:
- How to identify timeout vs memory issues
- Step-by-step solutions for each problem
- Model size comparison table
- Code examples for fixes
- Best practices
### 4. Improved Error Messages in `optipfair_frontend.py`
**What changed**:
- More informative timeout error messages
- Explicit memory error detection
- Actionable recommendations in errors
- All messages in English
**Example**:
```
❌ **Timeout Error:**
The request exceeded 5 minutes (300s).
**Possible causes:**
1. The model is very large and takes a long time to load
2. The server is processing many requests
**Solutions:**
• Use a smaller model (1B parameters)
• Wait and try again (the model may still be caching)
• If the problem persists, run `diagnostic_tool.py` for more information
```
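The detection itself only needs the exception type plus a keyword check on the response body. A sketch of that pattern follows; the helper name and exact strings are illustrative, not the actual code in `optipfair_frontend.py`:

```python
# Sketch of timeout-vs-memory error classification around a requests call.
import requests

def call_backend(url: str, payload: dict, timeout: int = 300) -> dict:
    try:
        response = requests.post(url, json=payload, timeout=timeout)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.Timeout:
        # Timeout: the request ran past the client-side budget.
        raise RuntimeError(
            f"Timeout Error: request exceeded {timeout}s. "
            "Try a smaller model or run diagnostic_tool.py."
        )
    except requests.exceptions.HTTPError as e:
        # Memory: backends often surface OOM in the error body.
        body = e.response.text.lower()
        if "memory" in body or "oom" in body:
            raise RuntimeError(
                "Memory Error: the backend ran out of RAM. Use a 1B model, "
                "enable int8 quantization, or upgrade the Spaces tier."
            )
        raise
```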
## 🚀 Quick Start Guide
### Step 1: Diagnose the Problem
```bash
cd hf-spaces
python diagnostic_tool.py
```
### Step 2: Read the Output
The tool will tell you:
- ✅ **Success**: Model loads fine
- ❌ **MEMORY_ERROR**: Need more RAM or a smaller model
- ⏰ **TIMEOUT_ERROR**: Need more time or a faster model
### Step 3: Apply the Solution
#### For TIMEOUT problems:
```python
# Option 1: Increase the timeout in optipfair_frontend.py
response = requests.post(
    url,
    json=payload,
    timeout=600,  # raised from 300 to 600 seconds
)

# Option 2: Use config_optimized.py
from config_optimized import get_optimized_request_config

config = get_optimized_request_config(model_name)
response = requests.post(url, json=payload, **config)
```
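Since the first request often pays the full model-loading cost (the "model may be caching" case above), a single retry with a longer timeout is frequently enough. A small sketch combining both options:

```python
# Retry with an escalating timeout: the first attempt often warms the model cache.
import requests

def post_with_retry(url, payload, timeouts=(300, 600)):
    last_exc = None
    for timeout in timeouts:
        try:
            return requests.post(url, json=payload, timeout=timeout)
        except requests.exceptions.Timeout as exc:
            last_exc = exc  # model may still be loading; try again with more time
    raise last_exc
```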
#### For MEMORY problems:
```python
# Option 1: Use a smaller model
AVAILABLE_MODELS = [
    "meta-llama/Llama-3.2-1B",       # ✅ Works on free tier
    "oopere/pruned40-llama-3.2-1B",  # ✅ Works on free tier
]

# Option 2: Use quantization (in the backend)
from transformers import AutoModel, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModel.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    low_cpu_mem_usage=True,
)

# Option 3: Upgrade the HF Spaces tier
# Free: 16GB RAM → PRO: 32GB RAM → Enterprise: 64GB RAM
```
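After applying Option 2, it is worth confirming the savings: transformers models expose `get_memory_footprint()`, so you can print the resident size of the quantized model directly (using the `model` variable from the snippet above):

```python
# Verify the quantized model actually fits: footprint in GB.
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```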
## 📊 Model Recommendations by Tier
### Free Tier (16GB RAM)
✅ **Recommended**:
- meta-llama/Llama-3.2-1B (~4 GB, ~30s load)
- oopere/pruned40-llama-3.2-1B (~4 GB, ~30s load)
- google/gemma-3-1b-pt (~4 GB, ~30s load)
- Qwen/Qwen3-1.7B (~6 GB, ~45s load)
⚠️ **May work with optimization**:
- meta-llama/Llama-3.2-3B (~12 GB, ~90s load)
❌ **Won't work**:
- meta-llama/Llama-3-8B (~32 GB)
- meta-llama/Llama-3-70B (~280 GB)
### PRO Tier (32GB RAM)
✅ **Additional models**:
- meta-llama/Llama-3.2-3B
- meta-llama/Llama-3-8B (with quantization)
### Enterprise Tier (64GB RAM)
✅ **Additional models**:
- meta-llama/Llama-3-8B (full precision)
- Larger models with quantization
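The figures above follow from a rule of thumb: full-precision (fp32) weights take 4 bytes per parameter, so a 1B-parameter model needs roughly 4 GB before loader overhead. Here is a quick estimator for models not in the lists; the 20% overhead factor is an assumption, not a measured value:

```python
def estimate_ram_gb(params_billion: float, bytes_per_param: int = 4,
                    overhead: float = 1.2) -> float:
    """Rough RAM estimate: parameters x bytes per parameter, plus load overhead."""
    return params_billion * bytes_per_param * overhead

print(estimate_ram_gb(1))                     # ~4.8 GB: fits the free tier
print(estimate_ram_gb(8))                     # ~38 GB: needs quantization or a bigger tier
print(estimate_ram_gb(8, bytes_per_param=1))  # ~9.6 GB: 8B in int8
```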
## 🎯 Common Scenarios
### Scenario 1: "My model times out after 5 minutes"
**Diagnosis**: TIMEOUT_ERROR
**Solution**:
1. Check if model is too large for your tier
2. Increase timeout to 600s (10 minutes)
3. Consider pre-loading models at startup
### Scenario 2: "Process crashes without clear error"
**Diagnosis**: Likely MEMORY_ERROR (the OS out-of-memory killer terminates the process without a traceback)
**Solution**:
1. Run `diagnostic_tool.py` to confirm
2. Use smaller model (1B parameters)
3. Use int8 quantization
4. Upgrade to PRO tier
### Scenario 3: "Sometimes works, sometimes doesn't"
**Diagnosis**: Memory pressure or concurrent requests
**Solution**:
1. Implement model caching
2. Add memory monitoring (see the sketch after this list)
3. Use smaller default model
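For the memory-monitoring item, a pre-load guard that checks free RAM catches the intermittent case before the OOM killer does. A sketch assuming `psutil`; the per-model sizes are illustrative estimates:

```python
import psutil

# Illustrative RAM estimates (GB); adjust for the models you actually serve.
MODEL_RAM_GB = {
    "meta-llama/Llama-3.2-1B": 4,
    "meta-llama/Llama-3.2-3B": 12,
}

def can_load(model_name: str, safety_margin_gb: float = 2.0) -> bool:
    """Refuse the load if the model plus a safety margin exceeds free RAM."""
    available_gb = psutil.virtual_memory().available / 1e9
    needed_gb = MODEL_RAM_GB.get(model_name, 12.0) + safety_margin_gb
    return available_gb >= needed_gb
```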
## 🛠️ Advanced: Pre-loading Models
To avoid a timeout on the first request, pre-load models at startup:
```python
# In hf-spaces/app.py
from transformers import AutoModel, AutoTokenizer

MODEL_CACHE = {}

def preload_models():
    """Pre-load common models at startup."""
    models = ["meta-llama/Llama-3.2-1B"]
    for model_name in models:
        try:
            print(f"Pre-loading {model_name}...")
            MODEL_CACHE[model_name] = {
                "model": AutoModel.from_pretrained(
                    model_name,
                    low_cpu_mem_usage=True,
                ),
                "tokenizer": AutoTokenizer.from_pretrained(model_name),
            }
            print(f"✅ {model_name} ready")
        except Exception as e:
            print(f"❌ Could not pre-load {model_name}: {e}")

def main():
    preload_models()  # Load models before starting services
    # ... rest of startup code
```
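Request handlers can then read from the same cache and fall back to a lazy load only for models that were not pre-loaded; a minimal sketch (the helper name is illustrative):

```python
def get_model(model_name: str):
    """Return (model, tokenizer) from MODEL_CACHE, loading lazily on a miss."""
    if model_name not in MODEL_CACHE:
        MODEL_CACHE[model_name] = {
            "model": AutoModel.from_pretrained(model_name, low_cpu_mem_usage=True),
            "tokenizer": AutoTokenizer.from_pretrained(model_name),
        }
    entry = MODEL_CACHE[model_name]
    return entry["model"], entry["tokenizer"]
```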
## 📞 Support
If you still have issues after trying these solutions:
1. Check the full diagnostic output
2. Review HF Spaces logs
3. Verify your HF Spaces tier and limits
4. Consider using a different model architecture
## 📝 Summary
| Issue | Symptom | Solution |
|-------|---------|----------|
| **Timeout** | Request > 5 min | Increase timeout, use cache |
| **Memory** | Process crashes/kills | Smaller model, quantization, upgrade tier |
| **Both** | Slow + crashes | Smaller model + longer timeout |
All tools are designed to help you quickly identify and fix the exact problem without guessing.