# Fine-tuned TinyLlama QA Model with QLoRA Adapters

This project fine-tunes the TinyLlama-1.1B-Chat-v1.0 model on a custom QA dataset and deploys it as a memory-efficient chatbot via QLoRA (Quantized LoRA). It is well suited to low-resource environments without sacrificing response quality.
## Base Model
- Name: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Size: ~1.1B parameters
- Architecture: Causal Language Model (Chat format)
- Token Format: `[INST] Question [/INST] Answer` (see the sketch after this list)

Why this base model:

- Smaller size (1.1B) for fast training and inference
- Chat instruction tuning
- Compatibility with LoRA + 4-bit quantization
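A minimal sketch of the prompt format above, assuming a single-turn question (the question string is only an example, not from the dataset):

```python
# Minimal sketch: wrapping a user question in the [INST] format
# described above before tokenization/generation.
question = "What is QLoRA?"  # example input, not from the dataset
prompt = f"[INST] {question} [/INST]"
print(prompt)  # -> [INST] What is QLoRA? [/INST]
```

In practice, `tokenizer.apply_chat_template(...)` from `transformers` can also build this string from the template stored with the tokenizer.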
## Training Strategy
| Feature | Value |
|---|---|
| Quantization | 4-bit (bnb nf4) |
| Optimizer Backend | bitsandbytes |
| Gradient Accumulation | 4 steps |
| Token Format | Chat template / INST |
| Epochs | 10 |
| Batch Size | 2 |
| Mixed Precision | FP16 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
Key Features:

- QLoRA: 4-bit quantized base model + 16-bit LoRA adapters
- Memory efficient: ~10 MB adapter weights (vs. the full 1.1B-parameter model)
- Early stopping: patience = 3 to prevent overfitting
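A minimal sketch of this setup with `transformers`, `peft`, and `bitsandbytes`. The output directory and the `train_dataset`/`eval_dataset` variables are placeholders, and LoRA settings other than `r=32` and the target modules fall back to library defaults; this is not the repo's exact training script:

```python
import torch
from transformers import (AutoModelForCausalLM, BitsAndBytesConfig,
                          EarlyStoppingCallback, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NF4 quantization
    bnb_4bit_compute_dtype=torch.float16,  # FP16 compute
)

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only adapter weights are trainable

args = TrainingArguments(
    output_dir="tinyllama-qa-qlora",        # placeholder output path
    num_train_epochs=10,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    fp16=True,
    eval_strategy="epoch",                  # "evaluation_strategy" on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,            # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,            # placeholder: tokenized QA pairs with labels
    eval_dataset=eval_dataset,              # placeholder: held-out split
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```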
## Inference Modes
| Mode | Trigger | Description |
|---|---|---|
| Exact Match | QA.json question match | Static response |
| Partial Match | Substring match with a known question | Approximate answer |
| Generative | No match; fall back to the model | Generates via TinyLlama + QLoRA adapters |
Benefits:

- Only adapter weights are updated
- The base model remains untouched
- TinyLlama + QLoRA delivers high performance at low resource usage (see the routing sketch below)
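A minimal sketch of the three-tier routing above. It assumes `QA.json` holds a `{question: answer}` mapping (adjust to the actual file layout), and the `generate` helper is defined in the loading sketch at the end of this card:

```python
import json

with open("QA.json") as f:
    qa_pairs = json.load(f)  # assumed layout: {question: answer}

def answer(question: str) -> str:
    q = question.strip().lower()
    # 1. Exact match: return the stored answer verbatim.
    for known_q, known_a in qa_pairs.items():
        if q == known_q.strip().lower():
            return known_a
    # 2. Partial match: substring overlap with a known question.
    for known_q, known_a in qa_pairs.items():
        if q in known_q.lower() or known_q.lower() in q:
            return known_a
    # 3. Generative fallback: TinyLlama + QLoRA adapters
    #    (generate() is defined in the loading sketch below).
    return generate(question)
```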
## Model Summary
| Component | Detail |
|---|---|
| Base Model | TinyLlama/TinyLlama-1.1B-Chat-v1.0 |
| Architecture | Causal LM (Transformer) |
| Adapters | QLoRA (r=32) |
| Max Sequence | 512 tokens |
| Fine-Tuning | QLoRA (4-bit NF4 + LoRA adapters) |
| Format | `[INST] Question [/INST] Answer` |
| Output | Personal QA generation/chatbot |
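A minimal sketch of loading the fine-tuned model for inference, assuming the ~10 MB adapter weights were saved to a local directory named `tinyllama-qa-qlora` (a placeholder path):

```python
# Minimal loading sketch: 4-bit NF4 base model + LoRA adapters via peft.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "tinyllama-qa-qlora")  # placeholder adapter path
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

def generate(question: str) -> str:
    prompt = f"[INST] {question} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens after the prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Because only the adapters were trained, the base model download can be shared across projects and the adapter directory stays small enough to version-control.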