Maaac/CodeLLaMA-Linux-BugFix

@@ -1,17 +1,42 @@
-  ---
-  license: mit
-  tags:
-    - codellama
-    - linux
-    - bugfix
-    - lora
-    - qlora
-    - git-diff
-  base_model: codellama/CodeLLaMA-7b-Instruct-hf
-  model_type: LlamaForCausalLM
-  library_name: peft
-  pipeline_tag: text-generation
-  ---
   # CodeLLaMA-Linux-BugFix
@@ -323,3 +348,329 @@
   - **v1.0.0**: Initial release with QLoRA training
   - **v1.1.0**: Added parallel dataset extraction
   - **v1.2.0**: Improved evaluation metrics and documentation

+---
+license: mit
+tags:
+  - codellama
+  - linux
+  - bugfix
+  - lora
+  - qlora
+  - git-diff
+base_model: codellama/CodeLLaMA-7b-Instruct-hf
+model_type: LlamaForCausalLM
+library_name: peft
+pipeline_tag: text-generation
+model-index:
+- name: CodeLLaMA-Linux-BugFix
+  results:
+  - task:
+      type: text-generation
+      name: Bug-fix Patch Generation
+    dataset:
+      type: custom
+      name: Linux Kernel Bugfix Commits
+      config: linux-bugfix-prompt-completion
+      split: test
+    metrics:
+      - type: bleu
+        value: 33.87
+        name: BLEU
+      - type: rouge1
+        value: 0.4355
+        name: ROUGE-1 F1
+      - type: rouge2
+        value: 0.3457
+        name: ROUGE-2 F1
+      - type: rougeL
+        value: 0.3612
+        name: ROUGE-L F1
+---
   # CodeLLaMA-Linux-BugFix
   - **v1.0.0**: Initial release with QLoRA training
   - **v1.1.0**: Added parallel dataset extraction
   - **v1.2.0**: Improved evaluation metrics and documentation
+---
+license: mit
+tags:
+  - codellama
+  - linux
+  - bugfix
+  - lora
+  - qlora
+  - git-diff
+base_model: codellama/CodeLLaMA-7b-Instruct-hf
+model_type: LlamaForCausalLM
+library_name: peft
+pipeline_tag: text-generation
+---
+# CodeLLaMA-Linux-BugFix
+A fine-tuned version of `CodeLLaMA-7B-Instruct`, designed specifically for Linux kernel bug fixing using QLoRA (Quantized Low-Rank Adaptation). The model learns to generate Git diff patches based on buggy C code and commit messages.
+---
+## 🎯 Overview
+This project targets automated Linux kernel bug fixing by:
+- **Mining real commit data** from the kernel Git history
+- **Training a specialized QLoRA model** on diff-style fixes
+- **Generating Git patches** in response to bug-prone code
+- **Evaluating results** using BLEU, ROUGE, and human inspection
+On the test split, generated patches score BLEU 33.87 and ROUGE-L F1 0.3612 against the reference fixes, which makes the model a candidate aid for automated code review and bug fixing.
+---
+## 📊 Performance Results
+### Evaluation Metrics
+✅ **BLEU Score**: 33.87
+✅ **ROUGE Scores**:
+- **ROUGE-1**: P=0.3775, R=0.7306, F1=0.4355
+- **ROUGE-2**: P=0.2898, R=0.6096, F1=0.3457
+- **ROUGE-L**: P=0.3023, R=0.6333, F1=0.3612
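+These overlap metrics indicate that generated patches:
+- Share a substantial fraction of their tokens with the reference fixes
+- Recover most of the reference content while also emitting extra tokens (recall is well above precision, e.g. ROUGE-1 R=0.7306 vs. P=0.3775)
+BLEU and ROUGE measure textual overlap only; they do not verify that a patch applies, compiles, or fixes the underlying bug.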
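A quick consistency check on the ROUGE figures above: each reported F1 is lower than the harmonic mean of the reported precision and recall. One plausible reading, which is an assumption and not something confirmed by the evaluation script here, is that P, R, and F1 are each averaged over test samples, so the mean of per-sample F1 need not equal the F1 of the mean P and R. The sketch below shows the arithmetic; the two toy samples are made up for illustration.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (P, R, F1) exactly as reported in this README
reported = {
    "ROUGE-1": (0.3775, 0.7306, 0.4355),
    "ROUGE-2": (0.2898, 0.6096, 0.3457),
    "ROUGE-L": (0.3023, 0.6333, 0.3612),
}

# F1 computed from the reported (aggregate) precision and recall
f1_of_means = {name: f1(p, r) for name, (p, r, _) in reported.items()}
for name, (_, _, reported_f1) in reported.items():
    print(f"{name}: reported F1={reported_f1:.4f}, "
          f"F1 of reported P/R={f1_of_means[name]:.4f}")

# Toy illustration (made-up numbers): averaging per-sample F1 can give a
# lower value than taking the F1 of the averaged precision and recall.
toy_samples = [(0.9, 0.9), (0.1, 0.9)]  # (precision, recall) per sample
mean_p = sum(p for p, _ in toy_samples) / len(toy_samples)
mean_r = sum(r for _, r in toy_samples) / len(toy_samples)
toy_f1_of_means = f1(mean_p, mean_r)
toy_mean_of_f1 = sum(f1(p, r) for p, r in toy_samples) / len(toy_samples)
print(f"toy: mean of F1={toy_mean_of_f1:.3f}, F1 of means={toy_f1_of_means:.3f}")
```

If per-sample averaging is indeed what the evaluation does, the three F1 values are the ones to compare across runs, and P and R are best read as separate averages.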
+---
+## 🧠 Model Configuration
+- **Base model**: `CodeLLaMA-7B-Instruct`
+- **Fine-tuning method**: QLoRA with 4-bit quantization
+- **Training setup**:
+  - LoRA r=64, alpha=16, dropout=0.1
+  - Batch size: 64, LR: 2e-4, Epochs: 3
+  - Mixed precision (bfloat16), gradient checkpointing
+- **Hardware**: Optimized for NVIDIA H200 GPUs
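To make the LoRA settings above concrete, the sketch below works out the adapter scaling factor (alpha / r, the standard LoRA scaling) and the number of trainable parameters LoRA adds to a single weight matrix. The 4096 x 4096 projection shape is an assumption for illustration (it matches the hidden size of 7B LLaMA-family models); which modules the training script actually adapts is not stated here.

```python
# Values taken from the training setup listed above
LORA_R = 64
LORA_ALPHA = 16

# Assumed shape of one adapted projection matrix (illustrative only)
D_IN = 4096
D_OUT = 4096

# LoRA scales the low-rank update B @ A by alpha / r
scaling = LORA_ALPHA / LORA_R

# A has shape (r, d_in) and B has shape (d_out, r)
lora_params = LORA_R * (D_IN + D_OUT)
full_params = D_IN * D_OUT
fraction = lora_params / full_params

print(f"scaling factor alpha/r = {scaling}")
print(f"LoRA parameters per matrix = {lora_params:,}")
print(f"full matrix parameters    = {full_params:,}")
print(f"trainable fraction        = {fraction:.4%}")
```

With r=64 and alpha=16 the update is scaled by 0.25, so a larger rank here does not by itself mean a larger effective update.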
+---
+## 📊 Dataset
+Custom dataset extracted from Linux kernel Git history.
+### Filtering Criteria
+Bug-fix commits containing:
+`fix`, `bug`, `crash`, `memory`, `null`, `panic`, `overflow`, `race`, `corruption`, etc.
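A keyword filter of this kind can be sketched as below. This is a minimal illustration, not the logic in `extract_linux_bugfixes_parallel.py`: the function name `looks_like_bugfix` is hypothetical, matching against the commit message and ignoring case are assumptions, and the keyword list covers only the terms named above (the original list ends in "etc.").

```python
import re

# Keywords named in the filtering criteria above (the real list may be longer)
BUGFIX_KEYWORDS = [
    "fix", "bug", "crash", "memory", "null",
    "panic", "overflow", "race", "corruption",
]

# \b anchors the keyword at the start of a word, so "prefix" does not count
# as "fix"; there is no trailing boundary, so "fixes" and "fixed" still match.
BUGFIX_PATTERN = re.compile(
    r"\b(?:" + "|".join(BUGFIX_KEYWORDS) + r")", re.IGNORECASE
)


def looks_like_bugfix(commit_message: str) -> bool:
    """Return True if the message contains any bug-fix keyword."""
    return BUGFIX_PATTERN.search(commit_message) is not None


# Made-up commit messages for illustration
messages = [
    "mm: fix NULL pointer dereference in foo()",
    "net: Fixes use-after-free on device removal",
    "docs: update prefix table",
    "Add documentation for the scheduler",
]
for message in messages:
    print(f"{looks_like_bugfix(message)!s:5}  {message}")
```

Keyword matching is cheap but noisy: a word like `memory` also appears in commits that are not fixes, so a filter like this trades precision for coverage.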
+### Structure
+- Language: C (`.c`, `.h`)
+- Context: 10 lines before/after the change
+- Format:
+```json
+{
+  "input": {
+    "original code": "C code snippet with bug",
+    "instruction": "Commit message or fix description"
+  },
+  "output": {
+    "diff codes": "Git diff showing the fix"
+  }
+}
+```
+* **File**: `training_data_100k.jsonl` (100,000 samples)
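The record layout above can be turned into a prompt/completion pair along these lines. This is a sketch under assumptions: `to_prompt_completion` is a hypothetical helper, the prompt wording is borrowed from the usage example in this README, and the actual output of `format_for_training.py` may differ.

```python
import json

# Prompt wording borrowed from the usage example in this README (assumed to
# match the training format; not confirmed by the formatting script).
PROMPT_TEMPLATE = (
    "Given the following original C code:\n"
    "{code}\n"
    "Instruction: {instruction}\n"
    "Return the diff that fixes it:\n"
)


def to_prompt_completion(record: dict) -> dict:
    """Map one dataset record to a prompt/completion pair."""
    return {
        "prompt": PROMPT_TEMPLATE.format(
            code=record["input"]["original code"],
            instruction=record["input"]["instruction"],
        ),
        "completion": record["output"]["diff codes"],
    }


# One JSONL line in the structure shown above (placeholder values)
raw_line = json.dumps({
    "input": {
        "original code": "if (!file->filter)\n    return;",
        "instruction": "Fix the null pointer dereference",
    },
    "output": {"diff codes": "Git diff showing the fix"},
})

example = to_prompt_completion(json.loads(raw_line))
print(example["prompt"])
print(example["completion"])
```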
+---
+## 🚀 Quick Start
+### Prerequisites
+- Python 3.8+
+- CUDA-compatible GPU (recommended)
+- 16GB+ RAM
+- 50GB+ disk space
+### Install dependencies
+```bash
+pip install -r requirements.txt
+```
+### 1. Build the Dataset
+```bash
+cd dataset_builder  # run from the repository root
+python extract_linux_bugfixes_parallel.py
+python format_for_training.py
+```
+### 2. Fine-tune the Model
+```bash
+cd train  # run from the repository root
+python train_codellama_qlora_linux_bugfix.py
+```
+### 3. Run Evaluation
+```bash
+cd evaluate  # run from the repository root
+python evaluate_linux_bugfix_model.py
+```
+### 4. Use the Model
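+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+from peft import PeftModel
+# Load the base model and attach the fine-tuned LoRA adapter
+model = AutoModelForCausalLM.from_pretrained("codellama/CodeLLaMA-7b-Instruct-hf")
+model = PeftModel.from_pretrained(model, "train/output/qlora-codellama-bugfix")
+model.eval()
+tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLLaMA-7b-Instruct-hf")
+# Generate a bug fix
+prompt = """
+Given the following original C code:
+if (!file->filter)
+    return;
+Instruction: Fix the null pointer dereference
+Return the diff that fixes it:
+"""
+# Keep the inputs on the same device as the model
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+# temperature only takes effect when sampling is enabled (do_sample=True);
+# max_new_tokens bounds the generated text without counting the prompt
+outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
+# Decode only the newly generated tokens, skipping the echoed prompt
+prompt_length = inputs["input_ids"].shape[1]
+fix = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
+print(fix)
+```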
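Since the decoded text is meant to be a Git diff, it can help to sanity-check its shape before trying to apply it. The helper below is a hypothetical, deliberately loose check that is not part of this repository: it only looks for at least one hunk header and at least one added or removed line. The sample patch and its file name are made up for illustration.

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,2 +10,2 @@"
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)


def looks_like_unified_diff(text: str) -> bool:
    """Loose shape check: a hunk header plus at least one +/- change line."""
    has_hunk = HUNK_HEADER.search(text) is not None
    has_change = any(
        line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
        for line in text.splitlines()
    )
    return has_hunk and has_change


# Illustrative model output (made up)
sample = """--- a/example.c
+++ b/example.c
@@ -10,2 +10,2 @@
-if (!file->filter)
+if (!file || !file->filter)
     return;
"""

print(looks_like_unified_diff(sample))
print(looks_like_unified_diff("if (!file->filter)\n    return;"))
```

Passing this check does not mean the patch applies or compiles; a stricter pipeline would try the patch against a kernel tree, for example with `git apply --check`.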
+---
+## 📁 Project Structure
+```
+CodeLLaMA-Linux-BugFix/
+├── dataset_builder/
+│   ├── extract_linux_bugfixes_parallel.py    # Parallel extraction of bug fixes
+│   ├── format_for_training.py                # Format data for training
+│   └── build_dataset.py                      # Main dataset builder
+├── dataset/
+│   ├── training_data_100k.jsonl              # 100K training samples
+│   └── training_data_prompt_completion.jsonl # Formatted training data
+├── train/
+│   ├── train_codellama_qlora_linux_bugfix.py # Main training script
+│   ├── train_codellama_qlora_simple.py       # Simplified training
+│   ├── download_codellama_model.py           # Model download utility
+│   └── output/
+│       └── qlora-codellama-bugfix/           # Trained model checkpoints
+├── evaluate/
+│   ├── evaluate_linux_bugfix_model.py        # Evaluation script
+│   ├── test_samples.jsonl                    # Test dataset
+│   └── output/                               # Evaluation results
+│       ├── eval_results.csv                  # Detailed results
+│       └── eval_results.json                 # JSON format results
+├── requirements.txt                          # Python dependencies
+├── README.md                                 # This file
+└── PROJECT_STRUCTURE.md                      # Detailed project overview
+```
+---
+## 🧩 Features
+* 🔧 **Efficient fine-tuning**: QLoRA trains small LoRA adapters on top of a 4-bit quantized base model, which keeps GPU memory use low
+* 🧠 **Real-world commits**: Training data comes from actual Linux kernel development history
+* 💡 **Context-aware**: Each sample includes the code surrounding the changed lines
+* 💻 **Diff output**: Generates fixes as Git-style diffs
+* 📈 **Measured performance**: BLEU 33.87 and ROUGE-1 F1 0.4355 against reference fixes
+* 🚀 **Lightweight to deploy**: Ships as a LoRA adapter that is loaded on top of the base model with PEFT
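The context extraction mentioned in the feature list, combined with the 10-line window described in the dataset section, can be sketched as follows. `context_window` is a hypothetical helper; how the extraction script handles file boundaries or multi-line changes is not stated in this README, so clamping at the start and end of the file is an assumption.

```python
from typing import List


def context_window(lines: List[str], changed_index: int, radius: int = 10) -> List[str]:
    """Return the changed line plus up to `radius` lines on each side.

    `changed_index` is 0-based. The window is clamped at file boundaries.
    """
    start = max(0, changed_index - radius)
    end = min(len(lines), changed_index + radius + 1)
    return lines[start:end]


# A fake 50-line source file for illustration
source = [f"line {i}" for i in range(1, 51)]

# Change in the middle of the file: full window of 21 lines
middle = context_window(source, changed_index=24)
print(len(middle), middle[0], middle[-1])

# Change near the top of the file: the window is cut off at line 1
top = context_window(source, changed_index=3)
print(len(top), top[0], top[-1])
```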
+---
+## 📈 Evaluation Metrics
+* **BLEU**: n-gram precision of the generated diff against the reference diff
+* **ROUGE**: n-gram (ROUGE-1, ROUGE-2) and longest-common-subsequence (ROUGE-L) overlap with the reference diff
+* **Human Evaluation**: Subjective patch quality assessment
+### Current Performance
+- **BLEU Score**: 33.87
+- **ROUGE-1 F1**: 0.4355 (unigram overlap)
+- **ROUGE-2 F1**: 0.3457 (bigram overlap)
+- **ROUGE-L F1**: 0.3612 (longest common subsequence)
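To show what the ROUGE-L numbers measure, here is a minimal from-scratch sketch for a single candidate/reference pair. It splits on whitespace, which is an assumption: ROUGE libraries apply their own tokenization, so this will not reproduce the scores above exactly. The two example lines are made up.

```python
from typing import List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) based on the LCS of whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    precision = lcs / len(cand) if cand else 0.0
    recall = lcs / len(ref) if ref else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


reference = "+ if (!file) return;"
candidate = "+ if (!file) goto out; return;"

precision, recall, f_score = rouge_l(candidate, reference)
print(f"P={precision:.4f} R={recall:.4f} F1={f_score:.4f}")
```

The candidate contains every reference token in order plus two extra ones, so recall is perfect while precision drops, the same high-recall, lower-precision pattern seen in the reported scores.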
+---
+## 🧪 Use Cases
+* **Automated kernel bug fixing**: Generate fixes for common kernel bugs
+* **Code review assistance**: Help reviewers identify potential issues
+* **Teaching/debugging kernel code**: Educational tool for kernel development
+* **Research in automated program repair (APR)**: Academic research applications
+* **CI/CD integration**: Automated testing and fixing in development pipelines
+---
+## 🔬 Technical Highlights
+### Memory & Speed Optimizations
+* 4-bit quantization (NF4)
+* Gradient checkpointing
+* Mixed precision (bfloat16)
+* Gradient accumulation
+* LoRA parameter efficiency
+### Training Efficiency
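+* **QLoRA**: Stores base-model weights in 4-bit NF4 instead of 16-bit, cutting weight memory by ~75%
+* **LoRA adapters**: Only the low-rank adapter weights receive gradients and optimizer state
+* **Gradient checkpointing**: Trades compute for memory
+* **Mixed precision (bfloat16)**: Faster training with little loss of accuracy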
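The ~75% figure follows from the storage width of the weights alone, as the back-of-the-envelope calculation below shows. It assumes a nominal 7 billion parameters and ignores quantization constants, activations, adapter weights, and optimizer state, so real memory use during training will be higher than these numbers.

```python
# Nominal parameter count for a "7B" model (assumed round figure)
NUM_PARAMS = 7_000_000_000

BYTES_PER_PARAM_BF16 = 2.0   # 16 bits per weight
BYTES_PER_PARAM_4BIT = 0.5   # 4 bits per weight


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


bf16_gib = weight_memory_gib(NUM_PARAMS, BYTES_PER_PARAM_BF16)
four_bit_gib = weight_memory_gib(NUM_PARAMS, BYTES_PER_PARAM_4BIT)
reduction = 1 - four_bit_gib / bf16_gib

print(f"bf16 weights : {bf16_gib:.1f} GiB")
print(f"4-bit weights: {four_bit_gib:.1f} GiB")
print(f"reduction    : {reduction:.0%}")
```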
+---
+## 🛠️ Advanced Usage
+### Custom Training
+```bash
+# Train with custom parameters
+python train_codellama_qlora_linux_bugfix.py \
+    --learning_rate 1e-4 \
+    --num_epochs 5 \
+    --batch_size 32 \
+    --lora_r 32 \
+    --lora_alpha 16
+```
+### Evaluation on Custom Data
+```bash
+# Evaluate on your own test set
+python evaluate_linux_bugfix_model.py \
+    --test_file your_test_data.jsonl \
+    --output_dir custom_eval_results
+```
+---
+## 🤝 Contributing
+1. Fork this repo
+2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+3. Commit your changes (`git commit -m 'Add amazing feature'`)
+4. Push to the branch (`git push origin feature/amazing-feature`)
+5. Open a Pull Request 🙌
+### Development Guidelines
+- Follow PEP 8 style guidelines
+- Add tests for new features
+- Update documentation for API changes
+- Ensure all tests pass before submitting PR
+---
+## 📄 License
+MIT License – see `LICENSE` file for details.
+---
+## 🙏 Acknowledgments
+* **Meta** for the CodeLLaMA base model
+* **Hugging Face** for the Transformers and PEFT libraries
+* **The Linux kernel community** for open access to commit data
+* **Microsoft** for introducing the LoRA technique
+* **University of Washington** for the QLoRA research
+---
+## 📚 References
+* [CodeLLaMA (Meta, 2023)](https://arxiv.org/abs/2308.12950)
+* [QLoRA (Dettmers et al., 2023)](https://arxiv.org/abs/2305.14314)
+* [LoRA (Hu et al., 2021)](https://arxiv.org/abs/2106.09685)
+* [Automated Program Repair: A Survey](https://ieeexplore.ieee.org/document/8449519)
+---
+## 📞 Support
+For questions, issues, or contributions:
+- Open an issue on GitHub
+- Check the project documentation
+- Review the evaluation results in `evaluate/output/`
+---
+## 🔄 Version History
+- **v1.0.0**: Initial release with QLoRA training
+- **v1.1.0**: Added parallel dataset extraction
+- **v1.2.0**: Improved evaluation metrics and documentation