---
license: mit
base_model: uclanlp/plbart-multi_task-python
language:
- en
library_name: transformers
tags:
- text-generation
- code-generation
- vulnerability-injection
- security
- vaitp
- finetuned
pretty_name: "FBogaerts/plbart-multi_task-python-Finetuned: Vulnerability Injection (VAITP)"
---

# FBogaerts/plbart-multi_task-python-Finetuned: Vulnerability Injection (VAITP)

This model is a fine-tuned version of **uclanlp/plbart-multi_task-python**, specialized for the task of security vulnerability injection in Python code. It has been trained to follow a specific instruction format to precisely modify code snippets and introduce vulnerabilities.

This model was developed as part of the research for our paper: *(coming soon)*.

The VAITP CLI Framework and related resources can be found in our GitHub repository *(link coming soon)*.

## Model Description

This model was fine-tuned to act as a "Coder" LLM. It takes a specific instruction and a piece of original Python code, and its objective is to return the modified code with the requested vulnerability injected.

The model performs best when prompted using the specific format it was trained on.

## Intended Uses & Limitations

**Intended Use**

This model is intended for research purposes in the fields of automated security testing, SAST/DAST tool evaluation, and the generation of training data for security-aware models. It should be used within a sandboxed environment to inject vulnerabilities into non-production code for analysis.

**Out-of-Scope Uses**

This model should **NOT** be used for:
- Generating malicious code for use in real-world attacks.
- Directly modifying production codebases.
- Any application outside of controlled, ethical security research.

Generated code should always be manually reviewed before use.

## How to Use

This model expects a very specific prompt format, which we call the `FINETUNED_STYLE` in our paper. The format is:

`{instruction} _BREAK_ {original_code}`

Here is an example using `transformers`:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "FBogaerts/plbart-multi_task-python-Finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# PLBART is an encoder-decoder model, so it must be loaded as a seq2seq LM,
# not as a causal LM.
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

instruction = "Modify the function to introduce an OS Command Injection vulnerability. The vulnerable code must contain the pattern: 'User-controlled input is used in a subprocess call with shell=True'."
original_code = "import subprocess\ndef execute(cmd):\n    subprocess.run(cmd, shell=False)"

prompt = f"{instruction} _BREAK_ {original_code}"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)

vulnerable_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
# The model outputs the full modified code block.
# Further cleaning may be needed to extract only the code.
print(vulnerable_code)
```
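Since the decoded string may include stray markdown fences or leading prose around the code, a small post-processing step can help isolate the modified snippet. The following `extract_code` helper is a hypothetical sketch (not part of the released VAITP tooling); adapt it to the output shapes you actually observe:

```python
import re

def extract_code(model_output: str) -> str:
    """Best-effort extraction of a Python snippet from raw model output.

    Hypothetical post-processing helper, not part of the released tooling.
    """
    # Prefer a fenced code block if the model emitted one.
    fenced = re.search(r"```(?:python)?\n(.*?)```", model_output, re.DOTALL)
    if fenced:
        return fenced.group(1).strip()
    # Otherwise, drop any leading prose before the first code-like line.
    lines = model_output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(("import ", "from ", "def ", "class ")):
            return "\n".join(lines[i:]).strip()
    # Fall back to the raw output when no code is recognizable.
    return model_output.strip()
```

For example, `extract_code("Here is the code:\nimport os\ndef f():\n    pass")` drops the leading sentence and keeps only the code lines.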

## Training Procedure

### Training Data

The model was fine-tuned on a dataset of 1,406 examples derived from the DeVAITP Vulnerability Corpus. Each example consists of a triplet: (instruction, original_code, vulnerable_code). The instructions were generated using the meta-prompting technique described in our paper, with meta-llama/Meta-Llama-3.1-8B-Instruct serving as the Planner model.
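Concretely, one such triplet can be pictured as a record like the following. The field names and file format here are illustrative, not the released dataset schema:

```python
import json

# Illustrative example of a single training triplet; the key names are
# assumptions for illustration, not the released dataset schema.
example = {
    "instruction": (
        "Modify the function to introduce an OS Command Injection "
        "vulnerability. The vulnerable code must contain the pattern: "
        "'User-controlled input is used in a subprocess call with shell=True'."
    ),
    "original_code": "import subprocess\ndef execute(cmd):\n    subprocess.run(cmd, shell=False)",
    "vulnerable_code": "import subprocess\ndef execute(cmd):\n    subprocess.run(cmd, shell=True)",
}

# During fine-tuning, the model's input joins the instruction and original
# code with the _BREAK_ separator; the target is the vulnerable code.
model_input = f"{example['instruction']} _BREAK_ {example['original_code']}"
target = example["vulnerable_code"]

print(json.dumps(example, indent=2))
```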

### Training Hyperparameters

The model was fine-tuned using the following key hyperparameters:

- **Framework:** Hugging Face TRL
- **Learning Rate:** 2e-5
- **Number of Epochs:** 1
- **Batch Size:** 1
- **Hardware:** Google Colab (L4 GPU)
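For readers reproducing a comparable setup, the reported values roughly correspond to a configuration fragment like the one below. This is a simplified sketch using the `transformers` `Seq2SeqTrainingArguments` API rather than the exact TRL training script used for the paper, and the output directory name is an assumption:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters; the actual TRL training script
# used for the paper may differ in other settings.
training_args = Seq2SeqTrainingArguments(
    output_dir="plbart-vaitp-finetuned",  # hypothetical name
    learning_rate=2e-5,
    num_train_epochs=1,
    per_device_train_batch_size=1,
)
```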

## Evaluation

(coming soon)

## Citation

If you use this model in your research, please cite our paper:

(BibTeX entry will be provided upon publication)