GPT2-Alpaca-4bit

This model is a fine-tuned version of openai-community/gpt2 on the tatsu-lab/alpaca dataset.

It was trained using QLoRA (4-bit quantization + LoRA) to follow instructions.
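
The inference example below uses the standard Alpaca prompt template; training prompts presumably followed the same format. For dataset examples without an extra input field it looks like this (examples with an input add an "### Input:" section between the instruction and the response):

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}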

Model Description

  • Model Type: Causal Language Model
  • Base Model: GPT-2
  • Dataset: Alpaca (Instruction Tuning)
  • Language: English
  • Training Method: QLoRA (4-bit quantization via bitsandbytes + peft); a configuration sketch follows this list
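
The exact training configuration is not published here. The following is a minimal sketch of a QLoRA setup for GPT-2 using bitsandbytes and peft; the rank, alpha, dropout, and other hyperparameters are illustrative assumptions, not the values actually used.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 precision (same quantization settings as at inference)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt2",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# GPT-2 packs Q/K/V into a single Conv1D projection named "c_attn", so LoRA is
# typically applied there (assumed target module and assumed hyperparameters).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["c_attn"],
    lora_dropout=0.05,
    fan_in_fan_out=True,  # GPT-2 Conv1D layers store weights transposed
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()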

How to Use

To use this model, you need to load the base GPT-2 model in 4-bit precision and then attach the trained LoRA adapters.

Installation

pip install transformers torch peft bitsandbytes accelerate

Inference Code

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the Base Model (GPT-2) with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16  # use torch.float16 on GPUs without bfloat16 support
)

base_model_id = "openai-community/gpt2"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, 
    quantization_config=bnb_config, 
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# 2. Load the LoRA Adapters
peft_model_id = "estradax/gpt2-alpaca-4bit"
model = PeftModel.from_pretrained(model, peft_model_id)

# 3. Run Inference
text = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the capital of France?\n\n### Response:\n"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
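
To avoid repeating the template by hand, you can wrap prompt construction and generation in a small helper. generate_response below is an illustrative convenience function; its name, sampling settings, and the response-splitting logic are assumptions, not part of the released model.

def generate_response(instruction, max_new_tokens=100):
    """Format an instruction in the Alpaca template, generate, and return only the response."""
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    full_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text generated after the "### Response:" marker
    return full_text.split("### Response:")[-1].strip()

print(generate_response("What is the capital of France?"))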