GPT2-Alpaca-4bit
This model is a fine-tuned version of openai-community/gpt2 on the tatsu-lab/alpaca dataset.
It was trained using QLoRA (4-bit quantization + LoRA) to follow instructions.
Model Description
- Model Type: Causal Language Model
- Base Model: GPT-2
- Dataset: Alpaca (Instruction Tuning)
- Language: English
- Training Method: QLoRA (4-bit quantization via bitsandbytes, LoRA adapters via peft); see the setup sketch below
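For reference, a typical QLoRA setup for this kind of fine-tune looks like the sketch below. Only the 4-bit bitsandbytes configuration and the use of peft LoRA adapters are stated by this card; the rank, alpha, dropout, and target modules shown here are illustrative assumptions, not confirmed values.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, matching the inference configuration further below
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt2",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA adapter on GPT-2's fused attention projection; r/alpha/dropout are assumed values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

Training would then proceed with a standard causal-LM trainer over prompts built from the tatsu-lab/alpaca instruction/response pairs.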
How to Use
To use this model, you need to load the base GPT-2 model in 4-bit precision and then attach the trained LoRA adapters.
Installation
pip install transformers torch peft bitsandbytes accelerate
Inference Code
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the base model (GPT-2) with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model_id = "openai-community/gpt2"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# 2. Load the LoRA adapters on top of the quantized base model
peft_model_id = "estradax/gpt2-alpaca-4bit"
model = PeftModel.from_pretrained(model, peft_model_id)

# 3. Run inference with the Alpaca prompt format used during training
text = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWhat is the capital of France?\n\n### Response:\n"
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
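The model expects prompts in the Alpaca template shown above. A small helper such as the following (illustrative, not shipped with this repository) keeps prompts consistent; the variant with an "### Input:" section is the standard Alpaca format for examples that carry extra context, which the tatsu-lab/alpaca dataset includes.

def build_alpaca_prompt(instruction: str, context: str = "") -> str:
    # With-context variant of the Alpaca template (assumed to match training data)
    if context:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Response:\n"
        )
    # No-context variant, identical to the prompt used in the example above
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

prompt = build_alpaca_prompt("Give three tips for staying healthy.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))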