Amitz244 committed on Mar 20

Commit e69c065 · verified · 1 Parent(s): 694091d

Create README.md

Files changed (1)

README.md +81 -0

README.md ADDED

	@@ -0,0 +1,81 @@

+---
+language:
+- en
+base_model:
+- openai/clip-vit-large-patch14
+tags:
+- IQA
+- computer_vision
+- perceptual_tasks
+- CLIP
+- KonIQ-10k
+---
+**PerceptCLIP-IQA** predicts an **image quality assessment (IQA) score** for an input image. This is the official model from the paper:
+📄 **["Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks"](https://arxiv.org/abs/2503.13260)**.
+We apply **LoRA adaptation** to the **CLIP visual encoder** and add an **MLP head** that predicts the IQA score. Our model achieves **state-of-the-art results**.
+## Training Details
+- *Dataset*: [KonIQ-10k](https://arxiv.org/pdf/1910.06180)
+- *Architecture*: CLIP Vision Encoder (ViT-L/14) with *LoRA adaptation*
+- *Loss Function*: Pearson-correlation-induced loss \( L_{\mathrm{PLCC}} = \frac{1}{2} \left(1 - \mathrm{PLCC}(\tilde{y}, y)\right) \), where PLCC is the Pearson linear correlation coefficient
+- *Optimizer*: AdamW
+- *Learning Rate*: 5e-05
+- *Batch Size*: 32
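The loss listed above can be illustrated with a small worked example. This is a minimal sketch in plain Python that only shows the arithmetic of \( \frac{1}{2}(1 - PLCC) \) on two short score lists. The function names `plcc` and `plcc_loss` are hypothetical and are not taken from the repository. Actual training would presumably compute the same quantity on batched tensors so that gradients can flow. The sketch does not handle the zero-variance case, where the correlation is undefined.

```python
import math


def plcc(a, b):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def plcc_loss(pred, target):
    """Map a correlation in [-1, 1] to a loss in [0, 1]: 0.5 * (1 - PLCC)."""
    return 0.5 * (1.0 - plcc(pred, target))


if __name__ == "__main__":
    # Perfectly correlated scores give the minimum loss.
    print(plcc_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    # Perfectly anti-correlated scores give the maximum loss.
    print(plcc_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

Because the loss depends only on correlation, it is unaffected by a positive rescaling or a shift of the predicted scores.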
+## Installation & Requirements
+You can set up the environment using `environment.yml`, or install the dependencies manually:
+- python=3.9.15
+- cudatoolkit=11.7
+- torchvision=0.14.0
+- transformers=4.45.2
+- peft=0.14.0
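One way to check whether an existing environment matches these pins is sketched below, using only the standard library's `importlib.metadata`. The helper name `check_pins` is hypothetical. Only the packages visible to `importlib.metadata` are checked, so the `python` and `cudatoolkit` pins are left out on the assumption that they are managed by conda.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above (Python packages only).
PINNED = {
    "torchvision": "0.14.0",
    "transformers": "4.45.2",
    "peft": "0.14.0",
}


def check_pins(pinned=PINNED):
    """Return {package: (pinned_version, installed_version_or_None, matches)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu117" when comparing versions.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_pins().items():
        status = "ok" if ok else "mismatch"
        print(f"{name}: pinned {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the pinned versions are simply the ones the README lists.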
+## Usage
+To use the model for inference:
+```python
+from torchvision import transforms
+import torch
+from PIL import Image
+from huggingface_hub import hf_hub_download
+import importlib.util
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+# Load the model class definition dynamically
+class_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_IQA", filename="modeling.py")
+spec = importlib.util.spec_from_file_location("modeling", class_path)
+modeling = importlib.util.module_from_spec(spec)
+spec.loader.exec_module(modeling)
+# Initialize the model
+ModelClass = modeling.clip_lora_model
+model = ModelClass().to(device)
+# Load the pretrained weights
+model_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_IQA", filename="perceptCLIP_IQA.pth")
+model.load_state_dict(torch.load(model_path, map_location=device))
+model.eval()
+# Load an image
+image = Image.open("image_path.jpg").convert("RGB")
+# Preprocess and predict
+def IQA_preprocess():
+    transform = transforms.Compose([
+        transforms.Resize(224),
+        transforms.CenterCrop(size=(224, 224)),
+        transforms.ToTensor(),
+        transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
+                             std=(0.26862954, 0.26130258, 0.27577711))
+    ])
+    return transform
+image = IQA_preprocess()(image).unsqueeze(0).to(device)
+with torch.no_grad():
+    iqa_score = model(image).item()
+print(f"Predicted quality score: {iqa_score:.4f}")
+```