SPEC-CLIP-ViT-B-32

Model Sources

Code | Paper | arXiv

Model Usage

  • download checkpoint
huggingface-cli download wjpoom/SPEC-CLIP-ViT-B-32 --local-dir checkpoints/SPEC-CLIP-ViT-B-32
  • load model
# pip install open_clip_torch
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='checkpoints/SPEC-CLIP-ViT-B-32', load_weights_only=False)
model.eval()  # model is in train mode by default; switch to eval for inference (affects models with BatchNorm or stochastic depth)
tokenizer = open_clip.get_tokenizer('ViT-B-32')

image = preprocess(Image.open("docs/CLIP.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.autocast("cuda"):
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]

Contact

Feel free to contact us if you have any questions or suggestions.

Citation

@inproceedings{peng2024synthesize,
  title={Synthesize, diagnose and optimize: Towards fine-grained vision-language understanding},
  author={Peng, Wujian and Xie, Sicheng and You, Zuyao and Lan, Shiyi and Wu, Zuxuan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13279--13288},
  year={2024}
}