# Task Classification Model (ICM)

## Model Description

A BERT-based sequence classification model that routes computer vision questions to the appropriate specialized module. It classifies questions into four task categories: VQA, Captioning, Grounding, and Geometry.

- **Repository:** beingamanforever/ICM
- **Base Model:** bert-base-uncased
- **Task:** 4-way Sequence Classification

## Labels

| ID | Label | Description |
|---|---|---|
| 0 | vqa | Visual Question Answering ("What color is the car?") |
| 1 | captioning | Image Description ("Describe the sunset.") |
| 2 | grounding | Object Localization ("Find the person in the image.") |
| 3 | geometry | Spatial/Metric Queries ("Calculate the area of the red box.") |
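
This taxonomy corresponds to the model's `id2label` / `label2id` mapping. The snippet below is a sketch of that mapping for reference; the authoritative values live in the repository's config.

```python
# Label mapping implied by the table above (mirrors model.config.id2label).
id2label = {0: "vqa", 1: "captioning", 2: "grounding", 3: "geometry"}
label2id = {label: idx for idx, label in id2label.items()}
```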

## Architecture

BERT-Base encoder + 3-layer MLP classifier on the [CLS] token (sketched in code below):

- Layer 1: Linear(768 → 256) + ReLU + Dropout(0.1)
- Layer 2: Linear(256 → 128) + ReLU + Dropout(0.1)
- Layer 3: Linear(128 → 4)
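
A minimal PyTorch sketch of this head, assuming a standard `bert-base-uncased` encoder; the class name `ICMClassifier` is hypothetical, not the module shipped in this repository:

```python
import torch.nn as nn
from transformers import AutoModel

class ICMClassifier(nn.Module):
    """Hypothetical reconstruction of the architecture described above."""

    def __init__(self, num_labels: int = 4, dropout: float = 0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-uncased")
        # 3-layer MLP head applied to the [CLS] token representation.
        self.head = nn.Sequential(
            nn.Linear(768, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, num_labels),
        )

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        cls = hidden[:, 0]  # [CLS] is the first token position
        return self.head(cls)
```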

## Training

| Hyperparameter | Value |
|---|---|
| Samples | 1,600 (400 per class) |
| Epochs | 5 |
| Learning Rate | 2e-5 |
| Batch Size | 32 |
| Optimizer | AdamW |
| Loss | Cross Entropy |

**Data:** Synthetic questions from balanced JSON files (vqa_qs.json, captioning_qs.json, grounding_qs.json, geometry_qs.json); a sketch of a matching fine-tuning loop follows.
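
The hyperparameters above translate into a conventional fine-tuning loop along these lines. This is a sketch under stated assumptions (each JSON file is taken to hold a flat list of question strings), not the author's actual training script:

```python
import json
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)

# Assumed data format: one JSON list of question strings per class.
files = ["vqa_qs.json", "captioning_qs.json", "grounding_qs.json", "geometry_qs.json"]
texts, labels = [], []
for label, path in enumerate(files):
    with open(path) as f:
        questions = json.load(f)
    texts += questions
    labels += [label] * len(questions)

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(5):
    for input_ids, attention_mask, y in loader:
        optimizer.zero_grad()
        # Passing labels makes the model compute cross-entropy loss internally.
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        optimizer.step()
```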

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "beingamanforever/ICM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for deterministic inference

questions = [
    "What is the distance between the two trees?",
    "Describe what the child is wearing.",
    "Is the traffic light green?",
    "Box the location of the blue umbrella.",
]

# Tokenize as a padded batch and take the highest-scoring class per question.
inputs = tokenizer(questions, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
predictions = torch.argmax(logits, dim=-1)

# Map class indices back to label names via the model config.
for q, pred in zip(questions, predictions):
    print(f"{q} → {model.config.id2label[pred.item()]}")
```

## Limitations

- **Synthetic Training Data:** May not generalize to complex real-world queries
- **Text-Only:** Processes questions without image context
- **Domain Scope:** Optimized for vision-task routing, not general NLP classification

## Intended Use

- Automatic query routing in multimodal AI pipelines (see the dispatch sketch below)
- VQA dataset analysis and taxonomy studies
- Educational demonstrations of vision task classification
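
As an illustration of the routing use case, predicted labels can dispatch questions to downstream modules. The handlers here are placeholder lambdas, not components of this repository:

```python
from transformers import pipeline

# Hypothetical downstream handlers; swap in real task modules.
handlers = {
    "vqa": lambda q: f"[VQA module] {q}",
    "captioning": lambda q: f"[Captioning module] {q}",
    "grounding": lambda q: f"[Grounding module] {q}",
    "geometry": lambda q: f"[Geometry module] {q}",
}

classifier = pipeline("text-classification", model="beingamanforever/ICM")
question = "Find the person in the image."
label = classifier(question)[0]["label"]  # e.g. "grounding"
print(handlers[label](question))
```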