Initial release of SafeGem-27B with Visual Guard Module
- .gitattributes +1 -0
- LICENSE.md +110 -0
- README.md +259 -0
- added_tokens.json +3 -0
- chat_template.json +3 -0
- config.json +93 -0
- configuration_safegem.py +59 -0
- generation_config.json +13 -0
- model-00001-of-00012.safetensors +3 -0
- model-00002-of-00012.safetensors +3 -0
- model-00003-of-00012.safetensors +3 -0
- model-00004-of-00012.safetensors +3 -0
- model-00005-of-00012.safetensors +3 -0
- model-00006-of-00012.safetensors +3 -0
- model-00007-of-00012.safetensors +3 -0
- model-00008-of-00012.safetensors +3 -0
- model-00009-of-00012.safetensors +3 -0
- model-00010-of-00012.safetensors +3 -0
- model-00011-of-00012.safetensors +3 -0
- model-00012-of-00012.safetensors +3 -0
- model.safetensors.index.json +0 -0
- modeling_safegem.py +301 -0
- preprocessor_config.json +29 -0
- processor_config.json +4 -0
- special_tokens_map.json +33 -0
- tokenizer.json +3 -0
- tokenizer.model +3 -0
- tokenizer_config.json +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE.md
ADDED
@@ -0,0 +1,110 @@
# License for SafeGem

This SafeGem project is governed by a **hybrid license model**. This license file defines the distinct licensing policies that apply to the two main components of this project.

---

## 1. Model Name and Definition of Work

**Model Name**: SafeGem

**Reference Publication**: This model (SafeGem) is the official model presented in the academic paper _"HoliSafe: Holistic Safety Benchmarking and Modeling for Vision-Language Model"_ ([arXiv:2506.04704](https://arxiv.org/abs/2506.04704)).

**Naming and Derivation**: The name "SafeGem" signifies its dual nature:
- **Safe**: For its safety-driven enhancements (the Visual Guard Module)
- **Gem**: As an abbreviation of "Gemma". We use "Gem" instead of "Gemma" to comply with Google's Gemma Terms of Use and trademark policies, which prohibit the use of "Gemma" in derivative model names

**Model Composition**: SafeGem is a **Derivative Work** based on Google's Gemma-3-27B-IT model. It integrates an independently developed Visual Guard Module (VGM) to classify harmful image inputs and generate safe text responses.

---

## 2. License Summary

| Component | License |
|-----------|---------|
| **Independently Developed Code** (e.g., VGM) | [Apache License 2.0](#part-1-apache-license-20-for-independently-developed-code) |
| **Gemma-Based Components and Entire Model** | [Google's Gemma Terms of Use](#part-2-gemma-terms-of-use-for-the-gemma-based-derivative-work) |

---

## Part 1: Apache License 2.0 (For Independently Developed Code)

All original source code and components developed independently by the **Electronics and Telecommunications Research Institute (ETRI)** (hereinafter the "Copyright Holder"), including the Visual Guard Module (VGM) contained in this project, are subject to the **Apache License, Version 2.0** (the "License").

You may not use this file except in compliance with the License. You may obtain a copy of the License at:

**http://www.apache.org/licenses/LICENSE-2.0**

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

### Copyright Notice

```
Copyright 2025 Electronics and Telecommunications Research Institute (ETRI)

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```

---

## Part 2: Gemma Terms of Use (For the Gemma-Based Derivative Work)

This SafeGem model is a **Derivative Work** based on Google's Gemma-3-27B-IT model.

Therefore, the use, reproduction, modification, and distribution of Gemma-based components, including the weights of the SafeGem model, are subject to **[Google's Gemma Terms of Use](https://ai.google.dev/gemma/terms)**.

Any user who uses, reproduces, modifies, or distributes the SafeGem model is deemed to have agreed to all provisions of the Gemma Terms of Use, including but not limited to the following restrictions:

### Key Restrictions

- **Prohibited Uses**: The model must not be used for any purposes outlined in the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy)

- **Commercial Use Restrictions**: The model must not be used in a manner that competes with Google or Google's products or services

### Required Notices

In accordance with the Distribution Requirements of the Gemma Terms of Use, the following notices are provided:

#### Copy of Terms
The full text of the Gemma Terms of Use is available at the following official links:
- **Gemma Terms of Use**: https://ai.google.dev/gemma/terms
- **Prohibited Use Policy**: https://ai.google.dev/gemma/prohibited_use_policy

#### Modification Notice
This model (SafeGem) is a **modification** of the original Google Gemma-3-27B-IT model.

#### Attribution
The original model was developed by **Google**.
Copyright 2024 Google LLC.

#### No Endorsement
This derivative model (SafeGem) is **not endorsed or officially supported by Google**.

---

## Part 3: Attribution and Contact

This SafeGem model was developed by the **Electronics and Telecommunications Research Institute (ETRI)** in the Republic of Korea.

For any questions regarding the SafeGem model or its licensing, please contact:

**Youngwan Lee**
Email: yw.lee@etri.re.kr

---

## Summary

- **Visual Guard Module (VGM)** and independently developed code → Apache 2.0
- **Entire SafeGem model** (including Gemma-based weights) → Google's Gemma Terms of Use
- Users must comply with **both** licenses when using SafeGem
README.md
ADDED
@@ -0,0 +1,259 @@
---
base_model: google/gemma-3-27b-it
tags:
- vision
- multimodal
- safety
- content-moderation
- gemma3
- image-classification
- vision-language
license: apache-2.0
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
---

# SafeGem-27B: Vision-Language Model with Visual Guard Module

[**🌐 Website**](https://youngwanlee.github.io/holisafe) | [**📑 Paper**](https://www.arxiv.org/pdf/2506.04704)

<div align="center">
<img src="https://dl.dropbox.com/scl/fi/soi772p6sig2tx16f092o/arch.jpg?rlkey=uj4ver4pp889oowigqld502hc&dl=1" width="1024px" />
</div>

SafeGem-27B is a safe multimodal large language model that extends [Gemma-3-27B-IT](https://huggingface.co/google/gemma-3-27b-it) with built-in image safety classification capabilities. It simultaneously generates text responses to visual questions and classifies potentially unsafe image content across 20 safety categories.

> **Note on Naming**: We named our model 'SafeGem' instead of 'SafeGemma3' to comply with Google's Gemma Terms of Use and trademark policies, abbreviating 'Gemma' to 'Gem' in the name.

## Model Description

- **Base Model**: Gemma-3-27B-IT
- **Architecture**: Vision-language model with Visual Guard Module (VGM)
- **Training Data**: HoliSafe train set
- **Training Method**: LoRA fine-tuning
- **Parameters**: 27B (base) + VGM
- **Safety Categories**: 20 categories based on the HoliSafe taxonomy

## Key Features

1. **Multimodal Understanding**: Processes images and text for comprehensive visual understanding
2. **Safety Classification**: Identifies unsafe content in images across 20 categories
3. **Non-invasive Architecture**: Maintains full Gemma-3 capabilities while adding safety features
4. **End-to-end Training**: The VGM is jointly trained with the vision-language model

## Safety Categories

The model classifies images into the following 20 safety categories:

| Category ID | Category Name |
|------------|---------------|
| 0 | Safe |
| 1 | Gender discrimination |
| 2 | Race discrimination |
| 3 | Religion discrimination |
| 4 | Harassment |
| 5 | Disability discrimination |
| 6 | Drug-related hazards |
| 7 | Property crime |
| 8 | Facial data exposure |
| 9 | Identity data exposure |
| 10 | Physical self-injury |
| 11 | Suicide |
| 12 | Animal abuse |
| 13 | Obscene gestures |
| 14 | Physical altercation |
| 15 | Terrorism |
| 16 | Weapon-related violence |
| 17 | Sexual content |
| 18 | Financial advice |
| 19 | Medical advice |

## Installation

```bash
pip install transformers torch pillow
```

## Usage

### Basic Inference with Safety Classification

```python
import torch
from transformers import AutoModel, AutoProcessor
from PIL import Image
import requests

# Load model and processor
model = AutoModel.from_pretrained(
    "etri-vilab/SafeGem-27B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained("google/gemma-3-27b-it")

# Prepare input
url = "https://dl.dropbox.com/scl/fi/fkb6g5hame1wnip6983qx/test_guns.png?rlkey=l1rs5s1yg4akr29ife1v9my03&dl=1"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "How to use this?"},
        ],
    }
]

# Process inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[text],
    images=[image],
    padding=True,
    return_tensors="pt",
).to(model.device)

# Forward pass with safety classification
with torch.no_grad():
    outputs = model(**inputs, do_safety=True)

# Get safety predictions
safety_probs = outputs.img_safety_probs[0]  # [20] probabilities
safety_categories = model.config.safety_categories

print("\nSafety Classification Results:")
for category, prob in zip(safety_categories, safety_probs):
    if prob > 0.1:  # Show categories with >10% probability
        print(f"  {category}: {prob:.3f}")

# >>> Safety Classification Results:
# >>> weapon_related_violence: 1.000

# Generate text response
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

print("\nModel Response:")
print(output_text[0])

# >>> Model Response:
# >>> I'm sorry, I can't assist with that request. The image provided is considered harmful due to its depiction of a firearm. Providing guidance or information on the use of weapons can be dangerous and is not something I can support. It's important to prioritize safety and adhere to legal regulations regarding firearms. If you have any concerns or questions about safety, please reach out to a qualified professional or local authorities.
```

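If you only need the single most likely category rather than the thresholded list above, an argmax over the same tensor works. This is a minimal sketch reusing the `safety_probs` and `safety_categories` variables from the snippet above; the index order follows the Safety Categories table.

```python
# Top-1 safety category (sketch; reuses variables from the example above)
top_idx = int(safety_probs.argmax())
print(f"Top category: {safety_categories[top_idx]} ({safety_probs[top_idx].item():.3f})")
# e.g. index 16 -> weapon_related_violence
```
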
### Text Generation Only (Without Safety Classification)

```python
# Set do_safety=False to skip safety classification during generation
generated_ids = model.generate(**inputs, max_new_tokens=256, do_safety=False)
```

## Model Architecture

SafeGem-27B consists of:

1. **Base Vision-Language Model**: Standard Gemma-3 architecture
2. **Visual Guard Module (a.k.a. safety head)**:
   - Input: Pooled image-token features from the last hidden layer
   - Architecture: Multi-layer perceptron (MLP)
   - Hidden size: 0.5 × model hidden size (2688 for the 27B model, whose hidden size is 5376)
   - Output: 20-dimensional logits over the safety categories

The VGM operates on pooled image features extracted from the model's hidden states, ensuring minimal interference with the base model's text generation capabilities.

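As a shape-level illustration of that flow (a sketch assuming the 27B dimensions from `config.json`; the real `SafetyMLP` in `modeling_safegem.py` also applies dropout):

```python
import torch

batch, num_image_tokens, hidden = 1, 256, 5376
image_hidden_states = torch.randn(batch, num_image_tokens, hidden)

# Mean-pool the image tokens into one feature vector per sample
pooled = image_hidden_states.mean(dim=1)        # [1, 5376]

# VGM sketch: MLP with hidden size 0.5 * 5376 = 2688 and a 20-way output
vgm = torch.nn.Sequential(
    torch.nn.Linear(hidden, hidden // 2),
    torch.nn.GELU(),
    torch.nn.Linear(hidden // 2, 20),
)
probs = torch.softmax(vgm(pooled), dim=-1)      # [1, 20]
```
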
## Training Details

- **Training Data**: HoliSafe train dataset
- **Training Epochs**: 7
- **LoRA Configuration** (see the sketch after this list):
  - Rank: 64
  - Alpha: 64
  - Target modules: Language-model attention and MLP layers
- **Learning Rates**:
  - Base model: 5e-5
  - Safety head: 5e-5
  - Vision tower: 5e-5
- **Safety Loss Weight**: 2.0
- **Optimizer**: AdamW
- **Mixed Precision**: BF16

Please see the paper for full details.

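The training code itself is not part of this repository. As a hypothetical illustration only, the LoRA settings listed above would look roughly like this with the `peft` library; the target module names are our assumption of typical Gemma-3 projection layers, not the authors' exact configuration:

```python
from peft import LoraConfig

# Hypothetical reconstruction of the settings above (rank 64, alpha 64)
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    # Assumed attention and MLP projections of the language model
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```
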
### Device Handling

When using `device_map="auto"`, always ensure inputs are moved to the model's device:

```python
# ✓ Correct - move inputs to the model device
inputs = processor(...).to(model.device)
outputs = model(**inputs, do_safety=True)

# ✗ Incorrect - may cause device mismatch errors
inputs = processor(...)  # inputs on CPU
outputs = model(**inputs, do_safety=True)  # model on GPU
```

This is especially important when using safety classification (`do_safety=True`), as the model needs to access `input_ids` on the same device as the hidden states.

## Ethical Considerations

This model is designed to assist in identifying potentially unsafe visual content. It should be used responsibly:

- Do not rely solely on this model for critical safety decisions
- Be aware of potential biases in safety classifications
- Regularly evaluate model performance on your specific use case
- Combine with human review for important content moderation tasks

## License

SafeGem is governed by a hybrid license model:

1. **Independently Developed Code (Visual Guard Module)**: Licensed under the [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)
   - All original source code developed by ETRI, including the Visual Guard Module (VGM)

2. **Gemma-Based Components and Entire Model**: Subject to [Google's Gemma Terms of Use](https://ai.google.dev/gemma/terms)
   - The entire SafeGem model, including weights derived from Google Gemma-3-27B-IT

**Model Composition**: SafeGem is a derivative work based on Google's Gemma-3-27B-IT model, integrating an independently developed Visual Guard Module (VGM) to classify harmful image inputs and generate safe text responses.

For complete license details, please see the [LICENSE.md](LICENSE.md) file in this repository.

## Citation

If you use SafeGem in your research, please cite:

```bibtex
@article{lee2025holisafe,
  title={HoliSafe: Holistic Safety Benchmarking and Modeling for Vision-Language Model},
  author={Lee, Youngwan and Kim, Kangsan and Park, Kwanyong and Jung, Ilchae and Jang, Soojin and Lee, Seanie and Lee, Yong-Ju and Hwang, Sung Ju},
  journal={arXiv preprint arXiv:2506.04704},
  year={2025},
  url={https://arxiv.org/abs/2506.04704},
  archivePrefix={arXiv},
  eprint={2506.04704},
  primaryClass={cs.AI},
}
```

## Acknowledgments

- Built on [Gemma-3](https://huggingface.co/google/gemma-3-27b-it) by Google
- Trained on the [HoliSafe](https://youngwanlee.github.io/holisafe/) multimodal safety dataset

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (No. RS-2022-00187238, Development of Large Korean Language Model Technology for Efficient Pre-training, 45%; No. 2022-0-00871, Development of AI Autonomy and Knowledge Enhancement for AI Agent Collaboration, 45%; No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST), 10%).

## Contact

For questions, issues, or feedback, please open an issue on the repository or contact the team directly.

> 📬 E-mail: yw.lee@etri.re.kr
added_tokens.json
ADDED
@@ -0,0 +1,3 @@
{
  "<image_soft_token>": 262144
}
chat_template.json
ADDED
@@ -0,0 +1,3 @@
{
  "chat_template": "{{ bos_token }}\n{%- if messages[0]['role'] == 'system' -%}\n    {%- if messages[0]['content'] is string -%}\n        {%- set first_user_prefix = messages[0]['content'] + '\n\n' -%}\n    {%- else -%}\n        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '\n\n' -%}\n    {%- endif -%}\n    {%- set loop_messages = messages[1:] -%}\n{%- else -%}\n    {%- set first_user_prefix = \"\" -%}\n    {%- set loop_messages = messages -%}\n{%- endif -%}\n{%- for message in loop_messages -%}\n    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}\n        {{ raise_exception(\"Conversation roles must alternate user/assistant/user/assistant/...\") }}\n    {%- endif -%}\n    {%- if (message['role'] == 'assistant') -%}\n        {%- set role = \"model\" -%}\n    {%- else -%}\n        {%- set role = message['role'] -%}\n    {%- endif -%}\n    {{ '<start_of_turn>' + role + '\n' + (first_user_prefix if loop.first else \"\") }}\n    {%- if message['content'] is string -%}\n        {{ message['content'] | trim }}\n    {%- elif message['content'] is iterable -%}\n        {%- for item in message['content'] -%}\n            {%- if item['type'] == 'image' -%}\n                {{ '<start_of_image>' }}\n            {%- elif item['type'] == 'text' -%}\n                {{ item['text'] | trim }}\n            {%- endif -%}\n        {%- endfor -%}\n    {%- else -%}\n        {{ raise_exception(\"Invalid content type\") }}\n    {%- endif -%}\n    {{ '<end_of_turn>\n' }}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n    {{'<start_of_turn>model\n'}}\n{%- endif -%}\n"
}
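For reference, this template renders messages into Gemma-3's turn format. A minimal sketch of what it produces for the message structure used in the README (the commented output is an approximation of the rendered string):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google/gemma-3-27b-it")
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "How to use this?"}]}]
print(processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
# Roughly:
# <bos><start_of_turn>user
# <start_of_image>How to use this?<end_of_turn>
# <start_of_turn>model
```
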
config.json
ADDED
@@ -0,0 +1,93 @@
{
  "architectures": [
    "SafeGemForConditionalGeneration"
  ],
  "auto_map": {
    "AutoConfig": "configuration_safegem.SafeGemConfig",
    "AutoModel": "modeling_safegem.SafeGemForConditionalGeneration",
    "AutoModelForCausalLM": "modeling_safegem.SafeGemForConditionalGeneration"
  },
  "boi_token_index": 255999,
  "eoi_token_index": 256000,
  "eos_token_id": [
    1,
    106
  ],
  "image_token_index": 262144,
  "initializer_range": 0.02,
  "mm_tokens_per_image": 256,
  "model_type": "safegem",
  "num_safety_categories": 20,
  "safety_categories": [
    "safe",
    "gender",
    "race",
    "religion",
    "harassment",
    "disability_discrimination",
    "drug_crime",
    "property_crime",
    "facial_data",
    "identity_data",
    "physical_self_injury",
    "suicide",
    "animal_abuse",
    "obscene_gestures",
    "physical_altercation",
    "terrorism",
    "weapon_related_violence",
    "sexual_content",
    "financial_advice",
    "medical_advice"
  ],
  "safety_head_hidden_scale": 0.5,
  "safety_loss_lambda": 1.0,
  "safety_num_hidden_layers": 1,
  "text_config": {
    "attention_bias": false,
    "attention_dropout": 0.0,
    "attn_logit_softcapping": null,
    "cache_implementation": "hybrid",
    "final_logit_softcapping": null,
    "head_dim": 128,
    "hidden_activation": "gelu_pytorch_tanh",
    "hidden_size": 5376,
    "initializer_range": 0.02,
    "intermediate_size": 21504,
    "max_position_embeddings": 131072,
    "model_type": "gemma3_text",
    "num_attention_heads": 32,
    "num_hidden_layers": 62,
    "num_key_value_heads": 16,
    "query_pre_attn_scalar": 168,
    "rms_norm_eps": 1e-06,
    "rope_local_base_freq": 10000.0,
    "rope_scaling": {
      "factor": 8.0,
      "rope_type": "linear"
    },
    "rope_theta": 1000000.0,
    "sliding_window": 1024,
    "sliding_window_pattern": 6,
    "torch_dtype": "bfloat16",
    "use_cache": true,
    "vocab_size": 262208
  },
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.3",
  "vision_config": {
    "attention_dropout": 0.0,
    "hidden_act": "gelu_pytorch_tanh",
    "hidden_size": 1152,
    "image_size": 896,
    "intermediate_size": 4304,
    "layer_norm_eps": 1e-06,
    "model_type": "siglip_vision_model",
    "num_attention_heads": 16,
    "num_channels": 3,
    "num_hidden_layers": 27,
    "patch_size": 14,
    "torch_dtype": "bfloat16",
    "vision_use_head": false
  }
}
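The `auto_map` entries above mean the config resolves to the custom class when loaded with `trust_remote_code=True`. A minimal sketch of inspecting the safety-related fields (assuming the `etri-vilab/SafeGem-27B` repository id used in the README):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("etri-vilab/SafeGem-27B", trust_remote_code=True)
print(config.model_type)               # "safegem"
print(config.num_safety_categories)    # 20
print(config.safety_categories[16])    # "weapon_related_violence"
print(config.text_config.hidden_size)  # 5376
```
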
configuration_safegem.py
ADDED
@@ -0,0 +1,59 @@
"""
SafeGem Configuration

Configuration class for SafeGem models with safety classification capabilities.
"""

from typing import Optional, List
from transformers import Gemma3Config


class SafeGemConfig(Gemma3Config):
    """
    Configuration for SafeGem model.

    This configuration class extends Gemma3Config with safety-specific parameters.
    """

    model_type = "safegem"

    def __init__(
        self,
        # Safety specific parameters
        safety_categories: Optional[List[str]] = None,
        safety_head_hidden_scale: float = 1.0,
        safety_loss_lambda: float = 1.0,
        safety_num_hidden_layers: int = 1,
        num_safety_categories: int = 20,
        **kwargs
    ):
        super().__init__(**kwargs)

        # HoliSafe 20-category safety taxonomy
        self.safety_categories = safety_categories or [
            "safe",
            "gender",
            "race",
            "religion",
            "harassment",
            "disability_discrimination",
            "drug_crime",
            "property_crime",
            "facial_data",
            "identity_data",
            "physical_self_injury",
            "suicide",
            "animal_abuse",
            "obscene_gestures",
            "physical_altercation",
            "terrorism",
            "weapon_related_violence",
            "sexual_content",
            "financial_advice",
            "medical_advice"
        ]

        self.safety_head_hidden_scale = safety_head_hidden_scale
        self.safety_loss_lambda = safety_loss_lambda
        self.safety_num_hidden_layers = safety_num_hidden_layers
        self.num_safety_categories = num_safety_categories or len(self.safety_categories)
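A minimal sketch of constructing the config directly (assuming `configuration_safegem.py` is importable from the working directory); with no `safety_categories` argument, the HoliSafe taxonomy above is filled in:

```python
from configuration_safegem import SafeGemConfig

cfg = SafeGemConfig(safety_head_hidden_scale=0.5)  # 0.5 matches config.json
assert cfg.model_type == "safegem"
assert len(cfg.safety_categories) == cfg.num_safety_categories == 20
```
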
generation_config.json
ADDED
@@ -0,0 +1,13 @@
{
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.51.3"
}
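These sampling defaults (`do_sample=true`, `top_k=64`, `top_p=0.95`) are picked up automatically by `generate()` and can be overridden per call. A sketch, assuming the `model` and `inputs` from the README example:

```python
# Uses the sampled defaults from generation_config.json
sampled_ids = model.generate(**inputs, max_new_tokens=256)

# Override per call, e.g. greedy decoding
greedy_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
```
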
model-00001-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:58b7d38bd4d309a50cc7a7b0a8eabe003843a8c7b16216543a6dd401b7c139c0
size 4854573240

model-00002-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3d37cde3880151c7e5a8f2084e734951c336c55308e3c90bd1fb315ef7e0c2f2
size 4954792864

model-00003-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b8ae847d503be16bd95a3a76d1952b9e46868ae915de3263da3c3f8ad7ac91af
size 4954792896

model-00004-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:20ba58029c7349fde42fc66729249be52359dcafe597dab542c8956a3dacffda
size 4954792944

model-00005-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:178f0b7529982196e2a86b0d49a69f6ec2dba6b9bcf8939dabcafbf58be42abd
size 4954792944

model-00006-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fe63cd911af1f31f1d88518c74ccc8aef4efb64adbb77d723ea67d1f19e45f94
size 4954792944

model-00007-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d07a8c45ec83d1d50e98ffb0013c536e41d6c71a5e06292fc71b53c77bac7e18
size 4954792944

model-00008-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3e3d3cc29533593b90052d544d22796205424116465d06fded03b70a32f4898d
size 4954792944

model-00009-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d6196029f2c867fe3ee129cabfddd9d6ed8b494213dc6bb0ae9934628fbf9b9e
size 4954792944

model-00010-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7090e0d3e16a0469fef3404b386e9cc8f64a705b92f2ca55dd3445c0d90f4166
size 4954792944

model-00011-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7110e543d74314fd67d5ab4f11dd350c85eab17ef2254a2a0962e312514b7d43
size 4954792944

model-00012-of-00012.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:69a5e6ea580fbb718631057737128953a94a76bb3dc4c695ed5b2a897807aa90
size 491491384
model.safetensors.index.json
ADDED
The diff for this file is too large to render. See raw diff.
modeling_safegem.py
ADDED
@@ -0,0 +1,301 @@
"""
SafeGem: Vision-Language Model with Visual Guard Module

This implementation extends Gemma3ForConditionalGeneration with image safety classification
capabilities using a pooling-based approach for safety feature extraction.
"""

import torch
import torch.nn as nn
from typing import Optional, Tuple, List, Union
from dataclasses import dataclass
from transformers.modeling_outputs import CausalLMOutputWithPast
from transformers import Gemma3ForConditionalGeneration
from transformers.utils import logging

from .configuration_safegem import SafeGemConfig

logger = logging.get_logger(__name__)

local_rank = None


def rank0_print(*args):
    if local_rank == 0 or local_rank == '0' or local_rank is None:
        print(*args)


@dataclass
class SafeGemOutput(CausalLMOutputWithPast):
    """
    Output class for SafeGem with safety classification results.
    """
    loss: Optional[torch.FloatTensor] = None
    logits: Optional[torch.FloatTensor] = None
    past_key_values: Optional[List[torch.FloatTensor]] = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None
    image_hidden_states: Optional[torch.FloatTensor] = None
    img_safety_logits: Optional[torch.FloatTensor] = None
    img_safety_probs: Optional[torch.FloatTensor] = None


class SafetyMLP(nn.Module):
    """
    Multi-layer perceptron for safety classification (Visual Guard Module).
    """

    def __init__(
        self,
        input_size: int,
        hidden_size: int,
        output_size: int,
        num_hidden_layers: int = 1
    ):
        super().__init__()

        layers = []

        # First layer
        layers.append(nn.Linear(input_size, hidden_size))
        layers.append(nn.GELU())
        layers.append(nn.Dropout(0.1))

        # Additional hidden layers
        for _ in range(num_hidden_layers - 1):
            layers.append(nn.Linear(hidden_size, hidden_size))
            layers.append(nn.GELU())
            layers.append(nn.Dropout(0.1))

        # Output layer
        layers.append(nn.Linear(hidden_size, output_size))

        self.mlp = nn.Sequential(*layers)

        # Initialize weights
        self.apply(self._init_weights)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            torch.nn.init.xavier_uniform_(module.weight)
            if module.bias is not None:
                torch.nn.init.constant_(module.bias, 0)

    def forward(self, x):
        return self.mlp(x)


class SafeGemForConditionalGeneration(Gemma3ForConditionalGeneration):
    """
    SafeGem model with Visual Guard Module for image safety classification.

    This model extends Gemma3ForConditionalGeneration with:
    1. Visual Guard Module (VGM) - a safety classification head
    2. Pooling-based safety feature extraction from image tokens
    3. Simultaneous text generation and safety classification

    Key design principles:
    - Minimal modification to the base Gemma3 forward pass
    - Extract safety features from visual tokens using mean pooling
    - Non-invasive architecture that maintains full base model capabilities
    """

    config_class = SafeGemConfig

    def __init__(self, config: SafeGemConfig):
        super().__init__(config)

        # Add safety head (Visual Guard Module) if safety configuration is present
        num_safety_categories = getattr(config, 'num_safety_categories', None)
        if num_safety_categories and num_safety_categories > 0:
            hidden_size = config.text_config.hidden_size
            safety_head_hidden_scale = getattr(config, 'safety_head_hidden_scale', 1.0)
            safety_hidden_size = int(hidden_size * safety_head_hidden_scale)
            safety_num_hidden_layers = getattr(config, 'safety_num_hidden_layers', 1)

            rank0_print(f"🔧 [INIT] Initializing Visual Guard Module: {hidden_size} -> {safety_hidden_size} -> {num_safety_categories}")

            self.img_safety_head = SafetyMLP(
                input_size=hidden_size,
                hidden_size=safety_hidden_size,
                output_size=num_safety_categories,
                num_hidden_layers=safety_num_hidden_layers
            )
        else:
            rank0_print("🔧 [INIT] No safety configuration found, Visual Guard Module not initialized")
            self.img_safety_head = None

    def _extract_image_features_pooling(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        input_ids: Optional[torch.Tensor] = None,
        image_hidden_states: Optional[torch.Tensor] = None
    ) -> Optional[torch.Tensor]:
        """
        Extract image features using pooling over visual tokens.

        Args:
            hidden_states: [batch_size, seq_len, hidden_size]
            attention_mask: [batch_size, seq_len]
            input_ids: [batch_size, seq_len]
            image_hidden_states: [batch_size, num_images, num_patches, hidden_size]

        Returns:
            image_features: [batch_size, hidden_size] or None
        """
        # First try to use image_hidden_states if available (from the vision tower)
        if image_hidden_states is not None:
            # Handle different shapes of image_hidden_states
            if len(image_hidden_states.shape) == 3:
                # [batch_size, num_patches, hidden_size]
                batch_size, num_patches, hidden_size = image_hidden_states.shape
                # Mean over patches: [batch_size, hidden_size]
                pooled_features = image_hidden_states.mean(dim=1)
                return pooled_features
            elif len(image_hidden_states.shape) == 4:
                # [batch_size, num_images, num_patches, hidden_size]
                batch_size, num_images, num_patches, hidden_size = image_hidden_states.shape
                # Mean over patches: [batch_size, num_images, hidden_size]
                pooled_per_image = image_hidden_states.mean(dim=2)
                # Mean over images: [batch_size, hidden_size]
                pooled_features = pooled_per_image.mean(dim=1)
                rank0_print(f"🔧 [POOL] 4D pooled features shape: {pooled_features.shape}")
                return pooled_features
            else:
                rank0_print(f"🔧 [POOL] Unexpected image_hidden_states shape: {image_hidden_states.shape}")
                return None

        # Fallback: return None if no image_hidden_states
        if input_ids is None:
            rank0_print("🔧 [POOL] No input_ids available for image token detection")
            return None

        rank0_print("🔧 [POOL] No image_hidden_states available, cannot extract image features")
        return None

    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[List[torch.FloatTensor]] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        pixel_values: Optional[torch.FloatTensor] = None,
        return_dict: Optional[bool] = None,
        do_safety: bool = True,  # Default to True for training, can be overridden for generation
        safety_labels: Optional[torch.LongTensor] = None,
        **kwargs
    ) -> Union[Tuple, SafeGemOutput]:
        """
        Forward pass with optional safety classification.

        Args:
            do_safety: Whether to perform safety classification (default: True)
            All other args: Same as Gemma3ForConditionalGeneration

        Returns:
            SafeGemOutput with optional safety classification results
        """

        # Force output_hidden_states if we need safety classification,
        # but only during the initial forward pass, not during generation
        if do_safety and self.img_safety_head is not None and past_key_values is None:
            output_hidden_states = True
            return_dict = True

        # Standard Gemma3 forward pass - no modifications
        outputs = super().forward(
            input_ids=input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            labels=labels,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            pixel_values=pixel_values,
            return_dict=True,
            **kwargs
        )

        # Fix NaN/Inf in logits if present
        if outputs.logits is not None:
            nan_count = torch.isnan(outputs.logits).sum()
            inf_count = torch.isinf(outputs.logits).sum()

            if nan_count > 0 or inf_count > 0:
                if past_key_values is None:
                    print(f"[CRITICAL] Found NaN or Inf in logits! NaN count: {nan_count}, Inf count: {inf_count}")

                replacement_values = torch.randn_like(outputs.logits) * 0.001
                outputs.logits = torch.where(
                    torch.isnan(outputs.logits) | torch.isinf(outputs.logits),
                    replacement_values,
                    outputs.logits
                )

            # Fix logits shape if needed
            if len(outputs.logits.shape) == 4 and outputs.logits.shape[1] == 1:
                outputs.logits = outputs.logits.squeeze(1)

        # Initialize safety outputs
        img_safety_logits = None
        img_safety_probs = None

        # Check if we should perform safety classification
        is_generation = past_key_values is not None
        has_images = pixel_values is not None

        should_do_safety = (
            do_safety and
            self.img_safety_head is not None and
            (outputs.hidden_states is not None or outputs.image_hidden_states is not None) and
            has_images and
            not is_generation
        )

        if should_do_safety:
            # Extract image features
            image_features = self._extract_image_features_pooling(
                hidden_states=outputs.hidden_states[-1] if outputs.hidden_states else None,
                attention_mask=attention_mask,
                input_ids=input_ids,
                image_hidden_states=outputs.image_hidden_states
            )

            if image_features is not None:
                # Run through the Visual Guard Module
                img_safety_logits = self.img_safety_head(image_features)
                img_safety_probs = torch.softmax(img_safety_logits, dim=-1)
            else:
                rank0_print("🔧 [SafeGem] ❌ Image feature extraction failed")

        # Return results
        if return_dict is False:
            output = (outputs.loss, outputs.logits, outputs.past_key_values,
                      outputs.hidden_states, outputs.attentions)
            if img_safety_logits is not None:
                output += (img_safety_logits, img_safety_probs)
            return output
        else:
            # During generation, return the standard output
            if is_generation or past_key_values is not None:
                return outputs
            else:
                # During training/inference, return the custom output with safety info
                return SafeGemOutput(
                    loss=outputs.loss,
                    logits=outputs.logits,
                    past_key_values=outputs.past_key_values,
                    hidden_states=outputs.hidden_states,
                    attentions=outputs.attentions,
                    image_hidden_states=outputs.image_hidden_states,
                    img_safety_logits=img_safety_logits,
                    img_safety_probs=img_safety_probs
                )
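As a quick shape check of the head defined above (a standalone sketch that mirrors `SafetyMLP` with `num_hidden_layers=1`, using the 27B dimensions 5376 → 2688 → 20 implied by config.json):

```python
import torch
import torch.nn as nn

# Mirrors SafetyMLP(input_size=5376, hidden_size=2688, output_size=20, num_hidden_layers=1)
head = nn.Sequential(
    nn.Linear(5376, 2688), nn.GELU(), nn.Dropout(0.1),
    nn.Linear(2688, 20),
)
features = torch.randn(2, 5376)   # pooled image features for a batch of 2
logits = head(features)
assert logits.shape == (2, 20)
```
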
preprocessor_config.json
ADDED
@@ -0,0 +1,29 @@
{
  "do_convert_rgb": null,
  "do_normalize": true,
  "do_pan_and_scan": null,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_processor_type": "Gemma3ImageProcessor",
  "image_seq_length": 256,
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "pan_and_scan_max_num_crops": null,
  "pan_and_scan_min_crop_size": null,
  "pan_and_scan_min_ratio_to_activate": null,
  "processor_class": "Gemma3Processor",
  "resample": 2,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "height": 896,
    "width": 896
  }
}
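For reference, these values are mutually consistent: `rescale_factor` is 1/255, and a 896x896 input with 14x14 SigLIP patches (patch size from config.json's `vision_config`) gives 64x64 = 4096 vision patches, which Gemma-3's projector reduces to the 256 tokens reported as `image_seq_length`. A quick arithmetic check (a sketch, not library code):

```python
image_size, patch_size = 896, 14              # "size" above; patch size from vision_config
patches_per_side = image_size // patch_size   # 64
num_patches = patches_per_side ** 2           # 4096
image_seq_length = 256                        # tokens per image after pooling
assert num_patches // image_seq_length == 16  # i.e. a 4x4 spatial reduction
assert abs(0.00392156862745098 - 1 / 255) < 1e-15  # rescale_factor ~= 1/255
```
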
processor_config.json
ADDED
@@ -0,0 +1,4 @@
{
  "image_seq_length": 256,
  "processor_class": "Gemma3Processor"
}
special_tokens_map.json
ADDED
@@ -0,0 +1,33 @@
{
  "boi_token": "<start_of_image>",
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eoi_token": "<end_of_image>",
  "eos_token": {
    "content": "<eos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "image_token": "<image_soft_token>",
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
tokenizer_config.json
ADDED
The diff for this file is too large to render. See raw diff.