---
tags:
- autoencoder
- image-colorization
- pytorch
- pytorch_model_hub_mixin
license: apache-2.0
datasets:
- flwrlabs/celeba
language:
- en
metrics:
- mse
pipeline_tag: image-to-image
---

# Model Colorization Autoencoder

## Model Description

This autoencoder model performs image colorization: it takes grayscale images as input and outputs colorized versions of those images. The model uses an encoder-decoder architecture in which the encoder compresses the input image into a latent representation and the decoder reconstructs the image in color.

### Architecture

- **Encoder**: Three convolutional blocks, each a 3×3 convolution followed by 2×2 max pooling, a ReLU activation, and batch normalization, ending with a flattening layer and a fully connected layer that produces a 4000-dimensional latent vector. The flatten size of 16 × 45 × 45 implies a 360×360 grayscale input.
- **Decoder**: Mirrors the encoder: a linear layer and unflatten, then transposed convolutions with ReLU activations and batch normalization. The final transposed convolution outputs a three-channel color image through a sigmoid activation.

The architecture details are as follows:
```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class ModelColorization(nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 16, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Flatten(),
            nn.Linear(16 * 45 * 45, 4000),
        )
        self.decoder = nn.Sequential(
            nn.Linear(4000, 16 * 45 * 45),
            nn.ReLU(),
            nn.Unflatten(1, (16, 45, 45)),
            nn.ConvTranspose2d(16, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.ConvTranspose2d(32, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
```
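The layer sizes above can be sanity-checked with a little arithmetic: each 3×3, stride-1, padding-1 convolution preserves spatial size, each 2×2 pooling halves it, and each stride-2 transposed convolution (with `output_padding=1`) doubles it. The sketch below traces these shapes, assuming a 360×360 input — inferred from the 16 × 45 × 45 flatten size rather than stated explicitly in this card:

```python
# Trace spatial dimensions through the encoder and decoder using the
# standard PyTorch output-size formulas for each layer type.

def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of Conv2d with the model's hyperparameters."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output size of MaxPool2d with the model's hyperparameters."""
    return (size - kernel) // stride + 1

def deconv_out(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Output size of ConvTranspose2d with the model's hyperparameters."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

size = 360
for _ in range(3):          # three conv + pool blocks in the encoder
    size = pool_out(conv_out(size))
print(size)                 # 45
print(16 * size * size)     # 32400, the nn.Linear input size (16*45*45)

for _ in range(3):          # three transposed-conv blocks in the decoder
    size = deconv_out(size)
print(size)                 # 360, back to the input resolution
```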

### Training Details
The model was trained with PyTorch for 5 epochs. The training and validation losses per epoch were:

| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
| 1 | 0.0063 | 0.0042 |
| 2 | 0.0036 | 0.0035 |
| 3 | 0.0032 | 0.0032 |
| 4 | 0.0030 | 0.0030 |
| 5 | 0.0029 | 0.0030 |

Training loss decreased steadily across all five epochs, while validation loss plateaued at roughly 0.0030 from epoch 4 onward.

### Usage
Since the model is published with `PyTorchModelHubMixin`, it can be loaded from the Hugging Face Hub directly through the model class (install the dependencies first with `pip install torch huggingface_hub`):

```python
# Uses the ModelColorization class defined in the Architecture section above
model = ModelColorization.from_pretrained("sebastiansarasti/AutoEncoderImageColorization")
model.eval()  # switch to inference mode
```