A simple ViT model, based on google/vit-base-patch16-224-in21k, that classifies an image into one of three categories: KF, NonKF, or Rejected. It was fine-tuned on a private dataset of 58,781 images (11.4 GB).
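Here is a minimal plain-Python sketch of how a three-way output becomes one of those category labels. It assumes the classification head returns one raw logit per category. The index order in `ID2LABEL` is a hypothetical illustration; the real mapping is stored in the `id2label` entry of the model's config. `softmax` and `predict` are illustrative helpers, not part of any library.

```python
import math

# Hypothetical index-to-label order; check the model's config (id2label) for the real one.
ID2LABEL = {0: "KF", 1: "NonKF", 2: "Rejected"}


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, id2label=ID2LABEL):
    """Return (label, probability) for the highest-probability category."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


if __name__ == "__main__":
    label, p = predict([2.0, 0.5, -1.0])
    print(label, round(p, 3))
```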
Real-world accuracy is decent. I wouldn't rely on it to decide whether images get deleted, but it is good enough to sort your saved Kemono Friends fanart. It was not explicitly trained on cosplay, real-world photos, or other non-fanart KF images.
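For sorting a folder of saved images, a minimal sketch could look like the following. It assumes a `classify` callable that returns predictions in the same shape as a `transformers` image-classification pipeline: a list of `{"label": ..., "score": ...}` dicts. The function name `sort_images`, the `min_score` threshold, and the `"Uncertain"` fallback folder are all hypothetical choices for illustration. Files are copied rather than moved, so nothing is lost if the model gets an image wrong.

```python
import shutil
from pathlib import Path

# Hypothetical set of file extensions treated as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".gif"}


def sort_images(src_dir, dst_dir, classify, min_score=0.0, fallback="Uncertain"):
    """Copy (never move) each image in src_dir into dst_dir/<label>/.

    classify(path) must return a list of {"label": str, "score": float} dicts.
    Images whose best score is below min_score go to dst_dir/<fallback>/.
    Returns a dict mapping folder name -> number of files copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    counts = {}
    for path in sorted(src.iterdir()):
        # Skip subdirectories and non-image files.
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        preds = classify(path)
        top = max(preds, key=lambda pred: pred["score"])
        label = top["label"] if top["score"] >= min_score else fallback
        target = dst / label
        target.mkdir(parents=True, exist_ok=True)
        # copy2 keeps the original file and preserves its metadata.
        shutil.copy2(path, target / path.name)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

With `transformers` installed, one way to build `classify` would be `clf = pipeline("image-classification", model="<model-id>")` followed by `sort_images("saved", "sorted", lambda p: clf(str(p)), min_score=0.6)`. Here `<model-id>`, the folder names, and the `0.6` threshold are placeholders.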
The model was trained for 8 epochs; this checkpoint is from epoch 5. From around epoch 6 onward, it stopped learning. The current goal is to train it further on more NonKF images, since that category is so small.
This model was trained with the code in this repo, run locally on my 7900 XTX using the provided code and configs. The repo also includes some tools if you are interested in creating your own fine-tuned ViT model.