
# 🧠 Machine Learning Model Comparison – Classification Project

This project compares a variety of supervised machine learning algorithms to evaluate their performance on structured classification tasks. Each model was evaluated for speed, accuracy, and practical usability.


## 📌 Models Included

| No. | Model Name | Type |
|-----|------------|------|
| 1 | Logistic Regression | Linear Model |
| 2 | Random Forest | Ensemble (Bagging) |
| 3 | K-Nearest Neighbors | Instance-Based (Lazy) |
| 4 | XGBoost | Gradient Boosting |
| 5 | Support Vector Machine | Margin-based Classifier |
| 6 | ANN (MLPClassifier) | Neural Network |
| 7 | LightGBM | Gradient Boosting (Histogram) |
| 8 | Naive Bayes | Probabilistic |

## 📊 Accuracy Summary

| Model | Accuracy (%) | Speed |
|-------|--------------|-------|
| Logistic Regression | ~84% | 🔥 Very Fast |
| Random Forest | ~95% | ⚡ Medium |
| KNN | ~84% | 🐢 Slow |
| XGBoost | ~90% | ⚡ Medium |
| SVM | ~85% | ⚡ Medium |
| ANN (MLP) | ~51% | ⚡ Medium |
| LightGBM | ~90% | 🚀 Fastest |
| Naive Bayes | ~80% | 🚀 Extremely Fast |

## 🧠 Model Descriptions


### 1. Logistic Regression

- A linear model that predicts class probabilities using a sigmoid function.
- ✅ Best for interpretable and quick binary classification.
- ❌ Not ideal for non-linear or complex patterns.
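
A minimal sketch of this setup in scikit-learn (the project's dataset is not included here, so synthetic data stands in):

```python
# Sketch only: make_classification stands in for the project's real dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)  # sigmoid-derived class probabilities
print(clf.score(X_te, y_te))
```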

### 2. Random Forest

- An ensemble of decision trees with majority voting.
- ✅ Excellent accuracy and robustness.
- ❌ Slower and harder to interpret than simpler models.
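
A hedged sketch on synthetic data (the project's own features are not shown here):

```python
# Sketch only: synthetic data stands in for the project's dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 100 trees vote on each prediction; feature_importances_ partially
# recovers interpretability
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(rf.feature_importances_)
```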

### 3. K-Nearest Neighbors (KNN)

- A lazy learner that predicts based on the nearest data points.
- ✅ Simple and training-free.
- ❌ Very slow for large datasets; sensitive to noise.
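
As a sketch (synthetic data, since the project's dataset isn't shown), "fitting" a KNN only stores the training set; all the work happens at prediction time:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# fit() just stores the data; each predict() searches for the 5 nearest points
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
pred = knn.predict(X[:5])
```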

### 4. XGBoost

- A boosting algorithm that builds trees sequentially to minimize error.
- ✅ High accuracy, regularization, built-in feature importance.
- ❌ Slightly complex tuning; slower than simpler models.

### 5. Support Vector Machine (SVM)

- Separates classes by finding the maximum-margin hyperplane.
- ✅ Excellent for high-dimensional or non-linear data.
- ❌ Doesn’t scale well; requires feature scaling.
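
Because of the feature-scaling requirement noted above, the usual pattern is a pipeline that scales before the SVM. A sketch on synthetic data:

```python
# Sketch only: a scaler + SVM pipeline on stand-in synthetic data.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# StandardScaler handles the scaling the SVM needs; RBF kernel for non-linearity
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(svm.score(X, y))
```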

### 6. ANN (MLPClassifier – sklearn)

- A basic feedforward neural network with hidden layers.
- ✅ Capable of learning complex patterns.
- ❌ Low accuracy in this project; needs better tuning and data scaling.
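
Input scaling is the usual first fix for a poorly performing MLP, as noted above. A hedged sketch of that setup (synthetic data, illustrative layer sizes):

```python
# Sketch only: scaling before the MLP; hidden_layer_sizes are illustrative.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

mlp = make_pipeline(
    StandardScaler(),  # unscaled inputs are a common cause of low MLP accuracy
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
).fit(X, y)
print(mlp.score(X, y))
```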

### 7. LightGBM

- A gradient boosting framework optimized for speed and memory.
- ✅ Faster than XGBoost; supports categorical features directly.
- ❌ Can overfit small datasets if not tuned well.

### 8. Naive Bayes (GaussianNB)

- A probabilistic classifier assuming feature independence.
- ✅ Fastest model; works well for text and high-dimensional data.
- ❌ Feature independence rarely holds; weak for complex patterns.
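
A sketch of why it is the fastest model here (synthetic stand-in data): fitting is a single pass that estimates per-feature means and variances, with no hyperparameters to tune.

```python
# Sketch only: GaussianNB fits per-class feature means/variances in one pass.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

nb = GaussianNB().fit(X, y)
print(nb.score(X, y))
```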

## 🧪 Recommendation Summary

| Best For | Model |
|----------|-------|
| Highest Accuracy | Random Forest |
| Fastest Training | Naive Bayes |
| Best for Large Data | LightGBM |
| Best Baseline | Logistic Regression |
| Best for Clean Data | SVM |
| Best for Speed + Accuracy | XGBoost |

## 📎 Resources Included

- 📁 `model.pkl` files for each classifier
- 📄 `cart.docx` with graphs, charts, and performance analysis
- 🧾 This `README.md` as the model card

For more information, see the `cart.docx` file.

## 🔧 How to Use

```python
from joblib import load

# Load any of the saved classifiers
model = load("XGBoost_model.pkl")

# X_new must be a 2-D array of feature values matching the training data
# (not raw text), e.g. [[feature_1, feature_2, ...]]
prediction = model.predict(X_new)
print(prediction)
```