# Machine Learning Model Comparison - Classification Project

This project compares several supervised machine learning algorithms on structured classification tasks. Each model was evaluated on accuracy, speed, and practical usability.
## Models Included

| No. | Model Name | Type |
|---|---|---|
| 1 | Logistic Regression | Linear Model |
| 2 | Random Forest | Ensemble (Bagging) |
| 3 | K-Nearest Neighbors | Instance-Based (Lazy) |
| 4 | XGBoost | Gradient Boosting |
| 5 | Support Vector Machine | Margin-based Classifier |
| 6 | ANN (MLPClassifier) | Neural Network |
| 7 | LightGBM | Gradient Boosting (Histogram) |
| 8 | Naive Bayes | Probabilistic |
## Accuracy Summary

| Model | Accuracy | Speed |
|---|---|---|
| Logistic Regression | ~84% | Very Fast |
| Random Forest | ~95% | Medium |
| KNN | ~84% | Slow |
| XGBoost | ~90% | Medium |
| SVM | ~85% | Medium |
| ANN (MLP) | ~51% | Medium |
| LightGBM | ~90% | Fastest |
| Naive Bayes | ~80% | Extremely Fast |
## Model Descriptions

### 1. Logistic Regression
- A linear model that predicts class probabilities using a sigmoid function.
- ✅ Best for interpretable and quick binary classification.
- ❌ Not ideal for non-linear or complex patterns.
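A minimal sketch of fitting this model with scikit-learn; the synthetic data and settings below are illustrative assumptions, not this project's actual pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the project's structured data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X, y)
# predict_proba exposes the sigmoid-based class probabilities
print(clf.predict_proba(X[:3]))
```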
### 2. Random Forest
- An ensemble of decision trees with majority voting.
- ✅ Excellent accuracy and robustness.
- ❌ Slower and harder to interpret than simpler models.
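A hedged sketch of the same idea with scikit-learn's `RandomForestClassifier` (synthetic data, illustrative hyperparameters):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 100 bagged trees; class predictions are made by majority vote
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```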
### 3. K-Nearest Neighbors (KNN)
- A lazy learner that predicts based on the nearest data points.
- ✅ Simple and training-free.
- ❌ Very slow for large datasets; sensitive to noise.
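A short sketch that puts KNN behind a scaler, since its distance computations are scale-sensitive (data and parameters here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Scaling matters: KNN distances are dominated by large-range features
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)).fit(X, y)
print(clf.predict(X[:3]))
```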
### 4. XGBoost
- A boosting algorithm that builds trees sequentially to minimize error.
- ✅ High accuracy, regularization, built-in feature importance.
- ❌ Slightly complex tuning; slower than simpler models.
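An illustrative sketch using the `xgboost` package's sklearn-style API; the hyperparameters are placeholders, not this project's tuned values:

```python
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Sequentially boosted trees; learning_rate and max_depth act as regularizers
clf = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=4).fit(X, y)
print(clf.feature_importances_)  # built-in feature importance
```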
### 5. Support Vector Machine (SVM)
- Separates classes by finding the maximum-margin hyperplane.
- ✅ Excellent for high-dimensional or non-linear data.
- ❌ Doesn't scale well; requires feature scaling.
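A sketch pairing `SVC` with the feature scaling the note above calls for (synthetic data, assumed settings):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# RBF kernel handles non-linear boundaries; scaling is essential for SVMs
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X, y)
print(clf.predict(X[:3]))
```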
### 6. ANN (MLPClassifier - sklearn)
- A basic feedforward neural network with hidden layers.
- ✅ Capable of learning complex patterns.
- ❌ Low accuracy in this project; needs better tuning and data scaling.
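A sketch of `MLPClassifier` with input scaling, one of the tuning steps suggested above (the layer sizes and iteration count are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Unscaled inputs are a common cause of poor MLP accuracy
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
).fit(X, y)
print(clf.predict(X[:3]))
```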
### 7. LightGBM
- A gradient boosting framework optimized for speed and memory.
- ✅ Faster than XGBoost; supports categorical features directly.
- ❌ Can overfit small datasets if not tuned well.
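An illustrative `LGBMClassifier` sketch; `num_leaves` is the usual knob for the overfitting concern noted above (the values here are assumptions):

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Smaller num_leaves (plus min_child_samples) helps avoid overfitting small data
clf = LGBMClassifier(n_estimators=200, num_leaves=31).fit(X, y)
print(clf.predict(X[:3]))
```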
### 8. Naive Bayes (GaussianNB)
- A probabilistic classifier assuming feature independence.
- ✅ Fastest model; works well for text and high-dimensional data.
- ❌ The feature-independence assumption rarely holds; weak on complex patterns.
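A minimal `GaussianNB` sketch on synthetic data (illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Fits one Gaussian per feature per class; no iterative training, hence the speed
clf = GaussianNB().fit(X, y)
print(clf.predict_proba(X[:3]))
```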
## Recommendation Summary

| Best For | Model |
|---|---|
| Highest Accuracy | Random Forest |
| Fastest Training | Naive Bayes |
| Best for Large Data | LightGBM |
| Best Baseline | Logistic Regression |
| Best for Clean Data | SVM |
| Best for Speed + Accuracy | XGBoost |
## Resources Included

- `model.pkl` files for each classifier
- `cart.docx` with graphs, charts, and performance analysis
- This `README.md` as the model card

For more information, see the `cart.docx` file.
## How to Use

Load any of the bundled `.pkl` files with joblib and call `predict` on a 2D feature array:

```python
from joblib import load

model = load("XGBoost_model.pkl")  # any of the saved .pkl classifiers

# The model expects a 2D array of numeric features, one row per sample
sample = [[0.5, 1.2, 3.4]]  # placeholder values; use your dataset's feature order
prediction = model.predict(sample)
print(prediction)
```
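If you need class probabilities rather than hard labels, most of these estimators also expose scikit-learn's `predict_proba` (note that `SVC` only supports it when trained with `probability=True`):

```python
# Class probabilities instead of hard labels, reusing model and sample from above
probabilities = model.predict_proba(sample)
print(probabilities)
```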