Tourism Package Prediction Model

Model Description

This model predicts whether a customer will purchase a Wellness Tourism Package from the "Visit with Us" travel company. It uses an XGBoost classifier with a custom preprocessing pipeline that handles both numeric and categorical features.

Intended Use

Primary Use: Identify potential customers for the Wellness Tourism Package to optimize marketing outreach and improve conversion rates.

Users: Sales and marketing teams at travel companies.

Out-of-scope: This model should not be used for discriminatory purposes or decisions that could significantly impact individuals' lives beyond marketing preferences.

Training Data

  • Dataset: Tourism package purchase history
  • Features: 18 features including customer demographics, travel preferences, and sales interaction data
    • 12 numeric features (Age, CityTier, MonthlyIncome, etc.)
    • 6 categorical features (Gender, Occupation, Designation, etc.)
  • Target: Binary classification (ProdTaken: 0 = No purchase, 1 = Purchase)
  • Training Set: 3302 samples
  • Test Set: 826 samples
  • Class Imbalance: Handled using scale_pos_weight parameter
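
The card does not state the value passed to scale_pos_weight. A common starting point, suggested in the XGBoost documentation, is the ratio of negative to positive training samples. The sketch below computes that ratio; the helper name and the example labels are hypothetical and do not reflect the actual class distribution of this dataset.

```python
from collections import Counter


def suggested_scale_pos_weight(labels):
    """Return negatives / positives, a usual starting value for scale_pos_weight."""
    counts = Counter(labels)
    positives = counts.get(1, 0)
    negatives = counts.get(0, 0)
    if positives == 0:
        raise ValueError("labels contain no positive samples")
    return negatives / positives


# Hypothetical labels: 8 non-purchases and 2 purchases (illustrative only).
example_labels = [0] * 8 + [1] * 2
print(suggested_scale_pos_weight(example_labels))
```

The resulting number would typically be passed as scale_pos_weight when constructing the classifier, and it can also be treated as one more hyperparameter to tune.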

Model Architecture

Algorithm: XGBoost classifier with a scikit-learn preprocessing pipeline

Preprocessing:

  • Numeric features: Passthrough (no transformation)
  • Nominal categorical features: OrdinalEncoder
  • Ordinal feature (Designation): OrdinalEncoder with hierarchy (Executive → Manager → Senior Manager → AVP → VP)
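
A minimal sketch of how the preprocessing described above could be assembled with scikit-learn's ColumnTransformer. The column names come from the usage example in this card; the function name, the handling of unseen categories, and the demo rows are assumptions made for illustration and may differ from the author's pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Column groupings taken from the feature names in this card's usage example.
NOMINAL_COLS = ["TypeofContact", "Occupation", "Gender", "ProductPitched", "MaritalStatus"]
DESIGNATION_ORDER = ["Executive", "Manager", "Senior Manager", "AVP", "VP"]


def build_preprocessor():
    """Ordinal-encode categoricals; pass every other (numeric) column through."""
    return ColumnTransformer(
        transformers=[
            # Assumption: unseen nominal categories map to -1 instead of raising.
            ("nominal",
             OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
             NOMINAL_COLS),
            # Explicit category order so the codes follow the stated hierarchy.
            ("designation",
             OrdinalEncoder(categories=[DESIGNATION_ORDER]),
             ["Designation"]),
        ],
        remainder="passthrough",  # numeric features are left untransformed
    )


# Made-up rows for illustration; only one numeric column is included for brevity.
demo = pd.DataFrame({
    "Age": [35, 48],
    "TypeofContact": ["Self Enquiry", "Company Invited"],
    "Occupation": ["Salaried", "Small Business"],
    "Gender": ["Male", "Female"],
    "ProductPitched": ["Basic", "Deluxe"],
    "MaritalStatus": ["Married", "Single"],
    "Designation": ["Executive", "VP"],
})
encoded = build_preprocessor().fit_transform(demo)
# Output columns: five nominal columns, then Designation, then passthrough columns.
designation_codes = encoded[:, 5].tolist()
print(designation_codes)
```

Passing the hierarchy through the categories argument is what makes the integer codes respect seniority; without it, OrdinalEncoder would order the designations alphabetically.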

Best Hyperparameters:

  • colsample_bylevel: 0.6
  • colsample_bytree: 0.6
  • learning_rate: 0.15
  • max_depth: 5
  • n_estimators: 250
  • reg_lambda: 0.5

Classification Threshold: 0.45 (optimized for F1-score)
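
The card does not describe how the 0.45 threshold was found. One simple approach, sketched here under that caveat, is to sweep candidate thresholds over held-out predictions and keep the one with the highest F1-score. The function names, labels, and probabilities below are made up for illustration.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 of the positive class when predicting 1 for probabilities >= threshold."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the highest F1 (first one wins ties)."""
    return max(candidates, key=lambda th: f1_at_threshold(y_true, y_prob, th))


# Made-up validation labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.10, 0.30, 0.46, 0.50, 0.70, 0.90]
candidates = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95

best = best_f1_threshold(y_true, y_prob, candidates)
print(f"best threshold: {best:.2f}, F1: {f1_at_threshold(y_true, y_prob, best):.4f}")
```

To avoid an optimistic estimate, a sweep like this is usually run on validation data or cross-validated predictions rather than on the final test set.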

Performance Metrics

Training Set

  • Accuracy: 0.9942
  • Precision: 0.9711
  • Recall: 1.0000
  • F1-Score: 0.9853

Test Set

  • Accuracy: 0.9286
  • Precision: 0.8165
  • Recall: 0.8113
  • F1-Score: 0.8139
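
As a quick consistency check, the F1-scores reported above can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The helper name below is illustrative; the numbers are the ones listed in this card.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the Training Set and Test Set metrics in this card.
reported = {
    "train": {"precision": 0.9711, "recall": 1.0000, "f1": 0.9853},
    "test": {"precision": 0.8165, "recall": 0.8113, "f1": 0.8139},
}

recomputed = {
    split: f1_from_precision_recall(m["precision"], m["recall"])
    for split, m in reported.items()
}
for split, value in recomputed.items():
    print(f"{split}: reported F1 {reported[split]['f1']:.4f}, recomputed {value:.4f}")
```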

How to Use

import joblib
import pandas as pd
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(
    repo_id="nsriram78/tourism-package-prediction",
    filename="tourism_conversion_predict_model.joblib",
    repo_type="model"
)

# Load the model
model = joblib.load(model_path)

# Prepare input data (must match training feature order)
input_data = pd.DataFrame([{
    'Age': 35,
    'CityTier': 1,
    'DurationOfPitch': 15,
    'NumberOfPersonVisiting': 3,
    'NumberOfFollowups': 3,
    'PreferredPropertyStar': 4.0,
    'NumberOfTrips': 3,
    'Passport': 1,
    'PitchSatisfactionScore': 3,
    'OwnCar': 1,
    'NumberOfChildrenVisiting': 1,
    'MonthlyIncome': 22000,
    'TypeofContact': 'Self Enquiry',
    'Occupation': 'Salaried',
    'Gender': 'Male',
    'ProductPitched': 'Basic',
    'MaritalStatus': 'Married',
    'Designation': 'Manager'
}])

# Get prediction probability
prediction_proba = model.predict_proba(input_data)[0, 1]

# Apply custom threshold
prediction = int(prediction_proba >= 0.45)

print(f"Purchase Probability: {prediction_proba:.2%}")
print(f"Prediction: {'Will Purchase' if prediction == 1 else 'Will Not Purchase'}")

Training Procedure

  1. Data Preparation: 80/20 train-test split with stratification
  2. Hyperparameter Tuning: GridSearchCV with 5-fold cross-validation
  3. Optimization Metric: F1-Score (to balance precision and recall)
  4. Experiment Tracking: MLflow for logging parameters and metrics

Limitations and Considerations

  • The model is trained on historical data and may not generalize to significantly different customer populations
  • Performance depends on data quality and feature completeness
  • Class imbalance is addressed through scale_pos_weight, but it may still reduce prediction quality on the minority class
  • The custom threshold of 0.45 was optimized for the current dataset and may need adjustment for other use cases
  • The model assumes input features are in the same order and format as the training data
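
Because of the ordering requirement above, it can help to validate and reorder each input record before building the DataFrame. The sketch below assumes that the feature order in this card's usage example matches the training order; the helper name is hypothetical.

```python
# Assumed training order, copied from the usage example in this card.
EXPECTED_FEATURES = [
    "Age", "CityTier", "DurationOfPitch", "NumberOfPersonVisiting",
    "NumberOfFollowups", "PreferredPropertyStar", "NumberOfTrips", "Passport",
    "PitchSatisfactionScore", "OwnCar", "NumberOfChildrenVisiting",
    "MonthlyIncome", "TypeofContact", "Occupation", "Gender",
    "ProductPitched", "MaritalStatus", "Designation",
]


def order_features(record):
    """Return a new dict keyed in EXPECTED_FEATURES order, rejecting mismatches."""
    missing = [name for name in EXPECTED_FEATURES if name not in record]
    extra = [name for name in record if name not in EXPECTED_FEATURES]
    if missing or extra:
        raise ValueError(
            f"missing features: {missing}; unexpected features: {extra}"
        )
    return {name: record[name] for name in EXPECTED_FEATURES}


# Example: a record supplied in reverse order with placeholder values.
shuffled = {name: 0 for name in reversed(EXPECTED_FEATURES)}
ordered = order_features(shuffled)
print(list(ordered) == EXPECTED_FEATURES)
```

The ordered dict can then be wrapped as pd.DataFrame([ordered]) before calling predict_proba.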

Ethical Considerations

  • Ensure the model is used responsibly and for marketing purposes only
  • Regularly monitor for bias in predictions across different demographic groups
  • Respect customer privacy and comply with data protection regulations
  • Provide opt-out mechanisms for customers who don't wish to be contacted

Model Card Authors

Sriram Narasimhan

Model Card Contact

For questions or issues, please open an issue in the model repository.
