Tourism Package Prediction Model
Model Description
This model predicts whether a customer will purchase a Wellness Tourism Package from the "Visit with Us" travel company. It uses an XGBoost classifier with a custom preprocessing pipeline that handles both numeric and categorical features.
Intended Use
Primary Use: Identify potential customers for the Wellness Tourism Package to optimize marketing outreach and improve conversion rates.
Users: Sales and marketing teams at travel companies.
Out-of-scope: This model should not be used for discriminatory purposes or decisions that could significantly impact individuals' lives beyond marketing preferences.
Training Data
- Dataset: Tourism package purchase history
- Features: 18 features including customer demographics, travel preferences, and sales interaction data
  - 12 numeric features (Age, CityTier, MonthlyIncome, etc.)
  - 6 categorical features (Gender, Occupation, Designation, etc.)
- Target: Binary classification (ProdTaken: 0 = No purchase, 1 = Purchase)
- Training Set: 3302 samples
- Test Set: 826 samples
- Class Imbalance: Handled using the scale_pos_weight parameter
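The scale_pos_weight value used for this model is not stated in this card. The XGBoost documentation suggests the ratio of negative to positive training samples as a typical starting point. The following is a minimal sketch of that heuristic; the function name and the example labels are hypothetical and are not taken from the training code.

```python
from collections import Counter


def compute_scale_pos_weight(labels):
    """Return the negative-to-positive ratio for binary labels (0/1)."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive samples in labels")
    return counts[0] / counts[1]


# Hypothetical labels for illustration only: 8 non-buyers, 2 buyers.
example_labels = [0] * 8 + [1] * 2
example_weight = compute_scale_pos_weight(example_labels)
print(example_weight)  # 4.0
```

The resulting value would be passed as scale_pos_weight when constructing the classifier, computed on the training split only.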
Model Architecture
Algorithm: XGBoost classifier with a scikit-learn preprocessing pipeline
Preprocessing:
- Numeric features: Passthrough (no transformation)
- Nominal categorical features: OrdinalEncoder
- Ordinal feature (Designation): OrdinalEncoder with hierarchy (Executive → Manager → Senior Manager → AVP → VP)
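The preprocessing steps listed above could be wired together roughly as follows. This is a sketch, not the author's pipeline: it assumes a scikit-learn ColumnTransformer, uses only a subset of the columns for brevity, and runs on hypothetical sample rows. The step names and variable names are illustrative.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

# Column groupings are assumptions based on the feature names in this card.
numeric_cols = ["Age", "MonthlyIncome"]   # subset of the 12 numeric features
nominal_cols = ["Gender", "Occupation"]   # subset of the nominal features
designation_order = ["Executive", "Manager", "Senior Manager", "AVP", "VP"]

preprocessor = ColumnTransformer(
    transformers=[
        # Numeric columns are passed through unchanged.
        ("numeric", "passthrough", numeric_cols),
        # Nominal columns get integer codes in sorted category order.
        ("nominal", OrdinalEncoder(), nominal_cols),
        # Designation gets integer codes that follow the stated hierarchy.
        ("designation", OrdinalEncoder(categories=[designation_order]), ["Designation"]),
    ]
)

# Hypothetical rows, for illustration only.
sample = pd.DataFrame({
    "Age": [35, 42, 29],
    "MonthlyIncome": [22000, 31000, 18000],
    "Gender": ["Male", "Female", "Male"],
    "Occupation": ["Salaried", "Small Business", "Salaried"],
    "Designation": ["Manager", "VP", "Executive"],
})

encoded = preprocessor.fit_transform(sample)
print(encoded)
```

With the explicit category list, Executive maps to 0 and VP to 4, so the encoded Designation values preserve the seniority order.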
Best Hyperparameters:
- colsample_bylevel: 0.6
- colsample_bytree: 0.6
- learning_rate: 0.15
- max_depth: 5
- n_estimators: 250
- reg_lambda: 0.5
Classification Threshold: 0.45 (optimized for F1-score)
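The card does not describe how the 0.45 threshold was searched. One common approach is to score a set of candidate thresholds on held-out predictions and keep the one with the highest F1-score. The sketch below illustrates that idea in plain Python; the function names, labels, probabilities, and candidate list are all hypothetical.

```python
def f1_at_threshold(y_true, y_proba, threshold):
    """F1-score when probabilities >= threshold are labelled positive."""
    tp = fp = fn = 0
    for truth, proba in zip(y_true, y_proba):
        predicted_positive = proba >= threshold
        if predicted_positive and truth == 1:
            tp += 1
        elif predicted_positive and truth == 0:
            fp += 1
        elif not predicted_positive and truth == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(y_true, y_proba, candidates):
    """Return the candidate threshold with the highest F1-score."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_proba, t))


# Hypothetical validation labels and probabilities, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_proba = [0.10, 0.40, 0.47, 0.80, 0.55, 0.30]
candidates = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55]

chosen = best_threshold(y_true, y_proba, candidates)
print(chosen, f1_at_threshold(y_true, y_proba, chosen))
```

Selecting the threshold on data that is also used for final evaluation would bias the reported metrics, so a separate validation split or cross-validated predictions are usually preferable.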
Performance Metrics
Training Set
- Accuracy: 0.9942
- Precision: 0.9711
- Recall: 1.0000
- F1-Score: 0.9853
Test Set
- Accuracy: 0.9286
- Precision: 0.8165
- Recall: 0.8113
- F1-Score: 0.8139
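As a quick consistency check, the F1-scores above can be recomputed from the listed precision and recall, since F1 is their harmonic mean. The helper name below is illustrative.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values copied from the metrics listed above.
train_f1 = f1_score_from(0.9711, 1.0000)
test_f1 = f1_score_from(0.8165, 0.8113)
print(f"train F1 = {train_f1:.4f}, test F1 = {test_f1:.4f}")
# train F1 = 0.9853, test F1 = 0.8139
```

Both recomputed values agree with the reported F1-scores to four decimal places.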
How to Use
import joblib
import pandas as pd
from huggingface_hub import hf_hub_download
# Download the model
model_path = hf_hub_download(
    repo_id="nsriram78/tourism-package-prediction",
    filename="tourism_conversion_predict_model.joblib",
    repo_type="model"
)
# Load the model
model = joblib.load(model_path)
# Prepare input data (must match training feature order)
input_data = pd.DataFrame([{
    'Age': 35,
    'CityTier': 1,
    'DurationOfPitch': 15,
    'NumberOfPersonVisiting': 3,
    'NumberOfFollowups': 3,
    'PreferredPropertyStar': 4.0,
    'NumberOfTrips': 3,
    'Passport': 1,
    'PitchSatisfactionScore': 3,
    'OwnCar': 1,
    'NumberOfChildrenVisiting': 1,
    'MonthlyIncome': 22000,
    'TypeofContact': 'Self Enquiry',
    'Occupation': 'Salaried',
    'Gender': 'Male',
    'ProductPitched': 'Basic',
    'MaritalStatus': 'Married',
    'Designation': 'Manager'
}])
# Get prediction probability
prediction_proba = model.predict_proba(input_data)[0, 1]
# Apply custom threshold
prediction = int(prediction_proba >= 0.45)
print(f"Purchase Probability: {prediction_proba:.2%}")
print(f"Prediction: {'Will Purchase' if prediction == 1 else 'Will Not Purchase'}")
Training Procedure
- Data Preparation: 80/20 train-test split with stratification
- Hyperparameter Tuning: GridSearchCV with 5-fold cross-validation
- Optimization Metric: F1-Score (to balance precision and recall)
- Experiment Tracking: MLflow for logging parameters and metrics
Limitations and Considerations
- The model is trained on historical data and may not generalize to significantly different customer populations
- Performance depends on data quality and feature completeness
- Class imbalance is addressed with scale_pos_weight, but predictions for the minority class (purchasers) may still be less reliable
- The custom threshold of 0.45 was optimized for the current dataset and may need adjustment for other use cases
- The model assumes input features arrive in the same order and format as the training data
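Because the model expects a fixed feature order, it can help to validate and reorder each record before building the input DataFrame. The sketch below is one way to do that. It assumes the column order shown in the How to Use example matches the training order, which this card implies but does not confirm; the constant and function names are hypothetical.

```python
# Assumed training order, copied from the How to Use example in this card.
EXPECTED_COLUMNS = [
    "Age", "CityTier", "DurationOfPitch", "NumberOfPersonVisiting",
    "NumberOfFollowups", "PreferredPropertyStar", "NumberOfTrips",
    "Passport", "PitchSatisfactionScore", "OwnCar",
    "NumberOfChildrenVisiting", "MonthlyIncome", "TypeofContact",
    "Occupation", "Gender", "ProductPitched", "MaritalStatus", "Designation",
]


def order_features(record):
    """Return a dict with keys in EXPECTED_COLUMNS order, or raise ValueError."""
    missing = [name for name in EXPECTED_COLUMNS if name not in record]
    unexpected = [name for name in record if name not in EXPECTED_COLUMNS]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return {name: record[name] for name in EXPECTED_COLUMNS}


# A record whose keys arrive in reverse order is restored to the expected order.
shuffled = {name: None for name in reversed(EXPECTED_COLUMNS)}
ordered = order_features(shuffled)
print(list(ordered) == EXPECTED_COLUMNS)  # True
```

The ordered dict can then be wrapped in a list and passed to pd.DataFrame, as in the How to Use example.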
Ethical Considerations
- Ensure the model is used responsibly and for marketing purposes only
- Regularly monitor for bias in predictions across different demographic groups
- Respect customer privacy and comply with data protection regulations
- Provide opt-out mechanisms for customers who don't wish to be contacted
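One simple signal for the bias monitoring mentioned above is the share of positive predictions within each demographic group. The sketch below computes that rate in plain Python. It is an illustration rather than a complete fairness audit, and the function name, group labels, and predictions are hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(groups, predictions):
    """Share of positive predictions (1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, prediction in zip(groups, predictions):
        totals[group] += 1
        positives[group] += prediction
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical groups and thresholded predictions, for illustration only.
groups = ["Male", "Female", "Male", "Female", "Male", "Female"]
predictions = [1, 0, 1, 1, 0, 0]

rates = positive_rate_by_group(groups, predictions)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

A large gap between groups does not prove unfair treatment on its own, but it is a reasonable trigger for a closer look at the data and the threshold.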
Model Card Authors
Sriram Narasimhan
Model Card Contact
For questions or issues, please open an issue in the model repository.