ChinaTravel: Evaluate and compare AI model performance on ChinaTravel benchmark tasks