PatronusAI/trace-dataset
LLM Evaluation
Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis
MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments