SAGE: Benchmarking and Improving Retrieval for Deep Research Agents Paper • 2602.05975 • Published 6 days ago • 12
SAGE: Benchmarking and Improving Retrieval for Deep Research Agents Paper • 2602.05975 • Published 6 days ago • 12
SAGE: Benchmarking and Improving Retrieval for Deep Research Agents Paper • 2602.05975 • Published 6 days ago • 12
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain Paper • 2510.15232 • Published Oct 17, 2025 • 6
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain Paper • 2510.15232 • Published Oct 17, 2025 • 6
FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain Paper • 2510.15232 • Published Oct 17, 2025 • 6 • 2
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering Paper • 2510.06426 • Published Oct 7, 2025 • 3
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering Paper • 2510.06426 • Published Oct 7, 2025 • 3
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1, 2025 • 46
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1, 2025 • 46
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective Paper • 2505.15045 • Published May 21, 2025 • 55