Yao Dou

Hello, welcome!

I am a fifth-year Ph.D. student at Georgia Tech, advised by Prof. Wei Xu. I received my B.S. from the University of Washington at age 20, where I was advised by Prof. Yejin Choi.

My research focuses on evaluating large language models on hard-to-evaluate tasks. I develop learned evaluation metrics, fine-grained evaluation methods, user simulators for multi-turn evaluation, and long-context evaluation methods that combine checklists and agents. I also explore applying LLMs to real-world problems, such as measuring and protecting user privacy on social media.

Representative Work

Gavel: Agent Meets Checklist for Evaluating LLMs on Long-Context Legal Summarization [Website] [GitHub] [Paper]
Yao Dou, Wei Xu
arXiv, Jan 2026

SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants? [Website] [GitHub] [Paper]
Yao Dou, Michel Galley, Baolin Peng, Chris Kedzie, Weixin Cai, Alan Ritter, Chris Quirk, Wei Xu, Jianfeng Gao
EMNLP 2025

Reducing Privacy Risks in Online Self-Disclosures with Language Models [Dataset] [Model] [Paper]
Yao Dou, Isadora Krsek, Tarek Naous, Anubha Kabra, Sauvik Das, Alan Ritter, Wei Xu
ACL 2024

Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA [Demo] [GitHub] [Paper]
David Heineman, Yao Dou, Mounica Maddela, Wei Xu
EMNLP 2023

Lens: A Learnable Evaluation Metric for Text Simplification [Demo] [Rank and Rate] [GitHub] [Paper]
Mounica Maddela*, Yao Dou*, David Heineman, Wei Xu
ACL 2023

Tutorial

Automatic and Human-AI Interactive Text Generation [Webpage] [Proposal]
Yao Dou*, Philippe Laban*, Claire Gardent, Wei Xu
ACL 2024

Others

Evaluating LLMs on Chinese Idiom Translation [Paper]
Cai Yang, Yao Dou, David Heineman, Xiaofeng Wu, Wei Xu
COLM 2025

CollabLLM: From Passive Responders to Active Collaborators [Website] [GitHub] [Paper]
Shirley Wu, Michel Galley, Baolin Peng, Hao Cheng, Gavin Li, Yao Dou, Weixin Cai, James Zou, Jure Leskovec, Jianfeng Gao
ICML 2025 (Outstanding Paper Award)

Measuring, Modeling, and Helping People Account for Privacy Risks in Online Self-Disclosures with AI [Paper]
Isadora Krsek, Anubha Kabra, Yao Dou, Tarek Naous, Laura A. Dabbish, Alan Ritter, Wei Xu, Sauvik Das
CSCW 2025

CROSSNEWS: A Cross-Genre Authorship Verification and Attribution Benchmark [Paper]
Marcus Ma, Duong Minh Le, Junmo Kang, Yao Dou, John Cadigan, Dayne Freitag, Alan Ritter, Wei Xu
AAAI 2025

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models [Website] [Dataset] [GitHub] [Paper]
Mu Cai, Reuben Tan, Jianrui Zhang, Bocheng Zou, Kai Zhang, Feng Yao, Fangrui Zhu, Jing Gu, Yiwu Zhong, Yuzhang Shang, Yao Dou, Jaden Park, Jianfeng Gao, Yong Jae Lee, Jianwei Yang
arXiv, Oct 2024

Improving Minimum Bayes Risk Decoding with Multi-Prompt [Paper]
David Heineman, Yao Dou, Wei Xu
EMNLP 2024

GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation [Paper]
Govind Ramesh, Yao Dou, Wei Xu
EMNLP 2024

Thresh 🌾: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation [Webpage] [GitHub] [Paper]
David Heineman, Yao Dou, Wei Xu
EMNLP 2023 (Demo)

Improving Large-scale Paraphrase Acquisition and Generation [Webpage] [Paper]
Yao Dou, Chao Jiang, Wei Xu
EMNLP 2022

Is GPT-3 Text Indistinguishable from Human Text? Scarecrow 🎃: A Framework for Scrutinizing Machine Text [Webpage] [Paper]
Yao Dou*, Maxwell Forbes*, Rik Koncel-Kedziorski, Noah A. Smith, Yejin Choi
ACL 2022

MultiTalk: A Highly-Branching Dialog Testbed for Diverse Conversations [Paper]
Yao Dou, Maxwell Forbes, Ari Holtzman, Yejin Choi
AAAI 2021

Undergraduate Advising

David Heineman -> PYI at Allen Institute for AI
Cai Yang
Govind Ramesh
Jerry Lou Zheng
Vinayak Athavale