Scaling Faithful Reasoning in Large Language Models
Sarah Chen, Wei Zhang, Aditya Ramesh, Christopher Manning
I am an Associate Professor in the Department of Electrical Engineering and Computer Science at MIT, and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).
My research sits at the intersection of natural language processing and machine learning, with a particular focus on building language systems that reason reliably, explain their decisions, and align with human intentions.
Before joining MIT, I was a research scientist at Google DeepMind. I completed my PhD at Stanford University under the supervision of Prof. Christopher Manning.
PhD in Natural Language Processing
Stanford University, Computer Science
2013 - 2018
Stanford, CA
Thesis: Structured Representations for Compositional Semantics
BSc (Honours) in Computer Science & Mathematics
University of Toronto, Computer Science
2009 - 2013
Toronto, Canada
Associate Professor in Computer Science
Massachusetts Institute of Technology, EECS / CSAIL
2021 - Present
Cambridge, MA
Research Scientist
Google DeepMind, Language Research
2018 - 2021
Mountain View, CA
Wei Zhang, Sarah Chen, Yann LeCun
Sarah Chen, Amanda Liu, David Silver
Sarah Chen, Priya Sharma, Carlos Rodriguez
Wei Zhang, Sarah Chen
Foundations of Faithful Reasoning in Language Models
Developing training methods and evaluation frameworks for improving logical consistency in large language models.
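One way to picture the evaluation side of this project is a toy logical-consistency check: a model that answers "yes" to a statement should answer "no" to its negation. The sketch below is a hypothetical illustration only, not the project's actual framework; the `stub` model and the `consistency_rate` helper are invented for this example, with the stub standing in for a real LLM call.

```python
def consistency_rate(model_answer, pairs):
    """Fraction of (statement, negation) pairs that the model answers
    with opposite yes/no labels -- a toy logical-consistency metric."""
    consistent = 0
    for statement, negation in pairs:
        if model_answer(statement) != model_answer(negation):
            consistent += 1
    return consistent / len(pairs)

# Stub "model": answers "no" iff the claim contains the word "not".
# A real evaluation would query an actual language model here.
stub = lambda q: "no" if " not " in f" {q} " else "yes"

pairs = [
    ("2 is even", "2 is not even"),
    ("Paris is in France", "Paris is not in France"),
]
print(consistency_rate(stub, pairs))  # 1.0 for this trivially consistent stub
```

In a real framework the statement/negation pairs would be generated systematically and the metric aggregated across domains, but the core contract is the same: consistency is measured over paired prompts, not single answers.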
Human-Aligned NLP Systems
Multi-institution project on building NLP systems that align with human values and intentions.
MIT Technology Review Innovators Under 35
Recognized for pioneering work on faithful reasoning in AI systems.
Best Paper Award
NSF CAREER Award
Early-career faculty award for research on interpretable language models.
Mechanistic interpretability of transformer models
Preference learning and RLHF alternatives
Multilingual NLP and cross-lingual transfer
LLM evaluation benchmarks
Postdoctoral position on our DARPA-funded project developing human-aligned NLP systems.
Requirements
PhD in NLP, ML, or a related field, with publications in top venues.
Graduate seminar covering modern approaches to NLP, including large language models, in-context learning, and alignment techniques.
Introduction to machine learning: supervised and unsupervised methods, with a focus on deep learning fundamentals.
Excited to share that our paper 'Scaling Faithful Reasoning in Large Language Models' has been accepted as an oral presentation at NeurIPS 2024!
I am recruiting 2 PhD students to start Fall 2025. Research areas: LLM reasoning, interpretability, and alignment. Please apply through the MIT EECS admissions portal.
MIT Technology Review
The Researchers Making AI Think More Clearly
Feature article on our group's work on faithful reasoning in language models.
Lex Fridman Podcast
AI Alignment: Where Are We Now?
Conversation about the current state of AI alignment research and practical approaches.
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology