Using Persuasion Dialogues To Improve AI Safety

ECE/Khoury Assistant Professor Weiyan Shi joined the College of Engineering in fall 2024 and has already made headlines. She was named to the MIT Tech Review 35 Innovators Under 30 and received two paper awards at the Association for Computational Linguistics annual meeting for her research in persuasive dialogues to address AI safety issues.
Weiyan Shi, assistant professor of electrical and computer engineering, and Khoury College of Computer Sciences, is researching how AI systems can behave as good citizens through her work with natural language processes and persuasive dialogues. Her research recently garnered her industry and media recognition and two best-paper awards.
Shi was named to the MIT Tech Review 35 Innovators Under 35, which recognized her persuasive dialogues research as a “bold vision” and received two paper awards focusing on AI safety at the Association for Computational Linguistics annual meeting.
Her research on Large Language Models, (LLMs), natural language processing, and persuasive dialogues is currently focused on identifying safety risks in AI systems by testing them with various conversational and persuasive queries aimed to expose system vulnerabilities. If, through repeated questioning, often laced with persuasive messaging, the system finally responds with the data requested, then it has a security flaw that needs to be fixed.
The goal is to teach the AI systems to build up defenses against these queries and ultimately provide responses that are guided by ethics and honesty. Ultimately the AI systems will be able to internalize the differences between good and bad and make decisions based on that knowledge.
“Traditionally, people use code or random strings to try to break the rules and get information” Shi says. “We are trying to humanize the systems and to approach this from a different angle that hasn’t been thought about before.”
She also plans to extend her research to determine how people can better use AI systems to achieve positive results, like successfully asking for a contribution to a charity.
“The overarching goal is to persuade humans for social good and persuade AI for AI safety.”
Shi’s paper awards include the Outstanding Paper Award for “The Earth Is Flat Because … Investigating LLMs’ Belief Towards Misinformation via Persuasive Conversation” that studies potential outcomes when repeated and persuasive questions containing misinformation are posed to an AI system.
A second paper, “How Johnny Can Persuade LLMs To Jailbreak Them: Rethinking Persuasion To Challenge AI Safety by Humanizing LLMs” received the Best Social Impact Paper Award for research that examines how to persuade LLMs with tactics that eventually will expose system security risks.
Shi worked extensively in AI research prior to joining the College of Engineering in August 2024, including as a data scientist in industry developing chatbots. She also worked as an intern at Meta AI Research, where she co-developed a negotiation AI dialogue agent, Cicero, that negotiated, persuaded, coordinated, and collaborated with human players in a high-profile game of Diplomacy in 2022. She received her PhD in computer science from Columbia University in 2023.
Shi’s research vision is to build a natural interface between human intelligence and machine intelligence through natural conversations, and to persuade humans for social good and persuade AI for AI safety.