AI and Machine Learning Help Researchers Get Audio from Still Images and Silent Videos

ECE/Khoury Professor Kevin Fu has developed a machine learning tool called Side Eye that can extract audio from pictures and even muted videos.


This article originally appeared on Northeastern Global News. It was published by Cody Mello-Klein.

Main photo: When you take a photo on your phone, the vibrations of your voice can create tiny bends in the light that are enough to extract audio, according to Kevin Fu, a professor of engineering and computer science at Northeastern University. Photo by Matthew Modoono/Northeastern University

With video calls becoming more common in the age of remote and hybrid workplaces, “mute yourself” and “I think you’re muted” have become part of our everyday vocabularies. But it turns out muting yourself might not be as safe as you think.

Kevin Fu, a professor of electrical and computer engineering and computer science at Northeastern University, has figured out a way to get audio from pictures and even muted videos. Using Side Eye, a machine-learning-assisted tool that he and his research team created, Fu can determine the gender of someone speaking in the room where a photo was taken, and even the exact words they spoke.

Kevin Fu, professor of electrical and computer engineering and computer science at Northeastern. Photo by Matthew Modoono/Northeastern University

“Imagine someone is doing a TikTok video and they mute it and dub music,” Fu says. “Have you ever been curious about what they’re really saying? Was it ‘Watermelon watermelon’ or ‘Here’s my password’? Was somebody speaking behind them? You can actually pick up what is being spoken off camera.”

It sounds like the stuff of science fiction –– and it is. The idea for Side Eye was inspired by an episode of the sci-fi show “Fringe” that saw the main characters, a team of fringe science investigators working for the FBI, extracting audio from a melted pane of glass.

When the episode aired, one critic for Den of Geek called it a “ridiculous pseudo science technique.” Fu disagreed.

“I was like, ‘I bet we can do that,’” Fu says. “My lab specializes in the impossible. We usually expect the first reaction to anything we do to be ‘You can’t do that,’ and we say, ‘Well, we already did.’”

Side Eye takes advantage of the image stabilization technology that is now standard in most phone cameras. To ensure a shaky hand doesn’t make for a blurry photo, cameras have small springs that hold the lens suspended in liquid. Sensors detect the shake, and an electromagnet then pushes the lens with an equal and opposite motion to cancel it.

However, Fu says whenever someone speaks near a camera lens, it causes tiny vibrations in the springs and bends the light ever so slightly. The angle of the light changes almost imperceptibly –– “unless you’re looking for it,” Fu says.
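To give a feel for why an imperceptible shift can still carry a signal, here is a toy sketch in Python. It is not Side Eye's method, which this article does not describe in detail. It assumes a made-up one-dimensional "image" of a soft bright spot, a hypothetical 50 Hz vibration that moves the spot by a twentieth of a pixel, and a hypothetical sampling rate of 1,000 frames per second. All function names and numbers are illustrative.

```python
import math

# All values are illustrative assumptions, not figures from the Side Eye research.
FRAME_RATE_HZ = 1000   # hypothetical rate at which the image is sampled
TONE_HZ = 50           # hypothetical vibration frequency
AMPLITUDE_PX = 0.05    # image shift of a twentieth of a pixel
NUM_FRAMES = 200
WIDTH, CENTER, SIGMA = 64, 32.0, 5.0


def render_frame(shift_px):
    """One row of pixels: a soft bright spot displaced by shift_px."""
    return [math.exp(-((x - CENTER - shift_px) ** 2) / (2 * SIGMA ** 2))
            for x in range(WIDTH)]


def estimate_shift(frame):
    """Sub-pixel shift estimate from the intensity-weighted centroid."""
    total = sum(frame)
    centroid = sum(x * v for x, v in enumerate(frame)) / total
    return centroid - CENTER


def dominant_frequency(samples, rate_hz):
    """Frequency of the strongest non-DC bin of a plain DFT."""
    n = len(samples)
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * rate_hz / n


# Simulate the vibrating image, then recover the motion from pixels alone.
true_shifts = [AMPLITUDE_PX * math.sin(2 * math.pi * TONE_HZ * i / FRAME_RATE_HZ)
               for i in range(NUM_FRAMES)]
recovered = [estimate_shift(render_frame(s)) for s in true_shifts]
recovered_hz = dominant_frequency(recovered, FRAME_RATE_HZ)
print(recovered_hz)
```

In this simplified setting the shift is far too small to see in any single frame, yet tracking it across frames recovers the frequency of the vibration. Real photos and videos are noisier and far more complex, which is presumably where the machine learning comes in.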

Read full story at Northeastern Global News
