Extracting Sound From Visual Cues: You Probably Don’t Want To Share Secrets While Eating From A Bag of Potato Chips

If you think that people are hacking your cellphone and listening to your conversations, this news should freak you out even more.

Researchers at MIT, Microsoft, and Adobe have developed an algorithm that can reconstruct an audio signal by analyzing the minute vibrations of objects captured on video. For example, they recovered:

  • Intelligible speech from the vibrations of a potato-chip bag photographed from 15 feet away through soundproof glass.
  • Useful audio signals from videos of aluminum foil, the surface of a glass of water, and even the leaves of a potted plant (vibrating at less than a hundredth of a pixel).

In one experiment, they recovered sound from the vibrations of earbuds plugged into a laptop playing music. Then they played the garbled recording to Shazam, which automatically identified the song.

“When sound hits an object, it causes the object to vibrate,” says Abe Davis, a graduate student in electrical engineering and computer science at MIT and first author on the new paper. “The motion of this vibration creates a very subtle visual signal that’s usually invisible to the naked eye. People didn’t realize that this information was there.”

Luckily, you cannot recover these sound bites from regular smartphone video. The researchers explain that reconstructing audio from video “requires that the frequency of the video samples, the number of frames of video captured per second, be higher than the frequency of the audio signal.”

So, you need a camera capable of filming at high speed (2,000 to 6,000 fps). That is much faster than the 60 fps possible with some smartphones and the 200 fps possible with some digital cameras, but well below the frame rates of the best commercial high-speed cameras, which can top 100,000 frames per second.
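To put rough numbers on the frame-rate requirement, here is a minimal Python sketch. It is not the researchers' code. It assumes standard sampling theory (the Nyquist criterion), which is a bit stricter than the statement above: a uniformly sampled signal can only represent frequencies up to half the sampling rate, and anything higher "folds" down to a lower, false frequency. The function names are made up for this illustration, and the 440 Hz test tone is just an example value.

```python
def nyquist_limit_hz(frames_per_second: float) -> float:
    """Highest frequency representable without aliasing at this frame rate."""
    return frames_per_second / 2.0


def apparent_frequency_hz(tone_hz: float, frames_per_second: float) -> float:
    """Frequency a pure tone appears to have after sampling at the frame rate.

    Tones above the Nyquist limit fold back into the 0..fps/2 range.
    """
    folded = tone_hz % frames_per_second
    return min(folded, frames_per_second - folded)


if __name__ == "__main__":
    tone = 440.0  # example tone (assumption, not from the article)
    # Frame rates mentioned in the article: smartphone, digital camera,
    # and the low and high end of the high-speed range.
    for fps in (60, 200, 2000, 6000):
        limit = nyquist_limit_hz(fps)
        seen = apparent_frequency_hz(tone, fps)
        status = "captured" if tone <= limit else "aliased"
        print(f"{fps:>5} fps: limit {limit:7.1f} Hz, "
              f"{tone:.0f} Hz tone seen as {seen:6.1f} Hz ({status})")
```

Under these assumptions, a 60 fps camera tops out at 30 Hz and would misread the 440 Hz tone as a 20 Hz one, while 2,000 fps covers frequencies up to 1,000 Hz, which includes much of the range of human speech.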

The researchers even took advantage of the rolling shutter effect exhibited by some cameras to recover sound from a plastic bag of candy.

Read more about The Visual Microphone: Passive Recovery of Sound from Video.

via MITnews