Imagine someone listening in on your private conversation by filming the bag of chips
sitting on the other side of the room.
Researchers at MIT did just that: They’ve created an algorithm that can reconstruct sound (and even intelligible speech)
from the tiny vibrations it causes in objects captured on video.
When sound hits an object, it makes distinct vibrations. “There’s this very subtle
signal that’s telling you what the sound passing through is,” said Abe
Davis, a graduate student in electrical engineering and computer science at MIT and
first author on the paper.
This particular study grew out of an earlier experiment at MIT, led by Michael
Rubinstein, now a postdoctoral researcher at Microsoft Research New England.
In 2012, Rubinstein amplified tiny variations in video to detect things
like the skin color change caused by the pumping of blood. Studying the
vibrations caused by sound was a logical next step. But getting intelligible
speech out of the analysis was surprising, Davis said.
In one example shown in a compilation video, a bag of chips is filmed from 15 feet
away, through soundproof glass. The reconstructed audio of someone reciting
“Mary Had a Little Lamb” in the same room as the chips isn’t crystal clear, but
the words can still be made out.
In most cases, a high-speed camera is necessary to accomplish the feat. Still, at 2,000
to 6,000 frames per second, the camera used by the researchers is modest
compared with the fastest available on the market, which can surpass 100,000 frames
per second. And the researchers found that even cheaper cameras could be used.
“It’s surprisingly possible to take advantage of a bug called rolling shutter,” Davis
said. “Usually, it creates these artifacts in the image that people don’t
like.”
When cameras use rolling shutter to capture an image, they don’t capture one
single point in time. Instead, the camera scans across the frame in one
direction, picking up each row at a slightly different moment.
“It kind of turns a two-dimensional low-speed
camera into a one-dimensional high-speed camera,” Davis explained. “As a
result, we can recover sounds happening at frequencies several times higher
than the frame rate of the camera, which is remarkable when you consider that
it’s just a complete accident of the way we make them.”
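To make the arithmetic behind that claim concrete, here is a minimal Python sketch. It is not the researchers' algorithm. It only illustrates why reading rows one after another raises the effective sampling rate. The 60 frames per second, the 480 rows, the 440 Hz test tone, and all function names are assumptions chosen for illustration, and the sensor is idealized: rows are read back to back with no pause between frames, which real cameras do not guarantee.

```python
import math

FPS = 60    # hypothetical frame rate of an ordinary camera
ROWS = 480  # hypothetical number of sensor rows per frame


def row_sample_rate_hz(fps, rows):
    """Rows read per second, assuming rows are read back to back with no gaps."""
    return fps * rows


def sample_tone_by_row(tone_hz, fps, rows, n_frames, phase=0.1):
    """Sample a sine tone once per row, each row at a slightly later moment."""
    rate = row_sample_rate_hz(fps, rows)
    return [
        math.sin(2 * math.pi * tone_hz * (i / rate) + phase)
        for i in range(rows * n_frames)
    ]


def estimate_frequency_hz(samples, sample_rate_hz):
    """Rough frequency estimate from sign changes (two per cycle)."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration_s = (len(samples) - 1) / sample_rate_hz
    return crossings / (2.0 * duration_s)


rate = row_sample_rate_hz(FPS, ROWS)
samples = sample_tone_by_row(440.0, FPS, ROWS, n_frames=60)

# One sample per frame can only represent tones below half the frame rate.
print("frame-rate limit (Hz):", FPS / 2)
# One sample per row pushes that limit up by a factor equal to the row count.
print("row-rate limit (Hz):", rate / 2)
print("estimated tone (Hz):", round(estimate_frequency_hz(samples, rate), 1))
```

Under these assumptions, the 440 Hz tone sits far above the 30 Hz limit implied by the frame rate, yet the row-by-row samples recover a value close to 440 Hz. In a real camera, the gap between the end of one frame and the start of the next would leave holes in this signal, so the sketch overstates what a simple readout could achieve.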
There are definitely limitations to the technology, Davis said, and it may not make
for better sound reconstruction than other methods already in use. “Big Brother
won’t be able to hear anything that anyone ever says all of a sudden,”
Davis said.
“But it is possible that you could use this to discover sound in
situations where you couldn’t before. It’s just adding one more tool for those
forensic applications.”