The chief concern about voice assistants (such as Siri and Google Assistant) has been surreptitious access to audio by manufacturers; while that is a serious issue, it tends to overshadow the fact that these assistants can also be massive security vulnerabilities.
A new type of attack that uses ultrasonic waves to reach a device through solid surfaces (such as a shared table) should serve as a wake-up call. Security researchers have pulled off attacks using inaudible audio cues before, but prior techniques were delivered through the air and required line of sight to the device. This new version is more covert, requires less power, and extends the potential range considerably, possibly even passing through multiple connected solid objects.
This new attack type is potentially the most dangerous of the bunch, but a number of existing attack angles are also not getting serious enough attention. Voice cloning software can not only defeat voice-based authentication; the cloned audio can also be processed so that it is audible to voice assistants but not to the human ear. Researchers have likewise used lasers to trigger commands, taking control of devices from as much as 350 feet away with the help of a telephoto lens.
Ultrasonic attacks on voice assistants
The new study, dubbed “SurfingAttack,” comes from a team of researchers based at several universities in the United States and China.
The attack uses a piezoelectric transducer to generate ultrasonic waves that replicate commands voice assistants respond to. The transducer is attached to the same surface as the target device, and the ultrasonic waves propagate through the material to reach it. These waves are inaudible and effectively undetectable by a human being without special equipment.
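Both this attack and its over-the-air predecessors rely on the same underlying trick: an ordinary recorded voice command is amplitude-modulated onto an ultrasonic carrier, and the nonlinear response of the microphone hardware demodulates it back into the audible band the assistant processes. As a rough illustration of that principle (not the researchers’ actual signal chain; the carrier frequency, sample rates and file names here are assumptions), the modulation step might look like this in Python:

```python
import numpy as np
from scipy.io import wavfile

# Load a recorded voice command (placeholder file name; assumes a mono recording).
fs, command = wavfile.read("ok_google_command.wav")
command = command.astype(np.float64)
command /= np.max(np.abs(command))           # normalize to [-1, 1]

# Choose an ultrasonic carrier well above human hearing (~20 kHz).
carrier_freq = 28_000                        # Hz, assumed for illustration
fs_out = 192_000                             # output rate high enough to carry the ultrasound
t_in = np.arange(len(command)) / fs
t_out = np.arange(int(len(command) * fs_out / fs)) / fs_out
baseband = np.interp(t_out, t_in, command)   # resample the command to the output rate

# Standard amplitude modulation: the carrier plus command-shaped sidebands.
# The microphone's nonlinear response later "demodulates" this back into
# the audible band, even though the transmitted signal itself is inaudible.
modulation_depth = 0.8
am_signal = (1 + modulation_depth * baseband) * np.sin(2 * np.pi * carrier_freq * t_out)
am_signal /= np.max(np.abs(am_signal))

wavfile.write("ultrasonic_payload.wav", fs_out, (am_signal * 32767).astype(np.int16))
```

In SurfingAttack the resulting waveform is not played through a speaker at all; it drives the piezoelectric transducer attached to the surface, turning the table itself into the transmission medium.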
In theory, any voice command that the target device responds to could be issued without detection. The researchers were able to retrieve an SMS passcode to defeat text-based two-factor authentication, and also to place phone calls and navigate automated answering systems using a synthesized voice. The attacks worked against Google Assistant on 11 different models of Android phone and against Siri on four different iPhones.
The potential range and effectiveness of the attack vary with the surface material, but a variety of materials are viable: metals such as aluminum work best, followed by glass (wood and plastic were not tested). The potential for interference from ambient noise was also studied. Simulations of office, cafe and airport settings showed the attack remained 100% successful with a low level of background noise, 90% successful with nearby conversations held at normal volume, and 80% successful even with substantial background noise. In general, the shorter the attack command, the more likely it was to succeed when background noise was present.
An important side note is that smart speakers, such as Amazon Echo and Google Home, do not appear to be susceptible to this attack.
Many unpatched voice assistant vulnerabilities
On its own, this attack might seem to have such limited application that it isn’t much more than a novelty. However, it’s another tool in a box full of potential attacks on voice assistants for which there is still little remedy.
Ultrasound attacks that are imperceptible to the human ear date back to 2017; DolphinAttack was the first of these, directing over-the-air ultrasound at voice assistants up to seven meters away to deliver hidden voice commands.
But what if a device is secured with a voiceprint? That’s where deepfaked audio comes in. As with the infamous deepfake videos, a learning algorithm can build a text-to-speech replica of someone’s voice given enough quality recordings to train on; Lyrebird is one example of the type of software that does this.
Late last year, researchers from Japan and the University of Michigan also demonstrated that voice assistants (in this case including smart speakers) could be controlled with a well-placed laser. The researchers opened smart garage doors as part of their testing, but indicated that the technique could be used to issue just about any command. All that is needed is line of sight: when the modulated laser light hits the microphone’s diaphragm, it is converted into electrical signals just as sound would be. This works even when the microphone is covered by a piece of tape.
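Conceptually, the laser version needs no ultrasonic carrier at all: the audio command directly modulates the laser’s intensity, and the microphone’s diaphragm responds to that modulated light as if it were sound. A minimal sketch of that mapping, again with placeholder file names and illustrative drive-current values rather than figures from the research, might look like this:

```python
import numpy as np
from scipy.io import wavfile

# Load the voice command to be encoded as light intensity (placeholder file, mono assumed).
fs, command = wavfile.read("open_garage_command.wav")
command = command.astype(np.float64)
command /= np.max(np.abs(command))            # normalize to [-1, 1]

# Map the audio waveform onto a laser-driver current: a DC bias keeps the
# laser on, and the audio swings the intensity around that bias.
# These current values are illustrative assumptions, not measured figures.
bias_ma = 200.0                               # steady-state drive current (mA)
swing_ma = 150.0                              # peak audio-induced deviation (mA)
drive_current_ma = bias_ma + swing_ma * command

# drive_current_ma would then be fed to a laser diode driver (e.g. via a DAC);
# the resulting intensity-modulated beam is what the microphone "hears".
print(f"{len(drive_current_ma)} samples, "
      f"range {drive_current_ma.min():.1f}-{drive_current_ma.max():.1f} mA")
```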
So why are these vulnerabilities not addressed by manufacturers? Usually because fixing them would require a hardware redesign. When researchers bring voice assistant vulnerabilities to manufacturers’ attention, the response is typically to recommend an added step at the user end: muting the microphone, setting up a voice PIN, and so on. While these added steps often work, they are not always things the end user would think to do.