1 July 2020, 9:18 a.m. | Tobias Schlichtmeier
This voice assistant responds not only to its actual trigger word »Amazon«, but also, for example, when the words »Am Sonntag« (»On Sunday«) are spoken.
Alexa, Siri or Assistant: the range of digital assistants for everyday life is growing. Researchers recently investigated which words trigger the voice assistants. This much can be revealed: more than we thought.
»Alexa, what will the weather be like today?« is a classic among the questions put to a voice assistant. It is »in«, it is convenient, and it is fun to look things up easily via Siri and co. However, there are words that make the digital assistants activate without our knowledge. Researchers at the Ruhr-University Bochum (RUB) and the Bochum Max Planck Institute (MPI) for Cyber Security and Privacy have investigated which words these are.
If you have such an assistant in your living room, the list of English, German and Chinese terms compiled by the researchers is certainly interesting. It contains all the words that various voice assistants repeatedly misinterpreted as a prompt to start listening. Whenever the systems activate, they record a short sequence of what has been said and transmit the data to the manufacturer, sometimes without the users noticing.
Employees of the corporations transcribe and check the audio snippets they receive. In this way, fragments of private conversations can end up at companies. The researchers show a selection of the trigger words and illustrative videos on their homepage.
The IT experts tested the voice assistants from Amazon, Apple, Google, Microsoft and Deutsche Telekom as well as three Chinese models from Xiaomi, Baidu and Tencent. They played them hours of German, English and Chinese audio material, including several seasons of the series »Game of Thrones«, »Modern Family« and »Tatort« as well as news broadcasts. Professional audio data sets used to train voice assistants were also included. Every voice assistant was fitted with a diode that registered when its activity indicator lit up, the visible sign that the device had switched into active mode and a trigger had occurred.
The setup also registered when a voice assistant sent data to the network. Whenever one of the devices switched to active mode, the researchers recorded which audio sequence had caused it. They later evaluated manually which terms had triggered the voice assistant.
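The manual evaluation could be prepared by narrowing down the candidate words for each activation. The following is only a minimal sketch under stated assumptions, not the researchers' tooling: it assumes a time-aligned transcript of the played audio and an activation timestamp from the diode, and the names `Word` and `words_before` as well as the two-second window are hypothetical.

```python
from typing import List, Tuple

# (start time in s, end time in s, spoken text) relative to playback start.
# The existence of such a time-aligned transcript is an assumption.
Word = Tuple[float, float, str]


def words_before(activation: float, transcript: List[Word],
                 window: float = 2.0) -> List[str]:
    """Return the words that ended within `window` seconds before the activation."""
    return [
        text
        for _start, end, text in transcript
        if activation - window <= end <= activation
    ]


# Invented example data for illustration only.
transcript = [
    (0.0, 0.4, "see"),
    (0.4, 0.7, "you"),
    (3.0, 3.3, "on"),
    (3.3, 3.9, "Sunday"),
    (6.0, 6.5, "then"),
]

# Suppose the diode registered the indicator light 4.0 s into playback.
print(words_before(4.0, transcript))
```

The result is only a list of candidates; deciding which of them actually caused the trigger would still be a manual step, as described above.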
The Bochum research team of Thorsten Eisenhofer, Jan Wiele, Lea Schönherr, Maximilian Golla and Dorothea Kolossa (from left to right) investigated which terms voice assistants misunderstand as triggers.
From the data, the team created an initial list of over 1,000 sequences that incorrectly trigger voice assistants. Depending on the emphasis, Alexa, for example, responds to »unacceptable« and »election« in English, and Google responds to »OK, cool«. In German, Amazon is fooled by »Am Sonntag«, for example, and Siri by the term »Daiquiri«.
In order to understand what makes these terms false triggers, the researchers broke the words down into their smallest possible sound units and identified the units that the voice assistants often confused. Based on the findings, they generated new trigger words and showed that these also triggered the voice assistants.
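One simple way to picture why such words are confused is to compare them sound by sound. The sketch below is not the researchers' method; it swaps in a plain Levenshtein (edit) distance over phoneme sequences, and the phoneme transcriptions are rough, hand-written assumptions for illustration.

```python
from typing import Sequence


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(
                prev[j] + 1,         # delete a phoneme from a
                curr[j - 1] + 1,     # insert a phoneme from b
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Approximate transcriptions, written by hand (assumptions, not measured data).
ALEXA = ["AH", "L", "EH", "K", "S", "AH"]
ELECTION = ["IH", "L", "EH", "K", "SH", "AH", "N"]
WEATHER = ["W", "EH", "DH", "ER"]

print(phoneme_edit_distance(ALEXA, ELECTION))  # smaller: shares "L EH K"
print(phoneme_edit_distance(ALEXA, WEATHER))   # larger: little overlap
```

Under these assumed transcriptions, »election« sits closer to »Alexa« than an unrelated word such as »weather« does. A real analysis would also have to weight which sound units are acoustically similar, which this sketch ignores.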
»The devices are intentionally programmed to be somewhat liberal because they are supposed to be able to understand their humans. So they are more likely to activate once too often than once too rarely,« explains researcher Dorothea Kolossa.
In the next step, the scientists investigated in more detail how the manufacturers evaluate false triggers. A two-stage process is typical. First, the device analyzes locally whether a trigger word is contained in the perceived speech. If the device suspects that it has heard the trigger word, it starts to upload the current conversation to the manufacturer's cloud for further analysis with higher processing power.
If the cloud analysis identifies the term as a false trigger, the voice assistant remains silent and only its indicator light comes on briefly. Even in this case, several seconds of audio recording can already end up with the manufacturer, where employees transcribe them in order to avoid such false triggers in the future.
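The two-stage process can be sketched roughly as follows. This is a simplified illustration, not any manufacturer's implementation: the confidence scores, both threshold values and the names `Outcome` and `handle_utterance` are assumptions.

```python
from dataclasses import dataclass

LOCAL_THRESHOLD = 0.5  # assumption: liberal on-device threshold
CLOUD_THRESHOLD = 0.8  # assumption: stricter threshold in the cloud


@dataclass
class Outcome:
    indicator_lit: bool
    audio_uploaded: bool
    assistant_responds: bool


def handle_utterance(local_score: float, cloud_score: float) -> Outcome:
    """Simulate the local check followed by the cloud check."""
    # Stage 1: the device decides locally whether it heard the trigger word.
    if local_score < LOCAL_THRESHOLD:
        return Outcome(False, False, False)
    # Stage 2: the audio has left the device by the time the cloud decides.
    confirmed = cloud_score >= CLOUD_THRESHOLD
    return Outcome(True, True, confirmed)


# A false trigger: accepted locally, rejected in the cloud.
print(handle_utterance(local_score=0.6, cloud_score=0.4))
```

The false-trigger case shows the privacy issue: the assistant stays silent, but the indicator has lit up and the audio has already been uploaded.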
From a privacy point of view, this is of course worrying, because very private conversations can sometimes end up with strangers, the researchers believe. From an engineering point of view, however, the procedure is understandable, because manufacturers can only improve the devices with the help of such data. Manufacturers must strike a balance between data protection and technical optimisation.