The use of voice commands to control devices is expected to increase rapidly in the coming years. However, this will only be possible if voice processing takes place on the device itself rather than in the cloud, believes Arm’s Chris Shore.
By 2022, demand for voice-enabled devices is estimated to reach 1.6 billion units in the U.S. alone. To satisfy that demand in that timescale, the majority of those devices will need to be developed in 2020 and deployed shortly after. While Arm anticipates some barriers to this growth, the required solutions exist and are in active development.
Today, the AI-powered natural language processing behind smart voice devices occurs in cloud data centers, potentially thousands of miles from the device, with implications for energy consumption, cost, data privacy and bandwidth. This represents an opportunity for smart devices to become smarter, as capabilities are developed that enable more intelligence at the edge. Only then can these systems be scaled to the predicted level of deployment.
Arm predicts some key step changes in these systems, and its partners agree. Here, according to Arm, are three significant changes to the way smart voice devices will function.
Developments in artificial intelligence have fundamentally changed the way we interact with our surroundings through voice. However, existing voice assistants will need to be further developed, as they are heavily centralized in the cloud.
An increase in on-device processing capability will enable a switch to local processing, reducing bandwidth requirements, cloud processing load and energy consumption, while improving privacy and security. Moreover, moving processing onto the device makes it possible to build a trusted and transparent relationship between people and their devices.
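To make that contrast concrete, the sketch below shows a hypothetical local-first handling loop in Python: a request is only deferred beyond the device when an assumed on-device model is not confident, so in the common case no raw audio ever leaves the device. The function names, confidence threshold and dummy model are invented purely for illustration.

```python
# Minimal sketch of a local-first voice pipeline (hypothetical interfaces).
from dataclasses import dataclass

@dataclass
class Result:
    intent: str
    confidence: float

def run_local_model(audio_frames: bytes) -> Result:
    # Stand-in for an on-device model (e.g. keyword spotting plus intent
    # classification); it simply returns a fixed result for illustration.
    return Result(intent="lights_on", confidence=0.92)

def handle_request(audio_frames: bytes, threshold: float = 0.6) -> str:
    result = run_local_model(audio_frames)
    if result.confidence >= threshold:
        # Handled entirely on-device: no audio leaves the device.
        return result.intent
    # Low confidence: the device could fall back to a cloud service here,
    # or simply ask the user to repeat the request.
    return "unrecognized"

print(handle_request(b"\x00" * 16000))  # -> lights_on
```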
A great deal of development effort is currently being invested in extending the limits of what can be achieved on a standard microcontroller. Current algorithms and compute libraries are already significantly increasing what is possible with limited processing resources.
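As a rough illustration of the kind of technique such libraries rely on (a generic sketch, not Arm’s or any particular library’s implementation), the NumPy snippet below shows 8-bit quantized inference for a tiny fully connected layer: weights and activations are stored as int8 and accumulated in int32, cutting memory and compute cost at the price of a small, controlled error. The scales and sizes are arbitrary.

```python
# Sketch of 8-bit quantized inference, the kind of optimization used by
# microcontroller compute libraries to shrink memory and compute cost.
import numpy as np

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    """Map float values to int8 with a simple symmetric scheme."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

# A tiny fully connected layer: float reference weights and input.
rng = np.random.default_rng(0)
weights_f = rng.standard_normal((8, 16)).astype(np.float32)
input_f = rng.standard_normal(16).astype(np.float32)

w_scale, x_scale = 0.05, 0.05          # assumed fixed quantization scales
weights_q = quantize(weights_f, w_scale)
input_q = quantize(input_f, x_scale)

# Integer accumulation (int32), then a single rescale back to float.
acc = weights_q.astype(np.int32) @ input_q.astype(np.int32)
output = acc * (w_scale * x_scale)

print(np.max(np.abs(output - weights_f @ input_f)))  # small quantization error
```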
However, to enable on-device language processing, a step-change is needed. This means that cloud-level performance must be achieved on constrained platforms.
Three developments will enable that to happen:
Familiarity with context, identity and a user’s past activity will allow interactions to become fluid and natural. This personal aspect will also need to be pervasive, so that the user’s environment adapts seamlessly wherever he or she goes. Through data augmentation, sufficient volumes of example phrasings can be generated to show how end users might talk to their voice assistant.
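As a toy illustration of that idea (the templates and vocabulary here are invented, and real augmentation pipelines are far richer), the sketch below expands a handful of utterance templates and slot values into a larger set of example phrasings.

```python
# Toy text-based data augmentation: generate many example phrasings
# ("turn on the kitchen lights", "please switch the kitchen lights on", ...)
# from a few templates and slot values.
import itertools

templates = [
    "turn {state} the {room} lights",
    "please switch the {room} lights {state}",
    "{room} lights {state}, please",
]
rooms = ["kitchen", "bedroom", "living room"]
states = ["on", "off"]

utterances = [
    t.format(room=room, state=state)
    for t, room, state in itertools.product(templates, rooms, states)
]

print(len(utterances))   # 18 generated training examples
print(utterances[:3])
```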
The Author
Chris Shore is director, embedded solutions, in the Automotive and IoT Line of Business at Arm. He leads a team responsible for Arm’s CPU and system portfolio for IoT and embedded applications.
During more than 18 years at Arm, Chris’ previous roles have included responsibility for the technical content of Arm conferences, leading Arm’s customer training activity (delivering over 200 training courses every year to customers all over the world) and managing the Arm Approved design house partner program.
Chris holds an MA in Computer Science from Cambridge University, is a Chartered Engineer and is a member of the Institution of Engineering and Technology (MIET).