Guest blog: Aswini Dasika, principal software engineer, DWP Digital, discusses some of the issues to explore in developing voice based interactions
If necessity is the mother of invention, I’d have to say that laziness is its father! And this is definitely the case in the evolution of user interfaces.
A user interface is a term commonly associated with the digital world, and more specifically with web-based applications. However, ‘user interface’ as a concept didn’t start with digital applications. Think of a paper-based application form, which all of us have filled in at some point in our lives: that piece of paper is the user’s interface for providing data.
Then came digital forms such as Word documents, followed by web-based applications where you submit your details through a form on a website.
I’m not old enough to know how it was before paper-based applications, but I assume it involved verbal engagement: you go to an office, tell the clerk what you want to apply for, and the job is done.
So when I think of voice interactive technology and voice first systems being the next big thing in digital, it really does seem that we’ve come full circle.
A rapid evolution
I’m really excited about the potential of voice interactive technology, which effectively enables us to talk to a computer system, with the receiving system ‘listening’ to and processing our request just as a standard website would when a form is submitted.
This is enabled primarily by two technologies: speech recognition combined with natural language processing (to convert audio into text, and then into commands for the back-end system to process), and machine learning (to learn accents, context and so on).
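To make the second stage concrete, here is a toy sketch of turning already-transcribed text into a structured command for a back-end system. Real systems use trained language models; the keyword rules, intent names and slot names below are purely illustrative assumptions.

```python
# Toy 'text to command' stage: map a transcribed utterance to an
# intent plus any slot values, using simple keyword rules.

def parse_command(utterance: str) -> dict:
    """Map a transcribed utterance to an intent and optional slots."""
    text = utterance.lower()
    words = text.split()
    if "light" in text:
        # Decide the action from the words found in the utterance.
        action = "turn_on" if "on" in words else "turn_off"
        return {"intent": "control_light", "action": action}
    if "play" in text:
        # Everything after 'play' is treated as the music query slot.
        query = text.split("play", 1)[1].strip()
        return {"intent": "play_music", "query": query}
    return {"intent": "unknown"}
```

In a real pipeline this mapping would be learned rather than hand-written, which is exactly where the machine learning mentioned above comes in.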
The evolution of voice interactive technology in the past few years has been rapid. It has made its way into home automation, where you can control devices such as light bulbs, power sockets and washing machines with voice commands, and into smart devices and virtual assistants that handle routine tasks when you simply ‘ask’ them to play music, order food or switch a light on.
A human touch
A voice-first platform is one where you design a product, primarily from a user interface point of view, to use voice-based interaction as its main interface mechanism. You think about how to make human interaction with the system you are building as natural and intuitive as possible, in the same way people interact with each other.
While this provides a very natural human interaction with the computer system, it opens up a number of challenges. With a standard web application interface, you design the pages of your application to control what a user can do, and you perform all sorts of validations to make sure the user submits data as you (or the back-end system) expect it for a given context.
With voice-based interaction it’s difficult to have that level of control, given the numerous ways of saying or asking the same thing. Different accents, never mind the different languages spoken across the world, all come into play.
We have to be mindful of the user experience and avoid the system responding “Sorry, I did not understand what you said”, or asking the user to repeat the command. The challenge is in reimagining how you shape your voice-based interface to be highly intuitive and usable.
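One way to soften that “Sorry, I did not understand” dead end is to suggest the closest known command instead of failing outright. The sketch below uses Python’s standard `difflib` fuzzy matching; the command list and the 0.6 cutoff are illustrative assumptions, not any platform’s actual behaviour.

```python
# Fuzzy fallback: rather than flatly rejecting an unrecognised
# utterance, offer a 'did you mean' prompt for the nearest command.
from difflib import get_close_matches

KNOWN_COMMANDS = ["switch on the light", "play some music", "order food"]

def respond(utterance: str) -> str:
    if utterance in KNOWN_COMMANDS:
        return f"OK, doing: {utterance}"
    # Look for the closest known command above a similarity cutoff.
    close = get_close_matches(utterance, KNOWN_COMMANDS, n=1, cutoff=0.6)
    if close:
        return f"Did you mean: {close[0]}?"
    return "Could you rephrase that?"
```

Even a simple fallback like this keeps the conversation going, which is the heart of making a voice interface feel usable rather than brittle.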
The good news is that you don’t have to start from scratch if you want to implement a voice-first platform or a voice interactive system. There are major players in the technology industry who already offer services that can be adapted and implemented: Amazon (Alexa), Apple (Siri), Google (Google Assistant) and Microsoft (Cortana), among others. Although Amazon definitely looks to be ahead of the game, with its Echo line of devices and by opening up the Alexa platform for wider developer contribution, all these players have their own unique strengths.
I see one of the major challenges to be user authentication – this is something that needs to be solved before this technology can be adopted at scale.
While the platforms mentioned provide ways and means to authenticate users, the real step up would be for systems to authenticate purely on voice, or at least to use voice as the main authentication method, supported by others.
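In outline, voice authentication typically reduces a voice sample to a numeric feature vector (a ‘voiceprint’) and compares it with the one enrolled for the user. The sketch below shows only that comparison step, assuming some upstream model has already produced the vectors; the example vectors and the 0.95 threshold are invented for illustration.

```python
# Toy speaker check: compare a sample voiceprint with the enrolled
# one using cosine similarity, accepting above a fixed threshold.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def authenticate(sample_vector, enrolled_vector, threshold=0.95):
    """Accept the speaker if the sample is close enough to the enrolled print."""
    return cosine_similarity(sample_vector, enrolled_vector) >= threshold
```

The hard part in practice is everything this sketch assumes away: producing voiceprints robust to colds, background noise and replay attacks.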
The other related challenge, I feel, is users’ identity protection, as systems would also have to store and access recorded voice samples and patterns.
Another barrier that could slow wider adoption of this technology is scepticism about devices providing voice-based services being in constant listening mode, responding only when a certain ‘wake word’ is used (this is true, by the way). So, in my view, the commercial success of this concept as a mainstream technology depends on how effectively vendors and service providers prove its reliability and security.
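The wake-word idea can be sketched very simply: audio is continuously heard, but only what follows the wake word is passed on for processing. The wake word and behaviour below are illustrative, not any vendor’s actual implementation.

```python
# Minimal wake-word gate: ignore everything unless the wake word
# appears, then forward only the words that follow it.

WAKE_WORD = "computer"  # hypothetical wake word for this sketch

def extract_command(transcript):
    """Return the command following the wake word, or None if not addressed."""
    words = transcript.lower().split()
    if WAKE_WORD in words:
        idx = words.index(WAKE_WORD)
        return " ".join(words[idx + 1:]) or None
    return None  # device stays silent: no wake word heard
```

Real devices do this gating on-device in hardware or firmware, which is the basis of vendors’ claims that audio before the wake word is not sent anywhere.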
The 'mood' question
However, a great advantage of voice-based systems – and a potential minefield – is that they provide clues to the ‘mood’ or context in which the user is using them.
Voice analytics is progressing towards analysing the tone and pitch of the user’s voice during an interaction, so that the system can understand the ‘mood’ of the conversation and respond with different offers accordingly. Literally, reading your mind! Voice analytics results can also be embedded into other platforms and used to target marketing based on insights from customer interaction patterns.
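As a deliberately oversimplified illustration of the ‘mood’ idea, here is a sketch that classifies an interaction from average pitch and volume measurements. Real voice analytics uses far richer acoustic features and trained models; these thresholds and labels are invented for the example.

```python
# Toy mood heuristic from two acoustic measurements of an interaction.

def estimate_mood(avg_pitch_hz: float, avg_volume_db: float) -> str:
    """Classify an interaction as agitated, subdued or neutral."""
    if avg_pitch_hz > 220 and avg_volume_db > 70:
        return "agitated"  # raised pitch and loud: possibly frustrated
    if avg_pitch_hz < 140 and avg_volume_db < 55:
        return "subdued"   # low and quiet: possibly tired or flat
    return "neutral"
```

Even this crude version hints at the minefield: inferring a caller’s state from their voice raises exactly the consent and identity-protection questions discussed above.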
As with any emerging technology, there is immense potential to tap into its power to serve both mainstream and vulnerable users. However, a well thought out and careful implementation will help its adoption and quell any fears associated with it.
I personally believe very strongly that this technology would revolutionise our interactions with machines. After all, who wouldn’t want to take the lazy route of ordering from the couch to get things done?