Accidental empires, I’ve lost the remote, IBC and the power of voice

Rethink Technology Research

By Peter White

When we first heard that some of the TV companies wanted to put a feature into the TVs they make that allowed you to change channel by voice command, Faultline Online Reporter could not have been less excited. On its own, the ability to change a TV channel by voice is little more than a gimmick.
When Siri came out, we remember being positively irritated by the voice and the stupid way that every iPhone owner just HAD to show us how clever it was, when all it did was regurgitate Wikipedia entries.
But the idea of an ever-present voice assistant, which moves from your car to your home to your place of work, which is always on hand, which knows you and your habits, and which sits at the center of a surrounding ecosystem – just how that concept emerged from those early beginnings is not quite clear. What is clear is that this is much more like something out of an entire genre of sci-fi movies, and it has the ability to change the world as much as, say, the internet.
To us it seems like the entire mission to land-grab speech understanding was a bit of an accident. When AI became a hot word about four years ago, natural language was explained to us as the core AI platform through which all these AIs would operate. We know for sure that this has just not happened. AI technology is in danger of not happening at all – primarily because some 650 VC investments were made before much had been done to improve the core AI algorithms. But the vision made sense: when you want a specialist job done, like “find the best photo of me from these 2,000 photos taken” or “get me a meeting with these 20 people,” an automated service is going to achieve that faster than you, and with a lot less of your effort.
The only issue here was the assumption that those services had to be AI based, which there is no need for at all. The only AI required in the equation is natural language voice understanding, and that has improved by leaps and bounds of late.
But as soon as Alexa replied to the Siri threat and made its play to dominate voice, joined by Cortana and Google Assistant, the race has been on – a race that Alexa appears to be winning.
Initially we thought that the killer app for voice was the help desk, where within two rings the phone is always answered, and a voice assistant collects some data on your problems and in some cases answers them, but for the trickier callers merely holds the fort until a real human comes free. We were led to believe that this could cut the calls which needed humans down to a mere 15%, releasing about 85% of the world’s help desk employees to go and find something else to do. The effect on the bottom line of many of the world’s operators would have been considerable. This is not to say that it is not a viable application for voice at some point, but this is not how the technology world fights its wars.
Instead, this year a host of intelligent speakers has been announced – speakers you can talk to. That is a fair description of the Echo Dot, but the clues are in the specification of Apple’s HomePod. It was created with Apple Music in mind and “provides deep knowledge of personal music preferences and tastes and helps users discover new music.” But that’s not what Alexa does. The Echo product line does have those kinds of variants too, but you are encouraged to buy Echo Dots in packs of six, quite simply to put them in all your rooms so that your voice is never out of Alexa’s reach.
The Apple HomePod has six microphones, with noise cancellation to screen out the TV or music speaker, so it can simply hear you. It comes with a custom array of seven beam-forming tweeters, each with its own amplifier and precise directional control of beam shapes and sizes, all directed by an Apple A8 chip as its intelligence.
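The beam-forming idea behind both the microphone and tweeter arrays is conceptually simple: each element is delayed (phase-shifted) so that signals from a chosen direction add up coherently while other directions partially cancel. Apple’s actual DSP is not public, so the sketch below is purely illustrative – a textbook delay-and-sum response for an assumed six-element linear array, not the HomePod’s real geometry:

```python
import cmath
import math

def array_gain(steer_deg, source_deg, freq,
               num_mics=6, spacing=0.035, c=343.0):
    """Normalized delay-and-sum output magnitude for a plane wave
    arriving from source_deg when the array is steered to steer_deg.
    Geometry (6 mics, 3.5 cm spacing) is an assumption for illustration."""
    total = 0j
    for m in range(num_mics):
        # Phase of the wave as it reaches mic m...
        arrival = 2 * math.pi * freq * m * spacing * \
            math.sin(math.radians(source_deg)) / c
        # ...minus the steering phase applied to align the chosen direction.
        steering = 2 * math.pi * freq * m * spacing * \
            math.sin(math.radians(steer_deg)) / c
        total += cmath.exp(1j * (arrival - steering))
    return abs(total) / num_mics

# A 2 kHz tone: full gain on the steered axis, attenuated off-axis.
on_axis = array_gain(steer_deg=30, source_deg=30, freq=2000)    # exactly 1.0
off_axis = array_gain(steer_deg=30, source_deg=-50, freq=2000)  # well below 1.0
```

The same mathematics, run in reverse across the tweeter array, is what lets the speaker aim sound at the listener rather than at the wall.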
Back in June, Qualcomm showed its Smart Audio Platform, with two SoC variants and various pieces of software, so that OEMs can pile into this market. It offers multi-microphone, far-field voice pickup, highly responsive voice activation and beamforming, with echo cancellation, noise suppression and ‘barge-in’ functions. Qualcomm calls its multi-room audio technology AllPlay and says it supports whole-home audio music streaming.
But Qualcomm’s Smart Audio is going to have its work cut out for it if the company is successful in its bid to buy NXP. At IBC we spoke to Martyn Humphries, VP of consumer and industrial applications processors, and Leonardo Azevedo, director of product marketing – a double act convinced that voice has replaced the smartphone as the way to control all in-home IoT devices.
The first thing they want to show me is their i.MX 8M (the final M is for media). The family of chips is based on up to four ARM Cortex-A53 cores plus a Cortex-M4, targets audio, voice and video processing, supports video of all types and complexity, and serves as their set top chip, among other things.
At first the conversation is stilted: “It’s a hybrid, supports Dolby, HDR10 and HLG, all in one,” and I think I am in a discussion about set tops. “That’s right,” says Azevedo, “a set top that is also a smart speaker.” Now why would anybody want that? I ask.
“Speech is the new remote,” comes the reply. I suggest this is unlikely, and am told to wander IBC halls 8 to 12 and see that there are entire ranges of Sony, Samsung and LG TVs which no longer come with a remote, but have a chip like this one in it. My interest is piqued.
Now the voice fight comes full circle and I tell them how much I hate the idea of an entire voice chip just to change the channel. “No, you have to have them in all sorts of form factors, all over the house,” I am told.
It’s like rejoining the countless discussions we have had about homes needing more than one WiFi Access Point, preferably one in each room – beamforming is a common technology for both speech and WiFi, it seems.
NXP is working to what appears to be a broader vision. It is an arms dealer that will work with any of the voice platforms, providing the audio and media chips to do a variety of things, although Google Assistant is the dominant variant they have to show me.
“People expected you to have the internet of things all over your house and wanted you to carry your smartphone with you to control it,” said Azevedo. “But you come home, you put your phone on to charge, and don’t want to carry it when you are putting washing in the washing machine.”
Yes, they both still believe that every device in the home will have an IoT controller in it, so they see voice being ever-present, with commands from that voice sprayed around the home by WiFi or 802.15.4 – to the washing machine, the front doorbell, the toaster. We point out that this vision has stumbled somewhat, as operators like AT&T and Comcast have tried to charge $30 a month for simple home automation services. That is all down to high pricing, which comes from closed architectures that do not rely on open APIs, they inform me – and a little nagging voice reminds me of earlier conversations at the show about how easy it is for the Echo Dot to communicate with a new service, primarily because, like the grown-up it is, Amazon has not tried to keep all the fruits of the voice revolution to itself, but made its APIs clear, open and easy to write to.
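That openness can be made concrete. A third-party “skill” for Alexa is essentially a web endpoint that receives an intent as JSON and replies with a speech envelope; the request and response shapes below follow the publicly documented Alexa Skills Kit JSON format, while the `StartWashIntent` intent and its handler are hypothetical examples of our own:

```python
import json

def handle_request(event):
    """Minimal Alexa-style skill handler: pull the intent name out of
    the request JSON and answer with plain-text speech.
    `StartWashIntent` is a hypothetical custom intent, not a real one."""
    intent = (event.get("request", {})
                   .get("intent", {})
                   .get("name", ""))
    if intent == "StartWashIntent":
        speech = "Starting the wash cycle."
    else:
        speech = "Sorry, I don't know that one."
    # Response envelope per the public Alexa Skills Kit JSON format.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }

# Simulate the request Alexa would POST to the skill endpoint.
request = {"request": {"type": "IntentRequest",
                       "intent": {"name": "StartWashIntent"}}}
print(json.dumps(handle_request(request), indent=2))
```

The point the NXP pair were making is exactly this low barrier: any appliance maker can bolt a service like the one above onto the voice ecosystem without negotiating a closed partnership first.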
But it turns out there is a need for AI to tinge all these services, says NXP: “These systems need to recognize if it’s your voice, so they can open the door to you, or they need to tell you the person at the door is a ‘trusted’ person who always delivers for FedEx, or is a stranger.”
It strikes us that the Apple way of doing things, which relies on everyone having an Apple device and only allowing Siri to talk to approved devices which pay Apple money, is not a good route for the operator. Operators can work with Alexa, because they can use the voice capabilities to trigger their own services – such as search through the set top.
We remind NXP that Comcast uses an Arris-designed “push to talk” remote control, and having spoken to Arris a few hours before, mention that over 13.5 million Comcast homes have taken the device. People speak into the remote control of the Arris set top and change channel with a handful of words – simply saying “NBC” will get you to that channel.
Charles Cheevers, CTO of that division of Arris, explained that you “push to talk” to save battery; otherwise the remote would have to be in an “always listening” mode. He then regales us with the best feature: asking “Where’s the remote?” gets the remote to volunteer its position (unless of course it really has run out of battery) – or perhaps the remote was in your hand when you were trying to put the washing machine on (see earlier example) after Alexa told you the washing cycle had finished, and you accidentally dropped it into the machine.
The simple truth is that both approaches are right. Right now there is no global voice infrastructure for talking to, and hearing back from, your chosen assistant. It’s okay to try to be part of building that ecosystem, as NXP is, and it’s okay to put in interim measures that use voice for point products like the remote. But only one strategy is long term. You can’t push to talk when you are calling for the coffee machine or washing machine to turn itself on or off, and sometimes they need to speak to you: “The coffee is boiling over, the toast is burning, the washing machine cycle is complete – or you have put wool on a cotton cycle.”
IBC 2017 then teaches us that voice IS the next big platform, and while others continue to believe that VR, 360-degree video and AR are the hottest new things, those working on the underbelly of the technology know that those are technologies for which the right visualization device has not yet been built – while voice is there waiting to be done, right now.
As we travel around the show, everyone has a voice sub-text. At the end of our interview with David Souhami, Director of Innovation & Product Marketing at SoftAtHome, he takes us aside and shows us his fledgling “Maestro,” his version of Alexa. He tells it to wake up, does a TV program search, and then triggers a movie by mentioning one word of its title. He flips it to surveillance and we keep an eye on my bag and coat outside. Unlike NXP’s approach, it is not in the set top but on a smart speaker right next to the TV. The two are clearly on the way to merging, as one of many form factors.
Another chat we have is with John Maguire, Managing Director, Accenture Digital Video. Previously, Maguire was obsessed with analytics and how to be sure that video has a high QoE no matter what the device. Today he wants to know if voice is the next platform, and how he can advise clients to make money out of it.
We kick it around for a while and suggest that if Amazon were like Apple, when you get up to leave, Alexa might contact Uber, ask for a cheap offer and speak it to you – “Uber says it can offer you a special to take you to work, 20% off your usual price” – with Alexa taking a 30% slice of the pie. That is absolutely Apple’s way, but not Alexa’s: Amazon is more used to 5% of a lot of what you spend, rather than 30% of a little. So the Alexa way may make that scenario work.
And the reason for all this is that we all want to show off our almost preternatural ability to connect and control the devices around us – we want to ask the TV to turn itself on and tell us the weather, yell homework at the kids and see said homework appear superimposed on the TV, program our GPS for downtown while we walk to the car, check that we turned the iron off in the home, just by asking from the car. Not because it will save us time and effort, but because it makes us look cool. Oh and it might save us some effort, sure.
Our big problem in a Google Assistant world is that you can speak a Google search, but the search itself is no better than a typed one. Which is such a shame, because if my voice were recognized, and it knew who and where I am, it could do a far better job of finding what I am actually looking for, instead of the strange replies Google brings me time and time again.
So those AI features should begin to slide into these services, and I want to name that one “Intelligent Search,” because context and identity should be able to help us do a lot better with search – and do it without having to log in.
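The mechanics of such a search need not be exotic. A sketch of what context-aware re-ranking might look like – entirely illustrative, with made-up scoring weights and context signals, and no relation to any actual Google mechanism:

```python
def rerank(results, context):
    """Boost base relevance scores with simple identity/context signals.
    `results` are (title, base_score, tags) tuples; the 0.3/0.2
    weights are arbitrary choices for illustration."""
    def score(item):
        title, base, tags = item
        bonus = 0.0
        if context.get("city") and context["city"] in tags:
            bonus += 0.3      # local results float up
        if context.get("interest") and context["interest"] in tags:
            bonus += 0.2      # known user interest floats up
        return base + bonus
    return sorted(results, key=score, reverse=True)

results = [
    ("Jaguar the car, dealer listings", 0.9, {"cars"}),
    ("Jaguars at the London Zoo",       0.7, {"animals", "London"}),
]
# A London-based animal lover gets the zoo page first (0.7 + 0.3 + 0.2 = 1.2),
# while an anonymous searcher still gets the dealer listings (0.9).
ranked = rerank(results, {"city": "London", "interest": "animals"})
```

The point is that identity and context resolve ambiguity before the query ever hits the index – which is exactly what a voice assistant that knows who and where you are could supply without a log-in.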