Researchers at the University of Waterloo and the University of Maryland have collaborated with the Comcast Applied AI Research Lab to improve the voice query understanding capabilities of the Comcast Xfinity X1 entertainment platform.
Today, we have become accustomed to talking to intelligent agents that do our bidding — from Siri on a mobile phone to Alexa at home. Why wouldn’t we be able to do the same with TVs? Comcast’s Xfinity X1 does exactly that — the platform comes with a “voice remote” that accepts spoken queries. Your wish is its command — tell your TV to change channels, ask it about free kids’ movies, and even about the weather forecast.
Although the media and technology company had already delivered more than 20 million remotes to customers by the end of 2017 and fielded billions of voice commands, the Comcast team still saw room for improvement in the results returned by one of the company’s most popular products. Even though the device returns remarkably accurate results, thanks to an AI-powered platform and Comcast’s rich trove of entertainment metadata (information like a show’s title, actors, and genre), it would still sometimes return odd responses. This was in part because the platform’s AI relies on pattern matching, which doesn’t always correctly interpret user intent.
How does their technique work? Dr. Rao explains: “Say the viewer asks for ‘Chicago Fire,’ which refers to both a drama series and a soccer team — how does the system determine what you want to watch? What’s special about this approach is that we take advantage of context — such as previously watched shows and favourite channels — to personalize results, significantly increasing accuracy.”
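To make the idea concrete, here is a minimal sketch of context-aware disambiguation. This is not Comcast’s actual system: the candidate schema, the scoring weights, and the function names are all hypothetical, chosen only to illustrate how viewer context could break a tie between two entities that share the same name.

```python
# Hypothetical sketch of context-aware ranking, not the system from the paper.
# Two candidates both match the query "Chicago Fire"; a boost derived from the
# viewer's watch history and favourite channels decides which one ranks first.

def rank_candidates(query, candidates, watched_genres, favourite_channels):
    """Sort candidates by a base lexical score plus context boosts.

    candidates: list of dicts with 'title', 'genre', 'channel' keys
    (an illustrative schema, not taken from the paper).
    """
    def score(c):
        base = 1.0 if c["title"].lower() == query.lower() else 0.0
        # Boost candidates that match the viewer's context (weights are arbitrary).
        history_boost = 0.5 if c["genre"] in watched_genres else 0.0
        channel_boost = 0.3 if c["channel"] in favourite_channels else 0.0
        return base + history_boost + channel_boost

    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"title": "Chicago Fire", "genre": "drama", "channel": "NBC"},
    {"title": "Chicago Fire", "genre": "soccer", "channel": "ESPN"},
]

# A viewer who watches dramas on NBC gets the TV series ranked first;
# a soccer fan's context would flip the order.
ranked = rank_candidates("chicago fire", candidates,
                         watched_genres={"drama"},
                         favourite_channels={"NBC"})
print(ranked[0]["genre"])  # drama
```

The point of the sketch is the shape of the computation: both candidates tie on the lexical match, and only the per-viewer context terms separate them.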
Not content with this success, the researchers have begun developing an even richer model, also outlined in their paper. The intuition is that by analyzing queries from multiple perspectives, the system can better understand what the viewer is saying. This model is currently being readied for deployment.
"My research group aims to build intelligent agents that can interact with humans in natural ways, and this project provides a great example of how we can deploy AI technologies to improve the user experience.”
For more information about this research, please see Jinfeng Rao, Ferhan Ture, and Jimmy Lin, “Multi-Task Learning with Neural Networks for Voice Query Understanding on an Entertainment Platform,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2018), August 19–23, 2018.