There is something of a voice command battle going on at the moment in the mobile space, with Apple’s digital assistant Siri and Google’s Voice Search duking it out for the affections of users. Both come with their own pros and cons and will only get better over time, but it seems that Microsoft’s research and development teams have been working behind the scenes on a voice-based technology of their own that could change the way humans interact with one another.
Although Siri is a lot more powerful than Google’s Voice Search on the iOS platform, both work in basically the same way: they capture an audible command from the user, upload it to a back-end server for processing, then return the result to the device and act on the request. Microsoft has now posted a video featuring a presentation from Rick Rashid that outlines the advancements the company has made in the field of natural user interfaces using human speech.
Computer-based systems that understand human speech aren’t a new invention. We have seen numerous releases over the years that show significant progress in this field, with Siri and Google Voice Search being two of the most recent examples of how this technology can be applied in the real world. However, Microsoft has been quietly attempting to take things a few steps further by building a system that can not only recognize human speech but also translate it into text in a foreign language and then repeat the words in that language using a synthesized voice generated to sound like the speaker.
The on-stage demonstration featuring Microsoft’s Chief Research Officer shows the technology in action: it recognizes his spoken English words, converts them into Chinese text on the projected display and then audibly repeats the same sentence in Mandarin. All of that is mind-blowing enough, but things are taken a little further when you realize that Rashid and his colleagues have fed the system over an hour’s worth of his own voice data so that it can repeat the sentence in Mandarin using his own voice!
The technology isn’t perfect and still makes mistakes, but it is truly staggering to realize that this is possible. The future of human interaction is definitely on the horizon.