A journey of understanding
- Publication: Financial Times
- Date: 2001-05-16
- Author: Geoffrey Wheelwright
- Page: 6
- Language: English
Science fiction has a lot to answer for. Talking computers, gadgets that can understand human speech, and ubiquitous, wireless personal communications devices all made their first appearances in works of fiction. And now they all exist, in one form or another.
So it should perhaps come as no surprise that one of the latest initiatives to enhance and popularise such technology should bear the name of a science fiction character: Dr Who, of BBC television fame.
Dr Who is also the name given by Microsoft Research to a speech recognition engine which allows users of hand-held computers to break away from dependence on keyboards and touch screens as primary methods of interacting with their devices.
While the fictional Time Lord mysteriously traversed time and space in an old-style, battered police callbox known as the Tardis, the new Dr Who of Microsoft's research labs is designed to take its users on a different kind of journey. It is a journey of understanding, based on the notion that software should be able to listen to - and understand - natural language.
An interactive notepad
The Dr Who engine is an integral element of another Microsoft research project known as MiPad (for My Interactive notePad). This is based on the notion that hand-held devices (including hand-held computers and mobile phones) are, in some ways, the ideal platform for voice recognition to be used to enter data and issue commands to - or interact with - a voice or data network.
To start with, speech input can be one way to address the natural data input limitations of hand-held computers without keyboards, as it offers an alternative to pen input. The latter, meanwhile, helps reduce some of the potential ambiguity of speech input - such as background noise, multiple users, accents, and idioms - by requiring users to set a context for their speech input before they begin speaking.
According to Alex Acero, manager of the speech research group at Microsoft Research, the combination of hand-held devices and wireless data communications extends the range of possible uses of voice recognition. The research group, he says, was challenged by the need to provide a desired range of voice recognition in a device that is traditionally small, low-powered and limited in both processing power and memory.
While today's crop of mobile phones and hand-held computers are vastly more capable than the desktop computers of 10 years ago (when voice recognition software first became popular for use on personal computers), they are still not powerful enough to handle high volumes of voice recognition work with speed and accuracy.
So Mr Acero and his team decided to take a different approach. Noting that many hand-held computers (and even some mobile phones) can already record speech digitally - and can also maintain a wireless connection to a digital network - the researchers looked at how the processing requirements for voice recognition might be shifted from the hand-held device to a more powerful server on that wireless network.
This plan, which underpins the work that Mr Acero is conducting within the MiPad project, offers the promise of allowing users to harness huge amounts of networked memory and processing power to remotely carry out real-time recognition of voice data collected from hand-held devices.
"We call this distributed speech recognition architecture - using a smart phone or personal digital assistant to do the sound capture and compression and having the recognition off-loaded to the server," says Mr Acero.
He says the research group also discovered that it did not even need to transmit the entire content of recorded speech to carry out the recognition.
Instead, it discovered that software could be used within the hand-held device to compress the recorded speech file.
By including only the so-called spectral features of the speech used by the Dr Who speech recognition engine (also known as the 'recogniser'), the amount of data that had to be transmitted wirelessly could be reduced.
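As a rough illustration of why sending spectral features can take less data than sending the recording itself, the following Python sketch reduces each short frame of audio to a handful of band energies. The article does not say which features the Dr Who recogniser uses, so the frame length, hop size, number of bands and the function names here are all assumptions made for the example.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames, dropping any trailing partial frame."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def band_energies(frame, n_bands):
    """Return the log energy in n_bands equal-width frequency bands for one frame."""
    n = len(frame)
    half = n // 2
    # Naive DFT power spectrum over the lower half of the bins.
    # This is O(n^2) and only meant as a sketch; real systems would use an FFT.
    power = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    width = half // n_bands
    # A small constant keeps log() defined for bands with no energy.
    return [math.log(sum(power[b * width:(b + 1) * width]) + 1e-10)
            for b in range(n_bands)]


def extract_features(samples, frame_len=64, hop=32, n_bands=8):
    """Hypothetical front end: one short feature vector per frame."""
    return [band_energies(f, n_bands) for f in frame_signal(samples, frame_len, hop)]


# Synthetic stand-in for recorded speech: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(1024)]

features = extract_features(samples)
raw_values = len(samples)
feature_values = sum(len(vec) for vec in features)
strongest_band = features[0].index(max(features[0]))
```

With these assumed settings, 1,024 samples become 31 frames of eight values each, so the hand-held device would send 248 numbers rather than 1,024. How much is saved in practice depends on the feature set and on how the values are encoded.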
In addition, further compression can be achieved by using software to eliminate silences from the recorded speech, replacing them with a 'map' of where the silence is to be placed when carrying out the recognition.
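The silence 'map' idea can be sketched in the same way. The Python example below drops low-energy frames on the device and records where each silent run sat, so that the server can put the gaps back before recognition. The energy threshold, the (start, length) map format and the function names are assumptions for illustration, not details of Microsoft's implementation.

```python
def strip_silence(frames, threshold):
    """Drop low-energy frames and return (kept_frames, silence_map).

    silence_map holds (start_index, run_length) pairs, expressed in the
    frame positions of the original recording. Frames must be non-empty.
    """
    kept, silence_map = [], []
    run_start = None
    for i, frame in enumerate(frames):
        energy = sum(x * x for x in frame) / len(frame)
        if energy < threshold:
            if run_start is None:
                run_start = i  # a new silent run begins here
        else:
            if run_start is not None:
                silence_map.append((run_start, i - run_start))
                run_start = None
            kept.append(frame)
    if run_start is not None:
        # The recording ended in silence.
        silence_map.append((run_start, len(frames) - run_start))
    return kept, silence_map


def restore_silence(kept, silence_map, frame_len):
    """Rebuild the original timeline, filling silent positions with zero frames."""
    total = len(kept) + sum(length for _, length in silence_map)
    silent = set()
    for start, length in silence_map:
        silent.update(range(start, start + length))
    kept_iter = iter(kept)
    return [[0.0] * frame_len if i in silent else next(kept_iter)
            for i in range(total)]


# Six tiny two-sample frames: silence, speech, silence, silence, speech, silence.
frames = [[0.0, 0.0], [0.5, -0.5], [0.0, 0.0], [0.0, 0.0], [0.4, 0.3], [0.0, 0.0]]

kept, silence_map = strip_silence(frames, threshold=0.01)
restored = restore_silence(kept, silence_map, frame_len=2)
```

Here only two of the six frames need to be transmitted, along with a three-entry map. The server-side step restores the timing of the utterance, though the silent frames come back as exact zeros rather than the original background noise.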
Microsoft's research assumes that this technology will be used over a wireless packet data network - such as the planned 3G wireless mobile phone network being rolled out by many of the world's leading telecoms companies.
'Best scenario for the user'
Mr Acero says, however, that the company is trying to stay apart from any decisions about which standards or protocols should be supported. "In the research lab, we haven't been looking at what medium in which we will be using it - we look instead at what the best scenario is for the user," he says.
"In general, Microsoft has been fairly agnostic to the protocols. Ideally, we would like to use this with any major protocol. But the technology is still in the lab and we have no product plans yet."
Derek Jacoby, program manager at Microsoft Research, says one other issue, the reluctance of many users to talk to machines, came to light when conducting the project. These psychological barriers remained, no matter how powerful and accurate the speech recognition software was.
"Many people will talk to their cellphone, but they don't want to talk to their hand-held device," he says.
"One of the observations we noted was that users' behaviour patterns begin to change over time. When we put a device in someone's hands and followed them for a while, we found that they got used to it."