Speech Recognition Technologies Essay Sample

Abstract

While commercial solutions for precise indoor positioning exist, they are costly and require installation of additional infrastructure, which limits opportunities for widespread adoption. Inspired by robotics techniques of Simultaneous Localization and Mapping (SLAM) and computer vision approaches using structured light patterns, we propose a self-contained solution to precise indoor positioning that requires no additional environmental infrastructure. Evaluation of our prototype, called TrackSense, indicates that such a system can deliver up to 4 cm accuracy with 3 cm precision in rooms up to five meters squared, as well as 2 degree accuracy and 1 degree precision on orientation. We explain the design and performance characteristics of our prototype and demonstrate a feasible miniaturization that supports applications that require a single device localizing itself in a space. We also discuss extensions to locate multiple devices and limitations of this approach.

2. Introduction

We introduce a solution to indoor localization, TrackSense, that requires no additional infrastructure in the environment and provides 3D positioning and orientation data that performs well against existing research and commercial solutions. Although we have seen great progress toward the goal of indoor localization, almost all of the solutions that offer precise (few centimeter) indoor localization have been limited to techniques that require the introduction of new infrastructure to the physical space (e.g. cameras or beacons). These solutions are often costly and typically require time-consuming installations, and it is not easy to move the instrumentation from one space to another. Although existing commercial positioning systems are adequate for prototyping user experiences, their ultimate success relies on a localization approach that is inexpensive and easily deployed.

3. Accuracy

It is notoriously difficult to measure the accuracy of speech recognition systems, because so many technical and human factors are involved. Several experiments have attempted to compare speech recognition with other kinds of data entry, such as mouse, keyboard and handwriting recognition. Many of these experiments do not reach an overall conclusion about which system is better. One obstacle to the wider take-up of speech recognition is that the level of accuracy attained by a user often does not match that stated on the software packaging. This can happen for many reasons, such as an insufficient machine specification or (more often) the limited amount of training undertaken by the user. Independent reviews of speech recognition systems indicate that a score of around 95% accuracy is possible with an increasing number of systems.

For example, in tests involving dictating a newspaper story, an email message and a business letter, Dragon NaturallySpeaking 6.0 scored 95% accuracy, ViaVoice scored 92% accuracy and NaturallySpeaking 5.0 scored only 85% accuracy. Speech recognition systems increasingly offer tools for building specialist vocabularies. This step is particularly helpful when subject-specific and user-specific words and acronyms are likely to be used, such as specialist vocabulary from university subjects. Anecdotal evidence from various web sites suggests that using a specialist vocabulary usually reduces the error rate by around a third.
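To make figures such as "95% accuracy" and "a reduction in the error rate of around a third" concrete, the following is a minimal sketch of one common way to score a dictation test: counting word-level substitutions, insertions and deletions against a reference transcript. The reviews cited here do not state how they measured accuracy, so this is an assumption made for illustration, and the function names and sample sentences are hypothetical.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Smallest number of word substitutions, insertions and deletions
    separating the recognised text from the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # previous[j]: edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing (deletion)
                current[j - 1] + 1,      # extra recognised word (insertion)
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - (word errors / number of reference words)."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / ref_len


def accuracy_after_error_reduction(accuracy: float, reduction: float) -> float:
    """Accuracy once the error rate (1 - accuracy) is cut by `reduction`."""
    return 1.0 - (1.0 - accuracy) * (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical dictation sample: "office" is misheard as "of his".
    spoken = "please send the quarterly report to the finance office today"
    recognised = "please send the quarterly report to the finance of his today"
    print(word_errors(spoken, recognised))
    print(round(word_accuracy(spoken, recognised), 3))
    # A system at 95% accuracy whose errors fall by a third.
    print(round(accuracy_after_error_reduction(0.95, 1 / 3), 4))
```

Under this measure, a system at 95% accuracy makes about five errors per hundred words; cutting those errors by a third leaves roughly 3.3 per hundred, or about 96.7% accuracy. Because insertions count as errors, the score can fall below zero for very poor output, which is one reason published figures are hard to compare.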

4. Future applications outside education

There are a number of scenarios in which speech recognition is being delivered, developed, researched or seriously discussed. As with many contemporary technologies, such as the Internet, online payment systems and mobile phone functionality, development is at least partially driven by the trio of often perceived evils that are games, gambling and girls (pornography). Though these applications are outside the educational sphere, it is important to remember that many ICT innovations incorporated into academia over the last decade were developed in other sectors.

5. Computer and video games

Speech input has been used in a limited number of computer and video games, on a variety of PC and console-based platforms, over the past decade. For example, the game Seaman involved growing and controlling strange half-man, half-fish characters in a virtual aquarium. A microphone, sold with the game, allowed the player to issue one of a predetermined list of command words and questions to the fish. The accuracy of interpretation, in use, seemed variable; during gaming sessions, colleagues with strong accents had to speak in an exaggerated and slower manner for the game to understand their commands. Microphone-based games are available for two of the three main video game consoles (PlayStation 2 and Xbox). However, these games primarily use speech for player-to-player communication online, rather than interpreting spoken words electronically. For example, MotoGP for the Xbox allows online players to ride against each other in a motorbike race simulation and to speak (via a microphone headset) to the nearest players (bikers) in the race. There is currently interest in video games that interpret speech, but less actual development.

6. Gambling

Online gambling has become a major industry in the last four years (to the degree that it has effected changes in gambling taxation laws in the UK and other countries). Speech recognition has applications in games such as multiplayer online poker, where vocal commands can be heard by the other players and, where appropriate, interpreted by the host computer in order to deal more cards, adjust the money staked and so forth.

7. Precision surgery

Developments in keyhole and micro surgery have clearly shown that keeping invasive or non-essential surgery to a minimum improves success rates and shortens patient recovery times. There is occasional speculation in various medical fora regarding the use of speech recognition in precision surgery, where a procedure is partially or totally carried out by automated means. For example, in removing a tumour or blockage without damaging surrounding tissue, a command could be given to make an incision of a precise and small length, e.g. 2 millimetres. However, the legal implications of such technology are a formidable barrier to significant developments in this area. If speech were incorrectly interpreted and, for example, a limb was accidentally sliced off, who would be liable: the surgeon, the surgery system developers, or the speech recognition software developers?

8. Wearable computers

Perhaps the most futuristic application is in the use and functionality of wearable computers, i.e. unobtrusive devices that you can wear like a watch, or that are even embedded in your clothes. These would allow people to go about their everyday lives while still storing information (thoughts, notes, to-do lists) verbally, or communicating via email, phone or videophone, through wearable devices. Crucially, this would be done without having to interact with the device, or even remember that it is there; the user would just speak, the device would know what to do with the speech, and it would carry out the appropriate task. The rapid miniaturisation of computing devices, the rapid rise in processing power, and advances in mobile wireless technologies are making these devices more feasible. There are still significant problems to overcome, such as background noise and the idiosyncrasies of an individual's language. However, it is speculated that reliable versions of such devices will become commercially available during this decade.

References

www.uclan.ac.uk/facs/destech/compute/staff/read/Publish/read.pdf – Measuring the usability of text input methods for children. A comparison of speech recognition and three other methods of data entry. While the research revealed interesting observations about how the systems were used and reacted to input, the authors (as with several similar comparative papers in this field) shied away from an overall "this is best" conclusion.

www.vhml.org/workshops/HF2002/papers/broughton/broughton.pdf – Measuring the accuracy of commercial automated speech recognition systems during conversational speech. For a number of stated factors, the accuracy in the experiment did not match that quoted by the software.

www.womengamers.com/revprev/sim/seaman.html – Review of Seaman (2000) for the Sega Dreamcast.

reviews.zdnet.co.uk/review/43/2/1605.html – Review of Dragon NaturallySpeaking Preferred 6.0.

www.uow.edu.au/student/services/uow/VRSoftware.html – A university-authored guide to the training process involved in using the speech recognition software it provides.

www.ceangal.com/

www.furui.cs.titech.ac.jp/publication/2000/icassp2000-3735.pdf – Speech recognition technology in the ubiquitous wearable computing environment. An analysis of the potential, and current problems, of wearable computers that could be operated through speech recognition technology.
