Abstract
For years, researchers have worked toward allowing people to talk to machines in the same manner that one person communicates with another. This verbal man-to-machine interface, called speech recognition, can be grouped into three types: isolated word recognition, connected word recognition, and continuous speech recognition. Isolated word recognizers recognize single words with distinctive pauses before and after them. Continuous speech recognizers recognize speech spoken as one person speaks to another: continuously, without pauses. Connected word recognition is an extension of isolated word recognition that recognizes groups of words spoken continuously. A group of words must have distinctive pauses before and after it, and the number of words in a group is limited to some small value (typically fewer than six). If these types of recognition systems are to be successful in the real world, they must be speaker independent and support a large vocabulary. They must also be able to recognize the speech input accurately and in real time. Currently, no system meets all of these criteria because of the vast amount of computation required. This report examines the use of parallel processing to reduce the computation time for speech recognition. Two different types of parallel architectures are considered here: the Single Instruction stream - Multiple Data stream (SIMD) machine and the VLSI processor array. The SIMD machine is chosen for its flexibility, which makes it a good candidate for testing new speech recognition algorithms. The VLSI processor array is selected as well suited to a dedicated recognition system because of its simple processors and fixed interconnections. This report designs SIMD systems and VLSI processor arrays for both isolated and connected word recognition.
These architectures are evaluated and contrasted in terms of the number of processors needed, the interprocessor connections required, and the “power” each processor needs to achieve real-time recognition. The results show that an SIMD machine using 100 processors, each with an MC68000 processor, can recognize isolated words in real time using a 20 kHz sampling rate and a 1,000-word vocabulary.
Date of this Version
12-1-1984
Comments
PhD thesis. This work was supported by National Science Foundation grants ECS-7909016 and ECS-8120896.