GitHub
Report
Description
Abstract: Automatic speech recognition (ASR) systems are becoming part of daily life, in applications ranging from live captioning to automatic dictation. This report discusses a set of experiments on weighted finite-state transducers (WFSTs) and the Viterbi decoder, both crucial components of speech recognition systems. We probe WFSTs and the Viterbi decoder across a wide range of experiments and measure how they affect recognition accuracy and computational efficiency.
Citations
[1] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38.
[2] Ortmanns, S., Ney, H., and Eiden, A. (1996a). Language-model look-ahead for large vocabulary speech recognition. In Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, volume 4, pages 2095–2098.
[3] Ortmanns, S., Ney, H., Eiden, A., and Coenen Lehrstuhl, N. (1996b). Look-ahead techniques for improved beam.
[4] Riley, M., Allauzen, C., and Jansche, M. (2009). OpenFst: An open-source, weighted finite-state transducer library and its applications to speech and language. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts, pages 9–10, Boulder, Colorado. Association for Computational Linguistics.