Abstract
This paper discusses, and reports experiments on, frame-by-frame speech signal processing and recognition for Field Programmable Gate Array (FPGA) devices. Target applications include voice conversion systems, which must perform signal processing and speech recognition on every frame because they operate in real time. To meet this processing-speed requirement, the authors propose algorithms that use an FPGA as a hardware processor for Voice Activity Detection (VAD) and as a speech recognition decoder. However, the gate-circuit resources of FPGA devices are limited, so the algorithms must be customized to fit on the device. VAD is customized to use a 2nd-order autocorrelation function, and speech recognition to use Euclidean distance. These methods are implemented on an FPGA emulator, which is used to demonstrate VAD on speech and noise sections and to run a speech recognition experiment on discriminating Japanese vowels.
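As a rough illustration of the two per-frame operations named in the abstract, the following Python sketch may help. It is not the authors' FPGA design. It assumes that "2nd-order autocorrelation" can be read as the autocorrelation of a frame at lag 2, normalized by the frame energy, and that recognition means picking the stored template closest in Euclidean distance. The function names, the threshold of 0.5, and the toy template vectors are all hypothetical.

```python
import math

def autocorr(frame, lag):
    """Unnormalized autocorrelation of one frame at the given lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def is_speech(frame, lag=2, threshold=0.5):
    """Hypothetical VAD rule: flag the frame as speech when its
    energy-normalized autocorrelation at `lag` exceeds `threshold`.
    The lag and the threshold are assumptions, not values from the chapter."""
    energy = autocorr(frame, 0)
    if energy == 0.0:
        return False  # an all-zero frame carries no speech
    return autocorr(frame, lag) / energy > threshold

def nearest_template(features, templates):
    """Return the label of the template with the smallest Euclidean
    distance to the feature vector."""
    return min(templates, key=lambda label: math.dist(features, templates[label]))

# Toy data for illustration only (not measured speech).
voiced_like = [math.sin(2 * math.pi * n / 64) for n in range(256)]
toy_templates = {"a": (1.0, 0.0), "i": (0.0, 1.0)}

if is_speech(voiced_like):
    print(nearest_template((0.9, 0.2), toy_templates))
```

Both steps reduce to multiply-accumulate operations over a single frame. That kind of arithmetic fits limited gate resources, although the fixed-point details of a real implementation are outside the scope of this sketch.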
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Nakayama, M., Shigekawa, N., Yokouchi, T., Ishimitsu, S. (2017). Frame-by-Frame Speech Signal Processing and Recognition for FPGA Devices. In: Postolache, O., Mukhopadhyay, S., Jayasundera, K., Swain, A. (eds) Sensors for Everyday Life. Smart Sensors, Measurement and Instrumentation, vol 22. Springer, Cham. https://doi.org/10.1007/978-3-319-47319-2_5
DOI: https://doi.org/10.1007/978-3-319-47319-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47318-5
Online ISBN: 978-3-319-47319-2
eBook Packages: Engineering, Engineering (R0)