Speech Separation Based on Time Frequency Ratio of Mixtures and Track Identification
Analysis of non-stationary signals such as audio, speech, and biomedical signals requires good resolution in both time and frequency, because their spectral components are not fixed. Time-frequency analysis of non-stationary signals has many applications, including source separation, signal denoising, automatic gain control, and speaker recognition. This paper presents an application of time-frequency analysis, using the Short Time Fourier Transform (STFT), to speech and audio separation. The approach is known as Blind Source Separation; it is blind because no information about the sources or the mixing type is available. The method uses relative amplitude information and time-frequency ratios of the audio and speech mixtures in the time-frequency domain, together with the ideal binary mask of the source signals. A mixture of male speech, female speech, and musical instrument tones is considered for separation, first with a strong mixing matrix and then with a weak mixing matrix.
Keywords: Short time Fourier transform; Binary masking; Automatic speech recognition; Time–frequency domain; Ideal mask; Ratio of mixtures
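The two quantities named in the abstract, the time-frequency ratio of the mixtures and the ideal binary mask, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes instantaneous (non-convolutive) mixing, uses two pure tones as stand-ins for the speech and instrument sources, and relies on a hypothetical mixing matrix `A` and a hypothetical helper `stft`; none of these values come from the paper.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT (illustrative helper); shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

fs, n_fft = 8000, 256
t = np.arange(fs) / fs                    # one second of samples
s1 = np.sin(2 * np.pi * 440 * t)          # stand-in for source 1 (assumption)
s2 = np.sin(2 * np.pi * 1250 * t)         # stand-in for source 2 (assumption)

# Hypothetical 2x2 instantaneous mixing matrix (not taken from the paper).
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
x1, x2 = A @ np.vstack([s1, s2])          # the two observed mixtures

S1, S2 = stft(s1), stft(s2)
X1, X2 = stft(x1), stft(x2)

# Time-frequency ratio of the mixtures. In bins where one source dominates,
# the ratio approaches that source's column ratio in A (a11/a21 or a12/a22).
eps = 1e-12
ratio = np.abs(X1) / (np.abs(X2) + eps)

# Ideal binary mask for source 1. It is "ideal" because it is computed from
# the true sources, which a blind method does not have access to.
ibm1 = np.abs(S1) > np.abs(S2)

# Masked mixture: a time-frequency estimate of source 1.
Y1 = ibm1 * X1

k1 = int(round(440 * n_fft / fs))         # STFT bin nearest source 1
k2 = int(round(1250 * n_fft / fs))        # STFT bin nearest source 2
print("mean ratio near source 1:", ratio[:, k1].mean())
print("mean ratio near source 2:", ratio[:, k2].mean())
```

In this toy setting the ratio values cluster around the column ratios of `A`, which is the property a ratio-based method can exploit to assign time-frequency bins to sources when the sources rarely overlap in the time-frequency domain.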
- 3. Araki S, Makino S, Sawada H, Mukai R (2004) Underdetermined blind separation of convolutive mixtures of speech with directivity pattern based mask and ICA. Fifth international conference on independent component analysis and blind signal separation, pp 898–905
- 4. Torkkola K (1996) Blind separation of convolved sources based on information maximization. IEEE Workshop on neural networks for signal processing, Kyoto, pp 423–432