
AUTOMATIC TEXT INDEPENDENT LANGUAGE IDENTIFICATION

   A.NAGESH-India

Abstract

Automatic language identification (LID) is the task of identifying, by machine, the language spoken in a given speech utterance. It is gaining importance in the context of economic globalization.
Conventional language identification systems require a difficult and time-consuming process of labeling the phoneme boundaries of the utterances in the speech corpus. The present work attempts to develop an automatic language identification system that does not require a labeled speech corpus or linguistic information about the target languages.
Until now, conventional LID systems have extracted features using Mel-frequency cepstral coefficients (MFCC). Recognizing the importance of the acoustic-phonetic information in speech, this work attempts to extract new features from the speech signal. The new method of feature extraction is based on the principle that the frequency of occurrence of phonemes differs from language to language. In this work, the probability of each feature vector in the acoustic class is computed. A new type of feature vector is proposed that effectively captures the variations in the frequency of occurrence of phonemes across languages. Two LID systems are built on these proposed feature vectors: one using Gaussian Mixture Models (GMM) and one using Hidden Markov Models (HMM).
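As a rough illustration of the idea, the sketch below computes the posterior probability of each frame under a set of acoustic classes and averages those posteriors over an utterance. It is one plausible reading of the description, not the author's exact feature construction: the diagonal-Gaussian acoustic classes, the function names `class_posteriors` and `occupancy_vector`, and the toy one-dimensional data are all assumptions made for the example.

```python
import numpy as np

def class_posteriors(frames, means, variances, weights):
    """Posterior probability of every acoustic class for every frame.

    frames:    (T, D) feature vectors, for example MFCCs
    means:     (K, D) class means (assumed diagonal Gaussians)
    variances: (K, D) diagonal class variances
    weights:   (K,)   class priors that sum to 1
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )                                                          # (T, K)
    log_joint = log_lik + np.log(weights)
    log_joint -= log_joint.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def occupancy_vector(frames, means, variances, weights):
    """Average class posterior over an utterance.

    Each entry approximates how often an acoustic class occurs, which is
    the kind of statistic expected to differ between languages.
    """
    return class_posteriors(frames, means, variances, weights).mean(axis=0)

# Toy data (hypothetical): two well-separated classes in one dimension,
# with 8 frames near the first class and 2 frames near the second.
means = np.array([[-5.0], [5.0]])
variances = np.ones((2, 1))
weights = np.array([0.5, 0.5])
frames = np.array([[-5.0]] * 8 + [[5.0]] * 2)

posteriors = class_posteriors(frames, means, variances, weights)
occupancy = occupancy_vector(frames, means, variances, weights)
print(occupancy)
```

In this toy case the occupancy vector reflects the 8-to-2 split between the two classes; with real speech, the classes would have to be estimated from unlabeled data first.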
The performance of three LID systems is evaluated: an MFCC-feature-based LID system using Vector Quantization (VQ), and the new-feature-based GMM and HMM LID systems. The results establish that the HMM-based LID system outperforms the GMM- and VQ-based LID systems in identification performance.
They also establish that, with the new type of features, the HMM-based LID system achieves a significant improvement in identification performance over LID systems based on conventional (MFCC) features. The experiments were carried out on the Oregon Graduate Institute Multi-language Telephone (OGI_MLT) speech corpus.
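For readers unfamiliar with the VQ baseline named above, the following is a minimal sketch of how a VQ-based identifier is commonly built: one codebook is trained per language with k-means, and a test utterance is assigned to the language whose codebook gives the lowest average distortion. The abstract does not describe the author's configuration, so the codebook size, the Euclidean distortion, the function names, and the synthetic two-dimensional "languages" `lang_a` and `lang_b` are all assumptions for illustration.

```python
import numpy as np

def train_codebook(frames, k, iters=20, seed=0):
    """Plain k-means codebook over training frames of shape (N, D)."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for j in range(k):
            members = frames[nearest == j]
            if len(members):                 # leave empty cells unchanged
                codebook[j] = members.mean(axis=0)
    return codebook

def distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def identify(frames, codebooks):
    """Return the language whose codebook yields the lowest distortion."""
    return min(codebooks, key=lambda lang: distortion(frames, codebooks[lang]))

# Synthetic stand-ins for per-language feature vectors (hypothetical data).
rng = np.random.default_rng(42)
train = {
    "lang_a": rng.normal(loc=0.0, scale=1.0, size=(200, 2)),
    "lang_b": rng.normal(loc=10.0, scale=1.0, size=(200, 2)),
}
codebooks = {lang: train_codebook(x, k=4) for lang, x in train.items()}

test_utterance = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
predicted = identify(test_utterance, codebooks)
print(predicted)
```

The GMM and HMM systems follow the same decision pattern, with the distortion replaced by a per-language model likelihood.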





CALL FOR PAPERS.....

You can submit your research paper directly to editor.ijetr@gmail.com for review
