Forced-Alignment and Edit-Distance Scoring for Vocabulary Tutoring Applications

  • Serguei Pakhomov
  • Jayson Richardson
  • Matt Finholt-Daniel
  • Gregory Sales
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5246)

Abstract

We demonstrate an application of Automatic Speech Recognition (ASR) technology to the assessment of young children’s basic English vocabulary. We use a test set of 2935 speech samples, manually rated by three reviewers, to compare several approaches to measuring and classifying the accuracy of children’s word pronunciations. These approaches include acoustic confidence scoring obtained by forced alignment, and the edit distance between the expected and actual ASR output. We show that phoneme-level language modeling can yield good classification results even with a relatively small amount of acoustic training data. The area under the ROC curve of the ASR-based classifier that uses a bi-phone language model interpolated with a general English bi-phone model is 0.80 (95% CI 0.78–0.82). The best operating point, where sensitivity and specificity are jointly maximized, has a sensitivity of 0.74 and a specificity of 0.80. Their harmonic mean of 0.77 is comparable to human performance (ICC = 0.75; absolute agreement = 81%).
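The edit-distance approach compares the phoneme sequence the child was expected to produce with the phoneme sequence the recognizer actually output. The code below is a minimal sketch of that comparison using unit-cost Levenshtein distance. It is not the authors' implementation. The ARPAbet-style phone symbols, the normalization by expected length, and the function names are all illustrative assumptions. The sketch also checks the reported harmonic mean of sensitivity (0.74) and specificity (0.80).

```python
from typing import Sequence


def edit_distance(expected: Sequence[str], recognized: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance (insert/delete/substitute) between phoneme sequences."""
    m, n = len(expected), len(recognized)
    # prev[j] = distance between expected[:i-1] and recognized[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if expected[i - 1] == recognized[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete an expected phone
                          curr[j - 1] + 1,     # insert an extra recognized phone
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[n]


def normalized_distance(expected: Sequence[str], recognized: Sequence[str]) -> float:
    """Hypothetical normalization: errors per expected phone."""
    return edit_distance(expected, recognized) / max(len(expected), 1)


def harmonic_mean(sensitivity: float, specificity: float) -> float:
    """Harmonic mean of sensitivity and specificity at one operating point."""
    return 2 * sensitivity * specificity / (sensitivity + specificity)


if __name__ == "__main__":
    expected = ["K", "AE", "T"]    # hypothetical target pronunciation of "cat"
    recognized = ["K", "AH", "T"]  # hypothetical phoneme-level ASR output
    print(edit_distance(expected, recognized))   # 1
    print(round(harmonic_mean(0.74, 0.80), 2))   # 0.77
```

A classifier built on this score would label a sample as correctly pronounced when the distance falls below a threshold. Sweeping that threshold traces out an ROC curve. The paper reports such a curve for its own scores, but this sketch does not reproduce the curve or its threshold.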
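The best-performing classifier uses a bi-phone language model interpolated with a general English bi-phone model. The abstract does not say how either model was trained or weighted. The sketch below therefore assumes maximum-likelihood bi-phone estimates and plain linear interpolation, P(b | a) = λ·P_task(b | a) + (1 − λ)·P_general(b | a). The training phone sequences, the value of λ, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Bigram = Tuple[str, str]


def train_biphone(utterances: Sequence[Sequence[str]]) -> Dict[Bigram, float]:
    """Maximum-likelihood bi-phone probabilities P(b | a), padded with <s> and </s>."""
    pair_counts: Counter = Counter()
    history_counts: Counter = Counter()
    for phones in utterances:
        padded = ["<s>", *phones, "</s>"]
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            history_counts[a] += 1
    return {(a, b): c / history_counts[a] for (a, b), c in pair_counts.items()}


def interpolate(task: Dict[Bigram, float],
                general: Dict[Bigram, float],
                lam: float) -> Dict[Bigram, float]:
    """Linear interpolation of two bi-phone models.

    A pair missing from one model contributes 0 from that model.
    No further smoothing is applied.
    """
    keys = set(task) | set(general)
    return {k: lam * task.get(k, 0.0) + (1 - lam) * general.get(k, 0.0) for k in keys}


if __name__ == "__main__":
    task = train_biphone([["K", "AE", "T"]])                     # hypothetical target-word data
    general = train_biphone([["K", "AE", "T"], ["K", "IH", "T"]])  # hypothetical general English data
    mixed = interpolate(task, general, lam=0.7)                  # lam is a hypothetical weight
    print(round(mixed[("K", "AE")], 2), round(mixed[("K", "IH")], 2))  # 0.85 0.15
```

Interpolation keeps the recognizer biased toward the expected phone sequences. It still leaves probability mass for other plausible English phone transitions, which is what lets mispronunciations surface in the ASR output.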

Keywords

Automatic speech recognition, vocabulary tutor, sub-word language modeling

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Serguei Pakhomov, University of Minnesota, Minneapolis
  • Jayson Richardson, University of North Carolina, Wilmington
  • Matt Finholt-Daniel, Seward Incorporated, Minneapolis
  • Gregory Sales, Seward Incorporated, Minneapolis