Diagnostic Assessment of Childhood Apraxia of Speech
Using Automatic Speech Recognition (ASR) Systems

L Shriberg, P Hosom, J Green


We report findings from two feasibility studies using automatic speech recognition (ASR) methods in childhood speech sound disorders. The studies implemented and evaluated the automation of two recently proposed diagnostic markers for suspected apraxia of speech (sAOS), termed the Lexical Stress Ratio (LSR) and the Coefficient of Variation Ratio (CVR). The LSR score is a composite of amplitude, frequency, and duration in the stressed vowel relative to the unstressed vowel, as obtained from a speaker's productions of eight trochaic word forms. Composite weightings for the three stress parameters were determined from a principal component analysis of vowels in a reference sample. The CVR score expresses the average normalized variability of pause events relative to speech events, as these events occurred in 24 utterances from a conversational speech sample. We describe the automation procedures used to obtain LSR and CVR scores for four children with sAOS and report comparative findings. The LSR values obtained with ASR were within 1.2% to 6.7% of the LSR values obtained manually. The CVR values obtained with ASR were within 0.7% to 2.7% of the CVR values obtained manually (Matlab). Discussion focuses on the potential strengths and limitations of ASR methods for automated diagnostic assessment of persons with speech sound disorders.
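
For readers who want a concrete picture of the two markers, the following Python sketch shows one way the scores and the ASR-versus-manual comparison could be computed. It is illustrative only and is not the procedure used in the studies. Several points are assumptions: that the CVR is the coefficient of variation of pause durations divided by that of speech durations, that the population standard deviation is used, that the LSR composite is a weighted sum of stressed-to-unstressed ratios, and that agreement is expressed as an absolute percent difference. All function names and weight values are hypothetical; the actual weightings come from the principal component analysis of the reference sample and are not given here.

```python
from statistics import mean, pstdev


def coefficient_of_variation(durations):
    """Population standard deviation divided by the mean (assumed form)."""
    m = mean(durations)
    if m == 0:
        raise ValueError("mean duration is zero")
    return pstdev(durations) / m


def cvr(pause_durations, speech_durations):
    """Assumed CVR: variability of pause events relative to speech events."""
    return (coefficient_of_variation(pause_durations)
            / coefficient_of_variation(speech_durations))


def lsr(stressed, unstressed, weights):
    """Assumed LSR: weighted sum of stressed/unstressed vowel ratios.

    Each argument is a dict keyed by 'amplitude', 'frequency', 'duration'.
    The weights are placeholders for the PCA-derived weightings.
    """
    return sum(weights[k] * stressed[k] / unstressed[k] for k in weights)


def percent_difference(automated, manual):
    """Absolute difference of the automated score from the manual score, in percent."""
    return abs(automated - manual) / abs(manual) * 100.0


if __name__ == "__main__":
    # Made-up durations in seconds, for illustration only.
    pauses = [1.0, 3.0]
    speech = [3.0, 5.0]
    print(cvr(pauses, speech))

    # Hypothetical weights; not the values derived in the studies.
    w = {"amplitude": 0.5, "frequency": 0.25, "duration": 0.25}
    s = {"amplitude": 2.0, "frequency": 3.0, "duration": 4.0}
    u = {"amplitude": 1.0, "frequency": 1.0, "duration": 1.0}
    print(lsr(s, u, w))
    print(percent_difference(1.05, 1.0))
```

In this form, a per-child comparison reduces to calling `percent_difference` on each pair of automated and manual scores, which is the kind of figure reported in the ranges above.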