Speech Research Laboratory (SRL)

Overview
Efforts in the Speech Research Laboratory focus on two main topics: 1) the application of speech technology to the diagnosis and treatment of speech and hearing disorders, and 2) the development of speech synthesis technology for augmentative communication. The following currently active projects in our laboratory address various facets of these topics.

Improving Speech Synthesis for Communication

This project encompasses development and testing of a concatenative text-to-speech (TTS) synthesis system (known as ModelTalker) and software that guides individuals through the process of creating a personal synthetic voice for use with ModelTalker, called ModelTalker Voice Recorder (MTVR). The overall system is intended to be of particular interest to Augmentative and Alternative Communication (AAC) device users who depend upon speech synthesis for communication. In addition to offering improved naturalness and intelligibility, the combination of ModelTalker and MTVR uniquely allows users to rapidly develop personal concatenative synthesis voices. With MTVR, individuals such as those with ALS who are at risk of losing the ability to speak can record their own speech for conversion to a personal synthetic voice for the ModelTalker TTS system. This voice banking capability has already been used successfully by many ALS patients. Moreover, children who are unable to speak can have personal voices recorded by other children, creating gender- and age-appropriate voices.
The specific aims of this NIH-funded project are:

  • Aim 1 - Enhance the usability of MTVR. We have identified several specific improvements for the MTVR program that will (a) improve general ease of use, (b) improve accessibility for visually impaired users, and (c) simplify the speech recording process, especially for young users and users with limited vocabulary and literacy skills.
  • Aim 2 - Implement and test novel speech processing techniques to improve the robustness of our automatic voice construction process while reducing the storage requirements needed for the speech database.
  • Aim 3 - Evaluate the quality (intelligibility and acceptability) of automatically generated voices for a representative sample of individuals, recruited through speech clinics around the country, who will benefit from this technology.

Automating Functional Hearing Evaluation
In this project, a collaboration with the House Ear Institute and Compreval, we will examine the feasibility of a new hearing test instrument that uses automated speech recognition (SR) to evaluate subjects' responses in the Hearing in Noise Test (HINT).
Adaptive speech intelligibility tests that measure the sentence recognition threshold (SRT) in quiet or noise have become widely used for functional hearing assessment in both clinical and occupational health settings. The HINT (Nilsson et al., 1994) is the most widely accepted. It has been developed in over a dozen languages for use in evaluations of auditory prostheses, e.g., cochlear implants, middle ear implants, bone-anchored hearing aids, and air conduction hearing aids. The HINT is also used in the US and Canada in occupational health to screen individuals for hearing-critical jobs in law enforcement and public safety. Such diverse usage of the HINT raises questions about the consistency of SRT measures obtained by the wide range of human observers who must judge the accuracy with which each sentence is recognized and repeated. Inconsistent judgments can reduce the reliability and accuracy of the assessment.
This research will demonstrate the feasibility of using SR technology to score the HINT and to control the adaptive test protocol. If SR technology proves sufficiently accurate for this specialized application, it can be substituted for the human observer, maintaining or perhaps improving the consistency of SRT measures and the accuracy of the functional hearing assessment. Further research can then develop the prototype of this innovative hearing test instrument, which will be marketed by Compreval, Inc.
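The adaptive protocol at the heart of a test like the HINT can be sketched as a simple up-down track: the signal-to-noise ratio (SNR) is lowered after each correctly repeated sentence and raised after each error, and the SRT is estimated from the SNRs visited once the track has converged. The following Python sketch is purely illustrative -- the scoring rule, step size, and averaging window are hypothetical stand-ins, not taken from the HINT specification -- with a `recognize` function playing the role of either the human observer or the SR scorer:

```python
def score_response(target_words, recognized_words):
    """Score a repetition as correct if every target word was recognized."""
    return all(w in recognized_words for w in target_words)

def adaptive_srt(sentences, recognize, start_snr_db=0.0, step_db=2.0):
    """One-down/one-up adaptive track: lower the SNR after a correct
    repetition, raise it after an error. The mean SNR over the later
    trials serves as the SRT estimate."""
    snr = start_snr_db
    track = []
    for target in sentences:
        heard = recognize(target, snr)  # present the sentence at this SNR
        correct = score_response(target.split(), heard.split())
        track.append(snr)
        snr += -step_db if correct else step_db
    # estimate from all but the first few (pre-convergence) trials
    tail = track[4:] if len(track) > 4 else track
    return sum(tail) / len(tail)
```

In a real instrument, `recognize` would present the recorded sentence through calibrated hardware and return the listener's (transcribed or SR-decoded) repetition; here it is simply a function of the target and the SNR.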

Pediatric Aural Rehabilitation
Many deaf infants, particularly those born to hearing parents, receive cochlear implants (CIs) at around one year of age. At the Alfred I. duPont Hospital for Children (AIDHC), infants who receive an implant are typically followed closely by the CI team and undergo intensive auditory/verbal (A/V) therapy for approximately two years. Therapy for these children involves two sessions per week of clinical A/V therapy at AIDHC; in addition, children's parents are taught exercises and therapy techniques for use at home to ensure that children receive adequate exposure to and experience with auditory stimulation.
Throughout the two years of intense therapy, it is essential to monitor a child's progress in terms of both their speech reception and speech production, adapting therapy to the needs of each child as an individual. Unfortunately, objective measures of progress are often not well defined and quantitative measures of performance can be difficult to obtain. This is especially true of progress with exercises at home.
We see great potential for the use of computer-based techniques to assist with hearing and speech habilitation for children who have received CIs. Well-crafted interactive software may allow therapists to monitor activity of CI children more closely whether they are at home or in the clinic. Colorful programs with enjoyable activities may engage a child's attention in ways that complement standard therapeutic techniques. Multimodal software may afford objective and quantifiable measures of performance and progress by logging a child's manual responses to stimulation and recording a child's vocal responses to allow acoustic measurements of those utterances.
The present work is intended to address the expanding demand for habilitation in very young children receiving CIs by providing software for speech (and other) sound discrimination and speech production training in young CI children. We will extend software called STAR (which stood for Speech Training, Assessment, and Remediation) that was previously developed and tested in the SRL for this new application (Bunnell, Yarrington, & Polikoff, 2000). The new version, renamed Speech Training and Auditory Rehabilitation (still STAR), is intended to work as an adjunct to conventional A/V therapy by providing drill, monitoring progress, and assisting in record keeping and reporting. Our goal is to develop software a young child will be able to use--perhaps with parental assistance--on a home computer, interacting with the software in ways that are appropriate to the child's auditory, verbal, and cognitive skill levels. Children who successfully develop good auditory and spoken skills may be able to interact conversationally with the software via an animated computer character. Since the system will be constantly eliciting responses from a child and logging those responses--including vocal responses--it will be capable of extensive record keeping and report generation, further assisting clinical staff in their duties.
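As one concrete illustration of the response logging and record keeping described above, a drill program might store each trial as a small record and derive simple progress measures from the accumulated log. The sketch below is hypothetical -- the class names, fields, and scoring are illustrative and are not taken from the STAR software:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class TrialRecord:
    stimulus: str               # e.g., the sound or sound pair presented
    response: str               # the child's manual choice
    correct: bool
    audio_file: Optional[str]   # path to a recorded vocal response, if any
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

class SessionLog:
    """Accumulates trial records for one child's drill session."""
    def __init__(self, child_id: str):
        self.child_id = child_id
        self.trials: list[TrialRecord] = []

    def log(self, stimulus, response, target, audio_file=None):
        self.trials.append(
            TrialRecord(stimulus, response, response == target, audio_file))

    def percent_correct(self) -> float:
        """A simple progress measure for a clinician's report."""
        if not self.trials:
            return 0.0
        return 100.0 * sum(t.correct for t in self.trials) / len(self.trials)
```

Because each record carries a timestamp and an optional pointer to the recorded utterance, logs of this kind can support both longitudinal progress reports and later acoustic measurement of the vocal responses.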

Applying TTS Technology to Aural Rehabilitation
In this project we will develop new tools based on the ModelTalker TTS system and affiliated speech recording software that will allow us to create synthetic voices designed for optimal use in Aural Rehabilitation (AR) applications. ModelTalker is one of a new generation of TTS systems that produce synthetic speech by concatenating snippets of recorded natural speech. The result is synthetic speech that can be very natural sounding, especially when compared to older TTS technology. The quality--in terms of both naturalness and intelligibility--of concatenative TTS systems is largely a function of the extent to which the recorded speech material used to generate the synthetic "voice" covers the domain of utterances to be synthesized. We will therefore systematically explore how TTS quality affects the perception of synthetic speech by aided hearing-impaired (HI) listeners and what level of TTS quality is necessary to support AR applications. These studies will inform the voice generation process described here.
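The coverage idea above can be made concrete: one common proxy for how well a recording inventory covers a target domain is the fraction of the domain's diphone types (adjacent phoneme pairs) that occur somewhere in the recordings. The sketch below illustrates the general idea under the assumption that utterances are already phonemically transcribed; it is not the ModelTalker voice-building algorithm:

```python
def diphones(phonemes):
    """Adjacent phoneme pairs in one utterance,
    e.g., ['h', 'e', 'l'] -> [('h', 'e'), ('e', 'l')]."""
    return list(zip(phonemes, phonemes[1:]))

def coverage(recorded_utterances, domain_utterances):
    """Fraction of the domain's diphone types found in the recordings.
    Each utterance is a list of phoneme symbols."""
    recorded = {d for u in recorded_utterances for d in diphones(u)}
    needed = {d for u in domain_utterances for d in diphones(u)}
    return len(recorded & needed) / len(needed) if needed else 1.0
```

A metric of this kind could, for example, guide the selection of recording prompts so that a small script still covers most of the diphones an AR application is expected to synthesize.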