Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli - MaRDI portal

Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli (Q1424523)

scientific article; zbMATH DE number 2058707

    Statements

    Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli (English)
    16 March 2004
    Summary: We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on the use of automatic lip reading. The objective is to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We consider the case of an additive stationary mixture of decorrelated sources with no further assumptions on independence or non-Gaussian character. Firstly, we present a theoretical framework showing that it is indeed possible to separate a source when some of its spectral characteristics are provided to the system. Then we address the case of audiovisual sources. We show how, if a statistical model of the joint probability of visual and spectral audio input learns to quantify the audio-visual coherence, separation can be achieved by maximizing this probability. Finally, we present a number of separation results on a corpus of vowel-plosive-vowel sequences uttered by a single speaker, embedded in a mixture of other voices. We show that separation can be quite good for mixtures of 2, 3, and 5 sources. These results, while very preliminary, are encouraging, and are discussed with respect to their potential complementarity with traditional pure audio separation or enhancement techniques.
    blind source separation
    audio-visual speech processing
    multiple speech signals

    Identifiers