Publication:
Comparing phonemes and visemes with DNN-based lipreading
K. Thangthai, Helen L. Bear, R. Harvey • @arXiv • 01 September 2017
TLDR: The phoneme lipreading system's word accuracy outperforms the viseme-based system's word accuracy; however, the phoneme system achieved lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words.
Citations: 26
Abstract: There is debate if phoneme or viseme units are the most effective for a lipreading
system. Some studies use phoneme units even though phonemes describe unique short
sounds; other studies tried to improve lipreading accuracy by focusing on visemes with
varying results. We compare the performance of a lipreading system by modeling visual
speech using either 13 viseme or 38 phoneme units. We report the accuracy of our
system at both word and unit levels. The evaluation task is large vocabulary continuous
speech using the TCD-TIMIT corpus. We complete our visual speech modeling via
hybrid DNN-HMMs and our visual speech decoder is a Weighted Finite-State Transducer
(WFST). We use DCT and Eigenlips as representations of the mouth ROI image. The
phoneme lipreading system word accuracy outperforms the viseme based system word
accuracy. However, the phoneme system achieved lower accuracy at the unit level which
shows the importance of the dictionary for decoding classification outputs into words.
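
One way to see why the dictionary matters when moving from unit level to word level is to count how many words share a pronunciation once phonemes are collapsed into visemes. The sketch below is illustrative only: the phoneme-to-viseme map, the viseme names, and the toy lexicon are hypothetical and are not the 13 viseme or 38 phoneme sets used in the paper, and the helper names (to_visemes, group_by_pronunciation, ambiguous) are introduced here for the example.

```python
from collections import defaultdict

# Hypothetical many-to-one phoneme-to-viseme map (illustrative only;
# NOT the 13-viseme set used in the paper).
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar", "n": "V_alveolar",
    "ae": "V_open", "eh": "V_mid",
}

# Toy pronunciation dictionary (hypothetical entries).
PHONEME_DICT = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "bad": ["b", "ae", "d"],
    "fan": ["f", "ae", "n"],
    "van": ["v", "ae", "n"],
    "bet": ["b", "eh", "t"],
}


def to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def group_by_pronunciation(lexicon, transform=tuple):
    """Group words by their (optionally transformed) unit sequence."""
    groups = defaultdict(list)
    for word, phones in lexicon.items():
        groups[transform(phones)].append(word)
    return dict(groups)


def ambiguous(groups):
    """Return the sorted sets of words that share one unit sequence."""
    return sorted(sorted(words) for words in groups.values() if len(words) > 1)


phoneme_groups = group_by_pronunciation(PHONEME_DICT)
viseme_groups = group_by_pronunciation(PHONEME_DICT, to_visemes)

print("Ambiguous under phonemes:", ambiguous(phoneme_groups))
print("Ambiguous under visemes: ", ambiguous(viseme_groups))
```

In this toy lexicon every word has a distinct phoneme sequence, while the viseme sequences collapse seven words into three pronunciations. A viseme classifier could therefore score well at the unit level and still leave the decoder unable to separate words that look identical in the dictionary, which is consistent with the trade-off the abstract reports.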