BembaSpeech: A Speech Recognition Corpus for the Bemba Language

Claytone Sikasote, Antonios Anastasopoulos • @International Conference on Language Resources and Evaluation • 09 February 2021

TLDR: Results demonstrate that model capacity significantly improves performance and that multilingual pre-trained models transfers cross-lingual acoustic representation better than monolingual pre- trained English model on the BembaSpeech for the BEMba ASR and show that the corpus can be used for building ASR systems for Bembe language.

Citations: 15

Abstract: We present a preprocessed, ready-to-use automatic speech recognition corpus, BembaSpeech, consisting over 24 hours of read speech in the Bemba language, a written but low-resourced language spoken by over 30% of the population in Zambia. To assess its usefulness for training and testing ASR systems for Bemba, we explored different approaches; supervised pre-training (training from scratch), cross-lingual transfer learning from a monolingual English pre-trained model using DeepSpeech on the portion of the dataset and fine-tuning large scale self-supervised Wav2Vec2.0 based multilingual pre-trained models on the complete BembaSpeech corpus. From our experiments, the 1 billion XLS-R parameter model gives the best results. The model achieves a word error rate (WER) of 32.91%, results demonstrating that model capacity significantly improves performance and that multilingual pre-trained models transfers cross-lingual acoustic representation better than monolingual pre-trained English model on the BembaSpeech for the Bemba ASR. Lastly, results also show that the corpus can be used for building ASR systems for Bemba language.

Related Fields of Study

15 Citations No References

Citations

Sort by

Showing results 1 to 0 of 0

References

Sort by

Showing results 1 to 0 of 0