
Publication:

ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment

Tarek Naous, Michael Joseph Ryan, Mohit Chandra, Wei Xu • arXiv • 23 May 2023

TLDR: This work constructs ReadMe++, a multilingual multi-domain dataset with human annotations of 9757 sentences in Arabic, English, French, Hindi, and Russian collected from 112 different data sources, making it ideal for benchmarking multilingual and non-English language models.

Citations: 1
Abstract: We present a systematic study and comprehensive evaluation of large language models for automatic multilingual readability assessment. In particular, we construct ReadMe++, a multilingual multi-domain dataset with human annotations of 9757 sentences in Arabic, English, French, Hindi, and Russian collected from 112 different data sources. ReadMe++ offers more domain and language diversity than existing readability datasets, making it ideal for benchmarking multilingual and non-English language models (including mBERT, XLM-R, mT5, Llama-2, GPT-4, etc.) in the supervised, unsupervised, and few-shot prompting settings. Our experiments reveal that models fine-tuned on ReadMe++ outperform those trained on single-domain datasets, showcasing superior performance on multi-domain readability assessment and cross-lingual transfer capabilities. We also compare to traditional readability metrics (such as Flesch-Kincaid Grade Level and Open Source Metric for Measuring Arabic Narratives), as well as the state-of-the-art unsupervised metric RSRS (Martinc et al., 2021). We will make our data and code publicly available at: https://github.com/tareknaous/readme.
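As a point of reference for the traditional metrics the abstract mentions, the snippet below is a minimal sketch of the Flesch-Kincaid Grade Level formula (0.39 * words-per-sentence + 11.8 * syllables-per-word - 15.59). The vowel-group syllable counter is a rough heuristic assumed here for illustration; it is not the implementation used by the paper or its released code.

import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowels; count at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    # Split into sentences and words with simple regex heuristics.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid Grade Level formula.
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

# Example usage: higher scores indicate text readable only at higher grade levels.
print(round(fkgl("The cat sat on the mat. It was a sunny day."), 2))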


