Publication:

Phonologically Informed Edit Distance Algorithms for Word Alignment with Low-Resource Languages

R. Thomas McCoy, R. Frank • Society for Computation in Linguistics • 01 January 2018

TLDR: On this task, the cognate-based scheme outperforms the other methods and the Levenshtein edit distance baseline, showing that NLP applications can benefit from information about cross-linguistic phonological patterns.

Citations: 13
Abstract: Edit distance is commonly used to relate cognates across languages. This technique is particularly relevant for processing low-resource languages, because the sparse data from such a language can be significantly bolstered by connecting words in the low-resource language with cognates in a related, higher-resource language. We present three methods for weighting edit distance algorithms based on linguistic information. These methods base their penalties on (i) phonological features, (ii) distributional character embeddings, or (iii) differences between cognate words. We also introduce a novel method for evaluating edit distance through the task of low-resource word alignment: edit-distance neighbors in a high-resource pivot language are used to inform alignments from the low-resource language. On this task, the cognate-based scheme outperforms our other methods and the Levenshtein edit distance baseline, showing that NLP applications can benefit from information about cross-linguistic phonological patterns.
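To make the idea of a weighted edit distance concrete, here is a minimal Python sketch, assuming unit insertion/deletion costs and a substitution cost that can be swapped out. It illustrates only the general shape of variant (i), feature-based penalties: the `FEATURES` table is a tiny hypothetical set of consonant features invented for illustration, not the feature set used in the paper, and the embedding-based and cognate-based weightings are not reproduced. The helper `nearest_pivot_words` is likewise a hypothetical simplification of retrieving edit-distance neighbors from a pivot-language lexicon; it is not the paper's alignment procedure. All function names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy feature table (NOT the paper's feature set). Each consonant
# maps to binary features: (consonantal, voiced, labial, coronal, velar, stop, fricative).
FEATURES: Dict[str, Tuple[int, ...]] = {
    "p": (1, 0, 1, 0, 0, 1, 0), "b": (1, 1, 1, 0, 0, 1, 0),
    "t": (1, 0, 0, 1, 0, 1, 0), "d": (1, 1, 0, 1, 0, 1, 0),
    "k": (1, 0, 0, 0, 1, 1, 0), "g": (1, 1, 0, 0, 1, 1, 0),
    "f": (1, 0, 1, 0, 0, 0, 1), "v": (1, 1, 1, 0, 0, 0, 1),
    "s": (1, 0, 0, 1, 0, 0, 1), "z": (1, 1, 0, 1, 0, 0, 1),
}

SubCost = Callable[[str, str], float]


def unit_sub_cost(a: str, b: str) -> float:
    """Plain Levenshtein substitution: any mismatch costs 1."""
    return 0.0 if a == b else 1.0


def feature_sub_cost(a: str, b: str) -> float:
    """Fraction of features on which a and b differ. Characters missing
    from the toy table fall back to the plain Levenshtein cost of 1."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def weighted_edit_distance(s: str, t: str, sub_cost: SubCost = unit_sub_cost,
                           indel_cost: float = 1.0) -> float:
    """Dynamic-programming edit distance with a pluggable substitution cost."""
    n, m = len(s), len(t)
    # dp[i][j] = minimal cost of turning s[:i] into t[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + indel_cost,                          # delete s[i-1]
                dp[i][j - 1] + indel_cost,                          # insert t[j-1]
                dp[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),    # match or substitute
            )
    return dp[n][m]


def nearest_pivot_words(word: str, pivot_lexicon: List[str], k: int = 3,
                        sub_cost: SubCost = unit_sub_cost) -> List[Tuple[str, float]]:
    """Return the k pivot-language words closest to `word` (ties broken alphabetically)."""
    scored = [(w, weighted_edit_distance(word, w, sub_cost)) for w in pivot_lexicon]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    lexicon = ["pater", "mater", "frater"]  # toy strings, not real data
    print(weighted_edit_distance("fader", "pater"))                             # 2.0
    print(round(weighted_edit_distance("fader", "pater", feature_sub_cost), 3))  # 0.429
    print(nearest_pivot_words("fader", lexicon))
    # [('frater', 2.0), ('mater', 2.0), ('pater', 2.0)]
    print(nearest_pivot_words("fader", lexicon, k=1, sub_cost=feature_sub_cost)[0][0])  # pater
```

With unit costs, all three toy candidates tie at distance 2, so plain Levenshtein cannot rank them. With the feature-based cost, f/p and d/t are cheap substitutions (they differ in few features), so "pater" becomes the clear nearest neighbor. This is the kind of distinction a linguistically weighted distance is meant to capture; the toy numbers here say nothing about the paper's actual results.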
