A Comparison of Lexicon-Based and ML-Based Sentiment Analysis: Are There Outlier Words?

Siddhant Jaydeep Mahajani, Shashank Srivastava, A. Smeaton • @arXiv • 10 November 2023

TLDR: This paper compares sentiment scores from the Hedonometer, a lexicon-based technique, with those from Azure, a contemporary, easy-to-use machine-learning-based approach that is part of the Azure Cognitive Services family of APIs.

Citations: 0

Abstract: Lexicon-based approaches to sentiment analysis of text are based on each word or lexical entry having a predefined weight indicating its sentiment polarity. These weights are usually manually assigned, but their accuracy when compared against machine-learning-based approaches to computing sentiment is not known. It may be that there are lexical entries whose sentiment values cause a lexicon-based approach to give results that are very different from those of a machine learning approach. In this paper we compute sentiment for more than 150,000 English language texts drawn from four domains using the Hedonometer, a lexicon-based technique, and Azure, a contemporary, easy-to-use machine-learning-based approach that is part of the Azure Cognitive Services family of APIs. We model the differences in sentiment scores between the two approaches for documents in each domain using a regression, and analyse the independent variables (Hedonometer lexical entries) as indicators of each word's importance and contribution to the score differences. Our findings are that the importance of a word depends on the domain and that there are no standout lexical entries which systematically cause differences in sentiment scores.
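
To make the idea of a predefined per-word weight concrete, the following is a minimal sketch of lexicon-based scoring, assuming a frequency-weighted mean over the words found in the lexicon. The lexicon values, the tokenizer and the function name lexicon_score are hypothetical illustrations; they are not taken from the Hedonometer word list or from the paper.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> sentiment weight on a 1-9 scale.
# Illustrative values only, not the Hedonometer's actual entries.
TOY_LEXICON = {"happy": 8.0, "good": 7.0, "bad": 3.0, "terrible": 2.0}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Return the frequency-weighted mean weight of lexicon words in text.

    Words missing from the lexicon are ignored. Returns None when the
    text contains no lexicon words at all.
    """
    # Simple lowercase tokenizer (an assumption, chosen for brevity).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in lexicon)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(lexicon[w] * n for w, n in counts.items()) / total


if __name__ == "__main__":
    print(lexicon_score("A happy day after a terrible week"))
```

Under this scheme a single entry with an unusual weight shifts the score of every text containing it, which is why the abstract asks whether particular lexical entries account for the gap between lexicon-based and machine-learning scores.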
