Sentiment Analysis for Arabizi: A multilingual jargon in Social Media
This event took place on Thursday 25 May 2017 at 12:00
Arabizi is a portmanteau of the words Arabic and Englizi (meaning English). It is a linguistic phenomenon in which native Arabic speakers write their dialectal mother tongue in Latin script, using alphanumeric characters to represent Arabic phonemes that have no equivalent in the Latin alphabet. For example, in the term 7abibi (my darling), the number 7 transcribes a voiceless fricative Arabic letter that sounds like a soft 'h'. Several researchers working in Arabic NLP have filtered Arabizi text out of their datasets because of the challenges posed by this texting language. In this talk, I will outline the challenges that make sentiment analysis for Arabizi a non-trivial task. I will discuss a pilot case study on how widely Arabizi is used on Twitter in two countries. I will demonstrate a method that we created to detect Arabizi within multilingual streams of data. Finally, I will present the results of a lexicon-based sentiment analysis approach using SenZi, a novel Arabizi lexicon.
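As a rough illustration of what a lexicon-based approach involves in general, the Python sketch below sums per-word polarity scores over the tokens of a message. It is not the method presented in the talk: the lexicon entries and scores, and the names `TOY_LEXICON`, `tokenize`, `score` and `label`, are hypothetical and are not taken from SenZi.

```python
import re

# Hypothetical toy lexicon: entries and scores are illustrative only
# and are NOT taken from SenZi.
TOY_LEXICON = {
    "7abibi": 1,   # "my darling" (the example from the abstract)
    "7elo": 1,     # assumed gloss: "nice"
    "za3lan": -1,  # assumed gloss: "upset"
}

# Arabizi tokens mix Latin letters with digits, so digits must be
# kept as part of a word rather than stripped as noise.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def score(text, lexicon=TOY_LEXICON):
    """Sum the polarity of every token found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def label(text, lexicon=TOY_LEXICON):
    """Map the summed score to a coarse sentiment label."""
    total = score(text, lexicon)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for message in ["ya 7abibi enta 7elo", "ana za3lan", "hello world"]:
        print(message, "->", label(message))
```

Keeping digits inside tokens is the one Arabizi-specific choice here: a tokenizer that discards numbers would turn 7abibi into "abibi" and miss the lexicon entry.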