Tech Report
Sentiment Analysis of Microblogs
In recent years, microblogs have attracted growing interest as a research topic in sentiment analysis and opinion mining. Through platforms like Twitter and Facebook, millions of status updates and tweets, which reflect people's opinions and attitudes, are created and sent every day. This has opened up new opportunities: companies can gauge the level of satisfaction with, or the intensity of complaints about, their products and services, while policy makers and politicians can gauge public opinion about their policies or about political issues.
Sentiment analysis of microblogs faces several major challenges that arise from the unique characteristics of microblogging services. One challenge is data sparsity: because of the length limit, microblogs contain a large number of irregular and ill-formed words. Another challenge is the open-domain nature of microblogs, where users can post about any topic. This calls for sentiment classifiers that work independently of the domain under study. A further serious challenge is data dynamics and evolution, as microblogs are produced continuously by a large and uncontrolled number of users. This imposes a strict constraint: microblogging data should be processed and analysed in real time.
This report summarises previous work in microblog sentiment analysis and discusses the major challenges that are yet to be overcome. It then presents the pilot work I have undertaken so far, in which I proposed a novel feature-based approach to address the data sparsity problem of tweet data. The plan for the remaining two years is given at the end of the report.