In the age of social media, understanding public sentiment can offer valuable insights for businesses, researchers, and individuals alike. With its vast and diverse user base, Twitter is an excellent data source for sentiment analysis. In this tutorial, we will walk through building a Twitter sentiment analysis tool in Python. We'll use IFTTT (or Zapier) to collect tweets by email and RapidMiner for natural language processing and classification.
Setting Up Your Environment
To monitor Twitter for keywords like #BITSPilani, #BITSGoa, #BITSHyd, #BITSDubai, #BITSAA, and #BITS, you can use IFTTT, although the generic 'BITS' keyword will introduce some noise. Since IFTTT no longer supports live Twitter searches, you can switch to Zapier to build your dataset. Both services send you emails with the tweet data in an easily parsable format. Alternatively, you can query the Twitter API directly and fetch the tweets for analysis.
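If you prefer pulling tweets yourself rather than relying on IFTTT/Zapier emails, a minimal sketch using the tweepy library (version 4+, assuming you already have a Twitter API bearer token) might look like the following; the query string, token placeholder, and output file name are illustrative, not part of this project.

```python
# Sketch: fetch recent tweets for the BITS hashtags with tweepy (v4+).
# BEARER_TOKEN, the query, and the output file name are placeholders.
import csv
import tweepy

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # assumption: you have Twitter API access
client = tweepy.Client(bearer_token=BEARER_TOKEN)

query = "#BITSPilani OR #BITSGoa OR #BITSHyd OR #BITSDubai OR #BITSAA -is:retweet"
response = client.search_recent_tweets(query=query, max_results=100)

# Dump id + text to a CSV so the tweets can be fed into the same pipeline.
with open("bits_tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "text"])
    for tweet in response.data or []:
        writer.writerow([tweet.id, tweet.text])
```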
IFTTT email format
```
ifttt
via task 618721:
http://ifttt.com/tasks/618721
Agam, the band, with BITSian roots http://t.co/EvIumdmJ http://twitter.com/BITSAA/status/166177110573064192
by http://twitter.com/BITSAA
```
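Each email can be broken apart with a couple of regular expressions. The sketch below shows one way to pull out the tweet text, author profile, and status URL; the function and field names are mine, not part of the project.

```python
# Sketch: parse one IFTTT notification email body into its parts.
import re

def parse_ifttt_email(body):
    """Extract tweet text, author profile URL and status URL from one email."""
    status_url = re.search(r"http://twitter\.com/\S+/status/\d+", body)
    author_url = re.search(r"by (http://twitter\.com/\S+)", body)
    # The tweet text sits between the task URL and the status URL.
    text = re.search(
        r"http://ifttt\.com/tasks/\d+\s*(.*?)\s*http://twitter\.com/\S+/status/",
        body, re.DOTALL)
    return {
        "text": text.group(1).strip() if text else None,
        "author": author_url.group(1) if author_url else None,
        "status_url": status_url.group(0) if status_url else None,
    }
```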
Usage
- The Twitter training dataset is taken from Thinknook.
- Parsed and formatted training datasets for 1.5M and 0.1M tweets have been included.
- A BITS Pilani dataset containing tweets from January 20, 2012 to September 27, 2012 is also included.
- Use RapidMiner 5.3 with -Xms2048m -Xmx3072m for faster calculations. The SVM model is much slower than the other models, so avoid training it on more than the 0.1M dataset (see the sketch after this list for one way to draw that subset).
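One way to draw a balanced 0.1M subset from the full 1.5M corpus is sketched below with pandas. The column names (Sentiment, SentimentText) are assumed from the Thinknook distribution, and the file names are placeholders.

```python
# Sketch: draw a balanced 100k sample from the 1.5M Thinknook corpus.
# Column and file names are assumptions, not part of this repository.
import pandas as pd

df = pd.read_csv("Sentiment Analysis Dataset.csv",
                 encoding="latin-1", on_bad_lines="skip")

# Keep only the label and the tweet text, then sample 50k tweets per class.
df = df[["Sentiment", "SentimentText"]]
sample = (df.groupby("Sentiment", group_keys=False)
            .apply(lambda g: g.sample(n=50_000, random_state=42)))

sample.to_csv("training_100k.csv", index=False)
print(sample["Sentiment"].value_counts())
```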
Results
The SVM model, though slow, shows a balanced performance with a precision of 70.79% for negative tweets and 70.49% for positive tweets. The Naive Bayes model, faster but less precise, shows a precision of 48.27% for negative tweets and 68.24% for positive tweets. Positive tweets generally outnumber negative ones.
Code available at https://github.com/vineetdhanawat/twitter-sentiment-analysis
SVM model (~20 hours)
Performance Vector
In the performance vectors below, class 0 corresponds to negative tweets and class 1 to positive tweets.

| | true 0 | true 1 | class precision |
|---|---|---|---|
| pred. 0 | 24042 | 9922 | 70.79% |
| pred. 1 | 19482 | 46537 | 70.49% |
| class recall | 55.24% | 82.43% | |
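The class precision and recall figures follow directly from the confusion matrix; the short check below recomputes them from the counts above (the helper function is mine, for illustration only).

```python
# Sketch: recompute class precision and recall from the SVM confusion matrix.
def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

# Class 0 (negative): row "pred. 0", column "true 0".
p0, r0 = precision_recall(tp=24042, fp=9922, fn=19482)
# Class 1 (positive): row "pred. 1", column "true 1".
p1, r1 = precision_recall(tp=46537, fp=19482, fn=9922)

print(f"class 0: precision {p0:.2%}, recall {r0:.2%}")  # 70.79%, 55.24%
print(f"class 1: precision {p1:.2%}, recall {r1:.2%}")  # 70.49%, 82.43%
```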
Stats
Top 10 Positive and Negative words
| Positive word | Weight | Negative word | Weight |
|---|---|---|---|
| thank | 0.06800427050495744 | sad | 0.06904954519705979 |
| love | 0.04238921785592977 | miss | 0.06799716497097386 |
| good | 0.03864780316342833 | sorri | 0.06447410364223946 |
| great | 0.03332699835307452 | wish | 0.04964308132602499 |
| quot | 0.028049576202737663 | suck | 0.04549754050714666 |
| welcom | 0.028045093611976712 | bad | 0.03882145370669514 |
| awesom | 0.027883840586310205 | hate | 0.038814744730334146 |
| haha | 0.027711586964757735 | work | 0.038456277249749565 |
| nice | 0.026502431781819224 | poor | 0.03537374379337165 |
| happi | 0.024842171425360552 | want | 0.03312521661076012 |
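The tokens above appear in stemmed form (hence 'sorri', 'happi', 'welcom'), because the text is stemmed during preprocessing. The forms match what a Porter stemmer produces; the quick check below uses NLTK's PorterStemmer, which is my choice for illustration, not something the project itself depends on.

```python
# Sketch: reproduce the stemmed forms seen in the table with NLTK's Porter stemmer.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["thank", "happy", "welcome", "awesome", "sorry", "hated"]:
    print(word, "->", stemmer.stem(word))
# happy -> happi, welcome -> welcom, sorry -> sorri, hated -> hate
```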
Sentiment Ratio
| Sentiment | Count |
|---|---|
| Positive Tweets | 4759 |
| Negative Tweets | 1552 |
Naive Bayes (~4 hours)
Performance Vector
| | true 0 | true 1 | class precision |
|---|---|---|---|
| pred. 0 | 34413 | 36884 | 48.27% |
| pred. 1 | 9111 | 19575 | 68.24% |
| class recall | 79.07% | 34.67% | |
Sentiment Ratio
| Sentiment | Count |
|---|---|
| Positive Tweets | 3436 |
| Negative Tweets | 2875 |