In the age of social media, understanding public sentiment can offer valuable insights for businesses, researchers, and individuals alike. With its vast and diverse user base, Twitter is an excellent platform for sentiment analysis. In this tutorial, we will walk through building a Twitter sentiment analysis tool using Python. We'll use the IFTTT service to collect tweets and RapidMiner for the natural language processing and model training.

Setting Up Your Environment

To monitor Twitter for keywords like #BITSPilani, #BITSGoa, #BITSHyd, #BITSDubai, #BITSAA, and #BITS, you can use IFTTT, although the generic 'BITS' keyword will introduce some noise. Since IFTTT no longer supports live Twitter searches, you can switch to Zapier to build your dataset for analysis. Both services send you emails containing the tweet data in an easily parsable format. Alternatively, you can use the Twitter API to crawl a list of tweets and analyze the text directly.

IFTTT email format

ifttt

via task 618721:
http://ifttt.com/tasks/618721

Agam, the band, with BITSian roots http://t.co/EvIumdmJ http://twitter.com/BITSAA/status/166177110573064192
by http://twitter.com/BITSAA
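
Because the notification email body has a fixed shape (a task header, the tweet text followed by its t.co link and status URL, then a 'by' line with the author's profile URL), it can be parsed with a couple of regular expressions. The sketch below is written against the sample above only; the exact layout of your own notification emails may differ.

```python
# Minimal sketch: extract tweet text, status URL, and author from an
# IFTTT/Zapier notification email body shaped like the sample above.
# The layout is assumed from that sample; adjust the patterns if your
# notification emails differ.
import re

def parse_notification(body):
    status = re.search(r"(http://twitter\.com/\S+/status/\d+)", body)
    author = re.search(r"^by\s+(http://twitter\.com/\S+)", body, re.MULTILINE)
    text = None
    if status:
        # The tweet text is the part of the line preceding the status URL.
        line = next(l for l in body.splitlines() if status.group(1) in l)
        text = line.split(status.group(1))[0].strip()
    return {
        "text": text,
        "status_url": status.group(1) if status else None,
        "author_url": author.group(1) if author else None,
    }

sample = """ifttt

via task 618721:
http://ifttt.com/tasks/618721

Agam, the band, with BITSian roots http://t.co/EvIumdmJ http://twitter.com/BITSAA/status/166177110573064192
by http://twitter.com/BITSAA
"""
print(parse_notification(sample))
```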

Usage

  1. Twitter training dataset taken from Thinknook.
  2. Parsed and formatted training datasets of 1.5M and 0.1M tweets have been included (a sketch of the parsing step follows this list).
  3. BITS Pilani dataset containing tweets from January 20, 2012 to September 27, 2012.
  4. Run RapidMiner 5.3 with -Xms2048m -Xmx3072m for faster processing. The SVM model is far slower than the other models, so avoid training it on more than the 0.1M-tweet dataset.
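
The Thinknook corpus ships as one large CSV, so a small script can cut it down to the 0.1M sample used for the slower SVM runs. This is a minimal sketch, not the repository's actual preprocessing; the column names ("Sentiment", "SentimentText"), the file names, and the tab-separated output format are assumptions.

```python
# Minimal sketch (not the repository's parser): sample and reformat the
# Thinknook CSV into a smaller training file. Column names are assumed to be
# "Sentiment" (0 = negative, 1 = positive) and "SentimentText"; the
# tab-separated output format is only an illustration.
import csv
import random

def sample_training_set(src_csv, dst_tsv, n_rows=100_000, seed=42):
    """Write a random sample of the corpus as label<TAB>text lines."""
    with open(src_csv, newline="", encoding="latin-1") as f:
        rows = [(r["Sentiment"], " ".join(r["SentimentText"].split()))
                for r in csv.DictReader(f)]
    random.Random(seed).shuffle(rows)
    with open(dst_tsv, "w", encoding="utf-8") as out:
        for label, text in rows[:n_rows]:
            out.write(f"{label}\t{text}\n")

if __name__ == "__main__":
    # File names are placeholders for the Thinknook CSV and the 0.1M sample.
    sample_training_set("thinknook_tweets.csv", "training_0.1M.tsv")
```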

Results

The SVM model, though slow, shows balanced performance, with a class precision of 70.79% for negative tweets and 70.49% for positive tweets. The Naive Bayes model is faster but less precise, with a class precision of 48.27% for negative tweets and 68.24% for positive tweets. In both cases, tweets classified as positive outnumber those classified as negative.

Code available at https://github.com/vineetdhanawat/twitter-sentiment-analysis
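
The performance vectors below use RapidMiner's layout: rows are predicted labels, columns are true labels, class precision is read across a row, and class recall down a column. As a generic reference (a hypothetical helper, not taken from the repository), those figures come out of a 2x2 confusion matrix like this:

```python
# Illustrative helper (not from the repository): class precision and recall
# for a confusion matrix laid out like RapidMiner's performance vector,
# where rows are predicted labels and columns are true labels.
def precision_recall(matrix):
    """matrix[i][j] = count of tweets predicted as class i whose true class is j."""
    precisions = [row[i] / sum(row) for i, row in enumerate(matrix)]
    recalls = [matrix[j][j] / sum(row[j] for row in matrix)
               for j in range(len(matrix))]
    return precisions, recalls

# Example with made-up counts:
prec, rec = precision_recall([[80, 20],
                              [30, 70]])
print(prec, rec)  # precisions [0.8, 0.7], recalls [0.727..., 0.777...]
```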

SVM model (~20 hours)

Performance Vector

                  true 0     true 1     class precision
pred. 0           24042      9922       70.79%
pred. 1           19482      46537      70.49%
class recall      55.24%     82.43%

Stats

Top 10 Positive and Negative words

Positive word   Weight                  Negative word   Weight
thank           0.06800427050495744     sad             0.06904954519705979
love            0.04238921785592977     miss            0.06799716497097386
good            0.03864780316342833     sorri           0.06447410364223946
great           0.03332699835307452     wish            0.04964308132602499
quot            0.028049576202737663    suck            0.04549754050714666
welcom          0.028045093611976712    bad             0.03882145370669514
awesom          0.027883840586310205    hate            0.038814744730334146
haha            0.027711586964757735    work            0.038456277249749565
nice            0.026502431781819224    poor            0.03537374379337165
happi           0.024842171425360552    want            0.03312521661076012
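
Note that these are Porter stems ("sorri", "happi", "welcom"), so any lookup against this table has to stem tweets the same way. Purely as an illustration of that point (this is not how the trained SVM model scores a tweet), a few of the weights above can be used to get a rough polarity signal, assuming NLTK's Porter stemmer is available:

```python
# Rough illustration only: score a tweet by summing stemmed-word weights
# (weights rounded from the table above). This is NOT the trained SVM's
# decision function; it just shows why the table contains Porter stems
# such as "sorri" and "happi".
from nltk.stem import PorterStemmer

POSITIVE = {"thank": 0.0680, "love": 0.0424, "good": 0.0386, "happi": 0.0248}
NEGATIVE = {"sad": 0.0690, "miss": 0.0680, "sorri": 0.0645, "bad": 0.0388}

stemmer = PorterStemmer()

def rough_polarity(tweet):
    """Return (positive score, negative score) from the sample weights above."""
    stems = [stemmer.stem(token) for token in tweet.lower().split()]
    pos = sum(POSITIVE.get(s, 0.0) for s in stems)
    neg = sum(NEGATIVE.get(s, 0.0) for s in stems)
    return pos, neg

# "happy" stems to "happi" and "thanks" to "thank", so the positive score wins.
print(rough_polarity("so happy thanks BITS Pilani"))
```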

Sentiment Ratio

Positive Tweets: 4759
Negative Tweets: 1552

Naive Bayes (~4 hours)

Performance Vector

                  true 0     true 1     class precision
pred. 0           34413      36884      48.27%
pred. 1           9111       19575      68.24%
class recall      79.07%     34.67%

Sentiment Ratio

Positive Tweets: 3436
Negative Tweets: 2875