In the age of social media, understanding public sentiment can offer valuable insights for businesses, researchers, and individuals alike. With its vast and diverse user base, Twitter is an excellent platform for sentiment analysis. In this tutorial, we will walk through building a Twitter sentiment analysis tool using Python. We'll use the IFTTT service to collect tweets and RapidMiner for the natural language processing and model training.

Setting Up Your Environment

To monitor Twitter for keywords like #BITSPilani, #BITSGoa, #BITSHyd, #BITSDubai, #BITSAA, and #BITS, you can use IFTTT, although the generic 'BITS' keyword will introduce some noise. Since IFTTT no longer supports live Twitter searches, you can switch to Zapier to build your dataset for analysis. Both services send you emails containing the tweet data in an easily parsable format. Alternatively, you can use the Twitter API to crawl a list of tweets and analyze the text directly.

IFTTT email format

ifttt

via task 618721:
http://ifttt.com/tasks/618721

Agam, the band, with BITSian roots http://t.co/EvIumdmJ http://twitter.com/BITSAA/status/166177110573064192
by http://twitter.com/BITSAA
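
Because the notification email body has a fixed shape (a task header, the tweet text followed by its t.co link and status URL, then a 'by' line with the author's profile URL), it can be parsed with a couple of regular expressions. The sketch below is written against the sample above only; the exact layout of your own notification emails may differ.

```python
# Minimal sketch: extract tweet text, status URL, and author from an
# IFTTT/Zapier notification email body shaped like the sample above.
# The layout is assumed from that sample; adjust the patterns if your
# notification emails differ.
import re

def parse_notification(body):
    status = re.search(r"(http://twitter\.com/\S+/status/\d+)", body)
    author = re.search(r"^by\s+(http://twitter\.com/\S+)", body, re.MULTILINE)
    text = None
    if status:
        # The tweet text is the part of the line preceding the status URL.
        line = next(l for l in body.splitlines() if status.group(1) in l)
        text = line.split(status.group(1))[0].strip()
    return {
        "text": text,
        "status_url": status.group(1) if status else None,
        "author_url": author.group(1) if author else None,
    }

sample = """ifttt

via task 618721:
http://ifttt.com/tasks/618721

Agam, the band, with BITSian roots http://t.co/EvIumdmJ http://twitter.com/BITSAA/status/166177110573064192
by http://twitter.com/BITSAA
"""
print(parse_notification(sample))
```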

Usage

  1. Twitter training dataset taken from Thinknook.
  2. Parsed and formatted training datasets of 1.5M and 0.1M tweets have been included (a sketch of the parsing step follows this list).
  3. BITS Pilani dataset containing tweets from January 20, 2012 to September 27, 2012.
  4. Run RapidMiner 5.3 with -Xms2048m -Xmx3072m for faster processing. The SVM model is far slower than the other models, so avoid training it on more than the 0.1M-tweet dataset.
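
The Thinknook corpus ships as one large CSV, so a small script can cut it down to the 0.1M sample used for the slower SVM runs. This is a minimal sketch, not the repository's actual preprocessing; the column names ("Sentiment", "SentimentText"), the file names, and the tab-separated output format are assumptions.

```python
# Minimal sketch (not the repository's parser): sample and reformat the
# Thinknook CSV into a smaller training file. Column names are assumed to be
# "Sentiment" (0 = negative, 1 = positive) and "SentimentText"; the
# tab-separated output format is only an illustration.
import csv
import random

def sample_training_set(src_csv, dst_tsv, n_rows=100_000, seed=42):
    """Write a random sample of the corpus as label<TAB>text lines."""
    with open(src_csv, newline="", encoding="latin-1") as f:
        rows = [(r["Sentiment"], " ".join(r["SentimentText"].split()))
                for r in csv.DictReader(f)]
    random.Random(seed).shuffle(rows)
    with open(dst_tsv, "w", encoding="utf-8") as out:
        for label, text in rows[:n_rows]:
            out.write(f"{label}\t{text}\n")

if __name__ == "__main__":
    # File names are placeholders for the Thinknook CSV and the 0.1M sample.
    sample_training_set("thinknook_tweets.csv", "training_0.1M.tsv")
```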

Results

The SVM model, though slow, shows balanced performance, with a class precision of 70.79% for negative tweets and 70.49% for positive tweets. The Naive Bayes model is faster but less precise, with a class precision of 48.27% for negative tweets and 68.24% for positive tweets. In both cases, tweets classified as positive outnumber those classified as negative.

Code available at https://github.com/vineetdhanawat/twitter-sentiment-analysis
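
The performance vectors below use RapidMiner's layout: rows are predicted labels, columns are true labels, class precision is read across a row, and class recall down a column. As a generic reference (a hypothetical helper, not taken from the repository), those figures come out of a 2x2 confusion matrix like this:

```python
# Illustrative helper (not from the repository): class precision and recall
# for a confusion matrix laid out like RapidMiner's performance vector,
# where rows are predicted labels and columns are true labels.
def precision_recall(matrix):
    """matrix[i][j] = count of tweets predicted as class i whose true class is j."""
    precisions = [row[i] / sum(row) for i, row in enumerate(matrix)]
    recalls = [matrix[j][j] / sum(row[j] for row in matrix)
               for j in range(len(matrix))]
    return precisions, recalls

# Example with made-up counts:
prec, rec = precision_recall([[80, 20],
                              [30, 70]])
print(prec, rec)  # precisions [0.8, 0.7], recalls [0.727..., 0.777...]
```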

SVM model (~20 hours)

Performance Vector

                  true 0     true 1     class precision
pred. 0           24042      9922       70.79%
pred. 1           19482      46537      70.49%
class recall      55.24%     82.43%

Stats

Top 10 Positive and Negative words

Positive word   Weight                  Negative word   Weight
thank           0.06800427050495744     sad             0.06904954519705979
love            0.04238921785592977     miss            0.06799716497097386
good            0.03864780316342833     sorri           0.06447410364223946
great           0.03332699835307452     wish            0.04964308132602499
quot            0.028049576202737663    suck            0.04549754050714666
welcom          0.028045093611976712    bad             0.03882145370669514
awesom          0.027883840586310205    hate            0.038814744730334146
haha            0.027711586964757735    work            0.038456277249749565
nice            0.026502431781819224    poor            0.03537374379337165
happi           0.024842171425360552    want            0.03312521661076012
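
Note that these are Porter stems ("sorri", "happi", "welcom"), so any lookup against this table has to stem tweets the same way. Purely as an illustration of that point (this is not how the trained SVM model scores a tweet), a few of the weights above can be used to get a rough polarity signal, assuming NLTK's Porter stemmer is available:

```python
# Rough illustration only: score a tweet by summing stemmed-word weights
# (weights rounded from the table above). This is NOT the trained SVM's
# decision function; it just shows why the table contains Porter stems
# such as "sorri" and "happi".
from nltk.stem import PorterStemmer

POSITIVE = {"thank": 0.0680, "love": 0.0424, "good": 0.0386, "happi": 0.0248}
NEGATIVE = {"sad": 0.0690, "miss": 0.0680, "sorri": 0.0645, "bad": 0.0388}

stemmer = PorterStemmer()

def rough_polarity(tweet):
    """Return (positive score, negative score) from the sample weights above."""
    stems = [stemmer.stem(token) for token in tweet.lower().split()]
    pos = sum(POSITIVE.get(s, 0.0) for s in stems)
    neg = sum(NEGATIVE.get(s, 0.0) for s in stems)
    return pos, neg

# "happy" stems to "happi" and "thanks" to "thank", so the positive score wins.
print(rough_polarity("so happy thanks BITS Pilani"))
```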

Sentiment Ratio

Positive Tweets: 4759
Negative Tweets: 1552

Naive Bayes (~4 hours)

Performance Vector

                  true 0     true 1     class precision
pred. 0           34413      36884      48.27%
pred. 1           9111       19575      68.24%
class recall      79.07%     34.67%

Sentiment Ratio

Positive Tweets: 3436
Negative Tweets: 2875