This is the first in a series of articles on evaluating sentiment scores and putting them to use in applications. Upcoming articles will show some applications of the topics introduced here. Today's topic is an introduction to sentiment analysis. This is not a basics session but a review session.
Sentiment analysis can be defined in several ways, and the limits of its use have not been set. Broadly, the following holds for sentiment analysis:
1. The sentiment itself can be processed in several ways.
2. The sentiment or opinion orientation can be found as a category.
3. The positivity and negativity of a text can be measured as a numerical score.
4. Neutrality can be measured too, as a tag or as a score, with its own emphasis.
5. The text under consideration can be as short as a tweet or as long as a full story.
6. Each sentiment is assigned a value, which can itself be composed in various ways.
7. A sentiment lexicon supports the unsupervised approach, but the lexicon itself can be constructed in either an unsupervised or a supervised way.
8. Sentiments can feed into other computations, in NLP, image data and other text analytics roles.
9. Sentiment methods, once learned, come in handy in several applications.
10. A sentiment classifier, once trained, can give the sentiment orientation of text components quickly, whereas looking words up in a sentiment lexicon can be clumsy and hence costly in computation time.
11. Keyword reduction can speed up the computation of sentiment scores.
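To make the last point concrete, here is a minimal sketch of one possible reading of keyword reduction: dropping stop words before any scoring takes place. The stop-word list and the function name reduce_keywords are hypothetical and only for illustration.

```python
# Hypothetical stop-word list; a real application would use a fuller one.
STOP_WORDS = {"this", "is", "an", "a", "the", "of", "and", "to"}
PUNCTUATION = ".,!?;:\"'()"


def reduce_keywords(text):
    """Lowercase the text, strip surrounding punctuation, and drop stop words."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token and token not in STOP_WORDS]


reduced = reduce_keywords("This is an awesome story.")
print(reduced)  # ['awesome', 'story']
```

Fewer tokens mean fewer lexicon lookups or classifier features. Note that removing neutral words changes any score that is a proportion of the whole text, so this kind of reduction suits sum-based scores and classifiers better.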
Popular Toolkits in Python for Sentiment Analysis
NLTK ships with a pretrained sentiment-scoring toolkit named Valence Aware Dictionary and sEntiment Reasoner (VADER). The VADER module provides the class SentimentIntensityAnalyzer, which bundles the sentiment-related objects and functions. Its method polarity_scores measures the sentiment of the text fragment sent to it. The output is a dictionary of name and value pairs; here is a sample output:
{'neg': 0.0, 'neu': 0.494, 'pos': 0.506, 'compound': 0.6249}
The output was for the string “This is an awesome story.”
The output contains the following values:
- Neg – The proportion of the text fragment that is negative
- Pos – The proportion of the text that is positive
- Neu – The proportion of the text that is neutral
- Compound – The overall sentiment of the text, normalized to lie between -1 (most negative) and +1 (most positive)
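A compound score is often turned into a category. The sketch below shows one way to do that; the cut-off of 0.05 is a commonly used convention and is an assumption here, not something fixed by the scores themselves, and the function name label_from_compound is hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a three-way sentiment label.

    The threshold is an assumed convention; tune it for your own data.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# The sample output shown earlier, typed in by hand.
sample = {"neg": 0.0, "neu": 0.494, "pos": 0.506, "compound": 0.6249}
label = label_from_compound(sample["compound"])
print(label)  # positive
```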
But how do these values come about? VADER was built with the help of Amazon Mechanical Turk, whose human raters supplied the sentiment value of each word in the dictionary.
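To see how word-level values can roll up into scores like these, here is a heavily simplified sketch of lexicon-based scoring. It is not the VADER implementation: the one-word lexicon, the valence of 3.1 for "awesome" and the normalization constant 15 are assumptions for illustration, and the handling of negation, intensifiers, punctuation and capitalization is left out entirely.

```python
import math

# Toy lexicon: the single entry and its valence are assumptions for illustration.
TOY_LEXICON = {"awesome": 3.1}
ALPHA = 15  # assumed normalization constant
PUNCTUATION = ".,!?;:\"'()"


def toy_polarity_scores(text):
    """Score text from per-word valences; words outside the lexicon are neutral."""
    tokens = [word.strip(PUNCTUATION).lower() for word in text.split()]
    valences = [TOY_LEXICON.get(token, 0.0) for token in tokens if token]

    # Compound: the summed valence squashed into the range [-1, 1].
    total_valence = sum(valences)
    compound = total_valence / math.sqrt(total_valence ** 2 + ALPHA)

    # Proportions: a sentiment word weighs |valence| + 1, a neutral word weighs 1.
    pos_weight = sum(v + 1 for v in valences if v > 0)
    neg_weight = sum(-v + 1 for v in valences if v < 0)
    neu_weight = sum(1 for v in valences if v == 0)
    total = pos_weight + neg_weight + neu_weight
    if total == 0:  # empty text
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}

    return {
        "neg": round(neg_weight / total, 3),
        "neu": round(neu_weight / total, 3),
        "pos": round(pos_weight / total, 3),
        "compound": round(compound, 4),
    }


scores = toy_polarity_scores("This is an awesome story.")
print(scores)  # {'neg': 0.0, 'neu': 0.494, 'pos': 0.506, 'compound': 0.6249}
```

Under these assumptions the sketch gives the same numbers as the sample output for the same sentence, which suggests how a single strongly positive word among four neutral ones produces that split.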
Sentiment Analysis – Unsupervised
The technique of obtaining sentiment values for words, in a scheme where no training is provided to the algorithm, is referred to as the unsupervised technique of sentiment assignment. The method requires human tagging and judgement, and it can be biased depending on the human evaluators who do the tagging. These methods typically amount to generating a sentiment lexicon; such lexicons can be bulky, and lookups in them can take time.
Sentiment Analysis – Supervised
The supervised techniques work on a different footing altogether. Sentiment analysis with a supervised algorithm requires the following:
- Supervised algorithm- An algorithm that can learn the classification task.
- The classification task- Is it a two-class classification (good or bad sentiment), a three-class classification (positive, negative and neutral sentiment), or a classification of an aggregate measure of sentiment that takes the text as a whole, along with scores?
- The training corpora- Sets of pre-labelled data used to train and build the supervised model.
- The testing corpora- Sets of data that are also labelled, but are held out and used to compute the accuracy of the trained model when it is evaluated.
Finally, the model can be applied to any new text fragment. The sentiment values it reports are subject to its classification accuracy, say p%, as measured on the test data.
The classifier can be a supervised algorithm such as Naive Bayes or an SVM. Additionally, because the data is often sparse and high-dimensional, dimensionality reduction is often applied as well.
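The supervised workflow can be sketched with a small multinomial Naive Bayes classifier written from scratch. The two-class task, the tiny training and testing sets and all function names below are made up for illustration; a real project would use far more data and most likely a library implementation.

```python
import math
from collections import Counter, defaultdict

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    stripped = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in stripped if token]


def train_naive_bayes(labelled_texts):
    """Count documents per class and words per class over the training corpus."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_texts:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    tokens = [token for token in tokenize(text) if token in vocabulary]
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)  # class prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical pre-labelled training corpus.
train_set = [
    ("I love this wonderful movie", "pos"),
    ("What an awesome and great story", "pos"),
    ("A great cast and a wonderful plot", "pos"),
    ("I hate this terrible movie", "neg"),
    ("What a boring and awful story", "neg"),
    ("A terrible cast and an awful plot", "neg"),
]
# Hypothetical labelled testing corpus, held out from training.
test_set = [
    ("An awesome plot", "pos"),
    ("A boring movie", "neg"),
    ("I love the great cast", "pos"),
    ("I hate the awful plot", "neg"),
]

model = train_naive_bayes(train_set)
correct = sum(predict(model, text) == label for text, label in test_set)
accuracy = correct / len(test_set)
print(f"accuracy on held-out data: {accuracy:.0%}")
```

The accuracy printed at the end plays the role of the p% figure: it is what qualifies every label the model later assigns to new text. On a toy corpus this small the number says little, so treat it as a demonstration of the steps only.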
Sentiment Analysis – Semi-Supervised
Sentiment analysis can be performed with a semi-supervised technique as well. Here, one part of the algorithm uses labelled text data, while the rest of the data is used to measure classification accuracy and to extend the classifier's abilities.
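One common semi-supervised scheme is self-training, sketched below as an illustration rather than as the specific method meant above. A Naive Bayes model is trained on a few labelled texts, it labels the unlabelled texts it is confident about, and those are added to the training data before the next round. The data, the confidence threshold of 0.6 and all function names are assumptions for this sketch.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    stripped = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in stripped if token]


def train(labelled):
    """Count documents and words per class for a Naive Bayes model."""
    docs, words = Counter(), {}
    for text, label in labelled:
        docs[label] += 1
        words.setdefault(label, Counter()).update(tokenize(text))
    vocab = set().union(*words.values())
    return docs, words, vocab


def posterior(model, text):
    """Return {label: probability} from Laplace-smoothed Naive Bayes."""
    docs, words, vocab = model
    tokens = [token for token in tokenize(text) if token in vocab]
    log_scores = {}
    for label in docs:
        denominator = sum(words[label].values()) + len(vocab)
        score = math.log(docs[label] / sum(docs.values()))
        for token in tokens:
            score += math.log((words[label][token] + 1) / denominator)
        log_scores[label] = score
    peak = max(log_scores.values())  # subtract the peak for numerical stability
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: weight / norm for label, weight in weights.items()}


def self_train(labelled, unlabelled, threshold=0.6, rounds=5):
    """Grow the labelled set with confident predictions on unlabelled text."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        model = train(labelled)
        confident, remaining = [], []
        for text in pool:
            probs = posterior(model, text)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:  # nothing new was learned, so stop early
            break
        labelled += confident
        pool = remaining
    return labelled, pool


# Hypothetical data: two labelled texts and four unlabelled ones.
seed = [("I love this great movie", "pos"), ("I hate this awful movie", "neg")]
unlabelled = [
    "The great and wonderful story",
    "The awful and boring story",
    "The wonderful plot",
    "The boring plot",
]

grown, leftover = self_train(seed, unlabelled)
for text, label in grown:
    print(label, "|", text)
```

In this toy run the last two texts share no sentiment word with the seed data, so they can only be labelled in the second round, after "wonderful" and "boring" have been picked up from the first round's pseudo-labels. The risk of self-training is that a wrong confident label gets reinforced, which is why a held-out labelled set for measuring accuracy still matters.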