Sentiment-Oriented Retrieval on Climate Textual Data

#AI #ArtificialIntelligence #NLP

Sentiment-Oriented Retrieval / Opinionated Retrieval

In this article, we apply sentiment analysis to the climate corpus we collected some time back.

Question: Why do we need to do this?

Answer: Once we have the documents and text fragments with high positive and high negative sentiment scores, we have a form of retrieval and ranking. The ranking is twofold: first we rank the sentences in the corpus, then we rank the documents. This narrows the search to the most important parts, which can then be studied and analyzed further. It can be understood as sentiment-oriented Information Retrieval. However, it is not the same as conventional Information Retrieval: these are opinionated retrievals.

Question: What are the outputs of climate analysis with sentiment scores?

Answer: The outputs are filenames and text fragments that accumulate the most sentiment. These files carry highly opinionated sentences and text fragments, and hence are an essential part of retrieval when retrieval is considered from the point of view of sentiment. Two kinds of ranking are produced here: a ranking by positive sentiment and a ranking by negative sentiment. The top-ranked parts can be studied further and processed again with other NLP tools.
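One way to turn sentence-level scores into the two document rankings is to add up the scores per file. The article does not spell out how files are ranked, so the summing rule and the function name `rank_documents` are assumptions made for illustration. The sample records reuse sentences and scores from the outputs reported in this post, with the file paths shortened.

```python
from collections import defaultdict

def rank_documents(records, kind="positive"):
    """Rank filenames by accumulated sentence sentiment (assumed rule: sum).

    Each record has the form [positive, negative, sentence, filename].
    """
    if kind not in ("positive", "negative"):
        raise ValueError("kind must be 'positive' or 'negative'")
    index = 0 if kind == "positive" else 1
    totals = defaultdict(float)
    for record in records:
        totals[record[3]] += record[index]
    # Highest accumulated score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Sample records taken from the outputs in this post (paths shortened).
records = [
    [0.2142, 0.0, "Different places can have different climates.", "1.txt"],
    [0.05, 0.175, "Sometimes it is cold.", "1.txt"],
    [0.1666, 0.02083, "That is a good development.", "30.txt"],
    [0.0416, 0.1805, "Unfortunately, we are in a climate emergency.", "30.txt"],
]

print(rank_documents(records, "positive"))
print(rank_documents(records, "negative"))
```

Summing favours long files with many opinionated sentences; averaging per file or taking the maximum sentence score are equally plausible alternatives.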

Sentiment Analysis of Climate Corpus

The steps involved in analyzing the sentiment of the documents in the climate corpus are:

Step 1. Read in the files from the repository. You can upload the files or read them from drive.

Step 2. For each file, read each sentence.

Step 3. For each sentence, read each word and tag it with its part of speech (POS).

Step 4. Determine the sentiment of each word for the POS tag assigned to it by the sentence tagger in Step 3.

Step 5. Compute the total positive score and the total negative score of the sentence.

Step 6. Save the details in a dictionary: each sentence, its total positive score, its total negative score, and its filename.

Step 7. Sort the dictionary by positive score and, separately, by negative score to find the most positive and the most negative sentences.

Step 8. Output the file names and highlight the sentences.
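The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the results in this post: `TOY_LEXICON` and `TOY_TAGS` are hypothetical stand-ins for a real sentiment lexicon and POS tagger, the sentence splitter is deliberately naive, and all function names are made up for the example.

```python
import re
from pathlib import Path

# Hypothetical toy lexicon: (word, POS tag) -> (positive, negative) score.
# A real run would look these up in a proper sentiment lexicon.
TOY_LEXICON = {
    ("important", "ADJ"): (0.75, 0.0),
    ("good", "ADJ"): (0.75, 0.0),
    ("emergency", "NOUN"): (0.0, 0.5),
    ("cold", "ADJ"): (0.0, 0.75),
}
# Hypothetical stand-in for a POS tagger: word -> tag.
TOY_TAGS = {word: tag for word, tag in TOY_LEXICON}

def read_corpus(folder):
    """Step 1: map each .txt file path in `folder` to its text."""
    paths = sorted(Path(folder).glob("*.txt"))
    return {str(p): p.read_text(encoding="utf-8") for p in paths}

def split_sentences(text):
    """Step 2: naive split after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tag_words(sentence):
    """Step 3: lower-case the words and attach a (toy) POS tag to each."""
    words = re.findall(r"[A-Za-z'-]+", sentence.lower())
    return [(w, TOY_TAGS.get(w, "X")) for w in words]

def score_sentence(sentence):
    """Steps 4-5: total positive and negative score of one sentence."""
    tagged = tag_words(sentence)
    pos = sum(TOY_LEXICON.get(t, (0.0, 0.0))[0] for t in tagged)
    neg = sum(TOY_LEXICON.get(t, (0.0, 0.0))[1] for t in tagged)
    return pos, neg

def score_corpus(docs):
    """Step 6: one [positive, negative, sentence, filename] record per sentence."""
    records = []
    for filename, text in docs.items():
        for sentence in split_sentences(text):
            pos, neg = score_sentence(sentence)
            records.append([pos, neg, sentence, filename])
    return records

def top_sentences(records, kind="positive", k=5):
    """Step 7: the k records with the highest positive or negative score."""
    index = 0 if kind == "positive" else 1
    return sorted(records, key=lambda r: r[index], reverse=True)[:k]

# Step 8: print the top records for an in-memory demo document.
docs = {"demo.txt": "That is a good development. Sometimes it is cold."}
for record in top_sentences(score_corpus(docs), "positive", k=1):
    print(record)
```

To run it on real files, pass the result of `read_corpus(folder)` to `score_corpus`. Dividing each total by the number of tokens would give a length-normalized variant, so that long sentences do not dominate the ranking.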


Here are the outputs on the corpus of 66 climate files we are working with. The corpus is limited to 66 files for now because of computational limits.

The outputs are in the following form:

[positive sentiment score, negative sentiment score, sentence/text fragment, filename]

The files ranked by positive sentiment, highest first:

[0.2291, 0.0625, ‘But they are very important.’, ‘/content/drive/MyDrive/IR/32.txt’]

[0.2166, 0.025, ‘Plans must be ambitious, have integrity and transparency, be credible and fair.’, ‘/content/drive/MyDrive/IR/54.txt’]

[0.2142, 0.0, ‘Different places can have different climates.’, ‘/content/drive/MyDrive/IR/1.txt’]

[0.1875, 0.01562, ‘Climate can be different for different seasons.’, ‘/content/drive/MyDrive/IR/1.txt’]

[0.1875, 0.02083, ‘Another important part is information.’, ‘/content/drive/MyDrive/IR/29.txt’]

[0.1666, 0.0, ‘Absolutely yes.’, ‘/content/drive/MyDrive/IR/28.txt’]

[0.1666, 0.02083, ‘That is a good development.’, ‘/content/drive/MyDrive/IR/30.txt’]

[0.16071, 0.0714, ‘There is also a moral imperative.’, ‘/content/drive/MyDrive/IR/26.txt’]

[0.1590, 0.0, ‘And they all have a really important role to play.’, ‘/content/drive/MyDrive/IR/22.txt’]

The files ranked by negative sentiment, highest first:

[0.02777, 0.1944, ‘Global climate change is not a future problem.’, ‘/content/drive/MyDrive/IR/6.txt’]

[0.0416, 0.1805, ‘Unfortunately, we are in a climate emergency.’, ‘/content/drive/MyDrive/IR/30.txt’]

[0.05, 0.175, ‘Sometimes it is cold.’, ‘/content/drive/MyDrive/IR/1.txt’]

[0.04545, 0.1590, ‘Similarly, deforestation and other environmentally destructive activities are disqualifying.’, ‘/content/drive/MyDrive/IR/53.txt’]

[0.025, 0.15, ‘Non-economic loss and damage are negative impacts where it is difficult or infeasible to assign a monetary value to.’, ‘/content/drive/MyDrive/IR/20.txt’]

[0.05, 0.15, ‘What is energy poverty?’, ‘/content/drive/MyDrive/IR/27.txt’]

Analysis & Future Directions

The outputs above are the documents, the text fragments (sentences), and their associated positive and negative sentiment scores. A reader like you may be interested in opinionated documents, and a typical Information Retrieval model may miss key sentiments. The retrieved documents can be read and analyzed further. The results can also be combined with those of other retrieval engines to get the best of both the sentiment scores and the retrieval algorithms.
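As a rough illustration of combining sentiment scores with another retrieval engine, the sketch below blends two per-document scores with a weight. Everything here is an assumption made for the example: the function name `fuse_scores`, the min-max normalization, the weight `alpha`, and the made-up scores. It is one simple option among many, not a method used in this post.

```python
def fuse_scores(relevance, sentiment, alpha=0.5):
    """Blend per-document relevance and sentiment scores.

    relevance, sentiment: dicts mapping document name -> score.
    alpha: weight on relevance; (1 - alpha) goes to sentiment.
    Returns (document, fused score) pairs, highest score first.
    """
    def normalise(scores):
        # Min-max scale to [0, 1] so the two score types are comparable.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}

    rel, sen = normalise(relevance), normalise(sentiment)
    fused = {
        doc: alpha * rel.get(doc, 0.0) + (1 - alpha) * sen.get(doc, 0.0)
        for doc in set(rel) | set(sen)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Made-up scores for three hypothetical documents.
relevance = {"a.txt": 2.0, "b.txt": 1.0, "c.txt": 0.0}
sentiment = {"a.txt": 0.0, "b.txt": 0.1, "c.txt": 0.2}

print(fuse_scores(relevance, sentiment, alpha=0.8))
```

Setting `alpha` close to 1 keeps the ordering of the retrieval engine, while values close to 0 favour the most opinionated documents.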

Published by Nidhika

Hi! Apart from my profession, I have an inherent interest in writing, especially about global issues of concern, fiction blogs, poems and stories, as well as painting, cooking, photography and music, to mention a few. Most importantly, on this website you can find my suggestions on the latest problems, views and ideas, my poems, stories, novels, some comments, proposals, blogs, personal experiences and, occasionally, very short glimpses of my research work.
