#AI, #ArtificialIntelligence #NLP
Sentiment-Oriented Retrieval / Opinionated Retrieval
In this article, we apply sentiment analysis to the climate corpus we collected some time back.
Question: Why do we need to do this?
Answer: Once we have the documents and text fragments with high positive and high negative sentiment scores, we get a form of retrieval and ranking. The ranking is twofold: first we rank the sentences in the corpus, then we rank the documents. This narrows the search to the most important parts to be studied and analyzed further. This can be understood as sentiment-oriented Information Retrieval. However, it is not the same as conventional Information Retrieval: these retrievals are opinionated retrievals.
Question: What are the outputs of analyzing the climate corpus with sentiment scores?
Answer: The outputs are filenames and text fragments with the highest accumulated sentiment. These files carry highly opinionated sentences and text fragments, and hence are an essential part of retrieval when retrieval is considered from the point of view of sentiment. Two rankings are produced here: one for positive sentiment and one for negative sentiment. These parts can be studied further and processed again with other NLP tools.
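To make the twofold ranking concrete, here is a minimal sketch of how sentence-level scores might be rolled up into a document-level ranking. The article does not say how document scores are derived, so summing the sentence scores per file is an assumption, and the records, filenames, and the `rank_documents` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sentence-level records: (positive, negative, sentence, filename).
sentence_scores = [
    (0.50, 0.00, "Sentence A.", "doc1.txt"),
    (0.25, 0.125, "Sentence B.", "doc1.txt"),
    (0.00, 0.50, "Sentence C.", "doc2.txt"),
    (0.125, 0.25, "Sentence D.", "doc2.txt"),
]


def rank_documents(rows):
    """Sum sentence scores per file (an assumption) and rank files both ways."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for pos, neg, _sentence, filename in rows:
        totals[filename][0] += pos
        totals[filename][1] += neg
    # One ranking per polarity, highest accumulated score first.
    by_positive = sorted(totals, key=lambda f: totals[f][0], reverse=True)
    by_negative = sorted(totals, key=lambda f: totals[f][1], reverse=True)
    return by_positive, by_negative


by_positive, by_negative = rank_documents(sentence_scores)
print(by_positive)
print(by_negative)
```

Other aggregations, such as the mean or the maximum sentence score per file, would fit the same structure and may rank long and short documents differently.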
Sentiment Analysis of Climate Corpus
The steps involved in analyzing the sentiment of documents in the climate corpus are:
Step 1. Read in the files from the repository. You can upload the files or read them from drive.
Step 2. For each file, read each sentence.
Step 3. For each sentence, read each word and tag it with its part of speech (POS).
Step 4. Determine the sentiment of each word for the POS tag detected by the tagger in Step 3.
Step 5. Compute the total positive score and the total negative score of the sentence.
Step 6. Save the details for every sentence in a dictionary: the sentence, its total positive score, its total negative score, and its filename.
Step 7. Sort the dictionary entries by the most positive scores and by the most negative scores.
Step 8. Output the file names and highlight the sentences.
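The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for this article: the lexicon `TOY_LEXICON`, the tag lookup `TOY_TAGS`, the scores in them, and all function names are hypothetical stand-ins, and the files are replaced by in-memory strings. In practice, a real POS tagger and a POS-aware sentiment lexicon (for example SentiWordNet, available through NLTK) could take their place.

```python
import re

# Hypothetical toy lexicon: (word, POS tag) -> (positive score, negative score).
TOY_LEXICON = {
    ("good", "ADJ"): (0.75, 0.0),
    ("important", "ADJ"): (0.5, 0.0),
    ("emergency", "NOUN"): (0.0, 0.5),
    ("cold", "ADJ"): (0.0, 0.625),
}

# Hypothetical stand-in for a real POS tagger: a fixed word -> tag lookup.
TOY_TAGS = {"good": "ADJ", "important": "ADJ", "emergency": "NOUN", "cold": "ADJ"}


def split_sentences(text):
    """Step 2: naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_words(sentence):
    """Step 3: tokenize a sentence and attach a (toy) POS tag to each word."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return [(w, TOY_TAGS.get(w, "OTHER")) for w in words]


def score_sentence(sentence):
    """Steps 4-5: look up each (word, tag) pair and total both scores."""
    pos_total = neg_total = 0.0
    for word, tag in tag_words(sentence):
        pos, neg = TOY_LEXICON.get((word, tag), (0.0, 0.0))
        pos_total += pos
        neg_total += neg
    return pos_total, neg_total


def analyze(files):
    """Step 6: build one record per sentence from a {filename: text} mapping."""
    records = []
    for filename, text in files.items():
        for sentence in split_sentences(text):
            pos, neg = score_sentence(sentence)
            records.append(
                {"pos": pos, "neg": neg, "sentence": sentence, "file": filename}
            )
    return records


def rank(records, key, top_k=3):
    """Step 7: sort records by the chosen score, highest first."""
    return sorted(records, key=lambda r: r[key], reverse=True)[:top_k]


# Step 1 stand-in: in-memory texts instead of files read from disk.
files = {
    "a.txt": "That is a good development. Sometimes it is cold.",
    "b.txt": "We are in a climate emergency. But they are very important.",
}
records = analyze(files)
most_positive = rank(records, "pos")
most_negative = rank(records, "neg")

# Step 8: print records as [positive, negative, sentence, filename].
for r in most_positive:
    print([r["pos"], r["neg"], r["sentence"], r["file"]])
```

The sketch stores the records as a list of small dictionaries rather than one large dictionary, since a list sorts directly on either score. The scores it prints are plain sums of made-up lexicon values, so they are not comparable to the numbers reported in this article.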

Outputs
Here are the outputs for the corpus of 66 climate files we are working on. We use only 66 files for now because of computational limits.
The outputs are in the following form:
[positive sentiment score, negative sentiment score, sentence/text fragment, filename]
The files in order of importance by positive sentiment:
[0.2291, 0.0625, ‘But they are very important.’, ‘/content/drive/MyDrive/IR/32.txt’]
[0.2166, 0.025, ‘Plans must be ambitious, have integrity and transparency, be credible and fair.’, ‘/content/drive/MyDrive/IR/54.txt’]
[0.2142, 0.0, ‘Different places can have different climates.’, ‘/content/drive/MyDrive/IR/1.txt’]
[0.1875, 0.01562, ‘Climate can be different for different seasons.’, ‘/content/drive/MyDrive/IR/1.txt’]
[0.1875, 0.02083, ‘Another important part is information.’, ‘/content/drive/MyDrive/IR/29.txt’]
[0.1666, 0.0, ‘Absolutely yes.’, ‘/content/drive/MyDrive/IR/28.txt’]
[0.1666, 0.02083, ‘That is a good development.’, ‘/content/drive/MyDrive/IR/30.txt’]
[0.16071, 0.0714, ‘There is also a moral imperative.’, ‘/content/drive/MyDrive/IR/26.txt’]
[0.1590, 0.0, ‘And they all have a really important role to play.’, ‘/content/drive/MyDrive/IR/22.txt’]
The files in order of importance by negative sentiment:
[0.02777, 0.1944, ‘Global climate change is not a future problem.’, ‘/content/drive/MyDrive/IR/6.txt’]
[0.0416, 0.1805, ‘Unfortunately, we are in a climate emergency.’, ‘/content/drive/MyDrive/IR/30.txt’]
[0.05, 0.175, ‘Sometimes it is cold.’, ‘/content/drive/MyDrive/IR/1.txt’]
[0.04545, 0.1590, ‘Similarly, deforestation and other environmentally destructive activities are disqualifying.’, ‘/content/drive/MyDrive/IR/53.txt’]
[0.025, 0.15, ‘Non-economic loss and damage are negative impacts where it is difficult or infeasible to assign a monetary value to.’, ‘/content/drive/MyDrive/IR/20.txt’]
[0.05, 0.15, ‘What is energy poverty?’, ‘/content/drive/MyDrive/IR/27.txt’]
Analysis & Future Directions
The outputs consist of documents, their text fragments (sentences), and the associated positive and negative sentiment scores. A reader like you may be interested in opinionated documents, and a typical Information Retrieval model may miss key sentiments. The retrieved documents can be read and analyzed further. The results can also be combined with those of other retrieval engines to get the best of both the sentiment scores and the retrieval algorithms.
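One possible way to combine sentiment scores with another retrieval engine is a simple weighted blend of the two scores per document. The sketch below is only an illustration of that idea: the `fuse_scores` helper, the min-max normalization, the `weight` parameter, and all scores and filenames are assumptions, not something this article has implemented.

```python
def normalize(scores):
    """Min-max normalize a {document: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: ((s - lo) / span if span else 0.0) for d, s in scores.items()}


def fuse_scores(retrieval, sentiment, weight=0.5):
    """Blend normalized retrieval and sentiment scores; higher weight favors retrieval."""
    r, s = normalize(retrieval), normalize(sentiment)
    docs = set(r) | set(s)
    # A document missing from one source contributes 0.0 for that source.
    fused = {d: weight * r.get(d, 0.0) + (1 - weight) * s.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: relevance from some retrieval engine, and an
# accumulated sentiment score per document.
retrieval = {"doc1.txt": 2.0, "doc2.txt": 4.0, "doc3.txt": 3.0}
sentiment = {"doc1.txt": 0.75, "doc2.txt": 0.25, "doc3.txt": 0.75}

ranking = fuse_scores(retrieval, sentiment)
print(ranking[0])
```

With an equal weight, a document that is moderately relevant but strongly opinionated can outrank one that is highly relevant but neutral. Setting `weight` closer to 1.0 shifts the ranking back toward the retrieval engine.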