#AI

This article presents a research-oriented approach to information retrieval over climate documents, in which documents are ranked by a fuzzy inference engine driven by tf-idf similarity and sentiment.

Suggestions for improving the work are given at several places.

This is an AI exercise; it can be developed into a full project with minor to major changes and further work. The implementation is in Python.

The code starts here.

# Run in a shell (or prefix with ! in a notebook)
pip install nltk

# Then, in Python
import nltk
nltk.download('all')

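The rest of the file pre-processing follows my previous articles; it is skipped here because it has been covered in prior writings.

Next, compute the tf-idf vector for each file. The climate dataset contains 66 files. Here docs_vector is the list of document texts; it is built in the sentiment loop further below, so that loop has to run first.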

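import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit a tf-idf model on the document texts, dropping English stop words
vecTfidf = TfidfVectorizer(analyzer='word', stop_words='english')
tf_idf = vecTfidf.fit_transform(docs_vector)

# Vocabulary terms, one per column
# (get_feature_names() was removed in scikit-learn 1.2)
tfidf_features = vecTfidf.get_feature_names_out()

# Dense view: one row per document, one column per term
tf_idfArray = pd.DataFrame(tf_idf.toarray())
print(tf_idfArray)
print(tfidf_features)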

The output looks like:
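For a feel of what such a matrix contains, the following is a minimal pure-Python sketch on three made-up sentences (not the climate dataset). It follows the smoothed idf formula and L2 normalisation that scikit-learn documents as the TfidfVectorizer defaults, but the tokeniser is a plain whitespace split with no stop-word removal, which is a simplifying assumption. The names tfidf_vectors and toy_docs are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, rows) with smoothed, L2-normalised tf-idf weights."""
    # Simplified tokeniser (assumption): lowercase and split on whitespace
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenised for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(term for doc in tokenised for term in set(doc))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    rows = []
    for doc in tokenised:
        counts = Counter(doc)
        row = [counts[term] * idf[term] for term in vocab]
        length = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / length for w in row])  # scale each row to unit length
    return vocab, rows


toy_docs = [
    "climate change affects every country",
    "sea levels rise as climate warms",
    "countries pledge to cut emissions",
]
vocab, rows = tfidf_vectors(toy_docs)
print(vocab)
for row in rows:
    print([round(w, 2) for w in row])
```

Each row is one document and each column one vocabulary term; a term that occurs in several documents (here "climate") gets a lower weight than a term unique to one document.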

Compute the cosine similarity of each document with the query:



import numpy as np
from numpy.linalg import norm


def cosinesim(array1, array2):
  # cosine similarity = dot product divided by the product of the vector norms
  cosine = np.dot(array1, array2) / (norm(array1) * norm(array2))
  return cosine

queryText = ["No country today is immune from the impacts of climate change."]

# Project the query into the same tf-idf space as the documents
queryVector = vecTfidf.transform(queryText)
queryVectorArray = queryVector.toarray()
one_d_Array = queryVectorArray[0]

# Cosine similarity between the query and every document (66 here)
cos_Sim_Query = []
for i in range(tf_idf.shape[0]):
  array_tfidf_ = tf_idf[i].toarray()   # shape (1, n_terms)
  cos_Sim_Query.append(cosinesim(array_tfidf_, one_d_Array)[0])
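One edge case is worth keeping in mind: if the query shares no term with the vocabulary, or a document is empty after pre-processing, the vector is all zeros and the division above is undefined. The following is a minimal sketch of a guarded similarity plus the location of the best match, written in plain Python on made-up vectors. The names safe_cosine and toy_vectors, and the choice to return 0.0 for a zero vector, are assumptions for illustration.

```python
import math


def safe_cosine(u, v):
    """Cosine similarity that returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat "no overlap at all" as zero similarity
    return dot / (norm_u * norm_v)


# Made-up tf-idf rows: the third document is empty after pre-processing
toy_vectors = [
    [0.0, 0.6, 0.8],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
toy_query = [0.0, 1.0, 0.0]

scores = [safe_cosine(doc, toy_query) for doc in toy_vectors]
# Index of the most similar document
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```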

The inputs of the fuzzy system are:

Similarity of the query to the document
Sentiment score of the document

The output of the fuzzy system is:

Rank of the document; here a rank from 1 to 10 is allocated

The retrieval output is generated from this rank.

These are all fuzzy linguistic variables, and they are defined below.

The fuzzy inference engine for the problem is described as follows:

import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# Antecedents (inputs) and consequent (output); each universe includes its upper bound
similarity_query = ctrl.Antecedent(np.linspace(0, 1, 11), 'similarity_query')
sentiment_score = ctrl.Antecedent(np.arange(0, 11, 1), 'sentiment_score')
rank = ctrl.Consequent(np.arange(0, 11, 1), 'rank')

# Triangular membership functions, given as [left foot, peak, right foot]
similarity_query['low'] = fuzz.trimf(similarity_query.universe, [0, 0, 0.3])
similarity_query['average'] = fuzz.trimf(similarity_query.universe, [0.1, 0.5, 0.7])
similarity_query['high'] = fuzz.trimf(similarity_query.universe, [0.5, 0.8, 1])

sentiment_score['low'] = fuzz.trimf(sentiment_score.universe, [0, 0, 2])
sentiment_score['medium'] = fuzz.trimf(sentiment_score.universe, [0, 4, 6])
sentiment_score['high'] = fuzz.trimf(sentiment_score.universe, [6, 9, 10])

rank['low'] = fuzz.trimf(rank.universe, [0, 0, 5])
rank['medium'] = fuzz.trimf(rank.universe, [1, 5, 7])
rank['high'] = fuzz.trimf(rank.universe, [5, 8, 10])

# Plot the membership functions
similarity_query['average'].view()
sentiment_score.view()
rank.view()

# Rule base: | is fuzzy OR, & is fuzzy AND
rule1 = ctrl.Rule(similarity_query['low'] | sentiment_score['low'], rank['low'])
rule2 = ctrl.Rule(similarity_query['average'] | sentiment_score['medium'], rank['medium'])
rule3 = ctrl.Rule(sentiment_score['high'] | similarity_query['high'], rank['high'])
rule4 = ctrl.Rule(similarity_query['high'] & sentiment_score['low'], rank['medium'])
rule5 = ctrl.Rule(similarity_query['high'] & sentiment_score['medium'], rank['medium'])
rule6 = ctrl.Rule(similarity_query['average'] & sentiment_score['low'], rank['medium'])
# Note: rule7 has the same antecedent as rule4 but assigns a different rank
rule7 = ctrl.Rule(similarity_query['high'] & sentiment_score['low'], rank['high'])
rule8 = ctrl.Rule(similarity_query['low'] & sentiment_score['high'], rank['high'])
rule9 = ctrl.Rule(similarity_query['low'] & sentiment_score['medium'], rank['medium'])
rule10 = ctrl.Rule(similarity_query['low'] & sentiment_score['low'], rank['low'])

rule1.view()

# Build the control system, then the simulation object that evaluates inputs
rankCtrl = ctrl.ControlSystem([rule1, rule2, rule3, rule4, rule5, rule6, rule7, rule8, rule9, rule10])
rankFIS = ctrl.ControlSystemSimulation(rankCtrl)

The memberships are defined as follows:
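To make the triangular shapes concrete, here is a minimal pure-Python sketch of what a triangular membership function evaluates to at a single point, using the three similarity sets from the code above. The helper name trimf_point is hypothetical; it is meant to mirror the usual definition of a triangle with feet a and c and peak b, not to replace fuzz.trimf.

```python
def trimf_point(x, abc):
    """Degree of membership of x in the triangle [a, b, c] (feet a and c, peak b)."""
    a, b, c = abc
    if x == b:
        return 1.0                      # peak, also covers shoulders where a == b
    if a < x < b:
        return (x - a) / (b - a)        # rising edge
    if b < x < c:
        return (c - x) / (c - b)        # falling edge
    return 0.0                          # outside the triangle


similarity_sets = {
    'low': [0, 0, 0.3],
    'average': [0.1, 0.5, 0.7],
    'high': [0.5, 0.8, 1],
}

for x in (0.0, 0.2, 0.6, 0.9):
    degrees = {name: round(trimf_point(x, abc), 2)
               for name, abc in similarity_sets.items()}
    print(x, degrees)
```

A similarity of 0.6, for instance, belongs partly to 'average' and partly to 'high', which is what lets several rules fire at once.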

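Compute the sentiment of each document and store the values, one per document, in an array. The same loop collects the file names and the document texts (docs_vector) used for the tf-idf step.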

from textblob import TextBlob

fileNames = []
docs_vector = []
sentiment_doc = []

# document_text maps each file name to its pre-processed text
for file in document_text:
  fileNames.append(file)
  docs_vector.append(document_text[file])
  # TextBlob polarity lies in [-1.0, 1.0]
  polarity = TextBlob(document_text[file]).sentiment.polarity
  print(file)
  print(polarity)
  sentiment_doc.append(polarity)

Sample output:

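Compute the rank with the fuzzy inference engine:

rankFIS_Results = []

for i in range(len(cos_Sim_Query)):
  rankFIS.input['similarity_query'] = cos_Sim_Query[i]
  # TextBlob polarity is in [-1, 1], while sentiment_score is defined on a 0 to 10 scale;
  # a linear rescale is assumed here so that the two agree
  rankFIS.input['sentiment_score'] = (sentiment_doc[i] + 1) * 5
  rankFIS.compute()
  print("the answer is")
  print(rankFIS.output['rank'])
  rank.view(sim=rankFIS)
  rankFIS_Results.append(rankFIS.output['rank'])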
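What compute() does can be approximated by hand. The sketch below is a simplified Mamdani-style pass in plain Python: AND is taken as min, OR as max, each rule clips its output set, the clipped sets are merged with max, and the result is defuzzified with a discrete centroid. To my understanding these match the scikit-fuzzy defaults, but treat that as an assumption. Only three of the ten rules are included (rule10, rule5 and rule3), sentiment is assumed to be on the 0 to 10 scale, and the names tri and infer_rank are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a and c and peak b."""
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def infer_rank(similarity, sentiment, steps=101):
    """Crisp rank from three illustrative rules, or None if no rule fires."""
    sim_low = tri(similarity, 0, 0, 0.3)
    sim_high = tri(similarity, 0.5, 0.8, 1)
    sen_low = tri(sentiment, 0, 0, 2)
    sen_med = tri(sentiment, 0, 4, 6)
    sen_high = tri(sentiment, 6, 9, 10)

    fire_low = min(sim_low, sen_low)      # low AND low -> rank low
    fire_med = min(sim_high, sen_med)     # high AND medium -> rank medium
    fire_high = max(sim_high, sen_high)   # high OR high -> rank high

    numerator = denominator = 0.0
    for k in range(steps):
        r = 10 * k / (steps - 1)          # sample the rank universe 0..10
        mu = max(min(fire_low, tri(r, 0, 0, 5)),
                 min(fire_med, tri(r, 1, 5, 7)),
                 min(fire_high, tri(r, 5, 8, 10)))
        numerator += r * mu
        denominator += mu
    return numerator / denominator if denominator else None


print(infer_rank(0.8, 9))   # similar and positive: lands in the 'high' region
print(infer_rank(0.0, 0))   # dissimilar and negative: lands in the 'low' region
print(infer_rank(0.4, 3))   # none of the three rules fires for this input
```

The last call shows why a rule base should cover the whole input space: with only these three rules, a mid-range input has no defined rank.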

Generating the information retrieval results:
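One simple way to turn the fuzzy ranks into a retrieval list is to pair every file name with its rank and sort in descending order. The following is a minimal sketch under that assumption; in the article's code the inputs would be fileNames and rankFIS_Results, while the function name rank_documents, the file names and the scores below are made up for illustration.

```python
def rank_documents(file_names, scores, top_k=None):
    """Return (file_name, score) pairs sorted from highest to lowest score."""
    paired = sorted(zip(file_names, scores), key=lambda pair: pair[1], reverse=True)
    return paired if top_k is None else paired[:top_k]


toy_names = ["doc_a.txt", "doc_b.txt", "doc_c.txt"]   # hypothetical file names
toy_scores = [4.2, 7.6, 1.7]                          # made-up fuzzy ranks

for position, (name, score) in enumerate(rank_documents(toy_names, toy_scores), start=1):
    print(position, name, score)
```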

The ranked documents in the output are as follows:

Suggestions

The fuzzy inference engine should learn from the dataset itself.
More combinations of membership functions for the fuzzy inference system should be tried.
The sentiment engine should compute sentiment in a more elaborate way.
The rules should be worked out more extensively, with more testing.
Rules can be learned automatically from data.
More inputs can be included in this technique.
Document representations other than tf-idf can be considered.

Fuzzy Inference Engine-based Information Retrieval in Python- Research Exercise

Published by Nidhika
