Fuzzy Inference Engine-based Information Retrieval in Python: A Research Exercise


This article presents a research-based approach to information retrieval over climate documents.

Suggestions for improving the work are provided in several places.

This is an AI exercise that can be extended into a full project with minor to major changes. The implementation is in Python.

The code starts here:

pip install nltk

The rest of the pre-processing of the files is the same as in my previous articles. It is skipped here because it has been covered in those earlier posts.

Next, compute the tf-idf vectors for each file in the dataset. The dataset has 66 files of climate documents.

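import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# docs_vector holds the pre-processed text of each document, one string per file
vecTfidf = TfidfVectorizer(analyzer='word', stop_words='english')
tf_idf = vecTfidf.fit_transform(docs_vector)

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
tfidf_features = vecTfidf.get_feature_names_out()

# one row per document, one column per vocabulary term
tf_idfArray = pd.DataFrame(tf_idf.toarray())
print(tf_idfArray)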

The output is a matrix with one row per document and one column per vocabulary term.
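To make the numbers in such a matrix less opaque, here is a minimal pure-Python sketch of a tf-idf computation on three toy sentences. It assumes the smoothed idf formula, ln((1 + n) / (1 + df)) + 1, and the L2 row normalisation that scikit-learn documents as its defaults. The tokenisation is a plain whitespace split with no stop-word removal, so it will not reproduce TfidfVectorizer's output exactly. The function name `tfidf_rows` and the toy documents are made up for illustration.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """Return (vocabulary, rows): smoothed-idf weights, each row L2-normalised."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # document frequency: number of documents that contain each term
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # smoothed idf (assumed to match scikit-learn's documented default)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        length = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / length for v in raw])
    return vocab, rows

# toy documents, not taken from the climate dataset
toy_docs = ["climate change impacts", "climate policy", "ocean warming impacts"]
vocab, rows = tfidf_rows(toy_docs)
print(vocab)
for doc, row in zip(toy_docs, rows):
    print(doc, [round(v, 2) for v in row])
```

Terms that occur in fewer documents receive a larger weight within a row, which is what lets rare, topic-specific words dominate the similarity computed later.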

Compute the cosine similarity (the normalised dot product) of each document with the query:

import numpy as np
from numpy.linalg import norm

def cosinesim(array1, array2):
    # compute cosine similarity: dot product divided by the product of the norms
    cosine = np.dot(array1, array2) / (norm(array1) * norm(array2))
    return cosine
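One edge case is worth keeping in mind: if the query shares no vocabulary with the corpus, its tf-idf vector is all zeros and the division above is by zero. The sketch below is a dependency-free variant that returns 0.0 in that case. The function name `cosine_sim` and the zero-vector convention are assumptions made for illustration, not part of the original code.

```python
import math

def cosine_sim(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # assumed convention: an empty vector is similar to nothing
        return 0.0
    return dot / (norm_u * norm_v)

same_direction = cosine_sim([1, 0, 1], [2, 0, 2])
orthogonal = cosine_sim([1, 0, 0], [0, 1, 0])
empty_query = cosine_sim([0, 0, 0], [1, 2, 3])
print(same_direction, orthogonal, empty_query)
```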

queryText = ["No country today is immune from the impacts of climate change."]

# Assumed step, missing from the listing: project the query into the fitted tf-idf space
queryVector = vecTfidf.transform(queryText)
queryVectorArray = queryVector.toarray()
one_d_Array = queryVectorArray[0]

cos_Sim_Query = []
for i in range(66):  # one pass per document in the dataset
    array_tfidf_ = tf_idf[i].toarray()
    # array_tfidf_ has shape (1, n_terms), so the result is a one-element array
    cos_Sim_Query.append(cosinesim(array_tfidf_, one_d_Array)[0])

The inputs of the fuzzy system are:

  1. Similarity of query to document
  2. Sentiment score of the document

The output of the fuzzy system is:

  1. Ranking of documents; here a rank from 1 to 10 is allocated

The retrieval output is generated from this rank.

These are all fuzzy linguistic variables, and they are defined below.

The Fuzzy Inference Engine for this problem is defined as follows:

import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# Antecedent/Consequent variables
# np.arange excludes the stop value, so each stop is extended to cover the upper bound
similarity_query = ctrl.Antecedent(np.arange(0, 1.1, .1), 'similarity_query')
sentiment_score = ctrl.Antecedent(np.arange(0, 11, 1), 'sentiment_score')
rank = ctrl.Consequent(np.arange(0, 11, 1), 'rank')

similarity_query['low'] = fuzz.trimf(similarity_query.universe, [0, 0, 0.3])
similarity_query['average'] = fuzz.trimf(similarity_query.universe, [0.1, 0.5, 0.7])
similarity_query['high'] = fuzz.trimf(similarity_query.universe, [0.5, 0.8, 1])

sentiment_score['low'] = fuzz.trimf(sentiment_score.universe, [0, 0, 2])
sentiment_score['medium'] = fuzz.trimf(sentiment_score.universe, [0 , 4, 6])
sentiment_score['high'] = fuzz.trimf(sentiment_score.universe, [6, 9, 10])
rank['low'] = fuzz.trimf(rank.universe, [0, 0, 5])
rank['medium'] = fuzz.trimf(rank.universe, [1, 5, 7])
rank['high'] = fuzz.trimf(rank.universe, [5, 8, 10])


rule1 = ctrl.Rule(similarity_query['low'] | sentiment_score['low'], rank['low'])
rule2 = ctrl.Rule(similarity_query['average'] | sentiment_score['medium'], rank['medium'])
rule3 = ctrl.Rule(sentiment_score['high'] | similarity_query['high'], rank['high'])
rule4 = ctrl.Rule(similarity_query['high'] & sentiment_score['low'], rank['medium'])
rule5 = ctrl.Rule(similarity_query['high'] & sentiment_score['medium'], rank['medium'])
rule6 = ctrl.Rule(similarity_query['average'] & sentiment_score['low'], rank['medium'])
rule7 = ctrl.Rule(similarity_query['high'] & sentiment_score['low'], rank['high'])
rule8 = ctrl.Rule(similarity_query['low'] & sentiment_score['high'], rank['high'])
rule9 = ctrl.Rule(similarity_query['low'] & sentiment_score['medium'], rank['medium'])
rule10 = ctrl.Rule(similarity_query['low'] & sentiment_score['low'], rank['low'])


# build the rule base, then wrap it in a simulation object that accepts inputs
rank_ctrl = ctrl.ControlSystem([rule1, rule2, rule3, rule4, rule5, rule6, rule7, rule8, rule9, rule10])
rankFIS = ctrl.ControlSystemSimulation(rank_ctrl)
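To see how a single rule fires, the sketch below evaluates three of the rules by hand, assuming the usual convention that `|` takes the maximum and `&` the minimum of the membership degrees (which is, to my knowledge, the scikit-fuzzy default). The membership degrees are hypothetical values for one imaginary document, not results from the climate dataset.

```python
# Hypothetical membership degrees for one document (illustrative values only)
similarity = {"low": 0.0, "average": 0.5, "high": 0.33}
sentiment = {"low": 0.0, "medium": 0.75, "high": 0.0}

def fuzzy_or(a, b):
    """OR of two membership degrees, taken as the maximum."""
    return max(a, b)

def fuzzy_and(a, b):
    """AND of two membership degrees, taken as the minimum."""
    return min(a, b)

# firing strength of each rule's antecedent
firing = {
    "rule1": fuzzy_or(similarity["low"], sentiment["low"]),         # -> rank low
    "rule2": fuzzy_or(similarity["average"], sentiment["medium"]),  # -> rank medium
    "rule5": fuzzy_and(similarity["high"], sentiment["medium"]),    # -> rank medium
}
for name, strength in firing.items():
    print(name, strength)
```

Under this convention the OR rules fire whenever either input matches, so they tend to dominate the AND rules; that is one reason trying other rule combinations, as suggested at the end, may be worthwhile.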

The memberships are defined as follows:
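As a numeric illustration of these triangular memberships, the following sketch evaluates the three similarity sets at one sample value. The `trimf` helper here is a hand-written scalar stand-in for `fuzz.trimf`, written only for this example, and the sample similarity of 0.6 is arbitrary.

```python
def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# same [a, b, c] parameters as the similarity_query sets in the listing above
SIMILARITY_SETS = {
    "low": (0, 0, 0.3),
    "average": (0.1, 0.5, 0.7),
    "high": (0.5, 0.8, 1),
}

def fuzzify(value, sets):
    """Map a crisp value to its degree of membership in each fuzzy set."""
    return {name: trimf(value, *abc) for name, abc in sets.items()}

degrees = fuzzify(0.6, SIMILARITY_SETS)
for name, degree in degrees.items():
    print(f"{name}: {degree:.2f}")
```

A similarity of 0.6 belongs partly to 'average' and partly to 'high' and not at all to 'low'; this overlap is what allows several rules to fire at once.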

Compute the sentiment of each document and store the values, one per document, in an array:

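from textblob import TextBlob

fileNames = []
docs_vector = []
sentiment_doc = []

# document_text is assumed to map each file name to its pre-processed text
for file in document_text:
    sentimentObj = TextBlob(document_text[file])
    # Assumed completion: the listing stops after creating sentimentObj.
    fileNames.append(file)
    docs_vector.append(document_text[file])
    # polarity lies in [-1, 1]; any rescaling to the 0-10 universe is not shown
    sentiment_doc.append(sentimentObj.sentiment.polarity)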
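TextBlob reports polarity in the range -1.0 to 1.0, while the sentiment_score variable of the fuzzy system is defined over 0 to 10. The article does not show how one is mapped to the other. A simple linear rescaling is one possibility, sketched below. Both the mapping and the function name `polarity_to_score` are assumptions, not the author's method.

```python
def polarity_to_score(polarity, low=0.0, high=10.0):
    """Linearly map a polarity in [-1, 1] onto [low, high] (assumed mapping)."""
    clipped = max(-1.0, min(1.0, polarity))  # guard against out-of-range input
    return low + (clipped + 1.0) / 2.0 * (high - low)

# illustrative polarities, not values from the climate dataset
sample_polarities = [-1.0, 0.0, 0.2, 1.0]
sample_scores = [polarity_to_score(p) for p in sample_polarities]
print(sample_scores)
```

With this mapping a neutral document lands in the middle of the scale, so it falls under the 'medium' sentiment set rather than 'low'.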



Computing the rank from the Fuzzy Inference Engine:

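rankFIS_Results = []

for i in range(66):
    rankFIS.input['similarity_query'] = cos_Sim_Query[i]
    rankFIS.input['sentiment_score'] = sentiment_doc[i]
    # Assumed completion: run the inference and keep the defuzzified rank
    rankFIS.compute()
    rankFIS_Results.append(rankFIS.output['rank'])
    print("the answer is", rankFIS.output['rank'])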
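The crisp rank comes from defuzzifying the combined output sets. The sketch below shows one common way to do this by hand: clip each rank set at the firing strength of its rules, merge the clipped sets with max, and take the centroid. It is a discrete approximation over the integers 0 to 10 with hypothetical firing strengths, so it should not be expected to match scikit-fuzzy's numbers exactly.

```python
def trimf(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# same [a, b, c] parameters as the rank sets in the article
RANK_SETS = {"low": (0, 0, 5), "medium": (1, 5, 7), "high": (5, 8, 10)}
UNIVERSE = list(range(0, 11))

def defuzzify_centroid(strength):
    """Clip each rank set at its firing strength, merge with max, take the centroid."""
    aggregated = []
    for x in UNIVERSE:
        clipped = [min(strength[name], trimf(x, *abc)) for name, abc in RANK_SETS.items()]
        aggregated.append(max(clipped))
    total = sum(aggregated)
    if total == 0:
        raise ValueError("no rule fired, so the output is undefined")
    return sum(x * m for x, m in zip(UNIVERSE, aggregated)) / total

# hypothetical firing strengths per output set (illustrative values only)
crisp_rank = defuzzify_centroid({"low": 0.0, "medium": 0.75, "high": 0.33})
print(round(crisp_rank, 2))
```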

Generating the information retrieval results:
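The listing for this step is not included in the article. One plausible way to produce the results is to sort the documents by their fuzzy rank, highest first, and keep the top few. The sketch below does this with made-up file names and rank values; the function name `rank_documents` and the `top_k` cut-off are assumptions for illustration.

```python
def rank_documents(file_names, ranks, top_k=3):
    """Pair each file with its rank and return the top_k pairs, highest rank first."""
    paired = sorted(zip(file_names, ranks), key=lambda pair: pair[1], reverse=True)
    return paired[:top_k]

# hypothetical file names and ranks, not results from the climate dataset
example_files = ["doc_a.txt", "doc_b.txt", "doc_c.txt", "doc_d.txt"]
example_ranks = [4.2, 7.9, 5.5, 1.3]

top_results = rank_documents(example_files, example_ranks)
for name, score in top_results:
    print(f"{name}: {score:.1f}")
```

In the article's variable names, the two inputs would be fileNames and rankFIS_Results.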



The work can be improved in the following ways:

  1. The Fuzzy Inference Engine needs to be self-learning from the dataset.
  2. More combinations of Fuzzy Inference System memberships need to be tried.
  3. The sentiment engine needs to be more elaborate in its computation of sentiments.
  4. Rules need to be derived more extensively, with more testing.
  5. Rules can be learned automatically from data.
  6. More inputs can be included in this technique.
  7. Ways of representing documents other than tf-idf can be considered.

Published by Nidhika
