Genetic Algorithm for Information Retrieval (Python): Introduction

Genetic Algorithm (GA)

Genetic Algorithm is one of the nature inspired algorithm which works on Darwins theory of survival of fittest. The key operators that we have in Genetic Algorithms are

  1. Selection
  2. Crossover
  3. Mutation

These three operators also forms the basis reproduction and subsequent workings in natural process be it be in human biology or in botony and so on.

The terms are self explanatory, but I can explain them in one line each for completion.

  1. Selection: Select the best parents form a population for reproduction
  2. Crossover: Choose two new offsprings from parents by one of the several crossover techniques
  3. Mutation: Flip some bits to explore unexplored solutions to problems or in biological terms to get new features not in parents

Now how we can use these terms in solving the problem of Information Retrieval.

Information Retrieval (IR)

Here, from a set of documents, documents related to a query is to be found. This can be used in search engines or desktop search as well. The documents which are most relevant to the query are produced as outputs.

There are various algorithms to compute similarity between a document and query and then to rank the document based on similarity score. Many kind of similarity are applied here, most popular one in cosine similarity. Several Graph based solutions are produced as well, one that is very famous is PageRank Algorithm.

Information Retrieval with Genetic Algorithm

Here we take the following points:

  1. The documents numbered from 1 to N
  2. Chromosome represented to solution, where 1 represents that gene is a 1, that is a document selected for ranking
  3. The similarity score between query and document, computed per chromosome
  4. Running iterations to get best chromosome, the output, where 1 represents a document is selected and 0 represents that document is not selected.

Code and steps in IR with GA

Similarity Computation is the key factor that drives the GA algorithm and selection of next chromosome.

Here is how we compute similarity between text and query

pip install semantic-text-similarity
pip install geneticalgorithm
from semantic_text_similarity.models import WebBertSimilarity
web_model = WebBertSimilarity(device='cpu', batch_size=10)
def similarityBetweenText(text1, text2):
return web_model.predict([(text1,text2)])

Query initialization

Initialize the query and call the function to compute similarity between query and document

query = "With climate change glaciers are melting"

def similarity(doc):
return similarityBetweenText(query, doc)

Call to this method:

Assuming all documents are loaded in document_text third dimention, we can compute similarity as follows.

You can load documents in any other way that is convenient to you.

simly = similarity(document_text[0][2])
print(simly)

Code to Run GA for IR

The optimization objective function is defined as follows

The following is demo fitness function, this can be updated

import numpy as np
from geneticalgorithm import geneticalgorithm as ga

length = len(document_text)
similarity_i = np.zeros(length)
print("length is : ", length)

def f(combinational_array):
for i in range(20):

if combinational_array[i] >0:
similarity_i[i] = similarity(document_text[i][2])
else:
similarity_i[i] = 0
sum = np.sum(similarity_i)
return -1*sum


The penalty function is here, this will make sure how many documents we have to retrieve. This is a demo penalty function, it can be optimised.

def p(combinational_array):
penality=0
M= 50
N=length/4
if sum(combinational_array) < N:
penality= np.sum(combinational_array))
else:
penalty = M
return penality

Now final call.

combinational_array=np.array([[0,1]]*length)

model=ga(function=f+p,dimension=length,variable_type='int',variable_boundaries=varbound)

model.run()
print(combinational_array)

Published by Nidhika

Hi, Apart from profession, I have inherent interest in writing especially about Global Issues of Concern, fiction blogs, poems, stories, doing painting, cooking, photography, music to mention a few! And most important on this website you can find my suggestions to latest problems, views and ideas, my poems, stories, novels, some comments, proposals, blogs, personal experiences and occasionally very short glimpses of my research work as well.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: