Genetic Algorithm (GA)

Genetic Algorithm is one of the nature inspired algorithm which works on Darwins theory of survival of fittest. The key operators that we have in Genetic Algorithms are

Selection
Crossover
Mutation

These three operators also forms the basis reproduction and subsequent workings in natural process be it be in human biology or in botony and so on.

The terms are self explanatory, but I can explain them in one line each for completion.

Selection: Select the best parents form a population for reproduction
Crossover: Choose two new offsprings from parents by one of the several crossover techniques
Mutation: Flip some bits to explore unexplored solutions to problems or in biological terms to get new features not in parents

Now how we can use these terms in solving the problem of Information Retrieval.

Information Retrieval (IR)

Here, from a set of documents, documents related to a query is to be found. This can be used in search engines or desktop search as well. The documents which are most relevant to the query are produced as outputs.

There are various algorithms to compute similarity between a document and query and then to rank the document based on similarity score. Many kind of similarity are applied here, most popular one in cosine similarity. Several Graph based solutions are produced as well, one that is very famous is PageRank Algorithm.

Information Retrieval with Genetic Algorithm

Here we take the following points:

The documents numbered from 1 to N
Chromosome represented to solution, where 1 represents that gene is a 1, that is a document selected for ranking
The similarity score between query and document, computed per chromosome
Running iterations to get best chromosome, the output, where 1 represents a document is selected and 0 represents that document is not selected.

Code and steps in IR with GA

Similarity Computation is the key factor that drives the GA algorithm and selection of next chromosome.

Here is how we compute similarity between text and query

pip install semantic-text-similarity
pip install geneticalgorithm

from semantic_text_similarity.models import WebBertSimilarity
web_model = WebBertSimilarity(device='cpu', batch_size=10)

def similarityBetweenText(text1, text2):
  return web_model.predict([(text1,text2)])

Query initialization

Initialize the query and call the function to compute similarity between query and document

query = "With climate change glaciers are melting"

def similarity(doc):
  return  similarityBetweenText(query, doc)

Call to this method:

Assuming all documents are loaded in document_text third dimention, we can compute similarity as follows.

You can load documents in any other way that is convenient to you.

simly = similarity(document_text[0][2])
print(simly)

Code to Run GA for IR

The optimization objective function is defined as follows

The following is demo fitness function, this can be updated

import numpy as np
from geneticalgorithm import geneticalgorithm as ga

length = len(document_text)
similarity_i = np.zeros(length)
print("length is : ", length)

def f(combinational_array): 
  for i in range(20):
   
    if combinational_array[i] >0:
      similarity_i[i] = similarity(document_text[i][2])
    else:
      similarity_i[i] = 0
  sum = np.sum(similarity_i)
  return  -1*sum

The penalty function is here, this will make sure how many documents we have to retrieve. This is a demo penalty function, it can be optimised.

def p(combinational_array):
    penality=0 
    M= 50
    N=length/4
    if sum(combinational_array) < N:
        penality= np.sum(combinational_array))
    else:
        penalty = M
    return penality

Now final call.

combinational_array=np.array([[0,1]]*length)

model=ga(function=f+p,dimension=length,variable_type='int',variable_boundaries=varbound)

model.run()
print(combinational_array)

Genetic Algorithm for Information Retrieval (Python): Introduction

Genetic Algorithm (GA)

Information Retrieval (IR)

Information Retrieval with Genetic Algorithm

Code and steps in IR with GA

Call to this method:

Code to Run GA for IR

Published by Nidhika

Leave a comment Cancel reply

Genetic Algorithm (GA)

Information Retrieval (IR)

Information Retrieval with Genetic Algorithm

Code and steps in IR with GA

Call to this method:

Code to Run GA for IR

Share this:

Published by Nidhika

Leave a comment Cancel reply