# Genetic Algorithm (GA)

Genetic Algorithm is one of the nature inspired algorithm which works on Darwins theory of survival of fittest. The key operators that we have in Genetic Algorithms are

1. Selection
2. Crossover
3. Mutation

These three operators also forms the basis reproduction and subsequent workings in natural process be it be in human biology or in botony and so on.

The terms are self explanatory, but I can explain them in one line each for completion.

1. Selection: Select the best parents form a population for reproduction
2. Crossover: Choose two new offsprings from parents by one of the several crossover techniques
3. Mutation: Flip some bits to explore unexplored solutions to problems or in biological terms to get new features not in parents

Now how we can use these terms in solving the problem of Information Retrieval.

# Information Retrieval (IR)

Here, from a set of documents, documents related to a query is to be found. This can be used in search engines or desktop search as well. The documents which are most relevant to the query are produced as outputs.

There are various algorithms to compute similarity between a document and query and then to rank the document based on similarity score. Many kind of similarity are applied here, most popular one in cosine similarity. Several Graph based solutions are produced as well, one that is very famous is PageRank Algorithm.

# Information Retrieval with Genetic Algorithm

Here we take the following points:

1. The documents numbered from 1 to N
2. Chromosome represented to solution, where 1 represents that gene is a 1, that is a document selected for ranking
3. The similarity score between query and document, computed per chromosome
4. Running iterations to get best chromosome, the output, where 1 represents a document is selected and 0 represents that document is not selected.

# Code and steps in IR with GA

Similarity Computation is the key factor that drives the GA algorithm and selection of next chromosome.

Here is how we compute similarity between text and query

`pip install semantic-text-similaritypip install geneticalgorithm`
`from semantic_text_similarity.models import WebBertSimilarityweb_model = WebBertSimilarity(device='cpu', batch_size=10) `
`def similarityBetweenText(text1, text2):  return web_model.predict([(text1,text2)])`

Query initialization

Initialize the query and call the function to compute similarity between query and document

`query = "With climate change glaciers are melting"def similarity(doc):  return  similarityBetweenText(query, doc)`

## Call to this method:

Assuming all documents are loaded in document_text third dimention, we can compute similarity as follows.

You can load documents in any other way that is convenient to you.

`simly = similarity(document_text)print(simly)`

## Code to Run GA for IR

The optimization objective function is defined as follows

The following is demo fitness function, this can be updated

`import numpy as npfrom geneticalgorithm import geneticalgorithm as galength = len(document_text)similarity_i = np.zeros(length)print("length is : ", length)def f(combinational_array):   for i in range(20):       if combinational_array[i] >0:      similarity_i[i] = similarity(document_text[i])    else:      similarity_i[i] = 0  sum = np.sum(similarity_i)  return  -1*sum `

The penalty function is here, this will make sure how many documents we have to retrieve. This is a demo penalty function, it can be optimised.

`def p(combinational_array):    penality=0     M= 50    N=length/4    if sum(combinational_array) < N:        penality= np.sum(combinational_array))    else:        penalty = M    return penality`

Now final call.

`combinational_array=np.array([[0,1]]*length)model=ga(function=f+p,dimension=length,variable_type='int',variable_boundaries=varbound)model.run()print(combinational_array)`