This article is an introduction to Text Summarization, an AI application and, in particular, an NLP application. It is a field within NLP with a wide range of applications. Here I will discuss:
- What Text Summarization is
- The scope of Text Summarization
- Types of Text Summarization
- Characterization of Text Summarization
- Future applications of Text Summarization: summarizing books, scriptures, and really huge data
- A brief overview of how we can solve Text Summarization problems (in the next article)
Briefly, the phrase Text Summarization consists of two words: Text and Summarization. It means condensing a text into a shorter form. We reduce the amount of original text by a great deal while retaining the essential information. In other words, the given input text is compressed to the extent the user requires.
Now, why are we reducing it? The aim is to save time, which can then go to other essential tasks. There is a plethora of information available on the internet, servers, disks, and data centres, and a text summarization system can really help in reducing the amount of text to be read. The savings are of the following kinds:
- Saving the human time needed to read the text.
- Saving the machine time needed to process the information, if a summary is fed into the machine instead of the full text.
- Saving communication time: a shorter version of the information costs less to transmit, whatever the form of communication.
With so much material spread across books, data centres, and the internet, we often need to know in short what a particular book or movie is about, and the need is not limited to these.
Text summarization can be applied to one page, multiple documents, one book, or multiple books. Each of these is a good example of why we need Text Summarization, which compresses text. This condensation is not just picking any sentence from the text. It means reducing the size of the input so that the output is concise, precise, and covers (hence, spans) the whole document, or the collection of documents or books, given as input.
Text Summarization can be characterized in the following ways:
1. User Specific or Generic
User-specific summarization gives a view of the input text tailored to the user, while generic Text Summarization toolkits do not take any such preference into account. For example, given the same input, one reader may be a History professor while another may be a Science person. The software can store user preferences, which are added manually or taken from a social media profile, or perhaps from browsing history data. This way user-specific information can be taken into account.
2. Keyword Based or Generic
Keywords are explicitly provided to the summarization toolkit so that the output varies with the keywords given. For example, for documents about a football match, what should the software focus on: the audience, the goals, the field, or maybe some exciting parts of the match? These can be provided as keywords, say “the audience”.
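To make the keyword idea concrete, here is a minimal sketch in Python. It is only an illustration under simple assumptions, not how any particular toolkit works: sentences are split naively on punctuation, and a sentence is kept when it mentions one of the keywords. The names `split_sentences`, `keyword_summary`, and `match_report` are made up for this example.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def keyword_summary(text, keywords, max_sentences=2):
    """Keep the sentences that mention the given keywords most often."""
    wanted = [k.lower() for k in keywords]
    scored = []
    for position, sentence in enumerate(split_sentences(text)):
        lowered = sentence.lower()
        # Score = total number of keyword occurrences in this sentence.
        score = sum(lowered.count(k) for k in wanted)
        if score > 0:
            scored.append((score, position, sentence))
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Restore document order so the summary reads naturally.
    return " ".join(s for _, _, s in sorted(best, key=lambda item: item[1]))

# Hypothetical input text, invented for this illustration.
match_report = (
    "The home team scored two goals in the first half. "
    "The audience cheered loudly after each one. "
    "Rain made the field slippery. "
    "By the end, the audience was on its feet."
)

print(keyword_summary(match_report, ["audience"]))
```

Changing the keyword list to `["goals"]` or `["field"]` gives a different summary of the same report, which is the whole point of keyword-based summarization.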
3. Indicative or Informative
An indicative summary provides only a glimpse of the text: its most important parts. An informative summary provides an essential and complete summary of the text data, one that covers (hence, spans) the whole text.
4. Single Document versus Multi Document Summarization (MDS)
When there is a single document, the summary is typically drawn from the content of that one document. However, when multiple documents are present, there are various concerns. Where can we use MDS? Take, for example, the latest news on “Olympics Skating Issue in China, Feb, 2022”. Each reporter will file a story on it. Each reporter may ask a different person and may get a new view, which is presented in the news forums. To find the crux, we feed all these articles, from all the reporters and editors, into the Text Summarization system (MDS) to get a final output that says it all in short. Multi-document summarization mostly takes similar inputs, but it is not restricted to similar documents; users can supply any input, and the logic stays the same.
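One concern in MDS is that several reporters repeat the same facts. The sketch below is a rough illustration under my own assumptions, not a standard MDS algorithm: it pools sentences from all documents, prefers sentences whose words appear in many documents, and skips a sentence when it overlaps too much (Jaccard similarity of word sets) with one already chosen. The function names and the `redundancy_threshold` parameter are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def multi_document_summary(documents, max_sentences=3, redundancy_threshold=0.5):
    # Pool the sentences of every document into one list.
    pooled = []
    for doc in documents:
        pooled.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip())

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(tokenize(doc))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(doc_freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    summary = []
    for sentence in sorted(pooled, key=score, reverse=True):
        if len(summary) >= max_sentences:
            break
        tokens = tokenize(sentence)
        # Skip sentences that mostly repeat something already selected.
        if all(jaccard(tokens, tokenize(kept)) < redundancy_threshold for kept in summary):
            summary.append(sentence)
    return summary

# Two made-up reports on the same event.
reports = [
    "The skater won gold. The crowd was loud.",
    "The skater won gold. Judges reviewed the score.",
]
for line in multi_document_summary(reports):
    print(line)
```

The fact reported by both sources appears once in the output, followed by the details that only one source mentioned.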
5. Extractive or Abstractive
In extraction-based summaries, parts of the text are selected as they are from the input, while in abstraction-based summaries many sentences are reframed. Abstraction-based summarization can work on top of an extraction-based summarization or on its own, from scratch, by understanding all the essential relations in the text. This typically involves sentence fusion, and re-validating whether each sentence formed is correct, using large language model verifiers. A popular application of extractive summarization is the highlighting of text.
Forming an extractive summary is itself a complex task, but an abstraction-based model can be built on top of an extraction-based model to make the extractive summary more concise.
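As a small illustration of the extractive side, the sketch below scores each sentence by the average frequency of its words in the whole text and returns the top sentences unchanged, in their original order. This word-frequency scoring is one simple, classic heuristic that I am assuming for the example; real systems use far richer signals. The stopword list and the name `extractive_summary` are my own choices.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "it", "on", "for"}

def content_words(text):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def extractive_summary(text, max_sentences=2):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    # Sentences are copied as they are, in document order: nothing is rewritten.
    return [sentences[i] for i in sorted(ranked[:max_sentences])]

sample = (
    "Summarization condenses text. "
    "Summarization saves reading time. "
    "Cats sleep all day."
)
print(extractive_summary(sample, max_sentences=2))
```

Every sentence in the output is a verbatim sentence of the input, much like highlighted text. An abstractive step would start from here and reframe or fuse these sentences.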
Currently, the scope of Text Summarization is a single page, single-document summarization, or multi-document summarization. However, consider an example where we need to summarize multiple books. Suppose certain old scriptures were found some decades back in an urn and somehow reached the West through some channels. We don’t have time to read all those scriptures, but we want to know the summary of one of these books, 2000 years old or so (I am not a historian), and maybe a summary of all the scriptures that were found. You might say this is a one-time process. I say no: with user-specific summarization, each person gets insight into the same scriptures from the point of view of their own interests. Another use is that different users can feed in different keywords to get new insight into the same scriptures, once these are processed into a readable English format. This is just an illustration of future work in Text Summarization and what it may look like. The application is not limited to studying some of the oldest scriptures. Understanding history is a very interesting topic.