Keyword Selection – Supervised versus Unsupervised – A Futuristic View

Keyword selection is the task of picking out the important keywords from a text, a collection of texts, or even books and collections of catalogues.

Here I present a futuristic view of keyword selection, its applications, and proposed uses: not the past uses we are all familiar with, but the future uses of keyword selection.

- Keyword extraction also reduces dimensionality.

- Hence it reduces the dimensions of the search space, and the overload that comes with them.

- Further, it projects the key terms and key phrases of essence in any text as per a user, once we have learned that user's preferences.

- For example, you may like sports news more, and I may like tech news more.

- So when the keywords are extracted, and updated over time by the same mechanism, you get the essential highlights of a text tailored to your likings, for instance highlighting the number of wickets taken alongside the other important terms.

- And I may get different highlights in the same article, such as that the company sponsors were ABC, and so on…
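The personalised-highlighting idea above can be sketched in a few lines. This is a toy illustration, not a prescribed method: the article text, the preference set, and the `**…**` marker style are all my own assumptions, and in practice the user's keyword set would come from a learned preference profile updated over time.

```python
def highlight_for_user(text, user_keywords):
    """Wrap terms the user cares about in **...** markers.
    In practice user_keywords would come from a learned,
    continuously updated preference profile."""
    out = []
    for word in text.split():
        core = word.strip(".,;:!?").lower()  # compare without punctuation
        out.append(f"**{word}**" if core in user_keywords else word)
    return " ".join(out)

article = "The bowler took five wickets; the sponsor was ABC."
cricket_fan = {"wickets", "bowler"}
print(highlight_for_user(article, cricket_fan))
```

A tech-oriented user would simply supply a different keyword set (say, `{"sponsor", "abc"}`) and see different terms lit up in the same article.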

And a combination of both, supervised and unsupervised, can be even more beneficial.

Further, these techniques can also be applied per target class. Purely unsupervised, graph-based techniques need weights and interconnections, and hence should be used together with synonyms and a lexicon for better understanding.
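As a minimal sketch of a graph-based extractor used with a synonym lexicon: words that co-occur in a sentence are linked, synonyms are merged via the lexicon before the graph is built (so their edge weights pool together), and words are ranked by weighted degree. The sentences and the tiny lexicon here are invented for illustration, and degree centrality stands in for fuller graph rankings such as PageRank-style scoring.

```python
from collections import defaultdict
from itertools import combinations

# Tiny hand-made synonym lexicon (assumed, purely illustrative):
# merging synonyms before building the graph strengthens edge weights.
LEXICON = {"match": "game", "contest": "game"}

def keyword_graph(sentences):
    """Build a word co-occurrence graph (one clique per sentence)
    and rank words by weighted degree."""
    weight = defaultdict(int)
    for sent in sentences:
        words = {LEXICON.get(w, w) for w in sent.lower().split()}
        for a, b in combinations(sorted(words), 2):
            weight[(a, b)] += 1  # edge weight = co-occurrence count
    degree = defaultdict(int)
    for (a, b), w in weight.items():
        degree[a] += w
        degree[b] += w
    return sorted(degree, key=degree.get, reverse=True)

ranked = keyword_graph([
    "the match was a great game",
    "the contest ended in a draw",
])
print(ranked[:3])
```

Note how "match" and "contest" no longer appear as separate nodes: the lexicon has folded them into "game", which is exactly the kind of semantic pooling the paragraph above argues for.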

The same holds for other statistical measures: they should be combined with lexicons to bring semantics into play.
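One way to combine a statistical measure with a lexicon, sketched with TF-IDF as the measure: normalise synonyms through the lexicon before counting, so that statistically identical meanings pool their counts. The documents and the one-entry lexicon are assumptions for illustration.

```python
import math
from collections import Counter

SYNONYMS = {"car": "automobile"}  # tiny illustrative lexicon (assumed)

def tfidf(docs):
    """Score terms with TF-IDF after normalising synonyms via the
    lexicon, so that synonyms contribute to the same statistics."""
    norm = [[SYNONYMS.get(w, w) for w in d.lower().split()] for d in docs]
    df = Counter(w for d in norm for w in set(d))  # document frequency
    n = len(norm)
    scores = []
    for d in norm:
        tf = Counter(d)  # term frequency within the document
        scores.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return scores

docs = ["the car is fast", "the automobile is red"]
print(tfidf(docs))
```

With the lexicon applied, "car" and "automobile" are seen in both documents and score zero, while "fast" and "red" stand out as the distinctive terms; without the lexicon, the synonyms would have (wrongly) looked distinctive too.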

There are many future use cases for keyword selection, for example:

(1) A user's local-system recommendation filter, or a kind of filtering system. It learns from the user's browsing history, seeks one-time feedback from the user, or else computes the importance of articles from the time spent reading them. This runs client-side, hence no information is shared with internet servers. From this we get a target class: the importance level of an article, say two or three importance levels. These levels can then drive a supervised feature-selection step that pulls out the distinguishing features, using ID3, C4.5, rough-fuzzy-based algorithms, or any other method. All this selects the features most useful to the user, and can help re-rank any new website in a Google search for a customised user view. The same thing could be done on servers as well, only then privacy issues arise. I hope I am clearer now.
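The supervised feature-selection step in (1) can be sketched with the information-gain criterion that ID3 uses: given tokenised articles labelled with importance levels, score each word by how well its presence splits the labels. The browsing history, labels, and vocabulary below are a toy illustration, and this is only the gain computation, not a full ID3/C4.5 tree builder.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(docs, labels, word):
    """ID3-style gain of splitting on 'does the article contain word?'."""
    has = [l for d, l in zip(docs, labels) if word in d]
    not_ = [l for d, l in zip(docs, labels) if word not in d]
    n = len(labels)
    cond = (len(has) / n) * entropy(has) + (len(not_) / n) * entropy(not_)
    return entropy(labels) - cond

# Toy browsing history: tokenised articles with user importance levels
docs = [{"wickets", "score"}, {"wickets", "match"},
        {"stocks", "market"}, {"stocks", "ipo"}]
labels = ["high", "high", "low", "low"]
vocab = sorted(set().union(*docs))
best = max(vocab, key=lambda w: information_gain(docs, labels, w))
print(best)
```

Words with the highest gain are exactly the distinguishing features the paragraph describes; a tree learner like ID3 or C4.5 would then split on them recursively.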

(2) Next is text categorisation, using the same approach.

(3) Spam detection, and so on, using the same approach.
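To hint at how the same keyword machinery could feed spam detection: once a set of spam-indicative keywords has been selected, a message can be scored by how many of them it contains. The keyword set and the threshold here are invented for illustration; a real system would learn both from labelled data.

```python
# Assumed, illustrative set of previously selected spam keywords
SPAM_KEYWORDS = {"free", "winner", "prize", "urgent"}

def is_spam(message, threshold=2):
    """Flag a message if it contains at least `threshold`
    of the learned spam keywords."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return len(words & SPAM_KEYWORDS) >= threshold

print(is_spam("You are a winner, claim your free prize!"))
print(is_spam("Meeting moved to 3pm"))
```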

All these in coming articles…

Published by Nidhika

