Posted on May 12, 2010.
How Google adopted LSI
"A standard way of representing it is the grid or matrix form, which is why experts call the LSI method 'thinking inside the grid'..."
Semantic word processing is mainly a matter of linguistics. Consider the statement "I am optimizing a site for the search engine." At least three or four of its words (I, am, a, for) are excess, in the sense that they do not contribute actively to the meaning of the sentence. They simply make the sentence grammatical. In this way, natural language contains many words that are redundant in terms of search engines or semantic value. Function words, conjunctions, prepositions, auxiliary verbs, and several other kinds of words help hold a sentence together but do not add much content. Ironically, these are the most frequently used words in English.
In the very first stage of LSI, these words are picked out and discarded. The document is then left with words that can carry semantic meaning. We weed out:
Articles, prepositions and conjunctions
Common verbs and pronouns
Common adjectives (big, late, high)
Frilly words (therefore, which, however, although, etc.)
Any word that appears in every single document or in only one document
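The filtering step above can be sketched in a few lines of Python. This is only an illustration, not the method any search engine actually uses: the stop list is a tiny assumed sample, and the function names `content_words` and `prune_by_document_frequency` are made up for this example.

```python
import re

# Assumed, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {
    "i", "am", "a", "an", "the", "for", "and", "or", "of", "in", "on",
    "is", "are", "be", "do", "it", "we", "big", "late", "high",
    "therefore", "which", "however", "although",
}

def content_words(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def prune_by_document_frequency(docs_words):
    """Drop words found in every document or in only one document."""
    n_docs = len(docs_words)
    doc_freq = {}
    for words in docs_words:
        for w in set(words):  # count each word once per document
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return [[w for w in words if 1 < doc_freq[w] < n_docs]
            for words in docs_words]
```

The first function handles the word classes in the list; the second handles the last item, which depends on the whole collection rather than on a single document.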
After this filtering, however, the document is left with a small collection of words to which we can apply our statistical methodology. We can now start to index this collection of words. A standard way of representing it is the grid or matrix form, which is why experts call the LSI method "thinking inside the grid". The grid or matrix contains the documents listed on the horizontal axis and the words contained in the documents on the vertical axis.
For a classic keyword search, we just put a cross (X) in the cell for every document in which a particular word (see the keywords listed in the heading) appears, or leave the cell blank if the word does not appear. The grid then looks like this:
Document Name / Keywords contained (Altitude, Topography, Height, Tiger)
GIS Mapping / X X X
Topology / X X X
Rainfall harvesting / X X
Poems of William Blake / X
Clearly, each cell of the grid contains either a cross or a blank. There is no middle ground, and this gives us a keyword-search analysis of our documents. Note that when the form of a word varies, say "topology", we have either left it out or listed it under another column heading.
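As a rough sketch of such a cross-or-blank grid, the Python below marks each cell True (a cross) or False (a blank). The names `build_grid` and `render` are hypothetical, and the sketch assumes each document is already a plain string of words separated by spaces, with no stemming or punctuation handling.

```python
def build_grid(documents, keywords):
    """Return {document name: [True/False per keyword]}."""
    grid = {}
    for name, text in documents.items():
        words = set(text.lower().split())
        # True is a cross (word present), False is a blank.
        grid[name] = [kw.lower() in words for kw in keywords]
    return grid

def render(grid, keywords):
    """Format the grid as text, one document per line."""
    lines = ["Document / " + " ".join(keywords)]
    for name, row in grid.items():
        marks = " ".join("X" if hit else "-" for hit in row)
        lines.append(f"{name} / {marks}")
    return "\n".join(lines)
```

Calling `build_grid` with a dictionary of document names and texts, plus a keyword list such as altitude, topography, height and tiger, yields one row of crosses and blanks per document. Because the match is exact, a variant such as "topology" would not count as "topography".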