Google Semantically Related Words & Latent Semantic Indexing Technology
Many people have been noticing a wide shuffle in search relevancy scores recently. Some of those well in the know attribute this to latent semantic indexing. Even if Google is not using LSI, it has likely been using other word relationship technologies for a while, and recently increased their weighting.
How Does Latent Semantic Indexing Work?
Latent semantic indexing allows a search engine to determine what a page is about outside of specifically matching search query text.
A page about Apple computers will naturally tend to include terms such as iMac or iPod.
Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent. (source)
By placing additional weight on related words in the content, or on words in similar positions in other related documents, LSI has the net effect of lowering the value of pages that only match the specific term and do not back it up with related terms.
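To make that concrete, here is a minimal sketch of an LSI-style analysis: TF-IDF weighting of a tiny document collection followed by a truncated SVD. The sample documents, the component count, and the use of scikit-learn are my own illustrative assumptions, not a description of what Google actually runs.

```python
# Minimal LSI-style sketch: TF-IDF term-document matrix reduced with a truncated SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "apple imac and ipod reviews",          # about Apple computers
    "buy a new apple macintosh computer",   # related topic, few shared words
    "apple pie and fruit recipes",          # different sense of 'apple'
]

# Term-document matrix weighted by TF-IDF.
tfidf = TfidfVectorizer().fit_transform(docs)

# Reduce to a small number of latent "concept" dimensions.
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Pairwise cosine similarities in the reduced concept space; with a real corpus,
# documents about the same topic tend to land close together even when they
# share few exact words.
print(cosine_similarity(lsi))
```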
LSI vs Semantically Related Words:
After being roasted by a few IR students and scientists, I realized that many SEOs (myself included) have blended the concept of semantically related words with latent semantic indexing. Due to the scale constraints of the web, it is highly unlikely that large-scale search engines are using LSI on their main search indexes.
Nonetheless, it is obvious to anyone who studies search relevancy algorithms by watching the results and the pages that rank that the following are true for Google:
- Search engines such as Google do try to figure out phrase relationships when processing queries, improving the rankings of pages that contain related phrases even if those pages are not focused on the target term.
- Pages that are too focused on one phrase tend to rank worse than one would expect (sometimes even being filtered out for what some SEOs call being over-optimized).
- Pages that target a wider net of related keywords tend to have more stable rankings for the core keyword and to rank for a wider net of keywords (a rough way to measure that kind of coverage is sketched after this list).
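As a rough illustration of that last point, here is a hypothetical coverage check. The function, the related-term list, and the sample page copy are all made up for illustration; this is not any engine's actual relevancy formula, just a way to eyeball how broadly a page supports its core term.

```python
# Hypothetical heuristic: what fraction of the related terms appear in the page copy?
def related_term_coverage(page_text, related_terms):
    text = page_text.lower()
    hits = sum(1 for term in related_terms if term in text)
    return hits / len(related_terms)

# Illustrative related terms for a page targeting "seo".
related = ["search engine optimization", "ranking", "link", "keyword", "serp"]

stuffed = "seo seo seo best seo services cheap seo " * 5
broad = ("Our search engine optimization guide covers keyword research, "
         "link building, and how ranking in the SERPs actually works.")

print(related_term_coverage(stuffed, related))  # 0.0 - exact-match only
print(related_term_coverage(broad, related))    # 1.0 - backs the core term up
```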
Given the above, here are tips to help increase your page relevancy scores and make your rankings far more stable...
Mix Your Anchor Text!
Latent semantic indexing (or similar technology) can also be used to look at the link profile of your website. If all of your links are heavy on a few particular phrases and light on other similar phrases, then your site may not rank as well.
Example Related Terms:
Many of my links to this site say "SEO Book," but I have also used various other anchor text combinations to make the linkage data appear less manipulative (a quick way to check your own anchor text distribution is sketched after the lists below).
Instead of using SEO in all of the links, some of them may use phrases like:
search engine optimization
search engine marketing
search engine placement
search engine positioning
search engine promotion
search engine ranking
etc.
Instead of using book in all of the links, some other good common words might be:
ebook
manual
guide
tips
report
tutorial
etc.
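A quick way to check how skewed your own anchor text profile is might look like the following sketch. The sample link data is invented; in practice you would pull the anchors from a backlink report.

```python
# Tally anchor text phrases to see how much one phrase dominates the link profile.
from collections import Counter

anchors = [
    "SEO Book", "SEO Book", "SEO Book", "seo ebook",
    "search engine optimization guide", "SEO Book", "seo tips",
    "search engine marketing book", "SEO Book",
]

counts = Counter(a.lower() for a in anchors)
total = sum(counts.values())

for phrase, n in counts.most_common():
    print(f"{phrase:35s} {n / total:.0%}")
# If one phrase accounts for most of the links, the linkage data looks more manipulative.
```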
How do I Know What Words are Related?
There are a variety of ways to find out which words are related to one another.
- Search Google using the ~ operator. For example, Google Search: ~seo returns pages with terms matching or related to seo and highlights some of the related words in the search results.
- Use a lexical database such as WordNet (see the sketch after this list).
- Look at variations of keywords suggested by various keyword suggestion tools.
- Write a page and use the Google AdSense sandbox to see what types of ads it would try to deliver to that page.
- Read the page copy and analyze the backlinks of high ranking pages.
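Following up on the lexical database option above, WordNet (queried here through NLTK) is the usual freely available choice. This sketch assumes the WordNet corpus has already been downloaded with nltk.download('wordnet').

```python
from nltk.corpus import wordnet

# Collect lemma names from every WordNet synset for "book".
related = set()
for synset in wordnet.synsets("book"):
    for lemma in synset.lemmas():
        related.add(lemma.name().replace("_", " "))

print(sorted(related))  # includes words such as 'volume', 'ledger', 'script', ...
```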
Google Sandbox and Semantic Relationships:
The concept of "Google Sandbox" has become synonymous with "the damn thing won't rank" or whatever. The Sandbox idea is based upon sites with inadequate perceived trust taking longer to rank well.
Understanding the semantic relationships of words is just another piece of the relevancy algorithms, though many sites will shift significantly in the rankings because of it. The Google sandbox theory typically has more to do with people getting the wrong kinds of links, or not getting enough links, than it does with semantic relationships. Some sites and pages are hurt, though, by being too focused on a particular keyword or phrase.
Where do I learn more about Latent Semantic Indexing?
A while ago I read Patterns in Unstructured Data and found it was written in plain English and easy to understand.
Brian Turner also listed a good number of research papers in this thread.
The Hidden or Not so Hidden Messages:
- If you are entirely dependent on any single network and a single site for the bulk of your income, you are taking a big risk. Most webmasters would be best off having at least a couple of income streams to shield themselves from algorithm changes.
- If you are new to SEO, you are best off optimizing your site for MSN and Yahoo! from the start and then hoping to rank well in Google later.
- Make sure you mix your anchor text to minimize your risk profile. Even if you generally just use your site name as your anchor text, eventually even that can hurt you.
- Search algorithms and SEO will continue to get more complicated. But that makes for many fun posts ;)