Do You Have a Search Engine "Spider Trap"?


Posted by: forwardone

A "spider trap" - it probably sounds like a good thing, but it's not.

To get maximum exposure in the search engines, you want to ensure that all of your pages get indexed. After all, the more pages of unique content that are indexed, the greater your chances of meeting someone's needs when they search.

When the search engines index your website, they use automated software known as a "bot" (e.g. GoogleBot). This software will visit your site and crawl or "spider" every link it can find. The pages behind those links are then indexed either immediately or at some point in the future, depending on a variety of factors.

One of the main tasks in search engine optimization is to make sure that a site is "crawlable". This basically means that a search engine spider is able to easily navigate the site and find as much of the content as possible. In recent years, XML site maps have really helped this process.
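
For readers who have not seen one, here is a minimal sketch of building an XML site map in Python. The page list and the build_sitemap helper are made up for illustration; only the urlset/url/loc layout and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string listing each URL once, in order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # skip duplicates so each page is listed once
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical page list, for illustration only
pages = [
    "http://www.example.com/",
    "http://www.example.com/products.html",
    "http://www.example.com/",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

A real site map would be generated from whatever list of pages your site actually has (a database of products, for example) and saved as a file the search engines can fetch.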

The real problems occur when your web site becomes a "spider trap".

A "spider trap" is a linking structure that can cause a spider to get stuck in an infinite loop. It usually happens with dynamic sites. Here are a few examples of how this could happen:
  1. Links created dynamically, such as calendars or blog archives, that provide an infinite number of links to follow
  2. Dynamic directory structures
  3. Poor or inconsistent linking architecture that is "cleaned up" by tools such as mod_rewrite
  4. Badly written dynamic sites that rely on session IDs and query strings

Most search engine spiders are capable of detecting these sorts of issues, so it is unlikely that your site will cause harm or waste many of the search engines' resources. The impact on your site, however, is that your content may not get indexed completely. This defeats a major objective of your SEO campaign: maximizing your exposure.
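
To illustrate the session ID case from the list above, here is a rough Python sketch showing how one page can appear to a crawler as several different URLs. The parameter names in SESSION_PARAMS and the example URLs are assumptions for illustration; your site may use different ones.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed names of session parameters; adjust for your own site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def canonical_url(url):
    """Drop session parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    query.sort()
    # Rebuild the URL without the session parameters or any fragment
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))

# Hypothetical URLs a crawler might collect for the same product page
crawled = [
    "http://www.example.com/item.asp?id=7&sid=abc123",
    "http://www.example.com/item.asp?sid=zzz999&id=7",
    "http://www.example.com/item.asp?id=7",
]
unique_pages = {canonical_url(u) for u in crawled}
print(len(crawled), "URLs crawled,", len(unique_pages), "distinct page(s)")
```

Every new session produces a new set of URLs for the same content, which is why a spider can keep following links on such a site without ever reaching anything new.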

In order to prevent this from happening, we advise the following:
  1. Use XML site maps to present all of your content to the search engines directly, rather than relying on a "bot" to discover it by crawling your site.
  2. Eliminate the potential for a spider to get trapped in an infinite loop on your site. Use crawling tools such as Fast Links Checker or Xenu and compare the number of pages they find with the actual number of pages on your site. If the crawler reports more pages than you know you should have (from a combination of products, pages, etc.), then you have a spider trap problem.
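
The page-count comparison in the second point can be sketched in Python as follows. This is not how Fast Links Checker or Xenu work internally; it is a toy breadth-first crawl over a made-up site (the calendar_links function is hypothetical) with a page budget so that it always stops.

```python
from collections import deque

def crawl(start, get_links, max_pages):
    """Breadth-first crawl that stops once max_pages URLs have been found."""
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < max_pages:
        page = queue.popleft()
        for link in get_links(page):
            if link not in seen and len(seen) < max_pages:
                seen.add(link)
                queue.append(link)
    return seen

def calendar_links(url):
    """Hypothetical site: every calendar page links to the next month, forever."""
    if url == "/":
        return ["/about", "/calendar?month=1"]
    if url.startswith("/calendar?month="):
        month = int(url.split("=")[1])
        return ["/", f"/calendar?month={month + 1}"]
    return ["/"]

expected_pages = 3  # what you believe the site contains
found = crawl("/", calendar_links, max_pages=expected_pages * 10)
has_trap = len(found) > expected_pages
print(f"expected {expected_pages}, crawler found {len(found)}, trap suspected: {has_trap}")
```

The budget of ten times the expected page count is an arbitrary choice. The point is only that a crawl which keeps finding new URLs well past the number of pages you know you have is a strong hint of a trap.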

Matthew Hopkins

http://www.vertical-leap.net/blog/D...spider-trap.asp



Posted by: clifton

Good article, F1. I have been using sitemaps for a couple of years.



