Using 301 redirects to avoid duplicate content

August 26, 2008 · Filed Under Google/SEO, Web Development

It has been brought to my attention that having a website that is accessible both with and without the “www.” before the domain name is bad for SEO. The reason is that search engines may treat the two hostnames as two separate sites with duplicate content. The other problem is that an index.html file can also be seen as a separate page from the bare directory URL. To understand this a little better, see the example below:

  • http://bfxmedia.com
  • http://bfxmedia.com/index.html
  • http://www.bfxmedia.com
  • http://www.bfxmedia.com/index.html

In the example above, all four URLs display the same page. To find out how your own site handles them, use the Redirect Check SEO Tool (listed under Useful Links below). After you submit your website for testing, it runs through all the variations of the URL and gives you the status code for each one.
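If you would rather check the status codes yourself, the same idea can be sketched in a few lines of Python. This is only an illustration, not how the Redirect Check SEO Tool itself works. The function names (url_variants, fetch_status, report) are made up for this example and it uses nothing beyond the standard library. Each URL gets a single HEAD request and redirects are not followed, so a 301 shows up as a 301 instead of the final 200.

```python
import http.client
from urllib.parse import urlsplit


def url_variants(domain, default_page="index.html"):
    """Return the four URL forms from the list above for a bare domain."""
    hosts = [domain, "www." + domain]
    paths = ["", "/" + default_page]
    return ["http://" + host + path for host in hosts for path in paths]


def fetch_status(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status_code, value of the Location header or None).
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def report(domain):
    """Print the status code (and redirect target, if any) for each variant."""
    for url in url_variants(domain):
        try:
            status, location = fetch_status(url)
        except (OSError, http.client.HTTPException) as exc:
            print(url, "-> request failed:", exc)
        else:
            print(url, "->", status, location or "")
```

Calling report("bfxmedia.com") would print one line per variation. On a correctly configured site you would expect a 301 for every variation except the one URL you chose as canonical.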

To redirect all traffic from http://bfxmedia.com to http://www.bfxmedia.com, you would add the lines below to your .htaccess file:

# Redirect any host other than www.bfxmedia.com to the www host
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.bfxmedia\.com [NC]
RewriteRule (.*) http://www.bfxmedia.com/$1 [R=301,L]

That didn’t work for me, though, because of the way my subdomains are redirected (the negated condition above also matches every subdomain). I had to use this instead, which matches only the bare domain:

# Redirect only the bare domain; subdomains are left alone
RewriteEngine On
RewriteCond %{HTTP_HOST} ^bfxmedia\.com$ [NC]
RewriteRule ^(.*)$ http://www.bfxmedia.com/$1 [R=301,L]

To redirect http://www.bfxmedia.com/index.html (and all other default page variations) to http://www.bfxmedia.com, you would add the lines below to your .htaccess file:

# THE_REQUEST holds the original request line, so these rules only fire when
# the visitor actually asked for the default page, not on Apache's internal
# DirectoryIndex lookup (which would otherwise cause a redirect loop).
RewriteEngine on
Options +FollowSymLinks
RewriteCond %{THE_REQUEST} ^.*/index\.html
RewriteRule ^(.*)index\.html$ http://www.bfxmedia.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index\.htm
RewriteRule ^(.*)index\.htm$ http://www.bfxmedia.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index\.php
RewriteRule ^(.*)index\.php$ http://www.bfxmedia.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index\.shtml
RewriteRule ^(.*)index\.shtml$ http://www.bfxmedia.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index\.asp
RewriteRule ^(.*)index\.asp$ http://www.bfxmedia.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index\.aspx
RewriteRule ^(.*)index\.aspx$ http://www.bfxmedia.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index\.cfm
RewriteRule ^(.*)index\.cfm$ http://www.bfxmedia.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index\.pl
RewriteRule ^(.*)index\.pl$ http://www.bfxmedia.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/default\.asp
RewriteRule ^(.*)default\.asp$ http://www.bfxmedia.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/default\.htm
RewriteRule ^(.*)default\.htm$ http://www.bfxmedia.com/$1 [R=301,L]
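
To sanity-check what these rules are meant to achieve, the same mapping can be sketched in Python. Treat it as an illustration under assumptions: canonical_url and the constant names are hypothetical, and the code mimics the intent of the bare-domain rule and the default-page rules, not Apache's actual rewrite engine.

```python
import re
from urllib.parse import urlsplit, urlunsplit

BARE_HOST = "bfxmedia.com"
CANONICAL_HOST = "www.bfxmedia.com"

# Default page names taken from the rules above; the lookbehind makes sure
# the name is a whole path segment, so "/myindex.html" is left untouched.
DEFAULT_PAGE = re.compile(
    r"(?<=/)(?:index\.(?:html|htm|php|shtml|asp|aspx|cfm|pl)|default\.(?:asp|htm))$"
)


def canonical_url(url):
    """Return the URL a visitor should end up on after the 301 redirects."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    # Strip a trailing default page; an empty path becomes "/".
    path = DEFAULT_PAGE.sub("", parts.path) or "/"
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
```

With this sketch, all four URLs from the list at the top of the post map to http://www.bfxmedia.com/, which is the single "home" URL the redirects are supposed to leave standing.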

For the example site bfxmedia.com, I have corrected all the problems and now only one URL displays my “home” page. All the other variations are safely redirected.

Useful Links:
Redirect Check SEO Tool
SEO advice: url canonicalization
Canonical and Duplicate Versions of Content

Cuil Launches Biggest Search Engine

July 29, 2008 · Filed Under Google/SEO

MENLO PARK, Calif. - July 28, 2008 - Cuil, a technology company pioneering a new approach to search, unveils its innovative search offering, which combines the biggest Web index with content-based relevance methods, results organized by ideas, and complete user privacy. Cuil (www.cuil.com) has indexed 120 billion Web pages, three times more than any other search engine.

Cuil (pronounced COOL) provides organized and relevant results based on Web page content analysis. The search engine goes beyond today’s search techniques of link analysis and traffic ranking to analyze the context of each page and the concepts behind each query. It then organizes similar search results into groups and sorts them by category.

121,617,892,992 web pages and this is what I get: We didn’t find any results for “0092ff”

I may have to keep an eye on this search engine. It might just be the next big thing.
We’ll see…

Google PageRank is updating soon

July 28, 2008 · Filed Under Google/SEO

Over at his blog today, Google staffer Matt Cutts has revealed that Google is about to roll out an update to their Toolbar PageRank values.

Hey folks, I wanted to let you know that new toolbar PageRank values should become visible over the next few days. I’m expecting that also in the next few days that we’ll be expiring some older penalties on websites.

Read Matt’s Blog

CoffeeCup Google SiteMapper

July 8, 2008 · Filed Under Google/SEO, Software

Today we’re going to take a look at the Google SiteMapper by CoffeeCup.

By placing a sitemap file on your website, you enable Google and other search engines to find out which pages are present and which have recently changed, and to spider your website accordingly. Using sitemaps allows you to tell Google and other search engines what content you have available on your website.

Google Sitemaps are an easy way for you to help improve your coverage in Google search results. It’s a system that enables you to communicate directly with Google to keep them informed of all your web pages and of when you make changes to them. CoffeeCup Google SiteMapper takes all the work out of creating a Google Sitemap. Just enter your Website Address and click go… Presto! The Sitemap is made and ready to upload.
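
For a sense of what ends up in such a file, here is a rough Python sketch that writes a minimal sitemap in the sitemaps.org XML format: a urlset of url entries, each with a loc and an optional lastmod. It is not CoffeeCup's code. The build_sitemap function and the example.com URLs are placeholders for illustration only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, lastmod) pairs.

    lastmod is a 'YYYY-MM-DD' string, or None to leave the element out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    # Save the returned string as UTF-8 so it matches the declaration.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2008-07-08"),
    ("http://www.example.com/about.html", None),
])
```

The resulting string would normally be saved as sitemap.xml in the site root. The lastmod value is what tells a search engine that a page has recently changed.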

When you first open the program, you will see the screen below.
CoffeeCup Google SiteMapper open screen

For this example, I’m going to create a sitemap for one of my other sites. In the “Website Address:” field, I just type in the address. If you wanted to create a sitemap based on local files on your computer, you would click the top radio button that says “Create SiteMap based on my Local Files”.

By clicking the “Options…” button in the middle, you are given the opportunity to ignore folders/files, change page colors, set different output options or ignore certain queries. On the example site, I always set it to ignore the blog directory, because my blog on that site uses a different sitemap tool that is designed for WordPress.
CoffeeCup Google SiteMapper options

Once you’re satisfied with your configuration options, press “OK” and then “Next”. After that, the CoffeeCup super-spiders will crawl your website and start collecting page data. Once that’s complete, you’ll see a list of all the pages it found. It will only find pages that can be reached by following links from other pages. If you have a page that nothing links to, it will not appear on your sitemap by default. Here is what mine looks like:
CoffeeCup Google SiteMapper results page
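
The fact that unlinked pages are missed follows from how a link-following crawler generally works: it starts at one address and only ever visits URLs it has seen in a link. The Python sketch below shows that general approach and is not CoffeeCup's actual spider. The names discover_pages and LinkCollector, the fetch and ignore parameters, and the example.com site are all hypothetical. fetch can be any function that returns the HTML for a URL, or None if the page cannot be loaded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_pages(start_url, fetch, ignore=()):
    """Breadth-first crawl from start_url, staying under that URL prefix.

    ignore is a tuple of URL prefixes to skip (for example a blog directory).
    Returns the fetched pages in the order they were discovered.
    """
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # broken link: nothing to list or parse
        found.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment
            target = urldefrag(urljoin(url, href)).url
            if (target.startswith(start_url)
                    and not target.startswith(ignore)
                    and target not in seen):
                seen.add(target)
                queue.append(target)
    return found


# A tiny in-memory "site"; orphan.html exists but nothing links to it.
site = {
    "http://www.example.com/": '<a href="about.html">About</a> <a href="/blog/">Blog</a>',
    "http://www.example.com/about.html": '<a href="/">Home</a>',
    "http://www.example.com/blog/": '<a href="../about.html#team">About</a>',
    "http://www.example.com/orphan.html": '<a href="/">Home</a>',
}
pages = discover_pages("http://www.example.com/", site.get)
```

Here pages contains the home page, about.html and the blog, but never orphan.html. Passing ignore=("http://www.example.com/blog/",) leaves the blog out as well, much like the ignore option described above.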

If you’re satisfied with the results, choose the desired output location and hit “Save”. You will then be given the option to upload your sitemap or submit it to Google. Normally, I just close the window and upload it myself. The program also creates a .csm file that doesn’t need to be uploaded. That’s a saved program file that stores your sitemap info for later.
CoffeeCup Google SiteMapper saved files

Overall, I think CoffeeCup is correct when they say “Just enter your Website Address and click go… Presto!” It really is that easy. You can see a live sitemap created with CoffeeCup Google SiteMapper here.

If you would like to try the software out for yourself, you can download it here.