Archive for seo

Avoiding duplicate content with your site or blog

One of the most important rules in SEO (search engine optimization) is avoiding duplicate content. Google publishes some information about how it handles duplicate content. Unfortunately, Googlebot is rarely smart enough to know which copy of the content is the original. Google wants to avoid rewarding sites that copy or republish someone else’s work simply to get content.

You also want Google to find pages on your site that have substance and that are not just copies of content from one of your other pages.

So how do you avoid it on your site? The first step is to identify potential pages that have duplicate content. It’s probably happening without you even being aware of it.

Type this into Google, substituting your own domain: site:www.yoursite.com

I’m using Blogger, and here are some pages that get indexed by default but should not be:

  • http://www.ytechie.com/2008/04/aspnet-linkbutton-and-seo.html?widgetType=BlogArchive&widgetId=BlogArchive1&action=toggle&dir=close&toggle=YEARLY-1199167200000&toggleopen=MONTHLY-1207026000000
  • http://www.ytechie.com/2008_03_01_archive.html

Now that we’ve identified the offending pages, we can create or modify our robots.txt file, at the root of our site.

Here is what I could add to my robots.txt, under a User-agent line, to block those pages for crawlers such as Googlebot that understand the * and $ wildcards:

Disallow: /*?
Disallow: /*_archive.html$

Once you’ve updated your robots.txt file, you can use Google Webmaster Tools to test it. For more information on how to edit your robots file, including syntax, consult Google’s documentation.
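
As a quick local sanity check before reaching for Google’s tester, here is a rough Python sketch of how those two wildcard rules would match the example URLs above. It assumes the Google-style reading of the syntax (* matches any run of characters, a trailing $ anchors the end of the URL, anything else is a prefix match). The function names are made up for this illustration, and it is not a full robots.txt parser: it ignores User-agent groups and Allow lines.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Translate a Google-style robots.txt path pattern into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' where '*' was.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url, rules):
    """Return True if any Disallow pattern matches the URL's path plus query string."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # re.match anchors at the start, which gives the prefix-match behaviour.
    return any(rule_to_regex(rule).match(target) for rule in rules)


rules = ["/*?", "/*_archive.html$"]

archive_page = "http://www.ytechie.com/2008_03_01_archive.html"
widget_page = (
    "http://www.ytechie.com/2008/04/aspnet-linkbutton-and-seo.html"
    "?widgetType=BlogArchive&widgetId=BlogArchive1&action=toggle&dir=close"
    "&toggle=YEARLY-1199167200000&toggleopen=MONTHLY-1207026000000"
)
real_post = "http://www.ytechie.com/2008/04/aspnet-linkbutton-and-seo.html"

print(is_blocked(archive_page, rules))  # True
print(is_blocked(widget_page, rules))   # True
print(is_blocked(real_post, rules))     # False
```

The last line is the important one: the post itself, without any query string, stays crawlable.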

There is one big problem. If you’re using a service like Blogger (like this blog), you can’t edit your robots file. There has been talk of adding support, but we have to deal with what is available.

The best I’ve been able to come up with is adding this to the head (look for <head>) of my template code:

<b:if cond='data:blog.pageType == "archive"'>
 <meta name="robots" content="noindex, nofollow" />
</b:if>

This adds a noindex and nofollow meta tag to the generated archive pages. I have not yet figured out how to remove pages that contain parameters (?param=value). If anyone has a way to do it, please let me know! I’ve actually been considering removing the archive widget to solve it.
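
To confirm the tag actually shows up on a generated archive page, one option is to save the page source and scan it for the robots meta tag. The following is only a sketch using Python’s standard html.parser. The class and function names are invented for this example, and the HTML string stands in for a real saved page.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        for token in (attr.get("content") or "").split(","):
            if token.strip():
                self.directives.add(token.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in the given HTML source."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


# Stand-in for the source of a generated archive page.
sample = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
print(robots_directives(sample))
```

If the set comes back empty for an archive page, the condition in the template is probably not matching.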

ASP.NET LinkButton and SEO

A common question that comes up is what LinkButtons do for SEO (search engine optimization). Well, let’s take a look at the HTML a LinkButton actually renders:

<a href="javascript:__doPostBack('ctl01','')">Click me!</a>

Notice that it’s simply a standard hyperlink with a JavaScript call. Typically, the search engines are only going to look at your HTML. They’re not going to evaluate the JavaScript. Doing so would be a big can of worms.

So basically, the LinkButton is going to be invisible to the search engines. At most, they might look at the words in the link text, and consider them as part of the rest of the content.
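
To make that concrete, here is a small Python sketch of what a naive, HTML-only crawler might do with the rendered markup. This is an assumption for illustration, not a description of how any real search engine works. The /about.html link is a made-up example of an ordinary hyperlink, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sorts <a href> values into links a crawler could follow and ones it could not."""

    def __init__(self):
        super().__init__()
        self.followable = []
        self.skipped = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is None:
            return
        # A javascript: URL only means something to a script engine, so a
        # crawler that does not run JavaScript has nowhere to go from here.
        if href.strip().lower().startswith("javascript:"):
            self.skipped.append(href)
        else:
            self.followable.append(href)


def collect_links(html):
    """Return (followable, skipped) href lists for the given HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.followable, parser.skipped


page = (
    '<a href="javascript:__doPostBack(\'ctl01\',\'\')">Click me!</a>'
    '<a href="/about.html">About</a>'
)
followable, skipped = collect_links(page)
print(followable)  # ['/about.html']
print(skipped)
```

Only the plain hyperlink ends up in the list of pages to visit; the LinkButton’s target never does.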

Remember, the purpose of the LinkButton is to be a replacement for the ASP.NET Button control, but with the look of a hyperlink.