Archive for asp.net

ASP.NET MVC, What about SEO?

I’ve started working with the latest preview of the ASP.NET MVC framework. I’m completely converting one of my sites, because learning by doing is typically the best way. Unfortunately, I’ve run into some alarming SEO (Search Engine Optimization) issues with this new paradigm (or more specifically, the Microsoft implementation).

Duplicate Content

Duplicate content is a major issue. If a search engine (Google, which we’re primarily concerned with) finds multiple identical pages, it could be seen as a spam technique. Google likes original content, and penalizes duplicate content.

The problem is that the ASP.NET MVC default routing is too forgiving. If I have a page with this address: "/controller/action/id", the routing engine happily serves it up at "/controller/action/id/". There is no reason to not be strict on this. In ASP.NET WebForms, if you forget the trailing slash, it will automatically perform a 301 (permanent) redirect to the version with the trailing slash.

ASP.NET MVC has a bug (I’m calling it that) that won’t let you define a URL as requiring a trailing slash. Below, I’ve defined a sample route whose URL pattern ends with a trailing slash. The routing code strips that trailing slash when the route is added to the routing table. As a side effect, the URLs it generates have no trailing slash either.

routes.MapRoute(
	"Legacy-Firefox",
	"Firefox-Extension/",
	new { controller = "Home", action = "Firefox", id = "" } );

Since routes can be configured to reuse actions and controllers, the number of paths to the same content multiplies. With the route defined above, I end up with all of these valid addresses that could potentially be linked to and indexed:

  • a.com/Firefox-Extension
  • a.com/Firefox-Extension/
  • a.com/Home/Firefox
  • a.com/Home/Firefox/
  • a.com/Home/Firefox/anythingyouwant

If you’re lucky, Google won’t penalize duplicate content. However, if Google indexes the same content under multiple URLs, you won’t get the benefit of focusing the PageRank. A similar situation occurs when you have a site that can be addressed as both "cnn.com" and "www.cnn.com". They are counted as separate pages that end up competing with each other for rank.
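One way to collapse the trailing-slash variants is to pick a single canonical form and permanently redirect everything else to it. The following is a minimal sketch, assuming the version without the trailing slash is canonical (since that is what the routing engine generates). `UrlCanonicalizer` and `Canonicalize` are hypothetical names, not part of ASP.NET, and the sketch does nothing about the "/Home/Firefox" versus "/Firefox-Extension" duplicates.

```csharp
using System;

// Hypothetical helper: decides whether a requested path should be
// 301-redirected to a single canonical form. Assumption: the canonical
// form has no trailing slash.
public static class UrlCanonicalizer
{
    // Returns the canonical path, or null when the path is already canonical.
    public static string Canonicalize(string path)
    {
        if (string.IsNullOrEmpty(path) || path == "/")
            return null; // the site root is always canonical

        string canonical = path.TrimEnd('/');
        if (canonical.Length == 0)
            canonical = "/"; // input was only slashes, e.g. "//"

        return canonical == path ? null : canonical;
    }
}

// In Global.asax this could be called from Application_BeginRequest and,
// when the result is non-null, followed by a permanent redirect:
//   Response.StatusCode = 301;
//   Response.AddHeader("Location", canonical);   // plus any query string
//   Response.End();
```

With this in place, a request for "/Home/Firefox/" would be redirected to "/Home/Firefox", while "/Home/Firefox" itself would be served as usual.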

Legacy URLs

I’m sure there is a large group of people eagerly waiting to take advantage of the new MVC style of development. Many of them will undoubtedly have existing URLs they want to preserve.

There are a couple of ways to handle this issue. The search engines would prefer that your URLs simply remain the same. This is possible, but requires some fancy routing. The SEO community highly recommends this approach (with good cause).

Another way to handle it is to adopt the new REST-style URLs that typically make the most sense with an MVC approach: "/controller/action/id". Then, set up 301 redirects from the old addresses to the new ones. This article discusses the technical details. In theory, this should be the best scenario. However, Google themselves say to get the incoming links pointed to the new addresses ASAP. The truth is, this solution sucks. I’ve actually done this with a site. It was search engine suicide for a couple of months. I eventually got my old position back, but lost a significant amount of revenue in the meantime.
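For anyone who does go the redirect route, the old-to-new mapping can live in a simple lookup table that is consulted before routing happens. This is only a sketch: `LegacyRedirects`, `TryGetTarget`, and the ".aspx" paths are made up for illustration and do not come from a real site.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table for legacy URLs. The paths are examples only.
public static class LegacyRedirects
{
    // Case-insensitive, because old inbound links may differ in casing.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "/firefox.aspx", "/Home/Firefox" },
            { "/about.aspx", "/Home/About" }
        };

    // Returns true and the new path when the old path has a permanent new home.
    public static bool TryGetTarget(string oldPath, out string newPath)
    {
        return Map.TryGetValue(oldPath, out newPath);
    }
}
```

When `TryGetTarget` returns true, the request would be answered with a 301 status and a Location header pointing at the new path. Every other request falls through to the normal routing table.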

Yet another way that I found is to set up multiple routes so that the content is accessible with both the old and new addresses. If you’ve been paying attention, you’ll know that this counts as duplicate content, and is very, very bad. I was in shock when I found this approach being advocated.

Conclusion

I’m not saying the routing system is completely wrong; I just think it should be set up so that the easy way of migrating a site is the correct way, or as close to it as possible. I don’t want to have to write custom routing. At the very least, come up with a way to designate that a particular action has a single path (and cancel out additional paths in other routes). It would also be nice if there was a way to use the old-style urlMappings section in the web.config for legacy URLs.

If I’m completely wrong about how the routing works, let me know. It’s difficult to find good information (which is understandable right now), and I’m admittedly still in an early learning stage.

ASP.NET Changing Session IDs for each request

I ran into an issue where ASP.NET was changing the Session.SessionID for every request from the same user. A quick Google search revealed 2.3 million pages. I’ll summarize one of the main reasons this can happen, and discuss 2 ways to fix it.

I’ve been building a search function for a website. We’re taking the Lean software approach and implementing an extremely basic search for now. We’re going to track the searches that users make, so that we’ll have the data we need to build a better search in the next version.

In order to know if users are making multiple searches, we’re storing the ASP.NET session ID with the search record in the database. Much to my dismay, every search request resulted in a different value in Session.SessionID.

The problem lies in the fact that ASP.NET is trying to be extremely efficient storing sessions for users. If ASP.NET doesn’t have a reason to remember who you are, it won’t. If you think about it, that can save a tremendous amount of work by avoiding session management.

If you want to tell ASP.NET that you want it to track user sessions, you can do one of 2 things:

  1. Store something in the session. If you store something in the user’s session, ASP.NET will be forced to associate that data with the current visit. Example code:
    Session["foo"] = "bar";
  2. Handle the Session_Start event in your Global.asax. The presence of this method tells ASP.NET to track sessions, even if there is no data in the session.
    public void Session_Start(object sender, EventArgs e)
    {
    }

Locking sessions for multi-threaded access

I recently ran into a situation where I needed to upload some small files from a Flex client application to an ASP.NET web server. I decided to store the uploaded files in the user’s session while they were in the checkout process. Once the user confirms their order, the images are read from the session and stored in the database.

Here is the original code from the page that accepts each uploaded file and adds it to a Dictionary stored in the session:

if (Session[SESSION_ORDER_FILES] == null)
{
	//Our dictionary hasn't been created, so we do it now
	files = new Dictionary<string, byte[]>();
	Session[SESSION_ORDER_FILES] = files;
}
else
{
	//The dictionary has already been created, just load it
	files = (Dictionary<string , byte[]>) Session[SESSION_ORDER_FILES];
}

//If we have the "_clearPrevious" flag, that means all
//of the files should be removed from this user's session
if (_clearPrevious)
	files.Clear();

//The indexer adds the file, or replaces an existing
//entry that has the same file name
files[_fileName] = bytes;

The problem is that we ended up with missing images. The client was sending them, but when the user confirmed their order they were missing images in the session. Since ASP.NET will process page requests in multiple threads, the session can be accessed in multiple threads!

Now, we need to find a way to lock them. I questioned whether ASP.NET would give me the same session object each time, or a new instance representing the same session. I whipped up this code in a test page. It saves the previous session reference to the session. I know it’s a little strange, but since no serialization happens with the session, it gave me a good way to know if the previous session object and the current session object were the same instance.

const string SESS_SESS = "test";
var currSessionObj = Session[SESS_SESS];

if(currSessionObj == null)
	//First page load
	Session[SESS_SESS] = Session;
else
	lblText.Text = (Session[SESS_SESS] == Session).ToString();

The result of this page was false. That means you most certainly do get a new session instance each time. Keep in mind that I’m not saying it’s a different session; the object you use to access the session is simply a different instance on each request.

What does this mean?

This means that you have to be careful when there is a chance that you’re working with session objects in multiple pages, or in a page that could be accessed multiple times simultaneously. Thankfully, there are only a few real-world scenarios where this would be a large concern.

As with any other kind of multi-threaded code, be careful if you’re checking the session, and then performing an action based on the result. In that case, you’ll need to lock a global object that is available to all threads that could access that code. Here is an example:

lock(Global.SessionLock)
{
	if(Session["foo"] == null)
		Session["foo"] = new Bar();
}

In your Global class, you’ll need this field:

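//Must be visible to the pages that lock on it; readonly so
//that the lock object can never be swapped out
public static readonly object SessionLock = new object();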
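Applied to the upload handler from earlier, both the check-then-create step and the dictionary update would go inside the lock. Here is a minimal sketch, with a plain dictionary standing in for the ASP.NET session so that it can run outside a web application. `UploadStore` and `AddFile` are hypothetical names, and the session key value is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the upload page. A plain dictionary plays the
// role of the ASP.NET session so the sketch is self-contained.
public static class UploadStore
{
    private const string SESSION_ORDER_FILES = "OrderFiles"; // assumed key

    // One process-wide lock, playing the role of Global.SessionLock.
    private static readonly object SessionLock = new object();

    public static void AddFile(IDictionary<string, object> session,
                               string fileName, byte[] bytes, bool clearPrevious)
    {
        lock (SessionLock)
        {
            // Check-then-create is safe here because only one thread at a
            // time can be inside the lock.
            object existing;
            Dictionary<string, byte[]> files;
            if (session.TryGetValue(SESSION_ORDER_FILES, out existing))
            {
                files = (Dictionary<string, byte[]>)existing;
            }
            else
            {
                files = new Dictionary<string, byte[]>();
                session[SESSION_ORDER_FILES] = files;
            }

            if (clearPrevious)
                files.Clear();

            // The indexer adds the entry or replaces one with the same name.
            files[fileName] = bytes;
        }
    }
}
```

The trade-off is that a single global lock serializes uploads from all users, not just uploads within one session. For small files that is probably acceptable, but it is worth keeping the locked region as short as possible.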

Object does not match target type in GridView

I created a shopping cart for a website that can display multiple types of items that implement IShoppingCartItem. When the GridView tried to display items of different types, I would get this exception:

Exception Details: System.Reflection.TargetException: Object does not match target type.

I found a lot of solutions that I didn’t really like. For example, every shopping cart item type could implement ITypedList.

What I ended up doing was creating a CartGridItem class that implements IShoppingCartItem and simply wraps the real item, so the grid only ever sees one type:

public class CartGridItem : IShoppingCartItem
{
	private readonly IShoppingCartItem _baseCartItem;
	public CartGridItem(IShoppingCartItem baseCartItem)
	{
		_baseCartItem = baseCartItem;
	}
	#region IShoppingCartItem Members
	public string ProductCode
	{
		get { return _baseCartItem.ProductCode; }
	}
//...end of code sample...

This has worked great, and I don’t have to make any changes when I create a new item that implements that interface!
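To show how the wrapper might be used at binding time, here is a minimal, self-contained sketch. Only the ProductCode member is known from the sample above, so the interface here is cut down to that one property. `Book`, `Shirt`, `CartBinding`, and `ToGridItems` are hypothetical names added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: only ProductCode is known from the sample above; the real
// interface has more members.
public interface IShoppingCartItem
{
    string ProductCode { get; }
}

// Two unrelated item types, made up for this sketch.
public class Book : IShoppingCartItem
{
    public string ProductCode { get { return "BOOK-1"; } }
}

public class Shirt : IShoppingCartItem
{
    public string ProductCode { get { return "SHIRT-1"; } }
}

// The wrapper from the post, reduced to the single known property.
public class CartGridItem : IShoppingCartItem
{
    private readonly IShoppingCartItem _baseCartItem;

    public CartGridItem(IShoppingCartItem baseCartItem)
    {
        _baseCartItem = baseCartItem;
    }

    public string ProductCode
    {
        get { return _baseCartItem.ProductCode; }
    }
}

public static class CartBinding
{
    // Wraps every item so that the list handed to the grid contains exactly
    // one concrete type, whatever the underlying item types are.
    public static List<CartGridItem> ToGridItems(IEnumerable<IShoppingCartItem> items)
    {
        return items.Select(i => new CartGridItem(i)).ToList();
    }
}
```

The resulting list would be assigned to the GridView’s DataSource before calling DataBind(). Since every row is a CartGridItem, the grid resolves its bound properties against a single type.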

Speeding up your ASP.NET Application

I read a post titled "Improve Web Application Performance". I expected to have a lot of information to add, but it’s actually pretty comprehensive. I recommend reading it if you’re developing a web application. One of my biggest complaints about most websites is that they’re way too slow. In this day and age there is really no reason for that.

The only other optimization I have used in the past that isn’t mentioned in the post is removing whitespace. While it can make your life a little harder because of some layout glitches it might cause (which are fixable), it can often have a pretty big effect on your page sizes. On one of my websites, I saw an average decrease of 10-20% in page sizes, even with gzip compression turned on.

The easiest way to remove whitespace from your pages is to use an HTTP module. One solution is available here for free. You simply add the binary DLL, and reference it in your web.config file. It couldn’t hurt to give it a try the next time you’re writing an ASP.NET application.
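To give an idea of what such a module does to the markup, here is a rough sketch of the core transformation. It is not the code of the module mentioned above; `WhitespaceStripper` and `Strip` are hypothetical names, and the two regular expressions are a deliberately naive assumption about what "removing whitespace" means.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, naive whitespace remover for rendered HTML.
public static class WhitespaceStripper
{
    // Whitespace between a closing '>' and the next '<'.
    private static readonly Regex BetweenTags =
        new Regex(@">\s+<", RegexOptions.Compiled);

    // Runs of two or more whitespace characters anywhere else.
    private static readonly Regex Runs =
        new Regex(@"\s{2,}", RegexOptions.Compiled);

    public static string Strip(string html)
    {
        string result = BetweenTags.Replace(html, "><");
        return Runs.Replace(result, " ");
    }
}
```

In a real module, a transformation like this would typically be applied to the response through a stream assigned to Response.Filter. The layout glitches mentioned earlier come from exactly this kind of rule: removing the space between two inline elements changes how they render, and content inside pre tags or inline scripts should be skipped, which this sketch does not attempt.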