Archive for February, 2009

Azure – Performance, IoC, and Instances

Ever since the Google App Engine was released, I’ve been fascinated with cloud computing frameworks. The vision is to have a website that can scale from nothing to infinity, without having to worry about servers, viruses, uptime, etc. I’ve finally gotten a chance to play around with Azure, and I must say that I’m in love with the concept, but disappointed by the current reality.


Performance

I’ve taken a site that I consider a “playground site”, and converted it over to run in Azure. One of the metrics I wanted to look at was the responsiveness of the deployed application. I run the main version of the site on a dedicated server, and I don’t think it’s unreasonable to use that as a baseline. After all, the purpose of Azure is to have the advantages of all the different types of hosting, yet have less to worry about.

To gauge performance, I used the Firefox add-in called Firebug. This let me see the amount of time that each requested element took to be transferred from the server. It also gives some insight into the amount of time it takes for the page to render. In the future, I’m going to use some server tracing to find specific operations that may be taking longer.

This is the baseline data from http://www.simpletracking.com. As you can see, the page is served up very quickly. The page takes less than 100ms to render (1/10 of a second), and the entire page comes through in less than half of a second.

[Screenshot: Firebug timings for simpletracking.com]

Now take a look at the same code running on Azure:

[Screenshot: Firebug timings for simpletracking.cloudapp.net]

To render the page, instead of 89ms, it now takes ~650ms. It takes a full second for the entire page and its elements to be sent down to the client.

Running both pages several times started to give me interesting results. The dedicated server was giving me extremely consistent results (even with other users hitting it). Azure, however, was all over the board. It was typically around 1 second for the entire page to load, but would occasionally spike up to 5 seconds. Personally, I think this is completely unacceptable performance. Hopefully this is not indicative of the performance I can expect once it’s released.
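If you want to repeat this kind of comparison without clicking refresh in Firebug, a small timing harness is one option. The sketch below is not what I used for the numbers above; the ResponseTimer class and its method names are made up for illustration. It simply runs whatever request you hand it several times and reports the spread, which is where the occasional spikes show up.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: times any operation repeatedly so spikes become visible.
public static class ResponseTimer
{
    // Runs the supplied request 'runs' times and returns each duration in milliseconds.
    public static List<double> Measure(Action request, int runs)
    {
        var timings = new List<double>(runs);
        for (int i = 0; i < runs; i++)
        {
            var watch = Stopwatch.StartNew();
            request();
            watch.Stop();
            timings.Add(watch.Elapsed.TotalMilliseconds);
        }
        return timings;
    }

    // Summarizes timings as min / median / max (upper median for even counts).
    // Assumes at least one timing was recorded.
    public static string Summarize(List<double> timings)
    {
        var sorted = new List<double>(timings);
        sorted.Sort();
        double median = sorted[sorted.Count / 2];
        return string.Format("min {0:F0}ms, median {1:F0}ms, max {2:F0}ms",
            sorted[0], median, sorted[sorted.Count - 1]);
    }
}
```

Passing something like () => new System.Net.WebClient().DownloadString(url) as the request would time the HTML download only. That is narrower than what Firebug shows, since it ignores images, scripts, and rendering, so treat the numbers as a rough comparison rather than a replacement.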

IoC

Azure is designed so that if you have an application that runs in medium trust, it shouldn’t require any conversion to run straight in Azure (in most cases). If you’re using a database, there are other restrictions because Azure doesn’t use a standard SQL database. In addition to these obvious issues, a non-obvious issue is that if you’re using an IoC container, it probably won’t run in medium trust.

My application uses the IoC container Spring.NET, which immediately failed. I suspected (incorrectly) that Windsor might have worked better, but couldn’t tell from the documentation. To make it easy to plug in different IoC containers, I started using the Common Service Locator. If you’re doing IoC without the common service locator, I really recommend you check it out.
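To show why a locator abstraction makes swapping containers less painful, here is a minimal sketch. It is deliberately not the Common Service Locator API: ILocator, DictionaryLocator, IGreeter, and App are all hypothetical names, and the dictionary stands in for whatever real container an adapter would wrap.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the abstraction a service locator provides.
public interface ILocator
{
    T GetInstance<T>();
}

// Toy implementation backed by a dictionary of factories.
// A real adapter would delegate to Unity, Autofac, etc. instead.
public class DictionaryLocator : ILocator
{
    private readonly Dictionary<Type, Func<object>> factories =
        new Dictionary<Type, Func<object>>();

    public void Register<T>(Func<T> factory) where T : class
    {
        factories[typeof(T)] = () => factory();
    }

    public T GetInstance<T>()
    {
        Func<object> factory;
        if (!factories.TryGetValue(typeof(T), out factory))
            throw new InvalidOperationException("No registration for " + typeof(T).Name);
        return (T)factory();
    }
}

public interface IGreeter { string Greet(string name); }

public class FriendlyGreeter : IGreeter
{
    public string Greet(string name) { return "Hello, " + name; }
}

public static class App
{
    // Application code only ever sees ILocator, never a specific container.
    public static string Run(ILocator locator)
    {
        return locator.GetInstance<IGreeter>().Greet("Azure");
    }
}
```

With this shape, changing containers means writing one new ILocator implementation; nothing in App has to change.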

I was then fortunate enough to find this page, which has great information on the different IoC containers and their Azure compatibility:

Castle Windsor – My preferred IoC container, but it won’t run under medium trust. Out!

StructureMap – My second favorite IoC container. Runs under medium trust locally, but not under Azure. Submitted bug report to Jeremy Miller. Reading through the StructureMap user’s group, it looks like he’s going to try to fix that early this year.

Ninject – I didn’t really monkey around with Ninject much. The sample code I saw was riddled with [Inject] attributes, which kinda turned me off. Apologies to @nkohari if I dismissed it too early.

Autofac – Works great in medium trust under Azure, easy to configure, but doesn’t support registering arguments for constructor injection at configuration time. You have to specify them when you resolve the service.

Unity – No problems at all! Worked great in medium trust on Azure, easy to configure, supports everything I need! I gotta say I’m really impressed by how far Unity has come in such a short time.

My only reasonable option was Unity, which is Microsoft’s IoC container. After another fun conversion, I was up and running! I honestly don’t have any complaints about their IoC offering.

Instances

The Azure team decided to introduce the concept of “Instances”. You have to decide how many virtual instances of a web server you want running. I really don’t understand the logic here. Their sales pitch is all about handling unpredictable traffic patterns, yet an instance-based approach just gives me another aspect of the application that I have to worry about. They’re promising that a configurable heuristics system will eventually be in place to handle the management of the number of instances. In effect, they are putting a band-aid on a problem that they’ve created even before release.

Contrast this design with the Google App Engine. With their system, you don’t have to worry about configuring instances at all. It automagically scales from nothing to infinity.

Instances on the worker roles make sense. Worker roles are not public facing; they are there to process data. By configuring the number of worker role instances, I can change the rate at which my data gets processed.

Conclusion

I realize that Azure isn’t even in beta yet, so I shouldn’t expect the world. I had my fingers crossed that their CTP would be production quality (wouldn’t that be nice?). I think that Microsoft will eventually have a great cloud platform on their hands; it’s simply a question of timing. Personally, I really don’t want to have to worry about uptime, scaling, RAID, drivers, viruses, etc., so I think cloud computing is the inevitable solution.

Convenient Synchronization with Mesh and DropBox

A couple of weeks ago, I finally signed up for DropBox. If you’re unfamiliar with the service, it’s a file synchronization service. You install a client on multiple machines, and you get a special folder (aka a dropbox). When you make changes on any computer, it’s synchronized with a central server, as well as the other clients.


Now that I’ve gotten the chance to put DropBox through its paces, I have to say that I’m very impressed. I’ve done a lot of operations that can sometimes choke file monitoring software, such as moving and renaming files, copying files while synchronizing, and working with in-use files. DropBox powered through like a champ, never giving me any errors, and without any noticeable mistakes.

In addition to simply synchronizing your files, their service also keeps a copy of your files on their server. Better yet, it automatically revisions the files. It seems to be fairly efficient, even considering all my files and revisions. Right now I’m only using 7.8% of the 2GB of space they give you for free.

One of the applications that I use the most is OneNote. Pretty much all of my disconnected thoughts go into OneNote until I can get them organized. I figured it was a great application to test the responsiveness of DropBox. I opened OneNote on two different computers. When I changed the text on one machine, the changes showed up on the other in 10-15 seconds. Perfect for keeping my notes in sync!

My one and only complaint about DropBox is that I can’t create multiple DropBoxes. A single DropBox is simple and efficient, but it would be nice to have a little more flexibility.

Live Mesh

A few nights ago, I got a demo of the Azure platform from a Microsoft Evangelist. Azure is a huge blanket term for a group of confusing technologies. Even the name itself is confusing, since Azure is a cloud computing platform and is also the color of the sky when there are no clouds.


More importantly, one great thing to come out of the “Live Services” portion is a free product called “Live Mesh”. It’s essentially a competitor to DropBox. The nice thing about Live Mesh is its flexibility. I can make any number of synchronized folders, and they all seem to be as reliable as DropBox. Thanks to a sophisticated permissions system, you can even share folders with other people. For example, you can have a folder set up to distribute your photos to your family.

The Microsoft Azure Evangelist showed us a demo with the client installed on his laptop, and another client installed on his Windows Mobile phone. When he took a picture on his phone, it was immediately pushed over to the other clients. It’s a neat trick, and it would make my mobile device more useful.


As far as I can tell, Live Mesh doesn’t have plans to support a revision system like DropBox. I think this is a horrible, horrible mistake. Having a file on multiple machines provides nice redundancy, yet if you accidentally delete a file on one computer, Live Mesh will happily delete every copy of it. It even happened to Scott Hanselman. In my opinion, this completely destroys any hope it has of competing with DropBox (at least for me). I’m hoping that they’ll add a backup feature, or someone will use their API to add it for them.

Others

One service I have yet to try is SugarSync. It looks promising because it syncs multiple folders, stores revisions, and even has a Windows Mobile version (although it’s missing real-time sync). On paper, it looks like it has all the options you would expect from this type of service.

Syncplicity looks respectable, but with so many alternatives, I’m just not sure if they have anything unique that sets them apart.

Conclusion

I think this type of application is going to have a huge market. This is one of those few killer apps that, if done well, will be on everyone’s computer. Obviously Microsoft’s offering will be positioned to dominate, but we all know that they don’t always have the absolute best product.

For now, I’ll be using DropBox for my main document folder. It suits my needs, and until it messes up, I won’t need to look elsewhere.

Value type comparison pitfall with == vs Equals

I recently ran into a situation that momentarily confused me because the behavior wasn’t intuitive at first. I’m working on a class that tracks changes made in UI controls in Silverlight, and I wrote code similar to the following:

private void checkChanges(UIElement control)
{
	object oldValue = getOldValue(control);
	object newValue = getNewValue(control);

	if(oldValue == newValue)
		return;

	Debug.WriteLine("The value has changed");
}

The data type I was working with in this case was a DateTime, which happens to be a struct, and therefore a value type. I know that this code works as expected:

DateTime time1 = DateTime.Parse("1-1-09");
DateTime time2 = DateTime.Parse("1-1-09");

//This is true
Assert.IsTrue(time1 == time2);

The code above works because I’m comparing two DateTime structures directly, so the compiler uses the “==” operator that DateTime defines. The original code does not work because the structures are being boxed; in other words, they’re wrapped in objects. When you use “==” on two variables typed as object, it compares the memory references to determine whether they’re the same instance. In this case, the two DateTime values are each boxed into separate boxes, so the comparison is always false.
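Here is a small, self-contained sketch that puts the three comparisons side by side. BoxingDemo is just an illustrative name, and the dates are built with the DateTime constructor rather than parsed so the result doesn’t depend on culture settings.

```csharp
using System;

public static class BoxingDemo
{
    // Compares the same DateTime value three different ways.
    public static bool[] Compare()
    {
        DateTime time1 = new DateTime(2009, 1, 1);
        DateTime time2 = new DateTime(2009, 1, 1);

        // Assigning a struct to an object variable boxes it:
        // each variable now refers to its own separate heap object.
        object boxed1 = time1;
        object boxed2 = time2;

        return new[]
        {
            time1 == time2,        // true:  DateTime's own == compares the values
            boxed1 == boxed2,      // false: == on object compares references
            boxed1.Equals(boxed2)  // true:  the virtual call reaches DateTime.Equals
        };
    }
}
```

The middle comparison is the one that bit me: the static type of the variables, not the runtime type of the values, decides which “==” gets used.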

The workaround is to use the “Equals” method which exists on all objects, and is overridden for the most common framework elements you’ll use. For example, DateTime overrides .Equals to determine if the date/times are equivalent.

So if I wanted to fix my original code, it would look like this:

private void checkChanges(UIElement control)
{
	object oldValue = getOldValue(control);
	object newValue = getNewValue(control);

	// The static object.Equals is null-safe, then calls oldValue.Equals(newValue)
	if(object.Equals(oldValue, newValue))
		return;

	Debug.WriteLine("The value has changed");
}

Of course this problem applies to all value types such as int, double, and any custom structs you may have created.
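As a rough sketch of how this generalizes, the helper below wraps the comparison in one place. Point and ChangeCheck are hypothetical names invented for this example. It relies on the static object.Equals(a, b), which handles nulls before calling a.Equals(b).

```csharp
using System;

// Hypothetical custom struct; it does not override Equals,
// so the default ValueType.Equals compares its fields.
public struct Point
{
    public int X;
    public int Y;
    public Point(int x, int y) { X = x; Y = y; }
}

public static class ChangeCheck
{
    // Returns true when the two (possibly boxed, possibly null) values differ.
    public static bool HasChanged(object oldValue, object newValue)
    {
        return !object.Equals(oldValue, newValue);
    }
}
```

With this in place, HasChanged(5, 5), HasChanged(1.5, 1.5), and HasChanged(new Point(1, 2), new Point(1, 2)) all report no change, and HasChanged(null, 5) reports a change instead of throwing. One thing to watch for: HasChanged(5, 5L) reports a change, because a boxed int is never equal to a boxed long.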

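Reference types never need to be boxed, but they aren’t completely immune to a similar pitfall. Operators are chosen based on the compile-time type of the variables, so “==” on two object variables compares references even when the underlying type, such as string, defines its own “==”. Using Equals is the safer choice whenever the values are typed as object.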

Is Quality Important?

Joel Spolsky and Jeff Atwood stirred up some debate when they said “Quality just doesn’t matter that much”. At first, I was a little outraged. My entire development process is built around quality. Without it, airplanes would fall from the sky and your car wouldn’t start in the morning.


So can we definitively put the quality question to rest? Unfortunately, “No”.

First of all, we need to understand that quality isn’t a Boolean. It’s not “yes”, you have quality, or “no”, you don’t have quality. Quality is a gradient, but it’s even worse than that. Everyone sees it differently, and everyone experiences a different aspect of it. In short, quality is a multidimensional gradient!

I used to work at a small development company where I worked very closely with the President of that company. He was concerned with quality, but that took a backseat to the features that went into the product. The features themselves sold the product, and wowed the people writing the checks. Once they purchased our software, the integration efforts were large enough that the customer was essentially locked-in. Throw an expensive support contract into the mix, and it was a money making machine.

The company ended up being very successful, and was eventually assimilated by a huge company. The owners ended up walking away with a few million each. Try to explain to them that quality is more important than features!

Now fast-forward a few years, and we can examine what eventually happened. The product did work, and honestly it was the best in its class simply due to the scope of the problems it was trying to solve, and the high barrier of entry for competitors. However, the quality issues eventually caught up with the product. It became difficult to maintain and add extra features. The only solution was to slowly rewrite sections of it.

I think a great analogy is the tortoise and the hare. If you’re in it for the long haul, you want to be the steady tortoise. If you’re in it for the short term, you want to be as quick as possible, even at the cost of stopping to nap later. The problem is, you’re making others suffer for your negligence.

If you want the best of both worlds, build quality into your development process. I’ll be covering this in a series of articles that discuss unit testing (and testing in general) in exhaustive detail. They should be coming out by the middle of March. Stay tuned!

Using C# Yield for Readability and Performance

I must have read about "yield" a dozen times. Only recently have I begun to understand what it does, and the real power that comes along with it. I’m going to show you some examples of where it can make your code more readable, and potentially more efficient.

To give you a very quick overview of how the yield functionality works, I first want to show you an example without it. The following code is simple, yet it’s a common pattern in the latest project I’m working on.

IList<string> FindBobs(IEnumerable<string> names)
{
	var bobs = new List<string>();

	foreach(var currName in names)
	{
		if(currName == "Bob")
			bobs.Add(currName);
	}

	return bobs;
}

Notice that I take in an IEnumerable<string>, and return an IList<string>. My general rule of thumb has been to be as lenient as possible with my input, and as strict as possible with my output. For the input, it clearly makes sense to use IEnumerable if you’re just going to be looping through it with a foreach. For the output, I try to use an interface so that the implementation can be changed. However, I chose to return the list because the caller may be able to take advantage of the fact that I already went through the work of making it a list.

The problem is, my design isn’t chainable, and it’s creating lists all over the place. In reality, this probably doesn’t add up to much, but it’s there nonetheless.

Now, let’s take a look at the "yield" way of doing it, and then I’ll explain how and why it works:

IEnumerable<string> FindBobs(IEnumerable<string> names)
{
	foreach(var currName in names)
	{
		if(currName == "Bob")
			yield return currName;
	}
}

In this version, we have changed the return type to IEnumerable<string>, and we’re using "yield return". Notice that I’m no longer creating a list. What’s happening is a little confusing, but I promise it’s actually incredibly simple once you understand it.

When you use the "yield return" keyphrase, the C# compiler is wiring up a whole bunch of plumbing code for you, but for now you can pretend it’s magic. Calling the function doesn’t run its body right away. When you start to loop in the calling code (not listed here), the body runs until it reaches a "yield return", hands that item to the caller, and pauses. Each time the caller asks for another item, it resumes execution where it left off.

Typical Implementation
  1. Caller calls function
  2. Function executes and returns list
  3. Caller uses list

Yield Implementation
  1. Caller calls function
  2. Caller requests item
  3. Next item returned
  4. Goto step #2

Although the execution of the yield implementation is a little more complicated, what we end up with is an implementation that "pulls" items one at a time instead of having to build an entire list before returning to the client.
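To make the back-and-forth visible, here is a small sketch that records the order in which things happen. YieldTrace and the log messages are invented for this illustration; the filter itself is the same FindBobs logic with logging added.

```csharp
using System;
using System.Collections.Generic;

public static class YieldTrace
{
    // Same filter as FindBobs, but it records when each part of the body runs.
    static IEnumerable<string> FindBobs(IEnumerable<string> names, List<string> log)
    {
        log.Add("body started");
        foreach (var currName in names)
        {
            if (currName == "Bob")
            {
                log.Add("yielding Bob");
                yield return currName;
            }
        }
        log.Add("body finished");
    }

    public static List<string> Run()
    {
        var log = new List<string>();

        var bobs = FindBobs(new[] { "Bob", "Alice", "Bob" }, log);
        log.Add("after call"); // nothing inside FindBobs has run yet

        foreach (var bob in bobs)
            log.Add("caller got " + bob);

        return log;
    }
}
```

Run() returns the entries in this order: "after call", "body started", "yielding Bob", "caller got Bob", "yielding Bob", "caller got Bob", "body finished". Notice that "after call" comes first, and that the function and the caller take turns from then on.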

With regard to the syntax, I personally think the yield syntax is simpler, and does a better job conveying what the method is actually doing. Even the fact that I’m returning IEnumerable tells the caller that its only concern should be that it can "foreach" over the return data. The caller can now make their own decision if they want to put it in a list, possibly at the expense of performance.

In the simple example I provided, you might not see much of an advantage. However, you’ll avoid unnecessary work when the caller can "short-circuit" or cancel looping through all of the items that the function will provide. When you start chaining methods using this technique together, this becomes more likely, and the amount of work saved can possibly multiply.
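Here is a minimal sketch of that short-circuit effect when two yield-based methods are chained. ShortCircuit, Names, and the Produced counter are hypothetical, added only so we can count how much work the source actually did.

```csharp
using System;
using System.Collections.Generic;

public static class ShortCircuit
{
    // Counts how many names the source has handed out.
    public static int Produced;

    // Hypothetical source; imagine each item being expensive to produce.
    static IEnumerable<string> Names()
    {
        foreach (var name in new[] { "Alice", "Bob", "Carol", "Bob", "Dave" })
        {
            Produced++;
            yield return name;
        }
    }

    static IEnumerable<string> FindBobs(IEnumerable<string> names)
    {
        foreach (var currName in names)
        {
            if (currName == "Bob")
                yield return currName;
        }
    }

    // Returns how many names were pulled from the source before the caller stopped.
    public static int CountPulledForFirstBob()
    {
        Produced = 0;
        foreach (var bob in FindBobs(Names()))
            break; // the caller only wanted the first match

        return Produced; // 2: "Alice" and "Bob"; the other three were never produced
    }
}
```

A list-returning version of FindBobs would have walked all five names before the caller ever saw the first one.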

Ayende has a great example of using yield for a slick pipes & filters implementation. He even has a version that is multi-threaded which I find very intriguing.

One of my first reservations with using yield was that there is a potential performance implication. Since C# is keeping track of what is going on in what is essentially a state machine, there is a bit of overhead. Unfortunately, I can’t find any information that demonstrates the performance impact. I do think that the potential advantages I mentioned should outweigh the overhead concerns.
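To get a feel for where that overhead comes from, here is a hand-written approximation of the kind of class the compiler produces for FindBobs. This is only a sketch: BobEnumerator is my own name, and the real generated code differs in its details. The point is that the "where was I?" information lives in fields on an object instead of in local variables.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written approximation of an iterator for FindBobs.
// The actual compiler-generated class is more elaborate.
public class BobEnumerator : IEnumerator<string>
{
    // The saved position: the source enumerator remembers how far we've scanned.
    private readonly IEnumerator<string> source;
    private string current;

    public BobEnumerator(IEnumerable<string> names)
    {
        source = names.GetEnumerator();
    }

    public string Current { get { return current; } }
    object IEnumerator.Current { get { return current; } }

    // Each call resumes the scan from where the previous call stopped.
    public bool MoveNext()
    {
        while (source.MoveNext())
        {
            if (source.Current == "Bob")
            {
                current = source.Current;
                return true; // roughly what "yield return" does
            }
        }
        return false; // roughly what falling off the end of the method does
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { source.Dispose(); }
}
```

Seen this way, the cost is roughly one extra object per enumeration plus a method call per item, which is why I’d expect it to be small next to the cost of building lists that the caller never finishes reading. I haven’t measured it, though.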

Conclusion

Yield can make your code more efficient and more readable. It’s been around since .NET 2.0, so there’s not much reason to avoid understanding and using it.

You can find detailed information about how the yield keyword works under the hood here.

Have you been using yield in interesting ways? Have you ever been bitten by using it? Leave a comment and let me know!