Archive for c#

Value type comparison pitfall with == vs Equals

I recently ran into a situation that momentarily confused me, because it was non-intuitive to me at first. I’m working on a class that tracks changes made in UI controls in Silverlight, and I wrote code similar to the following:

private void checkChanges(UIElement control)
{
	object oldValue = getOldValue(control);
	object newValue = getNewValue(control);

	if(oldValue == newValue)
		return;

	Debug.WriteLine("The value has changed");
}

The data type I was working with in this case was a DateTime, which happens to be Struct, which is a value type. I know that this code works as expected:

DateTime time1 = DateTime.Parse("1-1-09");
DateTime time2 = DateTime.Parse("1-1-09");

//This is true
Assert.IsTrue(time1 == time2);

The code above works because I’m comparing 2 DateTime structures. The original code does not work because the structures are being boxed, in other words, they’re wrapped in objects. When you use “==” on two objects, it’s comparing the memory references, determining if they’re the same instance. In this case, the DateTime objects are each boxed into separate boxes.

The workaround is to use the “Equals” method which exists on all objects, and is overridden for the most common framework elements you’ll use. For example, DateTime overrides .Equals to determine if the date/times are equivalent.

So if I wanted to fix my original code, it would look like this:

private void checkChanges(UIElement control)
{
	object oldValue = getOldValue(control);
	object newValue = getNewValue(control);

	if(oldValue.Equals(newValue))
		return;

	Debug.WriteLine("The value has changed");
}

Of course this problem applies to all values types such as int, double, and any custom structs you may have created.

It goes without saying that this pitfall is not an issue when you’re working with reference types, because they have no need to be boxed.

Using C# Yield for Readability and Performance

I must have read about "yield" a dozen times. Only recently have I began to understand what it does, and the real power that comes along with it. I’m going to show you some examples of where it can make your code more readable, and potentially more efficient.

To give you a very quick overview of how the yield functionality works, I first want to show you an example without it. The following code is simple, yet it’s a common pattern in the latest project I’m working on.

IList<string> FindBobs(IEnumerable<string> names)
{
	var bobs = new List<string>();

	foreach(var currName in names)
	{
		if(currName == "Bob")
			bobs.Add(currName);
	}

	return bobs;
}

Notice that I take in an IEnumerable<string>, and return an IList<string>. My general rule of thumb has been to be as lenient as possible with my input, and as strict as possible with my output. For the input, it clearly makes sense to use IEnumerable if you’re just going to be looping through it with a foreach. For the output, I try to use an interface so that the implementation can be changed. However, I chose to return the list because the caller may be able to take advantage of the fact that I already went through the work of making it a list.

The problem is, my design isn’t chainable, and it’s creating lists all over the place. In reality, this probably doesn’t add up to much, but it’s there nonetheless.

Now, let’s take a look at the "yield" way of doing it, and then I’ll explain how and why it works:

IEnumerable<string> FindBobs(IEnumerable<string> names)
{
	foreach(var currName in names)
	{
		if(currName == "Bob")
			yield return currName;
	}
}

In this version, we have changed the return type to IEnumerable<string>, and we’re using "yield return". Notice that I’m no longer creating a list. What’s happening is a little confusing, but I promise it’s actually incredibly simple once you understand it.

When you use the "yield return" keyphrase, .NET is wiring up a whole bunch of plumbing code for you, but for now you can pretend it’s magic. When you start to loop in the calling code (not listed here), this function actually gets called over and over again, but each time it resumes execution where it left off.

Typical Implementation

Yield Implementation
  1. Caller calls function
  2. Function executes and returns list
  3. Caller uses list
  1. Caller calls function
  2. Caller requests item
  3. Next item returned
  4. Goto step #2

Although the execution of the yield implementation is a little more complicated, what we end up with is an implementation that "pulls" items one at a time instead of having to build an entire list before returning to the client.

In regards to the syntax, I personally think the yield syntax is simpler, and does a better job conveying what the method is actually doing. Even the fact that I’m returning IEnumerable tells the caller that its only concern should be that it can "foreach" over the return data. The caller can now make their own decision if they want to put it in a list, possibly at the expense of performance.

In the simple example I provided, you might not see much of an advantage. However, you’ll avoid unnecessary work when the caller can "short-circuit" or cancel looping through all of the items that the function will provide. When you start chaining methods using this technique together, this becomes more likely, and the amount of work saved can possibly multiply.

Ayende has a great example of using yield for a slick pipes & filters implementation. He even has a version that is multi-threaded which I find very intriguing.

One of my first reservations with using yield was that there is a potential performance implication. Since c# is keeping track of what is going on in what is essentially a state machine, there is a bit of overhead. Unfortunately, I can’t find any information that demonstrates the performance impact. I do think that the potential advantages I mentioned should outweigh the overhead concerns.

Conclusion

Yield can make your code more efficient and more readable. It’s been around since .NET 2.0, so there’s not much reason to avoid understanding and using it.

You can find detailed information about how the yield keyword works under the hood here.

Have you been using yield in interesting ways? Have you ever been bitten by using it? Leave a comment and let me know!

Locking sessions for multi-threaded access

I recently ran into a situation where I needed to upload some small files from a Flex client application to an ASP.NET web server. I decided to store the uploaded files in the users session while they were in the checkout process. Once the user confirms their order, the images are read from the session and stored to the database.

Here is the original code from the page that accepts each uploaded file, and adds it to a Dictionary in the collection:

if (Session[SESSION_ORDER_FILES] == null)
{
	//Our dictionary hasn't been created, so we do it now
	files = new Dictionary<string, byte[]>();
	Session[SESSION_ORDER_FILES] = files;
}
else
{
	//The dictionary has already been created, just load it
	files = (Dictionary<string , byte[]>) Session[SESSION_ORDER_FILES];
}

//If we have the "_clearPrevious" flag, that means all
//of the files should be removed from this users session
if (_clearPrevious)
	files.Clear();

//If the file name is the same, replace it
if (files.ContainsKey(_fileName))
	files.Remove(_fileName);

files.Add(_fileName, bytes);

The problem is that we ended up with missing images. The client was sending them, but when the user confirmed their order they were missing images in the session. Since ASP.NET will process page requests in multiple threads, the session can be accessed in multiple threads!

Now, we need to find a way to lock them. I questioned whether ASP.NET would give me the same session object each time, or a new instance representing the same session. I whipped up this code in a test page. It saves the previous session reference to the session. I know it’s a little strange, but since no serialization happens with the session, it gave me a good way to know if the previous session object and the current session object were the same instance.

const string SESS_SESS = "test";
var currSessionObj = Session[SESS_SESS];

if(currSessionObj == null)
	//First page load
	Session[SESS_SESS] = Session;
else
	lblText.Text = (Session[SESS_SESS] == Session).ToString();

The result of this page was false. That means you most certainly do get a new session instance each time. Keep in mind that I’m not saying it’s a different session, the object you’re accessing the session with simply changes.

What does this mean?

This means that you have to be careful when there is a chance that you’re working with session objects in multiple pages, or in a page that could be accessed multiple times simultaneously. Thankfully, there are only a few real-world scenarios where this would be a large concern.

As with any other kind of multi-threaded code, be careful if you’re checking the session, and then performing an action based on the result. In that case, you’ll need to lock a global object that is available to all threads that could access that code. Here is an example:

lock(Global.SessionLock)
{
	if(Session["foo"] == null)
		Session["foo"] = new Bar();
}

In your Global class, you’ll need this field:

static object SessionLock = new object();

Using objects or repository interface in constructor

I’ve been really trying to use the Single Responsibility Pattern in all of the classes I design. Recently, I needed to create code to query a list of holidays from the database, and then create a method that allows you to get the number of holidays between two given dates.

Here was my first stab at the constructor:

public HolidayCalculator(IEnumerable<DateTime> holidays)

It’s simple and easy to understand. Then I started thinking about some of the dependency injected examples I’ve seen. For example, one of the Spring.NET IoC quickstarts has a similar example, except that they’re trying to list movies. In their example, they use an IMovieFinder interface. That interface has a single method that retrieves a list of movies. Using this concept, my constructor would look like (and what I ultimately changed it to):

public HolidayCalculator(IHolidayRepository holidayRepository)

That example originally seemed unnecessarily complex to me. Why separate something so simple and disconnected into an interface? Well, it turns out there are a couple of good reasons that you might want to do this.

Delay loading

With my original constructor, I had to load all of the holidays from the repository (ultimately a database in the production environment) to even create an instance of this class. This is certainly less than ideal when I want to use this class as a singleton that may get created early in the application.

Single Responsibility

In my original design, I was accepting in a list that would get cached in my holiday calculator. My class now has two responsibilities. It has to calculate holidays, and it has to cache the holiday list. What if I wanted to change how the list was cached? I would have to change the class, which is not ideal.

Holiday-Calculator-Design

Ideally, this class would load the holiday list each time it needs to perform a calculation. The implementation passed into the constructor would be responsible for caching. In fact, we can now easily separate out the caching feature, and the holiday loading feature. Both classes would implement the IHolidayRepository interface and would be chained together. The caching class would take an IHolidayRepository.

Incremental Coding

Following the Agile philosophy, I can now deliver code faster. I don’t need to add a caching layer. I can have a working application in less time, and then later evaluate if I need to cache the holiday data.

Conclusion

Overall, this design is a little more work, but I think the benefits outweigh the extra classes and interface I needed to create. This design makes it easy to test, and each class has almost no code in it. Reading it and understanding it is extremely simple.

Using "var" to simplify code and avoid redundancy

You’ve probably already heard of the new "var" keyword that you can use to declare variables in your .NET code. I wanted to clear up some quick myths and give a quick overview of when it’s most valuable.

If you haven’t heard of it, you can now use this syntax:

var c = new Cat();

Instead of this:

Cat c = new Cat();

As Jeff Atwood mentioned, it’s a great way to avoid redundancy. It’s obvious that you’re creating a "Cat" object, why do you have to say it twice?

The most important thing to realize, is that it’s NOT a var like in JavaScript or other languages (DIM in VB). It really is 100% a "Cat" object, complete with intellisense. The generated IL would be no different than specifying the type when you declare it. It’s simply a compiler trick.

Another important thing to remember is that you the assignment must be combined with the declaration. If that wasn’t the case, readability would be very poor.

var myName = "Jason"; //Allowed
var yourName; //Not allowed
yourName = "Bob"; //Glad you can't do this

What are some really good examples of when it ideally should be used:

List<Dictionary<int, string>> customers = new List<Dictionary<int, string>>(); //Yuck!
var customers = new List<Dictionary<int, string>>(); //Yay!

OrderRepository orderRepo = (OrderRepository)ctx.GetObject("orderRepository"); //Yuck!
var orderRepo = (OrderRepository)ctx.GetObject("orderRepository"); //Yay!

string name = "Jason Young"; //Kinda yucky
var name = "Jason Young"; //Kinda better

As you can see, those examples have obvious redundancy. Using the "var" keyword increases readability, and makes it easier to change if needed.

There are certainly times when its use is questionable. In the following example, I’m calling a function that returns some data. Since I’m not explicitly defining the type that is being returned from that function, I have to do some digging to figure out the type being returned. In this case, you’ll have to define the correct way to handle it in your coding standards.

var data = GetData();

Another potential readability issue comes up in the following case. You might not want to use the "Circle" specific methods (Circle inherits from Shape).

Shape s = GetCircle(); //I see what you're doing
var s = GetCircle(); //Do you want a shape, or a circle?

Aside from a couple of decisions that need to be made in your organization, I think this is a great addition, and should make our lives as developers a little bit easier. It’s just another tool for our toolbelt. With great power comes great responsibility