Understanding LINQ and LINQ to SQL (and EF)

Back to basics for this post. Developers often throw around the word LINQ when talking about a number of different technologies. Having comfortably used a wide variety of LINQ technologies for a fair amount of time, I’m now able to convey some of the key differences that are critical to using them efficiently. I’m also using this as a foundation and reference for some exciting upcoming posts.

The first key point is to know what the heck LINQ is. LINQ itself is a number of separate features. One of these key features is being able to write SQL-like syntax (query syntax) in your code. At a basic level, that’s all you need to know for now.

LINQ (to objects)

First, we’re going to talk about LINQ to objects, which I typically just refer to as LINQ (possibly making the matter more confusing). It has absolutely nothing to do with SQL Server, Oracle, or any other kind of relational database. I’m talking about LINQ to objects, because I think that understanding it and contrasting it with LINQ to SQL is critical to understanding both.

For a moment, forget that LINQ exists. Let’s say that you wanted to filter a list of names, to only get names that start with the letter “J”. You could write the following “utility” function: (if you don’t understand “yield return”, see this post on that topic).

public static IEnumerable<string> GetNamesStartingWithJ(IEnumerable<string> names)
{
    foreach(var name in names)
        if(name.StartsWith("J"))
            yield return name;
}

A new feature introduced in C# 3.0 (which shipped with .NET Framework 3.5) is the extension method. This lets us turn my handy dandy static utility method into a method that can be called on a list of names, as long as the method lives in a static class. By changing the signature to this:

public static IEnumerable<string> GetNamesStartingWithJ(this IEnumerable<string> names)

I can then call it like this (Sweet!):

var myListOfNames = new List<string> {"Abe", "Jack", "Jason"};
var jNames = myListOfNames.GetNamesStartingWithJ();

We haven’t even talked about LINQ yet, but we’ve basically reinvented a portion of it. As an exercise for the reader, think about how you could use a lambda parameter to pass in a filter criterion to create a “.Where” method. All the pieces are in place to re-create this form of LINQ yourself.
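If you want to check your answer to that exercise, here is one possible sketch. The names MyWhere, MyLinqExtensions, and MyWhereDemo are made up for this illustration (the real operator is Enumerable.Where), and this is not how the framework itself is implemented, just the same idea in a few lines.

```csharp
using System;
using System.Collections.Generic;

public static class MyLinqExtensions
{
    // Hypothetical re-creation of a Where-style operator.
    // The filter is passed in as a lambda (a Func<T, bool>).
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var item in source)
            if (predicate(item))
                yield return item;
    }
}

public static class MyWhereDemo
{
    // Returns the names starting with "J" using the extension method above.
    public static List<string> JNames()
    {
        var names = new List<string> { "Abe", "Jack", "Jason" };
        return new List<string>(names.MyWhere(name => name.StartsWith("J")));
    }
}
```

Calling MyWhereDemo.JNames() gives back a list containing "Jack" and "Jason".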

One actual new feature for LINQ is known as query syntax. Basically, it gives us an alternative way to write our query. It makes the code look more like SQL, and less like a long chain of extension methods.

Lambda Syntax:

var uppercaseJNames = names.Where(name => name.StartsWith("J")).Select(name => name.ToUpper());

Query Syntax (same query):

var uppercaseJNames = from name in names
	where name.StartsWith("J")
	select name.ToUpper();

In both of those examples, the exact same operations are occurring, and you get the same result. The one you choose will most likely come down to personal preference. It’s also worth noting that some of the extension methods provided out of the box are not available in query syntax. You can either avoid the query syntax in those cases, or use a hybrid approach.
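As a rough sketch of the hybrid approach: Distinct and Count have no query syntax keyword in C#, so the filter and projection can be written in query syntax and the remaining operators chained on as extension methods. The class and method names here are invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HybridQueryDemo
{
    public static int CountUppercaseJNames(IEnumerable<string> names)
    {
        // Query syntax for the filter and the projection...
        var query = from name in names
                    where name.StartsWith("J")
                    select name.ToUpper();

        // ...then extension methods for operators that have no query keyword.
        return query.Distinct().Count();
    }
}
```

For the input "Abe", "Jack", "Jason", "Jack" this returns 2, because the duplicate "JACK" is removed before counting.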

How is LINQ to SQL (and Entity Framework, etc) Different?

Now, I hope you understand that there isn’t really any magic going on in LINQ. Microsoft has simply given us a new set of easy to use tools that make working with sets a breeze.

LINQ to SQL is a different matter. Instead of executing code, you’re building an expression. An expression is simply a “picture” of what you’re trying to accomplish. It can be interpreted in many different ways. To understand the underlying technology, you’ll have to read up on expression trees, which I’m intentionally keeping outside the scope of this post.
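Without going into expression trees in any depth, a tiny sketch may help make the “picture” idea concrete. The same lambda can be assigned to a Func (compiled code you can only call) or to an Expression (a data structure you can inspect). The class and method names are made up for this example.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    public static string Describe()
    {
        // Compiled code: all we can do is invoke it.
        Func<string, bool> asDelegate = name => name.StartsWith("J");

        // A description of the same lambda: it can be examined piece by piece,
        // which is what a LINQ provider does before translating it.
        Expression<Func<string, bool>> asExpression = name => name.StartsWith("J");

        var call = (MethodCallExpression)asExpression.Body;
        bool result = asDelegate("Jack");

        return call.Method.Name + ":" + result;
    }
}
```

Describe() returns "StartsWith:True": the method name was read out of the expression without running it, while the result came from actually invoking the delegate.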

If we have a “picture” of a query, what happens to it when we want to “run” it? LINQ to SQL, Entity Framework, and other LINQ implementations look at your query, and basically translate it into something else. How about an example?:

//Query Syntax
var deviceIds = from device in Devices
                where device.Type == "I"
                select device.DeviceId;

//Lambda Syntax (extension methods)
var deviceIds = Devices
   .Where(device => device.Type == "I")
   .Select(device => device.DeviceId);

//SQL
SELECT [t0].[DeviceId]
FROM [Devices] AS [t0]
WHERE [t0].[Type] = 'I'

I’ve provided the query syntax and the lambda syntax. At the bottom is the resulting translation into a SQL statement.

In this last example, I’ll try to make it clear that your code is simply interpreted and translated:

//Query Syntax:
from device in Devices
where device.Type != null
select device.DeviceId

//SQL Syntax:
SELECT [t0].[DeviceId]
FROM [Devices] AS [t0]
WHERE [t0].[Type] IS NOT NULL

Notice that the C# comparison “!= null” translates in SQL to “IS NOT NULL”. This was handled automatically for us. Our expression did NOT get back all the rows and then apply a conditional to them.

Why is this important? To use either technology effectively, you have to understand that when you’re working with objects, it’s simply a chain of methods, and it usually behaves as you would expect. When working with LINQ to SQL (or a related technology), the expression is translated into something else, and might not execute the way you expected.

Understanding the internal workings of these technologies will let us take full advantage of all the wonderful features they have to offer. In upcoming posts, I’ll be warning you of some potential pitfalls related to how your queries are interpreted and translated. I’ll also be showing you how to get significant performance gains by using LINQ to SQL or Entity Framework efficiently (over traditional SQL based solutions). I’ll also be showing you how I write LINQ queries to query an AutoCAD document!

Delayed execution vs ToList() in LINQ Database Queries

LINQ to SQL and Entity Framework allow us to build a query, which is represented as an expression tree and executed only once the results are actually needed. The beauty is that we can build up a query using multiple expressions and lambdas, without actually querying the data. Since these types of queries use delayed execution, why not avoid executing them until the last possible moment? Read on to see why this is usually a bad idea.

First, let’s take a look at the code for a repository method that builds a query, executes the query, and returns the results in a list:

public IEnumerable<Cat> FindAllCats()
{
	var query = from c in db.Cats
		select c;

	return query.ToList();
}

Execute in Repository

The “ToList” is forcing the IQueryable<Cat> query to execute and put the results in a list immediately. However, we know that IQueryable<T> inherits from IEnumerable<T>, so what happens if we avoid the list creation completely?

public IEnumerable<Cat> FindAllCats()
{
	var query = from c in db.Cats
		select c;

	return query;
}

Execute in UI

In this scenario, our method is returning the same interface, but the underlying type is now a LINQ database iterator instead of a List<T>.

Delaying execution can lead to multiple executions

If the code is not explicitly putting the results into a list, we’re actually passing back a form of an iterator. This works great if we only need to execute the query once. However, if we iterate through the results more than once, we actually end up executing our query multiple times. This can obviously lead to poor performance.
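The effect is easy to reproduce without a database. In this sketch, a plain iterator method stands in for the database query (that substitution is my assumption for the sake of a runnable example), and a counter records how many times the “query” actually runs. All names are invented.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static int ExecutionCount;

    // Stand-in for a database query: the body runs again on every enumeration.
    private static IEnumerable<string> QueryCats()
    {
        ExecutionCount++;
        yield return "Bill";
        yield return "Ted";
    }

    public static int CountExecutions(bool materialize)
    {
        ExecutionCount = 0;

        IEnumerable<string> cats = QueryCats();
        if (materialize)
            cats = cats.ToList();   // the one and only execution happens here

        var total = cats.Count();   // first pass over the results
        var first = cats.First();   // second pass over the results

        return ExecutionCount;
    }
}
```

CountExecutions(false) returns 2 because each pass re-runs the iterator, while CountExecutions(true) returns 1 because the list is filled once and then reused.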

If you’re writing fast queries, you may not even notice if they’re being called too many times. However, there may be a worse problem lurking in your code. Each time you iterate through the enumerator, you’re getting a different set of objects. The same query is being made with the same results, but the objects are re-built each time. This leads to objects that are equivalent, but not the same. For example, you may get back Cat objects with the names “Bill” and “Ted”, but if you actually check them for equality using “==”, they will not be the same object instance. Correction: Scott points out in the comments that this isn’t necessarily the case. Keep in mind that it can still occur if projecting types and not working with the original objects.
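The projection case from the correction can be sketched with LINQ to objects. This Cat class is a bare stand-in I made up for the example, and the behavior of a real database provider may differ, as the correction above notes.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Cat type for this illustration only.
public class Cat
{
    public string Name { get; set; }
}

public static class ProjectionIdentityDemo
{
    public static bool SameInstanceAcrossPasses(bool materialize)
    {
        var names = new[] { "Bill", "Ted" };

        // The projection builds a new Cat every time it is enumerated.
        IEnumerable<Cat> cats = names.Select(n => new Cat { Name = n });
        if (materialize)
            cats = cats.ToList();

        Cat a = cats.First();
        Cat b = cats.First();

        // Both are named "Bill", but are they the same object?
        return object.ReferenceEquals(a, b);
    }
}
```

Without ToList the two “Bill” cats are equivalent but separate instances, so the method returns false. With ToList it returns true.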

Delaying execution may mean you no longer have a database connection when attempting to execute the query

If you delegate the task of initiating your query to another layer, you’d better be sure that the database connection is still around, and is in a queryable state. If you’re using the standard repository pattern and a short-lived database connection pattern, you may quickly run into problems when you try to iterate through the results of the enumerator you receive from your repository layer.
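Here is a minimal sketch of that failure mode. FakeDb is a hypothetical stand-in for a short-lived data context, not a real LINQ to SQL or Entity Framework type, and the exact exception a real provider throws may be different.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a short-lived database context.
public sealed class FakeDb : IDisposable
{
    private bool _disposed;

    public IEnumerable<string> Cats()
    {
        // Like a deferred query, this body only runs once someone enumerates.
        if (_disposed) throw new ObjectDisposedException(nameof(FakeDb));
        yield return "Bill";
        yield return "Ted";
    }

    public void Dispose() { _disposed = true; }
}

public static class DisposedConnectionDemo
{
    public static IEnumerable<string> FindAllCatsDeferred()
    {
        using (var db = new FakeDb())
        {
            return db.Cats();   // nothing has executed yet
        }                       // the "connection" is gone here
    }

    public static bool EnumerationFails()
    {
        try
        {
            foreach (var cat in FindAllCatsDeferred()) { }
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

EnumerationFails() returns true: the repository method returned happily, and the problem only shows up later, in whatever layer first iterates the results.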

Conclusion

If you’re thinking about moving the execution of your queries to another layer, make sure you understand the consequences. You’ll need to weigh those consequences against the tiny benefit that you’ll receive from the delayed execution. There may be cases where delaying the execution or possibly avoiding it completely will improve your application, but those are probably very rare cases.

Common Pitfalls when working with DateTimes

In .NET, the DateTime structure provides us wonderful functionality, but this seemingly simple structure can cause a lot of headaches if you don’t fully understand how to use it properly.

Understand the terminology

First, UTC, GMT, and even Zulu time are, for practical purposes, all the same thing. They’re basically a universal clock that is not affected by time zones or daylight savings changes. Each tick of the universal clock represents a moment in our perception of time.

Use UTC as long as possible

UTC is very useful when developing software because it removes the need to know where the time was from, or where it’s going to be used. We don’t even care when it was from, or when we’re displaying it. You can think of your local clock as a view of the time right now, where you are. It has already taken into account the time zone and daylight savings time.

These properties of your local clock suggest that we should always convert from the local clock to universal time as early as possible when accepting user input, and convert it back to the user’s time only when displaying it. This is a simple, easy to use pattern that may be enough to avoid some of the potential problems that other projects face. This pattern will give you the ability to cope with time changes and time zones much more easily.

Converting between local time and UTC is pretty easy. ToLocalTime will convert from universal time to local time. ToUniversalTime will convert to UTC. Just be aware that the logic in these methods only knows the time zone rules that were in effect when they were written. They are not perfect for all scenarios. You’ll also want to take a look at the Kind property, which affects which conversions you can perform, as well as providing a nice way to keep track of whether or not the time has been adjusted to UTC.
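A short sketch of how Kind follows a value through these conversions. The class name and the sample date are arbitrary choices for the example.

```csharp
using System;

public static class UtcConversionDemo
{
    public static DateTimeKind[] Kinds()
    {
        DateTime utc = DateTime.UtcNow;                 // Kind is Utc
        DateTime local = utc.ToLocalTime();             // Kind is Local
        DateTime backToUtc = local.ToUniversalTime();   // Kind is Utc again

        // A value built from its parts doesn't know what it represents...
        DateTime parsed = new DateTime(2009, 7, 1, 5, 0, 0);   // Kind is Unspecified

        // ...until we tag it explicitly. SpecifyKind changes only the label,
        // not the date and time values.
        DateTime tagged = DateTime.SpecifyKind(parsed, DateTimeKind.Local);

        return new[] { utc.Kind, local.Kind, backToUtc.Kind, parsed.Kind, tagged.Kind };
    }
}
```

Kinds() returns Utc, Local, Utc, Unspecified, Local. The Unspecified case is the one to watch for, since the conversion methods then have to assume what the value means.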

Daylight Savings Time & Time Changes

Every year in many parts of the world, the time changes. Apparently the idea is to save gobs of money by using the sunlight more efficiently instead of using artificial lights. Unfortunately, this really sucks for software developers.

I used to write software for manufacturing facilities that would run during a time change. If you have software that records any time-sensitive data during a time change, your software had better be prepared to handle the fact that one hour is skipped, and another is repeated. Storing the data in UTC solves part of the problem. Unfortunately, when you try to display the data you’ll have an hour of missing data, and an hour with overlapping data. You may have to design your user interface to deal with this.

Fixed-time Appointments

Unfortunately, UTC doesn’t solve all of our time offset problems. Let’s say that you have an appointment that you’re scheduling for a future date that occurs when DST is in effect, but it’s not in effect right now. You choose 5:00am for your appointment time. Your application happily converts the time to UTC, and the reverse process expectedly yields the same result. The problem is, the time offset when the appointment occurs will be different than it is now. Daylight savings for the central time zone, for example, switches between an offset of –5 and –6. This diagram attempts to visualize:

 DST DateTime Diagram

What we want to store is the fact that our appointment occurs at 5:00am local time. If we simply store the information as UTC, we’re losing this additional information. When we switch to non-DST time and use our current time adjustment of –6 hours, our appointment now occurs at 4:00am.
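The arithmetic behind that one-hour shift can be sketched with hard-coded offsets. I’m using fixed –5 and –6 hour offsets and an arbitrary date purely for illustration, rather than real time zone lookups, and the names are invented.

```csharp
using System;

public static class FixedTimeAppointmentDemo
{
    public static int DisplayedHour(TimeSpan offsetUsedForDisplay)
    {
        // The user asked for 5:00am local time.
        var appointmentLocal = new DateTime(2009, 7, 1, 5, 0, 0);

        // local = utc + offset, so utc = local - offset.
        // Converted with the DST offset of -5 hours, this stores 10:00 UTC.
        DateTime storedUtc = appointmentLocal - TimeSpan.FromHours(-5);

        // Converting back with whatever offset is in effect at display time.
        DateTime displayed = storedUtc + offsetUsedForDisplay;
        return displayed.Hour;
    }
}
```

DisplayedHour(TimeSpan.FromHours(-5)) gives 5, but DisplayedHour(TimeSpan.FromHours(-6)) gives 4. Nothing in the stored UTC value says that the user meant “5:00am, whatever the offset is”.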

If you’re writing an application that stores fixed-time appointments as well as appointments that are designed to have even intervals (exactly 1 month apart, etc) or occur in a different time zone or DST, you’ll need to store an additional flag with the event so you can make the determination if it needs to be adjusted.

Conclusion

Times can be complicated depending on the requirements of your project. It would be unwise to work these problems out toward the end of a project, because the consistency of usage can’t be guaranteed. Do yourself a favor and plan ahead for these issues, and it will be much easier.

Speaking at Day of .NET at Fox Valley Tech

If you’re interested in hearing about writing practical unit tests in .NET, I’ll be speaking at the Fox Valley .NET user group “Day of .NET” event May 9th! Here is the synopsis for your reading pleasure:

Want to learn how to write good automated unit tests that are beneficial both to the product/customer and to you as a developer? See an overview of the mechanics of unit testing including the tools and frameworks available. You’ll see examples of how to test existing code, but you’ll also see practical examples of how seemingly un-testable code can be designed so that it can be tested with ease. Learn how test driven development and refactoring will improve the readability of your code, minimize debugging, and speed up development.

If you’re anywhere near the Northeast Wisconsin area, stop in. It’s free!

I’ll be publishing both the presentation and a supporting 25+ page paper shortly, so make sure you’re subscribed to my feed.

Fox Valley Tech is located at:
1825 N. Bluemound
Appleton, WI 54912

int inherits from object? An investigation into how.

I’ve begun working with a study group which was formed to study for the .NET Framework Application Development certification exam (70-536). I’m eager to get certified because I think it helps fill in knowledge gaps that I may not have necessarily taken the time to focus on normally. One of the first things that came up in our study group is the fact that int, which is an alias for System.Int32, derives from System.ValueType, which, in turn, derives from System.Object. Let’s take a close look at what that actually means, and how it’s implemented.

When I first heard that int ultimately derived from Object, I didn’t believe it for a number of reasons:

  • If you inherit from Object, that derived object IS an object
  • If int is a subclass of Object, then boxing isn’t necessary

The truth is, my assumptions were not correct. The .NET team, for consistency’s sake, made all types fit nicely into the type hierarchy:

Object Hierarchy
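If you’d rather not take the hierarchy on faith, reflection can walk it for you. This is just a quick sketch, and the class and method names are mine.

```csharp
using System;
using System.Collections.Generic;

public static class TypeHierarchyDemo
{
    public static string Chain()
    {
        var names = new List<string>();

        // Follow BaseType from int upward until we run out of base types.
        for (Type t = typeof(int); t != null; t = t.BaseType)
            names.Add(t.FullName);

        return string.Join(" -> ", names);
    }
}
```

Chain() returns "System.Int32 -> System.ValueType -> System.Object".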

If the .NET team had built System.Int32 to work like any other reference type, there would be clear performance issues. In reality, we need value types to be lean and mean. Local value type variables typically live on the stack (reference type instances live on the heap). To do this, there is some internal “magic” going on that treats types that inherit from ValueType differently. Behind the scenes it optimizes how they’re used to get the best of both worlds. If you try to derive a class from ValueType yourself, you’ll get a compiler error; it exists only as the base for value types, meaning the built-in ones plus your own structs and enums.

Of course we want our cake and we want to eat it too. There are often times when you want to use a value type in a method that takes an object as a parameter. To keep up the illusion that a value type is an object, the framework employs boxing. It effectively wraps the value type inside of an object.
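A small sketch of what that wrapping means in practice: the box holds its own copy of the value, and every boxing operation creates a new object. The names are invented for the example.

```csharp
using System;

public static class BoxingDemo
{
    public static bool BoxHoldsACopy()
    {
        int number = 4;
        object boxed = number;      // box: the value is copied into a new object
        number = 5;                 // changing the original doesn't touch the box

        int unboxed = (int)boxed;   // unbox: copy the value back out
        return unboxed == 4;
    }

    public static bool TwoBoxesAreSameObject()
    {
        int number = 4;
        object first = number;      // one box
        object second = number;     // a second, separate box

        return object.ReferenceEquals(first, second);
    }
}
```

BoxHoldsACopy() returns true and TwoBoxesAreSameObject() returns false, which is exactly the kind of behavior you wouldn’t see if int were an ordinary reference type.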

Let me out, I'm an object!

Take a look at the following code:

public string GetObjectString(object obj)
{
    return obj.ToString();
}

[TestMethod]
public void IntAsObject_Boxing()
{
    var str = GetObjectString(4);
    Assert.AreEqual("4", str);
}

And look at the corresponding IL for the test method:

IL_0001:  ldarg.0
IL_0002:  ldc.i4.4
IL_0003:  box        [mscorlib]System.Int32
IL_0008:  call       instance string _4_17_09_Boxing.UnitTest1::GetObjectString(object)
IL_000d:  stloc.0
IL_000e:  ldstr      "4"
IL_0013:  ldloc.0
IL_0014:  call       void [Microsoft.VisualStudio.QualityTools.UnitTestFramework]Microsoft.VisualStudio.TestTools.UnitTesting.Assert::AreEqual<string>(!!0, !!0)
IL_0019:  nop
IL_001a:  ret

Notice that boxing occurs on the third line (IL_0003). It’s using the IL “box” instruction to let you stay oblivious to the fact that there is some magic going on behind the scenes.

Conclusion

The end result is that we have an integer, which is an object, but isn’t really, that needs to be wrapped inside of an object, which shouldn’t be necessary, but is because it is. :-)

Does this really matter? Well, not really. For the most part you don’t need to know this. If you’re truly inquisitive and want to know what’s going on, you may find it interesting.