Archive for software development

Introduction to Distributed Source Control

Version control systems manage the changes of documents. In software development, their primary purpose is to store the source code for an application, as well as every revision created during its development.

Currently, many developers use a centralized version control system such as Visual Studio Team System (VSTS) or Subversion. With such systems there is a central repository (for example, Team Foundation Server (TFS)), usually located remotely, that houses the different versions of the source code.


Unfortunately, a number of issues accompany typical source control systems that are based on a centralized repository, including, but not limited to, the following:

  • Many operations such as checking code in or out can perform poorly over slow connections.
  • Working offline provides only a reduced set of functionality: you cannot create branches, and you cannot commit separate features or bug fixes until you reconnect.
  • Moving the repository can be difficult because both its front end and its back end must be managed.
  • Working across networks that may never be bridged is difficult or impossible, since a connection must be made to the central repository.
  • Private work is typically not under source control.
  • There is often a single point of failure.
  • Security must be managed, and may become complex due to multiple permission sets and projects.

Distributed source control systems (or distributed version control systems, "DVCS" for short) are starting to gain popularity because they offer many advantages over the traditional, centralized repository.


They allow users to work independently in either a connected or disconnected environment. There is a tremendous amount of flexibility with regard to merging, managing different branches of development, and managing product features.

Adopters of this technology include Google Code and SourceForge. Moreover, many major projects such as GNOME, Perl, MySQL, Python, and Ubuntu use a distributed source control system.

You may have already heard about some of the popular implementations. Git, Mercurial, and Bazaar are a few choices that have started to become mainstream. If you’ve worked with Subversion, you’ll find that migrating to this new generation of source control systems doesn’t mean giving up the features that you’re used to.

There are many problems with centralized repositories that simply disappear when you’re working with a distributed system:

  • Merging is a core feature and works how and when you want.
  • Security is trivial since everyone works in their own sandbox. You simply choose whom you allow to push changes to, and pull changes from, your repository. In open source projects, this typically means allowing certain trusted individuals to push changes to the project repositories. When needed, additional security measures such as authentication can be imposed.
  • Working disconnected doesn’t require any preparation. You are working offline by default. The only online operation is synchronizing with other repositories.
  • Nearly all operations are near-instantaneous because they run against your local repository. Synchronizing is the only operation that depends on the speed of your connection.


What is a Distributed Source Control System?

Distributed source control systems have the same purpose, but work very differently from systems like Team Foundation Server (TFS), SourceSafe, Subversion, and CVS. Instead of a single repository that contains the source code and history, there are many repositories, each with the source code and some or all of its revision history. One or more peers have repositories for a project, and they synchronize what they want, when they want to. There are no fixed requirements or restrictions on who synchronizes with whom. The focus is on synchronizing and working independently.

Workflow

Examples & screenshots included here are from Mercurial using the TortoiseHg explorer extension. Git has similar functionality using TortoiseGit. You can also use the command line for some or all operations if you prefer.

1. Clone a repository – To create your own local repository, you have to clone (copy) all or part of an existing repository. Since each developer has a copy of the repository, you can clone it from anyone.


2. Create a "working copy" of the code – Even though you have the full repository from another developer, you still need to "check-out" or get the latest version of the code. Since the repository is local, this operation is quick and can be done offline.

3. Make changes – Simply make any changes you like, without worrying about how your source control works. This is similar to Subversion, and contrasts sharply with TFS, which needs to track any changes you make by interfacing with Visual Studio.

4. Check-in changes – When it’s time to check in your local changes, they are simply committed to your local repository. They do not affect any other repository. Changes are detected by comparing the newest committed revision with the current version on disk.


5. Push or pull changesets – To actually send your changesets to another repository, you need to "push" or "pull" them. In the Mercurial dialog below, there are options labeled "Incoming" and "Outgoing" which simply compare the local changes with the remote changes and determine what will get pushed or pulled. The "Push" and "Pull" operations send your changesets to another repository, or pull changesets from another repository respectively.

image

When changesets are transferred between repositories, they do not affect any working copies. This flexibility allows changes to be synchronized without affecting work in progress.
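To make the workflow more concrete, here is a toy model of it in Python. It is only a sketch under simplifying assumptions: the `Repository` class and its `clone`, `update`, `commit`, `push` and `pull` methods are hypothetical, and real tools such as Mercurial store history very differently. What it does show is that a commit only touches the local repository, and that pushing copies changesets without changing anyone's working copy.

```python
import hashlib


class Repository:
    """A toy repository: changesets keyed by ID, plus a separate working copy."""

    def __init__(self, changesets=None):
        # changeset ID -> (parent ID, snapshot of files)
        self.changesets = dict(changesets or {})
        self.working_copy = {}   # files "on disk", edited freely
        self.parent = None       # revision the working copy is based on

    @classmethod
    def clone(cls, other):
        # Step 1: copy every changeset from an existing repository (any peer will do).
        return cls(other.changesets)

    def update(self, rev):
        # Step 2: extract a revision into the working copy; purely local.
        self.working_copy = dict(self.changesets[rev][1])
        self.parent = rev

    def commit(self):
        # Step 4: record the working copy in the local repository only.
        snapshot = dict(self.working_copy)
        data = repr((self.parent, sorted(snapshot.items()))).encode("utf-8")
        rev = hashlib.sha1(data).hexdigest()[:12]
        self.changesets[rev] = (self.parent, snapshot)
        self.parent = rev
        return rev

    def push(self, other):
        # Step 5: send the changesets the other repository lacks.
        # The other repository's working copy is not touched.
        outgoing = set(self.changesets) - set(other.changesets)
        for rev in outgoing:
            other.changesets[rev] = self.changesets[rev]
        return outgoing

    def pull(self, other):
        return other.push(self)


central = Repository()
central.working_copy = {"readme.txt": "hello"}
first = central.commit()

ann = Repository.clone(central)
ann.update(first)
ann.working_copy["readme.txt"] = "hello, world"   # Step 3: just edit files
second = ann.commit()

print(second in central.changesets)   # False: the commit stayed local
ann.push(central)
print(second in central.changesets)   # True
print(central.working_copy)           # {'readme.txt': 'hello'}: untouched by the push
```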

Online/Offline Operations

Operation                  TFS        Subversion   Mercurial/Git/Bazaar
Get/Update                 Online     Online       Offline
Check-out                  Online     Online       Offline
Check-in                   Online     Online       Offline
View History               Online     Online       Offline
Revert                     N/A        Offline      Offline
Compare working changes    Online     Offline      Offline
Change tracking            Limited*   Offline      Offline

* Changes can be made in a special "offline" mode, and edited files are checked out when returning to "online" mode.

Merging Divergent Development Branches

In traditional, centralized source control systems, the only way for code paths to diverge was to explicitly create a branch. While this is still possible in a distributed source control system, multiple developers can also make independent changes that may or may not conflict.

The beauty of the system is that divergent code paths can be merged at any time. It is possible for the developers to make multiple changes, perform multiple synchronizations (pulls), yet not have to merge until they want to, or until they need to push their changes to another repository.
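Divergence can be pictured as a graph of changesets in which each changeset points to its parent(s). A "head" is a changeset that no other changeset uses as a parent, so two heads mean two divergent lines of development. The Python below is a hypothetical sketch (the `heads` function and the revision names are made up for the example): after Ann and Bob both commit on top of the same revision there are two heads, Bob can keep working without merging, and a merge changeset with two parents brings the history back to one head.

```python
def heads(parents):
    """Return the changesets that no other changeset names as a parent."""
    used_as_parent = {p for ps in parents.values() for p in ps}
    return sorted(rev for rev in parents if rev not in used_as_parent)


# changeset -> tuple of parent changesets (hypothetical history)
history = {
    "base": (),
    "ann-1": ("base",),   # Ann's independent change
    "bob-1": ("base",),   # Bob's independent change, now pulled into the same repository
}
print(heads(history))     # ['ann-1', 'bob-1']: two divergent lines of development

# Bob can keep committing and pulling; the divergence stays until he chooses to merge.
history["bob-2"] = ("bob-1",)
print(heads(history))     # ['ann-1', 'bob-2']

# A merge changeset has two parents and joins the two lines again.
history["merge"] = ("ann-1", "bob-2")
print(heads(history))     # ['merge']
```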



Where Is My Repository?

If you’re working with a team of 20 developers, and each one has a full copy of the repository, you don’t strictly need a central repository. However, there are a few reasons why one is still recommended:

  1. Central backup location – Even though there are numerous copies of the repository, it is still useful to have a single, known location that an automated backup process can find.
  2. Central communication hub – The logistics of pushing and pulling code between a number of developers can get complicated. A distributed repository simplifies many of these problems, but is not perfect by itself. Having a central "authoritative" repository makes it quick and easy for developers to collaborate.
  3. Central location for builds – Automated builds and continuous integration servers need a location to pull source code from, which an authoritative repository provides.
  4. Central merge location – If multiple developers are pulling changes from one another, implicit branches can be created. A central repository serves as a location where all of these branches are merged into one development line.

Repositories can typically be hosted internally with little effort using Apache, a built-in daemon, a CGI script, or simply a file share. For convenience, there are also many services that provide repository hosting. For Git, there are GitHub and SourceForge. For Mercurial, there are BitBucket, Google Code, and SourceForge.

The beauty of distributed source control is apparent when you take into account the administrative overhead of a central server. Since the central server is no different from any other peer, it can be easily moved or modified. For example, you can start out with no central server, then use BitBucket to store your revisions, and later move to another service within minutes. Changing providers simply means pushing your changes to another server.

Common Operations

Importing Existing Code

Importing existing code is an extremely simple operation. If it is new code that is not yet under source control, you can simply create a new repository within the folder that contains your code. You can then check in your code as desired.

In Subversion, importing involves two steps: importing the code into the repository, and then checking out a working copy. Mercurial does not have this complexity.


Mercurial also comes with built-in support, through its bundled convert extension, for converting existing Subversion repositories to Mercurial repositories, including the entire revision history.

To convert from an older source control system such as Visual SourceSafe, you can first convert the repository to Subversion, and then to Mercurial.

Checking-in Code

It is worth mentioning a philosophical difference in how some source control systems encourage developers to check in changes. Systems like TFS and Visual SourceSafe provide only limited functionality for reverting and re-applying specific changesets. For this reason, developers tend to check in groups of unrelated changes, which in turn leads to generic, incomplete, and less useful comments such as "done for the day".

Flexible source control systems such as Subversion, Mercurial, and Git provide a lot more value when changesets are fine-grained, and represent a single change to the system. For example, renaming a page and changing the tab order are two changes that should be checked in separately. If needed, either change can be pulled in or out, moved, synchronized, or used to patch other versions. Fine-grained changesets also reduce the likelihood of conflicts, and typically make conflict resolution easier. Other developers can quickly scan through the changelog and get a clear list of the features that were added, or bugs that were fixed. In an ideal world, every commit would be tied to a bug or feature to increase traceability.

Managing Branches & Releases

It is simple to create explicit branches that allow you to maintain parallel development of different features or versions. Branching simply involves entering a branch name when you commit your code. Switching between branches is as easy as performing an update to the latest revision of a branch. In contrast, TFS requires a branch to be created before you can commit changes to it. TFS also keeps a copy of each branch on the developer's machine, which is optional with Subversion, Mercurial, and others.
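Named branches can be pictured as a label stored on each changeset, and "switching" as picking the newest changeset that carries the label. The sketch below assumes a simple list of changesets in commit order; the `Changeset` type and `branch_tip` function are hypothetical. Mercurial's `hg update <branch>` behaves along these lines, but real tools handle more cases, such as several unmerged heads on one branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    rev: int        # local revision number, increasing with each commit
    branch: str     # branch name entered at commit time
    summary: str


def branch_tip(changesets, branch):
    """Return the most recent changeset committed on the given branch."""
    on_branch = [c for c in changesets if c.branch == branch]
    if not on_branch:
        raise KeyError(f"unknown branch: {branch}")
    return max(on_branch, key=lambda c: c.rev)


log = [
    Changeset(0, "default", "initial import"),
    Changeset(1, "release-1.x", "start 1.x maintenance branch"),
    Changeset(2, "default", "new feature"),
    Changeset(3, "release-1.x", "fix crash on startup"),
    Changeset(4, "default", "refactor settings page"),
]

print(branch_tip(log, "release-1.x").summary)  # fix crash on startup
print(branch_tip(log, "default").rev)          # 4
```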

Since changes can be made independently, there is also a concept of implicit branching. If we have two users, Ann and Bob, they are free to make changes independently of each other. If Ann checks in her changes, and then Bob pulls down those changes while having changes of his own, there are now two implicit branches of development. In this case pulling changes will automatically create multiple parallel lines of development. Changes cannot be pushed unless the code has been merged. The system is designed this way so that merging is only necessary when pushing, typically to a central repository or build server. The effect is that repositories that are only "pushed to" can be easily and cleanly maintained remotely.
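The rule that unmerged changes cannot be pushed can be sketched as a check that a push would not leave the target repository with more heads than it had before. This roughly mirrors Mercurial's default behaviour of refusing a push that creates a new remote head unless you force it. The function names, the dictionary layout, and the error text below are hypothetical, and the check is simplified.

```python
def heads(parents):
    """Changesets that no other changeset uses as a parent."""
    used = {p for ps in parents.values() for p in ps}
    return {rev for rev in parents if rev not in used}


def push(local, remote):
    """Copy local changesets to remote, refusing if that would add a new head."""
    combined = {**remote, **local}
    if len(heads(combined)) > len(heads(remote)):
        raise RuntimeError("push creates new remote head; merge first")
    remote.update(local)


central = {"base": ()}
ann = {**central, "ann-1": ("base",)}
bob = {**central, "bob-1": ("base",)}

push(ann, central)                 # fine: central now ends at ann-1

bob.update(central)                # Bob pulls: bob-1 and ann-1 are two implicit branches
try:
    push(bob, central)
except RuntimeError as err:
    print(err)                     # push creates new remote head; merge first

bob["merge"] = ("ann-1", "bob-1")  # Bob merges the two lines locally
push(bob, central)
print(sorted(heads(central)))      # ['merge']
```

Because the remote only ever receives merged history, a repository that is only pushed to keeps a single line of development.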

Most distributed source control systems include tools that allow a visual display of code branches. This functionality is also likely to be included in Team Foundation Server 2010.

Tagging Revisions

In order to mark the significance of certain revisions, they can be tagged with a specific label. For example, when you release a specific version of your project, you can tag that revision with the label "v1.2" as seen below. Additional flexibility is provided by the "local tag" functionality, which lets you tag code on your computer without sharing the tag with others.

image
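Tags can be pictured as names that point at revision IDs. In Mercurial, regular tags are recorded in a version-controlled `.hgtags` file, so they travel with pushes and pulls, while local tags (`hg tag --local`) are kept outside version control. The Python sketch below models only that distinction; the `TaggedRepo` class and its methods are hypothetical.

```python
class TaggedRepo:
    """Toy model: shared tags travel with the history, local tags stay on this machine."""

    def __init__(self, revisions):
        self.revisions = set(revisions)
        self.tags = {}        # shared: versioned, like Mercurial's .hgtags
        self.local_tags = {}  # private: never sent to other repositories

    def tag(self, name, rev, local=False):
        if rev not in self.revisions:
            raise KeyError(rev)
        target = self.local_tags if local else self.tags
        target[name] = rev

    def lookup(self, name):
        if name in self.local_tags:
            return self.local_tags[name]
        return self.tags[name]

    def clone(self):
        other = TaggedRepo(self.revisions)
        other.tags = dict(self.tags)  # only shared tags are copied
        return other


repo = TaggedRepo({"a1b2c3", "d4e5f6"})
repo.tag("v1.2", "d4e5f6")
repo.tag("tried-new-parser", "a1b2c3", local=True)

copy = repo.clone()
print(copy.lookup("v1.2"))   # d4e5f6
print(copy.local_tags)       # {}: the local tag was not shared
```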

Terminology

Distributed Version Control System (DVCS) – A version control system in which each peer has its own repository containing some or all of a project's revision history. Instead of relying on a single central server, peers synchronize changes with one another.

Repository – A container for a set of changes that represent the history of the source code for a project. A repository may have the ability to store a partial history of the project, or the entire history. The repository is typically optimized by using compression and by only storing deltas or changes of files.

Changeset/Revision – A particular "delta" or change in the codebase. This can include any type of change, in any number of files. Visual SourceSafe stored revision numbers for each file. Team Foundation Server and Subversion have global revision numbers for the entire repository. Distributed source control systems often use GUIDs or hash codes to represent specific revisions.
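A hash-based ID can be computed from a changeset's content and its parents, so every repository derives the same ID for the same change without a central server handing out numbers. The sketch below is deliberately simplified: it assumes SHA-1 over a made-up serialization, and the `changeset_id` function is hypothetical. Git and Mercurial have traditionally used SHA-1 as well, but each hashes its own specific format, so these IDs will not match theirs.

```python
import hashlib


def _add_field(h, text):
    h.update(text.encode("utf-8"))
    h.update(b"\0")  # separator, so ("ab", "c") and ("a", "bc") hash differently


def changeset_id(parent_ids, author, message, files):
    """Derive a revision ID from parents and content (simplified; not Git's or Mercurial's format)."""
    h = hashlib.sha1()
    for parent in sorted(parent_ids):
        _add_field(h, parent)
    _add_field(h, author)
    _add_field(h, message)
    for path in sorted(files):
        _add_field(h, path)
        _add_field(h, files[path])
    return h.hexdigest()


root = changeset_id([], "ann", "initial import", {"main.py": "print('hi')\n"})
again = changeset_id([], "ann", "initial import", {"main.py": "print('hi')\n"})
child = changeset_id([root], "bob", "fix greeting", {"main.py": "print('hello')\n"})

print(root == again)  # True: the same content yields the same ID on any machine
print(len(child))     # 40 hexadecimal characters
```

Because the parent IDs are part of the input, a changeset's ID also pins down the history it was built on.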

Working copy – A particular revision of the code that has been extracted or checked out from the repository. This revision includes the full version of all the files involved so that the developer can load and make changes to the code.

Bundle – A bundle is a file that contains a set of changes that is intended to be sent to another user to update their repository. This technology allows users to be physically disconnected yet pass code changes to each other. This file typically employs some form of compression to minimize file size.
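A bundle can be imitated by serializing the changesets the recipient is missing and compressing them into a single file that can travel by e-mail or on a USB stick. This is a hypothetical sketch using JSON and zlib from the Python standard library; it is not the format produced by Mercurial's `hg bundle` or Git's `git bundle`, and the function names are made up for the example.

```python
import json
import zlib


def make_bundle(changesets, known_by_recipient):
    """Pack the changesets the recipient does not have yet into compressed bytes."""
    missing = {rev: cs for rev, cs in changesets.items() if rev not in known_by_recipient}
    return zlib.compress(json.dumps(missing, sort_keys=True).encode("utf-8"))


def apply_bundle(changesets, bundle):
    """Unpack a bundle, add its changesets, and return the revisions that were new."""
    incoming = json.loads(zlib.decompress(bundle).decode("utf-8"))
    added = sorted(rev for rev in incoming if rev not in changesets)
    changesets.update(incoming)
    return added


ann = {
    "r1": {"parent": None, "message": "initial import"},
    "r2": {"parent": "r1", "message": "add login page"},
    "r3": {"parent": "r2", "message": "fix tab order"},
}
bob = {"r1": ann["r1"]}

bundle = make_bundle(ann, known_by_recipient=set(bob))
# Write `bundle` to a file, carry it across, then on Bob's machine:
print(apply_bundle(bob, bundle))   # ['r2', 'r3']
print(sorted(bob) == sorted(ann))  # True
```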

Patch/diff – A patch is a file that shows the changes between two versions of a file or multiple files. It contains enough information to transform the old version into the new version, or vice-versa. It’s a quick way of sending someone a changeset. Patches are usually in the "unified diff" format, which looks like the following:

--- /path/to/original timestamp
+++ /path/to/new      timestamp
@@ -1,3 +1,9 @@
+This is an important
+notice! It should
+therefore be located at
+the beginning of this
+document!
+
 This part of the
 document has stayed the
 same from version to
@@ -5,16 +11,10 @@
 be shown if it doesn't
 change.  Otherwise, that
 would not be helping to
-compress the size of the
-changes.
-
-This paragraph contains
-text that is outdated.
-It will be deleted in the
-near future.
+compress anything.

 It is important to spell
-check this dokument. On
+check this document. On
 the other hand, a
 misspelled word isn't
 the end of the world.
@@ -22,3 +22,7 @@
 this paragraph needs to
 be changed. Things can
 be added after it.
+
+This paragraph contains
+important new additions
+to this document. 
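
If you want to produce a patch like this yourself, Python's standard library can generate the unified diff format with `difflib.unified_diff`. The small example below diffs two made-up three-line versions of a file; the `fromfile` and `tofile` arguments are just labels for the `---` and `+++` header lines, and since no timestamps are passed, none are printed.

```python
import difflib

old = [
    "It is important to spell\n",
    "check this dokument. On\n",
    "the other hand, a\n",
]
new = [
    "It is important to spell\n",
    "check this document. On\n",
    "the other hand, a\n",
]

# Each element of `diff` is one line of the patch, including its newline.
diff = list(difflib.unified_diff(old, new, fromfile="original.txt", tofile="new.txt"))
print("".join(diff), end="")
```

Running it prints a single hunk headed `@@ -1,3 +1,3 @@`, in which the misspelled line is removed and the corrected line is added between two unchanged context lines.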


Common Pitfalls when working with DateTimes

In .NET, the DateTime structure provides wonderful functionality, but this seemingly simple structure can cause a lot of headaches if you don’t fully understand how to use it properly.


Understand the terminology

First, UTC, GMT, and even Zulu time are, for practical purposes, the same thing. They’re basically a universal clock that is not affected by time zones or daylight saving time changes. Each tick of the universal clock represents a single moment in time, no matter where you are.

Use UTC as long as possible

UTC is very useful when developing software because it removes the need to know where the time came from, or where it’s going to be used. It also doesn’t matter whether daylight saving time was in effect when the value was recorded or when it is displayed. You can think of your local clock as a view of the time right now, where you are. It has already taken into account the time zone and daylight saving time.

These properties of your local clock suggest that we should always convert from local time to universal time as early as possible when accepting user input, and convert it back to the user’s time only when displaying it. This is a simple, easy-to-use pattern that may be enough to avoid some of the problems that other projects face, and it lets you cope with time changes and time zones much more easily.

Converting between local time and UTC is pretty easy. ToLocalTime converts from universal time to local time, and ToUniversalTime converts to UTC. Just be aware that these methods only know the time zone rules that were in effect when they were written, so they are not perfect for all scenarios. You’ll also want to take a look at the Kind property, which affects which conversions you can perform and provides a nice way to keep track of whether or not the time has been adjusted to UTC.
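The same pattern carries over to other platforms. Below is a Python sketch rather than .NET code: `datetime.astimezone` plays the role of ToUniversalTime and ToLocalTime, and whether a value carries `tzinfo` (aware vs. naive) loosely corresponds to the Kind property. To stay self-contained it assumes a fixed UTC-6 offset for the user, where a real application would use the user's actual time zone rules; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed offset stands in for the user's real time zone rules.
USER_ZONE = timezone(timedelta(hours=-6))


def accept_user_input(local_text):
    """Parse what the user typed and convert it to UTC immediately."""
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M").replace(tzinfo=USER_ZONE)
    return local.astimezone(timezone.utc)


def display(utc_value):
    """Convert back to the user's time only when showing the value."""
    if utc_value.tzinfo is None:
        # A naive value is ambiguous, loosely like DateTimeKind.Unspecified in .NET.
        raise ValueError("refusing to display a time of unknown kind")
    return utc_value.astimezone(USER_ZONE).strftime("%Y-%m-%d %H:%M")


stored = accept_user_input("2009-01-15 08:30")
print(stored.isoformat())   # 2009-01-15T14:30:00+00:00
print(display(stored))      # 2009-01-15 08:30
```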

Daylight Savings Time & Time Changes

Every year in many parts of the world, the time changes. Apparently the idea is to save gobs of money by using the sunlight more efficiently instead of using artificial lights. Unfortunately, this really sucks for software developers.

I used to write software for manufacturing facilities that ran through time changes. If your software records time-sensitive data during a time change, it had better be prepared to handle the fact that one hour is skipped and another is repeated. Storing the data in UTC solves part of the problem. Unfortunately, when you try to display the data, you’ll have an hour of missing data and an hour with overlapping data. You may have to design your user interface to deal with this.
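Storing samples in UTC keeps them in order, but the repeated hour reappears as soon as they are shown in local time. The Python sketch below hard-codes a single US Central fall-back transition (2:00am CDT on November 1, 2009, which is 07:00 UTC) instead of using a full time zone database; the `to_local` helper and the 30-minute sample interval are assumptions made for the illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: one hard-coded transition instead of real time zone rules.
FALL_BACK_UTC = datetime(2009, 11, 1, 7, 0, tzinfo=timezone.utc)  # 2:00am CDT -> 1:00am CST
CDT = timezone(timedelta(hours=-5), "CDT")
CST = timezone(timedelta(hours=-6), "CST")


def to_local(utc_value):
    """Convert a UTC timestamp to Central time using the single transition above."""
    zone = CDT if utc_value < FALL_BACK_UTC else CST
    return utc_value.astimezone(zone)


# Record a sample every 30 minutes from 05:30 to 08:00 UTC.
start = datetime(2009, 11, 1, 5, 30, tzinfo=timezone.utc)
samples = [start + timedelta(minutes=30 * i) for i in range(6)]

for utc_value in samples:
    local = to_local(utc_value)
    print(utc_value.strftime("%H:%M UTC"), "->", local.strftime("%H:%M %Z"))
```

The output lists 01:00 and 01:30 twice, once as CDT and once as CST, so a chart keyed on local wall-clock time would draw those samples on top of each other. In spring the opposite happens and one local hour has no samples at all.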

Fixed-time Appointments

Unfortunately, UTC doesn’t solve all of our time offset problems. Let’s say that you’re scheduling an appointment for a future date on which DST is not in effect, but it is in effect right now. You choose 5:00am for your appointment time. Your application happily converts the time to UTC using the current offset, and the reverse process, as expected, yields the same result. The problem is that the offset when the appointment occurs will be different from the offset now. Daylight saving time in the Central time zone, for example, switches between an offset of –5 and –6. This diagram attempts to visualize it:

 DST DateTime Diagram

What we want to store is the fact that our appointment occurs at 5:00am local time. If we simply store the information as UTC, we’re losing this additional information. When we switch to non-DST time and use our current time adjustment of –6 hours, our appointment now occurs at 4:00am.

If you’re writing an application that stores fixed-time appointments as well as appointments that are designed to have even intervals (exactly one month apart, etc.) or that occur in a different time zone or DST period, you’ll need to store an additional flag with each event so you can determine whether it needs to be adjusted.
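One way to keep the "5:00am local time" intent is to store the wall-clock time together with its time zone, and work out the UTC instant only when it is needed, using the offset in effect on the appointment's own date. The Python sketch below contrasts that with converting up front using today's offset, for an appointment scheduled on a summer day (DST) for a winter morning. The `central_offset` rule is a deliberately crude, hypothetical stand-in for a real time zone database (it treats April through October as DST), and the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def central_offset(moment):
    """Hypothetical, crude DST rule for US Central time: UTC-5 from April to October, else UTC-6."""
    return timezone(timedelta(hours=-5 if 4 <= moment.month <= 10 else -6))


# Scheduled on a summer day (DST, UTC-5) for a winter morning (standard time, UTC-6).
today = datetime(2009, 7, 1)
appointment_local = datetime(2009, 12, 1, 5, 0)   # "5:00am", no zone attached yet

# Naive approach: convert right away with today's offset and keep only the UTC value.
naive_utc = appointment_local.replace(tzinfo=central_offset(today)).astimezone(timezone.utc)
shown_later = naive_utc.astimezone(central_offset(appointment_local))
print(shown_later.strftime("%H:%M"))   # 04:00, so the appointment has drifted by an hour

# Alternative: store the wall-clock time plus the zone it belongs to, and resolve
# the offset for the appointment's own date only when an instant is needed.
stored = {
    "local": appointment_local,
    "zone": "America/Chicago",  # recorded for a real tz database; unused by the crude rule
}
resolved_utc = stored["local"].replace(tzinfo=central_offset(stored["local"])).astimezone(timezone.utc)
print(resolved_utc.strftime("%H:%M"))  # 11:00 UTC, which is 5:00am at UTC-6
```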

Conclusion

Times can be complicated depending on the requirements of your project. It would be unwise to work these problems out toward the end of a project, because the consistency of usage can’t be guaranteed. Do yourself a favor and plan ahead for these issues, and it will be much easier.

Practical .NET Unit Testing – Free paper released

I’ve been working on a unit testing paper that sums up my experience in unit testing, and discusses some of the core information that I feel is important about the subject. It’s very much a work in progress, but I wanted to get it out sooner rather than later. I’ll be continuously updating it as time goes on.

Update: I updated the PDF location to one that doesn’t require registration.

Practical .NET Unit Testing

There are some really great books out there about unit testing, but I think some of them are trying too hard to be long enough to be considered a “book”. I set out to create a document that fills the gap between the various snippets of information from blog posts, and the comprehensive books on the subject. If you’re interested in something a bit more in-depth, here are some great books on the subject:

The paper currently consists of 5 main sections:

  • Why Write Unit Tests?
  • Unit Test Mechanics
  • Common Unit Testing Strategies
  • Designing for Testability
  • Advanced Techniques

Here is a more complete snapshot of the current outline:

  • Introduction
  • Unit Testing & Managers
  • What Unit Tests Really Do
  • Types of Testing
  • Testing Framework
  • Test Runner
  • Unit Test Structure
  • Other Test Attributes
  • What is Refactoring?
  • Test Driven Development
  • Evolving Code
  • When Should You Write Unit Tests?
  • Test is for Functionality, Not Code!
  • The Constraints of Reality
  • Interfaces – Quick Overview
  • Using a Mocking Framework
  • Stubs
  • The Test Driven Design Paradox
  • Testing Under Pressure
  • Extracting Duplicate Logic
  • Modular Design Benefits

So what are you waiting for? Go check it out online instantly; you can even download it as a PDF if you like. Is anything missing? Is anything just plain wrong? I’d love to hear your feedback.

Remember, if you want to hear more about unit testing, I’ll be speaking in Northeast Wisconsin Saturday, May 9th.

Speaking at Day of .NET at Fox Valley Tech

If you’re interested in hearing about writing practical unit tests in .NET, I’ll be speaking at the Fox Valley .NET user group “Day of .NET” event May 9th! Here is the synopsis for your reading pleasure:

Want to learn how to write good automated unit tests that are beneficial both to the product/customer and to you as a developer? See an overview of the mechanics of unit testing including the tools and frameworks available. You’ll see examples of how to test existing code, but you’ll also see practical examples of how seemingly un-testable code can be designed so that it can be tested with ease. Learn how test driven development and refactoring will improve the readability of your code, minimize debugging, and speed up development.


If you’re anywhere near the Northeast Wisconsin area, stop in. It’s free!

I’ll be publishing both the presentation and a supporting 25+ page paper shortly, so make sure you’re subscribed to my feed.

Fox Valley Tech is located at:
1825 N. Bluemound
Appleton, WI 54912

Maintaining Consistent Line Lengths

Today’s tip comes from the “Anally Retentive” department. The .NET CLR team likes to keep their lines of code under 110 characters long. I’m assuming that they’re trying to maintain consistency and readability. I often try to stay within an imaginary line length limit, but I doubt I’m very consistent.

Vertical line in Visual Studio

Fortunately, Visual Studio provides a hidden feature that lets you draw a vertical line in the text editor to show you where a certain line length would end. Fire up your registry editor and find this key:

HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\9.0\Text Editor

If you’re using a version of Visual Studio before 2008, you’ll need to replace 9.0 in the path above with that version’s number (for example, 8.0 for Visual Studio 2005).

Then, add the following value (as a string or REG_SZ) with the name of “Guides”:

RGB(192,192,192) 110

The first part is the color, and the second part is the line length. Personally, I use a line length of 110 to stay consistent with how Microsoft has chosen to do it. I like the color listed above because it’s faint, but visible. Since the line is almost impossible to see in the screenshot above, here is an un-scaled screenshot of the line itself:

Vertical Line

To further enforce the 110 character limit, you could also resize the code portion of your Visual Studio window so that it’s near the line. This will make the line itself a little less annoying, while allowing you to use the rest of the window for other information. For example, take a look at how much room I have on a 1920×1200 screen when I horizontally resize my code window:

Utilizing a large monitor in Visual Studio 
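The guide line is only visual. If you also want to catch long lines automatically, a small script can flag them. The Python sketch below is a hypothetical helper, not a Visual Studio feature: it reports lines longer than a chosen limit (110 here, matching the guide above), counting characters after stripping the line ending. It counts a tab as one character, whereas the editor may display it wider.

```python
import sys


def long_lines(lines, limit=110):
    """Yield (line number, length) for each line longer than `limit` characters."""
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))
        if length > limit:
            yield number, length


def main(paths, limit=110):
    """Print every over-long line in the given files; return 1 if any were found."""
    found = False
    for path in paths:
        with open(path, encoding="utf-8") as source:
            for number, length in long_lines(source, limit):
                print(f"{path}:{number}: {length} characters (limit {limit})")
                found = True
    return 1 if found else 0


if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

Saved as, say, `long_lines.py` (a hypothetical name), it could be run as `python long_lines.py Program.cs` before checking in.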

Obviously this tip isn’t for everyone. You may be working with legacy code with long lines, or you might work on a team that doesn’t mind long lines. The great news is that Visual Studio is pretty accommodating to however you like to work.