Archive for software

Introduction to Distributed Source Control

Version control systems manage changes to documents. In software development, their primary purpose is to store the source code for an application, as well as every revision created during its development.

Currently, many developers use a centralized version control system such as Visual Studio Team System (VSTS) or Subversion. With such systems there is a central repository (e.g., Team Foundation Server (TFS)), usually located remotely, that houses the different versions of the source code.

Unfortunately, a number of issues accompany typical source control systems that are based on a centralized repository, including, but not limited to, the following:

  • Many operations such as checking code in or out can perform poorly over slow connections.
  • Working offline offers a reduced set of functionality; operations such as branching or committing multiple features or bug fixes are unavailable.
  • Moving the repository can be difficult because both a front end and a back end must be managed.
  • Working between networks that may never become bridged is impossible or difficult, since a connection must be made to the central repository.
  • Private work is typically not under source control.
  • There is often a single point of failure.
  • Security must be managed, and may become complex due to multiple permission sets and projects.

Distributed source control systems (or distributed version control systems, "DVCS" for short) are starting to gain popularity because they offer many advantages over the traditional, centralized repository.

They allow users to work independently in either a connected or disconnected environment. They also offer a great deal of flexibility in merging, managing different branches of development, and managing product features.

Adopters of this technology include Google Code and SourceForge. Moreover, many major projects such as GNOME, Perl, MySQL, Python, and Ubuntu are also using a distributed source control system.

You may have already heard about some of the popular implementations. Git, Mercurial, and Bazaar are a few choices that have started to become mainstream. If you’ve worked with Subversion, you’ll find that migrating to this new generation of source control systems doesn’t mean giving up the features that you’re used to.

There are many problems with centralized repositories that simply disappear when you’re working with a distributed system:

  • Merging is a core feature and works how and when you want.
  • Security is trivial since everyone works in their own sandbox. You simply choose who is allowed to push changes to, and pull changes from, your repository. In open source projects, this typically means allowing certain trusted individuals to push changes to the project repositories. When needed, additional security models such as authentication can be imposed.
  • Working disconnected doesn’t require any preparation. You are working offline by default. The only online operation is synchronizing with other repositories.
  • All operations are near-instantaneous. Synchronizing is the only operation that is dependent on the speed of your connection.

 

What is a Distributed Source Control System?

Distributed source control systems have the same purpose, but work very differently from systems like Team Foundation Server (TFS), SourceSafe, Subversion, and CVS. Instead of having a single repository that contains the source code and history, there are many repositories that hold the source code and some or all of its revision history. One or more peers have repositories for a project, and synchronize what they want, when they want. There are no real requirements or restrictions. The focus is on synchronizing and working independently.

Workflow

The examples and screenshots here are from Mercurial using the TortoiseHg Explorer extension. Git has similar functionality through TortoiseGit. You can also use the command line for some or all operations if you prefer.

1. Clone a repository – To create your own local repository, you have to clone (copy) all or part of an existing repository. Since each developer has a copy of the repository, you can clone it from anyone.

2. Create a "working copy" of the code – Even though you have the full repository from another developer, you still need to "check-out" or get the latest version of the code. Since the repository is local, this operation is quick and can be done offline.

3. Make changes – Simply make any changes you like, without concerning yourself with how your source control works. This is similar to Subversion, and contrasts sharply with TFS, which needs to track any changes you make by interfacing with Visual Studio.

4. Check-in changes – When it’s time to check in your local changes, they are simply committed to your local repository. They do not affect any other repository. Changes are detected by comparing the newest committed revision with the current version on disk.

5. Push or pull changesets – To actually send your changesets to another repository, you need to "push" or "pull" them. In the Mercurial dialog below, there are options labeled "Incoming" and "Outgoing" which simply compare the local changes with the remote changes and determine what will get pushed or pulled. The "Push" and "Pull" operations send your changesets to another repository, or pull changesets from another repository respectively.

image

When changesets are transferred between repositories, they do not affect any working copies. This flexibility allows changes to be synchronized without affecting work in progress.
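
The push/pull step above can be sketched with a toy model. This is only an illustration of the idea, not how Mercurial or Git store or transfer data; the `Repo` class, its method names, and the changeset ids are all hypothetical. "Outgoing" is simply the set of changesets the local repository has that the other lacks, and "incoming" is the reverse.

```python
class Repo:
    """Toy repository: an ordered list of changeset ids (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.changesets = []

    def clone(self, name):
        """Copy the full history into a new, independent repository."""
        copy = Repo(name)
        copy.changesets = list(self.changesets)
        return copy

    def commit(self, changeset_id):
        """Record a change locally; no other repository is affected."""
        self.changesets.append(changeset_id)

    def outgoing(self, other):
        """Changesets we have that `other` lacks (what a push would send)."""
        return [c for c in self.changesets if c not in other.changesets]

    def incoming(self, other):
        """Changesets `other` has that we lack (what a pull would fetch)."""
        return other.outgoing(self)

    def push(self, other):
        other.changesets.extend(self.outgoing(other))

    def pull(self, other):
        self.changesets.extend(self.incoming(other))


central = Repo("central")
central.commit("a1")

ann = central.clone("ann")
bob = central.clone("bob")

ann.commit("a2")                 # committed locally only
print(ann.outgoing(central))     # ['a2']
ann.push(central)
print(bob.incoming(central))     # ['a2']
bob.pull(central)
print(bob.changesets)            # ['a1', 'a2']
```

Note that nothing in this sketch touches a working copy, which mirrors the point above: transferring changesets and updating the files on disk are separate steps.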

Online/Offline Operations

Operation               | TFS      | Subversion | Mercurial/Git/Bazaar
------------------------|----------|------------|---------------------
Get/Update              | Online   | Online     | Offline
Check-out               | Online   | Online     | Offline
Check-in                | Online   | Online     | Offline
View History            | Online   | Online     | Offline
Revert                  | N/A      | Offline    | Offline
Compare working changes | Online   | Offline    | Offline
Change tracking         | Limited* | Offline    | Offline

* Changes can be made in a special "offline" mode, and edited files will be checked-out when returning to "online" mode.

Merging Divergent Development Branches

In traditional, centralized source control systems, the only way for code paths to diverge is to explicitly create a branch. While this is still possible in a distributed source control system, it is also possible for multiple developers to make independent changes that may or may not conflict.

The beauty of the system is that divergent code paths can be merged at any time. It is possible for the developers to make multiple changes, perform multiple synchronizations (pulls), yet not have to merge until they want to, or until they need to push their changes to another repository.

Where Is My Repository?

If you’re working with a team of 20 developers, and each one has a full copy of the repository, you don’t need a central repository. However, there are a few reasons why it is recommended:

  1. Central backup location – Even though you have numerous copies of the repository, it is still useful to have a single, known location that an automated backup process can use.
  2. Central communication hub – The logistics of pushing and pulling code between a number of developers can get complicated. Distributing your repository simplifies many of these problems, but is not perfect by itself. Having a central "authoritative" repository can make it quick and easy for developers to collaborate.
  3. Central location for builds – Automated builds and continuous integration servers need a location to pull source code from, which an authoritative repository provides.
  4. Central merge location – If multiple developers are pulling changes from one another, implicit branches can be created. A central repository serves as a location where all of these branches are merged into one development line.

Repositories can typically be hosted internally with little effort using Apache, a built-in daemon, a CGI script, or simply a file share. For simplicity, there are many services that provide repository hosting. For Git, there are GitHub and SourceForge. For Mercurial, there are BitBucket, Google Code, and SourceForge.

The beauty of distributed source control is apparent when you take into account the administrative overhead of a central server. Since the central server is no different than any other peer, it can be easily moved or modified. For example, you can start out with no central server, then you can use BitBucket to store your revisions, then you can move to another service within minutes. Changing providers simply means pushing your changes to another server.

Common Operations

Importing Existing Code

Importing existing code is an extremely simple operation. If it is new code that is not yet under source control, you can simply create a new repository within the folder that contains your code. You can then check in your code as desired.

In Subversion, the import process involves importing the code into the repository, and then checking out a working copy. Mercurial does not have this complexity.

Mercurial also comes with built-in support for converting existing Subversion repositories to Mercurial repositories, including the entire revision history. More information is available in the documentation for Mercurial’s convert extension.

To convert from an older source control system such as Visual SourceSafe, you can first convert the repository to Subversion, and then to Mercurial.

Checking-in Code

It is worth mentioning a philosophical difference in how source control systems encourage developers to check in changes. Systems like TFS and Visual SourceSafe provide only limited functionality for reverting and re-applying specific changesets. For this reason, developers tend to check in groups of unrelated changes. This tends to lead to generic or incomplete comments, such as "done for the day", that are less useful.

Flexible source control systems such as Subversion, Mercurial, and Git provide a lot more value when changesets are fine-grained, and represent a single change to the system. For example, renaming a page and changing the tab order are two changes that should be checked in separately. If needed, either feature can be pulled in or out, moved, synchronized, or used to patch other versions. Fine-grained changesets also reduce the likelihood of conflicts, and typically make conflict resolution easier. Other developers can quickly scan through the changelog and get a clear list of the features that were added, or bugs that were fixed. In an ideal world, all commits should be tied to a bug or feature to increase traceability.

Managing Branches & Releases

It is simple to create explicit branches that allow you to maintain parallel development of different features or versions. Branching simply involves entering a branch name when you commit your code. Switching between branches is as easy as performing an update to the latest revision of a branch. In contrast, TFS requires a branch to be created before you can commit changes to it. TFS also keeps a copy of each branch on the developer’s machine, which is optional with Subversion, Mercurial, and others.

Since changes can be made independently, there is also a concept of implicit branching. If we have two users, Ann and Bob, they are free to make changes independently of each other. If Ann checks in her changes, and then Bob pulls down those changes while having changes of his own, there are now two implicit branches of development. In this case pulling changes will automatically create multiple parallel lines of development. Changes cannot be pushed unless the code has been merged. The system is designed this way so that merging is only necessary when pushing, typically to a central repository or build server. The effect is that repositories that are only "pushed to" can be easily and cleanly maintained remotely.
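
The Ann and Bob scenario can be sketched as a small changeset graph. This is a toy model under the assumption that each changeset records the ids of its parents; the `heads` helper and the changeset names are hypothetical and only illustrate why pulling creates two parallel lines and why a merge collapses them again.

```python
def heads(parents):
    """Return the changesets that no other changeset names as a parent."""
    referenced = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in referenced)


# Shared history: a single line of development.
history = {"base": []}

# Ann and Bob each commit on top of "base" in their own repositories.
ann_change = {"ann-1": ["base"]}
bob_change = {"bob-1": ["base"]}

# Bob pulls Ann's changeset: his repository now has two heads,
# which is the implicit branching described above.
bob_repo = {**history, **bob_change, **ann_change}
print(heads(bob_repo))  # ['ann-1', 'bob-1']

# A merge changeset has both heads as parents, leaving one line again.
bob_repo["merge-1"] = ["ann-1", "bob-1"]
print(heads(bob_repo))  # ['merge-1']
```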

Most distributed source control systems include tools that allow a visual display of code branches. This functionality is also likely to be included in Team Foundation Server 2010.

Tagging Revisions

In order to mark the significance of certain revisions, they can be tagged with a specific label. For example, when you release a specific version of your project, you can tag that revision with the label "v1.2" as seen below. Additional flexibility is provided by the "local tag" functionality, which lets you tag code on your computer without sharing the tag with others.

image

Terminology

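Distributed Version Control System (DVCS) – Version control systems manage changes to documents. In software development, their primary purpose is to store the source code for an application, as well as every revision created during its development. A distributed system does this with many peer repositories, each holding the source code and some or all of its history, instead of a single central repository.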

Repository – A container for a set of changes that represent the history of the source code for a project. A repository may have the ability to store a partial history of the project, or the entire history. The repository is typically optimized by using compression and by only storing deltas or changes of files.

Changeset/Revision – A particular "delta" or change in the codebase. This can include any type of change, in any number of files. Visual SourceSafe stored revision numbers for each file. Team Foundation Server and Subversion have global revision numbers for the entire repository. Distributed source control systems often use GUIDs or hash codes to represent specific revisions.
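
To see why a hash works where a global revision number would not, here is a minimal sketch. It assumes an id derived from the parent's id plus the change's content using SHA-1; this is not the exact format Git or Mercurial uses, and `changeset_id` is a hypothetical helper. The point is that two developers can create ids independently, with no central counter, and still never collide in practice.

```python
import hashlib


def changeset_id(parent_id, content):
    """Derive an id from the parent's id plus this changeset's content."""
    digest = hashlib.sha1()
    digest.update(parent_id.encode("utf-8"))
    digest.update(content.encode("utf-8"))
    return digest.hexdigest()


root = changeset_id("", "initial import")
ann = changeset_id(root, "rename page")
bob = changeset_id(root, "change tab order")

print(len(root))                                  # 40 (hex characters for SHA-1)
print(ann != bob)                                 # True: different content, different id
print(ann == changeset_id(root, "rename page"))   # True: same input, same id
```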

Working copy – A particular revision of the code that has been extracted or checked out from the repository. This revision includes the full version of all the files involved so that the developer can load and make changes to the code.

Bundle – A bundle is a file that contains a set of changes that is intended to be sent to another user to update their repository. This technology allows users to be physically disconnected yet pass code changes to each other. This file typically employs some form of compression to minimize file size.

Patch/diff – A patch is a file that shows the changes between two versions of a file or multiple files. It contains enough information to transform the old version into the new version, or vice-versa. It’s a quick way of sending someone a changeset. Patches are usually in the "unified diff" format, which looks like the following:

--- /path/to/original timestamp
+++ /path/to/new      timestamp
@@ -1,3 +1,9 @@
+This is an important
+notice! It should
+therefore be located at
+the beginning of this
+document!
+
 This part of the
 document has stayed the
 same from version to
@@ -5,16 +11,10 @@
 be shown if it doesn't
 change.  Otherwise, that
 would not be helping to
-compress the size of the
-changes.
-
-This paragraph contains
-text that is outdated.
-It will be deleted in the
-near future.
+compress anything. 

 It is important to spell
-check this dokument. On
+check this document. On
 the other hand, a
 misspelled word isn't
 the end of the world.
@@ -22,3 +22,7 @@
 this paragraph needs to
 be changed. Things can
 be added after it.
+
+This paragraph contains
+important new additions
+to this document. 
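
A patch in this format can be produced with Python's standard `difflib` module. The sketch below reuses three lines from the example above; the `original` and `new` file labels are placeholders chosen for illustration.

```python
import difflib

old = [
    "It is important to spell\n",
    "check this dokument. On\n",
    "the other hand, a\n",
]
new = [
    "It is important to spell\n",
    "check this document. On\n",
    "the other hand, a\n",
]

# unified_diff yields the header lines, the @@ hunk marker, and then
# context lines (leading space), removals (-) and additions (+).
patch = list(difflib.unified_diff(old, new, fromfile="original", tofile="new"))
print("".join(patch), end="")
```

The hunk header it prints, `@@ -1,3 +1,3 @@`, reads as "three lines starting at line 1 in the old file, three lines starting at line 1 in the new file".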

 

Azure – Performance, IoC, and Instances

Ever since the Google App Engine was released, I’ve been fascinated with cloud computing frameworks. The vision is to have a website that can scale from nothing to infinity, without having to worry about servers, viruses, uptime, etc. I’ve finally gotten a chance to play around with Azure, and I must say that I’m in love with the concept, but disappointed by the current reality.

Azure

Performance

I’ve taken a site that I consider a “playground site”, and converted it over to run in Azure. One of the metrics I wanted to look at was the responsiveness of the deployed application. I run the main version of the site on a dedicated server, and I don’t think it’s unreasonable to use that as a baseline. After all, the purpose of Azure is to have the advantages of all the different types of hosting, yet have less to worry about.

To gauge performance, I used the Firefox add-in called Firebug. This let me see the amount of time that each requested element took to be transferred from the server. It also gives some insight into the amount of time it takes for the page to render. In the future, I’m going to use some server tracing to find specific operations that may be taking longer.

This is the baseline data from http://www.simpletracking.com. As you can see, the page is served up very quickly. The page takes less than 100ms to render (1/10 of a second), and the entire page comes through in less than half of a second.

simpletracking.com

Now take a look at the same code running on Azure:

simpletracking.cloudapp.net

To render the page, instead of 89ms, it now takes ~650ms. It takes a full second for the entire page and its elements to be sent down to the client.

Running both pages several times started to give me interesting results. The dedicated server was giving me extremely consistent results (even with other users hitting it). Azure, however, was all over the map. It typically took around 1 second for the entire page to render, but would occasionally spike up to 5 seconds. Personally, I think this is completely unacceptable performance. Hopefully this is not indicative of the performance I can expect once it’s released.

IoC

Azure is designed so that if you have an application that runs in medium trust, it shouldn’t require any conversion to run straight in Azure (in most cases). If you’re using a database, there are other restrictions because Azure doesn’t use a standard SQL database. In addition to these obvious issues, a non-obvious issue is that if you’re using an IoC container, it probably won’t run in medium trust.

My application uses the IoC container Spring.NET, which immediately failed. I suspected (incorrectly) that Windsor might have worked better, but couldn’t tell from the documentation. To make it easy to plug in different IoC containers, I started using the Common Service Locator. If you’re doing IoC without the common service locator, I really recommend you check it out.

I was then fortunate enough to find this page, which has great information on the different IoC containers and their Azure compatibility:

Castle Windsor – My preferred IoC container, but it won’t run under medium trust. Out!

StructureMap – My second favorite IoC container. Runs under medium trust locally, but not under Azure. Submitted bug report to Jeremy Miller. Reading through the StructureMap user’s group, it looks like he’s going to try to fix that early this year.

Ninject – I didn’t really monkey around with Ninject much. The sample code I saw was riddled with [Inject] attributes, which kinda turned me off. Apologies to @nkohari if I dismissed it too early.

Autofac – Works great in medium trust under Azure, easy to configure, but doesn’t support registering arguments for constructor injection at configuration time. You have to specify them when you resolve the service.

Unity – No problems at all! Worked great in medium trust on Azure, easy to configure, supports everything I need! I gotta say I’m really impressed by how far Unity has come in such a short time.

My only reasonable option was Unity, which is Microsoft’s IoC container. After another fun conversion, I was up and running! I honestly don’t have any complaints about their IoC offering.

Instances

The Azure team decided to introduce the concept of “Instances”. You have to decide how many virtual instances of a web server you want running. I really don’t understand the logic here. Their sales pitch is all about handling unpredictable traffic patterns, yet an instance-based approach just gives me another aspect of the application that I have to worry about. They’re promising that a configurable heuristics system will eventually be in place to manage the number of instances. In effect, they are putting a band-aid on a problem that they’ve created even before release.

Contrast this design with the Google App Engine. With their system, you don’t have to worry about configuring instances at all. It automagically scales from nothing to infinity.

Instances on the worker roles make sense. Worker roles are not public facing, they are there to process data. By configuring the number of worker role instances, I can change the rate at which my data gets processed.

Conclusion

I realize that Azure isn’t even in beta yet, so I shouldn’t expect the world. I had my fingers crossed that their CTP would be production quality (wouldn’t that be nice?). I think that Microsoft will eventually have a great cloud platform on their hands; it’s simply a question of timing. Personally, I really don’t want to have to worry about uptime, scaling, RAID, drivers, viruses, etc., so I think cloud computing is the inevitable solution.

Convenient Synchronization with Mesh and DropBox

A couple of weeks ago, I finally signed up for DropBox. If you’re unfamiliar with the service, it’s a file synchronization service. You install a client on multiple machines, and you get a special folder (aka a dropbox). When you make changes on any computer, it’s synchronized with a central server, as well as the other clients.

Now that I’ve gotten the chance to put DropBox through its paces, I have to say that I’m very impressed. I’ve done a lot of operations that can sometimes choke file monitoring software like moving and renaming files, copying files while synchronizing, and in-use files. DropBox powered through like a champ, never giving me any errors, and without any noticeable mistakes.

In addition to simply synchronizing your files, their service also keeps a copy of your files on their server. Better yet, it automatically revisions the files. It seems to be fairly efficient, even considering all my files and revisions. Right now I’m only using 7.8% of the 2GB of space they give you for free.

One of the applications that I use the most is OneNote. Pretty much all of my disconnected thoughts go into OneNote until I can get them organized. I figured it was a great application to test the responsiveness of DropBox. I opened OneNote on two different computers. When I changed the text on one machine, the changes showed up on the other in 10-15 seconds. Perfect for keeping my notes in sync!

My one and only complaint about DropBox is that I can’t create multiple DropBoxes. A single DropBox is simple and efficient, but it would be nice to have a little more flexibility.

Live Mesh

A few nights ago, I got a demo of the Azure platform from a Microsoft Evangelist. Azure is a huge blanket term for a group of confusing technologies. Even the name itself is confusing, since Azure is a cloud computing platform and is also the color of the sky when there are no clouds.

More importantly, one great thing to come out of the “Live Services” portion is a free product called “Live Mesh”. It’s essentially a competitor to DropBox. The nice thing about Live Mesh is its flexibility. I can make any number of synchronized folders, and they all seem to be as reliable as DropBox. Thanks to a sophisticated permissions system, you can even share folders with other people. For example, you can have a folder set up to distribute your photos to your family.

The Microsoft Azure Evangelist showed us a demo with the client installed on his laptop, and another client installed on his Windows Mobile phone. When he takes a picture on his phone, it’s immediately pushed over to the other clients. It’s a neat trick, and does make my mobile device more useful.

As far as I can tell, Live Mesh doesn’t have plans to support a revision system like DropBox. I think this is a horrible, horrible mistake. Having a file on multiple machines provides nice redundancy, yet if you accidentally delete a file on one computer, Live Mesh will happily delete every copy of it. It even happened to Scott Hanselman. In my opinion, this completely destroys any hope it has of competing with DropBox (at least for me). I’m hoping that they’ll add a backup feature, or someone will use their API to add it for them.

Others

One service I have yet to try is SugarSync. It looks promising because it syncs multiple folders, stores revisions, and even has a Windows Mobile version (although it’s missing real-time sync). On paper, it looks like it has all the options you would expect from this type of service.

Syncplicity looks respectable, but with so many alternatives, I’m just not sure if they have anything unique that sets them apart.

Conclusion

I think this type of application is going to have a huge market. This is one of those few killer apps that, if done well, will be on everyone’s computer. Obviously Microsoft’s offering will be positioned to dominate, but we all know that they don’t always have the absolute best product.

For now, I’ll be using DropBox for my main document folder. It suits my needs, and until it messes up, I won’t need to look elsewhere.

Advantages of a 3rd party diff/compare tool

I recently spent nearly an hour trying to figure out why all of my unit tests stopped working in a particular class. It turns out that I had accidentally deleted a single character in one of my strings, and the built-in diff tool that comes with Team Foundation Server is too simplistic to make that easy to spot. Learn how and why you can replace your stock compare tool with something a little more powerful.

Here is a screenshot of what you’ll see in the stock Team Foundation Server compare tool:

Default TFS Compare Tool

See how hard it is to spot the difference? The problem is, ANY change on the entire line causes it to show up as “changed”. That includes whitespace changes. For this reason, I frequently end up with extra full lines that are colored as having been changed, making it harder to see the actual code changes. For the most part, I really don’t care about whitespace changes because they deal with the formatting of the document, and I’m more concerned with functional changes to my code.

The good news is that those smart guys at Microsoft make it easy to integrate a third party compare tool right into their tools. James Manning was even kind enough to include detailed instructions and the exact settings needed for every major compare tool. You can even use them for merging if you like.

Since my background is in Subversion and TortoiseSVN specifically, I pulled out my trusty KDiff3 (SourceForge) compare tool. It’s a common alternative for TortoiseSVN’s own diff tool.

After wiring up KDiff3, here is what I saw when I compared revisions:

KDiff3 Character Difference

Notice how easy it is to see that I changed a single letter (it’s obviously easier when it’s full-size).
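
The difference between a line-level and a character-level comparison can be sketched with Python's standard `difflib` module. This is only an illustration of the idea, not how KDiff3 or the TFS compare tool work internally; the two source lines and both helper names are hypothetical.

```python
import difflib


def char_changes(old_line, new_line):
    """Return (operation, old_text, new_text) for each differing span."""
    matcher = difflib.SequenceMatcher(a=old_line, b=new_line, autojunk=False)
    return [
        (tag, old_line[i1:i2], new_line[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def same_ignoring_whitespace(old_line, new_line):
    """Treat lines as equal when they differ only in runs of whitespace."""
    return old_line.split() == new_line.split()


# Hypothetical lines: one character was deleted from a string literal.
before = 'var label = "Customer";'
after = 'var label = "Custmer";'

# A line-level tool only reports that the whole line changed...
print(before != after)                 # True
# ...while a character-level comparison pinpoints the deleted letter.
print(char_changes(before, after))     # [('delete', 'o', '')]
# Formatting-only changes can be filtered out entirely.
print(same_ignoring_whitespace("x  =  1", "x = 1"))  # True
```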

There are other good reasons to use a third party compare tool (which vary by tool obviously):

  • Easily compare entire file structures (folder diff)
  • Inline editing
  • Easy to use outside of Visual Studio – often with an explorer context menu

I recommend giving a few of the compare tools a try to see which works best for you. I don’t really see any risk in using a third party compare tool, but there are certainly a lot of advantages that you may not even know you’re missing right now. I suggest also taking a look at WinMerge in addition to KDiff3, since it seems to be fairly popular and feature-rich as well.

ClearType in Remote Desktop with XP

A new feature in XP SP3 that should be of particular interest to developers is ClearType over RDP (Remote Desktop Protocol). If you occasionally use Remote Desktop to work from home, or connect remotely to your development machine, please read on.

If you’re not familiar with ClearType, you can head over to Wikipedia for a full explanation. In a nutshell, it takes advantage of the fact that each pixel in an LCD screen actually has 3 sub-pixels. They can be “hacked” to improve the anti-aliasing of text displayed on the screen. I’ve been a huge fan of the feature, especially for source code, and I have a hard time living without it. In XP (locally), it’s turned off by default, but turned on in Vista.

ClearType Effects Dialog

I had always noticed that Remote Desktop would not give me ClearType. However, I became curious when I found the following options on my Vista machine. Apparently ClearType over RDP is now supported in the client, and is also supported when using Vista as the RDP server (no hacking needed).

RDP Experience Options

Unfortunately, these options have no effect when using XP. If you want ClearType over RDP with XP SP3 (sorry, only SP3+), add the following registry key:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations]
"AllowFontAntiAlias"=dword:00000001

After you add that registry key, simply reboot the server (XP), and reconnect. From what I can tell, the client options no longer matter. Even if I uncheck the “Font Smoothing” option, it still uses ClearType. It’s not a big deal, but I thought it was worth mentioning.

So far, after using this option for a while, I haven’t seen a significant performance impact over a VPN on the Internet.

If you want to take this a step further, install Consolas, a font designed specifically for software development and for taking advantage of ClearType. It’s a free download from Microsoft.