Monday 26 September 2016

Agile Version Control

Introduction

A mistake often made when adopting Agile is to insist on certain Agile practices and outcomes without adopting the tools and techniques they depend on (see the CASE STUDY below for an example). This is one major deficiency of Scrum or, at least, of using Scrum by itself: Scrum does not require the development tools (or even some of the essential processes) that make Agile work. I have talked about this previously (eg the Summary of the November 2013 post).

A crucial practice in Agile is Continuous Integration (CI). CI is difficult, if not impossible, without certain tools and practices, such as automated builds (ye olde build box), Agile (JIT) Design, etc. I will also mention Unit Tests here (again :) since without their safety net you cannot hope to make CI work. CI also depends on using a modern version control system, like Git, and using it in the right way. This is what I want to talk about.

CASE STUDY
A few years ago I was working on a project where management insisted on a move to Agile with the aim of creating new software releases every few weeks, instead of every few months as was previously done (ie, about 4 to 6 times more frequently). However, no new tools or development infrastructure were introduced to facilitate this. Moreover, essentially the same procedures were used. The development procedures alone were onerous, but not as bad as the testing and release procedures (which I understood poorly and so will not comment on).

For an unlucky developer there was a tedious and error-prone procedure for every new release. It was bearable when done a few times per year but less bearable when it had to be done more often. This was a typical Waterfall development approach where the project was branched for the new release so that bug fixes could be made on the branch without affecting ongoing development. (I will explain this sort of approach in detail below.)

The major steps were essentially:
• Branch the project in VSS (Visual SourceSafe), then delete some of the unneeded branched files
• Branch and move some global headers shared between projects
• Manually modify project files to handle VSS problems and change global header locations

This whole process usually took one developer at least a day, if everything went well. This is not an exaggeration, though the problem was made worse by the use of VSS and by a largely manual process that should have been automated.

I will get to the point of this post in a moment, but first I will give a brief overview of how version control relates to the development process and how it was used before Agile came along.

NOTE: If you are familiar with version control concepts then you can skip to the Continuous Integration section below.

Version Control

All version control systems allow you to keep track of the changes made to source files. One advantage of this is that you can see how the software has evolved over time. This can provide a deeper understanding of code than can be obtained by just looking at the current state. Being able to compare source files from different times is invaluable when investigating why a change was made, how bugs were introduced, etc.

Moreover, you can get a snapshot from any point in time. For example, in the diagram below you could use the version control system to "checkout" the source as it was at the time of Release 1.0. You can then build that specific version if you need to investigate its behavior.

Diagram 1. Basic Version Control

Each box in the diagram represents a check-in of one or more files. Of course, this is a simplified diagram - real projects have many more check-ins (hundreds or even thousands).
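The idea of check-ins and snapshots can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (every check-in stores a complete copy of the files, and the history is a straight line); the `Repo` class and its methods are hypothetical names for illustration, not the API of any real version control system.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy linear repository: every check-in stores a full snapshot of the files."""
    history: list = field(default_factory=list)  # (message, snapshot) pairs, oldest first
    tags: dict = field(default_factory=dict)     # tag name -> index into history

    def checkin(self, message, changes):
        """Record a new version: the previous snapshot with `changes` applied."""
        previous = self.history[-1][1] if self.history else {}
        self.history.append((message, {**previous, **changes}))
        return len(self.history) - 1

    def tag(self, name):
        """Label the latest check-in, eg "Release 1.0"."""
        self.tags[name] = len(self.history) - 1

    def checkout(self, name):
        """Return a copy of the files exactly as they were at a tagged version."""
        return dict(self.history[self.tags[name]][1])

    def changed_files(self, old, new):
        """Files that differ between two versions (given as history indexes)."""
        a, b = self.history[old][1], self.history[new][1]
        return sorted(f for f in set(a) | set(b) if a.get(f) != b.get(f))


repo = Repo()
repo.checkin("first version", {"main.c": "v1"})
repo.tag("Release 1.0")
repo.checkin("add reporting", {"main.c": "v2", "report.c": "r1"})
print(repo.checkout("Release 1.0"))  # {'main.c': 'v1'}
print(repo.changed_files(0, 1))      # ['main.c', 'report.c']
```

Real systems store differences rather than full copies, but the two operations shown (get any old snapshot, compare two versions) are the ones described above.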

Another essential facility of a version control system is branching. This allows variations to be developed from a common base version. Traditionally, branching has two uses:
  • release branching - a branch is created when a new version is released
  • feature branching - a branch for an experimental or long-term development

Release Branching

Release branching (sometimes called fix branching) is very common (if not ubiquitous) in pre-Agile development. It allows released versions to be fixed quickly without interfering with ongoing development. For example, consider a software project with two releases, versions 1.0 and 1.1, with ongoing development on version 2.0.
Version Control Jargon      

Repository (repo) = file historical storage
Checkin = add or update file(s) to the repo
Checkout = obtain a local copy of file(s)
  usually in order to update and checkin
Commit (v) = checkin
Commit (n) = files that were checked in
Merge = combine changes from 2 sources
Working Copy (WC) = local copy of files
HEAD = pointer into the repo for the WC,
  usually the most recent commit on the trunk
Branch = fork in version history
Trunk = ongoing development "branch"

Now imagine that a user has found a critical bug in version 1.0 (Bug 2 in the diagram below). You can't reproduce the bug in the latest version but you can reproduce it in 1.0 (and 1.1). Of course, you can't simply give the customer a copy of 2.0 as they have not paid for the new features and, in any case, it is not ready for release. You need to provide a fix for version 1.0.

You check out the code for 1.0 to view and debug it and quickly find the problem. Now you can check in your fix to the branch for version 1.0. You then port the fix to the version 1.1 branch and check it in there too. (For completeness you also check why the bug no longer occurs in 2.0 - it may simply be hidden by other changes or obviated by some later development.)

Diagram 2. Release Branching
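The scenario above can be sketched as follows, assuming each branch is modelled as an independent list of changes that starts as a copy of the history it was cut from. The `BranchingRepo` class and the branch names are hypothetical.

```python
class BranchingRepo:
    """Toy model: each branch is an independent list of changes."""

    def __init__(self):
        self.branches = {"trunk": []}

    def commit(self, branch, change):
        """Check in a change on one branch only."""
        self.branches[branch].append(change)

    def branch(self, new_branch, from_branch="trunk"):
        """Cut a branch: it starts with a copy of the history so far."""
        self.branches[new_branch] = list(self.branches[from_branch])

    def contains(self, branch, change):
        return change in self.branches[branch]


repo = BranchingRepo()
repo.commit("trunk", "1.0 work")
repo.branch("release-1.0")        # branch created when 1.0 is released
repo.commit("trunk", "1.1 work")
repo.branch("release-1.1")        # branch created when 1.1 is released
repo.commit("trunk", "2.0 work")  # ongoing development continues on trunk

# Bug 2 is fixed on the 1.0 branch and the fix is ported to the 1.1 branch.
# Trunk (2.0) is left alone because the bug does not reproduce there.
for release in ("release-1.0", "release-1.1"):
    repo.commit(release, "fix Bug 2")

print(repo.contains("release-1.1", "fix Bug 2"))  # True
print(repo.contains("trunk", "fix Bug 2"))        # False
```

The fix has to be applied to each release branch separately: the branches share history up to the point where they were cut, but nothing flows between them automatically.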

Feature Branching

Feature branching is traditionally used for a development that needs to be separate from the main ongoing development. This may happen for various reasons:
  • the development is experimental and may not prove to be viable
  • the development is not certain to be needed (eg, for proposed legislation)
  • the development is for a large feature that overlaps with other release(s)
Diagram 3. Feature Branching

These branches are always intended to be merged back into the trunk, but it can happen that the branched code is not required and so is discarded, eg if the experimental development is found not to be viable.

I have been involved with a few feature branch developments and they are notoriously tedious and troublesome. The main problem to avoid is that by the time the feature branch is merged back into the "trunk", the divergent code has caused so many incompatibilities that it can be difficult or even impossible to merge the differences. A great deal of work is then required to integrate the changes, and this often involves workarounds and kludges that corrupt the integrity of the software design. It's not uncommon for the feature to have to be completely rewritten to be compatible with the current ongoing project.

Because of the above problem, developers have learnt to "merge early and often". That is, changes on the trunk should be regularly merged into the feature branch to avoid divergence. Of course, this is a tedious and time-consuming process that tends to get skipped in favour of more urgent tasks. It often also requires discussions between members of the feature and maintenance teams to understand what the code does and how best to merge the differences.

Diagram 4. Merging Trunk Changes

Diagram 5. The completed feature is merged into the trunk
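Why divergence hurts can be seen in a simplified three-way merge. Real tools compare lines or hunks of text; this sketch assumes instead that each file is a dictionary of named units (say, one entry per function) and merges unit by unit against the common base version. Only a unit changed differently on both sides is a conflict, so the longer two branches diverge, the more such conflicts can pile up, while merging the trunk into the branch regularly keeps each merge small. `three_way_merge` is a hypothetical helper, not a library function.

```python
def three_way_merge(base, ours, theirs):
    """Merge two versions that both descend from `base`, unit by unit.

    Returns (merged, conflicts). A missing key (None) means the unit does
    not exist in that version, eg it was deleted.
    """
    merged, conflicts = {}, []
    for unit in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(unit), ours.get(unit), theirs.get(unit)
        if o == t:      # both sides agree (including both unchanged)
            value = o
        elif o == b:    # only `theirs` changed this unit
            value = t
        elif t == b:    # only `ours` changed this unit
            value = o
        else:           # both changed it, differently: a person must decide
            conflicts.append(unit)
            continue
        if value is not None:
            merged[unit] = value
    return merged, conflicts


base = {"login": "v1", "report": "v1"}
trunk = {"login": "v2", "report": "v1"}                    # bug fix on trunk
feature = {"login": "v1", "report": "v2", "export": "v1"}  # feature work

print(three_way_merge(base, feature, trunk))
# ({'export': 'v1', 'login': 'v2', 'report': 'v2'}, [])

# If the feature had also rewritten "login", the merge could not be automatic:
feature["login"] = "v1-rewritten"
print(three_way_merge(base, feature, trunk)[1])  # ['login']
```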

Continuous Integration

These sorts of problems of merging and integrating code (as well as other problems) led to the practice of continuous integration (CI), which is core to the Agile approach to software development. But even without Agile, CI avoids integration headaches, improves common understanding and communication in the team, and generally results in a better design and fewer bugs. It is an example of DIRE since you are not isolating the new features from the rest of the code as it evolves.

Agile Approach

CI enables the Agile approach of delivering small improvements that slowly but surely move the development towards the target. The target, of course, is the PO's (Product Owner's) understanding of what is needed, which may itself be moving.

Each atomic development task, called a User Story, needs to be small enough to be completed in a few days (and certainly within the current sprint). If the task is larger than that, then it needs to be split up.
What is a User Story?   

User Stories are used in Agile as a replacement for "specs". A User Story is a simple statement about a change or enhancement to the software. This is often written on a small card in the format:

As <A> I want <B> so I can <C> where:

<A> = the person/group requiring the enhancement -
  often a software user, but can be anyone
<B> = a simple description of the enhancement
  from the perspective of <A>
<C> = the purpose or benefit of the enhancement -
  this can be skipped but I highly recommend it

A User Story is almost all the written documentation you need to specify all changes to the software.  Of course, for a large feature you will have many User Stories grouped into an Epic.

The other written documentation you need is a handful of Acceptance Criteria written on the back of the related User Story card. These explain how you can check that a User Story is complete.

Example:

As an administrator I want to be able to change my password so I can ensure the security of the system

Acceptance Criteria:
1. old password must be entered first
2. new password must be entered twice to catch typos
3. new password must be different to old password
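Acceptance Criteria are easiest to verify when they can be checked automatically. As an illustration only, here are the three criteria above written as checks against a hypothetical `Account.change_password` method; the class, method names and error messages are invented for this sketch.

```python
import hashlib


class Account:
    """Hypothetical account, used only to illustrate the acceptance criteria."""

    def __init__(self, password):
        self._digest = self._hash(password)

    @staticmethod
    def _hash(password):
        # Plain SHA-256 keeps the sketch short; a real system would use a
        # salted, deliberately slow password hash.
        return hashlib.sha256(password.encode()).hexdigest()

    def has_password(self, password):
        return self._hash(password) == self._digest

    def change_password(self, old, new, new_again):
        if not self.has_password(old):  # criterion 1: old password first
            raise ValueError("old password is incorrect")
        if new != new_again:            # criterion 2: entered twice
            raise ValueError("new passwords do not match")
        if new == old:                  # criterion 3: must be different
            raise ValueError("new password must be different")
        self._digest = self._hash(new)


def rejected(account, old, new, new_again):
    """True if the password change is refused."""
    try:
        account.change_password(old, new, new_again)
    except ValueError:
        return True
    return False


admin = Account("s3cret")
assert rejected(admin, "wrong", "n3w-pass", "n3w-pass")       # criterion 1
assert rejected(admin, "s3cret", "n3w-pass", "n3w-typo")      # criterion 2
assert rejected(admin, "s3cret", "s3cret", "s3cret")          # criterion 3
assert not rejected(admin, "s3cret", "n3w-pass", "n3w-pass")  # valid change
assert admin.has_password("n3w-pass")
```

Each assertion maps to one line on the back of the card, so the story is complete when they all pass.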

The common argument against this approach is that it is inefficient: it's better to understand the problem, come up with a solution and implement it all in a controlled manner. In theory this sounds like a good argument, but in practice it doesn't work (see the May 2014 post on Agile Design for more on the evils of BDUF, Big Design Up Front). If BDUF ever worked as it's supposed to (which it very rarely - if ever - does) it would be more efficient. But even then the Agile approach is more reassuring to the PO, users and stakeholders; even in that worst case for Agile, it still gives the perception of greater productivity since everyone can see progress being made.

A stronger argument against the Agile approach is that some complex tasks cannot be decomposed into simpler ones, so they cannot be tackled at all with an evolutionary approach. Again, this may be theoretically possible, but I have never encountered such a situation in practice. Once you get the hang of it, it's easy to find a way to work towards a goal while keeping the software usable and useful at every point along the way (or at least at the end of every sprint).

The crucial point is that User Stories are designed such that at every stage the software can be used. At the end of every sprint the PO will have a working, bug-free piece of software that can be tested and even delivered to real users. To make this work you need a certain type of version control system.

So what sort of version control do you need for Agile?

In the end many things in Agile - short sprints, small User Stories, JIT Design, feature teams, and CI - work together and depend on a version control system that allows easy branching and (especially) merging. Having a clumsy or manual merging process is not an option as User Stories are continually being merged back into the trunk.

Conventionally, version control systems treat the relationship between versions as a tree. If you look back at all the above version control diagrams (ignoring the dashed arrows) you will see that they are all tree diagrams. (I know, it's obvious that you need branches to form a tree.) Tree-based version control systems help you merge code between branches (the dashed arrows leading into the blue boxes), but you still need to keep track manually of where each merge comes from and which bits have already been merged.

This is where Git comes in.

Git

In my opinion Git is the only version control system that should be used for Agile development. Git has one killer feature: a version can have two parents. This lets Git merge versions automatically while keeping track of what has been merged, so that it neither misses versions nor merges the same change more than once.

This means that a version "tree" becomes instead a "DAG" (directed acyclic graph) because each version can have two - not just one - parents.

Before I discovered Git I used another fine version control system called SVN (short for Subversion), starting about 10 years ago, and found it a joy to use except for one thing: on occasion I needed a long-term branch, which was painful to keep up to date with trunk developments. To avoid a nasty surprise when the branch had to be merged back into the trunk, I regularly merged trunk code into the branch (as in Diagram 4 above). However, to make sure that no changes were missed and that no change was merged more than once, I had to keep track manually of which trunk versions had been merged into the branch. This was tedious and error-prone, and it is exactly the bookkeeping that Git does for you.
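The bookkeeping described above can be derived from the commit graph alone. In this hypothetical history (all commit names are invented), the trunk runs A-B-C-D, a long-term branch runs F1-F2-F3 from B, and trunk commit C was merged into the branch by merge commit M, which therefore has two parents. Following parent links answers both questions that had to be tracked by hand with SVN: which trunk commits the branch still lacks (compare `git log branch..trunk`) and which common version the next three-way merge should start from (the "best common ancestor" rule behind `git merge-base`). This is a sketch of the idea, not how Git is implemented.

```python
# Commit -> tuple of parent commits. M is a merge commit with two parents.
parents = {
    "A": (), "B": ("A",), "C": ("B",), "D": ("C",),  # trunk
    "F1": ("B",), "F2": ("F1",),                     # long-term branch
    "M": ("F2", "C"),                                # trunk's C merged into branch
    "F3": ("M",),
}


def ancestors(commit):
    """Every commit reachable from `commit` via parent links, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def not_yet_merged(source, target):
    """Commits in `source`'s history that are missing from `target`'s history."""
    return ancestors(source) - ancestors(target)


def merge_bases(a, b):
    """Best common ancestors: common ancestors of a and b that are not
    themselves ancestors of another common ancestor."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c != other and c in ancestors(other) for other in common)}


print(not_yet_merged("D", "F3"))  # {'D'}: C was already merged via M
print(merge_bases("D", "F3"))     # {'C'}: the next merge starts from C, not B
```

Because M records C as a parent, nothing needs to be written down by hand: the graph itself remembers what has been merged.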

Agile Version Control

Agile version control using Git is simple. A developer branches the code to work on a User Story. Git makes it easy to merge the branch back into the trunk. A simple example is shown in the following diagram where all User Story branches are merged back into the trunk by the end of each sprint.

Diagram 6. Agile Version Control
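The rule that every User Story branch is merged back by the end of the sprint can be checked mechanically, because a branch has been merged once its latest commit is reachable from the trunk. Git offers `git branch --no-merged` for this; the sketch below shows the underlying idea on a hypothetical history with invented branch and commit names.

```python
# Hypothetical sprint history: commit -> parents, plus the latest commit of each branch.
parents = {
    "T0": (),
    "S1a": ("T0",), "S1b": ("S1a",),  # story-101 work
    "T1": ("T0", "S1b"),              # story-101 merged into trunk
    "S2a": ("T1",),                   # story-102 work, not merged yet
}
tips = {"trunk": "T1", "story-101": "S1b", "story-102": "S2a"}


def ancestors(commit):
    """Every commit reachable from `commit` via parent links, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def unmerged_branches(trunk="trunk"):
    """Branches whose latest commit is not yet part of the trunk's history."""
    on_trunk = ancestors(tips[trunk])
    return sorted(b for b, tip in tips.items() if b != trunk and tip not in on_trunk)


print(unmerged_branches())  # ['story-102']
```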

However, you generally need control over which features are delivered to "production". This is often accomplished by having dual streams: an ongoing "development" stream (or branch) and a separate "delivery" stream (trunk), allowing control over when features are delivered.

Diagram 7. Dual Streams

This is very different from traditional version control, where branches are eventually discarded (after possibly having been merged back into the trunk); instead you have two ongoing streams. This approach is only possible with a version control system such as Git, where a version (ie, a node in the diagrams) can have two parents - in the diagrams this is any node with two incoming arrows.
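A minimal sketch of dual streams, assuming a toy commit graph in which both streams are named pointers that are never discarded. Each delivery is a merge commit with two parents (the previous delivery and the current development tip), and because the graph records this, each delivery brings in only the work added since the last one. The `Streams` class and all names are hypothetical.

```python
class Streams:
    """Toy commit graph with two long-lived streams: development and delivery."""

    def __init__(self):
        self.parents = {"root": ()}
        self.tips = {"development": "root", "delivery": "root"}

    def commit(self, stream, name):
        """Add a commit on top of a stream and move the stream forward."""
        self.parents[name] = (self.tips[stream],)
        self.tips[stream] = name

    def ancestors(self, commit):
        seen, stack = set(), [commit]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return seen

    def merge(self, source, target, name):
        """Merge stream `source` into stream `target` with a two-parent commit.
        Both streams carry on; returns the commits newly brought into `target`."""
        new = self.ancestors(self.tips[source]) - self.ancestors(self.tips[target])
        self.parents[name] = (self.tips[target], self.tips[source])
        self.tips[target] = name
        return sorted(new)


repo = Streams()
repo.commit("development", "story-1")
repo.commit("development", "story-2")
first = repo.merge("development", "delivery", "release-1")   # ['story-1', 'story-2']
repo.commit("development", "story-3")
second = repo.merge("development", "delivery", "release-2")  # ['story-3']
```

A real setup may also merge urgent fixes from the delivery stream back into development; that direction is left out to keep the sketch short.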

For a large project with multiple teams I have even seen the suggestion of multiple ongoing "development" branches (eg: see Version Control for Multiple Agile Teams). I have not tried this, but I have reservations because code merges between the teams would occur irregularly and might easily be forgotten (remember the rule: merge early and often). Two teams might create conflicting changes that are not discovered until the conflicting code is merged from the trunk into the other team's stream.


Diagram 8. Multiple Development Streams

Summary

Agile version control is very different to traditional version control. It is performed using many small feature branches which are being continually merged back into the trunk (or main development stream). This is necessary for the practice of Continuous Integration (CI) which is a core part of the Agile approach.

CI is an example of JIT (and hence DIRE) allowing problems to be found as soon as possible. It also supports other Agile practices such as short sprints and evolving the software using small, simple, user-centric User Stories. Use of CI depends on a version control system that allows easy branching and merging.

Most Agile teams also have two ongoing code streams (see Diagram 7) - the development "branch(es)" and the delivery "trunk". Again, this relies on a version control system that supports easy merging.

As far as I know Git is the only version control system currently available where a version node in the repository can have two parents. In other words, Git allows you to automatically and safely merge code from different sources.

Although Git is not without its problems, I think using it is essential for Agile development to work smoothly. Next month I will discuss those problems, along with the day-to-day use of different version control systems (including Git).

3 comments:

  1. Your "dual Streams" version control is essentially GitFlow.

  2. Thanks for that. I had not heard of GitFlow before but it seems to have a large following! I had a quick look at how GitFlow works and it is similar, in that they both have ongoing streams.

    However, there are a few things I don't like about GitFlow which I will explain in my next blog (November).

  3. This was very good stuff. It will really help me. Thanks for a nice share you have given to us.
