Saturday, 14 December 2013

The Gas Factory Anti-Pattern

I know I promised to talk about the challenges of Unit Tests next (and I will soon) but I thought it an opportune moment to discuss a growing problem that greatly benefits from Unit Tests. I consider it to be one of the worst anti-patterns as it is largely unrecognized and has dreadful consequences for maintainability of software.

Luckily, most anti-patterns are occurring less frequently as more and more developers learn how to design and build software better. However, this anti-pattern seems to be on the rise. It also seems to occur most commonly in ostensibly well-designed code. I am talking about what some people have called the Gas Factory anti-pattern.

(I sometimes discuss anti-patterns that I think are not well known or under-emphasized. For example, see Layer Anti-pattern. If you don't know what anti-patterns are, see the Wikipedia page.)

The Problem

Essentially the problem is caused by code that tries to do too much or be too flexible at the expense of being overly complex for the task for which it was intended. (I used to call it the Swiss Army Knife anti-pattern but Gas Factory has better negative connotations. :) This strikes at the heart of software design - see my first blog post Handling Software Design Complexity.

I first encountered this problem about 20 years ago (though it did not have a name until recently). I believe it is mainly due to the fact that about that time (or earlier) code reuse was being pushed as the software development "silver bullet" and a generation of designers got into the habit of trying to make their designs more general and flexible. But this flexibility comes at the largely unappreciated cost of complexity (and reducing complexity has always been my number one aim in life :). I have already covered a lot of this in my blog on the problems of making code reusable at Reusability Futility, so I won't go into that again.
Doing a Google search, it seems that some people think the Gas Factory is just the code bloat of a very inexperienced programmer. (You know the sort of code you would write when you first started to learn to program.) However, that sort of bloat is simply due to inexperience.

The Gas Factory anti-pattern actually tends to affect experienced developers more than inexperienced ones.

The problem is that the "best" designers are often the worst offenders. Just look at the horrendous Windows API, which I have heard was created by very experienced developers. (Luckily .Net is much better.) The practice is often motivated by the best of intentions and on the surface may even seem like the right thing to do, but the side-effects can be severe.

Thought Processes

I have identified some thought processes that I, and others, have been guilty of in the past and that have caused this problem in some designs I have used:

1. An ambition to build some general purpose library that just happens to solve the problems at hand as a subset of a much more general set of problems. The cost is a library that has many disadvantages (longer to develop, harder to verify etc) but the main cost is in the increased complexity for anyone using or maintaining it.

The way to avoid this is to focus on the problem at hand and not on any more general problem, or on possible future changes. This is the motivation behind the Extreme Programming rule that you should never code for the future, only for the immediate requirements (see YAGNI below). A small sketch illustrating this kind of over-generalization is given after point 3 below.

2. When coding, a temptation to add some unneeded facility now, for a perceived negligible cost which would be hard to add later and (it is assumed) would surely be needed. I actually believe that it is sometimes a good idea to add such "free" features while the code is fresh in your memory as long as you can squeeze it into the sprint and it doesn't become an undocumented feature.

However, it may be a good idea to disallow this as a general policy of YAGNI. Prohibition reduces the chance of unauthorized changes and undocumented features creeping in, which can become a maintenance problem down the track.

The crucial part of this thought process is the phrase "hard to add later". Here again, Unit Tests (see previous post on Unit Tests) can help by making later changes much easier.

3. The desire to try out some new technique or technology, even though its use will complicate the design for no real advantage. This is the worst reason, at least morally, as it places self-interest ahead of the interests of the customer or company. Appallingly, this practice is extremely common in the industry, and it is often tolerated or even goes unnoticed.
This problem is not often discussed but I am very glad that I read about it recently in "97 Things Every Software Architect Should Know". In fact it is "thing" number 1: Don't put your resume ahead of the requirements by Nitin Borwankar.

Of course, it may not be done consciously. It is often due to something the developer has recently read. They may, at that brief moment in time, honestly believe that trying some new pattern, practice, tool or algorithm is the best way to accomplish the task. However, I still feel that at the back of their mind there is some nagging guilt that it is not in the customer's best interests.
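
To make the first thought process above more concrete, here is a small hypothetical sketch (the names and the "requirement" are invented purely for illustration, not taken from any real project). The actual requirement is just to total some invoice amounts; the "gas factory" version solves a far more general problem that nobody asked for:

#include <numeric>
#include <vector>

// The actual requirement: total a list of invoice amounts (in cents).
long TotalInvoices(const std::vector<long>& amounts)
{
    return std::accumulate(amounts.begin(), amounts.end(), 0L);
}

// The "gas factory" version: a generic aggregation framework that solves
// the same problem as a special case. Every user and maintainer now has
// to understand the extra machinery.
template <typename Container, typename Value, typename Combiner>
class Aggregator
{
public:
    Aggregator(Value initial, Combiner combine) : initial_(initial), combine_(combine) {}

    Value Run(const Container& c) const
    {
        Value result = initial_;
        for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it)
            result = combine_(result, *it);
        return result;
    }
private:
    Value initial_;
    Combiner combine_;
};

// The over-general version used for the same simple requirement:
//    Aggregator<std::vector<long>, long, std::plus<long> > agg(0L, std::plus<long>());
//    long total = agg.Run(amounts);

Both do the same job for the problem at hand, but the second imposes its extra machinery on everyone who has to read, test or maintain the code.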

YAGNI

I have talked about the XP (Extreme Programming) principle of YAGNI (You Ain't Gonna Need It) before. To recap, this is a policy of only ever building what you know is needed right now. Actually many Agile proponents go further: only ever add the smallest incremental change that still leaves you with working software. YAGNI puzzled me at first but now I realize that not following this idea caused many of the problems and frustrations I have had to overcome in projects over the past few decades.

Many of the libraries and tools I have worked with (and created too) have been overly complex because they tried to anticipate future changes or other uses that almost never eventuate. One former colleague, after spending many months creating such a library to be used as the basis of another program (hi Alex), described it as like "building the foundations for an airport runway, only to put a shack on it".

The counter-argument against YAGNI is that it will result in software that is harder to modify, because you are not allowing for future changes. First, I will say that the simpler software often ends up being easier to modify than software that was designed to be modified. This is my experience, and I'm not sure why, but it may be that changes rarely eventuate as anticipated. However, I do agree that there is a lot of validity to this counter-argument, but I have an ace up my sleeve...

There is also one final, crucial part of the puzzle: Unit Tests! If you have a full set of Unit Tests you can modify with impunity. This frees you from the burden of thinking about the future so you can concentrate on the important thing - what the customer wants right now.

Conclusion

The Gas Factory anti-pattern is growing. The unnecessary complexity it causes has many bad effects. First and foremost, it makes the design of software hard to understand, which makes it hard to build and maintain. Moreover, the reasons for doing it are obviated by the use of Unit Tests.

If not (yet) the most serious bad design practice then it is the most insidious. It is so insidious that many people are not even aware of it -- it was even recently removed from the Wikipedia page on anti-patterns because somebody thought it was a furphy.

Saturday, 7 December 2013

Unit Tests - Personal Experiences

Blogs are supposed to be about personal experiences, so I'm going to talk about my own personal experiences with Unit Tests, or as I first called them Automated Module Regression Tests. (This continues my coverage of Unit Tests which started last month and will continue next week when I talk about the problems of Unit Tests and how to overcome them.)

Discovery

Like a lot of people, I discovered Unit Tests a long time ago but did not realise their full importance until much later.

In 1985 I was working on several MSDOS projects for a small software company called Encom. We were using Microsoft C version 2.0 which was a re-badged version of Lattice C. In 1985 or 1986 Microsoft released version 3.0 of their C compiler (which I'll abbreviate as MSC3) which was, in fact, their Xenix (MS variant of UNIX) C compiler ported to MSDOS.

I had the job of rebuilding all our software with the new MSC3. This taught me a few subtleties of portability (especially about writing structs to binary files). For example, when Lattice C creates the memory layout for a struct it builds its bit-fields top-down (ie, using the top bits of the bit-field storage unit first), whereas MSC3 (and most C compilers) build bit-fields from the LSB up. This caused a bit (no pun intended) of havoc with our binary files.
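
To illustrate the bit-field problem, here is a minimal sketch (the struct and field names are invented; the two layouts are just the behaviours described above, not a claim about any specific compiler version):

#include <stdio.h>
#include <string.h>

/* A record written directly to a binary file. How the bit-fields are
 * packed into their storage unit is implementation-defined. */
struct Flags
{
    unsigned int deleted  : 1;
    unsigned int archived : 1;
    unsigned int type     : 6;
};

int main()
{
    struct Flags f;
    unsigned char raw[sizeof f];

    memset(&f, 0, sizeof f);
    f.deleted = 1;

    /* Copy the raw bytes out to see where the bit actually landed.
     * A compiler that allocates bit-fields from the LSB up (like MSC3)
     * puts 'deleted' in the lowest bit of the storage unit; one that
     * allocates from the top bit down (like Lattice C) puts it in a
     * high bit, so a file written by one cannot be read correctly by
     * the other. */
    memcpy(raw, &f, sizeof f);
    printf("first byte = 0x%02X\n", raw[0]);
    return 0;
}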

Luckily, apart from the bit-fields problem and a few places where it was assumed that char was unsigned, our code was very portable, but one problem was that I'd used some Lattice string routines. The MSC3 run-time library instead had most of the "de facto" string routines found in UNIX compilers and omitted to implement the Lattice ones. (Microsoft provided a header file called v2tov3.h to assist with portability but it was pretty much useless.)

Lattice String Functions       

Note that I found the Lattice string routines very useful and better thought out and named than the MSC3 (ie, UNIX) string routines. Routine names had a prefix indicating what was returned (which connoted not only the type but the purpose of the value). For example, a function returning a pointer to a string had the prefix "stp".


This is strangely reminiscent of Hungarian notation (ie, Application Hungarian not System Hungarian) that Microsoft revealed soon afterwards.

A few of the Lattice string routines had no equivalent in MSC3, so I had to rewrite these routines for the new C compiler. I gave each of the Lattice functions its own source file and, as I believe was common practice even then, I added a test rig at the end of the source file, using #ifdef  TEST, something like this:

/* stpblk - skip leading blanks
 * port of Lattice func to MSC3
 * ... */
#include <stdio.h>   /* printf, fgets */
#include <string.h>  /* strcspn */
#include <ctype.h>
#include <assert.h>

char *stpblk(char *str)
{
   assert(str != NULL);
   while (*str != '\0' && isspace((unsigned char)*str))  /* cast avoids UB for negative char values */
      ++str;
   return str;
}

#ifdef TEST /* use -DTEST cmd line option to build test rig */
int main()
{
   char buf[256];
   do
   {
      /* fgets is used here instead of the original gets(), which is
       * unsafe (and has since been removed from the C11 standard) */
      if (fgets(buf, sizeof buf, stdin) == NULL)
         break;
      buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
      printf("stpblk on <%s> returned <%s>\n", buf, stpblk(buf));
   } while (buf[0] != '\0');

   return 0;
}
#endif /* TEST */

This test rig allowed me to enter a test string and see the result of the call to stpblk(). That way I could do a lot of exploratory testing to check that the function was working correctly.

It occurred to me that it would be more thorough to create a complete set of test cases and code the tests directly, rather than the more haphazard approach of manually trying all sorts of different values. That is, something like this:

/* ... as above */
#ifdef TEST
int main()
{
   assert(*stpblk("") == '\0'); /* Test empty string */

   assert(*stpblk(" ") == '\0'); /* just whitespace */
   assert(*stpblk("   ") == '\0');
   assert(*stpblk("\t") == '\0');
   assert(*stpblk("\n") == '\0');

   assert(strcmp(stpblk("a"), "a") == 0);
   assert(strcmp(stpblk(" a"), "a") == 0);
   assert(strcmp(stpblk("\ta"), "a") == 0);
   assert(strcmp(stpblk("a "), "a ") == 0);

   assert(strcmp(stpblk("abc"), "abc") == 0);
   assert(strcmp(stpblk(" abc"), "abc") == 0);
   assert(strcmp(stpblk("abc "), "abc ") == 0);
   assert(strcmp(stpblk("a b c"), "a b c") == 0);
   assert(strcmp(stpblk(" a b c"), "a b c") == 0);

   assert(strcmp(stpblk(" \xFF"), "\xFF") == 0);

   stpblk(NULL); /* should cause assertion */

   return 0;
}
#endif /* TEST */

I could then run the tests on the original Lattice compiler functions and my supposedly equivalent versions of the same functions to make sure they produced the same result. Further, if the function later needed to change (eg, to make it faster) I could simply re-run the tests to make sure it still worked. However, there were a few difficulties:
  • sometimes creating the tests took ten times longer than writing the code!
  • none of the tests actually found any bugs
  • there was no way to write tests to check that assertions worked without manual intervention (see last test above)

First Appraisal

Of course, my next thought was maybe we should do this for other functions/modules in our software. However, after some thought I rejected it because:
  • it was "impossible" to write tests for most of the code as it was mainly UI
  • a lot of code interacted with hardware which would require simulation
  • the cost/benefit ratio would not make it worthwhile
  • but mainly, the tests did not find any bugs

Another Attempt

I put the idea aside for a while. I started working for a larger company (AMP) in 1986, where most of my work was user-interface code, and I assumed that automated tests like these would be impossible to create for that sort of code.

However, after a year or so I was given charge of a team with the responsibility of creating a library of modules designed to model many of AMP's policies (initially some new superannuation products but eventually we were to model many others such as insurance products, etc).

To me this seemed like an ideal application of my earlier discovery because:

  • each module was concerned purely with calculations - there was no messy interaction with the user, hardware, 3rd party libraries, etc
  • the calculations were complex and it was important to check that they were correct for all boundary conditions
  • the modules were likely to change as the company actuaries sometimes changed the rules
  • sometimes the initial code was far too slow - we would need to check that nothing was broken after optimization

It was around this time that I invented the term Automated Module Regression Tests (AMRT). I tried to promote the idea to management. Unfortunately, they were not keen on the idea as I admitted that it would at least double development time. Another problem was that the company had only recently embarked on an automated testing project which had been a major failure, and anything that suggested "automated testing" was looked on with scorn. Finally, the technical manager of the section (PD) was very keen on some nonsense called "object oriented programming" and was not to be distracted from OOP by other silly ideas.

SQA

I left AMP (and C coding) soon afterwards. My next job was mainly working with assembler and performing sys admin tasks (UNIX), though I still did a bit of system programming in C.

None of these areas gave me any opportunity to try out AMRT, but, in 1993, I did a postgraduate course in SQA (software quality assurance) and I came to appreciate many of the subtleties of testing. (Only a small component of the course dealt with testing since quality assurance is about a lot more than testing.) Some important points on testing that I learnt:
  1. It is important to test code as soon as possible (see JIT testing) to avoid all sorts of problems (see Cost of Delay in a previous post on Change).
  2. Studies have shown that typical (Black Box) testing finds only 25% of all bugs.
  3. Studies show even the most thorough user testing finds less than 50% of bugs.
  4. It is impractical to test all combinations of inputs even for simple modules (see the arithmetic after this list).
  5. Programmers should test their code thoroughly as only they know how it works.
  6. Code coverage tools should be used to check that all (or most?) of the code is executed during testing.
  7. Ongoing changes cause most bugs even in projects with a low initial bug count.
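
As a rough illustration of point 4: a function that takes just two 32-bit integer parameters already has 2^64 (about 1.8 x 10^19) possible input combinations; even at a billion tests per second, testing them all would take over 500 years. Testing has to be selective, which is exactly where the coder's knowledge of likely problem areas comes in.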

However, no mention was ever made of Unit Tests. To me it seemed that AMRT was an ideal solution to many of these problems. First, the tests can be written at the same time as the code - an example of JIT testing. Moreover, because they are written by the original coder they can be directed at areas where there might be problems (or where future modifications may cause bugs). But the major advantage I saw is that it allowed the code to be modified with greatly reduced chances of changing existing behaviour, ie creating new bugs.

In 1994 I got to work as the Team Leader on a new project. This gave me an ideal opportunity to try several new things that I had been wanting to try:

* C++ development
* Barry Boehm's Spiral Development Methodology
* AMRT (ie, Unit Tests)

The project was a major failure but I did learn a few things about AMRT. First, they made making changes easy. Second, they could work as a form of documentation.

Agile

In 1995 I rejoined Encom to get back to programming (in C and later C++). Again I tried to convince others of the benefits of AMRT. Then in the late 90's a colleague (SR) introduced me to XP (Extreme Programming). I actually wasn't that impressed with XP when I first read about it as it mainly seemed to be a re-hash of many ideas I had already seen proposed in the SQA course. However, after a while I did realize that at the core of XP there were some new ideas which seemed to address a lot of problems that I had even recently encountered in software development.

One of the practices in XP was Unit Tests. I did not realize for a while that what was meant by Unit Tests was exactly what I called AMRT. And I don't believe that XP emphasizes Unit Tests enough.

Conclusion

I took a very circuitous route to come to a full appreciation of Unit Tests. Initially I did not appreciate their worth. When I tried them they did not find any bugs. When combined with the obvious costs of creating them they did not seem worthwhile.

Over many years I slowly came to realize that the benefit of Unit Tests is not that they find bugs, but that you can run them at any time to ensure that the code behaves correctly. This allows you to refactor code without fear. It also allows the software to easily evolve, which is fundamental to the Agile approach. There are other benefits which I mentioned here last month. Further, the costs tend to be overestimated, and they can be mitigated by tools and techniques which I will talk about next month.

What is very surprising to me is that Unit Tests are not given more prominence in Agile development. Having read many books on Scrum I have not seen them mentioned once! Even in XP they are not explained well and not emphasized enough.

A fundamental aspect of the Agile approach is that the code keeps changing in response to feedback from the users of the software. Unit Tests are absolutely crucial to this approach since they enable the changes to be made that allow the software to evolve.

Saturday, 9 November 2013

Unit Tests - White Box Testing

When writing Unit Tests you should always/never use White Box Testing. The correct version of this statement depends on what you mean by White Box Testing.

I have seen (and been involved in) this debate a lot. These are the opposing arguments:

1. You should never use white box testing as then you are testing the implementation not the interface of the module. The tests should not have to change when the implementation changes.

2. You should always use white box testing since you can't possibly test every combination of inputs. Making use of knowledge of how the module is implemented allows you to test inputs that are likely to give incorrect results, such as boundary conditions.

Paradoxically, both of these arguments are valid. The problem comes down to what is meant by white box testing. Unfortunately, the meaning has never been clear since it is obviously just the opposite of black box testing and everyone knows what black box testing means. The problem is semantic -- there are different interpretations of what is meant by the opposite of black box testing.

Here is my attempt to describe two different meanings of white box testing:

Meaning 1: Test that a module works according to its (current) design by checking how it behaves internally by reading private data, or intercepting private messages.

Meaning 2: Test a module using knowledge of how it is implemented internally, but only ever testing through the public interface.

Like cholesterol, there are both bad (Meaning 1) and good (Meaning 2) forms of white-box testing.

Bad White Box Testing

Recently a colleague came and asked me how a Unit Test should access private data of the module under test. This question completely threw me as I had never needed, or even thought, to do this. I mumbled something about not really understanding why he wanted to.

Later, when I thought about it, I realised that Unit Tests should not be accessing private data or methods of the modules they are testing. Doing this would make the unit test dependent on the internal details of the module. Unit Tests should only test the external behaviour of a module, never how it is actually implemented.
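
As a hypothetical sketch of the difference (the Stack class and its members are invented for illustration, not taken from any real code), compare a test that reaches into private state with one that only exercises the public interface:

#include <cassert>
#include <vector>

class Stack
{
public:
    void Push(int v) { data_.push_back(v); }
    int  Pop()       { int v = data_.back(); data_.pop_back(); return v; }
    bool Empty() const { return data_.empty(); }
private:
    std::vector<int> data_;   // internal detail - could change to a linked list
};

// "Bad" white box test: it peeks at the private member (which would need a
// friend declaration or similar trick), so it breaks the moment the internal
// representation changes:
//
//     assert(s.data_.size() == 1);   // depends on the implementation
//
// "Good" approach: knowledge of the implementation guides which cases to try,
// but only the public interface is used.
int main()
{
    Stack s;
    assert(s.Empty());
    s.Push(42);
    assert(!s.Empty());
    assert(s.Pop() == 42);
    assert(s.Empty());
    return 0;
}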

As I said in my previous post (Unit Tests - What's so good about them?), one of their main advantages is that they allow you to easily modify and refactor code and then simply run the Unit Tests to ensure that nothing has been broken. This advantage would be lost if the test depends on the internal implementation of the module. Changing the implementation (without changing the external behaviour) could break the test. This is not what you want.

Unit Tests should only test the interface of the module and never access private parts.

Good White Box Testing

The whole point of Unit Tests is that they are written by the same person(s) who wrote the code. The best tests are those that test areas that are likely to cause problems (eg, internal boundary conditions) and hence can only be written by someone with an intimate knowledge of the internal workings of the module.

For example, say you are writing a C++ class for infinite precision unsigned integers and you want to create some Unit Tests for the "add" operation (implemented using operator+). Some obvious tests would be:

   assert(BigInt(0) + BigInt(0) == BigInt(0));
   assert(BigInt(0) + BigInt(1) == BigInt(1));
   assert(BigInt(1) + BigInt(1) == BigInt(2));
   // etc

Of course, the critical part of a BigInt class is that carries are performed correctly when an internal storage unit overflows into the next significant unit. Doing simple black box testing you can only guess at how the class is implemented. (8-bit integers, 32-bit integers, BCD or even just ASCII characters could be used to store the numbers.) However, if one wrote the code one could know that, for example, internally 32-bit integers are used, in which case this is a good test:

   // Check that (2^32-1) + 1 == 2^32
   assert(BigInt::FromString("4294967295") + BigInt(1) ==
          BigInt::FromString("4294967296"));

This test will immediately tell you if the code does not carry from the lowest 32-bit integer properly. Obviously, you need other tests too, but this tests a critical boundary condition.

To find the sort of defects that the above test checks for, using black box testing, would require a loop that would probably take days to run. (I will talk next month about why Unit Tests should not be slow.)

Which is The Accepted Definition?

The whole confusion about White Box Testing is that the distinction above is not clear. Most definitions imply the "good" definition, but then say something that contradicts this. For example, the Wikipedia page on White Box testing (as at November 2013) does not make the distinction but seems to imply the "good" definition, then suggests it is talking about the "bad" form - for example, it says it is like in-circuit testing, which is the hardware equivalent of accessing private data.

My general conclusion, from talking to colleagues, is that the "good" definition is what most people think White Box Testing is or should be, but there are some people who use the "bad" definition.

I don't really care which definition of white box testing you think is correct as long as you distinguish between them, and as long as you specifically use "good" white box testing with your unit tests.

Disadvantages of White Box Testing

At this point I will note a few problems.

First, with some code the distinction between interface and implementation is blurred. This is a bad thing for many reasons, one of which is that you don't know if you are doing "bad" white box testing. See Best Practice for Modules in C/C++ on how to separate interface from implementation. Also see the sections on Information Hiding and Decoupling at Software Design.

Also, with white box testing it is easy to forget to add new tests when the implementation changes. For example, using the BigInt example above, imagine that the code was enhanced to use 64-bit integers (eg, because most users were moving to 64-bit processors where there would be a large performance boost). If, after changing the implementation, the test was not modified (or a new test not added) to check for overflow of the 64-bit unit, then the Unit Tests would not be complete. A sketch of such a test is given below.
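
Continuing the hypothetical BigInt example from above, the analogous boundary test after the move to 64-bit storage units might look like this:

   // Check that (2^64 - 1) + 1 == 2^64, ie the carry out of the lowest
   // 64-bit storage unit is propagated correctly
   assert(BigInt::FromString("18446744073709551615") + BigInt(1) ==
          BigInt::FromString("18446744073709551616"));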

Conclusion

There are two points to this post.

1. Make sure you understand what someone means by white box testing. When they say Unit Tests should not use white box testing then they are probably talking about "bad" white box testing which accesses the internals of the module under test.

2. When writing Unit Tests you should always use "good" white box testing to test for different combinations of parameters, boundary conditions and other likely problem areas.

It is pointless to write Unit Tests that simply do black box testing. The whole point of Unit Tests is that they are written by whoever wrote the code, in order to test likely problem areas.

Sunday, 3 November 2013

Unit Tests - What's so good about them?

This time I will briefly describe the many benefits of Unit Tests, then later look at their limitations and challenges and how to overcome them.

What is a Unit Test?

The term Unit Test was once a synonym for module tests, referring to any test of an individual unit or module. Originally this was done manually, for example using a test rig to enter input and check output, and later by stepping through the code in a debugger. It was typically only performed once a module had been implemented or an enhancement completed.


Nowadays, the term Unit Test has evolved to the stricter definition of a comprehensive set of automated tests of a module that can be run at any time to check that it works.

I'm sure you noticed that I have talked about Unit Tests in past posts, when relevant. That has meant I have mentioned them in almost every post since they have benefits in many areas of software development, particularly Agile software development. It's now time to tie everything together (and even put a nice red bow on it for Christmas :).

The idea for what are now called Unit Tests has been around for many years. I myself independently discovered the idea in 1985, giving it the possibly more accurate but duller name of "automated module regression tests" (AMRT). However, the use of Unit Tests has only really gained momentum in the last decade or so with the rise of XP (Extreme Programming) and TDD (Test Driven Development).

Better Design

I talked a little about how Unit Tests help with software maintenance last week, and I will summarize again below, but first I will explain how Unit Tests can even assist with the original design and implementation before we even consider changing it.

There is a lot of evidence that creating Unit Tests, at the same time as you write the code to be tested, results in a much better initial design. It helps the programmer to think about things (boundary conditions, error conditions, etc) that are often ignored in the rush to get something completed.

TDD (which is, in a way, an extension of Unit Testing) further improves the software quality, for example in verifiability (see Verifiability). It also assists in ensuring that the Unit Tests themselves are correct.

Another advantage is that creating Unit Tests while writing the code (eg, using TDD) means that bugs are found much earlier. This allows the design to be suitably modified before it becomes ossified.

Finally, the main advantage to the design is that Unit Tests make it much easier to resist the urge to add unnecessary code. This is the idea behind YAGNI in XP.

I could write a whole post on YAGNI (and probably will) and have also briefly talked about it in Reusability Futility. In brief, it is the idea that you do the absolute minimum to implement something.

Why are extra things done anyway? There are lots of reasons:
  • a future extension is thought to be certain
  • adding it now avoids later costs of change (see below)
  • something is seen as a nice addition for the user and is trivial to add
  • the developer wants to try something more challenging
  • a more general problem is solved in the name of reusability, rather than the specific one required
  • anticipated code degradation means it will be hard/impossible to add later
  • distrust of code maintainers to later add a feature properly
  • as an attempt to avoid repeated regression testing when things are later added
  • finally (and ironically) changes are made to make it more maintainable
The consequences are:
  • a more complex design, which is less easily understood
  • undocumented features which are not debugged and tested properly
  • when software is rewritten or just refactored, undocumented features are forgotten or inadvertently disabled - this can be very annoying for users
  • reluctance to refactor from fear of breaking undocumented features
  • can often constrain the design making future changes more difficult
  • usually makes the code less maintainable
Note that there is one consequence that I have not touted here. I have seen it said (for example see the Wikipedia page for YAGNI) that it is faster and requires less effort to create a simple design. This may sometimes be true but in my experience it is often harder and more time-consuming to create the simpler design than a more complex one.

Unit Tests mean that most of the anticipated problems with YAGNI disappear. You don't need to add the feature now because Unit Tests allow you to add it easily later. That is, they reduce the costs of change. Moreover, the code does not degrade, and maintainers are more likely to make modifications in a way that does not subvert the original design, because they don't have to fret over introducing new bugs; good Unit Tests can even guide them in how to make the changes.

Further, I have found that the use of Unit Tests with a simple design makes the code easier to maintain than a design which is explicitly built to be maintainable!

Finally, Unit Tests free developers to concentrate on the task at hand, rather than being distracted by tasks they are not really good at (like predicting future user requirements).

Handling Change

In brief, the problem is that the costs of changing software are traditionally very high, primarily due to the risk of introducing new bugs (and the consequent problems) and the need for regression testing. These costs cause two effects:
  1. resisting change, which is bad for the long-term viability of software
  2. emphasis on avoiding mistakes causing an unnecessarily large up-front effort
Resisting change can result in lost business opportunities simply by avoiding adding new features. Further, the code is not refactored to improve maintainability or take advantage of a better design. Further still, improvements due to external technical advances (eg, improved hardware or software libraries), are often missed.

Worst of all, even when changes are made they are not done properly for various reasons like:

1. It may be very hard to make the change in a way that is consistent with the original design. A good example of this is given at the end of Item 43 in Scott Meyers book Effective C++, where multiple inheritance is used to avoid making the correct changes to the class hierarchy.

2. The changes may affect other people. For example, a library or DLL from another section may need to be changed. There can be a reluctance to ask others to make changes that might appear to be for the sake of one's own convenience. I gave an example of how this happened to me in a previous post - Why Good Programs Go Bad.

3. Often making changes properly may increase the chance of new bugs appearing (or old bugs reappearing). An example of bad code changes made in the name of avoiding bugs, is the common practice of cloning a function, or module, to handle a change, and making slight modifications for the new circumstance. This preserves the original function or module so that there is no chance of bugs being introduced in existing behaviour; but the consequence is that there will be duplicate code, which violates the DRY principle, and causes a maintenance problem.

4. Even a simple change has the possibility of introducing new bugs. Hence manual regression testing is required. This can have a large cost and is usually very tedious for those that do the testing.

Finally, the problem of spending an enormous effort up-front to get the specification right first time has many well-known, undesirable consequences. First, it makes managers very nervous when there appears to be a large cost (wages) with very little tangible evidence that anything has been accomplished. The analysts also soon realize that they don't really know what is required and/or what they are doing and that there is no chance of getting the analysis and design right first time. There is a large amount of effort which could be better spent in another way - the Agile way.

Agile methodologies generally help greatly to reduce the cost of change by catching bugs (and other defects) much earlier. But Unit Tests especially enhance this aspect of Agile methodologies by:
  • detecting bugs earlier (reducing the cost of delay)
  • making code more maintainable (reducing the cost of rework)
  • allowing changes to be made properly
  • making refactoring easier and less risky
  • working as a form of "living documentation", making changes easier (see below)
  • even guiding developers to ensure that future code changes are done properly

Documentation

Technical documentation has always been one of the biggest problems in software development. Studies have shown (at least in Waterfall-style projects) that poor technical specifications are the major reason for project failure, followed by poor estimations (which are usually due to incomplete specs).

Before I talk about the relationship between Unit Tests and documentation, let's just look at the problems and their causes. There are a lot of problems with technical documentation which I group into three rough categories:

1. Accurate. One problem is that documents are simply incorrect, and the reviewing/proof-reading required to remove all errors would be onerous. Of course, another major problem is that they are incomplete, often with gaping holes, and no amount of reviewing is going to find a gap that nobody has thought of.

One way to attempt to overcome this has been to require people with authority to sign off on a document (presumably after having read it). That way, at least you have somebody to blame when things go wrong! Having more people read and sign-off is good (in one way) as it increases the chance of spotting mistakes but then (in another way) the blame-ability for each individual is diluted. And trying to blame is not the Agile approach.

2. Up to date. One reason documents are incorrect is because things change, usually a lot. Documents, even if correct initially, are invariably never in accord with the software at any particular point in time. One of the main reasons, is finding time to update them. You don't want to keep modifying something when you suspect it will change again in the near future. The end result is you keep postponing updating the document until it becomes irrelevant and everyone has forgotten about it, or the task is so big that you can never find time.

One reason documents are not updated is that they are hard to verify. It can be very difficult to check if there is a discrepancy between the actual software and the documentation. Even though the document and the code are very closely related, there is no direct connection between them, except via the brains of the developers. This is another good example of DIRE (Don't Isolate Related Entities).

3. Understandable. Finally, documentation is difficult to read. There are many reasons, like being too vague, including irrelevant information, bad grammar and incorrect terminology, etc. A common problem is assumed knowledge on the part of the reader - a good introduction/summary at the start of a document is almost never done but can be a great time-saver and avoid a lot of confusion.

The lack of a summary is symptomatic of the basic problem - the author is entirely focussed on getting all the facts down. This is like what happens with a newbie programmer - they are so focussed on getting the software working (ie, correct) they give no regard to other important attributes (like understandability). Unfortunately, document writers rarely get past the "newbie" stage, being mainly concerned with correctness not with the understandability of what they write.

Favour Working Software over Comprehensive Documentation

It is no mistake that this, the second of the four elements of the Agile Manifesto, deals with documentation. And there is no better example of favouring code over documentation than Unit Tests, since they are working software which actually obviates the need for much documentation. On top of their other advantages Unit Tests work as a form of living documentation which records not only how the code is supposed to work but also shows others how to use it.

Most documentation is poor, but even with the best you are never certain you have understood it completely. Most programmers create a little test program to check their understanding. With Unit Tests, that little test program is already done for you.
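
As a small, hypothetical illustration (ParseDate and the Date struct are invented purely for this example), a unit test like the following tells a new reader more about the accepted input format than most written specifications would:

#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical function under test: parses a date of the form "YYYY-MM-DD".
struct Date { int year, month, day; };

Date ParseDate(const std::string& s)
{
    Date d = { 0, 0, 0 };
    std::sscanf(s.c_str(), "%d-%d-%d", &d.year, &d.month, &d.day);
    return d;
}

int main()
{
    // These tests double as documentation of the supported format.
    Date d = ParseDate("2013-11-03");
    assert(d.year == 2013 && d.month == 11 && d.day == 3);

    // Leading zeros are accepted.
    d = ParseDate("2013-01-09");
    assert(d.month == 1 && d.day == 9);
    return 0;
}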

There are many other ways that Unit Tests work as improved documentation. By now you probably get the gist, so I will just give a list:
  • correct - unlike documentation, mistakes are immediately obvious
  • verifiable - you can easily check if Unit Tests are correct by running them
  • understandable - working code is much easier to read/test than documentation
  • modifiable - Unit Tests allow code to be more readily modified
  • up to date - Unit Tests (if run and maintained) are never out of date
Of course, like documentation Unit Tests can be incomplete (and often are). This is something I will talk about in the next blog, but for now I will simply say that code coverage analysis can help to ensure that tests are reasonably complete and up to date.

Finally, using Unit Tests as documentation can be thought of as a form of automation. Automation is another thing I am keen on, as I hate performing tedious tasks - and reading documentation to check that my code matches it is a perfect example of a tedious task. Running Unit Tests automates this task.

Organization

Finally, Unit Tests are good for organizing your work. You develop a rhythm when using Unit Tests (and particularly when using TDD) of getting into a cycle of coding/testing/fixing. Somehow it becomes more obvious what you need to do next. For example, after implementing a feature you just "fix the next red light" until all Unit Tests pass.

Summary

Once you start using Unit Tests you keep finding more things to like about them. Your design is more likely to be correct and more verifiable. Further the initial design can be simpler because you don't have to try to guess what will happen in the future. And when things do change, enhancements are made more quickly and reliably and bugs are found faster. Best of all changes can be made properly and the code refactored without fear of introducing bugs.

However, the main point I was trying to get across is that without Unit Tests I don't believe an Agile methodology can work. (This is one of my criticisms of Scrum - that it does not prescribe Unit Tests as essential to the methodology.) Unit Tests allow the software to start out simple and evolve into what it needs to be. They allow us to resist trying to predict the future.

So why aren't they universally used? First, there is the perceived cost/benefit ratio - though I think the real benefits are greatly underestimated, especially when combined with other Agile techniques. Another reason they are not widespread is due to several challenges that I will cover in my next post...

Saturday, 26 October 2013

Change (Unit Tests Intro)

A perennial problem in software development is coping with change.
It's happened again!! I started writing a blog about Unit Tests (see my next post coming soon) and ended up writing a lengthy introduction about Change. So I have split the intro into this separate post. This is so I don't overload you and so you can skip it, if you're not interested. (However, you probably should read the Conclusion below anyway.)
This post explains the types and reasons for change and the costs involved. It also looks at how change is traditionally handled and how Agile development and Unit Tests can help.

The Cost of Change

First, it may be helpful to understand one aspect of change - how much it costs, and particularly how the cost varies over time. This has been extensively studied and documented under the Waterfall development model - for example, search for Barry Boehm's articles on the subject if you are interested.

However, I can save you the trouble by giving this summary: the longer it takes before you find and fix a mistake, the greater the cost. As the graph below shows, the rate of growth in the cost is much worse than linear - it may even be exponential.

Diagram 1. Cost of Change (Waterfall model)

The rule of thumb I was given (about 20 years ago) is that for every phase of development the cost goes up by one order of magnitude. So an analysis mistake that is not picked up until testing costs 1000 times more to fix, since it has passed through three subsequent phases (analysis > design > coding > testing), while a coding bug picked up in testing will only cost 10 times more to fix than if it had been found straight away.

I believe there are two fundamentally different reasons for this, which I have called the cost of delay and the cost of rework.

Cost of Delay

Experienced programmers know that if you find a bug while working on the code, you can often correct it quickly or even immediately, but fixing the same bug a month or two down the track can take hours or even days, and even then may not be fixed properly. This is primarily due to limitations of the human brain - you will have forgotten exactly how the code works. (Of course, if the original developer has left the company and there are no Unit Tests or comments/documentation then it can take much longer.)

Worse, you may not even realize you have forgotten how the code works and make changes based on a misapprehension. In my experience, these are the worst source of bugs. I have often written, thoroughly tested and understood a module which was working perfectly. Then a simple change made later causes all sorts of problems. Again, as I discuss later, Unit Tests help here.

The cost of delay is also due to other tedious and time-consuming problems like setting up an environment for testing and debugging. You may not even be able to reproduce the problem in the latest version of the code so you need to find and rebuild an older version in which the problem occurs in the field - this may require tracking down old versions of compilers, tools, libraries etc, if this information was even written down somewhere.

All this should convince you that it is best to find bugs as soon as possible - something I previously discussed in JIT Testing. Actually this is a good example of the principle of DIRE (Don't Isolate Related Entities). In this case we are talking about not isolating the coding from the testing in time. This is one advantage of using Unit Tests, and particularly TDD (Test Driven Development).

More generally, reducing the cost of delay is one of the principal advantages of Agile methodologies. The continuous feedback from the customer/users (eg at least at the end of every Sprint in Scrum) means that problems are found and fixed much more quickly when the context is still fresh in everyone's mind.

Cost of Rework

The other problem with delaying changes is that in the meantime a lot of work may have been done, based on the original software. This problem is not due to the length of time that has passed but simply due to the fact that this work has to be repeated. For example, extensive testing may have been performed that would need to be redone to ensure that no new bugs were introduced.

Another cost, for software already released, would be notifying and updating users of the problem. Rolling out a new release, even just for a bug fix, can be costly.

There can be coding costs too, since the change may require a major internal re-design. In this case, the behaviour of the code must be understood once more, the code modified and once again a great deal of regression testing is required to make sure nothing has been broken.

Unit Tests can help reduce the costs of rework too. If the original software had comprehensive Unit Tests then changes can easily be made to the code. Generally, if all Unit Tests pass, it is safe to assume that no bugs have been introduced by the changes. This can reduce the costs associated with coding, debugging and even regression testing.

But there is an even greater advantage to Unit Tests. It is not uncommon to find that, due to the large cost of rework (as explained above), software changes are not done properly but in a manner that minimises these costs. That is, changes are made in a way that minimises the risk of introducing new bugs.

I can give many examples of how this occurs but a common one is to duplicate a complete function or module and modify the copy leaving the original untouched (so that the original can handle the commonly used existing functionality in exactly the same way). This strategy results in code duplication (often on a massive scale) which contravenes the principle of DRY (see Principles of Software Design).
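
A hypothetical sketch of that practice (the function names and the premium rule are invented for illustration): rather than change the existing function and re-test everything that calls it, a near-identical copy is made for the new case, and the two then drift apart over time.

// Original function, used by existing callers and left untouched.
double CalcPremium(double sumInsured, int age)
{
    double rate = (age < 40) ? 0.010 : 0.015;
    return sumInsured * rate;
}

// Cloned for a new product so the original "can't break". Only the
// rate table differs, but now every future fix must be made twice.
double CalcPremiumGold(double sumInsured, int age)
{
    double rate = (age < 40) ? 0.009 : 0.014;   // the only real change
    return sumInsured * rate;
}
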
I briefly discussed this in early 2012 (see When Good Programs Go Bad) and a reader responded with a quote from Bruce Eckel:

"Management's reluctance
to let you tamper with
a functioning system
robs code of the resilience
it needs to endure."

This is very insightful but Units Tests can help to assuage that reluctance.

This sort of practice is so ingrained in the industry that most developers do not even realise they are doing it. However, developers are not entirely to blame here since it is often the managers that react very badly when bugs are found (and it was probably the same managers who imposed draconian deadlines that precluded creation of Unit Tests).

If you haven't seen the light yet here is a summary: Unit Tests allow software to be changed without compromising the original design and without fear of introducing new bugs. The code can then adapt and evolve and never need to be completely rewritten or discarded.

Reasons for Change

Why do we need to change software? Ask most people and you get two answers: fixing bugs and adding features. But there is a lot more to it than that! More generally, changes are made to improve the quality of the software (of which bug fixing is just one aspect) and, yes, to add enhancements.

Enhancements

I really don't have much to say about adding functionality, except that there is often a reluctance to add new features due to the cost and the possibility of breaking existing features. Unit Tests help by making the code more modifiable (see next blog) and by catching bugs caused by the changes.

Fixing Defects etc (User quality attributes)

The other reason for change is to improve the quality of the software. There are many aspects to the quality of code not just correctness (which is what fixing bugs is all about). The software might also need to be improved if specific problems have been identified in areas such as performance, usability, reliability and other "user" quality attributes.

Refactoring (Developer quality attributes)

What is often neglected are changes that improve developer quality attributes, especially maintainability (but also verifiability, portability, reusability, etc). See my previous post on The Importance of Developer Quality Attributes for an explanation of the difference between user and developer quality attributes. Changing the software to improve developer quality attributes is known as refactoring, and many people are realising that it is essential to the long-term viability of any software.

There are actually many benefits to software that is easily modified (and hence easily refactored) as I discuss later. I will mention Unit Tests again here as they allow you to improve the software without fear of introducing bugs.

Change Management

Before Agile methodologies came along, change was thought of as something to be avoided. Of course, considering the cost of change graph (above) this was completely understandable. I will first look at how change has traditionally been managed and then look at the Agile approach.

Eliminate Mistakes

If you don't make mistakes then you don't have to fix them. This is the attitude epitomized by the SQA motto Do It Right the First Time, or simply Right First Time. Over many decades, starting sometime in the 1960's, a huge number of software projects have failed, and the reason given has invariably been inadequate, poorly documented, or continually changing requirements. (Some studies have found poor estimations to be the primary cause, but that is just a result of poor requirements.) In other words: we didn't really know what we were doing.

So the thinking was always that more time should have been spent on analyzing the problem in order to eliminate all the mistakes and omissions in the requirements and anticipate any "unanticipated" changes.

This is why there has been a huge amount of research, and even more debate, on how to avoid the mistakes including:
  • better and more thorough analysis techniques
  • better communication with the customer
  • the invention of various estimation techniques
  • using prototypes so the customer better understands what is specified
  • modelling languages and diagrams
  • formal proofs of correctness
  • etc
The problem is that with any reasonably large, real-world software project you can never get it right the first time! Moreover, spending a lot of time and effort trying to do so is itself very costly. The customer or sponsor also becomes concerned when nothing appears to be happening except a lot of people trying to understand the problem.

Further, not all change comes about from mistakes. Even if you can avoid making mistakes (which you can't) there will be other reasons for change. You can't anticipate unanticipated changes. Some change you just can't avoid (such as regulatory changes).

Discourage Change

Whether deliberate or not, another strategy commonly used in a large project (under Waterfall) is to do everything possible to prevent changes from being made.

First, a complex and tedious procedure, with lots of forms, is set up for the approval of all changes. All proposals go to a change review board consisting of managers, analysts and architects with the knowledge and ability to give reasonable grounds for rejecting almost any proposal.

In this sort of environment refactoring the code is never even considered. This leads to a snowball effect; if code is not refactored to make it more maintainable then it becomes more and more expensive to make changes for other reasons.

In the end only the worst bugs and the most desirable new features are approved. It is hard to quantify, but the resulting lost opportunities can make a huge difference to the long-term viability of the product. Advances in software and hardware are accelerating -- if the software cannot be adapted to use them then it will be at a disadvantage to competitors who can.

Software that does not adapt to change will eventually atrophy and die.

Minimize Risk

OK, a change has been approved -- a major new feature or perhaps a serious bug needs to be fixed. But the problems don't stop there. There is the even more pernicious problem of how the change is made.

Most of the time changes are made, not in the best way, but in a way that reduces short-term risks. This may be a conscious decision of the designers but more often is due to the way the programmers work. (See my example above under The Cost of Rework).

Why do programmers work like this? It is sometimes due to laziness or fear (which is the main motivation behind the XP idea of Courage). But it's more often due to conditioning by poor management practices (see Why Good Programs Go Bad for full explanation).
  •  programmers do things the easy way (Code Reviews and Unit Tests can help here)
  •  unrealistic deadlines, with no time allocated to refactor later
  •  management intolerance of bugs caused by making changes properly (Unit Tests help here)
The end result is software that degenerates into the classic unmaintainable Ball of Mud.

Agile Approach

Agile methodologies take a completely different approach, by recognizing that when creating software you can't even get close to getting it right the first time. The Agile catch cry is instead Embrace Change. Many people think this simply means we have to accept change (and its costs), but Agile actually questions the Waterfall assumptions and tries to find ways to actually reduce the costs of change.

In other words rather than cope with that horrible cost of change curve above (associated with Waterfall methodologies) it changes the shape. There has been a lot of debate about how the curve looks under Agile but it might be something like this:

Diagram 2. Cost of Change (Agile)

However, in all the debate about the shape of the curve, an important point is missed. Mistakes are caught sooner due to continuous feedback from the customer, so we don't get so far along the curve before finding and fixing defects. For example, many problems are found under Scrum in the Sprint Review which would not be found until months later under the Waterfall model.

Further, Unit Tests, which I consider a fundamental part of Agile, have an even greater benefit. If Unit Tests are written at the same time as the code (or better still TDD is practiced) then many bugs will be found straight away that would not be found till later. This reduces the costs of delay.

Even more important is that if changes are required, Unit Tests allow them to be made more easily and reliably. This reduces the costs of rework.

Finally, Unit Tests greatly reduce the disincentive to refactor the code, which reduces the costs of lost opportunities and extends the life of the software.

Conclusion

Using a Waterfall development methodology the costs of change are prohibitive and so are avoided or performed in a way that reduces risks (eg, the risk of introducing new bugs or needing a full regression test). Software developed and maintained like this becomes very expensive to maintain. To remain competitive (unless you have a nice cushy monopoly on your market) it will need to be discarded and rewritten from scratch.

An Agile development methodology turns this problem on its head by embracing change and reducing the costs of change. For example, the continuous customer feedback means problems are found much more quickly.

Further, the Agile approach to change is greatly enhanced by use of Unit Tests since they:
  • make code more maintainable and verifiable
  • reduce the cost of change by allowing changes to be made easily
  • allow code to be refactored to take advantage of better design/new technology
  • facilitate making changes properly, thus avoiding a maintenance nightmare
  • allow changes to be made without fear of bugs or lots of regression testing
I will elaborate on this and other advantages of Unit Tests next...

    Sunday, 13 October 2013

    Book Review: Clean Code

    Over a year ago I was given a copy of the book Clean Code by Robert (Uncle Bob) Martin. There are many good things in this book. I guess the highest compliment I can pay is to note that I have changed my coding practices as a result of reading it.

    Actually all the developers where I work were given a copy of this book. In retrospect this is odd as all the example code is in Java and we do our coding in C, C++ and C#. (We probably got a free pallet of the books as we had Object Mentor come in and do an audit on our development practices a few years ago.)


    Anyway, I think everyone here who read the book got something out of it (though everybody I talked to said they skipped the Java examples).

    The Cover

    Of course, you should not judge a book by its cover but I found a few things about the cover misleading. (I am not sure that anything is insinuated by the picture of the M104 galaxy on the cover, which is both beautiful and distant, and home to one of the largest known black holes!)

    First the name on the cover says Robert C. Martin so the implication is that he wrote it. It's not until you start reading some later chapters that you notice that some of them say "By ..." or "With ...". Apparently Uncle Bob did not write all of it.

    I generally avoid reading anthologies, as I prefer a book to be written by one author (or group of cooperating authors), rather than separate loosely related offerings from different people. This makes for a more concise, consistent and generally holistic text; whereas an anthology is invariably disjointed, contradictory and often repetitive.

    Admittedly, this book is not as bad as some anthologies, since most of the text seems to have been written or at least edited by Uncle Bob, but there are examples of repetition and contradiction which I mention later.


    My second complaint is that there is a long blurb on the back cover but absolutely no mention is made of Java. The perception I obtained by reading the blurb and the Introduction (which also does not mention Java) is that the book is suitable for all programmers, but that is not true at least for some chapters.

    Some parts and even whole chapters are almost completely Java-centric (eg Chapters 11 and 13). Not only are the examples in Java but often the text gives advice specifically aimed at Java programmers. This is another problem with using different authors as the chapters by Uncle Bob are aimed at all programmers (but still with Java examples) but some of the other chapters are aimed squarely at Java developers.

    Some code examples can be followed without knowledge of Java, but many require a fairly deep understanding. I did some Java programming about 15 years ago but I still could not understand most of them. I would prefer examples in another language or at least a better explanation of the bits that only a hardened Java programmer would understand.


    My final complaint about the cover is the inclusion of the word Agile in the title. (The full title is: Clean Code - A Handbook of Agile Software Development.) The publisher probably insisted on this title since Agile is the flavour of the decade. (In the previous decade, book titles had to include Object Oriented.)

    Admittedly, there is quite a bit of content specific to Agile Methodologies, but most of the book is not about agile techniques. In fact many of the ideas far predate agile.

    Good Things

    This book is full of good advice and ideas. Please don't take my negative comments (above and below) as lack of endorsement. Most developers, even experienced ones, can get an enormous amount out of the book.

    That said, I did find a few things that I strongly disagree with. I hope the arguments below can likewise convince you.

    The best thing about the book is that it presents a well-rounded summary of the important points of designing and writing good code. A lot of these are old ideas (though they seem to be presented as if they are new), but it is good to have them all in one place and presented in a reasonable order.

    There are many places in the book where I felt that I could have written the same thing almost word for word, such as the description of maximum code line length. (My personal convention is for code not to go past column 100 and end of line comments not to exceed column 120.)

    There are also a few worthwhile things that I had not considered or read about before. An example is the section on creating code at different levels of abstraction (see page 36).

    Bad Things

    Probably my biggest complaint is that the first few chapters are far too detailed, stating the bleedingly obvious (though there are a few things that are bleedingly wrong - see Identifiers and Comments below). For example, there is a whole chapter on creating names for identifiers, then much the same thing is covered in the next chapter (Functions). When I first wrote some coding standards (for a team of developers at AMP in 1986) all I said on the subject was:
    • identifiers with broad scope (eg, global variables and functions) should have long descriptive names
    • variables with narrow scope (eg, local variables) should have shorter names
    • all variables and functions should have a comment describing their purpose where they are declared
    I still believe this is enough. (I actually thought this might be too long and considered cutting it down.) I really can't see how Uncle Bob can justify writing dozens of pages on this subject.
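
    For example, here is a short sketch (in C++, with hypothetical names) of what those three rules look like in practice:

        #include <vector>

        // Connection - hypothetical type, used only for this illustration
        struct Connection { bool active = false; bool IsActive() const { return active; } };

        // g_connection_retry_limit - maximum number of reconnection attempts
        // (broad scope, hence a long descriptive name)
        int g_connection_retry_limit = 5;

        // CountActive() - returns the number of active connections
        // (purpose described where it is declared)
        int CountActive(const std::vector<Connection>& conns)
        {
            int n = 0;                  // n - running count (narrow scope, hence a short name)
            for (const auto& c : conns)
                if (c.IsActive())
                    ++n;
            return n;
        }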

    Finally, I will mention that Uncle Bob loves his TLAs (Three Letter Acronyms) such as DRY, SRP, etc which I am not sure is a good or bad thing. The book gives the impression that he not only invented all these acronyms but also the ideas behind them. Of course, this is not true as the ideas have been known and practised (not under those names) for decades. (An arguable exception is IOC, though I have seen similar approaches used in the past.)

    Unit Tests

    Unit Tests have been my pet subject for more than two decades, so I was pleased that there was a chapter on them. Like many people I independently discovered what are now called unit tests (in my case in 1985) but I gave them the unglamorous name of automated module regression tests.

    Again, this chapter is a little verbose at explaining the simple things. It also misses some important areas like mock objects and black-box vs white-box testing.

    The Unit Tests chapter also covers TDD (test driven development). TDD should have been given its own chapter, and explained in more depth, due to its importance (it is a lot more than just unit tests).

    In this chapter Uncle Bob also says that unit tests do not need to be efficient. I disagree. Unit tests are just like any other piece of code and should have the quality attributes required of them. Of course, production code is more likely to require optimization, but that does not mean that unit tests should be slovenly. In fact, I remember reading somewhere else in the book where he (or another of the authors) says that tests should run fast to avoid not being run at all - another contradiction.

    Also on the subject of unit tests, it is mentioned in the following chapter (page 136) that unit test code can be allowed access to the internals of an object. This is simply wrong: tests should only ever test the interface of an object, never how it is implemented internally.
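
    As a minimal sketch of what this means in practice (a hypothetical Stack class and plain asserts rather than any particular test framework):

        #include <cassert>
        #include <vector>

        // Hypothetical class; the internal container is a private detail.
        class Stack {
        public:
            void Push(int v)   { data_.push_back(v); }
            int  Pop()         { int v = data_.back(); data_.pop_back(); return v; }
            bool Empty() const { return data_.empty(); }
        private:
            std::vector<int> data_;     // never examined by the test below
        };

        // The test exercises only the public interface, so the internals could be
        // replaced (eg, with a linked list) without the test needing to change.
        void TestStack() {
            Stack s;
            s.Push(1);
            s.Push(2);
            assert(s.Pop() == 2);
            assert(s.Pop() == 1);
            assert(s.Empty());
        }

        int main() { TestStack(); return 0; }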

    Identifiers and Comments

    There is one area where this book really goes off the rails. Uncle Bob insists that you should try to avoid adding comments in your code by instead using long descriptive identifiers. This is the old chestnut of self-describing code, which I have already refuted in a previous blog (see Self Describing Code).

    But Uncle Bob goes further by promoting the idea of creating even more identifiers. The first way he does this is by creating temporary variables solely for the purpose of giving them a meaningful name. His other idea is to extract bits of code into tiny little functions for the same reason. Both of these ideas I really hate, for many reasons (described below), not the least of which is that I already have enough trouble thinking up good identifiers without this extra burden.

    I will use a numbered list to emphasize how many things are wrong with this approach:

    1. Over the past few decades many people have been infatuated with the idea of self-describing code. In fact this was the guiding principle behind the design of COBOL. All have been failures. (COBOL was a successful language but it was recognized that this aspect of the language was a failure.)

    2. The whole idea that long descriptive identifiers are good and comments are bad is contradictory. In many ways identifiers are comments - the compiler doesn't care about the characters of the identifier, just that whenever it is used it is spelt the same way.

    Using long variable names instead of comments makes no sense.

    3. Repeatedly typing a long variable name becomes more and more tedious. I know that many editors/IDEs provide name-completion but it is still distracting to have to look at a list of names and pick the right one (and name-completion propagates horrible typos).

    Worse, whoever has to read the code must scan these long, tedious names. You can skip over long comments but variable names are harder to skip as they are embedded in the code.

    Research has shown that identifiers that are too long (ie more than 16-20 characters) make code difficult to read. This affects the understandability of the code and consequently the maintainability.

    4. Uncle Bob promotes the idea that code should read like a well-written novel. This is something I wholeheartedly agree with. I guess then, a variable would be analogous to a character in the novel. Let's look at how characters in novels are described.

    When a (major) character first appears in a novel they are introduced to the reader with their full name, and any relevant description of their appearance and/or character, etc (in Dickens this can go on for many pages). Subsequently, the character is only referred to by their first or last name or even simply as him or her.

    These characters also have relatively short, easily remembered names. More importantly different characters generally have very different names so that they are easily distinguished from each other. (Though I have read novels which were very confusing because there were two characters with similar names.)

    The analogy in code is that you "introduce" a variable by declaring it with a relatively short (but descriptive) name as well as using a comment that describes its purpose. (In fact, this is something that I have required of my team members for the last 27 years - ie all variables and functions to be described when declared.) The important thing about the variable name is that it is easily remembered, that its name gives an indication of its purpose, and that it is quite different to other identifiers in the same scope.

    Using a very long, overly descriptive name that tries to describe the full purpose of the variable is equivalent to repetitively re-describing the same character throughout a novel. This is just as tiresome when reading code as it would be when reading a novel.

    5. Uncle Bob's assumption seems to be that all comments have to be read. Some comments (usually end of line comments) are additional tips in case you can't understand the code. If you understand the code you don't need to read the comments.

    6. The first criticism Uncle Bob has of comments (page 54) is that they "lie". I agree that many comments have absolutely no value or even, if incorrect, a negative value. Even if they are initially correct they can quickly become out of sync with the code as code changes are made and the comments not updated accordingly.

    I guess Uncle Bob is coming from the agile stance of "favour working code over documentation" (another thing with which I wholeheartedly agree).

    However, as mentioned in point 2 (above), identifiers are just as much "documentation" as comments. In fact, in my experience misleading identifier names are far more of a problem than misleading comments.

    The other point is that just because something is done badly does not imply you should stop doing it. (Did we stop making airships after the Hindenburg and other airship disasters? Actually, you may think that's a counter-example, but airships are a great alternative form of transport that, with modern weather forecasting, could be made very safe.) Instead we should find ways to improve comments (and identifiers). For example, in my experience comments are better written and better maintained in code that is subject to code reviews.

    Please note at this point that I do think we can eliminate the need for some comments by using unit tests. In this case we are actually favouring working code (the tests) over documentation (the comments).

    7. One thing I haven't mentioned yet but really bugs me is Uncle Bob's penchant for taking a small expression or even part of an expression and turning it into a function. The sole purpose of this is to be able to add another identifier (ie, the function name) to describe what is happening, rather than adding a comment.

    Now I have nothing against short functions. I believe that the ideal length for a function is less than 10 lines, but these functions that are actually a fraction of a line are bad for several reasons.

    First, Uncle Bob mixes this reason for creating short functions with the other reasons. I believe he should make it clear that he promotes short functions partly to aid the organization and understanding of the code, but also as a replacement for comments.

    Second, it moves the actual code somewhere else. If you really want to look at the code, not just the function name, you have to go and find it.

    Third, the name that is given to the function may make perfect sense to whoever created it, but it may be gibberish to a later reader of the code. In my experience, no matter how long and descriptive a name, someone (actually most, if not all, people) will misinterpret it.

    Fourth, I don't think it is a good idea to change the actual code for the purpose of trying to explain it. It is tricky enough to create quality code without this extra consideration.

    Lastly, this approach actually goes against the agile principle of favouring code over documentation. You are replacing a piece of working code with the name of a function, and as I mentioned above identifiers are more documentation than code.

    8. A similar one is using temporary variables for the purpose of introducing another descriptive identifier and hence avoiding a comment.

    Using too many temporary variables has several problems. First, they often lead to bugs for many reasons, such as not being initialized, being of the wrong type (eg leading to overflows), using the wrong one when there are many of them, etc.

    Another problem is that the code bloats when using lots of temporaries, which makes it difficult to understand. I would much rather read a concise one line expression (even a complicated one) than try to decipher 10 lines of code using half a dozen temporary variables.

    Finally, when temporaries are used for control flow it can be very difficult to understand what the code is doing without stepping through it. Control-flow based on variables begins to look like self-modifying code which has been regarded as unacceptable for more than 50 years.
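
    As a contrived illustration of the two styles discussed in points 7 and 8 (the Order type and all names here are hypothetical):

        // Hypothetical type, used only to contrast the two styles.
        struct Order {
            int  DaysOverdue() const { return days_overdue; }
            bool IsPaid()      const { return paid; }
            bool InDispute()   const { return in_dispute; }
            int  days_overdue = 0;
            bool paid = false;
            bool in_dispute = false;
        };
        void SendReminder(const Order&) { /* stub */ }

        // Style the book promotes: a temporary (or a tiny extracted function)
        // introduced solely to give the condition a descriptive name.
        void ProcessWithTemporary(const Order& order) {
            bool isOverdueUnpaidAndNotInDispute =
                order.DaysOverdue() > 30 && !order.IsPaid() && !order.InDispute();
            if (isOverdueUnpaidAndNotInDispute)
                SendReminder(order);
        }

        // Style argued for here: a concise expression plus a short comment.
        void ProcessWithComment(const Order& order) {
            if (order.DaysOverdue() > 30 && !order.IsPaid() && !order.InDispute())
                SendReminder(order);    // remind about unpaid, undisputed, overdue orders
        }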

    9. I don't know about you but I often have to manually type in an identifier. I'm not sure why but it might be because I use multiple systems and I need to search on one system for something I found on a different system. Or it may be that a colleague has asked me to search for the use of a particular variable in the code.

    Long variable names are very tedious and difficult to type in correctly, especially if they use incorrect/ambiguous camel-casing (eg Bitmap vs BitMap).

    This is just one more reason to use short, simple, descriptive, easily remembered, easily differentiated variable names.

    10. Here is an example of a function name taken from one of Uncle Bob's good code examples:

        isLeastRelevantMultipleOfNextLargerPrimeFactor (page 145)

    The problem is that despite it being almost 50 characters long I still do not understand its purpose.

    Contents/Index

    Unfortunately, ever since I first read K&R (with its brilliant index), I tend to judge a book on how easily I can find information.

    At least Clean Code has a table of contents and an index but there are problems such as wrong page numbers. The index is below average - for example, look up "error handling" and you are directed to pages 8 and 47-48. However, there is a whole chapter on error handling on pages 103-112. Again this is probably symptomatic of the book being written by different people.

    Further the Introduction mentions that the book is divided into three sections but it is not really clear where the sections start and end.

    Conclusion

    I like almost everything about this book, except for the few things I have mentioned above. It is definitely worth reading by any programmer.

    However, I will note that unless you use Java you may not get as much out of it as you want. And it seems to me that you need to be an advanced Java programmer to understand some of the chapters not written by Uncle Bob. There are also some areas that are repetitive and contradictory due to the use of multiple authors.

    One thing I didn't like was the excruciating detail in the first few chapters - but this might be useful for inexperienced developers. On the other hand there was not enough information in other areas such as TDD, Unit Tests and Concurrent Programming.

    Monday, 16 September 2013

    Customer Management

    There is often a disparity between what people want and what they need. (One example is the propensity to eat unhealthy food.) Fortunately, people are generally rational and well-educated in avoiding things that work to their detriment. (Most people have a reasonably healthy diet.)

    However, there is still one industry where customers seem to be unaware of the dangers of impulse buying. That industry is the software industry. Admittedly, customers (the people who pay for the software) have a much better understanding of what software can and can't do than they did a few decades ago. However, they are generally oblivious to what goes on under the hood which is a great shame since with greater understanding they could make better decisions.

    The Challenge

    It is in everyone's best interests to educate and involve the customer in the software development process. (Conversely, developers generally need to understand more about the business side of things, but I will leave that for another blog.) The customer needs to:
    • understand how software is developed and the importance of maintainability of the code
    • concentrate on why features are needed and not insist on a specific implementation
    • realize that short term thinking can incur a long term technical debt
    • become involved in the day-to-day development process
    • provide continuous feedback to ensure the project is moving in the right direction
    The challenge is to achieve the above ideal. If it is done at all, it is usually left to the developer(s), such as an analyst, system architect or technical manager, to educate the customer, but this can be a slow process. Getting the customer involved in, or even aware of, what the developers are doing can also be difficult. I make some suggestions later.

    YAA (Yet Another Analogy)

    In this sort of discussion, inevitably, (as in the under the hood reference above) I come back to the analogy with buying a car. I hope to show that when people buy a car they make reasonably informed decisions; when people (or companies) buy software, or software enhancements, they often make poor decisions.

    Almost everyone has a basic understanding of how a car works internally. When purchasing a car people will do some research or ask a knowledgeable friend for help. When I talk to my friends and colleagues, most of whom claim to have little knowledge of cars, they still have a very good understanding of the strengths of different brands, models and even the year of make. In fact everyone I have asked agrees on the most reliable brand.

    Despite the common joke, nobody really buys a car for the cup-holders.

    When buying software on the other hand there is far too much emphasis on the cup-holders, like GUI bells and whistles. Even most developers are oblivious to many things, such as trade-offs between different software quality attributes (see below). So it is no surprise that the "professionals" that customers consult (managers, analysts, salespeople, etc) have no idea about what is really in the customer's best interest.

    The Homer   

    This also reminds me of an episode of the Simpsons. The episode had Homer as the designer of a new car model. (His long-lost half-brother owned a car company and wanted to give the average person, like Homer, the car that they really wanted.)


    I can't remember much about the actual car except that it had two separate "bubbles" so that you could not hear the kids in the back making a racket.

    As you can imagine, "The Homer" was a disaster, as giving a person what they want (or what they think they want) is entirely impractical.


    This reminds me of another old joke:

    Q. What's the difference between a car salesman and a computer (software) salesman?
    A. A car salesman knows when he is lying.

    Quality Attributes

    Way back last February I discussed the two different types of software quality attributes. (See The Importance of Developer Software Quality Attributes.) The main point of the post was that focusing on the customer quality attributes can be detrimental to the customer's interests, especially maintainability and especially in the long term.

    Customers, and non-technical stakeholders, need to understand the importance of developer quality attributes. Sometimes the developers have more important activities than adding more cup-holders!

    Feedback

    In the software industry the customer is not always right. In fact they are almost never 100% right and most of the time way off track.

    Of course, I am not saying that developers should arrogantly assume they know better than their customers. I am simply saying that most customers often only have a vague idea of what they want. And what they say they want is not what they (or their organization) actually needs. Further, what they say they want often evolves or even changes drastically. Different representatives of the customer may have very different views also.

    This is where regular feedback is so useful. It helps the customers and developers quickly home in on exactly what is required. Unfortunately, developers generally get little and irregular feedback.

    Avoiding Disasters

    Just after I wrote the above I was watching one of my favorite TV shows - Air Crash Investigation. (This is the name in Australia. In other countries it may be called May Day or something else.) It is basically a documentary series about how planes crash. I like this show as it has a lot of information about how things go wrong which I find can often be applied to the design of software. Some of the accidents (actually surprisingly few) even have software problems as a contributing factor.

    In the episode I just saw, the aircraft was fine (ie all flight controls were working perfectly) except that the instruments were feeding them incorrect altitude information. Since it was at night and they were flying over the ocean they also had no visual feedback on their vertical position. To cut a long story short the plane flew into the sea and everyone died. There have also been other crashes and incidents where the aircraft flight controls were functioning perfectly but a problem was caused by instrument(s) feeding incorrect information to the pilots or the computers.

    This really drove home to me how important feedback is. The moral is if you don't want your software project to crash and burn you had better obtain accurate, frequent feedback from your customer.

    Customer Focus

    One of the principal tenets of software quality assurance (and QA generally) is a focus on the customer. It is correctly observed that many organizations become focused on what they are doing, not why (or for whom) they are doing it. Customer focus is a good thing!

    For a developer, obtaining customer focus is a problem, as most developers are many layers removed from the actual users of their software. Moreover, many developers take customer focus to mean intently responding to every request of the organization paying for the software (or usually one or two individuals of that organization), no matter how impractical or pointless they are.

    Instead you need to work with the customer to first identify what they need, and find the best way to give it to them. In the process you may need to persuade them to alter their perspective on what is important and consider the long term effects of what they are asking.

    Many developers do their customer (ie, the actual organization) a complete disservice by pandering to every whim of its representative (ie, the employee of the organization). This is not what is meant by customer focus!

    Employee Focus

    There is an old saying (from the quality assurance world) designed to emphasize the relative importance of the customer (the client(s) who pay the bills), the employees (ie staff) and the owners (ie shareholders). It is that CEO stands for:

        Customer
        Employee
        Owner

    My own rule of thumb is to use the 80:16:4 ratio. That is, an organization should spend 80% of its effort on giving the customer what they want, 16% on employee assistance and development such as training, and 4% on investor relations.

    However, an idea that has recently gained some favor is the idea of employee focus rather than customer focus. This relies on the assumption that if you treat your employees well then they will in turn treat the customers well. In my experience, this does not always hold, but I guess that depends on the employees.

    A Sad Tale

    Speaking of keeping employees happy reminds me of an experience that I had in my 3rd year of work as a programmer. Unfortunately, I have seen similar things many times since.

    I was working for a big company (at the time the largest company in Australia). Soon after I started, our section began a new project that we were told was of strategic importance to the company and critical to deliver on time to coincide with a new marketing campaign. The project was driven by one person from the "business" side of the company, upon whom we depended for our direction and to whom we reported progress. However, we also interacted with his "big boss" and other representatives from the "business" side of the company from time to time.

    Although the project was fairly well specified compared to earlier projects there were inevitable problems and delays. There were also extra undocumented requirements added that were deemed critical to the success of the project by the "big boss". Our own project manager also added his own ideas on how and what should be done.

    Of course, a month before the deadline we were behind schedule, but due to long hours and dedication we felt that there was a good chance of finishing by the immovable deadline. Then the project was simply canceled by the "bigger boss". No proper explanation was given. One concern was that perhaps one of the "bosses" did not believe we would deliver on time. Eventually the explanation given was that there had been a "strategic change in direction".

    Obviously, this upset a lot of people who had worked very hard on the project. The "business analyst", who had put her heart and soul into the project, quit on the spot and several team members left soon afterward. Most of these people actually worked for a software contracting firm hired by the big company, so it was this company (rather than the "big company") that lost out due to loss of experienced staff and the effect on morale. (Though to be fair the "big company" was a major shareholder in the contracting company.)

    Customer Management

    There are three important aspects to what I call customer management.

    First, you need to educate the customer so they can make informed decisions. This can be a slow and painful process but the benefits are enormous. I provide a list of some of the things the customer needs to learn in the What Every Customer Should Know section below.

    Also, sometimes you need to say "no" to a customer request if it is a bad idea. (I talk about how to spot these bad ideas below in the section on The Customer's Customer.) Finding out the reason for the request (or the reason for the reason for the reason ...) can lead to a deeper understanding of what the customer needs rather than what they think they want. If the customer understands something of the development process (see above) they will be more likely to understand your point of view.

    Finally, you need to involve the customer in the development process for several reasons:
    1. so they learn more about how the software is developed (which assists their education);
    2. so they appreciate the effort involved; 
    3. (possibly most importantly) so they can provide feedback that drives the project in the right direction;
    4. last (but probably not least importantly) for developer motivation.
    I also discuss some ideas for getting the customer involved below.

    What Every Customer Should Know
    • between 50% and 90% of software work is maintenance, so to save time and money effort needs to be spent on making software maintainable
    • unit tests can add to the initial cost of developing software but the long term benefit is enormous
    • bugs can be a good sign! - they may indicate that the developers are refactoring the code to improve maintainability
    • but having good unit tests means that the code can be refactored with almost no risk of bugs
    • sometimes things just take time
    • immovable deadlines can result in bad (quick and dirty) changes being made to the code
    • continual bad code changes (without a chance to refactor) can destroy the software
    • having good unit tests means that changes can usually be made more quickly with minimal chance of bugs
    • if a developer is reluctant to change what is working then it indicates that the code is in a poor state (or the developer does not know what they are doing)
    • a team without customer feedback is like a ship without a rudder
    • provide feedback about your goals, not preconceived ideas about how they should be achieved
    The Customer's Customer

    Sometimes a customer will insist on something that you are not sure about. To get a different perspective it is often useful to consider the customer's customer.

    In the excellent book 97 Things Every Software Architect Should Know, tip number 82 is called Your Customer Is Not Your Customer. The example given is developing software for a web site and the client says not to worry about the security (such as not using SSL). In this case the correct thing is to say no to the customer.

    When your customer asks for something that does not seem right, you need to consider the customer's customer(s). This allows you to clearly explain what you are doing or why you must say no to a request. Again this goes back to giving the customer what they need, not what they ask for.

    Getting The Customer Involved

    An important part of agile methodologies is to have the customer regularly use the software. For example, in Scrum there is a Sprint Review at the end of every Sprint. This is the best way to get the customer involved, by having working software which shows what has been recently accomplished.

    However, it can be difficult to get an uninterested client even to come to sprint reviews. Sometimes an incentive, like free chocolates, or just making it a bit more fun can help. Of course, a few reminder emails will not go astray.

    It is also worthwhile keeping all stakeholders in the loop, with regular emails or even a newsletter.

    An important thing is not to isolate developers from the client and users of the software. Encouraging this interaction not only gets the customer more involved in the process, but they also learn to appreciate how the software is developed and the difficulties that the developers have to deal with. Moreover, most developers are highly motivated by interacting with the actual users of their work!

    Note that one potential problem with a lot of interaction between the client/users and the developers is the possibility of unauthorized changes being requested. It should be made clear to all, that all change requests are to go through the proper channels (eg, the Product Owner in Scrum).

    Scenarios

    Just to emphasize the points I invite you to consider these scenarios that are all based on real-life experiences.

    1. Your company is considering buying a smaller software company for a particular product that it has developed. Your boss insists that you make sure it is a good buy, so there are extensive discussions with the two developers. You also have several people "road test" it to expose bugs and problems of usability, reliability, efficiency, etc.

    At no time does anyone run metrics on the source code or inspect it with regard to maintainability, etc. This is like buying a car without opening the bonnet and checking the log books. Actually I consider it to be even worse than that since maintainability of software is usually its most crucial attribute.

    2. The customer has an urgent requirement. There is no time to understand how to implement the change correctly. A quick and dirty fix is implemented. No time is allocated for repairing the band-aid solution later.

    The result may be a happy customer in the short term. However, there will be bad to horrendous consequences in the long term. If software is continually modified like this, and not refactored, it eventually becomes prohibitively expensive to maintain. I have seen code that has degenerated to such an extent that seemingly simple changes can require ten (or more) times the effort than they should.

    The customer needs to be made aware that this sort of development style is not in their own long term best interests. This is what is meant by technical debt - saving time and money in the short term means you will be paying interest in the long term, in the form of much higher maintenance costs.

    3. The customer asks for a significant change and is very pleased when the changes are delivered ahead of schedule. They next ask for a more important but apparently much easier change and are very disappointed when it takes many times longer than expected or cannot even be delivered.

    The customer has no idea how the software works internally. They need to realize that apparently simple things sometimes need major refactoring to accomplish.

    4. Your software stores a large database of customer details in a web site. You know there are potential security holes which could allow someone to access customer information or even modify it.

    It is in the best interests of the user of the web site to have good security and to encrypt customer details. Of course, you would never even consider storing credit card details unencrypted, would you?

    Conclusion

    The idea of customer management sounds a little condescending, but the point of this post is about finding how to give the best value possible to the customer, in the long term, despite what the customer does, or says they want.

    Customers need to be educated to understand that there is more to software than the GUI. Insisting on strict deadlines and intolerance of bugs can have dire consequences for the maintainability of the software.

    Customers also need to realize that what they initially believe they want may be nothing like what they need. Often the only way to obtain a good result is to use a process of iterative refinement based on continuous feedback as described in various agile methodologies.

    Continuous feedback means that the customer needs to become more involved in the development process. It is essential they know what is happening under the hood in order to make the best decisions.