Saturday, 7 December 2013

Unit Tests - Personal Experiences

Blogs are supposed to be about personal experiences, so I'm going to talk about my own personal experiences with Unit Tests, or as I first called them Automated Module Regression Tests. (This continues my coverage of Unit Tests which started last month and will continue next week when I talk about the problems of Unit Tests and how to overcome them.)

Discovery

Like a lot of people I discovered Unit Tests a long time ago but did not realise there full importance for a long time.

In 1985 I was working on several MSDOS projects for a small software company called Encom. We were using Microsoft C version 2.0 which was a re-badged version of Lattice C. In 1985 or 1986 Microsoft released version 3.0 of their C compiler (which I'll abbreviate as MSC3) which was, in fact, their Xenix (MS variant of UNIX) C compiler ported to MSDOS.

I had the job of rebuilding all our software with the new MSC3. This taught me a few subtleties of portability (especially about writing structs to binary files). For example, when Lattice C creates the memory layout for a struct it builds its bit-fields top-down (ie, using the top bits of the bit-field storage unit first), whereas MSC3 (and most C compilers) build bit-fields from the LSB up. This caused a bit (no pun intended) of havoc with our binary files.

Luckily, apart from the bit-fields problem and a few places where it was assumed that char was unsigned, our code was very portable, but one problem was that I'd used some Lattice string routines. The MSC3 run-time library instead had most of the "de facto" string routines found in UNIX compilers and omitted to implement the Lattice ones. (Microsoft provided a header file called v2tov3.h to assist with portability but it was pretty much useless.)

Lattice String Functions       

Note that I found the Lattice string routines very useful and better thought out and named than the MSC3 (ie, UNIX) string routines. For example, routine names had a prefix indicating what was returned (which connoted not only the type but the purpose of the value). For example, a function returning a pointer to a string had the prefix "stp". 


This is strangely reminiscent of Hungarian notation (ie, Application Hungarian not System Hungarian) that Microsoft revealed soon afterwards.

A few of the Lattice string routines had no equivalent in MSC3, so I had to rewrite these routines for the new C compiler. I gave each of the Lattice functions its own source file and, as I believe was common practice even then, I added a test rig at the end of the source file, using #ifdef  TEST, something like this:

/* stpblk - skip leading blanks
 * port of Lattice func to MSC3
 * ... */
#include <ctype.h>
#include <assert.h>

char *stpblk(char *str)
{
   assert(str != NULL);
   while (*str != '\0' && isspace(*str))
      ++str;
   return str;
}

#ifdef TEST /* use -DTEST cmd line option to build test rig */
int main()
{
   char buf[256];
   do
   {
      gets(buf);
      printf("stpblk on <%s> returned <%s>\n", buf, stpblk(buf));
   } while (buf[0] != '\0');

   return 0;
}
#endif /* TEST */

This test rig allowed me to enter a test string and see the result of the call to stpblk(). That way I could do a lot of exploratory testing to check that the function was working correctly.

It occurred to me that it would be more thorough to create a complete set of test cases and code the tests directly, rather than the more haphazard approach of manually trying all sorts of different values. That is, something like this:

/* ... as above */
#ifdef TEST
int main()
{
   assert(*stpblk("") == '\0'); /* Test empty string */

   assert(*stpblk(" ") == '\0'); /* just whitespace */
   assert(*stpblk(" ") == '\0');
   assert(*stpblk("\t") == '\0');
   assert(*stpblk("\n") == '\0');

   assert(strcmp(stpblk("a"), "a") == 0);
   assert(strcmp(stpblk(" a"), "a") == 0);
   assert(strcmp(stpblk("\ta"), "a") == 0);
   assert(strcmp(stpblk("a "), "a ") == 0);

   assert(strcmp(stpblk("abc"), "abc") == 0);
   assert(strcmp(stpblk(" abc"), "abc") == 0);
   assert(strcmp(stpblk("abc "), "abc ") == 0);
   assert(strcmp(stpblk("a b c"), "a b c") == 0);
   assert(strcmp(stpblk(" a b c"), "a b c") == 0);

   assert(strcmp(stpblk(" \xFF"), "\xFF") == 0);

   stpblk(NULL); /* should cause assertion */

   return 0;
}
#endif /* TEST */

I could then run the tests on the original Lattice compiler functions and my supposedly equivalent versions of the same functions to make sure they produced the same result. Further, if the function later needed to change (eg, to make it faster) I could simply re-run the tests to make sure it still worked. However, there were a few difficulties:
  • sometimes creating the tests took ten times longer than writing the code!
  • none of the tests actually found any bugs
  • there was no way to write tests to check that assertions worked without manual intervention (see last test above)

First Appraisal

Of course, my next thought was maybe we should do this for other functions/modules in our software. However, after some thought I rejected it because:
  • it was "impossible" to write tests for most of the code as it was mainly UI
  • a lot of code interacted with hardware which would require simulation
  • the cost/benefit ratio would not make it worthwhile
  • but mainly, the tests did not find any bugs

Another Attempt

I put the idea aside for awhile. I started working for a larger company (AMP) in 1986, and most of my work was user-interface stuff, and I assumed that automated tests like these would be impossible to create for that sort of code.

However, after a year or so I was given charge of a team with the responsibility of creating a library of modules designed to model many of AMP's policies (initially some new superannuation products but eventually we were to model many others such as insurance products, etc).

To me this seemed like an ideal application of my earlier discovery because:

  • each module was concerned purely with calculations - there was no messy interaction with the user, hardware, 3rd party libraries, etc
  • the calculations were complex and it was important to check that they were correct for all boundary conditions
  • the modules were likely to change as the company actuaries sometimes changed the rules
  • sometimes the initial code was far too slow - we would need to check that nothing was broken after optimization

It was around this time that I invented the term Automated Module Regression Tests (AMRT). I tried to promote the idea to management. Unfortunately, they were not keen on the idea as I admitted that it would at least double development time. Another problem was that the company had only recently embarked on a automated testing project which had been a major failure and anything that suggested "automated testing" was looked on with scorn. Finally, the technical manager of the section (PD) was very keen on some nonsense called "object oriented programming" and was not to be distracted from OOP by other silly ideas.

SQA

I left AMP (and C coding) soon afterwards. My next job was mainly working with assembler and performing sys admin tasks (UNIX), though I still did a bit of system programming in C.

None of these areas gave me any opportunity to try out AMRT, but, in 1993, I did a postgraduate course in SQA (software quality assurance) and I came to appreciate many of the subtleties of testing. (Only a small component of the course dealt with testing since quality assurance is about a lot more than testing.) Some important points on testing that I learnt:
  1. It is important to test code as soon as possible (see JIT testing) to avoid all sorts of problems (see Cost of Delay in a previous post on Change).
  2. Studies have shown that typical (Black Box) testing finds only 25% of all bugs.
  3. Studies show even the most thorough user testing finds less than 50% of bugs.
  4. It is impractical to test all combinations of inputs even for simple modules.
  5. Programmers should test their code thoroughly as only they know how it works.
  6. Code coverage tools should be used to check that all (or most?) of the code is executed during testing.
  7. Ongoing changes cause most bugs even in projects with a low initial bug count.

However, no mention was ever made about Unit Tests. To me it seemed that AMRT was an ideal solution to many of these problems. First, the tests can be written at the same time as the code - an example of JIT. Moreover, because it is done by the original coder it can be directed at areas where there might be problems (or where future modifications may cause bugs). But the major advantage I saw is that it allowed the code to be modified with greatly reduced chances of changing existing behavior, ie creating new bugs.

In 1994 I got to work as the Team Leader on a new project. This gave me an ideal opportunity to try several new things that I had been wanting to try:

* C++ development
* Barry Boehm's Spiral Development Methodology
* AMRT (ie, Unit Tests)

The project was a major failure but I did learn a few things about AMRT. First, they made making changes easy. Second they could work as a form of documentation.

Agile

In 1995 I rejoined Encom to get back to programming (in C and later C++). Again I tried to convince others of the benefits of AMRT. Then in the late 90's a colleague (SR) introduced me to XP (Extreme Programming). I actually wasn't that impressed with XP when I first read about it as it mainly seemed to be a re-hash of many ideas I had already seen proposed in the SQA course. However, after a while I did realize that at the core of XP there were some new ideas which seemed to address a lot of problems that I had even recently encountered in software development.

One of the practices in XP was Unit Tests. I did not realize for a while that what was meant by Unit Tests was exactly what I called AMRT. And I don't believe that XP emphasizes Unit Tests enough.

Conclusion

I took a very circuitous route to come to a full appreciation of Unit Tests. Initially I did not appreciate their worth. When I tried them they did not find any bugs. When combined with the obvious costs of creating them they did not seem worthwhile.

Over many years I slowly came to realize that the benefit of Unit Tests is not that they find bugs, but it is that you can run them at any time to ensure that the code behaves correctly. This allows you to refactor code without fear. It also allows the software to easily evolve which is fundamental to the Agile approach. There are other benefits which I mentioned here last month. Further the costs can be overestimate and mitigated by tools and techniques which I will talk about next month.

What is very surprising to me is that Units Tests are not given more prominence in Agile development. Having read many books on Scrum I have not seen them mentioned once! Even in XP they are not explained well and not emphasized enough.
   Unit Tests...
allow software
to evolve

A fundamental aspect of the Agile approach is that the code keeps changing in response to feedback from the users of the software. Unit Tests are absolutely crucial to this approach since they enable the changes to be made that allow the software to evolve.

No comments:

Post a Comment