Tuesday, 19 June 2012

Inversion of Control


I first heard the term inversion of control (IOC) last year.  (Apparently it is mentioned in the GOF book which I read over ten years ago, but most of that book went over my head. :)  I thought I'd better do a bit of reading to find out what all the fuss is about.  After reading half a dozen definitions I have concluded that it is rather vaguely defined, or perhaps just not well understood by most people that talk about it.

My understanding is that essentially its purpose is decoupling (a very worthwhile purpose) using the mechanism of callbacks.  Now you may think of callbacks as simply function pointers, but with object-oriented languages the mechanisms for "calling back" are less obvious (though under the hood it all boils down to function pointers) - I explain this in Object-Oriented Callbacks below.  Though the term only seems to be used in "object-oriented" literature, it can be applied in any language that supports function pointers.

What confuses most people (if they don't just ignore it) is understanding what is actually being inverted.  Essentially, using a callback mechanism allows a program or module to give control to a subservient module with the understanding it can get back control at relevant places (using callbacks).  So the "inversion" is simply that the calling module gives up control of a process to the called module.  See the code example in C, and diagrams, below if this is unclear.

Another term that is often seen in a related context is dependency injection.  This is related to inversion of control in that it separates out how the callbacks are found (or "injected") from the module that uses them.  It is another example of separation of concerns to facilitate decoupling; and, in a way, is itself an example of inversion of control.
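
As a rough sketch of the idea (the class and function names here are made up purely for illustration), a class might be handed its callback object through its constructor rather than creating it itself; the caller decides which implementation is "injected":

#include <iostream>
#include <string>

class ILogger                                    // the dependency, as an interface
{
public:
    virtual void log(const std::string & msg) = 0;
    virtual ~ILogger() {}
};

class ConsoleLogger : public ILogger             // one possible implementation
{
public:
    void log(const std::string & msg) { std::cout << msg << '\n'; }
};

class ReportGenerator
{
    ILogger & logger_;                           // supplied from outside, not created here
public:
    explicit ReportGenerator(ILogger & logger) : logger_(logger) {}
    void run() { logger_.log("report finished"); }
};

int main()
{
    ConsoleLogger logger;
    ReportGenerator gen(logger);                 // the dependency is injected here
    gen.run();
}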

IOC Examples

The term inversion of control appears to have been first used in Designing Reusable Classes by Johnson and Foote, in the context of frameworks.  Traditionally (before frameworks) programs were created as a main program which called into pre-written library routines to accomplish tasks such as user I/O.  As user interfaces became more advanced it was recognized that the control in many programs followed a pattern, so instead an approach was taken where a pre-written framework was invoked when the application started up which called back into the custom user interface code.  In a sense the main control was "inverted" by being passed from the application designer to the framework.  This had the advantage that much of the code of an application did not have to be rewritten each time, but it meant that the application had to conform to the framework's way of working.

Actually, even before the invention of frameworks, control had been inverted by modern GUIs.  For example, in the Windows operating system the main control of most programs is within the operating system in the "message loop" - as the user interacts with the program, the operating system sends messages back to the custom application code.  This mechanism afforded users much more freedom when compared to earlier software, so some people also think of inversion of control as giving control to the user rather than having the software strictly control the process.
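
As a rough, Windows-specific sketch of that message loop idea (window registration and creation are omitted and error handling is skipped, so this is only an outline):

#include <windows.h>

// The callback that the operating system invokes for each message sent to our window.
LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg)
    {
    case WM_DESTROY:
        PostQuitMessage(0);                      // ask the OS to end the message loop
        return 0;
    default:
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }
}

// The main control of the program sits here: we hand control to the operating
// system, which calls back into WndProc above whenever the user does something.
void run_message_loop()
{
    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0) > 0)
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);                   // ends up calling WndProc
    }
}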

Inversion of control has actually been in use in software development for a very long time (though not given that name).  It is applied whenever a common complex process is recognized and incorporated into a reusable module.  The main process cedes control to this module but still retains relevant control of the behavior via callback(s).  The main advantage is that similar or identical code is not duplicated in every piece of software that needs to perform the process.  It also assists decoupling which is a primary goal of well-written software.

C Code Example

Here is a C code example which demonstrates inversion of control used in sorting.

It is a common requirement to sort a list of strings with options that control the ordering such as top-to-bottom, bottom-to-top, case-insensitive, etc.  The sorting code is left out below, for brevity, but the important thing is the calls to one of several functions that determines the relative order of the strings.

enum sort_type { ALPHA_TOP_TO_BOTTOM, ALPHA_BOTTOM_TO_TOP, CASE_INSENSITIVE_TOP_TO_BOTTOM, ... };


void sort(const char *str[], size_t num, enum sort_type how)
{
    ...


    bool is_before;
    switch (how)
    {
    case ALPHA_TOP_TO_BOTTOM:
        is_before = compare(str[index1], str[index2]);
        break;
    case ALPHA_BOTTOM_TO_TOP:
        is_before = compare(str[index2], str[index1]);
        break;
    case CASE_INSENSITIVE_TOP_TO_BOTTOM:
        is_before = case_insensitive_compare(str[index1], str[index2]);
        break;
    case ...
        ....
    }
    ...
}

This approach to sorting means that you have to write all the sorting code yourself, which gives a hierarchical structure where the main code calls the sub-routine sort() which in turn repeatedly calls one of the comparison routines.

Diagram 1. Sorting using Normal (Hierarchical) Control

Obviously, sorting is a very common thing to do, so it would be useful to encapsulate the sorting in a reusable module.  This is exactly what the standard C library function qsort() is intended for.  But to control the ordering of the sorted objects qsort() needs to make a call back into the calling code -- this is accomplished by a function pointer supplied as the fourth parameter to qsort().  A function pointer used like this is often called a callback.

(Aside: qsort() implements the "Quick Sort" algorithm, and using it is where beginners usually first encounter the concept of function pointers.)

    ...
    const char *str[] = {" Abc", "def", "123", "aaa", "abc", };   // test data


    int (*pcomp)(const void *, const void *);


    switch (how)
    {
    case ALPHA_TOP_TO_BOTTOM:
        pcomp = &compare;
        break;
    case ALPHA_BOTTOM_TO_TOP:
        pcomp = &reverse_compare;
        break;
    case CASE_INSENSITIVE_TOP_TO_BOTTOM:
        pcomp = &case_insensitive_compare;
        break;
    case ...
        ....
    }
    qsort(str, sizeof(str)/sizeof(*str), sizeof(*str), pcomp);
    ...


}


// Call back functions
int compare(const void *p1, const void *p2)
{
    return strcmp(*(const char **)p1, *( const char **)p2);
}
int reverse_compare(const void *p1, const void *p2)
{
    return strcmp(*(const char **)p2, *( const char **)p1);
}
int case_insensitive_compare(const void *p1, const void *p2)
{
    return _stricmp(*(const char **)p1, *( const char **)p2);
}
...

Diagram 2. Inversion of Control using a Callback

Function Pointers as Callbacks

The idea of callbacks is almost as old as stored-program computers.  After all, any subroutine has a starting address which can be stored in memory.  As long as the calling convention and parameters of different subroutines are the same they can be substituted for each other, giving an effect now known as polymorphism.
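
For example (the function names are invented just for illustration), any two routines with the same signature can be called through the same pointer:

#include <stdio.h>

double add(double a, double b)      { return a + b; }
double multiply(double a, double b) { return a * b; }

int main()
{
    // Both functions have the same signature, so either address can be stored
    // in the same pointer and called through it - a primitive form of polymorphism.
    double (*op)(double, double) = &add;
    printf("%g\n", op(2.0, 3.0));                // prints 5
    op = &multiply;                              // substitute a different routine
    printf("%g\n", op(2.0, 3.0));                // prints 6
    return 0;
}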

Numerous uses for addresses of functions (commonly called function pointers) were found in assembly languages and they soon found their way into high-level languages like PL/1, C and Pascal.  We saw above how function pointers can be used for callbacks from the standard C library function qsort().

Object-Oriented "Callbacks"

With the advent of object-oriented languages, inversion of control became easier and safer.  However, it is a bit more convoluted so I will take some time to explain it.  As usual we have the calling code that relinquishes control and the receiving module or class that controls the process.  However, instead of passing one or more function pointers (callbacks) for the controlling module to call, a pointer (or reference) to an object is supplied.

This "callback" object either inherits from an abstract base class or implements an interface.  Note that in C++ we use an abstract base class, since the language does not support interfaces like newer languages (such as Java and C#).  However, they are essentially the same thing so for simplicity I will just use the term interface.  Of course, the idea of using an interface (ie, a set of functions) for polymorphism is a very old one but has only recently been formalized in object-oriented languages.

An object that implements an interface basically just provides an array of function pointers.  In C++ this is called the "vtable", because each of the function pointers points to a "virtual" function which can vary depending on which particular class implements the interface.  Each function of the interface has a name (at least at compile time) and a signature which says precisely what parameters it is passed, what it returns, etc.  Polymorphism is achieved by the fact that different classes can implement the same interface in different ways.

(Note: a vtable is not to be confused with a delegate, as in C#.  A delegate is an array of function pointers where each function has exactly the same signature (parameters etc), whereas a vtable (or interface) is an array of function pointers where each function has a different purpose, so the signatures are not necessarily the same.)
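
As a rough illustration of the idea (this is only a sketch of the concept, not how any particular compiler actually lays out its vtables, and the names are made up), an interface can be emulated in C-style code as a struct of function pointers:

#include <stdio.h>
#include <string.h>

// The "interface": conceptually just a table of function pointers, where each
// entry has its own purpose (and so its own signature).
struct comparer_vtable
{
    int         (*compare)(const char *, const char *);
    const char *(*name)(void);
};

// One particular "class" implementing the interface.
static int alpha_compare(const char *a, const char *b) { return strcmp(a, b); }
static const char *alpha_name(void) { return "alphabetic"; }
static const struct comparer_vtable alpha = { alpha_compare, alpha_name };

int main(void)
{
    const struct comparer_vtable *p = &alpha;   // like a pointer to an interface
    printf("%s: %d\n", p->name(), p->compare("abc", "def"));
    return 0;
}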

Interface Example

Now I will show the same sorting code that we saw above using an "interface" instead of a function pointer.  Note that this code is in C++ which does not support interfaces so we use an abstract base class instead.  If written in Java or C# we would define a comparer interface, then each comparer class would implement (inherit from) the comparer interface instead of inheriting from an abstract base class.  The sort function is then simply passed a reference (pointer) to the interface.

Of course, for this example the interface only needs one function (which takes two strings and returns an int).  The advantage of an interface or abstract base class over a simple function pointer is that it can define more than one function.  Also since an interface is attached to an object, the object itself may have data (private members) that are used in calls to the interface.

class IComparer
{
public:
    virtual int compare(const char *s1, const char *s2) const = 0;
};


void sort(const IComparer & comp, const char * str[], size_t num_strings);


// Implements normal alphabetic compare
class alpha_compare : public IComparer
{
public:
    int compare(const char *s1, const char *s2) const {return strcmp(s1, s2);}
};


// Implements case-insensitive compare
class case_insensitive_alpha_compare : public IComparer
{
public:
    int compare(const char *s1, const char *s2) const {return _stricmp(s1, s2);}
};


...
    const char *str[] = {" Abc", "def", "123", "aaa", "abc", };   // test data


    ...
    alpha_compare ac;                       // sort using normal alpha compare
    sort(ac, str, sizeof(str)/sizeof (*str));
    ...

Diagram 3. Inversion of Control using an Interface

Now the module or class that gets control accepts an interface (technically a reference or pointer to an object of a class that derives from the interface) and the caller that relinquishes control implements the class derived from the interface.  This class implements the callback function (in the above code, the compare() member of IComparer).
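
To illustrate the earlier point that the callback object can also carry its own data, here is a sketch (not part of the original example; the class name is invented) of a comparer whose behaviour depends on a private member, using the IComparer base class declared above:

#include <cstring>

// A comparer that can sort in either direction, controlled by its own data.
class configurable_compare : public IComparer
{
    bool reverse_;                               // state carried by the callback object
public:
    explicit configurable_compare(bool reverse) : reverse_(reverse) {}
    int compare(const char *s1, const char *s2) const
    {
        int result = strcmp(s1, s2);
        return reverse_ ? -result : result;
    }
};

// Usage: sort bottom-to-top without writing another comparer class.
//     configurable_compare cc(true);
//     sort(cc, str, sizeof(str)/sizeof(*str));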

Misuse of Inversion of Control

Like many new ideas, inversion of control tends to be used when it is not really necessary and even when it is detrimental to a design.  It is important to understand different options when designing software then choose the most appropriate.  Usually, the most appropriate is the simplest (YAGNI).

(I will have more to say about this in a future blog on the Gas Factory anti-pattern.  Following fads and experimenting with novel designs is a major contributor to this problem.)

If you know anything about XML you may know that there are two main programming "APIs" that were invented for processing XML files.  The first is DOM that allows you to very easily read and modify an entire XML file.  The problem with DOM is that it usually requires storing the entire contents of the XML file in memory; for a large XML file this may be impractical.  So SAX was invented to allow sequential processing of an XML file from start to finish basically so that XML data of indefinite length could be processed.

With SAX the main program (using the inversion of control principle) hands over control of the XML processing to the SAX module which uses callback functions to signal when it encounters something that may be of use to the caller.
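
Roughly, the caller supplies a handler object and the parser calls back into it.  The sketch below uses a made-up handler interface in C++ (not the real SAX API) just to show the shape of the control flow, and why the callbacks end up keeping state between calls:

#include <iostream>
#include <string>

// A made-up, SAX-like handler interface: the parser owns the loop over the
// file and calls these methods as it encounters each piece of the XML.
class IXmlHandler
{
public:
    virtual void start_element(const std::string & name) = 0;
    virtual void end_element(const std::string & name) = 0;
    virtual void text(const std::string & data) = 0;
    virtual ~IXmlHandler() {}
};

// A handler that prints the text of every <title> element.  Note that it has
// to remember where the parser is up to ("state") between callbacks.
class TitleCollector : public IXmlHandler
{
    bool in_title_;
public:
    TitleCollector() : in_title_(false) {}
    void start_element(const std::string & name) { if (name == "title") in_title_ = true; }
    void end_element(const std::string & name)   { if (name == "title") in_title_ = false; }
    void text(const std::string & data)          { if (in_title_) std::cout << data << '\n'; }
};

// Hypothetical parser entry point: control passes to the parser, which reads
// the whole file and calls back into the handler as it goes, e.g.
//     void parse_xml(std::istream & input, IXmlHandler & handler);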

There are many problems with SAX, such as the fact it is very time-consuming, inflexible and tedious to use.  Despite being designed for efficiency (as opposed to DOM) it is often difficult or impossible to accomplish some tasks efficiently, both in terms of time and memory use.  When it is compared with certain alternatives (such as the .Net XmlReader class) it is glaringly obvious that it is a poor design.

The main problem with SAX is that, in this case, it is a poor idea to invert control.  Passing control to the SAX parser results in all sorts of problems for the callbacks.  For example, the callbacks usually have to store some (often a lot of) information between calls.  Even just ensuring that elements are parsed in the correct order will require "state" information to be kept.  Further, the whole file must be parsed even if most of the data is irrelevant - because control has been given to the SAX parser there is no way to control which parts of the file are used.

Conclusion

Inversion of control is simply the technique of passing control of a process to a subservient module.  Then, when necessary, control is handed back to the caller using callbacks.

This technique, under different names, has been used for probably close to half a century.  It is a very powerful technique for separating a process into a separate module, while still allowing necessary control.  However, as we saw with SAX it is not always appropriate.

Finally, I was also going to discuss the related topic of dependency injection, but this blog post is long enough already, so I will have to cover it in a later post.

Friday, 1 June 2012

Self Describing Code

I'm currently reading a book called Clean Code by Robert (Uncle Bob) Martin, and I intend to post a review when I have finished it.  Generally, there are a lot of good ideas in it, but one that I find awful is the idea of what is commonly called self-describing code.  Though Uncle Bob never mentions that specific term (perhaps due to its previous bad press), he subscribes to all the main tenets, as exemplified by these quotes:

"[a name] should tell you why it exists, what it does, and how it is used" - page 18

"Don't be afraid to make a name long." - page 39
"comments compensate for our failure to express ourself in code" - page 84
"If a name requires a comment, then the name does not reveal its intent." - page 18
"A long descriptive name is better than a long descriptive comment." - page 39

Self Describing Code


I recall the idea of self-describing code being bandied about in the early 1980's.  I thought it had been discredited and forgotten.  It seems to have had a resurgence.


I believe it all began when many coding standards insisted that all code be commented.  This had the dreadful consequence that a great many useless comments were written.  Moreover, as the code was modified the comments became out of date.  The result was that anyone trying to understand the code did not read the comments because they were, at best, of no value whatsoever and, at worst, grossly misleading.


Many people then had the idea that maybe we don't even need comments.  With a simple and elegant design and meaningful variable and function names the code can describe itself.  Further, it can't become out of date the way comments can.  Then they would give a beautiful example showing how some simple code could be written, and be understandable, without any comments.


Counter-Argument


There are a few problems with the above argument.  First the names of variables and functions can become out of date just as easily as comments can -- and in my experience misleading variable names are far more common than misleading comments.  A bigger problem is that most real-world code is not as simple as the sort of examples usually given; it is not always possible to refactor the code to a state where it is self-describing, or do so in a reasonable amount of time.  (Of course, writing good comments can take some time too, but a lot less than rewriting the code.)


The worst consequence of the idea is that many programmers use it as an excuse for not adding comments, arguing that their code is self-describing.  Usually it is really that they cannot be bothered to explain the code in words.  I would go even further and say that writing good comments can make the coder think more closely about what the code is doing which, in turn, can lead to better code and fewer bugs.


Finally, self-describing code is usually (paradoxically) harder to read.  It usually involves very long identifiers which clutter the code and make it difficult to quickly scan (see my blog post Long Identifiers Make Code Unreadable).  Uncle Bob also recommends creating lots of temporary variables and tiny functions for the sole purpose that the extra identifiers can have long, meaningful names that explain the code.  This is what makes the code harder to scan - see the example below.


Example


In the section on choosing descriptive names (Clean Code page 309), Uncle Bob gives an example of bad code and how it should be fixed.  Let's look at his solution and how it could be better.


First we look at the original:

public int x()
{
    int q = 0;
    int z = 0;
    for (int kk = 0; kk < 10; ++kk)
    {
        if (l[z] == 10)
        {
            q += 10 + l[z+1] + l[z+2];
            z += 1;
        }
        else if (l[z] + l[z+1] == 10)
        {
             q += 10 + l[z+2];
             z += 2;
        }
        else
        {
            q += l[z] + l[z+1];
            z += 2;
        }
    }
    return q;
}


I agree the variable names are lousy, but if Uncle Bob was honest about it I think even he would admit that nobody really writes code like this.

Let's look at Uncle Bob's refactoring:

public int score() {
    int score = 0;
    int frame = 0;
    for (int frameNumber = 0; frameNumber < 10; frameNumber++) {
        if (isStrike(frame)) {
            score += 10 + nextTwoBallsForStrike(frame);
            frame += 1;
        } else if (isSpare(frame)) {
            score += 10 + nextBallForSpare(frame);
            frame += 2;
        } else {
            score += twoBallsInFrame(frame);
            frame += 2;
        }
    }
    return score;
}

This is much better, and makes the code fairly obvious, especially if you are familiar with the rules for scoring in Ten Pin Bowling.  There are still some obvious problems with the code - for example, the integers frame and frameNumber have confusingly similar names.  Also, are the magic numbers (with value 10) used in the loop condition and in the strike and spare bonuses related?

My biggest concern is with the very small functions used to replace less than a single line of code.  These were simply introduced in lieu of a comment to try to explain what the code does.  The trouble is the (fairly long) names don't really tell me what the function does.

Looking at the name nextTwoBallsForStrike I might be able to guess what it does, especially if I know the scoring rules, but personally I would still be inclined to check just to make sure.  All this takes time and prevents fluid reading of the code.

This is how I would have refactored the code:


static final int max_frames = 10;                    // Number of frames in a game


// Returns total score for a game of ten-pin bowling. On entry the number of fallen pins
// for each ball is provided in fallen[] and the number of balls bowled is in balls_bowled.
public int score()
{
    int points = 0;                                            // Cumulative count of points for the game
    int ball = 0;                                               // Current shot or "ball" (1 or 2 per frame)
    int frame;                                                  // Current frame of the game


    for (frame = 0; frame < max_frames; ++frame)
    {
        if (fallen[ball] == 10)
        {
            // Strike (all pins knocked over from 1 ball) gives bonus of next two balls
            points += 10 + fallen[ball + 1] + fallen[ball + 2];
            ++ball;                                                 // Move to next frame (no 2nd ball)
        }
        else if (fallen[ball] + fallen[ball + 1] == 10)
        {
            // Spare (all pins knocked over in 2 balls) gives bonus of following ball
            assert(fallen[ball + 1] > 0);
            points += 10 + fallen[ball + 2];
            ball += 2;                                             // Move to first ball of next frame
        }
        else
        {
            // Open frame just gives score of all fallen pins of both frame's balls
            assert(fallen[ball] + fallen[ball + 1] < 10);
            points += fallen[ball] + fallen[ball + 1];
            ball += 2;                                             // Move to first shot of next frame
        }
    }
    assert(frame == max_frames && ball == balls_bowled);
    assert(points <= max_frames*30);                // Maximum possible score
    return points;
}




How is this any better than Uncle Bob's version?  The main difference is the comments.  There are actually two types of comments: overview comments that appear above a section of code and explanatory comments that appear to the right of the code.

The overview comments allow you to get a quick idea of what is happening without trying to understand the code.  Scanning the above function it would take at most a few seconds to understand roughly what it is doing.  When you work with software of thousands or millions of lines of code then being able to track down what you are looking for quickly with these sorts of comments can save hours or even days of time.

Admittedly, Uncle Bob conveys the same information by adding new functions, with descriptive names, rather than adding comments.  However, I think the comments add a lot more useful information, especially for someone unfamiliar with the scoring system in ten pin bowling (ie, most people).  Moreover, the comments allow you to see if the code is of interest and whether you need to inspect the actual code.

An explanatory comment is additional information at the right end of a line of code.  You can often simply ignore them since reading the code may be enough if you already know what it is trying to do, but you can refer to them if you need more information.

When reading these sorts of comments I usually try to understand the code first then read the comment and if the comment makes no sense I will go and read the code again.  (Most of the time this is because I did not read the code properly.)  This method of using comments to double-check my understanding of the code has saved me from numerous stupid mistakes.

I also made a few other improvements to the code.  I replaced the magic number 10 with the named constant "max_frames" as the number of frames in a game of ten pin bowling could conceivably be more than 10.  It also differentiates it from the other magic number 10, which is the number of pins used.  (I didn't add a named constant for "max_pins" as it does not vary and I feel it would make the code harder to read.)  I also added a few assertions to check for internal consistency and to ensure that the data supplied to the function makes sense.


Little Functions and Temporary Variables


Uncle Bob's strategy is thus to introduce extra little functions and temporary variables, with long explanatory names, in lieu of comments.


The problem with little functions is that they make it hard to quickly scan the code and spot problems.  (In fact, a very useful facility now provided by many languages is that of lambda functions.  These allow you to avoid having to create little functions, which is the exact opposite of what Uncle Bob is trying to achieve.)  It's always better to have the code right there where you can see it than to have to go and find a function, or worse, not check it at all and wrongly interpret the function's behavior based on its name.
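
For example, in C++11 (other languages have similar constructs) a small comparison can be written in place with a lambda instead of as a separately named helper function; this sketch uses invented names:

#include <algorithm>
#include <string>
#include <vector>

void sort_by_length(std::vector<std::string> & names)
{
    // The comparison logic stays right here where it is used, instead of being
    // hidden behind a tiny, separately named function.
    std::sort(names.begin(), names.end(),
              [](const std::string & a, const std::string & b)
              { return a.size() < b.size(); });
}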


Lots of temporary variables have always been a sign of an inexperienced programmer, at least to me.  Often what is written in ten lines of convoluted code can be expressed in one or two lines without all the temporary variables.  All the extra code makes it harder and slower to read.  The extra variables also increase the chance of an error due to a name conflict or typo.

Maintenance of Comments

OK, so maybe comments can be useful when the code is first written, but what about the major argument for self-describing code: that comments rapidly become out of date and useless as the code is modified?  Admittedly this is a problem, but just because something is done badly does not necessarily mean you should abandon it.  For example, the design of most if not all software degrades over time (see my blog post Why Good Programs Go Bad) - using the above argument we should abandon any attempt at a good design.

I have found that code reviews are an effective way to ensure comments are maintained.  In the places I have worked, comments are noticeably better where code reviews are done.  To ensure comments are maintained a good idea is to add it to a check-list for code reviewers.  (Another item for the checklist is that identifiers, such as variable names, have also been updated to reflect their current use.)

Conclusion

Abandoning comments and trying to make code self-describing is a bad idea; well-written and maintained comments are absolutely invaluable for saving time and preventing defects.  Overview comments save time by allowing you to quickly scan code and drill down to the relevant location to make a change.  Comments also provide more information when you do not fully comprehend the code you are reading.

The most important thing is that comments allow you to double-check your understanding of the code.  I have lost count of the number of times that I have not fully understood some code I was reading and been saved from making a silly mistake when a comment made me aware of my misunderstanding.  I may be not as good at reading and understanding code as Uncle Bob and other super programmers, but then not everybody who has to read code is a super programmer.