Tuesday, 24 July 2012

Dependency Injection

Last month I explained Inversion of Control (IoC).  A related term is Dependency Injection (DI) which is another technique for assisting modularity and decoupling.  The terms dependency injection and inversion of control often appear together so that some people think they are closely related or even the same thing.  This is not true, but DI is said to be an example of IoC as well as being a way to resolve dependencies in other uses of IoC.

There are many articles on DI on various web pages and blogs, and almost all are misleading or simply wrong.  They often describe design patterns that can use DI (like Strategy, Bridge, Adapter, etc) but do not explain DI.  For example, this article gives a beautiful description of the Bridge Pattern and it's advantages but wrongly calling it dependency injection.

In Brief

In essence, DI separates the mechanism for resolving references into a separate module, which is why it is good for modularity.  Remember the callbacks (and interfaces) that I talked about in the IoC post?  These are just function pointers that can point to any function with the correct signature.  An actual function pointed to is what the code depends upon to get the job done, so is referred to as a dependency.  How the address of a specific function is assigned to a function pointer is what DI is designed for.  A separate module resolves the dependencies and injects them into the other modules that require them.

An example always helps to make things clearer. The example below is in C, as I know that several readers of this blog are very familiar with C.  The fact that it is in C might also help to dispel the fallacy that DI is necessarily an object-oriented technique.

Version One - Original Code

The example is actually based on the first C program I ever wrote, which was an MSDOS program to display a list of files.  One of the features is that it would display the files sorted in different ways (by name, size, time of last modification, etc).  The information for the files was stored in an array of structures like this:

  struct dir_entry
  {
    char name[MAXFN];
    time_t modified;
    long size;
  } entries[MAX_ENTRIES];

Note that I have left out a lot of the code including #includes.  For example, you would need to include stdlib.h, string.h, assert.h, time.h, etc.

The code to display the files looked something like this:

  /* display.c */
  static int compare_file_name(const void *p1, const void *p2)
  {
     struct dir_entry * d1 = (struct dir_entry *)p1;
     struct dir_entry * d2 = (struct dir_entry *)p2;

     return _stricmp(d1->name, d2->name);
  }

  static int compare_modified(const void *p1, const void *p2) { ...
  static int compare_size(const void *p1, const void *p2) { ...
  static int compare_file_extension(const void *p1, const void *p2) { ...

  void display_files()
  {
    /* Get files of current directory into entries[] */
     ...

     /* Sort the files according to user preference */
     enum sort_type how = options_get_sortorder();

     switch (how)
     {
     case SORT_NAME:
          qsort(entries, num_entries, sizeof(struct dir_entry *), &compare_file_name);
          break;
     case SORT_MOD:
          qsort(entries, num_entries, sizeof(struct dir_entry *), &compare_modified);
          break;
     case SORT_SIZE:
          qsort(entries, num_entries, sizeof(struct dir_entry *), &compare_size);
          break;
     case SORT_EXT:
          qsort(entries, num_entries, sizeof(struct dir_entry *), &compare_file_extension);
          break;
     default:
          assert(0);
          break;
     }

     /* Display the files */
     ...
  }

As we saw in the previous blog this is an examples of IoC.  Control of the sorting is inverted by being passed to qsort() (a standard C library function).

This is a reasonable design.  First, it separates the code for sorting the files into a separate (standard) module (ie qsort).  There is also a separate module that handles the user options:

  /* options.h */
  enum sort_type { SORT_NAME, SORT_MOD, SORT_SIZE, SORT_EXT, };

  /* options.c */
  static enum sort_order current_sort_order;

  enum sort_type options_get_sortorder()
  {
    return current_sort_order;
  }

Diagram 1.

However, the above original code did not seem quite right to me  One thing was that the DISPLAY module knows how the sorting is done.  It seemed to me that only the SORT module should know any of the details of the sorting.  For example, if a new sort option was added then the above switch statement would need to be modified, to add a new case, and a new compare function written.

Version Two - Hide Sort Details

My instinct was to create a SORT module that wrapped all details of the sorting.  In fact I am sure that at least one tutor at university advised me that whenever code has to deal with details that are irrelevant to the task, then they should just be pushed down into a lower-level function.  (I later found that this is a simplistic approach but more on that in an upcoming blog post.)

So I created a new version of the code that hid the details of sorting in a separate SORT module.

  /* display.c */
  void display_files()
  {
    /* Get files of current directory into entries[] */
     ...

     /* Sort the files according to user preference */
     sort_files(entries, num_entries);

     /* Display the files */
     ...
  }

  /* sort.c */
  static int compare_file_name(const void *p1, const void *p2) { ...
  static int compare_modified(const void *p1, const void *p2) { ...
  static int compare_size(const void *p1, const void *p2) { ...
  static int compare_file_extension(const void *p1, const void *p2) { ...

  void sort_files(struct dir_entry *entries, size_t num_entries)
  {
     switch (options_get_sortorder())
     {
     case SORT_NAME:
          qsort(entries, num_entries, sizeof(struct dir_entry *), &compare_file_name);
          break;
     case SORT_MOD:
          qsort(entries, num_entries, sizeof(struct dir_entry *), &compare_modified);
          break;
     case SORT_SIZE:
          qsort(entries, num_entries, sizeof(struct dir_entry *), &compare_size);
          break;
     case SORT_EXT:
          qsort(entries, num_entries, sizeof(struct dir_entry *), &compare_file_extension);
          break;
     default:
          assert(0);
          break;
     }
  }

Diagram 2.

This is an attempt to decouple the sorting code from the display code, but is it any better?  It's basically the same as the previous code but has simply pushed the sorting into a subroutine.

In many years of reading (and often writing :) lots of bad code, I have come to dislike this approach.  It results in software that is very hard to decipher as function calls can end up being nested dozens of levels deep.  Moreover, it often means that related pieces of code can become separated resulting in a design that is inefficient and (more importantly) difficult to understand and modify.  It also seems to encourage duplicate code as you often see similar pieces of code all over the place.

Another indicator of a bad design is the number of modules that need to be modified to accommodate a simple change.  If a new sort order option was required then, using the above design, we would have to modify both the OPTIONS module and add a new compare option to the SORT module.

Finally, a principle you should always try to uphold is DRY (don't repeat yourself).  All the calls to qsort() in the above switch statement are almost identical.  Is there a way to eliminate this redundancy?

Version Three - Separating Comparison from Sorting

Looking again at the above code...  Is the  SORT  module the best place for the comparison functions?  Sure sorting needs to know how to call the appropriate comparison function, but this is just a matter of knowing the signature of the function.  A breakthrough was realizing that comparing can, and should, be decoupled from sorting.  This is an example of the strategy pattern - the caller of the strategy only need know the interface, while the concrete strategy (the actual strategy used) can be in a different module.

In the next version of the software I moved the comparison functions to the OPTIONS module.  A pointer to the current comparison function can be obtained from the  OPTIONS module and passed to qsort.  This avoids having the extra sorting layer - now the sorting "module" (ie, qsort) has the single responsibility of performing the sort.

  /* display.c */
  void display_files()
  {
    /* Get files of current directory into entries[] */
     ...

    /* Sort the files according to user preference */
    qsort(entries, num_entries, sizeof(struct dir_entry *), options_sort_strategy());

     /* Display the files */
     ...
  }

  /* options.c */
  typedef int (* PCOMPFUNC)(const void *p1, const void *p2);

  static int compare_file_name(const void *p1, const void *p2) { ...
  static int compare_modified(const void *p1, const void *p2) { ...
  static int compare_size(const void *p1, const void *p2) { ...
  static int compare_file_extension(const void *p1, const void *p2) { ...

  PCOMPFUNC options_sort_strategy()
  {
    switch (current_sort_order)
    {
    case SORT_NAME:
        return &compare_file_name;
    case SORT_MOD:
        return &compare_modified;
    case SORT_SIZE:
        return &compare_size;
    case SORT_EXT:
        return &compare_file_extension;
    default:
        assert(0);
        return &compare_file_name;
     }
  }


Diagram 3.

Now the  OPTIONS module is responsible for determining the sort order.  If new sort options are added only that module needs to change.  However, it is still up to display_files() to call options_sort_strategy() to find out how to sort.  Relieving the DISPLAY  module of this last burden is accomplished with DI.

Version Four - Dependency Injection

Using DI, the  OPTIONS module is given control (using IoC) of setting up the callbacks.  The inject_comparer() function was created to inject the comparer dependency into the  SORT  module.  It is invoked whenever the sorting order changes and sets the default sort order during startup.

  /* config.c */
  static PCOMPFUNC current_file_comparer;

  static void initialize()
  {
    current_file_comparer = find_default_comparer();
    inject_comparer();
  }

  void config_set_comparer(PCOMPFUNC f)
  {
    current_file_comparer = f;
    inject_comparer();
  }

  static void inject_comparer()
  {
    sort_set_comparer(current_file_comparer);
  }

  /* options.c */

    ...
    config_set_comparer(new_comparer);
    ...

  /* sort.c */
  static PCOMPFUNC comparer;

  void sort_set_comparer(PCOMPFUNC f)
  {
    comparer = f;
  }

  void sort_files(struct dir_entry *entries, size_t num_entries)
  {
    qsort(entries, num_entries, sizeof(struct dir_entry *), comparer);
  }



Diagram 4.

The main difference here is that the CONFIG module has control of setting the sort order.  It injects the comparer into the  SORT  module.  The  SORT  module is just a wrapper of qsort that also stores the current comparer.  The concrete comparer functions are no longer part of the display or  SORT  module, but are set up in the config module (eg, at startup, when the user changes the sort order, etc).

The advantage is that the configuration is dynamic.  The  CONFIG  module can even discover or create new comparers at run-time.

Moreover, the existing software does not need to change when a new comparer is added.  This is great for maintainability since it avoids code changes that have to be reviewed, tested, etc, and could potentially introduce new bugs.

Dependency Injection

As we saw above DI is useful when used with the strategy pattern; but, in general, it can be used with any design that uses run-time polymorphism.  The point is that there is a separate module that resolves references at run-time and injects them into the relevant module(s).  These references can be injected as an interface (ie, pointer to a table of function pointers), or anything that allows code to be executed indirectly.  In our C example we simply used a function pointer.


One thing I particularly like about DI is that new modules can be added without changing the existing software.  As long as the new modules can be loaded dynamically then it is just a matter of configuration to inject them into the system.  Maintainability is usually the most important attribute of software and this is the ultimate in maintainability as the existing code does not have to be modified or even rebuilt.

The concept behind DI has been used in a more limited way by Windows software that supported plug-ins.  (I first saw this used in Windows 3.1 more than 20 years ago.)  Plug-ins rely on DLLs that all implement the same interface (ie set of functions).  A plug-in manager is often responsible for discovering the plug-in DLLs (eg, by looking in a certain directory) and injecting them into the relevant parts of the system.

Finally, I should clarify the relationship between Dependency Injection and Inversion of Control.  DI is related to IoC in two ways.  First, it is an example of IoC since it inverts control by passing control of setting up dependencies to the CONFIG module.  It is also often used with other types of IoC in setting up dependencies, since IoC is usually implemented using callbacks (pointers to interfaces or pointers to functions as we saw in the blog last month).

Conclusion

DI is a useful technique for decoupling of modules.  It separates out the responsibility for making connections between other modules.  However, many of the advantages often put forward for DI are not due to DI per se, but to its uses with design patterns such as Strategy, Bridge, Adapter, etc.

The true advantage of DI is that it allows better configuration.  This can go as far as discovering and using new code at run-time (as in the plug-ins we discussed above).  It is very useful for connecting up mock objects for unit tests.  Also by allowing software to be extended without any change to the original code it is great for maintainability.

4 comments:

  1. An excellent article. Certainly helped me get to grips with the principles.

    Quick question, I think in your "Version Three" code, the MOD, SIZE and EXT cases need the qsort call removing to just return the function pointer? As per the SORT_NAME case?

    ReplyDelete
    Replies
    1. Thanks, you are right about the code. I will fix it.

      Delete
  2. Hi there, great tutorial, especially because it's written in c. All the other high level languages are masking lots of DI implementation details, and by just looking at DI from a point of function pointers, made a picture lot clearer. Keep up the great work.

    ReplyDelete