Sunday, 2 June 2013

C++11 and Lambda Functions

Lambda functions are a great addition to C++ which I encourage you to try. They are especially useful in combination with the STL (standard template library). After a long wait lambdas have been added to the new C++ standard (known as C++11, released in August 2011). This makes me very, very happy (mainly because it greatly simplifies using the STL algorithms :). However, I am not so happy that it took so long.  :(Even C# has had lambda functions for five years.)

If you don't know what they are here, is a quick introduction to function pointers and lambdas...

Function Pointers, Etc

The C language has had function pointers from the outset. Function pointers are absolutely essential for implementing many design patterns that have been invented/documented in the last few decades. For a full explanation of function pointers see: http://www.cprogramming.com/tutorial/function-pointers.html.

Of course, with the arrival of C++ the use of function pointers has been supplanted to a great extent by the use of objects with member functions (sometimes called methods). However, behind the scenes function pointers are the enabling technology. After all a VTable is just an array of function pointers.

My first use of function pointers (like most people) was with the standard C library function qsort(). However, I always found qsort() extremely tedious to use since you have to create a comparison function (and think really hard to get the casting right). For example, to sort an array of strings you need something like this:

  static int compare(const void * s1, const void * s2)
  {
      return strcmp(*(const char *)s1, *(const char *)s2);
  }

     ...
     qsort(&str[0], count, sizeof(*str), &compare);

I always found it distracting and error-prone to have to go somewhere else in the code and create this comparison function (and make sure you get the casts correct). Wouldn't something like this be simpler:

   qsort(&str[0], cnt, sizeof(*str), {return strcmp(_1, _2);} );

[ Note that the above is not valid C (or C++) code.]

Why couldn't the compiler use the little bit of code {return strcmp(_1, _2);} to create an unnamed function and insert a pointer to that? Well with lambdas the compiler does that and more.

Function Literals

I have always thought of this idea of unnamed functions as being more like function literals. Please let me explain my approach to thinking about lambda functions...

First consider a basic C function. The code in a function never changes so it is analogous to a scalar constant (an identifier preceded by the word "const"). For example compare these two lines:

  const float pi = 3.141592653;       // float constant
  double sq(double x) { return x*x; } // function constant

The constant value above has a name [pi], a type [float], and a value [3.141592653]. Similarly, the function above has a name [sq], a type [double(*)(double)], and a value - the code [return x*x;].

Further a function pointer is like a variable since you can assign various function constants (addresses of functions) to it.

  float ang;                          // float variable
  double (*pf)(double x);             // function "variable"

  ang = pi; // assign constant "pi" to variable "ang"
  pf = sq;  // assign function "sq" to function pointer "pf"

Finally, you can think of a lambda function as a function literal.

  ang = 90.0;         // assign floating point literal to "ang"
  pf = [](x){ return x*x; }; // assign function literal to "pf"

In the last line we assign a lambda function to a function pointer.  This is valid in C++11 as long as the lambda function does not capture any variables (see below).

Lambdas and STL

The STL is amazingly powerful and useful, but one problem is its dependence on little fragments of code. Most algorithms and some of the containers rely on predicates to answer questions about different values. Predicates are simple functions (or function-like objects) that take one or two parameters and return a bool value about it/them. For example, the set container needs a predicate that takes two objects and returns a boolean indicating if one object is "less" than the other. Further, there are also other areas which use function-like objects such as the mathematical operations (plus, modulus, bit_and, etc).

When you need one of these fragments of code you can often use one of many predefined function objects that are provide by the STL (such as the less function-object), but sometimes you need just a little bit more. Often you can construct what you need using negate, bind2nd, ptr_fun, and other function adaptors, but this approach quickly creates a mess of function calls that can be difficult to create and impossible to decipher. As a simple example, here is an algorithm that finds all values less than or equal to max_value:

   partition(c.begin(), c.end(),
             bind2nd(less_equal<int>(), max_value) );

An alternative to using the STL's facilities is to use a pointer to a special function (as we saw for qsort() above). If you need to "capture" values required for the operation then you need to create a class that implements operator() and saves values in private members - that is, a function-like object or functor.

  class compare_to_max
  {
  private:
     int max_value;
  public:
     compare_to_max(int m) : max_value(m) { }
     bool operator()(int v) { return v <= max_value; }
  };

  ...
    partition(c.begin(), c.end(), compare_to_max(max_value));

 In either case you need to create a special function or class, which is tedious and distracting. Further, the actual code, even if just a simple expression, is isolated from its use - better to have it in the code where it is being used so you can see it at a glance.

Now, with C++11, you can just use a lambda function like this:

   partition(c.begin(), c.end(),
             [=] (v) { return v <= max_value; } );

I think anyone would agree that this is simpler and clearer than the above alternatives. The code is simpler, less error-prone and easier to read and maintain.

Capturing Variables

In the above example you may have noticed that I used  an outer scope variable, max_value, inside the lambda function. This is possible because I used [=] at the start of the lambda where the equals sign (inside the square brackets) indicates that variables are captured (by value) from the containing scope.

Behind the scenes the compiler is creating an object that stores the captured variables. This object would look rather like the class compare_to_max we saw above, with a private integer member (like max_value) which captures the value of the max_value in the outer scope where the lambda is defined.

There are actually many options for capturing values. You can capture by value, reference or even const reference. You can also control if and how individual variables are captured if you want to. Personally, I think all the options and facilities in C++11 for lambdas are unnecessary. The simpler way that lambdas are implemented in C# is just as useful. For example, in C# all variables if used are captured by reference which is efficient and simpler. I guess C++ provides capture by value (and capture by const reference) so you don't unintentionally modify variables outside the scope of the lambda function - but since the code for the lambda is right there where it is declared then you can instantly see what is being modified anyway.

Lambda functions in C# have the advantage of being simpler than those in C++ without any real disadvantage that I can see. For example, compare the lambdas for the simple operation we saw above:

    C++: [=] (int v) { return v <= max_value; }
    C#:    v => v <= max_value

The C# code might be confusing until you learn that => introduces a lambda function and is unrelated to <= (less than or equal to operator) or >= (greater than or equal to operator). This C# lambda says that the function has one parameter (v) and since it is an expression it just returns the result of the expression (v <= max_value). Of course, you can have more complex lambdas in C# that include statements such as return.

C++11: Too Much, Too Late

This brings me to my main disappointment with the new C++ standard that it was too slow in coming. It also shows all the signs of design by committee by trying to cater for every possible wish and whim but in the process being overly complex. You have heard of the saying too little, too late, but in this case I think it is a case of too much, too late.

Moreover, the new standard gives the impression that it is trying to play catchup with C#. I don't know if that is true. Maybe some of the new features were proposed for C++ a long time ago but managed to find their way into C# first because the C++ standardization process took so long.

Almost all the new features have been available in C# for many years: lambdas, auto keyword (var in C#), safer enums, initializer lists, threading support, etc.

It's surprising that regular expressions have not made it into the C++ or even C run-time library until now. After all regular expressions have been part of C and UNIX almost from the start, for example in the utility grep.
Regular expressions are another example. The regular expressions provided in C++11 are comprehensive, supporting almost all known implementations of regular expressions; just one type would have been sufficient. And again a standard regular expression library would have been so very useful to me if it had appeared sooner. Nowadays, many text files use XML with schema and regular expressions are less useful. (I will write about this in a blog post soon about: text files, .INI files, serialization, regular expressions, XML, etc.)

Conclusion

Lambda functions are an example of the trend towards making code easier to scan and understand by getting rid of superfluous stuff (in this case unnecessary identifiers). I talk more about this in my post from last year on Self Describing Code.

The main reason I like the addition of lambdas to C++11 is that they make using the STL so much easier in areas like creating custom predicates. Unfortunately, it has been 15 years since I first used the STL, so it could have happened a bit sooner!

If you have read my blog much, you may know that I like much of C#. Indeed, C# has had Lambda functions for years. However, the combination of the STL and lambdas is a great boon. [C# is lacking compared to the STL in simplicity and consistency esp. the hodge podge of containers. However, I do believe that .Net's LINQ is a great technology that actually builds on some of the ideas of the STL algorithms.]