Sunday 18 October 2020

Go Generics

There's been a lot written about generics in Go as they get closer to graduating into the language.  I just want to talk about a few important things that I've not seen discussed elsewhere.  In particular:

  • what parametric polymorphism actually is
  • why Go does not especially need it, but can benefit from it
  • C++ templates, and why Go doesn't need all that
  • latest proposed syntax (I'll update this post if/as it changes)
  • the importance of constraints
  • performance - of builds and generated code
  • how to try generics with the go2go tool or go2go playground
  • how generic channels may be a game changer

But first an introduction if you know nothing about it...

Parametric Polymorphism

As Rob Pike keeps reminding us, what is commonly called generics is technically parametric polymorphism. But even he admitted (when he talked at the Sydney Go Meetup last December) after saying "parametric polymorphism" many times, and stumbling on the words at least once, that generics is less of a mouthful. Personally I don't like the word either but it seems that generics is now the de facto, even official, name so I am going to stick with it. And, though not exactly the same, it is similar to what are often referred to as generics in other languages like C#, Java, etc, which I think is helpful for new Gophers.

So what is it? To simplify, there are two basic forms allowing you to specify type parameters either on a function or on a type. These are "compile time" not "run time" parameters as they evaluated and used by the compiler as it works.

Note that every programming language ever invented has generics built into the language.  For example, Fortran (the first high-level language) had these built-in:

  • arrays - a generic type where the type of the element(s) is the type parameter
  • min - a generic function where the type parameter is implied by the type of the parameter(s)
Similarly Go already has built-in generic types (arrays, maps, slices) and functions (eg copy() which can work with any type of slice). Generally, when we talk about generics in a language we are usually talking about the "user-defined" types and functions.  Go does not (yet) have this which is why you can't write a function that does what copy() does.

Function example

Here is a "user-defined" generic function in (the proposed new) Go, that prints the type of it's type parameter:

func f[T any]() {
    var v T // declare dummy variable of the type
    fmt.Printf("%T\n", v)
}

where the function f has one type parameter T [in square brackets] and no normal parameters (in round brackets).  T is followed by the constraint any which means that the type parameter can be of any type at all.  We can call the function like this:

func main() {
    f[int]() // type parameter T is int
}

Type example

Here is a generic type:

type Pair[T1, T2 any] struct { first T1; second T2 }

where the Pair type has two type parameters, T1 and T2, used for struct fields which may be of any (unrelated) types.

func main() {
    v := Pair[float64, string]{ 0, "abc"}
    point := Pair[int, int]{ 1, 2 }
    fmt.Printf("%v %v\n", v, point)
}

Does Go Need Generics?

This has been a big ongoing debate and I think that the Go community is finally leaning towards yes. People, who have used generics in other languages and have read about Go or only superficially tried it, are usually incredulous that Go does not have generics (myself included a few years ago).  However, it has managed to get by without them for a long time so let's look at how and why.

One rarely discussed reluctance for adding generics to Go is the concern that build times will be reduced. Some may scoff but part of the joy of Go is the quick builds. In 2009 Russ Cox gave a convincing argument but the go2go tool seems to be pretty quick (see benchmarks later).  My hope and firm belief is that when generics are added to the compiler proper, it will have negligible effect on compile times, at least for code that does not use generics.

Another argument for not adding generics to Go is that the inbuilt facilities are sufficient. I agree with this. Though enamoured with C++ templates, especially the STL, in all honesty 97% of what I used the STL for in C++ can be achieved just as nicely in Go using slices and maps.  I'd even say that I make much more use of the in-built generic maps in Go than I ever did with the analogous C++ containers (std::map/set/unordered_map/multimap, et al).

Another point is that the Go community is far more tolerant of duplicated code.  It's OK to have two functions with similar code - eg, one for ints and another for strings.  DRY is good but some repetition can be tolerated.

A final argument for not having generics is that you can use go:generate to run tools to convert some form of generic code into Go code. Sure you can do this but it is a bit tedious and "non-standard". I don't really buy this one.

“use of interface{} ...
affects performance
and type safety”

On balance this lead me (and I think the Go community) to conclude that we can probably get by without generics.  On the other hand, there has always been one thing that grates with me - the use of interface{} with containers and algorithms as the "poor mans" alternative to generics (for example, the linked list package from the Go standard library, which I have never heard of anyone liking or using). This affects performance due to boxing/unboxing (admittedly not always important) and type safety (always important unless you like Python or Javascript).

Generics will be useful for things like adding generic containers not already built into the language. But there is one killer reason Go should have generics - type-parameters on chans allow creation of some very useful facilities.  More on this soon!

C++ Templates vs Go Generics

Before we go further, I want to talk about C++ templates. If that's not what you want, then you can skip this section. It may be useful background to see what C++ templates have that Go generics don't have.

Contracts, constraints, concepts
Is it all a con?

Constraints allow the creator of a generic type/function to say what capabilities a type must have to be used as the type parameter. For example, a Min() function makes no sense for a type that has no inherent ordering.

I was a bit dubious about the point of constraints at first. Why not just let the compiler work out if a type is unsuitable when it goes to use it. For example, Min() would need to use the less than operator, so the compiler would be able to detect the problem when you try to take the minimum of two structs because structs can't be compared for order (in Go).

However, there are some good reasons to have constraints in Go (or any language) as I discuss below.

I should mention first the one thing that Go generics will have and C++ doesn't have (yet) is constraints, as they are called in the latest design draft - see Type Parameters - Draft Design. (Compilers implementing the new C++ 20 standard will have a similar thing called concepts.)

Constraints (and the functionally similar contracts in earlier draft designs) have been the major sticking point for implementing generics. I discuss the syntax and use of constraints in detail later.

Anyway, here's what C++ templates have that Go generics do not have:

Specialization

Sometimes you want a type parameter of a specific type to be handled differently.  For example, in C++ the STL provides the vector container (called vector<type T>).  There is a specialization for bool (called vector<bool>) that is primarily intended to save space by cramming the elements 8 to a byte. This can save lots of memory for large vectors over a non-specialised version that would use a byte per element.  However, vector<bool> is generally deprecated and rarely used.

C++ also supports "partial specialization" of a template that has more than one type parameter. Without getting into the details - some type parameter(s) are specialized with special code that keeps some type parameter(s) "generalized".

I don't believe we need these features in Go. Their usefulness is limited and it just adds complexity (and increases build times).

Member function templates

Originally C++ allowed type parameters on functions and types, but you could not add type parameters to methods of a class (called member function templates in C++ parlance). However, it was found necessary and added soon after.  In Go, these would be called generic methods and here is an example (not valid code):

type BoolStack []bool

func (s *BoolStack) PushEqual[T comparable](v1, v2 T) { // not valid
    *s = append(*s, v1 == v2)
}

I have occasionally found generic methods useful in C++ but only on certain large classes (which probably should have been avoided anyway). I'm not sure adding generic methods would be useful in Go. Moreover, there is a simple workaround - simply convert the method to a function like this.

func PushEqual[T comparable](s *BoolStack, v1, v2 T) {
    *s = append(*s, v1 == v2)
}

There is still the limitation with the workaround that a func cannot be used to allow a type to implement an interface the way that a method can.  We will have to see if Go needs generic methods.

Note that this is often confused with using a type parameter on a receiver of a generic type.  That is allowed and is particularly useful. For example:

type stack[T any] []T

func (s *stack[T]) Top() T { // OK: method receiver *can* be generic
    return (*s)[len(*s)-1]
}

Metaprogramming

C++ is famous for template metaprogramming -- there have been whole books written about it. To explain what it is, remember that when the C++ compiler processes templates it essentially generates internal code for the templates for each used "instantiation" of the type parameters.  With some bewildering (even by C++ standards) tricks the compiler can be induced to create all manner of code at compile-time.  Essentially you can write source code that generates source code (at compile-time) to become part of the resulting program.

What's the point?  Well it can be used different things such as what go:generate is used for. For example, many algorithms require pre-computed values or tables.  If you can't generate them during the build you could compute these values at program startup, but for efficiency (ie to avoid delays when the software starts running) it can be better to precompute the values and incorporate them into the code.  Before template meta-programming it was common in C++ to write code to generate header files that are then incorporated into the build. Template metaprogramming allows these sorts of things to be actually done by the compiler at compile-time.

There are some advantages to this, of course, such as the source code being self-contained.  However, it does complicate the compiler (and the code) with one consequence that builds become slower.  For Go I think we can safely say that nothing like this will ever be added to the language, where these things are typically done at run-time (where the increased startup time is unimportant) or if necessary can be performed as part of the build, eg using go:generate.

Non-type Parameters

As well as type parameters, C++ also supports non-type template parameters.  This allows a constant value to be used in the instantiation of a template. Why are they necessary? To see the point consider the declaration of an array in Go.  An array (as I said above) is a generic type built into the language.  In a declaration like this:

    var a [4]int

we are declaring the array with two (compile-time) parameters.  There is the type parameter (int) which says we want an array of integers. But there is also the non-type parameter (4) which says that the array has 4 elements.

In certain situations non-type compile-time parameters can be very useful in generic code (for example if arrays weren't built into Go). I would not be surprised if they were added to Go in the future - at least integer constant parameters.

One thing I would like to be able to create is a generic computational geometry package in Go.  For example, a Point type would take a generic integer parameter for the number of dimensions, so a lot of code could be shared between 2-D and 3-D (etc) points.

A workaround in Go, that could allow an integer generic parameter may be to use arrays, though you couldn't have an unlimited number of values as you would have to specify all the possible arrays sizes in the parameter's constraint.

type Sizer interface {
    type [0]int, [1]int, [2]int, [3]int // only used for size
}

[Ed 2021: the above is the old constraint syntax - here is what you need for Go 1.18+]

type Sizer interface {
    [0]int | [1]int | [2]int | [3]int // only used for size
}

You can try this now on the go2go playground. [Ed: try Go Playground]

Defaulted and Variadic Template Parameters

C++ also supports default values for template parameters, if they are left out by the user of the template.  It even supports variadic template parameters.  Go does not support these sorts of things for normal (non-generic function) parameters, so it is unlikely to ever have these features.

Go Generics Syntax

How exactly Go generics will work has been hugely debated especially after the 2019 Draft Design. It's been clear for awhile that generics will simply add type-parameters to functions and types (and not have any of the bells and whistles of C++ that I mentioned above). But there were still a few unresolved issues -- how to handle constraints plus the use of brackets and precise syntax.  I'll quickly explain the syntax then talk a bit more about the important subject of constraints.

Brackets

There has been too much debate about what sort of brackets are used to enclose type parameters - round (parentheses), angle, square, chevrons or guillemets (these are Unicode characters but that's OK since Go source code is UTF-8).  The obvious choice, and one I would have liked, would have been angle brackets (<>) but it turns out this creates crazy ambiguities in the language. Round brackets were tried for a while but there were also ambiguities (with simple but clunky workarounds).  Currently, perhaps finally, square brackets are being used and they work well enough.

A closely related change has been the removal of the need for the type keyword in a generic declaration.  So a generic function in the original Draft Design such as this:

func Print(type T)(s []T) {  // 2019 syntax

is now written like this:

func Print[T any](s []T) {   // current syntax

Note that the type keyword is gone and the round brackets have been replaced with square brackets.  You might also have spotted the addition of the word any - this is a type constraint.  Previously if there were no constraints on the types allowed for a type parameter then you could leave it out, but now you must provide a constraint, where any obviously means there's no constraint at all.

Speaking of constraints...

Constraints

Constraints allow you to say which types for a type parameter are suitable.  For example, in the Min[T]() function already discussed it should only be used with type parameter that supports the less than (<) operation.  How to handle constraints has probably been the most contentious part of adding generics to Go in the last year and has resulted in draft designs that (before this year) involved the contract keyword.  (I contributed to the debate but I won't go into that history now.)

I'll just tell you how constraints work, including a few gotchas, in a moment. First we need to understand why constraints are even needed (after all C++ got by without them for decades).

Having come from C++, I did not see the need for constraints, at first.  I knew it would make it easier for the compiler to give more meaningful error messages (rather than the monstrosities of C++), but there is a bit more to it than that.

Improved Error Messages

First, let's consider what the Min() function would look like without constraints:

func Min[T](v1, v2 T) T {  // NOTE: invalid (missing constraint)
    if v1 < v2 {
        return v1
    }
    return v2
}

Now if someone called this (hypothetical) generic function with bool arguments like this:

    Min(true, false)  // ERROR

they'd get an error message about operator < not being defined for bool, and you'd have to look at the source code for Min() to understand exactly what is happening.  This may not look too hard but with more complicated use of generics it becomes extremely difficult if not impossible for the compiler to produce easily decipherable error messages.

Now let's see how constraints help: because we are using less than (operator <) on the values of type T, we need to use the constraints.Ordered constraint:

func Min[T constraints.Ordered](v1, v2 T) T {  // OK
    if v1 < v2 {
        return v1
    }
    return v2
}

Now if you try to call Min() with bool parameters you get an error that tells you that you can't - ie an error message like: bool does not satisfy constraints.Ordered.

On the other hand if you don't constrain the type parameter correctly, eg:

func Min[T any](v1, v2 T) T {
    if v1 < v2 {  // ERROR: cannot compare v1 < v2 (op < not defined for T)
        return v1
    }
    return v2
}

you get an error since not all types support the less than operation.

Of course, you should ensure that you don't unnecessarily constrain a type parameter. So if you used constraints.Integer for the above Min() function's type parameters you'd be preventing Min() being called for float64 or string types.

The Other Benefit of Constraints

The other benefit of constraints, that I did not pick up on immediately, is that they act as a contract between the creator and the user of the generic function/type.  Without a constraint the creator of the generic function/type may update their code and inadvertently change the allowed types.  For example, if the creator of Min() made this code "improvement" (modifying our un-constrained Min() example from above):

func Min[T](v1, v2 T) T { // NOTE: currently invalid (missing constraint)
    if v1 - v2 < 0 {
        return v1
    }
    return v2
}

they may not realise they have made a change that will possibly break code of some clients.  Only when someone uses Min() with string parameters would the problem be detected.

    Min("abc", "xyz")  // ERROR: operator - not defined for string
    Min(1, 2)          // still OK

But using constraints, the creator will know right away if they have violated the "contract", since not all types of constraints.Orderered allow subtraction.

func Min[T constraints.Ordered](v1, v2 T) T {
    if v1 - v2 < 0 { // ERROR: invalid operation: (op - not defined for T)
        return v1
    }
    return v2
}

Without constraints the problem may not be found until some unsuspecting user updates the package and finds that their code won't build.

Creating Constraints

So how do you use constraints when you create a generic function or type? First, you need to decide on what capabilities are needed for a type parameter.  As we saw above the Min() function requires its type parameters to support the less than operation which is specified with a constraint of constraints.Ordered.

There are essentially three ways that you can constrain a type parameter, by specifying:

  1. specific method signature(s) that the type must implement
  2. explicit list of types (such as used in contraints.Ordered)
  3. built-in constraints (such as comparable) - determined by the compiler

1. Method Signature Constraints

Lets look at method signatures first as I think they are the reason for the latest change to use interface instead of contract. Currently in Go, a list of method signatures can be bundled together for the purpose of defining an interface type. A similar requirement for constraints is to restrict a type to only types that implement certain method signature(s).  Due to this similarity it was decided constraints can just recycle the syntax used for interface.

... it's important
to distinguish
the two different
uses of interface”

For example, say we want a generic function called Stringify() that requires its parameter to implement the String() method.  This constraint is specified using the following interface (which is already available from the standard library fmt package):

type Stringer interface {
    String() string
}

func Stringify[T Stringer](s T) string {
    return s.String()
}

Note that it's important to distinguish the two very different uses of interface that this change introduces:

  • a type for variables of any type that implement certain method(s)
  • new use of creating a generic constraint for a type parameter

I think of these as run-time and compile-time use of interfaces.

I have already seen questions in the Go issue tracker where this double duty causes confusion, so you need to be clear on how the new "compile-time" use of interfaces for generics differs from the original use.

There is also the further confusion that interfaces (since they are a type) can also be generic and these generic interfaces can even be used a constraints on other generic types. I'm not going to give an example as, to be honest, I am not fully cognizant myself. BTW I believe that my earlier proposal for constraints is simpler and less confusing (see Example Types as Contracts).

2. List of Types

To allow the interface syntax to serve double duty as type parameter constraints it has been extended to allow a list of types to be included in an interface. For example, if a generic function only works for unsigned integers then you could create a constraint like this:

type UnsignedInteger interface {
    //type uint, uint8, uint16, uint32, uint64
    // Ed 2021: the above syntax is out of date, use this now:
    uint | uint8 | uint16 | uint32 | uint64
}

func MinValue[T UnsignedInteger]() T { // min value for unsigned ints
    return 0
}

It's important to note that this interface syntax can only be used for a constraint.  An interface that includes a list of types cannot be used for the original "run-time" duty of interfaces like declaring an interface variable.  (Ian Lance Taylor has a further proposal to allow this.)

You can, of course, combine a list of types plus method signatures in the same interface.  Here is a constraint that only allows unsigned integers that have a String() method:

type Unsigned interface {
    uint | uint8 | uint16 | uint32 | uint64
    String() string
}

Looking back at the definition of our Min() function you might now guess where the constraints.Ordered constraint comes from.  It's defined in the constraints package, which undoubtedly will be added to the Go standard library once the language supports generics.  It is simply defined as a list of types that support comparison/ordering operations (==, <, >=, etc):

type Ordered interface {
    //type int, int8, int16, int32, int64,
    //     uint, uint8, uint16, uint32, uint64, uintptr,
    //     float32, float64,
    //     string
    // Ed 2021: the above syntax is out of date, use this now:
    int | int8 | int16 | int32 | int64 |
    uint | uint8 | uint16 | uint32 | uint64 |
    float32 | float64 | string
}

Most of the time you won't need to define your own constraints but use those included in constraints package (or built in to the language). In fact it's more robust to do so in case the language changes (for example more types might be added to constraints.Ordered in the future).

3. Built-in Constraints

Finally, there are some constraints that can't be handled by a simple list of types.  The only one (apart from any) that I am aware of is the comparable constraint, which just determines if two objects of a certain type can be compared for equality (ie support == and != operations). This is not a simple rule - for example some types of struct can be compared but others can't.

I think I talked about the rules for comparability of types before, but I'll explain it again in brief.  Some basic types are not comparable, for reasons we won't go into (except to nil) - these are slices, maps and funcs.  Further, if you create an array of a non-comparable type, or a struct that has one of these types as a field, or an interface that stores one of these types then that type is also non-comparable (though in the case of interfaces this cannot be determined at compile time and will cause a run-time panic).

Anyway, the simple rule is that if values of your generic type-parameter need to be compared for equality then you need to use the comparable constraint.

Trying Go Generics

You can try Go generics using the latest syntax very easily using either the go2go playground.  You can also build the dev branch of the Go compiler source then use the included go2go tool - see go2go README for instructions.

The go2go tool translates your code quickly and the code runs fast.  There are a few little deficiencies that I have found such as lack of support for external tests, build tags and unsafe.Sizeof but overall it works very well.

Conclusion

I can see some great packages being built with Go generics in the future. Especially important, I think, will be generic channels as this is something not offered in any other language and I think will greatly enhance the possibilities for concurrency.

I've also written a completely new generic set container called rangeset that implements sets (constrained to integers) using ranges.  This has some very interesting performance capabilities for some types of sets and also allows inversion and universal sets (something not supported by the example go2go set package). I'll talk about this in a post soon.

I was also going to say a few more things that I haven't had time for. I'll try to flesh out my notes and publish another post in the next few weeks.  This will include:

  • some benchmarks of generic code
  • + compilation times of generic code using the go2go tool
  • my experiments with generic channels