Sunday, 5 November 2017

Overview


Here are links to all of my posts so far, loosely categorized.

Note that there are a few bonus links to my Code Project articles - marked with [CP]


Software Design

Design Principles

Handling Software Design Complexity - what software design all boils down to
DIRE - an obvious thing we often forget
Developer Quality Attributes - or why fixing bugs is not important
Verifiability - software is useless unless you can verify its correctness
Why Good Programs Go Bad - risk avoidance causes software to "rust"
Book Review: 97 Things Every Architect Should Know

Design Practices

Fundamentals of Software Design - 8 ways to create a good design
Agile Design - how emergent design almost always works better than BDUF
Inversion of Control - IOC is a technique for better decoupling using DIRE
Dependency Injection - an example of IOC

Anti-Patterns

Gas Factory Anti-Pattern - a mistake even (or especially) good designers make
Reusability Futility - "Simplicity before Generality, Use before Reuse"
Shotgun Initialization - an example of the dangers of defensive programming
Layer Anti-Pattern - the problems of a common, obvious approach
Ignore Divide By Zero - commonly taught practices can be wrong
Defensive Programming - it can hide bugs

Agile

Principles

Agile's Fifth Element - favor simple design over re-usability and generality
JIT (Just In Time) - an example of DIRE that is core to much of Agile
DIRE (Don't Isolate Related Entities) - how you divide and conquer is the key
Agile Design - evolving software one small step at a time
Agile and Code Reuse - all about YAGNI (you ain't gonna need it)
Software Quality Assurance & Agile - how Agile evolved from, but is different to, SQA
Lean is not Agile - applying "eliminate waste" to software design leads to BDUF
Software Development Methodologies [CP] - Agile and other methodologies by analogy
Scrum is like Communism? - maybe it is, but that does not mean it is doomed to failure!

Team

Agile Roles - when Agile is done properly all roles do change (some even disappear)
Scrum Team Size - teams should be small to avoid social loafing and other phenomena
Scrum Team Composition - "feature" teams are the key
Collaboration - traditional development discourages collaboration + why Scrum works
Production Line Mentality - the reason teams don't collaborate

Making Agile Work

Scrum Standup - it's more about visibility than communication
Developer Quality Attributes - what benefits developers eventually helps users
Agile Version Control - Agile requires the right version control practices & software (Git)
Scrum Problems - management "buy-in" & other things that help Scrum work properly
Why Scrum Fails - intransigence, non-collaboration, etc
Written vs Verbal - when, who, why, and how of Agile documentation
JIT Testing - testing as you go (continuous testing) is an example of JIT (Just In Time)
Vendor Mentality - another "mentality" you need to overcome
Customer Management - the customer is not always right

Unit Tests

Change - how Unit Tests help you to embrace change
What's so great about Unit Tests - Unit Tests are not about finding bugs
White Box Testing - the best Unit Tests use "good" white box testing
Personal Experiences with Unit Testing - it took me 20 years to truly appreciate them
Challenges - why getting started with Unit Tests seems, but is not, insurmountable
Unit Tests Best Practice - a few things to avoid
Arguments Against Unit Tests - common arguments and why most are invalid
Summary - Unit Tests concisely summarized


Coding

Essentials

Zero - bugs are less likely if you don't treat zero as a special case
Asymmetric Bounds - in code and GUI design this is an important way to avoid bugs
Book Review: Clean Code - a great book on creating the best code

Go (golang) Coding

The Essence of Go - reading about Go you don't realize how nice it is to use
Improving Go Error Handling - how to fix one part of Go that is not so nice
Active Object - a powerful concurrency pattern that Go makes easy

C Coding

Best Practice in C for Modules - strong-coupling and other things to avoid
Defensive Programming - how it works and how it can hide bugs
Shotgun Initialization - a defensive programming practice to avoid
Alignment and #pragma pack - make structs "alignment agnostic" to avoid surprises
Making Code Testable - coding for testability improves correctness, reliability, etc
Ten Fallacies of Good C Code [CP] - 10 more things to avoid

C++ Coding

STL's Dark Secret - vectors are slower than they should be
Iterators Through the Looking Glass - subtleties of the STL reverse iterators
C++11 and Lambda Functions - lambda functions make STL so much better
Nested Functions using Lambdas - you can finally have nested functions in C++11

C# Coding

Why I Like C# - lots of nice things (and it's not really that much slower than C)
Overflow Checking using checked/unchecked [CP] - C# has some cool features
Nested Functions using Lambdas - includes an example of using C# lambdas

Maintainability

Long Identifiers make Code Unreadable - don't try to put too much info. into a name
Self Describing Code - why it's a bad idea and why you should comment your code

Other

The Phillips Scale of Code Quality - how good is your code?
Version Control - Personal Experiences - hands on version control

Active Object in Go

In Go, it is very easy to accidentally access the same data from different go-routines creating a race condition. Conventionally, you avoid this sort of problem with a mutex; and, in fact, you can easily do this in Go using sync.Mutex.

A way that is often better, and preferred in Go, is to simply avoid accessing the data from different go-routines at the same time. One way is to send messages (through a channel) to a confining go-routine responsible for all access and control of the data. By confining all access to a single go-routine no locks are required.
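
As a rough illustration of a confining go-routine, here is a minimal sketch in which a running total is owned by a single go-routine and every update arrives as a message. The request type and the startCounter() and add() helpers are hypothetical names invented for this sketch; they are not part of the bank account example later in this post.

```go
package main

import "fmt"

// request is a hypothetical message: add delta, then reply with the new total.
type request struct {
	delta int
	reply chan int
}

// startCounter launches the confining go-routine. The variable total is
// only ever touched inside that go-routine, so no lock is needed.
func startCounter() chan<- request {
	reqs := make(chan request)
	go func() {
		total := 0
		for r := range reqs {
			total += r.delta
			r.reply <- total
		}
	}()
	return reqs
}

// add sends one request and waits for the confining go-routine to answer.
func add(reqs chan<- request, delta int) int {
	reply := make(chan int)
	reqs <- request{delta: delta, reply: reply}
	return <-reply
}

func main() {
	reqs := startCounter()
	add(reqs, 5)
	fmt.Println(add(reqs, 7)) // prints 12
}
```

Any number of go-routines can call add() concurrently; the requests are simply processed one at a time.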

You can also confine use of a value by only using it within one go-routine at a time. This is idiomatically done in Go by transferring control of the variable through a channel, but I won't discuss that here as there are plenty of other articles about it. On another note, there are "low level" ways to do "lock-free" concurrency, using atomic operations, but that will have to wait for another time.


Avoiding Locks

So what is the advantage of avoiding locks? Well, a great deal has been written about that (do a search on Google :), some of it misleading. In my opinion the main advantage is that it simplifies writing the code, by not getting side-tracked with other issues like race conditions. Using mutexes, in complex scenarios, is notorious for retaining subtle race conditions, or potential deadlocks or just performance problems such as unintentionally holding a lock while blocking on I/O.

Lock contention is often cited as a major advantage of avoiding locks but that is not really the issue. After all, using a confining go-routine (as described above) replaces lock contentions with contention for use of the go-routine. In fact proper use of locks is often more efficient than channels; it's just that it usually involves convoluted and/or repetitive code. For example, this is a typical scenario using locks:
  1. lock
  2. check "something"
  3. unlock
  4. do a lengthy operation based on knowledge obtained at 2
  5. lock
  6. check "something" again (repeating test at 2)
  7. update data using results at 4 (if "something" hasn't changed)
  8. unlock
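
The eight steps above might look something like the following in Go. This is only a sketch: the cache type, its version field and the expensive() function are made up for the illustration, with the version number standing in for the "something" that is checked twice.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical type used only for this illustration.
type cache struct {
	mu      sync.Mutex
	version int    // the "something" that is checked
	result  string // derived from the data as it was at version
}

// expensive stands in for the lengthy operation of step 4.
func expensive(version int) string {
	return fmt.Sprintf("result for v%d", version)
}

// refresh follows the eight steps and reports whether the update was applied.
func (c *cache) refresh() bool {
	c.mu.Lock()       // 1. lock
	seen := c.version // 2. check "something"
	c.mu.Unlock()     // 3. unlock

	r := expensive(seen) // 4. lengthy operation, done without holding the lock

	c.mu.Lock()         // 5. lock
	defer c.mu.Unlock() // 8. unlock (on return)
	if c.version != seen {
		return false // 6. "something" changed in the meantime: discard
	}
	c.result = r // 7. update data using the result of step 4
	return true
}

func main() {
	c := &cache{version: 1}
	ok := c.refresh()
	fmt.Println(ok, c.result) // prints: true result for v1
}
```
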

However, there are some performance advantages. First, if many threads block on mutex(es) then thread-switching overhead becomes important. Even though the time for thread-switching is only measured in microseconds, if you have thousands of threads it can all add up. Of course, using go-routines lessens this effect, since Go multiplexes them onto a small number of threads, but it may still be significant.

Further, I think Go tries to run go-routines on the same core each time, which means that a confining go-routine may be better at maintaining the CPU cache which could have large performance benefits.


Active Object

A useful refinement of the confining go-routine is something that goes by many names but possibly the most common is the Active Object Concurrency Pattern. It is often used in server software involving a lot of simultaneous connections where the overhead of using a thread for every connection is too onerous. I first encountered this with Boost ASIO - the excellent C++ asynchronous I/O library. (Thanks Leon for introducing me to this and explaining it.)

However the code for Boost ASIO is complex, since it needs to create its own light-weight "co-routines" (called strands) to multiplex use of threads.  I wanted to do something similar in Go and I was amazed to find no advice on how to do this. It should be much simpler since Go provides all the requisite parts: go-routines (rather like strands), and channels of closures.

Active Object in Go can be implemented by a go-routine that reads closures from a channel (chan func() ) and executes them. This simple system means that all the closures, containing the code that accesses the data, are run on the same go-routine in the order they are posted to the channel. I guess the best way for you to understand this is with an example.

My example uses the quintessential example of concurrent coding - the bank account. First, we look at a race condition and how to fix it with a mutex, then using an Active Object. Of course, there are a few complications and things to be aware of, which I also explain and demonstrate in the code below.

Note that later code examples make heavy use of anonymous functions (closures), even nesting them. If you are unfamiliar with how they work you may need to read up on them first.


Race Example

Here is the code for a poorly-featured bank account that only allows deposits. Note that I could have written the Deposit() function more simply in one line (ac.bal += amt), but the code below is designed to trigger the race condition, which is there anyway, but the delay caused by the sleep should expose it. (This is one of the biggest problems with race conditions - they may be lurking but invisible - which is why you should get into the habit of using the Go Race Detector.)


type (
  Money int64 // cents
  Account struct {
    bal Money
  }
)

const Dollars Money = 100  // 100 cents to the dollar

// NewAccount creates a new account with bonus $100.
func NewAccount() *Account {
  return &Account{bal: 100 * Dollars}
}

// Deposit adds money to an account.
func (ac *Account) Deposit(amt Money) {
  current := ac.bal
  time.Sleep(1*time.Millisecond)
  ac.bal = current + amt
}

func (ac Account) Balance() Money { return ac.bal }


Now let's do a few concurrent deposits. Note that I tested all this code in a single (main) package. If you want to try it you could move all the "account" code to a separate package (eg bank), but then you need to call the function to create a new account as bank.NewAccount().


  ac := NewAccount()
  go ac.Deposit(1000 * Dollars)
  go ac.Deposit(200 * Dollars)
  time.Sleep(100*time.Millisecond)
  fmt.Printf("Balance: $%.2f\n", float64(ac.Balance())/100)


If you run the above code you will be disappointed to find that one of the deposits has gone missing. The deposits are run on separate go-routines, causing a race condition on bal.


Mutex Example

Luckily, this is easily fixed using a mutex to protect concurrent access to bal. We add a mutex to every account since if we had just one mutex for all accounts that would create a contention problem if many accounts were being updated at the same time.


type (
  Account struct {
    bal Money
    mutex sync.Mutex
  }
)

// Deposit adds money to an account.
func (ac *Account) Deposit(amt Money) {
  ac.mutex.Lock()
  defer ac.mutex.Unlock()

  current := ac.bal
  time.Sleep(1*time.Millisecond)
  ac.bal = current + amt
}

// Balance returns funds available.
func (ac *Account) Balance() Money {
  ac.mutex.Lock()
  defer ac.mutex.Unlock()

  return ac.bal
}


Note that Balance() now takes a pointer receiver, otherwise we would only be locking a copy of the mutex. We have to lock the mutex in the Balance() function even though it only reads from the value since there can be concurrent write operations. (If there are lots of reads and very few writes then a sync.RWMutex may be better than a sync.Mutex but that is another story.)


Active Object Example

OK that avoids the race condition by using a mutex, but how do we do this using the Active Object pattern? First, instead of a mutex we use a channel of functions. We also need to start a go-routine for each account in the NewAccount() function, which reads from the channel and runs the functions. Finally, instead of updating ac.bal directly in the Deposit() function we wrap the code in a closure (lambda function) and post this closure onto the account channel so that the account's go-routine will process it when it gets a chance.


type (
  Account struct {
    bal Money
    ch chan<- func()
  }
)

func NewAccount() *Account {
  ch := make(chan func())
  go func() {
    for f := range ch { f() }
  }()
  return &Account{bal: 100*Dollars, ch: ch}
}

// Deposit adds money to an account.
func (ac *Account) Deposit(amt Money) {
  ac.ch <- func() {
    current := ac.bal
    time.Sleep(1*time.Millisecond)
    ac.bal = current + amt
  }
}


Note that the unnamed function created in Deposit() and posted onto the account's channel is a closure (or lambda). Closures have the useful ability to capture variables from their enclosing scope (in this case ac and amt, which is how the closure reaches ac.bal).

Moreover if you make ch into a buffered channel, then the account can handle multiple concurrent deposits without ever blocking the calls to Deposit(). This means that transient spikes in activity on the account will be handled smoothly. Of course, a sustained onslaught of deposits, sent faster than they can be processed will eventually cause blocking when the channel buffer becomes full.
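
To see the effect of buffering in isolation, here is a small sketch, separate from the account code, where three closures are queued on a buffered channel before anything is reading from it. The capacity of 3 and the function name queueThenDrain() are arbitrary choices for this illustration.

```go
package main

import "fmt"

// queueThenDrain posts three "deposits" before any go-routine is reading
// the channel, then starts a reader to run them. It returns how many
// closures were queued and the final balance.
func queueThenDrain() (queued int, bal int) {
	ch := make(chan func(), 3) // buffered: room for 3 pending closures

	for i := 1; i <= 3; i++ {
		amt := i * 100
		ch <- func() { bal += amt } // does not block while the buffer has room
	}
	queued = len(ch)
	close(ch)

	done := make(chan struct{})
	go func() {
		for f := range ch { // runs the closures in the order posted
			f()
		}
		close(done)
	}()
	<-done
	return queued, bal
}

func main() {
	queued, bal := queueThenDrain()
	fmt.Println(queued, bal) // prints: 3 600
}
```

With an unbuffered channel the first send would block until a reader was ready; a fourth send here would block in the same way.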


Returning Values

You may have noticed that the above code does not include a Balance() method. Before showing the code for Balance(), I need to explain how to "return" a value; because the closures are invoked asynchronously you can't simply use a function that returns a value. Even for methods that only update something we may want to return an error to indicate that something went wrong.

So how do we do it? We simply pass in a callback function (probably a closure) that is called when the operation completes (or fails with an error).

In the following code I have implemented the Balance() method but I have also replaced the Deposit() method with Add() since we are going to use it for withdrawals (allowing for negative amounts) too. Withdrawals may generate an error if there are insufficient funds in the account, so we pass a callback which can "return" an error.


// Add transfers money to/from an account.
func (ac *Account) Add(amt Money, callback func(error)) {
  ac.ch <- func() {
    if ac.bal + amt < 0 {
      callback(fmt.Errorf("insuff. funds %v for w/d %v",
                          ac.bal, amt))
      return
    }
    ac.bal += amt
    callback(nil)   // successful transfer
  }
}

// Balance provides funds available.
func (ac *Account) Balance(callback func(Money)) {
  ac.ch <- func() {
    callback(ac.bal)
  }
}


Now here is some code that makes two deposits and attempts a very large withdrawal. Notice that for the deposits we provide a callback (closure) that does nothing - passing a +ve amount means the operation cannot fail so we ignore the possibility of an error. For the withdrawal, we check if there was an error and just print it out.


  ac := NewAccount()
  ac.Add(1000 * Dollars, func(error) {} )
  ac.Add(200 * Dollars, func(error) {} )
  ac.Add(-1e6 * Dollars, func(err error) {
    if err != nil { fmt.Println(err) }
  })
  ac.Balance(func(bal Money) {
    fmt.Printf("Balance: $%v\n", bal/100)
  })
  time.Sleep(100*time.Millisecond)


The first thing you may have noticed is that we don't have the keyword go before the call to ac.Add(), as we did above for ac.Deposit(). This is not necessary as most of the Add() function's code has been made asynchronous anyway. That is, the actual work is done in a closure posted onto the account's channel (for execution by the account's go-routine) allowing Add() to return almost immediately.

Notice also the call to Sleep() in the final line of code which is simply there to prevent the program exiting immediately. If you run the above in a main() function you may not see any messages. When main() returns the program ends and all active go-routines are silently terminated.  So the calls to Println(), executed in the account's go-routine may not get a chance to execute. Later I will look at how to wait for all pending operations on an account to complete.
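
In the meantime, one simple way to avoid the Sleep() for a single request is to have the callback hand the result back over a channel that the caller waits on. The sketch below is only one possibility: BalanceSync() is a hypothetical helper, not something the code above defines, and it must never be called from the account's own go-routine or it will deadlock.

```go
package main

import "fmt"

type Money int64 // cents

type Account struct {
	bal Money
	ch  chan<- func()
}

func NewAccount() *Account {
	ch := make(chan func())
	go func() {
		for f := range ch {
			f()
		}
	}()
	return &Account{bal: 10000, ch: ch}
}

// Add transfers money to/from an account.
func (ac *Account) Add(amt Money, callback func(error)) {
	ac.ch <- func() {
		if ac.bal+amt < 0 {
			callback(fmt.Errorf("insuff. funds %v for w/d %v", ac.bal, -amt))
			return
		}
		ac.bal += amt
		callback(nil)
	}
}

// Balance provides funds available.
func (ac *Account) Balance(callback func(Money)) {
	ac.ch <- func() { callback(ac.bal) }
}

// BalanceSync (hypothetical) blocks the caller until the account's
// go-routine has reported the balance.
func (ac *Account) BalanceSync() Money {
	result := make(chan Money, 1) // buffered so the callback never blocks
	ac.Balance(func(bal Money) { result <- bal })
	return <-result
}

func main() {
	ac := NewAccount()
	ac.Add(20000, func(error) {})
	fmt.Println(ac.BalanceSync()) // prints 30000
}
```

Because requests are executed in the order they are posted, the balance reported already includes the earlier deposit.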

A crucial thing to remember here is that the callbacks are run on the account's go-routine. Keep this in mind, since it is very easy to accidentally access, from within the callback, a variable that belongs to the caller's go-routine. If you need to send information back to the posting go-routine the callback can post to another Active Object channel as we will see.

Another important thing to remember is that if you perform a lengthy operation, or an operation that may block, in the callback then you will delay other operations on the account or even cause deadlock. In the example code above, I only call fmt.Printf() inside the callback but even that may be too much for a server that is handling hundreds of thousands of requests per second.


Transfers between Accounts

We have basic account features but more advanced features can introduce pitfalls to be aware of.  Here is the code for a method to transfer funds between accounts.


// WARNING: This code has problems

// TransferTo moves funds between accounts.
func (ac *Account) TransferTo(to *Account, amt Money,  
                              callback func(error)) {
   ac.ch <- func() {
    if amt > ac.bal {
      callback(fmt.Errorf("Insuff. funds %v for tfr %v",
                          ac.bal, amt))
      return
    }
    ac.bal -= amt
    to.Add(amt, callback)
  }
}


To understand what is happening here you need to remember that each account has its own go-routine. The code inside the above closure is executed on the "from" account's go-routine, the end of which calls to.Add() which posts to the "to" account's go-routine. The third parameter (callback) to TransferTo() is effectively a pointer to a function that is "captured" in the closure and passed on to to.Add() whence it is again captured and called to process the result of the final Add().

However, there are two problems with this code. First, a negative amt moves funds out of the "to" account, but ac is credited before anything checks that "to" can cover the withdrawal (ie we need to check that -amt <= to.bal). The second problem is due to possible deadlocks - eg if two transfers are done simultaneously in opposite directions then each account may block the other - but we'll address that problem later.

How would we fix the first problem? My first thought was something like this:


// WARNING: This code has worse problems

// TransferTo moves funds between accounts.
func (ac *Account) TransferTo(to *Account, amt Money,  
                              callback func(error)) {
   ac.ch <- func() {
    if amt > ac.bal {
      callback(fmt.Errorf("Insuff. funds %v for tfr %v",
                          ac.bal, amt))
      return
    } else if amt < 0 && -amt > to.bal {
      callback(fmt.Errorf("Insuff. funds %v for tfr %v",
                          to.bal, -amt))
      return
    }
    ac.bal -= amt
    to.Add(amt, callback)
  }
}


Can you see a problem here? If not, think about which go-routine is used to run the above code.  All the code inside the closure (including the new else-if branch) runs on the ac account's go-routine, but it accesses to.bal, access to which should be confined to the to account's go-routine. (Remember that each account has its own go-routine which is the only place where that account's bal should be used.)

A better approach is to turn a negative transfer around, so that it becomes a positive transfer performed by the to account on its own go-routine:


// WARNING: There is still a problem

// TransferTo moves funds between accounts.
func (ac *Account) TransferTo(to *Account, amt Money,  
                              callback func(error)) {
   ac.ch <- func() {
    if amt < 0 {
      to.TransferTo(ac, -amt, callback)
      return
    }
    if amt > ac.bal {
      callback(fmt.Errorf("Insuff. funds %v for tfr %v",
                          ac.bal, amt))
      return
    }
    ac.bal -= amt
    to.Add(amt, callback)
  }
}


This fixes the problem with accessing to.bal on the wrong go-routine but as I mentioned before there is also a deadlock problem.


Deadlock

The first thing to note is that a channel in Go has a fixed size; this means that any call to Add(), Balance() or TransferTo() will block if the channel is full. If other concurrent requests can be posted to the accounts then the "to" account may be blocked waiting for the ac.TransferTo() request to be posted which then blocks the "ac" account in the call to to.Add(). This causes a mutual deadlock between the two accounts.  A simpler scenario is where an account posts to its own channel causing itself to deadlock.
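
The self-posting case can be demonstrated without actually hanging the program by using a select with a default case to detect that the send cannot proceed. This is a sketch on a bare channel of closures rather than the Account type, and selfPostWouldBlock() is a name invented for the illustration.

```go
package main

import "fmt"

// selfPostWouldBlock reports whether a closure running on an active
// object's go-routine would block when posting back to the object's own
// (unbuffered) channel.
func selfPostWouldBlock() bool {
	ch := make(chan func())
	go func() {
		for f := range ch {
			f()
		}
	}()

	result := make(chan bool, 1)
	ch <- func() {
		// We are now running on the only go-routine that reads ch, so
		// nobody is available to receive what we send.
		select {
		case ch <- func() {}:
			result <- false
		default:
			result <- true // a plain send here would block forever
		}
	}
	return <-result
}

func main() {
	fmt.Println(selfPostWouldBlock()) // prints true
}
```

A buffered channel only postpones the problem: the same send blocks as soon as the buffer is full.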

A solution to these problems is just to fire up another go-routine, like this:


// TransferTo moves funds between accounts.
func (ac *Account) TransferTo(to *Account, amt Money,  
                              callback func(error)) {
   ac.ch <- func() {
    if amt < 0 {
      go to.TransferTo(ac, -amt, callback)
      return
    }
    if amt > ac.bal {
      callback(fmt.Errorf("Insuff. funds %v for tfr %v",
                          ac.bal, amt))
      return
    }
    ac.bal -= amt
    go to.Add(amt, callback)
  }
}


This prevents TransferTo() from blocking and avoids the deadlock. The disadvantage is that we have no guarantees about when the request will be posted if the account channel is overloaded. In this case it may mean there is a delay between the "ac" account being debited and the "to" account being credited. In this example it is not a problem since the funds will eventually be transferred.

A solution that preserves the order of posted requests is to have two channels: one for external requests and one for priority requests (see priChan below) posted by an account to itself or to another account.


type (
  Account struct {
    bal Money
    pubChan, priChan chan<- func()
  }
)

func NewAccount() *Account {
  pub := make(chan func(), 2)
  pri := make(chan func(), 20)  // "private" chan

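  go func() {
    for {
      if len(pri) > 0 {
        // priority requests are always run first
        f := <- pri
        f()
      } else {
        // nothing urgent: wait for a request on either channel
        select {
        case f := <- pri:
          f()
        case f := <- pub:
          f()
        }
      }
    }
  }()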
  return &Account{
    bal: 100*Dollars, 
    pubChan: pub, 
    priChan: pri,
  }
}


The idea is not to have any "circular" posts - that is any closures posted to a channel should never end up posting back to the same channel. In this way deadlock is not possible.
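
To show how the two channels might be used together, here is a sketch of a transfer in which external requests go through pubChan and the credit to the receiving account goes through priChan. The credit() method, the opening parameter of NewAccount() and the rejection of negative amounts are simplifications assumed for this sketch; they are not taken from the code above.

```go
package main

import (
	"errors"
	"fmt"
)

type Money int64 // cents

type Account struct {
	bal              Money
	pubChan, priChan chan<- func()
}

// NewAccount takes an opening balance (a change made for this sketch).
func NewAccount(opening Money) *Account {
	pub := make(chan func(), 2)
	pri := make(chan func(), 20)
	go func() {
		for {
			if len(pri) > 0 { // priority requests are run first
				f := <-pri
				f()
				continue
			}
			select {
			case f := <-pri:
				f()
			case f := <-pub:
				f()
			}
		}
	}()
	return &Account{bal: opening, pubChan: pub, priChan: pri}
}

// credit (hypothetical) is only posted by other accounts, always on the
// private channel, and its closure never posts to any channel itself.
func (ac *Account) credit(amt Money, callback func(error)) {
	ac.priChan <- func() {
		ac.bal += amt
		callback(nil)
	}
}

// TransferTo is an external request, so it is posted on the public channel.
func (ac *Account) TransferTo(to *Account, amt Money, callback func(error)) {
	ac.pubChan <- func() {
		if amt < 0 || amt > ac.bal {
			callback(errors.New("transfer rejected"))
			return
		}
		ac.bal -= amt
		to.credit(amt, callback)
	}
}

// Balance provides funds available.
func (ac *Account) Balance(callback func(Money)) {
	ac.pubChan <- func() { callback(ac.bal) }
}

func main() {
	a, b := NewAccount(10000), NewAccount(10000)
	done := make(chan error, 1)
	a.TransferTo(b, 2500, func(err error) { done <- err })
	fmt.Println("transfer error:", <-done) // prints: transfer error: <nil>

	bal := make(chan Money, 1)
	b.Balance(func(m Money) { bal <- m })
	fmt.Println("b balance:", <-bal) // prints: b balance: 12500
}
```

Closures on the public channel may post to a private channel, but closures on a private channel post nowhere, so there are no circular posts. The private channel can still fill up under load, in which case the posting account waits until the receiving account catches up.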


Conclusion

I hope I have demonstrated how easy it is to use the Active Object Concurrency Pattern in Go. As long as you understand how it works and are aware of the pitfalls it provides a simpler, and possibly more efficient, solution than using mutexes.

One pitfall is that, even though there is no visible locking, it is easy to create a deadlock if an Active Object's method posts (directly or indirectly) back to its own channel, since channels have a fixed size once created. But this can be avoided as discussed above.

One thing that is very easy to do in Go is accidentally access a confined variable from the wrong go-routine. In another language like C it would be easy (though not portable) to use an assertion to verify that code is running on the right thread. Unfortunately, Go does not provide any identifier for go-routines (for arguably good reasons), but this hinders attempts to ensure that the code behaves correctly. Luckily there are (deprecated) ways to determine a go-routine's "identity", which I will explore and report on in a later post.

Also I have not explored how to wait for pending asynchronous operations to complete as I promised above. This post is long enough so we will look at that next time.

Sunday, 22 October 2017

Improving Go Error Handling

Last time I mentioned that I had a way to improve error-handling in Go but I didn't get into the details. The idea is to use the compiler to eliminate all the boilerplate error-handling code but without the problems of full-blown templates.

But first, let's try to understand why Go is designed as it is. An obvious and major influence was experience with C, so I will look at my experience with error-handling in C. (If you are not familiar with C then you may be inclined to skip the following section but please at least skim it.)

C Error-Handling


Generally, error-handling in C is poor to non-existent. This has probably caused more aggravation for users than any other software deficiency of the last few decades. That's not to say that C programmers are of a lower standard (probably the contrary :), just that most of the commonly used software was written in C (and C++) and these languages are less forgiving if you do not know what you are doing.

Here I share some of my experience with C, which is typical.

I started programming in C in the early 1980's and one of the biggest tediums (tedia ?) was having to write masses of error-handling code, and making sure the code worked. As an example, consider this code to copy a file. (I know there are simpler ways to copy a file but this sort of code was like a lot of C code I was writing at the time.)

bool copy_file(const char *in_name, const char *out_name) {
  FILE *fin, *fout;
  const size_t BUF_SIZE = 1024;
  char * buf;
  size_t count;

  if ((fin = fopen(in_name, "r")) == NULL)
  {
    return false;
  }
  if ((fout = fopen(out_name, "w")) == NULL)
  {
    fclose(fin);
    return false;
  }
  if ((buf = malloc(BUF_SIZE)) == NULL)
  {
    fclose(fin);
    fclose(fout);
    return false;
  }

  for (;;)
  {
    if ((count = fread(buf, 1, BUF_SIZE, fin)) < BUF_SIZE)
    {                                // WARNING: see below
      if (feof(fin))
        break;

      fclose(fin);
      fclose(fout);
      free(buf);

      return false;
    }

    if (fwrite(buf, 1, count, fout) < count)
    {
      fclose(fin);
      fclose(fout);
      free(buf);
      return false;
    }
  }

  fclose(fin);
  fclose(fout);
  free(buf);

  return true;
}
Listing 1. C function to copy a file.


An obvious problem with Listing 1 is that the large amount of error-handling tends to obscure the essence of the code.  (Compare it with the same code without error-handling, in Listing 3 below.)  This sort of code is hard to write, hard to read and hard to modify.  Further, it is difficult to verify that error-handling code is correct; in fact most C error-handling code is never tested and causes all sorts of chaos in the field when the actual errors do occur.

WARNING: I should point out that Listing 1 (and Listing 2), despite attempts at thorough error-handling, have at least two problems.  Can you spot them? Furthermore, production code should have more informative error handling - such as trying to diagnose, and inform the user, why different errors occurred. For example, if the input file could not be opened was that because it did not exist, was not accessible or some other reason?

Part of the complexity of Listing 1 is due to all the fclose/free/return statements in the error-handling which are repetitive and error-prone (remember DRY). It would be quite easy to forget a call to free()  and cause a memory leak. In fact the code I would typically write (if the coding standards in effect allowed use of goto :) would be more like this:

bool copy_file(const char *in_name, const char *out_name)
{
  bool retval = false;
  FILE *fin = NULL, *fout = NULL;
  const size_t BUF_SIZE = 1024;
  char * buf = NULL;
  size_t count;

  if ((fin = fopen(in_name, "r")) == NULL)
    goto handle_error;
  if ((fout = fopen(out_name, "w")) == NULL)
    goto handle_error;
  if ((buf = malloc(BUF_SIZE)) == NULL)
    goto handle_error;

  for (;;)
  {
    if ((count = fread(buf, 1, BUF_SIZE, fin)) < BUF_SIZE)
    {
      if (feof(fin))
        break;
      else
        goto handle_error;
    }
    if (fwrite(buf, 1, count, fout) < count)
      goto handle_error;
  }
  retval = true;  // indicate success

handle_error:
  if (fin != NULL) fclose(fin);
  if (fout != NULL) fclose(fout);
  if (buf != NULL) free(buf);

  return retval;
}
Listing 2. Using goto to avoid repeated code.

Note that the Go language neatly addresses this problem with the defer statement as I mention later.

Now look at the same function (Listing 3) without any error-handling code.

void copy_file(const char *in_name, const char *out_name)
{
  FILE *fin = fopen(in_name, "r");
  FILE *fout = fopen(out_name, "w");
  const size_t BUF_SIZE = 1024;
  char *buf = malloc(BUF_SIZE);
  size_t count;

  while ((count = fread(buf, 1, BUF_SIZE, fin)) > 0)
    fwrite(buf, 1, count, fout);

  free(buf);
  fclose(fout);
  fclose(fin);
}
Listing 3. No error handling.

This is plainly much simpler than the previous versions (and fixes the major bug). This is why many examples you see in textbooks omit the error-handling code to make it easier to understand.

Unfortunately, a lot of production code is actually written like this!  Moreover, even more (95% or more) has inadequate error-handling to some extent.  Why is that?

  1. Blind following of example code from textbooks, as I just mentioned.
  2. Lack of awareness that special values indicate errors, due to C's use of "in-band" signalling. I discuss how Go addresses this below.
  3. Lack of awareness that some functions even return errors. For example, in Listing 1, the final fclose(fout) may return an error. (If buffered data can't be written to disk for some reason then fclose() will return an error.)  Unfortunately, Go does not really address this problem.
  4. The attitude (laziness?) of many C programmers. For example, many security threats have been caused by buffer overflow problems. Most C programmers are aware of the dangers of strcpy() but neglect using safer functions like strncpy().
  5. Poor reasoning. Sometimes you can ignore errors but often there may be subtleties of which you are unaware. For example, I have seen a lot of software that assumes you can write a file in the current directory, which is not always true.
  6. Code changes made without full understanding of the existing code.
  7. Finally, occasionally, despite the best of intentions, errors are ignored by accident or oversight.

How Go Improves on C

Go has several facilities, such as the error type and multiple return values, that make error-handling simpler and safer than in C. (To be honest the standard C library's error-handling strategy, especially use of the global errno variable, could not be much worse.)

The error type

Go eschews the common C error-handling pattern of "in-band signalling". That is, in C, when a function returns a value of a certain type it reserves a special value of that type to indicate a failure. When returning an integer, -1 is often used, or when returning a pointer, NULL is used. You can see this above in the calls to fopen() and malloc() which can return NULL pointers. A different example in the above code is fwrite(), which indicates an error by returning a written count less than the requested count.

There are a few problems with in-band signaling:

  1. sometimes there is no spare value that can be used as the error value
  2. you often want to return more information than just that something went wrong
  3. it is easy to ignore the error return value and continue blithely
  4. it may not even occur to the uninitiated that there is a special error return value

An example of problem (1) can be seen in the code above - fread() will return zero on error to indicate that nothing could be read, but zero is also returned when you attempt to read at the end of file, which, in general, is not an error condition. One way this is handled in the C standard library is to check errno after the call (remembering to ensure it is zero before the call); in the case of fread() you need to make a further call to ferror() or feof() to distinguish between an error and reading at EOF.

(2) The C standard library uses the global errno variable to indicate more about the nature of the error.  However, this is a poor strategy which has been discussed at length elsewhere.

(3) and (4) are common in C and the cause of countless bugs.

Go addresses these problems using the error type (and allowing functions to have multiple return values - see below). A function that may encounter an error returns a value of type error as well as its normal return value(s). If the error value returned is not nil then it indicates there was an error and the value can be further inspected to determine the exact nature of the problem.
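
As a minimal sketch of that pattern, the following uses the real strconv and errors packages. The names parsePort and describe are hypothetical helpers invented for this illustration. The function returns its value and an error side by side, and the caller inspects the error to find out what actually went wrong:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper: it returns the parsed value and an
// error, rather than reserving a special "in-band" value such as -1.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		// %w wraps the original error so callers can still inspect it.
		return 0, fmt.Errorf("parsePort(%q): %w", s, err)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("parsePort(%q): not a valid port", s)
	}
	return n, nil
}

// describe reports what kind of failure (if any) an error represents.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, strconv.ErrSyntax):
		return "not a number"
	case errors.Is(err, strconv.ErrRange):
		return "number too large"
	default:
		return "other error"
	}
}

func main() {
	for _, s := range []string{"8080", "http", "99999999999999999999", "0"} {
		port, err := parsePort(s)
		fmt.Println(s, port, describe(err))
	}
}
```

Note that zero is a perfectly usable return value here because the error travels separately, which is exactly what problem (1) rules out in C.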

Defer

Although it is rarely mentioned when talking about error-handling in Go, understanding how to use the defer statement is critical.

First, using defer avoids lots of repetitive cleanup code (as seen in Listing 1 above).  Moreover, I believe it greatly reduces the chances of accidentally forgetting cleanup code, and makes it easier to visually inspect code to check that cleanup is done.  I have seen countless bugs in C code (and even created a few myself early on in my career) where some resource is allocated (eg file opened, resource handle allocated, mutex locked, etc) but then never released/closed/freed/unlocked, causing a resource leak or worse.

In C it is easy to forget to free something because the allocate and free functions are necessarily called at different places (classic example of the DIRE principle). The problem is also often due to complex control flows or later code changes such as someone adding an early return from a function. Using defer in Go to free a resource immediately after it has been (successfully) allocated very neatly avoids these problems.
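
A minimal sketch of this, assuming a hypothetical account type protected by a mutex (the type and its methods are invented for illustration). The unlock is deferred on the line right after the lock, so the early returns cannot leak it:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// account is a hypothetical type used only for this illustration.
type account struct {
	mu      sync.Mutex
	balance int
}

var errInsufficient = errors.New("insufficient funds")

// withdraw locks the mutex and defers the unlock on the very next line, so
// every return path below releases the lock.
func (a *account) withdraw(amount int) error {
	a.mu.Lock()
	defer a.mu.Unlock()

	if amount <= 0 {
		return errors.New("invalid amount") // early return: lock still released
	}
	if amount > a.balance {
		return errInsufficient // another early return: lock still released
	}
	a.balance -= amount
	return nil
}

func main() {
	a := &account{balance: 100}
	fmt.Println(a.withdraw(500)) // insufficient funds
	fmt.Println(a.withdraw(30))  // <nil>
	fmt.Println(a.balance)       // 70
}
```

If the first, failing call had left the mutex locked, the second call would block forever; adding another early return later does not change that guarantee.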

Though the creators of Go may deny it, I think that defer is inspired by C++ destructors (which inspired with/using statements of other languages).  As well as being called in normal return circumstances, destructors are called when an exception is thrown in C++ (during stack-unwinding); similarly defer statements are called when the code panics.
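
The following sketch illustrates the claim that deferred calls run both on a normal return and when the code panics. The function run and its log of events are made up for the example; recover is used only so that the demonstration itself can finish normally:

```go
package main

import "fmt"

// run records the order of events; the names are illustrative only.
func run(fail bool) (log []string) {
	defer func() {
		// recover stops the panic so that this example can return normally.
		if r := recover(); r != nil {
			log = append(log, fmt.Sprint("recovered: ", r))
		}
	}()
	// This deferred call runs on a normal return AND during a panic.
	defer func() { log = append(log, "cleanup") }()

	log = append(log, "start")
	if fail {
		panic("boom")
	}
	log = append(log, "finish")
	return log
}

func main() {
	fmt.Println(run(false)) // [start finish cleanup]
	fmt.Println(run(true))  // [start cleanup recovered: boom]
}
```

Deferred calls run in reverse order of registration, which is why "cleanup" appears before the recovery message.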

The addition of defer to Go is especially useful for error-handling but there are a few pitfalls for the unwary:

• Deferred functions are called at the end of the function not the enclosing block.
          (I expected behavior like C++ destructors which are called at the end of the block.)
• Only defer freeing of a resource after checking that it was successfully allocated.
• It’s common to see a deferred Close such as this:

if file, err = os.Open(fileName); err != nil { return err }
defer file.Close()


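The problem with this code is that it ignores the error-return value from Close(). Generally, it should be written like this (assuming the enclosing function has a named error result called err, since a deferred function cannot itself return a value to the caller):

if file, err = os.Open(fileName); err != nil { return err }
defer func() {
    // report a failed Close unless an earlier error is already being returned
    if cerr := file.Close(); cerr != nil && err == nil { err = cerr }
}()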
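
The first pitfall in the list above (deferred calls run at the end of the function, not the enclosing block) can be seen in this small sketch. The events slice and both function names are invented for the illustration. Wrapping the block in a function literal is one way to get block-like behavior:

```go
package main

import "fmt"

// events records what happened, in order (illustrative only).
var events []string

// blockScoped shows the pitfall: the deferred call is registered inside the
// block but only runs when the whole function returns.
func blockScoped() {
	{
		events = append(events, "open")
		defer func() { events = append(events, "close") }()
	}
	events = append(events, "after block")
}

// funcScoped wraps the block in a function literal, so the deferred call
// runs when that inner function returns.
func funcScoped() {
	func() {
		events = append(events, "open")
		defer func() { events = append(events, "close") }()
	}()
	events = append(events, "after block")
}

func main() {
	blockScoped()
	fmt.Println(events) // [open after block close]
	events = nil
	funcScoped()
	fmt.Println(events) // [open close after block]
}
```

The same applies to a defer inside a loop body: every iteration's deferred call waits until the surrounding function returns.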


Multiple Return Values

A function in Go can return more than one value. I believe that a reason, probably the main reason, this was added to the language is to allow a function to return a value and an error condition without resorting to C's in-band signalling due to the problems discussed above (especially problem 3).

Go forces you to explicitly say you are ignoring an error by using the blank identifier (a single underscore).  For example:

i, _ := strconv.Atoi(str)

which converts a string into an integer. In this case if there is an error during the conversion then it is ignored, and i retains the zero value. However, there are some problems with this system.

First, even if you know an error will not occur, you cannot use the return value in an expression.  I often want to use the return value of strconv.Atoi() in an expression knowing that the string represents a valid integer (or just being happy with a zero value, if not).  It is tedious and error-prone to have to assign the return value to a temporary variable, which is why I usually wrap Atoi() in my own function which returns a single value.
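
A wrapper of the kind described might look like the sketch below. The name atoiOrZero is hypothetical; only strconv.Atoi is a real library function:

```go
package main

import (
	"fmt"
	"strconv"
)

// atoiOrZero is a hypothetical single-value wrapper around strconv.Atoi.
// It returns 0 for any input that Atoi rejects.
func atoiOrZero(s string) int {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0
	}
	return n
}

func main() {
	// The wrapper can be used directly inside an expression.
	total := atoiOrZero("40") + atoiOrZero("2") + atoiOrZero("oops")
	fmt.Println(total) // 42
}
```

The explicit "return 0" is deliberate: for a number that is out of range, the underlying parser reports an error but returns the largest representable value rather than zero, so simply discarding the error would not always give zero.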

A bigger problem is that you can easily ignore an error return value when you are not interested in any of the returned values. It is all too common to see Go code that ignores the error return value of functions like os.File.Close().

For example, this generates a compile error:

file := os.Open(fileName)   // compile error: multiple-value in single-value context

If you know the error will not occur, in a particular circumstance, then you can explicitly state that you do not want to use the error-return value like this:

file, _ := os.Open(fileName)

However, you can call Close() like this:

file.Close()         // compiles OK!!

whereas having to do something like this would be preferable (if we know that Close() cannot return an error):

_ = file.Close()   // ignore error from Close()


A Go Example

As an example of how Go error-handling compares with C, here is the same function as in Listing 1 but written in Go (or a language like Go), using defer and multiple return values.  Note that this is not real Go code as the standard file handling functions are different and Go has no need for anything like malloc().

func copy_file(in_name string, out_name string) error {
  var err error
  var fin, fout *file
  if fin, err = fopen(in_name, "r"); err != nil {
    return err
  }
  defer fclose(fin)
  if fout, err = fopen(out_name, "w"); err != nil {
    return err
  }
  defer fclose(fout)

  var buf []byte
  if buf, err = malloc(1024); err != nil {
    return err
  }
  defer free(buf)

  for {
    // eof is declared outside the if statement so it is still in scope below
    var eof bool
    if eof, err = fread(buf, fin); err != nil {
      return err
    }
    if eof {
      break
    }
    if err = fwrite(buf, fout); err != nil {
      return err
    }
  }
  return nil
}
Listing 4. Error handling in a "Go"-like language

Comparing Listing 4 with Listing 1 you can see that there are some improvements, but it is still far from ideal when you compare it with Listing 3. You might argue that the only way to get anything like Listing 3 is to have exceptions added to the language - I agree that exceptions have advantages, but they also have negatives - so we take a slight detour to consider why Go does not have exceptions.

Exceptions

There are two major advantages with exceptions, plus a disadvantage.

Advantages

A. If you search on the Internet for the advantages of exceptions you find all sorts of things mentioned (eg the first hit at Google I get is to the Java documentation on exceptions). What they fail to mention is the most important one - exceptions simply can't be (accidentally or intentionally) ignored! This was the first thing that struck me when I first read of exceptions in Stroustrup's The C++ Programming Language, 2nd Edition as I had spent many painful years tracking down bugs of this nature in C code* - if an error occurs you throw an exception and the program stops, unless steps are taken to catch it.

* Note that nowadays these sorts of C bugs are sometimes detected in some way and the software terminated by the operating system, but when I started using C on operating systems without hardware memory protection (eg MSDOS, AmigaDOS, etc) an ignored error could cause mayhem: the program might behave erratically, or even corrupt its own data and save it to disk. It might also trash other running software, or bring down the OS!

B. Of course, the other major advantage of exceptions is reduced complexity (see Listing 3). The error-processing code does not obscure the "normal" control flow.  This makes the software more understandable and even easier to get right. And of course, it relieves the tedium of writing lots of similar, uninteresting code.

Disadvantages

From the above advantages you can see that exceptions are great to (A) ensure that errors are not ignored and (B) to make "normal" code easier to write and understand.  In other words they are great when:

A. exceptions are thrown and not caught (error causes program to terminate)
B. exceptions are never thrown (no error encountered)

The real problems occur when:

C. exceptions are thrown and caught

The reason is that exceptions are often thrown when you are not expecting it. At the point they are caught it is easy for the software to be left in an inconsistent state causing memory/resource leaks and even more serious bugs. In fact there are many examples of simple, seemingly innocuous, C++ code where an exception causes that most heinous of coding-crimes: undefined behavior.

This is not too serious if exceptions are used properly and sparingly but the last decade has revealed a new problem - the gross overuse and misuse of exceptions for normal control flow as seen in a lot of Java code.

To elaborate, you can write safe code using exceptions but it is hard. The trouble is you have to always be thinking about what exceptions could be thrown in addition to the actual problem that you are trying to solve. And the human brain is not good at multi-tasking.

Furthermore, exceptions and concurrency do not mix well. (Eg: see the section on Mismatch With Parallel Programming in Exception Handling Considered Harmful).  Go makes writing concurrent code easy, so it seems better to avoid exceptions in the language.

Go

So Go does not have exceptions due to their major problem (C above). Go attempts in other ways to obtain their benefits (A and B), but does not do so effectively. My proposal is to enhance the Go compiler's support for error handling to obtain the benefits that exceptions give in cases A and B.

The Proposal

My proposal relies on the compiler generating some hidden code. (Alternatively, this could be done by some sort of preprocessing of Go code.)

In summary, I propose these changes to Go:

  • If a function is called which returns one or more values, the last of which is of type error, and
  • if that last returned (error) value of the function is not assigned or used AND
  • the calling function also has a (last) error-return value THEN
  • the compiler will automatically add code to check the error return value AND
  • if the called function returns an error then the calling function should return the same error

Returning to our original example, Listing 5 has the same copy function as Listing 4 but with no error-handling, which is proposed to be implicitly added by the compiler.

func copy_file(in_name string, out_name string) error {
  fin := fopen(in_name, "r")
  defer fclose(fin)
  fout := fopen(out_name, "w")
  defer fclose(fout)
  buf := malloc(1024)
  defer free(buf)

  fmt.Println("Copying", in_name, "to", out_name)
  for fread(buf, fin) != eof {
    fwrite(buf, fout)
  }
  return nil
}
Listing 5. Copy function with implicit error-handling.

For Listing 5, the compiler would generate code equivalent to:

func copy_file(in_name string, out_name string) (err error) {
  fin, err := fopen(in_name, "r")
  if err != nil {
    return err
  }
  defer fclose(fin)
  fout, err := fopen(out_name, "w")
  if err != nil {
    return err
  }
  defer func() {
    // a deferred closure cannot itself "return err" to the caller, so the
    // generated code sets the function's named error result instead
    if cerr := fclose(fout); cerr != nil && err == nil {
      err = cerr
    }
  }()
  buf, err := malloc(1024)
  if err != nil {
    return err
  }
  defer free(buf)
  _, err = fmt.Println("Copying", in_name, "to", out_name)
  if err != nil {
    return err
  }
  for {
    tmp, err := fread(buf, fin)
    if err != nil {
      return err
    }
    if tmp == eof {
      break
    }
    err = fwrite(buf, fout)
    if err != nil {
      return err
    }
  }
  return nil
}
Listing 6. Same function showing compiler-generated error-handling.

In this way errors can be propagated up the call stack without the need for explicit error-handling code at each level. Even accidentally forgetting to check the error return value of a function like close() will automatically be handled.

Of course, at any level you can override the compiler-generated code, if its error-handling is insufficient. This is done by simply using the error return value.

Or, if you know the error condition won't occur then you can explicitly assign the error to the blank identifier to ignore it.

A further advantage is that functions which return a result and an error can be used in expressions like this:

calc(strconv.Atoi(str))

instead of:

tmp, _ := strconv.Atoi(str)
calc(tmp)


which avoids the use of error-prone temporaries and has the added advantage that if Atoi() unexpectedly does encounter an error, it will be detected and returned.

Auxiliary Proposal

In addition, I have a related, but independent, proposal.

  • If a function just returns an error then you must use it (or assign it to the blank identifier)

That is, this code should generate a compile-time error:

file.Close()          // compile error: error return cannot be ignored

To explicitly ignore the error you must do this:

_ = file.Close()   // ignore error return value from Close

Combined with the above main proposal this makes code much safer. In fact a lot of existing code that ignores the return value of Close() will now be safer without any code changes! Of course, without my main proposal this proposal would mean modifying a lot of existing code, such as most uses of fmt.Printf().

Summary

Two of the biggest problems I found in decades of programming in C were:

1. code that ignored error return values, often with severe consequences
2. having to write lots of boilerplate error-handling code

Exception handling (as I first encountered it in C++) was a great boon in addressing both of these problems.  However, exception handling introduced problems of its own (as mentioned above) and the creators of Go chose not to add exception handling to the language, which I endorse.

Unfortunately, Go's approach to error-handling is not that much better than C's. It attempts to address problem 1 but does not do so convincingly.  For example, it is easy to accidentally ignore an error-return value from a function that only returns one value.

Problem 2 has been alleviated somewhat by the introduction of the defer statement, but it is still tedious and error-prone - the sort of thing that a computer can do. Many people, including the Go creators, have debated this subject at length but their proposed strategies can be as tedious as the problem and are not generally applicable (at least until generics are added to Go. :).

My proposal addresses both the above problems without using exceptions. It relieves the tedium of writing a lot of very similar code, and makes the code easier to scan, and hence less likely to have bugs.

The benefits are obvious from comparing Listing 4 and Listing 5 above.

It also has the added advantage that functions that return two values, like strconv.Atoi(), can be used in expressions when the error return value is not needed. Again this can make the code simpler, by avoiding error-prone temporaries, and easier to read.