Tuesday, 27 August 2013

Iterators Through the Looking Glass

I have been talking a lot about STL (the brilliant C++ template library) lately because it is brilliant, but also because I have been teaching a class on it at my son's local high school - NBHS (Normanhurst Boys High School).

Several boys at NHBS have been winning programming competitions recently - for example, three went to the Informatics Olympiad training camp which takes the top 15 high school students throughout Australia to be trained and selected for the International Informatics Olympiad. My son Daniel's team also recently won the open round of the UNSW Prog Comp (see https://cgi.cse.unsw.edu.au/~progcomp/open/prov.php).

However, they do all their stuff in Python. I like Python (a lot) but I am trying to convince them that C++ and STL is more flexible (and efficient) for a lot of problems. Most of these competitions are won using C++ or Python.

One of the questions that recently came up in the class was to do with reverse iterators. There is some confusion about reverse iterators so I thought I should explain... (I promise that this is the last post on STL for a while.)

Missing STL Algorithms?

Recently in the class we looked at some STL algorithms. I mentioned that one way to find a character in a string is to use the find() algorithm, which is analogous to the C standard library strchr() function (except that it can be used to search a range of any type not just a string of chars).

// C
const char * s = "Hello world.";
const char * p = strchr(s, 'e');      // returns pointer to 'e'

// C++ using a string as an STL container
std::string s("Hello world.");
auto p = std::find(s.begin(), s.end(), 'e');

The obvious question is then: how do you search for a character from the end of a string instead of the start? That is, what is the analogous function in STL to the standard C library function strrchr()? (Note the extra 'r' denoting a reverse search.)

If you are looking for an STL algorithm that searches for a value from the end you will probably immediately think of find_end(). As it turns out there is a function called find_end, but it searches for the last occurrence of a sub-range within a range. You could use it to search for the last occurrence of a sub-string within a string. Ie, it is like search() [which is analogous to the standard C function strstr()] but searches from the end. Obviously it should have been called search_end not find_end.
It turns out there is no STL algorithm to do this. Is this an oversight? Hardly, since it is trivial to do a reverse find using a reverse iterator. However, it is not so simple to use the returned iterator unless you understand how reverse iterators work.

Reverse Iterators

Reverse iterators are exactly the same as normal iterators except in reverse. When you increment a reverse iterator it moves to the previous element in a range. When you dereference a reverse iterator you access the element before the address.



My first use of reverse iterators was very pleasant. I was working on software that manipulated 3-D geophysical data and I had a huge array (actually a vector of vectors of vectors) of magnetic data that was indexed by Cartesian coordinates (x, y, z). To process this data I had a 3 nested loops (for each of the 3 dimensions). At some point I realized that I was processing one of the dimensions in the wrong direction. It was trivial to simply change the iterators used for that loop to reverse iterators. Doing the same thing with array indices would have taken time (maybe 10 minutes) and would have been a little more error-prone too.

Rather Backward

Reverse iterators are perfect if you just want to use or modify the value they "point" to. The problem occurs when you want to do something like erase the value from its container. It is easy enough to erase from a container given a forward iterator. Unfortunately, you can't erase using a reverse_iterator - container erase() methods only take a normal iterator.

The other problem is that you need to think carefully about whether you want the "forward" value or the "reverse" value given an address.

Let's look at an example. The following code searches for the first occurrence of a left bracket and adds a space before it.

std::string pal("A(good) test.");
auto first = std::find(pal.begin(), pal.end(), '(');
pal.insert(first, ' ');   // "A (good) test."

Note that I use the new C++11 keyword "auto" to save typing "std::string::iterator". Also note that I am using a std:string as an STL container but I could just as easily use a std::vector, std::list, etc.



That's OK. Now how do we insert a space before the last bracket? The obvious thing to try is:

std::string pal("A(good) test. It(really)is.");

auto last = std::find(pal.rbegin(), pal.rend(), '(');
pal.insert(last, ' ');   // compile error

But std::string::erase() takes an iterator parameter not a reverse_iterator, so this does not even compile. Luckily, you can convert a reverse_iterator to the corresponding iterator by using reverse_iterator::base().

std::string pal("A(good) test. It(really)is.");

auto last = std::find(pal.rbegin(), pal.rend(), '(');
pal.insert(last.base(), ' '); // "A(good) test. It( really)is."

But this is also wrong as it inserts the space after the bracket. The problem is that an iterator "points" to the next element, whereas a reverse_iterator points to the previous one. See the diagram.



In this case we want to insert before the address of the element pointed to by the iterator, so we need to decrement the reverse iterator before we convert it to a forward iterator.

std::string pal("A(good) test. It(really)is.");

auto last = std::find(pal.rbegin(), pal.rend(), '(');
pal.insert((++last).base(), ' '); // "A(good) test. It (really)is."

In the Looking Glass

So to use a reverse iterator you may need to increment it before converting it to its corresponding forward iterator. However, this is not the case if the natural order of the data is reversed, in which case a reverse iterator reflects the correct processing direction. An example, would be a text editor for a language with right to left reading order - if you insert a character in such a text editor then the existing character at the cursor (and all those to the left of it) gets pushed left. This is analogous to "inserting" a space at the right bracket in the above example string like this:

last = std::find(pal.rbegin(), pal.rend(), ')');
pal.insert(last.base(), ' '); // "A(good) test. It (really) is."



However, even in data with a natural reversed order deleting an element (with the container's erase() method) requires adjusting the reverse iterator first as erase() only takes a forward iterator and consequently deletes the next element, not the previous one as we would like.

Conclusion

Reverse iterators are useful and essential to be used with some algorithms when you want to work from the end instead of the start.  (An alternative is completely reversing the order of the range first using reverse() but that may be inefficient.)

The problem with reverse iterators is that many functions only use normal iterators. You can use base() to convert a reverse iterator but you need to be careful you use the right slot. Most of the time you want to use the slot before in which case you should increment the reverse iterator before calling base() on it.