Lexicographical Disordering

 4 min read

About 18 months ago, back in November 2017, I made a note to write about an interesting mistake I made in a C++ program. Time passed and my note proved to be insufficiently detailed—I’d forgotten what made the mistake interesting at all. Last week, memories came flooding back as I made exactly the same mistake again. If only I’d written the original article, then history could never have possibly repeated itself, right?

Le Problem

I was using a std::map to store a relationship between two types of object. The Key type looked something like this:

struct Key final
{
    const std::string group;
    const std::string name;

    Key(std::string group_, std::string name_) :
        group { std::move(group_) },
        name { std::move(name_) }
    {
    }
};

Dead simple.

C++‘s standard library is clever in its simplicity. Every algorithm depends on a single comparison being defined, operator<. All other comparison operators can be derived from this, as shown in the table below. There’s even a helper for this in the standard library.1

OperationImplementation
a >= b!(a < b)
a > bb < a
a <= b!(b < a)
a == b(a < b) == (b < a)
a != b(a < b) != (b < a)

So why is this relevant to std::map? The standard mandates that it’s a sorted container, with ordering enforced by a comparator object which, by default, makes use of operator<. So I implemented the following operator:

struct Key final
{
    ...

    bool operator< (const Key &rhs_)
    {
        if (group < rhs_.group)
            return true;

        return name < rhs_.name;
    }
};

Proud with my brevity, I moved onto the matter at hand; I didn’t actually care about enforcing an ordering, I just needed the associative data-structure. If you’ve already spotted the mistake: you’re doing better than I did!

Le Limpet Mine

This went unnoticed for a surprising amount of time. Yes I know I should have written a test for it, but it was simple and I felt cocky.

The mapping was being used to track plugins being loaded into an application, so the data being run through the faulty comparison was carefully curated (by accident). I ended up with a situation like this:

GroupName
computemaximise
computeminimise
computeoptimise
renderpng

All’s well; clearly the ordering is being enforced correctly. Then I add two more plugins into the system, let’s call them { "logging", "file" } and { "logging", "network" }. Much to my frustration, I end up with an exception being thrown from map::at because one of the keys wasn’t found. I knew it existed because I was in quite close control of what plugins were loaded, but map::at was failing nonetheless.

I spent ages checking the plugin loading logic, making sure that registration and discovery functions were being called. I stepped through the code watching what happened. I even added cout’s as plugins were registered just in case.

Everything was working as intended…

Analysis

As a sanity check, I enumerated the map, printing out the keys to verify what was being stored was as I expected it to be:

GroupName
loggingfile
computemaximise
computeminimise
loggingnetwork
computeoptimise
renderpng

*Facepalm*. The second I saw it, I knew what I’d done. No wonder map::at was failing to find any of the compute plugins because it wouldn’t advance beyond the two logging groups (remember that it’s a sorted container). Every cloud has a silver lining, and this lining was a jolt to the memory: I’d made a note to write about this mistake before.

Last time this happened (yes, I’m still reeling from embarrassment…) I produced a plot using matplotlib and was presented with a pattern that was surprisingly pleasing to look at. Given that I have this nice, shiny, new blog now, I saw it as an opportunity to play with D3.js. After a bit of playing and learning, I had this figure below:

True positive
True negative
False positive
False negative
Heatmap showing the non-transitive behaviour of the erroneous comparison operator. Note the lack of false negatives, helping to reinforce the illusion of correctness...

I’ve mentioned before that I use Gatsby for this blog which uses React internally. I’m a huge fan of React, but it clearly has its own view of how the DOM should be managed, so I saw integrating a D3 plot into a React site as an interesting challenge. Perhaps I should write about that too…

Anyway, back to the analysis: the lack of false negatives in the above plot demonstrates how easy it is to forget to be a scientist. When we conceive a hypothesis, we instinctively jump to confirming it to be true. In this case, all positives are true positives, so naively the implementation clearly works. Science, however, is not about proving things to be correct (that’s maths); it’s about proving things to be not wrong. That difference is subtle but extremely powerful.

Uncovering the problem shown very clearly in the figure is a two step process:

  1. Does the hypothesis stand up against some indicative stimulus? If not: great, you’ve just ruled out a bad idea before spending much time on it. If so: awesome, now we can verify its usefulness.
  2. Given that it works some of the time, when does this model break down? This part is the real scientific work. If the model breaks down too soon, it clearly isn’t useful. If it breaks down at some point that can be justified, then it’s still useful2.

When writing my operator, I wore the wrong hat. My software engineer mind was pleased with the simplicity and brevity of the implementation while my scientist and mathematician hats were still on the rack. Ah well, live ‘n’ learn. My mathematician hat needs an awful lot of work anyway…

Le Fix

As with almost every single problem in C++, the fix is trivial and requires only juggling a handful of characters:

bool operator< (const Key &rhs_)
{
    if (group == rhs_.group)
        return name < rhs_.name;

    return group < rhs_.group;
}

It’s almost as terse as before! That makes it even more embarrassing!

I’ve recently started Jeremy Kun’s A Programmer’s Introduction to Mathematics which neatly explains this process in the first chapter. As programmers/coders/software engineers/whatever, we all deal with these frustrations as we progress towards the mastery of a language or framework. We make these stupid mistakes and solve them, banging our heads against these walls thinking “but this is clearly correct!”

Thankfully, the more we bang our heads, the harder they become and the weaker the walls become. The walls always give way, but our heads never do, right? So, I suppose I should celebrate that my head is just a little bit harder now. 🙂


  1. The implementation of this requires that operator== is also defined, but that can itself be derived from operator< too. I’m not sure why the implementation of operator!= depends on operator==. I posit that it might be because all rel_ops operators use exactly one comparison, but deriving either operator== or operator!= from operator< would require two.
  2. Newtonian physics is probably the most accessible example of this. It doesn’t capture the nuances of quantum or relativistic physics, but it’s still pretty good in the environments in which we humans tend to operate. A huge chunk of engineering uses Newtonian physics and it works just fine.