Goodhart's law, credentials and social order - Part 1
--
I am trying out a new format of writing and reading. I find that I get a lot more out of a piece when I go in with questions, rather than reading blindly. Below I have written some questions to think about before reading this. I have also made it possible for you to find the answers to these questions without having to read the whole thing (although I still encourage you to read the whole thing!). Press “Command” + “F” if you’re on a Mac, or “CTRL” + “F” if you’re on anything else. Copy and paste your question into the search bar, and it should take you to the section of this post that answers it (press Enter until it shows you 2/2).
Questions:
What’s the TL;DR version?
What’s wrong with Goodhart’s law?
Why should I care about this in general?
How does this impact knowledge workers?
As an employer, why should I care right now?
As an employee why should I care about this in re to work?
--
"When a measure becomes a target, it ceases to be a good measure." - Goodhart's Law
Measurements that are public dictate behaviour. There is no escaping this. Most people also want to game things; we can’t really help it. So when you introduce a metric that dictates how people will be rewarded or punished, expect behavioural change, and expect everyone to try to game it.
— As an employer, why should I care right now? —
A good example would be promoting employees based on the financial success of the projects they worked on, something a lot of companies do. They do this because they cannot directly measure how an employee performed, so they use project success as a proxy for performance. The assumption is this:
If project X is successful, and employee A worked on it, then employee A must have had influence on this success, so employee A is performing well. If employee A is always working on high performing projects, then employee A must be a high performer.
— As an employee why should I care about this in re to work? —
Unsurprisingly, employees will start choosing projects that are bound to be successful regardless of their input. This naturally decreases an employee’s (and therefore the company’s) risk appetite. That is probably not the behavioural change companies wanted, and the metric definitely isn’t measuring what employers actually want. Here is a good anecdotal example of how an employee changed their behaviour based on the metrics that dictated promotion.
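To make the incentive concrete, here is a rough sketch of a toy model. Everything in it is an assumption made up for illustration: the `Project` fields, the skill numbers and the two projects are hypothetical, and real performance is far messier than a single skill value. It only shows how a proxy based on project success can rank a low-impact employee above a high-impact one.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    baseline: float  # revenue the project earns no matter who works on it (made-up units)
    leverage: float  # how much one person's skill can move the outcome (made-up units)


def proxy_score(projects, skill):
    """Promotion metric: average revenue of the projects the employee worked on."""
    return sum(p.baseline + p.leverage * skill for p in projects) / len(projects)


def true_contribution(projects, skill):
    """What the employer actually cares about: value the employee added."""
    return sum(p.leverage * skill for p in projects) / len(projects)


# Hypothetical projects: one safe and mature, one risky and new.
safe_bet = Project("mature product", baseline=100.0, leverage=1.0)
risky_bet = Project("new venture", baseline=20.0, leverage=10.0)

# Hypothetical employees: the "gamer" picks the safe project, the "builder" takes the risk.
gamer_skill, builder_skill = 2.0, 6.0

gamer_proxy = proxy_score([safe_bet], gamer_skill)
builder_proxy = proxy_score([risky_bet], builder_skill)
gamer_true = true_contribution([safe_bet], gamer_skill)
builder_true = true_contribution([risky_bet], builder_skill)

print(f"proxy: gamer={gamer_proxy}, builder={builder_proxy}")
print(f"true contribution: gamer={gamer_true}, builder={builder_true}")
```

With these made-up numbers the gamer wins on the promotion metric while adding far less value, which is exactly the behaviour the metric rewards.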
— What’s wrong with Goodhart’s law? —
Interestingly enough, this does not apply to all things. For example, it does not really happen in start-ups or sports. Why? What makes start-ups and sports so special?
Firstly, notice how I mentioned that the employee performance metric was a proxy for what the company actually wanted to measure. This is a common theme amongst game-able metrics: more often than not, they don’t actually measure the thing they claim to, they measure proxies. Most proxies follow this pattern:
We want to measure X, which produces Y. Since we cannot directly measure X, we will measure C, which we assume moves with it.
I think you can guess where I am going with this. If the rules of the game are that I get rewarded based on C, and I find out I can achieve C by simpler means, then ‘don’t hate the player, hate the game’.
— How does this impact knowledge workers? —
Start-ups and sports are different because what they want to measure is observable, which means it can be directly measured. In football, for example, there is no proxy for the number of assists. You either assisted and it was recorded on camera, or you did not. The same goes for start-ups: you are either growing your number of users or you are not.
You might think ‘this is why employees need to be in the office. I need to see what work they are doing, it’s the only way to make it observable, and therefore measurable!’, but you are wrong. This reasoning does not hold for knowledge workers in particular (it’s different for manual labourers, which we will cover in part 2) because the observable aspect of their work is still a proxy. Unless you are seeing every single thing they do on their computer, their work is not actually observable to you. You are seeing the ‘act of working’, not the work itself. Don’t be fooled.
Secondly, because the measurement was never a direct measurement to begin with, it’s the publicness of the indirect measurement that actually caused the unintended behaviour. In other words, a measurement that becomes a target doesn’t automatically become a bad measurement. A measurement based on a proxy that becomes a target ceases to be a good measurement. Let’s use the employee performance example to better understand this. Had that metric been kept private, it probably wouldn’t have changed behaviour or been gamed, because it wouldn’t have been a target (although it would eventually have been found out and made a target). Better yet, had we actually been able to directly measure an employee’s performance, becoming a target would not have been a problem at all. It would probably have been a good thing, as it would have encouraged the right type of behaviour and wouldn’t have been game-able.
Thirdly, one direct measurement never tells the full, complex story; multiple individual measurements together do. This is often the problem with trying to measure and predict an individual's performance: people are complex. People are also lazy, so we naturally look for one single metric that can directly capture the complexity of an individual, when this is not possible. Take our example of start-ups. Start-ups are complex and have many moving parts, similar to people. As with people, we want to know quickly how well they are doing. In the case of the start-up, however, we would look at multiple individual measurements such as revenue, user growth, user retention, customer acquisition cost, etc. to answer this complex question and tell the full story. In the case of people, we would turn to a single form of measurement, oftentimes a credential, to tell us everything about an individual. In other words, start-ups, unlike people, don't have bundled forms of measurement.
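As a small illustration of why several separate measurements say more than one, here is a minimal sketch with two hypothetical start-ups. The names and every number are invented for the example. Judged on revenue alone the two look identical; only the other measurements show that they are very different businesses.

```python
# Two hypothetical start-ups with invented numbers.
startup_a = {"revenue": 100_000, "user_growth": 0.20, "retention": 0.85, "cac": 40}
startup_b = {"revenue": 100_000, "user_growth": 0.20, "retention": 0.30, "cac": 400}


def differing_metrics(a, b):
    """Return the names of the measurements on which the two start-ups differ."""
    return sorted(name for name in a if a[name] != b[name])


same_on_revenue = startup_a["revenue"] == startup_b["revenue"]
differences = differing_metrics(startup_a, startup_b)

print("Identical on revenue alone:", same_on_revenue)
print("Measurements that tell them apart:", differences)
```

A single credential for a person plays the role of the revenue figure here: it can be equal for two people who differ on everything else.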
So what does this mean for credentials that try to measure and predict performance? Let’s examine an educational credential such as Harvard. What does it actually measure? What does it actually predict? Most people will mention something to do with performance. Well, what is performance? And this is where we pretty much all fail. Performance is a term a lot of us hide behind, because the truth is we don't really know. We have a vague sense of it, but when it’s time to get down to the details we really struggle.
Let’s give people the benefit of the doubt and say they were actually able to describe aspects of performance. Some would mention ‘intelligence’ (what does that mean?), ‘work ethic’ (what does that mean? x 2) or ‘ability to succeed’ (what does that mean? x 100). Some would mention all 3 and more. The point is that even then we can’t agree. We are trying to indirectly measure too many things that we don’t even understand at a basic level. It’s a bundled indirect measurement.
Richard Hamming said it best with this anecdote:
So long as you asked us ‘was Joe better than Pete?’ Hank and I would agree. The moment you said ‘why?’ Hank and I were at each other's throats because we couldn’t agree on why.
This is actually one of the biggest competitive advantages current credentials have. How can you compete with something that nobody really understands, or can even directly point to in its rawest form? So when you try to offer an alternative, you will usually be met with ‘yeah well, it doesn’t really quite capture A, B, C … X, Y and Z’, even though they themselves don’t even know what ‘A, B, C’ looks like. Your very real solution will always fall short in comparison to the grand illusion everyone has bought into. In other words, educational credentials (and most credentials) are a form of simulacra.
-- record scratch, you’re probably wondering how I got here --
?? What is ?? A ?? SiMulacRa ??
Simulacra is a term popularized by the French sociologist Jean Baudrillard. It means a copy without an original. It is not to be confused with a simulation. A simulation has an original; a simulacra’s original is either destroyed or extremely hard to locate or define.
In the case of an educational credential such as ‘Harvard’, it is meant to represent the measurement of a number of things; for now we will focus on intellect, work ethic and ability to succeed. If educational credentials are meant to be a ‘copy/representation’ of intellect, work ethic and ability to succeed, then what do these look like in their original form? Answer: Unknown!
So it is a simulacra. A copy with no original.
Back to the current post.
-- unscratch --
— What’s the TL;DR version? —
We’ve covered a lot, so let’s do a quick recap:
Measuring a proxy and directly measuring something are two very different things. They can have a similar impact in private, but a very different impact once made public.
Goodhart’s law doesn’t seem to apply to things that are observable, and therefore can be directly measured.
Measurements seem to work well for things like start-ups and sports because they’re directly measured and their measurements are unbundled.
Complex things can be measured, as seen in start-ups, but it’s near impossible to capture complexity with one measurement.
Credentials are based on things we can not identify in their original form. They are a form of simulacra.
This is all well and good to know, but why does this problem matter to me, you and society as a whole?
— Why should I care about this in general? —
Well, it dictates social order and societal progress. That sounds like a wild jump, but hear me out and let’s follow the logic.
Social order:
What we measure dictates behaviour, so however we measure people will dictate how they behave. In particular, how we convey this measurement will dictate how an individual behaves and how others behave towards them. This dictates where an individual is placed within society, which in turn dictates much of their contribution to and influence on society: in other words, the social order of things.
Societal progress:
What we measure dictates behaviour, so however we measure people will dictate how they behave. Measurement of people predominantly occurs within workplaces. Companies contribute to society through jobs and their products, so measurement will dictate how much an individual gets paid and what they produce. The income from jobs and the output of this production contribute to things like GDP, which is just one indicator of societal progress.
So the jump isn’t so wild, you see. The transition from an industrial society to a knowledge economy has also meant a shift away from machines being the primary source of capital to people being the most important form of capital. As this trend continues, the importance and urgency of this problem continue to rise.
In part 2 I will explore in more detail the impact of measurement and signalling on social order and societal progress.
As always, please feel free to reach out to me to bounce ideas, collaborate or whatever frankly. I am currently working on solving this problem (full time) so I am always looking to meet cool people.
If you would like to join the credentials and meritocracy community, click here; here is a good guide describing who we are looking for. Or you can follow me on Twitter @adaobiadibe_