How do you measure the difficult stuff (empowerment, resilience) and whether any change is attributable to your role?

In one of his grumpier moments, Owen Barder recently branded me as ‘anti-data’, which (if you think about it for a minute) would be a bit weird for anyone working in the development sector. The real issue, of course, is what kinds of data tell you useful things about different kinds of programme, and how you collect them. If people equate ‘data’ solely with ‘numbers’, then I think we have a problem.

To try and make sense of this, I’ve been talking to a few of our measurement wallahs, especially the ones who are working on a bright, shiny new acronym – take a bow HTMB: Hard to Measure Benefits. That’s women’s empowerment, policy influence or resilience rather than bed nets or vaccinations. It fits nicely with the discussion on complex systems (Owen’s pronouncement came after we helped launch Ben Ramalingam’s book Aid on the Edge of Chaos).

A few random thoughts (largely recycled from Claire Hutchings and Jennie Richmond):

Necessary v Sufficient: there was an exchange recently during the debates around the Open Government summit. IDS researcher Duncan Edwards pointed out that it simply cannot be proven that to “provide access to data/information –> some magic occurs –> we see positive change.” To which a transparency activist replied ‘it’s necessary, even if not sufficient’.

But the aid business is full of ‘necessary-but-not-sufficient’ factors – good laws and policies (if they aren’t implemented); information and data; corporate codes of conduct; aid itself. And that changes the nature of results and measurement. If what you are doing is necessary but not sufficient, then there is no point waiting for the end result and working backwards, because nothing observable has happened yet (unless you’re happy to wait for several decades, and then try and prove attribution).

Example: funding progressive movements that are taking on a repressive government. Can you only assess impact after/if the revolution happens? No, you have to find ways of measuring progress, which are likely to revolve around systematically gathering other people’s opinions of what you have been doing.

Even if the revolution does come, you need to establish whether any of it can be attributed to your contributions – how much of the Arab Spring was down to your little grant or workshop?! Process tracing is interesting here – you not only investigate whether the evidence supports your hypothesis that your intervention contributed, but also consider alternative narratives about other possible causes. You then make a judgement call about the relative plausibility of different versions of events.

Counterfactuals when N=1: Counterfactuals are wonderful things – find a comparable sample where the intervention has not happened, and if your selection is sufficiently rigorous, any difference between the two samples should be down to your intervention. But what about when N=1? We recently had a campaign on Coca Cola which was rapidly followed by the company shifting its policy on land grabs. What’s the counterfactual there – Pepsi?! Far more convincing to talk to Coke management, or industry watchers, and get their opinion on the reasons for the policy change, and our role within it.
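For readers who want the arithmetic behind that counterfactual logic, here is a minimal Python sketch, assuming the simplest possible design: a difference in means between a group that got the intervention and a comparison group that didn't. The function name and all the numbers are invented for illustration. It also shows why N=1 bites: with one unit on each side you still get a point estimate, but there is no spread to measure, so no way of saying how much of the difference is just noise.

```python
import math
import statistics


def difference_in_means(treated, comparison):
    """Estimate an effect as the gap between two group means.

    Assumes the comparison group is a fair stand-in for what would have
    happened to the treated group without the intervention.
    Returns (effect, standard_error).
    """
    effect = statistics.mean(treated) - statistics.mean(comparison)
    # statistics.variance needs at least two observations per group, so with
    # N=1 it raises StatisticsError: there is no spread to measure.
    standard_error = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(comparison) / len(comparison)
    )
    return effect, standard_error


# Hypothetical outcome scores, invented purely for illustration.
with_programme = [62, 58, 71, 66, 60]
without_programme = [55, 52, 60, 57, 54]
effect, se = difference_in_means(with_programme, without_programme)
print(f"estimated effect {effect:.1f}, standard error {se:.1f}")

# N=1: one company campaigned on, one "comparison" company.
try:
    difference_in_means([1.0], [0.0])
except statistics.StatisticsError as err:
    print("No uncertainty estimate possible with N=1:", err)
```

Note that a bigger sample only fixes the noise problem; if the comparison group isn't genuinely comparable, the estimate is still biased.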

All of these often-derided ‘qual’ methods can of course be used rigorously, or sloppily (just like quantitative ones).

Accountability v Learning: What are we doing all this measurement for? In practice, M&E (monitoring and evaluation) often takes precedence over the L (learning), because the main driver of the whole exercise is accountability to donors and/or political paymasters. That means ‘proving’ that aid works trumps learning how to get better at it. Nowhere is this clearer than in attitudes to failure – a goldmine in terms of learning, but a nightmare in terms of accountability. Or in timelines – if you want to learn, then do your monitoring from the start, so that you can adapt and improve. If you want to convince donors, just wait till the end.

People also feel that M&E frameworks are rigid and unchangeable once they’ve been agreed with donors, and they are often being reported against by people who were not involved in the original design conversations. That can mean that the measures don’t mean that much to them – they are simply reporting requirements, rather than potentially useful pieces of information. But when you actually talk to flesh and blood donors, they are increasingly willing to adjust indicators in light of the progress of a piece of work – sorry, no excuse for inaction there.

Horses for Courses: Claire has been playing around with the different categories of ‘hard to measure’ benefits that Oxfam is grappling with, and reckons most fall into one (or more) of the following three categories:

The issue is complex and therefore hard to understand, define and/or quantify (e.g. women’s empowerment (see diagram) or resilience). Here we’re developing ‘complex theoretical constructs’: mashed-up indices of several different components that try to capture the concept.
The intervention is trying to influence a complex system (and is often itself complex), e.g. advocacy or governance reform. For both, we’re investing in real-time information to inform short- and medium-term analysis and decisions on ‘course corrections’. In terms of final assessments of effectiveness or impact, policy advocacy is where we think process tracing can be most useful. There are too few units to estimate a counterfactual, so it’s about playing detective to try and identify the catalysts and enablers of change, and see if your intervention is among them. We’re still learning how best to approach evaluation of governance programming. Some programmes will lend themselves to more traditional statistical approaches (e.g. the We Can campaign on Violence Against Women), but others have too few units to be classified as ‘large n’ and too many to be considered truly ‘small n’. We trialled outcome harvesting + process tracing last year in Tanzania.
The intervention is taking place in a complex, shifting system. This might be more traditional WASH programming, or attempting to influence the complex systems of governance work in fragile states. The evaluation designs may be similar to categories one and two above, but the sustainability of any positive changes will be much less certain, and, as with category two, we are most interested in real-time learning informing adaptation, which requires a less cumbersome, more agile approach.
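The ‘complex theoretical constructs’ in the first category above are often built as composite indices. One common way to do that is sketched here in Python, purely as an assumption: this is not Oxfam’s actual empowerment index, and the components, scales and equal weights are all hypothetical. Each component is rescaled to a 0 to 1 range between a chosen ‘worst’ and ‘best’ value, then the rescaled scores are combined as a weighted average.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    value: float
    worst: float  # value mapped to 0 on the common scale
    best: float   # value mapped to 1 on the common scale
    weight: float = 1.0


def normalise(c: Component) -> float:
    """Rescale a component to 0..1 so indicators on different scales can be combined."""
    score = (c.value - c.worst) / (c.best - c.worst)
    return min(max(score, 0.0), 1.0)  # clip values outside the chosen range


def composite_index(components: list[Component]) -> float:
    """Weighted average of the normalised components."""
    total_weight = sum(c.weight for c in components)
    return sum(c.weight * normalise(c) for c in components) / total_weight


# Hypothetical components and scales for one respondent, for illustration only.
respondent = [
    Component("say in household spending decisions", 3, worst=0, best=4),
    Component("self-confidence score", 4, worst=1, best=5),
    Component("member of a women's group", 1, worst=0, best=1),
    Component("share of own income she controls", 0.4, worst=0.0, best=1.0),
]
print(f"composite score: {composite_index(respondent):.3f}")
```

The weights and the ‘worst’/‘best’ anchors are where the judgement calls live: change them and the score changes, which is one reason such indices need to be spelled out rather than treated as neutral measurements.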

Got all that? My head hurts. Time for a cup of tea.
