In one of his grumpier moments, Owen Barder recently branded me as 'anti-data', which (if you think about it for a minute) would be a bit weird for anyone working in the development sector. The real issue, of course, is what kind of data tell you useful things about different kinds of programme, and how you collect them. If people equate 'data' solely with 'numbers', then I think we have a problem.
To try and make sense of this, I've been talking to a few of our measurement wallahs, especially the ones who are working on a bright, shiny new acronym – take a bow, HTMB: Hard to Measure Benefits. That's women's empowerment, policy influence or resilience rather than bed nets or vaccinations. It fits nicely with the discussion on complex systems (Owen's pronouncement came after we helped launch Ben Ramalingam's book Aid on the Edge of Chaos).
A few random thoughts (largely recycled from Claire Hutchings and Jennie Richmond):
Necessary v Sufficient: there was an exchange recently during the debates around the Open Government summit. IDS researcher Duncan Edwards pointed out that it simply cannot be proven that "provide access to data/information -> some magic occurs -> we see positive change." To which a transparency activist replied "it's necessary, even if not sufficient".
But the aid business is full of 'necessary-but-not-sufficient' factors – good laws and policies (if they aren't implemented); information and data; corporate codes of conduct; aid itself. And that changes the nature of results and measurement. If what you are doing is necessary but not sufficient, then there is no point waiting for the end result and working backwards, because nothing observable has happened yet (unless you're happy to wait for several decades, and then try and prove attribution).
Example: funding progressive movements that are taking on a repressive government. Can you only assess impact after/if the revolution happens? No, you have to find ways of measuring progress, which are likely to revolve around systematically gathering other people's opinions of what you have been doing.
Even if the revolution does come, you need to establish whether any of it can be attributed to your contributions – how much of the Arab Spring was down to your little grant or workshop?! Process tracing is interesting here – you not only investigate whether the evidence supports your hypothesis that your intervention contributed, but also consider alternative narratives about other possible causes. You then make a judgement call about the relative plausibility of different versions of events.
Counterfactuals when N=1: Counterfactuals are wonderful things – find a comparable sample where the intervention has not happened, and if your selection is sufficiently rigorous, any difference between the two samples should be down to your intervention. But what about when N=1? We recently had a campaign on Coca Cola, which was rapidly followed by the company shifting its policy on land grabs. What's the counterfactual there – Pepsi?! Far more convincing to talk to Coke management, or industry watchers, and get their opinion on the reasons for the policy change, and our role within it.
All of these often-derided 'qual' methods can of course be used rigorously or sloppily (just like quantitative ones).
Accountability v Learning: What are we doing all this measurement for? In practice, M&E (monitoring and evaluation) often takes precedence over L (Learning), because the main driver of the whole exercise is accountability to donors and/or political paymasters. That means 'proving' that aid works takes precedence over learning how to get better at it. Nowhere is this clearer than in attitudes to failure – a goldmine in terms of learning, but a nightmare in terms of accountability. Or in timelines – if you want to learn, then do your monitoring from the start, so that you can adapt and improve. If you want to convince donors, just wait till the end.
People also feel that M&E frameworks are rigid and unchangeable once they've been agreed with donors, and they are often being reported against by people who were not involved in the original design conversations. That can mean that the measures don't mean that much to them – they are simply reporting requirements, rather than potentially useful pieces of information. But when you actually talk to flesh-and-blood donors, they are increasingly willing to adjust indicators in light of the progress of a piece of work – sorry, no excuse for inaction there.
Horses for Courses: Claire has been playing around with the different categories of 'hard to measure' benefits that Oxfam is grappling with, and reckons most fall into one (or more) of the following three categories:
- The issue is complex and therefore hard to understand, define and/or quantify (e.g. women's empowerment – see diagram – or resilience). Here we're developing 'complex theoretical constructs' – mashed-up indices of several different components that try and capture the concept (see the sketch after this list).
- The intervention is trying to influence a complex system (and is often itself complex), e.g. advocacy or governance reform. For both, we're investing in real-time information to inform short- and medium-term analysis and decisions on 'course corrections'. In terms of final assessments of effectiveness or impact, policy advocacy is where we think process tracing can be most useful. There are too few units to estimate a counterfactual, so it's about playing detective to try and identify the catalysts and enablers of change, and see if your intervention is among them. We're still learning how best to approach evaluation of governance programming. Some programmes will lend themselves to more traditional statistical approaches (e.g. the We Can campaign on Violence Against Women), but others have too few units to be classified as 'large n' and too many to be considered truly 'small n'. We trialled outcome harvesting plus process tracing last year in Tanzania.
- The intervention is taking place in a complex, shifting system. This might be more traditional WASH programming, or attempting to influence the complex systems of governance work in fragile states. The evaluation designs may be similar to categories one and two above, but the sustainability of any positive changes will be much less certain, and, as with category two, we are most interested in real-time learning informing adaptation, which requires a less cumbersome, more agile approach.
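For the first category, a composite index really is just a weighted mash-up of component scores. Here is a minimal sketch of the idea – the component names, scores and weights are entirely hypothetical, not Oxfam's actual empowerment measure:

```python
# Illustrative composite index: several component indicators rolled into one number.
# All names, scores and weights below are made up for illustration only.

# Each component is scored 0-1 for a respondent (e.g. from survey questions).
components = {
    "control_over_household_income": 0.7,
    "participation_in_community_decisions": 0.4,
    "self_confidence": 0.6,
    "freedom_of_movement": 0.8,
}

# Weights reflect a judgement call about each component's relative importance;
# they sum to 1 so the composite stays on a 0-1 scale.
weights = {
    "control_over_household_income": 0.3,
    "participation_in_community_decisions": 0.3,
    "self_confidence": 0.2,
    "freedom_of_movement": 0.2,
}

# Weighted sum of the components gives the single headline figure.
empowerment_index = sum(score * weights[name] for name, score in components.items())
print(f"Composite empowerment index: {empowerment_index:.2f}")  # 0.61
```

The hard part, of course, is not the arithmetic but choosing the components and weights so that the single number actually captures the concept – which is exactly why these constructs count as hard to measure.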
Got all that? My head hurts. Time for a cup of tea.