Back in March there was a fascinating exchange on this blog between Ros Eyben and Claire Melamed on the role of measurement in development work (my commentary on that debate here). Now one of Oxfam’s brightest bean counters (aka ‘Programme Effectiveness Adviser’), Karl Hughes, explains where Oxfam has got to on this:

In a recent 3ie working paper, A can of worms? Implications of rigorous impact evaluations for development agencies, Eric Roetman tells a provocative tale of the experiences of International Child Support (ICS) in Kenya, which carried out randomised control trials (RCTs) in partnership with several world-renowned quantitative impact evaluation specialists. ICS saw itself evolve into a “development lab”, where the bulk of its staff became devoted to supporting the organisation’s research operations rather than its development operations. Wanting to return to its roots, ICS eventually opted to get out of the RCT business.

ICS’s story relates directly to issues explored in another recent 3ie working paper, which I co-authored with Claire Hutchings, another of Oxfam GB’s global MEL advisers, entitled Can we obtain the required rigour without randomisation? Oxfam GB’s non-experimental Global Performance Framework. The central issue is this: we in the international NGO community are all too aware that we need to up our game in both understanding and demonstrating the impact (or lack thereof) of the various things we do. But what really baffles us is how to do so without going down the “development lab” route. (This is not to imply that “development labs” are bad; in fact, the more their findings inform our programming, the better.) The bottom line, as outlined in our paper, is that evaluation is research, and, like all credible research, it takes time, resources, and expertise to do well. This is equally true whatever our epistemological perspective: positivist, realist, constructionist, etc.
This is perhaps why, rather than using the research methods offered by mainstream academia, we as a sector are so quick to experiment with seemingly more doable alternatives such as Most Significant Change, social return on investment (SROI), outcome mapping, and participatory M&E. They’re all very well, but those of us who feel a need to go further find ourselves at a loss.

One popular way of attempting to demonstrate effectiveness, pursued by several international NGOs and comprehensively bashed in our paper, is dubbed “global outcome indicator tracking.” Here, the organisation in question gets all its programmes/partners to collect common data on particular outcome measures, e.g. household income. All these data are then aggregated (only the gods know how) to track the welfare of global cohorts of programme “beneficiaries” over time. If the indicator improves from time 1 to time 2, the organisation can boast about how much impact it is generating.

Aggregation complexities aside, the foundations of this approach are inherently precarious. Outcome-level change is generally influenced by numerous extraneous factors, e.g. rainfall patterns in rain-fed agricultural communities. Consequently, even if we are able to capture reliable data on a decent outcome indicator, its value will go up and down and all around no matter what our interventions are or how well they are implemented. Any consideration of attribution is entirely absent.

But what of the fact that donors have been encouraging us to pursue outcome indicator tracking for decades now, as part of ‘good practice’, through instruments such as the logframe? In a paper entitled The Road to Nowhere, Howard White argues that the United States Agency for International Development (USAID) recognised the futility of outcome indicator tracking some years ago and, consequently, abandoned it.
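To make the attribution problem concrete, here is a minimal Python simulation. Everything in it is hypothetical: the constants `TRUE_EFFECT` and `DROUGHT_SHOCK`, the income figures and the helper functions are invented for illustration. A drought between survey rounds lowers every household’s income, while the intervention raises participants’ income. Tracking the indicator among participants alone makes the programme look harmful; subtracting the change seen in a comparison group over the same period (a simple difference-in-differences, chosen here purely for illustration) recovers the programme’s contribution.

```python
import random
from statistics import mean

# Hypothetical parameters, invented purely for illustration.
TRUE_EFFECT = 10.0      # income the intervention adds for participants
DROUGHT_SHOCK = -30.0   # rainfall-driven change hitting every household between rounds
N = 5000                # households per group


def simulate_incomes(rng, treated):
    """Return (time1, time2) incomes for one hypothetical household."""
    base = rng.gauss(100.0, 15.0)                 # baseline income varies by household
    t1 = base + rng.gauss(0.0, 5.0)               # survey round 1
    t2 = base + DROUGHT_SHOCK + rng.gauss(0.0, 5.0)  # round 2, after the drought
    if treated:
        t2 += TRUE_EFFECT                         # programme effect shows up in round 2
    return t1, t2


def tracking_vs_comparison(seed=0):
    """Compare naive indicator tracking with a change net of a comparison group."""
    rng = random.Random(seed)
    treated = [simulate_incomes(rng, True) for _ in range(N)]
    comparison = [simulate_incomes(rng, False) for _ in range(N)]
    # "Global outcome indicator tracking": change among participants only.
    naive_change = mean(t2 - t1 for t1, t2 in treated)
    # What happened to non-participants over the same period (drought included).
    comparison_change = mean(t2 - t1 for t1, t2 in comparison)
    return naive_change, naive_change - comparison_change


naive, net = tracking_vs_comparison()
print(f"indicator change among participants: {naive:+.1f}")
print(f"change net of comparison group:       {net:+.1f}")
```

With these invented numbers, the participants’ indicator falls by roughly 20 units even though the intervention adds 10; the comparison group absorbs the drought, so the net change comes out at roughly +10.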
I worked on a USAID-funded orphans and vulnerable children (OVC) programme from 2005 to 2010, and, yes, we were required to report only on outputs, so perhaps this was a consequence of that realisation. (Incidentally, USAID also came to realise that, after all the billions it had spent, no evidence base had been established on what does and does not work in OVC programming, and it seems to regret not having supported rigorous evaluation of key OVC care and support interventions.) To what extent have other donor agencies recognised the fallibility of outcome indicator tracking? Sadly, there is plenty of evidence to suggest that many are still operating in this outdated paradigm.

So where does this leave us as NGOs? While Oxfam GB has not come up with a panacea, it is attempting to pursue a strategy that is reasonably credible. Each year, we randomly select 40-ish mature interventions across various thematic areas and evaluate them using methods that are relatively rigorous by NGO standards. The causal inference strategy differs depending on the nature of the intervention. For community-based interventions, for instance, where we are targeting many people (aka large n interventions), we attempt to mimic what RCTs do by statistically controlling for measured differences between intervention and comparison populations. Evaluating our policy influencing and “citizen voice” work (aka small n interventions), on the other hand, requires a different approach. Here, a qualitative research method known as process tracing is used to explore the extent to which there is evidence linking the intervention in question to any observed outcome-level change.

That is not to say these approaches are free of limitations.
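For readers who want to see what “statistically controlling for measured differences” can look like, here is a minimal Python sketch. It is an illustration, not Oxfam GB’s actual analysis: it assumes a hypothetical survey in which households with more land (a measured covariate) are both richer and more likely to participate, and it uses ordinary least-squares regression adjustment, computed via the Frisch-Waugh-Lovell theorem, as one common way of controlling for such differences (matching methods are another). The variable names, effect size and participation rule are all invented for the example.

```python
import random
from statistics import mean

TRUE_EFFECT = 10.0  # hypothetical programme effect on income


def simulate(n=5000, seed=1):
    """Hypothetical survey: larger landholders are more likely to participate."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        land = rng.uniform(0.0, 5.0)  # measured covariate (hypothetical hectares)
        # Participation probability rises with land: selection on an observable.
        treated = 1 if rng.random() < 0.2 + 0.12 * land else 0
        income = 50.0 + 8.0 * land + TRUE_EFFECT * treated + rng.gauss(0.0, 5.0)
        rows.append((treated, land, income))
    return rows


def residualise(y, x):
    """Residuals from a least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def adjusted_effect(rows):
    """Coefficient on `treated` in income ~ treated + land (Frisch-Waugh-Lovell)."""
    d = [r[0] for r in rows]
    land = [r[1] for r in rows]
    y = [r[2] for r in rows]
    d_res = residualise(d, land)  # part of participation not explained by land
    y_res = residualise(y, land)  # part of income not explained by land
    return sum(a * b for a, b in zip(d_res, y_res)) / sum(a * a for a in d_res)


rows = simulate()
raw = mean(r[2] for r in rows if r[0]) - mean(r[2] for r in rows if not r[0])
print(f"raw difference in means: {raw:.1f}")
print(f"adjusted for land:       {adjusted_effect(rows):.1f}")
```

On these invented data, the raw difference in means overstates the effect (roughly 18 against a true 10), because participants own more land; adjusting for land brings the estimate back to roughly 10. The adjustment only works because land is measured and enters the model correctly. If participation also depended on something unmeasured, such as household motivation, that bias would remain.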
In the case of large n interventions, for instance, because programme participants were not randomly assigned to intervention and comparison groups, and proper baseline data are conspicuously absent, we cannot absolutely guarantee that any observed outcome differences are the result of the intervention in question. The process tracing approach is also retrospective, whereas ideally the research would take place throughout the life of the advocacy or popular mobilisation initiative. But, hey, what we are doing is not too shabby, especially considering that we are no “development lab.” Moreover, every evaluation design, even the gold-standard RCT, has inherent limitations. Nonetheless, if anyone has suggestions on how NGOs in general and Oxfam in particular can do a better job of both understanding and demonstrating impact, I’d love to hear them.