When we (rigorously) measure effectiveness, do we want accountability or learning? Update and dilemmas from an Oxfam experiment.

October 18, 2013

By Duncan Green

Claire Hutchings, Oxfam’s Global MEL Advisor, updates us on an interesting experiment in measuring impact – randomized ‘effectiveness reviews’.

For the last two years, Oxfam Great Britain has been trying to get better at understanding and communicating the effectiveness of its work. With a global portfolio of over 250 programmes and 1,200 associated projects in more than 55 countries, covering everything from farming to gender justice, grappling with the scale, breadth and complexity of this work has been the challenge. So how are we doing? Time for an update on where we’ve got to, with apologies in advance for a hefty dollop of evaluation geek-speak.

After much discussion and thought, we developed our Global Performance Framework (GPF). The GPF comprises two main components. The first is Global Output Reporting, where output data are aggregated annually under six thematic headings to give us an overall sense of the scope and scale of our work.

Secondly, in addition to the headline numbers, we need to drill down on the effectiveness question, which we’ve been doing via rigorous evaluations of random samples of mature projects. These evaluations – known as ‘Effectiveness Reviews’ – were launched in 2011/12 and today we’re releasing the first batch of effectiveness reviews from 2012/13, covering everything from strengthening women’s leadership in Nigerian agriculture to building sustainable livelihoods for Vietnam’s ethnic minorities.
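For readers curious about the mechanics, the point of the sampling is simply to let chance, rather than project teams or head office, decide which mature projects get reviewed. A stylised sketch of that kind of draw is below; the project records, themes and maturity threshold are entirely made up, not Oxfam's actual criteria.

```python
import random

# Hypothetical project records: (project_id, theme, years_running).
# These are illustrative placeholders, not real Oxfam projects.
projects = [
    ("NGA-014", "women's empowerment", 4),
    ("VNM-027", "livelihoods", 3),
    ("TZA-003", "citizen voice", 5),
    ("HND-011", "livelihoods", 2),
    ("PHL-008", "resilience", 4),
]

MIN_YEARS = 3  # assumed maturity threshold; the real selection criteria are not given in the post

def sample_for_review(projects, per_theme=1, seed=2013):
    """Randomly pick mature projects within each theme, avoiding cherry-picking."""
    rng = random.Random(seed)
    mature = [p for p in projects if p[2] >= MIN_YEARS]
    by_theme = {}
    for p in mature:
        by_theme.setdefault(p[1], []).append(p)
    return {theme: rng.sample(group, min(per_theme, len(group)))
            for theme, group in by_theme.items()}

print(sample_for_review(projects))
```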

The measurement approaches have developed quite a lot since the first year, so let’s start there. For the reviews of our ‘large n’ interventions (i.e. those targeting large numbers of people directly), we have been adapting the approach used by OPHI (the Oxford Poverty and Human Development Initiative) for the measurement of complex constructs, to measure both women’s empowerment and resilience. This has improved our overall measures.

For example, our initial women’s empowerment reviews primarily considered women’s influence in household and community decision-making. This has now been expanded to cover dimensions such as personal freedom, self-perception and support from social and institutional networks. Our resilience framework has expanded to include food security and dietary diversity.
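If you’re not familiar with OPHI-style measures, the essence is a weighted count across dimensions, with a cutoff that determines who is classified as ‘empowered’. Here is a minimal sketch of that logic: the dimensions echo the ones listed above, but the weights and the cutoff are purely illustrative assumptions, not Oxfam’s actual specification.

```python
# Illustrative dimensions and weights (assumed); weights sum to 1.0.
WEIGHTS = {
    "household_decision_making": 0.25,
    "community_influence": 0.25,
    "personal_freedom": 0.20,
    "self_perception": 0.15,
    "social_institutional_support": 0.15,
}
CUTOFF = 0.6  # assumed: a woman counts as 'empowered' if her weighted score reaches 60%

def empowerment_score(indicators):
    """indicators maps each dimension to 1 (threshold met) or 0 (not met)."""
    return sum(WEIGHTS[d] * indicators.get(d, 0) for d in WEIGHTS)

def share_empowered(sample):
    """Headline measure: share of respondents whose weighted score reaches the cutoff."""
    return sum(empowerment_score(r) >= CUTOFF for r in sample) / len(sample)

# Two made-up respondents: one below the cutoff, one above it.
sample = [
    {"household_decision_making": 1, "community_influence": 0, "personal_freedom": 0,
     "self_perception": 1, "social_institutional_support": 0},
    {"household_decision_making": 1, "community_influence": 1, "personal_freedom": 1,
     "self_perception": 0, "social_institutional_support": 1},
]
print(share_empowered(sample))  # 0.5
```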

We’ve already written about the challenges and lessons learnt from ‘small n’ interventions (where there are too few units to allow for tests of statistical differences – such as advocacy and campaigns), but essentially we continue to learn about how best to ensure consistency in application of the process tracing protocol, and how to answer the question ‘do we have sufficient evidence to draw credible conclusions?’ We also experimented with bringing outcome harvesting and process tracing together in a review of the Chukua Hatua programme in Tanzania.

We’ve smartened up the benchmarks for the humanitarian indicator toolkit and changed some of the standards we’re considering (contingency planning is out, replaced by resilience and preparedness). We have also piloted an Effectiveness Review on our own accountability (report out soon), bringing in an external reviewer to look, in depth, at the leadership, systems and practices of OGB and partner staff, and to reach conclusions on the ‘evidence’ available on the degree to which Oxfam’s work meets its own standards for accountability at project level.

We’re waiting to finalise the full set of reports – the final batch will be published in November/December – so it’s premature to start presenting summary findings, but I’m keen to dive into what I think remains one of the key outstanding challenges of this exercise: learning. The two key drivers for the GPF – accountability and learning – often compete for attention and arguably require different approaches to the design, implementation and use of evaluations.

At the end of the first year, I think it’s fair to say that the consensus was that the GPF, and the effectiveness reviews in particular, were too heavily weighted towards ‘upward’ and ‘outward’ accountability (to donors and northern publics). So the challenge has been how to reorient them to better serve a learning agenda.

There are some noteworthy examples of where the reviews are contributing to project-level learning. In Honduras, for example, the results of a review of community banks were disseminated to people in both intervention and comparison communities, and in the “comparison” municipality Oxfam’s partner organisation highlighted some points that the local government and community banks there could learn from those in the project area. In Tanzania, the effectiveness review of Chukua Hatua is being used to develop phase 3 of the programme. In the Philippines, the Oxfam team has decided to use the effectiveness review as a baseline for the next phase of the project and to conduct the exercise again themselves in two years’ time.

At an organisational level, we’re starting to pull out thematic learning, as well as lessons on the design and implementation of interventions. We’re also seeing an increasing appetite for including impact assessments in the initial programme design (rather than as an afterthought). For all that, though, I think it’s fair to say that learning from the effectiveness reviews remains a challenge (that means a problem, btw). Let’s talk through some of the main sticking points (in case any of you out there can help).

As with last year, there is a tension over the choice of projects: randomly selecting them at ‘the centre’ avoids cherry-picking and arguably gives us a more honest picture of effectiveness, but it doesn’t always mesh with what countries and regions most want to learn. Evaluative questions often can’t be fully explored until an intervention is mature, but by then the intervention may no longer be topical and the questions may have moved on. Without a continuing programme, it may feel too late for evaluation to feed into learning (it’s not, btw; there’s lots that we can draw into current and future programming).

It is also, crucially, about ownership – do the evaluation questions being asked by the effectiveness reviews sufficiently mirror the project’s own theories of change? Do project teams feel engaged by those questions, and therefore able to respond to the findings?

The evaluation designs for the ‘large n’ interventions in particular are complex and often unfamiliar to programme staff, and may fail to ‘tell the story’ in ways that are meaningful to the team’s broader understanding of their operating environment. And while they help us to answer the question of whether or not our programme has had an impact, they often cannot explain why that is the case.
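To make the ‘whether, not why’ point concrete: at its core, a large-n review boils down to comparing outcomes between people the project reached and a comparison group it didn’t. The sketch below uses made-up outcome scores and a naive, unmatched comparison (the real reviews take far more care over how the comparison group is constructed), but it shows why the arithmetic can confirm an impact without saying anything about its causes.

```python
# A minimal sketch of the 'whether' question: impact estimated as a simple
# difference in mean outcomes between intervention and comparison households.
# All numbers are made up for illustration; this is not Oxfam's actual design.
from math import sqrt
from statistics import mean, stdev

intervention = [3.1, 2.8, 3.5, 4.0, 2.9, 3.6]  # hypothetical outcome scores
comparison = [2.7, 2.5, 3.0, 2.9, 2.6, 3.1]

effect = mean(intervention) - mean(comparison)
# Rough standard error of the difference (unpaired, assuming independent samples)
se = sqrt(stdev(intervention) ** 2 / len(intervention) +
          stdev(comparison) ** 2 / len(comparison))

print(f"estimated effect: {effect:.2f} (+/- {1.96 * se:.2f} at ~95% confidence)")
# A positive, statistically distinguishable difference tells us *whether* the
# project moved the outcome, but nothing in this calculation tells us *why*.
```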

We’re working to address some of these challenges. We have revised our sampling criteria to ensure we are selecting larger, more mature projects to increase the relevance to country staff. We are undertaking more qualitative research (or linking up with already-planned qualitative evaluations) to support learning.

We are creating more chances for project teams to engage with and inform the evaluations, to more thoroughly unpack their theory of change and build understanding and ownership of the questions that the reviews are trying to answer – recognising that we need to get this up front, to ensure teams are able and willing to act on the findings.

We’re also doing more to support learning from the reviews – undertaking more follow-up research, working with project teams to think through how they might act on recommendations, and, at an organisational level, working with the relevant thematic advisors to feed learning from the reviews into future programming. And from all the reviews, there is a lot that we can learn about how to improve programme design and implementation more broadly.

But, at the core, I still worry that there is an inherent tension between organisational accountability and programme learning.  Are they compatible/achievable with the same tool (has anyone ever actually killed two birds with one stone?), and if so, how can an organisation get the balance right? And if not, where do we draw the line between these two agendas?
