After 6 years and 100+ impact evaluations: what have we learned?

December 12, 2017

By Duncan Green

Longer projects don’t generate better results; women’s economic empowerment doesn’t seem to shift power imbalances in the home. Just two intriguing findings from new ‘meta-analyses’ of Oxfam’s work on the ground. Head of Programme Quality, impact evaluation champion and all-round ubergeek Claire Hutchings explains.

On this blog in 2011 we first shared our approach to ‘demonstrating effectiveness without bankrupting our NGO.’  A lot has happened in the last six years and we’ve hit a milestone this year.  After 100 effectiveness reviews, we finally have enough impact evaluations to help us ask and credibly answer important questions across a portfolio of programme work.  Or for the geeks amongst us: we’ve got meta-analyses!

Over the past few weeks we have begun publishing five separate meta-analyses (or three meta-analyses and two meta-syntheses for the pedants amongst us) on the Policy & Practice site, looking at evaluations of women’s empowerment, resilience, livelihoods and policy & governance interventions, as well as across our accountability reviews.  Each will be accompanied by a blog on REALGEEK, diving into more detail on methodology and findings.  Subscribe here if you don’t want to miss out, or read the first blog on women’s empowerment now.

I struggled to pull out the key insights to share (there are so many) – so I asked Karl Hughes, the original architect of Oxfam GB’s Global Performance Framework (now Head of ICRAF’s Impact Unit), what he’d most want to know, and then tried to answer:

What are they telling us about Oxfam GB’s overall effectiveness? 

While the findings from the individual studies were mixed, reassuringly they confirm an overall positive impact across all outcome areas.   We looked to benchmark results against other organizations and were sometimes challenged by a lack of data for comparable interventions, or our bespoke measurement approach.  That said:

In Sustainable Livelihoods programmes, we saw an average increase in incomes of 6.6%. This is good – though benchmarking isn’t totally straightforward, it is comparable with results from published independent reviews (for example, 6 recent RCTs of Ultra-poor “graduation” projects, or this systematic review of Agricultural certification schemes). For Women’s Empowerment and Resilience, where we use bespoke multi-dimensional indices that we’ve developed to conceptualise and measure these traditionally hard-to-measure outcomes, it’s harder to find external benchmarks.  I’ve been warned (by Duncan, natch) against using technical terms like ‘standard deviation’, and despite best efforts, I’ve struggled to find ways to translate what we’ve found into ‘plain English’ (for a more technical intro to meta-analyses, you might want to check out ‘5 Key Things to Know About Meta-Analysis’).  Perhaps suffice to say that in both studies we see positive and significant impact.  For women’s empowerment it is broadly in line with the literature on the impact of micro credit or self-help groups.
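For the geeks who do want a peek under the bonnet, here is a minimal sketch of the basic idea behind pooling results across evaluations. It assumes the simplest textbook approach (a fixed-effect, inverse-variance weighted average), which is not necessarily the model used in the Oxfam studies. The function name `pooled_effect` and all of the numbers are hypothetical, made up purely for illustration; they are not results from any Effectiveness Review.

```python
import math


def pooled_effect(effects, std_errors):
    """Fixed-effect (inverse-variance) pooled estimate and its standard error.

    Each study is weighted by 1 / SE^2, so more precise evaluations
    count for more in the overall average.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    return estimate, math.sqrt(1.0 / total_weight)


# Hypothetical effect sizes (in standard deviation units) from three
# imaginary evaluations, each with its own standard error.
effects = [0.10, 0.30, 0.20]
std_errors = [0.10, 0.10, 0.10]

estimate, se = pooled_effect(effects, std_errors)

# Approximate 95% confidence interval around the pooled estimate.
low, high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"pooled effect: {estimate:.2f} SD (95% CI {low:.2f} to {high:.2f})")
```

In this toy example no single study needs to be conclusive on its own: the pooled estimate is more precise than any individual evaluation, which is why a portfolio of mixed individual findings can still add up to a ‘positive and significant’ overall result.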

I’m by nature very cautious about making big claims – but I think the results reassure us that our programmes are having a positive effect on the things they’re trying to influence.  This is the single most important recurring question that our staff, partners and supporters have – to know whether we’re making a difference – and for the first time we’re able to answer this across our portfolio of programme work, not just for individual projects and programmes.

How have all these studies contributed to learning and improving Oxfam’s practice? 

We have some great examples of how individual Effectiveness Reviews have contributed to adaptations in programme design – in our own programmes, but also a handful of powerful examples where the evidence they generate is being taken up by others.  But Oxfam is limited in how much we can simply ‘scale lessons across’ a portfolio because, rather than testing and adapting single intervention models, we work with partners, communities and individuals to design bespoke programming, grounded in a rights-based approach.  Of course, there are transferable lessons, but many useful insights are very context-specific.  And we still suffer from knowledge management challenges.  Documenting and storing information in accessible ways, and getting the incentives right for uptake and use, are huge challenges in INGOs and beyond.  But that’s a whole other blog post.

How much of the variation in results is down to different programming approaches and/or different contexts?

Perhaps most interestingly, some of our major assumptions aren’t borne out in the results.  For example, the livelihoods and resilience meta-analyses found no evidence of a relationship between the duration of the projects evaluated and impact.  This is surprising – as we might well assume that longer-term interventions will have better results.  A likely explanation is that the duration of a project is closely related to the types of interventions carried out.  For resilience, for example, many of the shorter-term projects were focused on reducing vulnerability to natural hazards, whereas longer-term projects generally had more emphasis on strengthening the resilience of livelihoods activities, a longer-term endeavour for which the outcomes are less certain.

Similarly, our women’s empowerment meta-analysis confirms that increased access to income and/or changes in the perceived role of women in the economy of their household or communities are not sufficient to change power dynamics within the household.  Assumptions that there would be a spillover effect, and that this kind of work would de facto lead to changes in power dynamics within the household, do not hold.  If we want to support women to shift intra-household power dynamics, we need to target this explicitly.

We do see some regional variation, particularly in the meta-analysis of resilience interventions, and are exploring whether this is down to the nature of the shocks, differences in programming approaches or the measurement/ evaluation process itself.

This is a lot to digest – and we’ve only scratched the surface of the trends and patterns these studies are allowing us to pick up. We hope you’ll join us as we dive into the individual studies in a little more depth.

And a reminder that our Impact Evaluation Team is always looking to learn and improve our practice in a range of areas.  We’re continuing to build measurement approaches for hard-to-measure outcomes, look for ways to strengthen both rigour and appropriateness of impact evaluation designs – and much more.  Comments, constructive criticism and advice are always welcome – starting off in the comments section below, or reach out to me or Simone Lombardini, who heads up our impact evaluation team.