Current aid design and evaluation favour autocracies. How do we change that?

I loved the new paper from Rachel Kleinfeld, a Senior Associate at the Carnegie Endowment for International Peace, and asked her to write a post on it.

What strategy can make a government take up smart development programs, better policing techniques, or tested education initiatives? RCT and regression-based studies have taught us a great deal about “what works”, but we still know very little about how to get what works into government policy. The problem is whisked under the carpet with the familiar phrase “political will is essential”, which is disturbingly close to: “and then a miracle occurs”.

The development field tends to conflate the “what” and “how” questions (or to ignore the latter entirely). The recognition that most development involves politics and policy is on a collision course with the canonization of logic-frame models for program design and the spread of RCTs and regression-based analysis for evaluation.

What’s the difference between the two questions? Traditional science has proven that good hygiene reduces the spread of disease. Smart RCTs and regressions have shown us counterintuitive ways to get people to use better hygiene. But nothing has helped us figure out how to get a government to invest in, for example, sanitation education rather than more photo-op friendly, but less useful, toilet building.

It’s not that traditional studies are not looking at this sort of problem. It’s that they aren’t suited to that type of question. We are building ever-more sophisticated tape measures, but they can’t determine the volume of a pond.

At an abstract level, the issue pits David (Ludwig von Bertalanffy) against Goliath (Sir Isaac Newton). Measurement and design based in Newtonian science rely on testing separable variables, and assume change progresses at a steady rate in a single direction. But political and policy reform involves complex systems, where variables affect each other rather than being separable, and their interactions can also change the system itself. As Bertalanffy realized, the system is more than the sum of the parts.

For instance, you can deconstruct a watch (a Newtonian system), learn the interactions of each part, and put them together to understand how a watch works. But if you kill a cat (an interdependent system), and test each part of its anatomy, you understand only some of how it behaves in real life. Political and social systems are like cats, not watches.

Reforms that involve policy and politics are punctuated, not steady or linear. Nothing may happen for years – and then a new government causes change to skyrocket. A sudden event may harness a public mood and create an opportunity, or block one. Reformers or their opponents may “create” a sudden event with good PR. There is always an opposition that may disagree with the means or the ends, and the relationship between reformers and opponents is interdependent. The adversary can learn, so a method that works well at one time may flop if reused. Successful reform can galvanize opponents, sowing the seeds for a counter-reform. Implementation is often two steps forward then two or three steps back, and sometimes it moves sideways.

When we design and evaluate programs that involve politics and policy based on the standard logic-frame model, regressions, and RCTs, unfortunate things happen.

First, measuring progress at single pre-set times can lead to lauding organizations that caught a wave, while blaming good groups that have the bad luck to have an assessment fall just after a counter-reform. Meanwhile, such measurement says nothing about reform sustainability.

Second, measurements based on achieving a “best practice” can actually obstruct the coalition politics of cobbling together agreements. Those agreements yield so-so policies, but they are the best that can be achieved with sustainable local buy-in.

Perhaps most nefariously, these design and evaluation systems inadvertently favor autocracies. After all, the grit that mucks up a steady linear march towards a best practice is politics – the need to satisfy different interest groups who vote or finance parties. These forces can’t be ignored by democratic governments, but they can be squelched by more authoritarian rulers. Thus, many indices that use traditional design and measurement techniques find the Rwandas and Ethiopias of the world doing far better than the messy democracies, ignoring the brittleness of more authoritarian systems.

So how can we design and measure better when reform trajectories look like sailboats, not trains? Complexity, or systems, theory offers a few pointers.

To start, problem-driven iterative adaptation (PDIA) with hypothesis testing is a better programming model for these dynamic implementation problems than logic frames. Setting firm outcome goals while leaving activities (and, crucially, funding streams) flexible, then constantly testing what works, allows for more impactful work.

But we can do more than just “be flexible”. Development agencies can anticipate the punctuated reform/counter-reform rhythm of policy change by planning in advance for multiple battles.

For instance, programs can focus less on whether the outcome was achieved in a set period, and more on the process of building coalitions of reformers locally who will carry on the fight even after the initial reform has been achieved. Evaluation can rank how strong and sustainable the long-term, broad coalition and/or elite influencer group is, as much as the outcome itself.

In addition, instead of ending grants when a reform is achieved, funding should continue to help reform groups stay together. That way, there will be organized constituencies who can fight back when opposition rallies to undermine the initial reform.

Similarly, development agencies can plan for windows of opportunity before they open, by funding preliminary work so that policy ideas and an organized group of vocal, powerful, politically savvy supporters are ready to go when the moment arises.

Complexity theory also tells us what kinds of outcomes can yield the greatest impact. Programs should aim to alter the “rules of the game” or the incentive systems that shape and constrain a system. Changing a single policy is like throwing a rock in the middle of a stream – it makes a big splash, but the stream returns to normal. Changing incentive systems is like moving rocks along the water’s edge – each one doesn’t make much difference, but move enough rocks and the stream flows in an entirely different direction.

So, for instance, to alter a political outcome, it makes sense to focus on changing who gets to vote, how votes are aggregated for representation, and how campaigns are financed – the structural constraints in all democracies. To affect the culture of a profession, changes to hiring, promotion, and firing standards, or professional accreditation standards and processes will bring about the deepest change.

And in nearly all cases, reforms that enable local people to organize, increase transparency and public voice in policy, reduce violence against reformers, and increase avenues to power can yield systemic change that allows reformers to carry on after outside funding ceases.

Political and policy reform is not just complicated – it is a fundamentally different kind of problem than those tackled by our current design and evaluation processes. But we can do better than just leaving it to the gods of “political will”. Now the challenge is to reform ourselves.
