In September 2017, I headed up to the Oxfam head office in Oxford to present our research paper: Big Data Opportunities for Oxfam – Text Analytics. Like all good research titles, it’s a mouthful. The paper explored the potential application of text analytics in response to Oxfam’s call for proposals on how big data tools could aid the organisation. Given that we use text analytics software daily, it seemed a good fit. I began with a primer on text analytics and why it is useful.
Data lives on a spectrum between structured and unstructured. Structured data is what it sounds like – basically anything you can neatly arrange in a spreadsheet. Unstructured data is much harder to analyse and comes in the form of social media posts, pictures and archived documents, to name a few.
The general rule of thumb is that most of the data organisations have access to is unstructured. For example, research indicates that 80% of the data the US government has access to is unstructured text, more commonly known as normal human language. But unstructured text poses problems for humans and machines alike: for the lowly analyst, the sheer volume of data can be daunting, and for man’s intellectual steed – the computer – the unstructured nature of the text is difficult to decipher. In development, we are making strides toward data-driven decision making, so it makes sense to have tools that help you analyse more than the structured 20% of your data.
But analysis for its own sake is pointless, so our paper focused on the use of text analytics software to solve the two most prevalent problems we have observed while helping other organisations use their text data.
Problem 1: People don’t tap into the organisation’s collective wisdom. Put simply, we generally start planning projects as if no one within the organisation has ever worked on something similar. We look outside for useful information and signals, instead of starting from the existing internal knowledge. But we ignore institutional wisdom at our peril – we end up repeating mistakes that could easily have been avoided if we had learnt from the experience of others. I watched a large project fail because of something that we could have known and avoided – it is a crushing feeling.
There are many reasons we don’t use our institutional wisdom. The ODI cites, amongst other factors, an incentives system that rewards staff for new ideas, leading to a neglect of lessons previously learned. I once worked with an organisation whose internal database was so difficult to search that no one ever used it, except to deposit yet more project documents that no one would ever read. Even if staff were able to find the relevant documents, reading them all to synthesise the key lessons would be a monumental task. This is especially true in large organisations like Oxfam, which has offices around the world. For this problem, we introduced a text analytics solution that makes it easier to search through internal documents.
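To give a feel for what searching internal documents can look like under the hood, here is a minimal sketch in Python. It is not the tool we presented to Oxfam; it simply ranks documents against a query using TF-IDF weights and cosine similarity, one common approach. The file names and document texts are made up for illustration, and a real system would also need to handle file formats, languages and much larger collections.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Turn {name: text} into TF-IDF vectors plus the IDF table."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())
    # Smoothed inverse document frequency: rarer words weigh more.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = {
        name: {t: c * idf[t] for t, c in word_counts.items()}
        for name, word_counts in counts.items()
    }
    return vectors, idf


def search(query, vectors, idf, top_k=3):
    """Return up to top_k (name, score) pairs ranked by cosine similarity."""
    q_counts = Counter(tokenize(query))
    # Query words that appear in no document are ignored.
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    results = []
    for name, vec in vectors.items():
        dot = sum(w * vec.get(t, 0.0) for t, w in q_vec.items())
        if dot > 0:
            d_norm = math.sqrt(sum(w * w for w in vec.values()))
            results.append((name, dot / (q_norm * d_norm)))
    return sorted(results, key=lambda r: r[1], reverse=True)[:top_k]


# Hypothetical project documents, invented for this example.
docs = {
    "wash_report.txt": "Water pumps failed because spare parts were unavailable locally.",
    "cash_eval.txt": "Cash transfers reached households faster than food parcels.",
    "training_notes.txt": "Community training on hygiene improved handwashing rates.",
}

vectors, idf = build_index(docs)
results = search("spare parts for water pumps", vectors, idf)
for name, score in results:
    print(name, round(score, 3))
```

Even a toy like this surfaces the one report that mentions pump failures, which is the kind of lesson that tends to stay buried in an unsearchable database.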
Problem 2: Organisations struggle to study their own behaviour. Put more simply, figuring out what’s working and what isn’t, particularly in large organisations, is extremely challenging. The directors and managers of development organisations should be able to ask (and get answers to) questions like “What is working well across our WASH initiatives?” The answers inform decisions on where resources should be allocated.
Considering the mountain of paperwork our sector produces in the form of proposals, project plans, project completion reports and monitoring, evaluation, accountability and learning documents, such data should exist. The problem here, as so often, is one of resources. Analysing all these documents would require a team of researchers that most organisations could not afford. One of the outcomes of technology should be to allow us to do more with less. With that in mind, I presented to the Oxfam staff a text-analytics-powered tool that, with some customisation, would allow analysis of a large corpus of documents to get to those answers.
Examples of text analytics in development
Using text analytics to create new knowledge is not limited to the analysis of documents. The UN Global Pulse Lab in Kampala created a toolkit that makes talk radio broadcasts machine-readable through the use of speech recognition technology and translation tools that transform radio content into text. Once the conversation is in text, one can use the tool to explore relevant public conversations on topics of interest, in this case:
- Early warning and influx of refugees in Uganda
- Recording losses associated with local disasters
- Local governance and public health service delivery
On the first topic, the lab was able to uncover the main topics of conversation related to refugees over the month of analysis, as well as emerging vulnerabilities related to refugees later in the year.
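Once radio content has been turned into text, even simple techniques can start to show which topics come up. The sketch below is my own illustration in Python, not the Pulse Lab toolkit or its method: it counts how many transcribed segments mention each topic, using keyword lists and sample segments that I invented for the example.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration.
TOPIC_KEYWORDS = {
    "refugees": {"refugee", "refugees", "settlement"},
    "disasters": {"flood", "floods", "landslide", "drought"},
    "health services": {"clinic", "hospital", "medicine"},
}


def topics_in(segment):
    """Return the topics whose keywords appear in one transcribed segment."""
    words = set(re.findall(r"[a-z]+", segment.lower()))
    return [topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords]


def count_topics(segments):
    """Count how many segments mention each topic at least once."""
    counts = Counter()
    for segment in segments:
        counts.update(topics_in(segment))
    return counts


# Invented segments standing in for machine-transcribed radio talk.
segments = [
    "The new refugees arrived at the settlement last week.",
    "Floods destroyed the maize harvest in our village.",
    "The clinic has no medicine for the refugees.",
]

counts = count_topics(segments)
for topic, n in counts.most_common():
    print(topic, n)
```

Keyword matching is crude: it misses synonyms and local languages and cannot tell a passing mention from a real discussion. It is still a cheap first experiment before investing in anything more sophisticated.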
The goal of the project is to support the Government of Uganda and development partners in incorporating the voices of Ugandan citizens into the development process. With an estimated 7.5 million words spoken on Ugandan radio every day, this kind of analysis would be impossible without text analytics. For those interested in other tools for working with qualitative data in development, I highly recommend this overview.
I concluded the presentation with recommendations on how to choose the right analytics solution:
- Focus on the highest value opportunities
This is the best way to get buy-in from all the people whose work you are going to disrupt, because they can see the value of what you are proposing.
- Start with questions, not data or solutions
Figure out what the problem or question is, then assess whether you have the data to answer it, and only then pick an appropriate solution.
- Run short, inexpensive experiments
The most important thing is to validate quickly, through the organisation’s own experiments, which solutions can work.
In my opinion, data is a fancy word for history. The more tools we have to study, understand and learn from our history, the better.
We would love to hear from other organisations on how they are using data to improve their project work.