When will they ever learn?

Originally posted on Medium here.

In many parts of the world, we have created a learning crisis: more kids are in school, but they are not learning. “We are failing the children on a massive scale,” says celebrated development economist Esther Duflo, John Bates Clark Medal winner and author of Poor Economics. “There has been improvement in enrolment and in the physical capacity of schools. But learning is not about enrolment, teacher-student ratio, having latrines in school; it’s about if we are serious about learning.”

In September, world leaders will get together and agree that improving the quality of education should be one of the Sustainable Development Goals that replace the MDGs. Something must be done, they will say. But what?

It would be nice to think that an answer to that question might be found in a fascinating recent publication from the World Bank. In What Really Works To Improve Learning in Developing Countries?, authors David Evans and Anna Popova synthesise the results of six systematic reviews covering 227 individual studies — a kind of meta-meta-analysis.

But while the paper is short, clear and well worth a read, anyone hoping to find the silver bullet will search in vain. Partly this reflects the relative thinness of the underlying evidence base, at least in comparison to developed country settings. Compare the 6 systematic reviews and 227 studies examined here to the 800 reviews and 50,000+ studies analysed by John Hattie for his seminal Visible Learning books on the relative effectiveness of educational interventions in OECD countries.

Another challenge, note Evans and Popova, is that these 6 systematic reviews are not very consistent in their approach. Only 3 studies appear in all 6 reviews, and three-quarters of the studies only crop up in one review. This makes separating the signal from the noise a little difficult, because it is not clear how much of the difference in the conclusions drawn by the different reviews is down to which studies they chose to include.

What works and what doesn’t

That said, the paper still contains a number of interesting and practicable nuggets. For example, it is fairly definitive on what, according to the literature, doesn’t work. It turns out that deworming and other health interventions like nutritional supplements help to get kids into school but don’t help them learn. They might still be worth doing and be highly cost effective, just not for the reasons we used to think. (There’s an analogy here with the exhaustive recent research on micro-credit, which finds that it does some good, just not in the ways or on the scale that its supporters had claimed.)

Controversially, but in some sense hardly surprisingly, lowering the cost of education by removing fees or giving grants also helps enrolment without improving learning outcomes.

The findings on what does work seem to me a bit less useful — or at least, a bit less like a roadmap and more like a guy on a street corner pointing down the street and saying “I think it’s thataway.”

The basic answer is: good teaching. More specifically, three types of interventions seem to be effective, although in practice they are likely to be highly interlinked.

The first is about pedagogical innovations that address the challenge of differentiation: tailoring teaching to the needs of different learners, whether by grouping students of similar ability levels, using computer-aided learning to allow students to move at their own pace, coaching teachers on how to adjust their strategies for different learners, or improving diagnostic information about a child’s readiness for different types of material.

The second is targeted training and coaching to build specific skills and practices. Ongoing, on-the-job support is more useful than one-off training programmes; specific tools and techniques are more useful than general guidelines.

Third, holding teachers to account, either through performance incentives or, in the case of contract teachers, through a greater ability to sack failing teachers, is effective, albeit with the usual caveats about the perverse incentives that arise from applying too much pressure to a single performance measure.

Is this as good as it gets?

I say a bit less useful for two reasons.

First, my initial reaction was to wonder whether any of this was very new. I am puzzled by the firewall that often seems to exist between education research in developed versus developing countries. The overwhelming conclusion from the former is that, in Michael Barber’s phrase, “The quality of an education system cannot exceed the quality of its teachers.” Says Harvard’s Bob Schwartz: “What is the most important school-related factor in student learning? The answer is teachers.” So to read in the Evans and Popova paper that, “pedagogical interventions…are more effective at improving student learning than all other types of interventions combined” is not particularly surprising. Why should these systems be any different?

But second, what’s also striking is that the effect sizes discussed in these reviews seem pretty small by comparison to what we see in the literature from the developed world. In his work, Hattie sees an effect size of 0.4 as a kind of cut-off point: interventions scoring above that likely mean students are visibly learning more than they otherwise would; below that, it is hard to know whether the intervention is doing any good since even “a student left to work on his own, with the laziest supply teacher, would be likely to show improvement over a year.”

By contrast, in the World Bank paper the largest reported effect size (for ‘adaptive instruction’) is 0.48. Several of the interventions described in the paper as being among the most effective have effect sizes in the range 0.10–0.15, levels which would see them languishing near the bottom of Hattie’s league table, and for good reason.

To illustrate, suppose a sample of JSS1/Grade 7 (~12–13 year old) students in Sierra Leone is tested using the word reading component of the British Ability Scales (BAS). As part of this assessment, they are required to read out a list of 90 words that increase in complexity and difficulty, and from the number and difficulty of the words which they read correctly an inference can be drawn about their reading age based on UK norms. At baseline, students read an average of 40 words correctly, implying a reading age of about 7 years 7 months, with a standard deviation of 18 words.

Teachers at the school then participate in a teacher training programme for a new phonics programme that has been shown to have an effect size of 0.12 (which is what researcher Patrick McEwan found, in one of the systematic reviews being synthesised by the World Bank, was typical for teacher training interventions). An effect size of 0.12 means that, one year later (which is about the average ‘treatment exposure’ in these studies), students will be reading a grand total of an additional 2 words correctly, which for the BAS corresponds to an improvement in reading age of about one extra month, over and above what a comparison group ‘learning as usual’ would have achieved.

Now, an extra month is better than nothing. But the problem is that if the underlying ‘learning as usual’ rate of progress for these students is anything like as low in the next six years of their school career as it was in the first six, they will reach the end of senior secondary school having only developed a reading age equivalent to a British 9.5 year old. At that point, the fact that these students have accelerated relative to their peers (who will have a reading age six months lower), will seem much less pertinent than the fact that in both groups, students are expected to sit their terminal exams for secondary school, prepare for college or enter the workforce with the reading age of a 4th grader.

And to repeat, such an intervention would qualify as one of the more successful to have been documented in this paper.
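For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes the simplest reading of an effect size (the gain in raw score equals the effect size multiplied by the baseline standard deviation) and takes the "two words is roughly one month of reading age" conversion from the example above rather than from any BAS table. It also assumes the same gain recurs in each of the six remaining school years. The variable and function names are my own, purely for illustration.

```python
# Rough check of the effect-size example (illustrative assumptions only).

def raw_gain(effect_size: float, baseline_sd: float) -> float:
    """Convert a standardised effect size into raw score units."""
    return effect_size * baseline_sd

BASELINE_SD_WORDS = 18          # standard deviation at baseline, from the example
TEACHER_TRAINING_EFFECT = 0.12  # typical teacher training effect size cited above
HATTIE_CUT_OFF = 0.4            # Hattie's cut-off point

# Assumption taken from the example: ~2 extra words is ~1 extra month of reading age.
MONTHS_GAINED_PER_YEAR = 1
YEARS_OF_SCHOOL_REMAINING = 6

extra_words = raw_gain(TEACHER_TRAINING_EFFECT, BASELINE_SD_WORDS)
hinge_words = raw_gain(HATTIE_CUT_OFF, BASELINE_SD_WORDS)

# Assumes the same one-month advantage is earned in every remaining year.
cumulative_months = MONTHS_GAINED_PER_YEAR * YEARS_OF_SCHOOL_REMAINING

print(f"Extra words after one year at 0.12: {extra_words:.2f}")
print(f"Extra words an effect size of 0.4 would imply: {hinge_words:.2f}")
print(f"Cumulative reading-age advantage after six years: {cumulative_months} months")
```

On these assumptions, 0.12 of a standard deviation is a little over two words on a 90-word list, while an intervention at Hattie's cut-off would be worth about seven.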

The point I’m trying to make with this example is that given the depth of the learning crisis in many places, we need interventions with much larger effect sizes than this if we want students to even begin making up the ground they have lost relative to their peers in other settings. I am hardly the first person to note that, in developing countries, learning curves (or what Lant Pritchett calls learning profiles) are ‘too darn flat’. But it’s extremely depressing that, even on our best days, we’re barely bending these learning curves upward.

What’s to be done?

So what does this mean for researchers and practitioners?

First, I’d like to see us bank the insight that good teaching is what really matters, and start focusing on how we get more of it. It is surely the case that what you should focus on to get more good teaching will depend on your starting point, and in particular whether you are trying to move from awful to okay or from okay to great. But can we at least agree that the problem of not enough good teaching is the right starting point? Arguments about how much money needs to go into education, or about the proper role of the private sector, look like a distraction from this central question. As McKinsey&Co noted in a 2010 report:

There is too little focus on “process” in the debate today. Improving system performance ultimately comes down to improving the learning experience of students in their classrooms. The public debate…often centers on structure and resource due to their stakeholder implications. However, we find that the vast majority of interventions made by the improving systems in our sample are “process” in nature; and, within this area, improving systems generally spend more of their activity on improving how instruction is delivered than on changing the content of what is delivered.

Second, the devil is really in the detail. As Evans and Popova acknowledge, “examining the specific programs is crucial” to draw out the right lessons, and understand why one programme failed when another worked. (Full disclosure: I’m unlikely to get round to this, and irritating academic publishers wouldn’t let me even if I wanted to, which is why I count on clever World Bank policy analysts and think-tank researchers to produce pithy summaries for me.)

If we are going to find a way to usefully generalise from all these individual studies, it feels like there are a couple of areas where we need more detail, or at least a different kind of detail.

One is about specificity. As an end user of research, what I want to understand is which specific changes in teaching and learning practices offer the most bang for buck.

Take something like Assessment for Learning. Almost 20 years ago, Professors Paul Black and Dylan Wiliam wrote their seminal Inside The Black Box. It led to a revolution in the use of formative assessment, but it started from the simple proposition that if we want to understand how students are (and are not) learning we need to understand specific things about how teachers are teaching. As Black and Wiliam showed, it’s perfectly possible to observe, measure and then train for these practices — indeed, when I ask my students what differences they notice between our lessons and lessons in their previous schools, one of the things they talk about most often is the way in which my teachers check for understanding. “At my old school,” Michael, 13, told me recently, “they did not teach us to let us understand.” But is a huge RCT really the best way to get traction on the value of a specific change in practice like this?

Or take something like feedback. John Hattie finds that “the most powerful single influence enhancing achievement is feedback”. But, crucially, it is the quality not the quantity of the feedback that matters. Once again, the devil is in the detail. The image on the left, an extract of a report card one of my students was given by their previous school, illustrates the point. A massive amount of feedback, none of it useful, and much of it potentially harmful.

The point is not that we shouldn’t try to measure these things. On the contrary, pinpointing which specific strategies make the most difference is exactly what we should be doing. But it feels like that requires a more fine-grained approach than some of these programme and evaluation designs seem to allow.

The other aspect is conditionality. As Evans and Popova note in a number of places, the impact of any intervention is likely to be highly conditional on other factors. Imagine two schools. In one school, there is a high trust atmosphere between the headteacher and her staff. Classroom doors are routinely kept open to allow for frequent lesson observation and feedback. The headteacher provides ongoing challenge and support to her staff using a clear performance management framework. Regular assessment creates a data-rich environment in which teachers know how well their students are doing. Both pay and progression are directly and visibly linked to performance. In the other school, none of these conditions hold. If teachers in both schools participate in the same teacher training programme, do we really think it will have the same effect? No, of course not. But then are we really measuring the programme, or other characteristics of the school?

I am reminded of the often painful discussions about whether aid ‘works’. As Lee Crawfurd put it, “next time you hear ‘does aid work?’ think ‘does policy work?’. It’s a silly question, and obvious when you put it like that.” A better question, Andy Sumner suggests, is when does it work: under what conditions.

By the same token, we need a much better sense of what the conditions are under which different educational interventions are likely to be more or less effective. For example, Evans and Popova note that One Laptop Per Child failed in Peru because potentially useful technology was distributed without any accompanying training or support. Other examples of this conditionality abound. The distribution of textbooks to Sierra Leonean primary schools failed to raise student achievement because headteachers did not believe they would ever see so many books again, and so decided that instead of distributing them they would try to ration them to make them last longer.

What about the leaders?

I’m not an education researcher, but one conditional factor that crops up a lot in the debate on developed country systems, and which seems to be under-explored in the developing country literature, is leadership. Although it is not straightforward to measure, because its effect is primarily indirect and mediated through other channels that are very important in their own right (like the quality of the teachers whom school leaders hire and fire, and how they train them), we know in developed country settings that school leadership matters a lot. As Michael Barber argues:

The quality of leadership at school level is really critical. The head is like the conductor of an orchestra in a way, the person who is pulling together all the different human and other resources in a way that enables the whole school to achieve that ethical underpinning as well as the standards that are expected and makes sure that each individual person, each child, is properly valued. So the quality of school leadership is fundamental and you see it again and again in the system research.

The stuff I’ve seen on the impact of school leadership in developing country settings is interesting. A multi-country study last year by Nicholas Bloom and his colleagues found “robust evidence that management practices vary significantly across and within countries and are strongly linked to pupil outcomes.” Intriguingly, the study suggests that school management may be particularly important in less developed settings: the authors estimate the impact of their management variable to be greater in India than in some of the OECD countries in their sample.

A better understanding of the role and importance of school leadership and management might even help us move past the sterile public vs private debate. For all their disagreements, advocates of low cost private schools and some of their fiercest critics in the NGO world both share an optimistic view of the power of bottom-up pressure to hold school leaders to account and force them to lead school improvement. For the former, it is the market power of parent-consumers who will vote with their feet and move their children to other schools if they are not satisfied with performance. For the latter, community governance arrangements, citizen audits, and ASER-style independent learning assessments empower citizens to use their voice more effectively. But in both cases, bottom-up pressure is seen as a critical factor in keeping school leaders honest.

The question is: is it enough? David Booth of the Overseas Development Institute has argued that “public service improvement comes when there is successful action to improve provider motivations. And this doesn’t come mainly from the bottom-up.” I have some sympathy for that view. To give a recent example, some of our parents recently learned for the first time their child’s score in last year’s National Primary School Exam (getting results was delayed by the Ebola crisis). But in a number of cases where the child had done less well than expected, their response was not to blame the primary school for failing to equip their child with the knowledge and skills they needed to do well, but to blame the child and decide they needed to resit the year at the same school.

Don’t get me wrong: I strongly support the right of citizens to exercise both choice and voice. But my hunch is that in these settings or, to use McKinsey’s phrase, in countries at this ‘stage in their improvement journey’, you also need strong top-down pressure to perform. Figuring out how we build organisations in the public, private and charitable sectors with the right scale and the right capabilities to apply that kind of pressure effectively seems like a big question for the future.

Without strong school management to drive and champion them, the prospects for the sorts of classroom interventions Evans and Popova find effective look bleak. With it, we might finally start to bend those learning curves upwards a bit faster.