‘Gold standard’ evaluation finds positive early gains for Partnership Schools and for Rising.
The evaluation team behind the Randomised Controlled Trial (RCT) of Partnership Schools for Liberia (PSL - okay, that’s enough three-letter abbreviations) have just released their midline report. The report covers just the first year of PSL (September 2016 to July 2017). A final, endline report covering the full three years of the PSL pilot is due in 2019.
While much anticipated, this is only a midline report with preliminary results from one year of a three-year programme. The report therefore strikes a cautious tone and the evaluation team are careful to caveat their results.
Nevertheless, there are important and encouraging early messages for PSL as a whole and for Rising in particular. Put simply, the PSL programme is delivering significant learning gains, and Rising seems to be delivering among the largest gains of any school operator.
For PSL as a whole, the headline result is that PSL schools improved learning outcomes by 60% more than in control schools, or put differently, the equivalent of 0.6 extra years of schooling.
These gains seem to be driven by better management of schools by PSL operators, with longer school days, closer supervision of staff and more on-task teaching resulting in pupils in PSL schools getting about twice as much instructional time as in control schools. PSL schools also benefited from having more money and having better quality teachers, particularly new graduates from the Rural Teacher Training Institutes. But the report is clear that, based on their data and the wider literature, it is the interaction of these additional resources and better management that makes the difference; more resources alone are not enough. (Anecdotally, I would add that our ability to attract these new teachers was at least in part because they had more confidence in how they would be managed, which illustrates the point that new resources and different management are not easily separated.)
The report also looks at how performance varies across the 8 operators that are part of PSL. Even more than the overall findings, the discussion of operator performance is limited by the small samples of students the evaluation team drew from each school. For operators (like Rising) operating only a small number of schools, this means there is considerable uncertainty around the evaluators’ estimates. That said, the evaluation team do their best to offer some insights.
Their core estimate is that, compared to its control schools, Rising improved learning outcomes by 0.41 standard deviations or around 1.3 additional years of schooling. This is the highest of any of the operators, though it is important to note the overlapping confidence intervals between several of the higher performing providers.
However, this core estimate is what’s known as an “intent-to-treat” or ITT estimate. It is based on the 5 schools that were originally randomly assigned to Rising. But we only actually ended up working in 4 of those (* see below). The ITT estimate is therefore a composite of results from 4 schools that we operated and 1 school that we never set foot in. A better estimate of our true impact is arguably offered by looking at our impact just on those students in schools we actually ended up working in. This “treatment on the treated” or TOT estimate is considerably higher, with a treatment effect of 0.57 standard deviations or 1.8 extra years of schooling. This, again, is the highest of any operator, and by a considerably larger margin, though again the confidence intervals around the estimate are large.
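For readers who want to see the arithmetic, here is a rough sketch of how an ITT estimate relates to a TOT estimate under the simplest textbook assumptions. It is not the evaluators' method. It assumes every school carries equal weight and that being assigned to Rising had no effect in the school we never operated; the function name `tot_from_itt` and the equal-weighting are illustrative assumptions. Only the 0.41 and 0.57 figures and the 4-of-5 school count come from the text above.

```python
def tot_from_itt(itt_effect: float, share_treated: float) -> float:
    """Scale an intent-to-treat effect by the share of assigned units
    that were actually treated (the simple ratio adjustment)."""
    if not 0 < share_treated <= 1:
        raise ValueError("share_treated must be in (0, 1]")
    return itt_effect / share_treated


# Figures quoted in the text above.
ITT_SD = 0.41           # ITT estimate, in standard deviations
REPORTED_TOT_SD = 0.57  # TOT estimate, in standard deviations
SCHOOLS_ASSIGNED = 5
SCHOOLS_OPERATED = 4

# ASSUMPTION: each school counts equally. The report works from its own
# sample and model, so this is only a back-of-envelope comparison.
naive_tot = tot_from_itt(ITT_SD, SCHOOLS_OPERATED / SCHOOLS_ASSIGNED)

# The share treated that would reconcile the two reported figures
# under the same simple ratio.
implied_share = ITT_SD / REPORTED_TOT_SD

print(f"Naive TOT with equal school weights: {naive_tot:.2f} SD")
print(f"Share treated implied by reported figures: {implied_share:.2f}")
```

The naive figure lands a little below the reported 0.57, which is what you would expect if the evaluators' sample does not weight the five schools equally. The direction is the point: dividing by a share below one always pushes the TOT above the ITT.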
Whether the ITT or TOT estimate is the more useful depends, in my view, on the policy question you are trying to answer. At the level of the programme as a whole, where the policy question is essentially “what will the overall effect of a programme like this be?”, the ITT estimate seems the more useful because it is fair to assume that some level of ‘non-compliance’ will occur and the programme won’t get implemented in all schools. But at the inter-operator level, where the salient policy question is “given that this is going to be a PSL school, what will be the impact of giving this school to operator X rather than operator Y?”, the TOT estimate seems more informative because it is based solely on results in schools where those operators were actually working.
A further complication in comparing across operators is that operators have different sample sizes, pulled from different populations of students across different geographical areas. It cannot be assumed that we are comparing like with like. To correct for this, the evaluators control for observable differences in school and student characteristics (e.g. by using proxies for their income status, geographic remoteness etc.), but they also use a fancy statistical technique called ‘Bayesian hierarchical modelling’. Essentially, this assumes that because we are part of the same programme in the same country, operator effects are likely to be correlated. It therefore dilutes the experimental estimate for Rising by making it a weighted average of Rising’s actual performance and the average performance of all the operators. It turns out that adjusting for baseline characteristics doesn’t make too much difference (particularly for Rising, since our schools were more typical), but this Bayesian adjustment does. It drags Rising back towards the mean for all operators, and the pull is stronger because our sample size is smaller. We still end up with the first or second largest effect depending on which of the ITT or TOT estimate is used, but by design we are closer to the rest of the pack.
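The weighted-average idea can be sketched in a few lines. This is a minimal normal-normal partial-pooling formula, a standard simplification, and not the evaluators' actual model. The pooled mean, the between-operator spread and both standard errors are hypothetical numbers chosen purely for illustration; only the 0.41 raw estimate comes from the text.

```python
def shrink(estimate: float, std_error: float,
           pooled_mean: float, between_sd: float) -> float:
    """Partial pooling: a precision-weighted average of an operator's own
    estimate and the mean across all operators. Noisier estimates
    (larger std_error) get less weight and are pulled further in."""
    weight_on_own = between_sd**2 / (between_sd**2 + std_error**2)
    return weight_on_own * estimate + (1 - weight_on_own) * pooled_mean


RAW_ESTIMATE = 0.41  # Rising's ITT estimate quoted in the text (SD)

# HYPOTHETICAL values, for illustration only (not from the report).
POOLED_MEAN = 0.20   # assumed average effect across operators
BETWEEN_SD = 0.15    # assumed true spread of operator effects

# Same raw estimate, two hypothetical levels of precision.
large_sample = shrink(RAW_ESTIMATE, 0.05, POOLED_MEAN, BETWEEN_SD)
small_sample = shrink(RAW_ESTIMATE, 0.20, POOLED_MEAN, BETWEEN_SD)

print(f"Precise estimate after shrinkage: {large_sample:.2f} SD")
print(f"Noisy estimate after shrinkage:   {small_sample:.2f} SD")
```

With identical raw results, the operator with the noisier estimate ends up much closer to the pooled mean. That is the mechanism by which a small operator with a high raw estimate gets pulled back towards the pack.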
Some reflections on the results
So what do we make of these results?
First of all, we are strongly committed to the highest levels of rigour and transparency about our impact. We had thought that the study wouldn’t be able to say anything specific about Rising at all for technical reasons to do with the research design (for nerdier readers: it was originally designed to detect differences between PSL schools and non-PSL schools, and was under-powered to detect differences among operators within PSL). We're glad the evaluation team were able to find some ways to overcome those limitations.
Second, it is interesting and encouraging that the results largely confirm the strong progress we had been seeing in our internal data. Those data looked promising, but absent a control group to provide a robust counterfactual, it was impossible to know for sure that the progress we were seeing was directly attributable to us. As we said at the time and as the evaluation team note in an appendix to this report, our internal data were for internal management purposes and were never meant to have the same rigour as the RCT. But as it turns out, our internal data and the RCT data are pretty consistent. Our internal data suggested that students had made approximately 3 grades' worth of progress in one academic year; the TOT estimate in the RCT is that they had made approximately 2.8 grades’ worth of progress in one academic year. Needless to say, knowing that we can have a good amount of conviction in what our internal data are telling us is very important from a management point of view.
Third, while making direct comparisons between operators is tricky for the reasons noted above, on any reasonable reading of this evidence Rising emerges as one of the stronger operators, and this result validates the decision by the Ministry of Education to allocate 24 new schools to Rising in Year 2. In both absolute and relative terms, this was one of the larger school allocations and reflected the Ministry’s view that Rising was one of the highest performing PSL operators in Year 1. It is good - not just for us but for the principle of accountability underlying the PSL programme as a whole - that the RCT data confirm the MoE’s positive assessment of Rising’s performance.
Taking up the challenge
I also want to be very clear about the limitations of the data at this stage. It is not just that it’s very early to be saying anything definitive. It’s also that these data do not yet allow Rising, or really any operator, to fully address two of the big challenges that have been posed by critics of PSL.
The first challenge is around cost. As the evaluators point out, different operators spent different amounts of money in Year 1, and all spent more money than would be typically made available in a government school. In the end, judgments about the success of PSL or individual operators within it will need to include some assessment not just of impact but of value for money. PSL can only be fully scaled if it can be shown to be effective and affordable. Rising was one of those operators whose unit costs were relatively high in Year 1. That’s because a big part of our costs is the people and the systems in our central team and with just 5 schools in Year 1, we had few economies of scale. These costs should fall precipitously once they start to be shared over a much larger number of schools and students. But that’s a testable hypothesis on which the Ministry can hold us to account. In Year 2, we need to prove to them that we can deliver the same or better results at a significantly lower cost per student.
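To make the economies-of-scale argument concrete, here is a toy calculation. Every cost figure is hypothetical and in arbitrary units; none of it is Rising's actual budget. It also assumes that the central team's cost stays fixed as the network grows, that enrolment per school is constant, and that all 5 Year 1 schools continue alongside the 24 new ones. These are simplifications.

```python
def cost_per_student(central_cost: float, per_school_cost: float,
                     n_schools: int, students_per_school: int) -> float:
    """Total cost (fixed central cost plus per-school costs) divided
    by total enrolment."""
    total_cost = central_cost + per_school_cost * n_schools
    return total_cost / (n_schools * students_per_school)


# HYPOTHETICAL inputs in arbitrary units, for illustration only.
CENTRAL_COST = 100.0       # assumed fixed cost of central team and systems
PER_SCHOOL_COST = 10.0     # assumed running cost of each school
STUDENTS_PER_SCHOOL = 200  # assumed enrolment per school

YEAR1_SCHOOLS = 5          # from the text
YEAR2_SCHOOLS = 5 + 24     # ASSUMPTION: Year 1 schools plus 24 new ones

year1_cost = cost_per_student(CENTRAL_COST, PER_SCHOOL_COST,
                              YEAR1_SCHOOLS, STUDENTS_PER_SCHOOL)
year2_cost = cost_per_student(CENTRAL_COST, PER_SCHOOL_COST,
                              YEAR2_SCHOOLS, STUDENTS_PER_SCHOOL)

print(f"Year 1 cost per student: {year1_cost:.3f} units")
print(f"Year 2 cost per student: {year2_cost:.3f} units")
```

How far the unit cost actually falls depends on how much of the spending is genuinely fixed, which is exactly the hypothesis that Year 2 will test.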
The second challenge is around representativeness. One criticism that has been aired is that Year 1 schools were the low-hanging fruit. As the evaluation makes clear, it is simply not true that Year 1 schools were somehow cushy, but it is true that Year 1 schools were generally in easier-to-serve, somewhat less disadvantaged communities than the median Liberian school. And that’s precisely why the Ministry of Education insisted that the schools we and other operators will be serving in Year 2 be disproportionately located in the South East of Liberia, where those concerns about unrepresentativeness do not apply. If we can continue to perform well in these more challenging contexts, it will go some way to answering the question of whether PSL can genuinely become part of the solution for the whole of Liberia.
In short, the RCT midline provides a welcome confirmation of what our own data were telling us about the positive impact we are having. Our task for the coming academic year is to show that we can sustain and deepen that impact, in more challenging contexts, and more cost effectively. A big task, but one that we are hugely excited and honoured to be taking on.
A little over a year ago, Education Minister George Werner showed a great deal of political courage not just in launching this programme but in insisting that it be the subject of a ‘gold standard’ experimental evaluation. One year on, these results show that his vision and conviction are beginning to pay dividends. This report is not the final word on PSL, but the next chapter promises to be even more exciting.
* Footnote: as the evaluators note in their report, the process of randomly assigning schools in summer 2016 was complex, made even more challenging by the huge number of moving pieces for both operators and the Government of Liberia as both endeavoured to meet incredibly tight timescales for opening schools on September 5th. Provisional school allocations changed several times; by August 14th, three weeks before school opening, we still did not know the identity of our fifth school and it was proving very difficult to find a pair of schools near enough to our other schools to be logistically viable. Faced with the choice between dragging the process out even longer, and potentially imperilling operational performance, or opting to run a fifth school that was not randomly assigned, we agreed with the Ministry on the latter course of action.