Evaluation results show strong gains - especially for girls

Isatu’s story

Schools are supposed to be a safe space for learning, but for Isatu Kabba, 14, that’s not how it felt at her old primary school. “There was no proper monitoring of who comes in and out of the school,” she recalls. “There was a lot of noise, a lot of bad language, bad behaviour from some of the boys, and less concentration.”

That’s changed since she enrolled for secondary school at one of Rising’s schools in Sierra Leone. She says she feels safer in the school environment, and more motivated because of the encouragement and quality teaching she gets from school staff. In last year’s public exams, she received one of the highest marks of any Rising student.

As a first-generation learner - neither her mother, a petty trader, nor her father, currently out of work, ever had the opportunity to go to school - Isatu is passionate about her education. She has continued on to the senior secondary level, focusing on the science stream. When she finishes, she wants to study medicine at university and dreams of becoming a doctor.

The experience of girls like Isatu, and just how different it is from the experience of girls in other schools, is one of the highlights of the comprehensive final report of a three-year impact evaluation of Rising’s work in Sierra Leone. The study, by Dr David Johnson and PT Jenny Hsieh of Oxford University, finds that girls in Rising schools make faster progress than the boys, and 2-4 times the progress of girls in comparison schools.

As ever, if you want to jump straight to the study, it’s here. We’ve also put together this factsheet with the headline findings. For more of the background to the study and our reactions to it, read on.

Background to the study

I’ve written about the background to the study previously (here), but to briefly recap:

  • The study assessed the learning gains of Rising junior secondary students in reading and maths from January 2016-May/June 2018. 

  • These learning gains were benchmarked against the progress of comparison groups of students attending comparable private schools and government schools in the same neighbourhoods. 

  • The incoming ability levels of these students at baseline were approximately similar.

  • This period of 2.5 calendar years actually represents 3 academic years: the September 2015-July 2016 academic year was truncated as a result of the Ebola epidemic. 

  • The study used an innovative computer-adaptive testing software to estimate the progress made by individual students more precisely than is usually the case with a paper and pencil test.

  • In addition to cognitive measures, the study also explored progress in non-cognitive domains.

Figure 1. How the evaluation defines impact

There are some limitations to the study:

  • It is a quasi-experimental, not a fully experimental, research design. Although the specific students sampled in each school were randomly selected, the schools themselves were purposively sampled. This creates the risk of sample selection bias, where the outcomes achieved by different schools are caused in part by differences in the backgrounds of the students themselves.

  • This is dealt with in part by checking that students’ prior achievement was approximately similar, and then focusing on progress from that baseline, i.e. comparing learning gains (differences-in-differences) rather than learning levels. But there are differences in the socio-economic status (SES) of students in the study, particularly between students at private schools (both Rising and other private schools) and their peers in government schools. The Oxford team plans to do more work to explore the impact of these differences.

  • The research team also experienced significant sample attrition (that is, students surveyed at baseline not presenting for follow-up assessments) across all three school types. Again, this creates a risk of outcomes being shaped by differences in the composition of students presenting for assessments rather than because of anything to do with the schools themselves.

  • The team have dealt with this mainly by reporting average scores for three sub-groups of students:

    a. All students presenting on that particular test (irrespective of whether or not they presented on any other test)

    b. All students who presented at the baseline, each end of year test, and the endline 

    c. All students who presented at the baseline and endline

  • Group (a) is not useful for making comparisons over time, as the students taking a test in one period may not be the same students taking that test in the next period. Group (b) is in some ways the most interesting, as it allows us to see the shape of students’ learning trajectories over time. But the sample that took all 4 of these assessments is very small. Group (c) is larger, allows for a fair comparison of progress over time and is therefore the best group to focus on. While the pattern is broadly the same whether you look at Group (b) or Group (c), there are important differences in the results, which is why it is helpful for the team to present both.
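To make the three sub-groups and the "gains, not levels" comparison concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the student records, the wave names and the scores are invented for illustration, and this is not the Oxford team's actual procedure or data.

```python
from statistics import mean

# Hypothetical records: (student_id, school_type, {wave: score}).
# All scores are invented purely for illustration.
records = [
    ("s1", "rising", {"baseline": 200, "year1": 215, "year2": 228, "endline": 240}),
    ("s2", "rising", {"baseline": 190, "endline": 226}),
    ("s3", "rising", {"baseline": 205, "year1": 212}),
    ("s4", "comparison", {"baseline": 198, "year1": 205, "year2": 210, "endline": 216}),
    ("s5", "comparison", {"baseline": 202, "endline": 222}),
    ("s6", "comparison", {"year2": 207, "endline": 219}),
]

ALL_WAVES = ("baseline", "year1", "year2", "endline")


def group_a(records, wave):
    """Group (a): everyone who presented for this wave, whatever else they sat."""
    return [r for r in records if wave in r[2]]


def group_b(records):
    """Group (b): students who presented at every wave."""
    return [r for r in records if all(w in r[2] for w in ALL_WAVES)]


def group_c(records):
    """Group (c): students who presented at both baseline and endline."""
    return [r for r in records if "baseline" in r[2] and "endline" in r[2]]


def mean_gain(records, school_type):
    """Average endline-minus-baseline gain for one school type."""
    gains = [
        r[2]["endline"] - r[2]["baseline"]
        for r in records
        if r[1] == school_type
    ]
    return mean(gains)


# Restrict to group (c), then compare gains rather than levels.
panel = group_c(records)
rising_gain = mean_gain(panel, "rising")
comparison_gain = mean_gain(panel, "comparison")
difference_in_gains = rising_gain - comparison_gain

print(len(group_a(records, "endline")), len(group_b(records)), len(panel))
print(rising_gain, comparison_gain, difference_in_gains)
```

Note how group (a) is the largest and group (b) the smallest, which mirrors the trade-off described above: the stricter the attendance requirement, the cleaner the comparison but the smaller the sample.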

The baseline reports, and Year 1 and Year 2 midline reports, are all on our website if you want to check them out.

Headline findings

The key findings largely echo what the two midline reports found: Rising Academy Network (RAN) students make significantly larger gains than their peers in other schools, and these gains are more equitably distributed.

Finding 1. RAN students make significantly more progress than students in either comparison group in reading. 

  • RAN students make 48% more progress in reading than the private comparison students. That is, RAN students learn as much in 1 year of schooling as their peers learn in 1.5-1.75 years. For those who care about such things, this is an effect size of 0.41.

  • RAN students make 160% more progress in reading than the government comparison students. That is, RAN students learn as much in 1 year of schooling as their peers learn in 2.5 years. This is an effect size of 0.77.

Finding 2. RAN students make significantly more progress than students in either comparison group in maths. 

  • RAN students make 120% more progress in maths than the private comparison students. That is, RAN students learn as much in 1 year of schooling as their peers learn in 2.2 years. This is an effect size of 0.42.

  • RAN students make 133% more progress than the government comparison students. That is, RAN students learn as much in 1 year of schooling as their peers learn in 2.3 years. This is an effect size of 0.46.
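For readers who want to see how a "percent more progress" figure maps onto "years of schooling", here is a rough sketch of the simplest possible reading: if one group makes p% more progress per year, the other group needs 1 + p/100 years to cover the same ground. This is my own back-of-envelope conversion, not necessarily the calculation used in the report, so it lands close to, but not always exactly on, the rounded figures quoted above.

```python
def years_equivalent(percent_more_progress):
    """Years a comparison student would need to match one year of progress
    by a student who makes `percent_more_progress` percent more progress.

    Assumes progress is linear in time, which is a simplification.
    """
    return 1 + percent_more_progress / 100


# Percentages taken from Findings 1 and 2 above.
findings = [
    ("reading vs private", 48),
    ("reading vs government", 160),
    ("maths vs private", 120),
    ("maths vs government", 133),
]

for label, pct in findings:
    print(f"{label}: about {years_equivalent(pct):.2f} years")
```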

Finding 3. Girls do much better in RAN schools than in comparison schools.

  • In all school types, girls start behind boys at baseline. In government and private schools, they fall further behind every year - in fact, girls’ progress is less than half of the progress of boys. By contrast, in RAN schools girls actually progress faster than boys, and much faster (2-4x) than girls in either comparison group.

  • Indeed, a lot of the difference in overall learning gains between Rising and non-Rising schools seems to be driven by the differential performance of girls.

Figure 2. Reading gains by gender

Figure 3. Maths gains by gender

Finding 4. RAN schools also do a better job of improving performance of students with lower prior achievement. 

  • The Oxford team set thresholds to divide the sample into four ability bands.

  • In reading, 74% of RAN students who started in the bottom two ability bands at baseline had progressed to a higher band by the endline, compared to 48% and 47% in private and government comparison schools respectively.

  • In maths, 65% of RAN students who started in the bottom two ability bands at baseline had progressed to a higher band by the endline, compared to 34% and 39% in private and government comparison schools respectively.
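As an illustration of what a "progressed to a higher band" share means, here is a small sketch. The band labels (1 to 4, lowest to highest) and the student data are hypothetical; the Oxford team's actual thresholds are not reproduced here.

```python
def share_progressed(students, low_bands=(1, 2)):
    """Share of students starting in `low_bands` who end in a higher band.

    `students` is a list of (baseline_band, endline_band) pairs.
    Returns None if nobody started in the low bands.
    """
    starters = [(b0, b1) for b0, b1 in students if b0 in low_bands]
    if not starters:
        return None
    moved_up = sum(1 for b0, b1 in starters if b1 > b0)
    return moved_up / len(starters)


# Invented (baseline_band, endline_band) pairs for eight students.
students = [(1, 2), (1, 1), (2, 3), (2, 2), (2, 4), (3, 3), (4, 4), (1, 3)]

share = share_progressed(students)
print(f"{share:.0%} of low-band starters moved up at least one band")
```

Students who started in bands 3 or 4 are left out of the denominator, which is why this measure speaks specifically to learners with lower prior achievement.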

Reactions and reflections

First and foremost we’re grateful to the Oxford team for their work on this project over the last few years, and to the staff and students of the comparison schools for taking part.

We’re particularly delighted by the progress of our girls. Globally, the conversation about girls’ education can, at times, fall into one of two traps. The first is to get stuck at the why. As Echidna Giving argues, “it’s time to move from asking ‘Why serve girls’ to asking ‘How?’” The second is to talk about educating girls as if they are a different species. Yet as Dave Evans and Fei Yuan have recently argued, “one of the best ways to help girls overcome the learning crisis may be to improve the quality of school for all children.” A lot of the most effective interventions for improving outcomes for girls, they find, don’t specifically target girls. This study perhaps lends some weight to that view. Get the basics right for all students, and girls will flourish.

I’m very proud of the hard work of our teachers, school leaders and central team in achieving these results. I heard a talk from an impact evaluation expert the other day who made the argument that most organisations try to do evaluations too quickly, before they have worked out all the kinks and settled on a stable model. I confess we’ve probably been a little guilty of that, launching this evaluation when we were less than 2 years old and when our schools had only been up and running for 6 months. But it also says a lot about our team that they welcomed this kind of rigorous scrutiny of their work when there was so much we were still figuring out. 

That relates to a third point. To be honest, the very idea of a stable model isn’t one that particularly resonates with us. Yes, I think we do have a more defined model today than we did three years ago. But whether it’s our curriculum or our school oversight systems or our approaches to teacher coaching, we’re always looking for ways to innovate and improve. 

The results themselves show why this is so important. A mantra at Rising is that “however well we do, we always strive to do better.” Compared to other schools, the learning gains here are impressive. It’s particularly striking that they are achieved by enabling more disadvantaged groups of learners - girls, and those with lower prior achievement - to progress much faster than is the case in other schools. 

But in absolute terms, our students still lag behind where they should be. Their learning trajectories are much steeper than their peers in other schools, but there’s so much more we need to do to keep bending them upwards even further.

So, to repeat something else we like to say round here, “our first draft is never our final draft.” Here’s to the next draft.

Positive early gains for Partnership Schools and Rising

‘Gold standard’ evaluation finds positive early gains for Partnership Schools and for Rising.

The evaluation team behind the Randomised Controlled Trial (RCT) of Partnership Schools for Liberia (PSL - okay, that’s enough three-letter abbreviations) have just released their midline report. The report covers just the first year of PSL (September 2016-July 2017). A final, endline report covering the full three years of the PSL pilot is due in 2019.

While much anticipated, this is only a midline report with preliminary results from one year of a three-year programme. The report therefore strikes a cautious tone and the evaluation team are careful to caveat their results.

Nevertheless, there are important and encouraging early messages for PSL as a whole and for Rising in particular. Put simply, the PSL programme is delivering significant learning gains, and Rising seems to be delivering among the largest gains of any school operator.

For PSL as a whole, the headline result is that PSL schools improved learning outcomes by 60% more than in control schools, or put differently, the equivalent of 0.6 extra years of schooling.

These gains seem to be driven by better management of schools by PSL operators, with longer school days, closer supervision of staff and more on-task teaching resulting in pupils in PSL schools getting about twice as much instructional time as in control schools. PSL schools also benefited from having more money and having better quality teachers, particularly new graduates from the Rural Teacher Training Institutes. But the report is clear that, based on their data and the wider literature, it is the interaction of these additional resources and better management that makes the difference; more resources alone is not enough. (Anecdotally, I would add that our ability to attract these new teachers was at least in part because they had more confidence in how they would be managed, which illustrates the point that new resources and different management are not easily separated.)  

Rising Results

The report also looks at how performance varies across the 8 operators that are part of PSL. Even more than the overall findings, the discussion of operator performance is limited by the small samples of students the evaluation team drew from each school. For operators (like Rising) operating only a small number of schools, this means there is considerable uncertainty around the evaluators’ estimates. That said, the evaluation team do their best to offer some insights.

Their core estimate is that, compared to its control schools, Rising improved learning outcomes by 0.41 standard deviations or around 1.3 additional years of schooling. This is the highest of any of the operators, though it is important to note the overlapping confidence intervals between several of the higher performing providers.

[Chart: RCT ITT estimate]

However, this core estimate is what’s known as an “intent-to-treat” or ITT estimate. It is based on the 5 schools that were originally randomly assigned to Rising. But we only actually ended up working in 4 of those (* see below). The ITT estimate is therefore a composite of results from 4 schools that we operated and 1 school that we never set foot in. A better estimate of our true impact is arguably offered by looking at our impact just on those students in schools we actually ended up working in. This “treatment on the treated” or TOT estimate is considerably higher, with a treatment effect of 0.57 standard deviations or 1.8 extra years of schooling. This, again, is the highest of any operator, and by a considerably larger margin, though again the confidence intervals around the estimate are large.

[Chart: RCT TOT estimate]
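For the curious, here is a rough sketch of how ITT and TOT estimates relate in the textbook case. Under standard assumptions (sometimes called the Bloom adjustment), the TOT effect is the ITT effect divided by the difference in take-up between the assigned and control groups. The take-up figures below are my own simplification (4 of 5 assigned schools operated, no control schools operated, schools weighted equally); the evaluators work from student-level data with controls, so their 0.57 figure will not match this back-of-envelope number exactly.

```python
def tot_from_itt(itt_effect, share_treated_in_assigned, share_treated_in_control=0.0):
    """Textbook rescaling of an intent-to-treat effect to a
    treatment-on-the-treated effect: TOT = ITT / take-up gap."""
    take_up_gap = share_treated_in_assigned - share_treated_in_control
    if take_up_gap <= 0:
        raise ValueError("take-up in the assigned group must exceed the control group")
    return itt_effect / take_up_gap


itt = 0.41          # ITT estimate quoted above, in standard deviations
compliance = 4 / 5  # assumption: 4 of the 5 assigned schools were actually operated

approx_tot = tot_from_itt(itt, compliance)
print(f"approximate TOT: {approx_tot:.2f} standard deviations")
```

The point of the sketch is simply the direction of the adjustment: because one assigned school contributed no treatment, the ITT estimate understates the effect in the schools where we actually worked.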

Whether the ITT or TOT estimate is the more useful depends, in my view, on the policy question you are trying to answer. At the level of the programme as a whole, where the policy question is essentially “what will the overall effect of a programme like this be?”, the ITT estimate seems the more useful because it is fair to assume that some level of ‘non-compliance’ will occur and the programme won’t get implemented in all schools. But at the inter-operator level, where the salient policy question is “given that this is going to be a PSL school, what will be the impact of giving this school to operator X rather than operator Y?”, the TOT estimate seems more informative because it is based solely on results in schools where those operators were actually working.

A further complication in comparing across operators is that operators have different sample sizes, pulled from different populations of students across different geographical areas. It cannot be assumed that we are comparing like with like. To correct for this, the evaluators control for observable differences in school and student characteristics (e.g. by using proxies for their income status, geographic remoteness etc), but they also use a fancy statistical technique called 'Bayesian hierarchical modelling'. Essentially, this assumes that because we are part of the same programme in the same country, operator effects are likely to be correlated. It therefore dilutes the experimental estimate for Rising by making it a weighted average of Rising's actual performance and the average performance of all the operators. It turns out that adjusting for baseline characteristics doesn’t make too much difference (particularly for Rising, since our schools were more typical), but this Bayesian adjustment does. It drags Rising back towards the mean for all operators, with the amount we are dragged down larger because our sample size is smaller. We still end up with the first or second largest effect depending on which of the ITT or TOT estimate is used, but by design we are closer to the rest of the pack.
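To give a feel for why a smaller sample gets dragged further towards the mean, here is a simplified sketch of the weighted-average idea. It uses the standard precision-weighting formula from a normal model with known variances, which is a stand-in for, not a reproduction of, the evaluators' Bayesian hierarchical model. The pooled mean, the between-operator spread and the standard errors are all hypothetical numbers chosen for illustration.

```python
def shrink(estimate, std_error, pooled_mean, between_sd):
    """Pull a noisy estimate towards the pooled mean.

    The weight on the operator's own estimate is
    between_sd^2 / (between_sd^2 + std_error^2), so noisier estimates
    (larger std_error) get less weight and are shrunk more.
    """
    weight = between_sd**2 / (between_sd**2 + std_error**2)
    shrunk = weight * estimate + (1 - weight) * pooled_mean
    return shrunk, weight


raw_estimate = 0.57  # TOT estimate quoted above, in standard deviations
pooled_mean = 0.20   # hypothetical average effect across operators
between_sd = 0.10    # hypothetical spread of true operator effects

# Same raw estimate, two hypothetical levels of precision.
precise, w_precise = shrink(raw_estimate, 0.05, pooled_mean, between_sd)
noisy, w_noisy = shrink(raw_estimate, 0.20, pooled_mean, between_sd)

print(f"small standard error: weight {w_precise:.2f}, shrunk estimate {precise:.3f}")
print(f"large standard error: weight {w_noisy:.2f}, shrunk estimate {noisy:.3f}")
```

With identical raw results, the operator with the larger standard error ends up much closer to the pooled mean, which is the "by design" effect described above.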

Some reflections on the results

So what do we make of these results?

First of all, we are strongly committed to the highest levels of rigour and transparency about our impact. We had thought that the study wouldn’t be able to say anything specific about Rising at all for technical reasons to do with the research design (for nerdier readers: it was originally designed to detect differences between PSL schools and non-PSL schools, and was under-powered to detect differences among operators within PSL). We're glad the evaluation team were able to find some ways to overcome those limitations.

Second, it is interesting and encouraging that the results largely confirm the strong progress we had been seeing in our internal data. Those data looked promising, but absent a control group to provide a robust counterfactual, it was impossible to know for sure that the progress we were seeing was directly attributable to us. As we said at the time and as the evaluation team note in an appendix to this report, our internal data were for internal management purposes and were never meant to have the same rigour as the RCT. But as it turns out, our internal data and the RCT data are pretty consistent. Our internal data suggested that students had made approximately 3 grades' worth of progress in one academic year; the TOT estimate in the RCT is that they had made approximately 2.8 grades’ worth of progress in one academic year. Needless to say, knowing that we can have a good amount of conviction in what our internal data are telling us is very important from a management point of view.

Third, while making direct comparisons between operators is tricky for the reasons noted above, on any reasonable reading of this evidence Rising emerges as one of the stronger operators, and this result validates the decision by the Ministry of Education to allocate 24 new schools to Rising in Year 2. In both absolute and relative terms, this was one of the larger school allocations and reflected the Ministry’s view that Rising was one of the highest performing PSL operators in Year 1. It is good - not just for us but for the principle of accountability underlying the PSL programme as a whole - that the RCT data confirm the MoE’s positive assessment of Rising’s performance.

Taking up the challenge

I also want to be very clear about the limitations of the data at this stage. It is not just that it’s very early to be saying anything definitive. It’s also that these data do not yet allow Rising, or really any operator, to fully address two of the big challenges that have been posed by critics of PSL.

The first challenge is around cost. As the evaluators point out, different operators spent different amounts of money in Year 1, and all spent more money than would be typically made available in a government school. In the end, judgments about the success of PSL or individual operators within it will need to include some assessment not just of impact but of value for money. PSL can only be fully scaled if it can be shown to be effective and affordable. Rising was one of those operators whose unit costs were relatively high in Year 1. That’s because a big part of our costs is the people and the systems in our central team, and with just 5 schools in Year 1, we had few economies of scale. These costs should fall precipitously once they start to be shared over a much larger number of schools and students. But that’s a testable hypothesis on which the Ministry can hold us to account. In Year 2, we need to prove to them that we can deliver the same or better results at a significantly lower cost per student.

The second challenge is around representativeness. One criticism that has been aired is that Year 1 schools were the low-hanging fruit. As the evaluation makes clear, it is simply not true that Year 1 schools were somehow cushy, but it is true that Year 1 schools were generally in easier-to-serve, somewhat less disadvantaged communities than the median Liberian school. And that’s precisely why the Ministry of Education insisted that the schools we and other operators will be serving in Year 2 be disproportionately located in the South East of Liberia, where those concerns about unrepresentativeness do not apply. If we can continue to perform well in these more challenging contexts, it will go some way to answering the question of whether PSL can genuinely become part of the solution for the whole of Liberia.

In short, the RCT midline provides a welcome confirmation of what our own data were telling us about the positive impact we are having. Our task for the coming academic year is to show that we can sustain and deepen that impact, in more challenging contexts, and more cost effectively. A big task, but one that we are hugely excited and honoured to be taking on.

A little over a year ago, Education Minister George Werner showed a great deal of political courage not just in launching this programme but in insisting that it be the subject of a ‘gold standard’ experimental evaluation. One year on, these results show that his vision and conviction are beginning to pay dividends. This report is not the final word on PSL, but the next chapter promises to be even more exciting.

 

* Footnote: as the evaluators note in their report, the process of randomly assigning schools in summer 2016 was complex, made even more challenging by the huge number of moving pieces for both operators and the Government of Liberia as both endeavoured to meet incredibly tight timescales for opening schools on September 5th. Provisional school allocations changed several times; by August 14th, three weeks before school opening, we still did not know the identity of our fifth school and it was proving very difficult to find a pair of schools near enough to our other schools to be logistically viable. Faced with the choice of dragging the process out any longer and potentially imperilling operational performance or opting to run a fifth school that was not randomly assigned, we agreed with the Ministry on the latter course of action.