The Good of Small Things

I recently read Jonathan Simons’ thought-provoking blog post on Ofsted inspections and school standards, in which he explains that the task of identifying underperforming schools now rests solely with Ofsted. There are no other accountability levers, no national threshold measures such as floor standards to add a degree of standardisation to the process, and perhaps that is an issue. Without such devices, the system is at risk of being subjective, left to the whims of human decision (and error). It’s an interesting point.

The article’s suggestion that we could return to some kind of threshold measure caused me to revisit the definitions of the old floor standards and coasting measures, which were scrapped in 2019, and to re-read some of my posts on the subject. I had various concerns. First, the floor standards were engineered, via the progress measures, to catch a fixed proportion of schools. Any progress score above the extremely low thresholds (-5 in reading and maths at KS2) was deemed ‘sufficient’, despite an inevitable ‘well below average’ classification in the performance tables. And second, the coasting measures, which at least took account of results over three consecutive years, tended to capture a similar group of schools to the floor standards. Mostly, these measures netted schools in more deprived areas, which were already more likely to receive a less than good judgement from Ofsted. To brand such schools as ‘coasting’ was an insult.

But that’s not why I’m here. It was Jonathan’s suggestion of using three-year rolling averages as a way of tracking school performance that really got me thinking. And it got me thinking about small schools in particular.

There are approximately 16,000 primary schools in England. Most are one-form entry, but around 1,900 primary schools have fewer than 100 pupils. And for this group of schools, data has always been a problem. Even for a one-form entry school, where each pupil accounts for around 3% of the cohort’s result, data is spiky; but for a school with fewer than 15 pupils in a year group, any attempt to present data as meaningful is ridiculous. But that is exactly what happens: in the performance tables, in Analyse School Performance, in LA reports, the results of small schools are out there. And bad decisions are made on the basis of bad data.
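To see just how spiky, here’s a quick back-of-the-envelope sketch in Python (the cohort sizes are illustrative) showing how much of a headline result rests on each individual pupil:

```python
# Each pupil's share of a headline percentage is simply 100 / cohort size,
# so one child moving across a threshold swings the result by that amount.
for cohort_size in (60, 30, 15, 11, 6):
    swing = 100 / cohort_size
    print(f"Cohort of {cohort_size:>2}: one pupil shifts the result by {swing:.1f} percentage points")
```

For a 30-pupil cohort that is the familiar 3%; for a cohort of six, one pupil is worth nearly 17 percentage points.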

In 2018, Amanda Spielman gave a speech at the Bryanston Education Summit in which she highlighted the issue of analysing the results of pupil groups at school level:

“Nor do I believe there is merit in trying to look at every individual sub-group of pupils at the school level. It is very important that we monitor the progress of under-performing pupil groups. But often this is best done at a national level, or possibly even a MAT or local authority level, where meaningful trends may be identifiable, rather than at school level where apparent differences are often likely to be statistical noise.”

The same obviously applies to whole cohorts in small schools, and some mechanisms have been put in place to mitigate the issue. Whilst not without fault – ‘statistically significant’ differences are too often interpreted as signifying the ‘school effect’ on pupils’ outcomes – confidence intervals placed around measures do at least act as a sort of protective jacket for small schools. Because one pupil can make such a difference, the confidence interval is wide, ensuring that no single pupil’s impact is unduly detrimental (or advantageous) to the overall result.

But there are many inconsistencies in the way the DfE and Ofsted handle data. The DfE, for example, only apply confidence intervals to progress measures, whilst Ofsted (and FFT) apply them to all data, both attainment and progress. This means that results – for example, the percentage of pupils achieving expected standards – are presented in ASP and the performance tables without any form of mitigation. Well, almost. Results in the performance tables are ‘suppressed’ if there are fewer than six pupils in the cohort (yes, six!). And here’s another inconsistency: whilst the performance tables rely on a cut-off of six pupils, Ofsted’s Inspection Data Summary Report (IDSR) ‘greys out’ the data of cohorts with fewer than eleven pupils. There really should be some joined-up thinking on this, but either way – whether fewer than six or eleven pupils is the definition of ‘too small’ – we are still presenting data that will fluctuate wildly over the years. Bear in mind that in those cases where a cohort is considered just big enough, a single pupil will account for 17% or 9% of the overall result respectively.
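To illustrate why those intervals balloon as cohorts shrink, here is a minimal sketch using a simple normal-approximation interval for a proportion (a deliberate simplification, not the DfE’s or FFT’s published methodology), applied to a school where 60% of pupils meet the expected standard:

```python
import math

def approx_interval(p, n, z=1.96):
    """Rough 95% confidence interval for a proportion (normal approximation).
    Illustrative only: not the DfE's or FFT's actual methodology."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# The same 60% result, at progressively smaller cohort sizes
for n in (90, 30, 11, 6):
    lo, hi = approx_interval(0.6, n)
    print(f"n = {n:>2}: plausible range roughly {lo:.0%} to {hi:.0%}")
```

At 90 pupils the interval spans roughly 50% to 70%; at six pupils it stretches from about 21% to 99%, which is precisely why publishing such figures without mitigation is so misleading.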

School data has certainly moved on from the dark days of RAISE, when 103 pages of binary, broken down into every conceivable subgroup of pupils, could be interpreted in pretty much any way you liked. If anything, it’s gone from one extreme to the other. The IDSR has shrunk from 22 pages in 2017, to 11 pages in 2018, to around 6 pages in 2019. Headteachers printing out their most recent IDSR could be forgiven for returning to the printer to check for paper jams, such is the brevity of the report. The phrase ‘there is nothing to highlight…’ repeats across the pages like a lyric from a 1990s techno track, an admission by Ofsted that, in most cases, few conclusions can be drawn from the data. But there is more that can be done to help small schools: to improve the validity of the data, to turn down the ‘statistical noise’ and to lessen the chance of bad decisions being made. Which brings me back to three-year rolling averages.

There was a time when three-year rolling averages were much discussed: a good solution to the problem of monitoring the performance of primary schools, especially those with small cohorts. Several frameworks ago, the Ofsted handbook contained the following useful guidance:

Where numbers of pupils are small and achievement fluctuates considerably from year to year, inspectors should take into account individual circumstances when comparing with national figures, and should consider any available data of aggregate performance for consecutive cohorts

Three-year rolling averages reduce the noise and stop us from comparing one cohort to the next and expecting results to go up each year. Sadly, implementation was half-hearted and the measures never gained much attention, the data buried deep in ASP and the performance tables, where the latest year’s results still rule. As far as I’m aware, such data is completely missing from the IDSR. Considering how noisy even a one-form entry school’s results are, this really needs to change.

First, the headline performance figures for primary schools – both attainment and progress – should be based on a three-year rolling average. Yes, this would mean you wouldn’t see the sudden steep improvements, but you wouldn’t see the sudden steep declines either, because – for the average-sized school – the figures would be calculated for a cohort of 90, not 30. And for small schools, a caveat: the aggregated cohort must reach a certain size – say, 30 or more pupils – before any measures can be published. This is nearly three times the current threshold used in the IDSR and five times that applied in the performance tables. Second, to put results into context, the difference between the school’s result – for example, the percentage meeting expected standards – and the national figure should be presented as a number of pupils, to help prevent audiences focussing on trivial gaps.
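As a rough illustration of how such a measure might be computed (a minimal sketch with invented numbers, not a description of any official calculation):

```python
def rolling_measure(cohorts, national_rate, min_pupils=30):
    """cohorts: list of (pupils_in_cohort, pupils_meeting_standard), one per year."""
    total = sum(n for n, _ in cohorts)
    meeting = sum(m for _, m in cohorts)
    if total < min_pupils:
        return None  # suppress: aggregated cohort still too small to publish
    rate = meeting / total
    # Express the gap from the national figure as a number of pupils
    gap_in_pupils = (rate - national_rate) * total
    return rate, gap_in_pupils

# Three illustrative years for a small school, plus an assumed national figure
result = rolling_measure([(12, 7), (9, 6), (14, 10)], national_rate=0.65)
if result is None:
    print("Suppressed: fewer than 30 pupils across three years")
else:
    rate, gap = result
    print(f"Three-year rate: {rate:.0%}; gap vs national: {gap:+.1f} pupils")
```

Here the aggregated cohort of 35 pupils clears the publication threshold, and the gap from an assumed national figure of 65% amounts to around a quarter of one pupil, a far more honest framing than a bare percentage-point difference.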

Of the 1,900 primary schools with fewer than 100 pupils, 12% are judged inadequate or requiring improvement by Ofsted, compared to 7% of two-form entry schools*. This is a notable difference, but not as stark as the figures for outstanding schools: 20% of two-form entry schools* hold the top Ofsted grade; just 9% of those with fewer than 100 pupils are judged to be in the same category. Whilst Ofsted base their judgements on more than just results, results help, and if yours is a forever ‘average’ school – or a school whose results are eternally suppressed or greyed out – then you are almost certainly at a disadvantage. And if your results peak and trough from one year to the next because of the size and characteristics of your cohorts, then that’s not going to do you any favours either.

12% of primary schools have fewer than 100 pupils. There is probably one near you. Yes, they are small, but they are just as important to the communities they serve. If we must have performance measures, then steps should be taken to turn down the noise and increase the signal. Small schools have suffered meaningless data for long enough.

*Schools with between 400 and 499 pupils.
