Primary school accountability: a manifesto for change

If there were ever a time to review the nature and purpose of primary school accountability measures, it is now. No data will be collected this year: no early years foundation stage profile, no phonics outcomes, no key stage 1 teacher assessments, no multiplication tables check scores, no key stage 2 results. There will be no performance tables, ASP data or IDSR ‘areas of interest’. 2020 will be a hiatus in the record, an entire year’s results wiped from the map.

And next year? One assumes normal service will be resumed. But should it?

This is the perfect opportunity to reassess the whole system.

The problem

Unlike secondary schools, which have two statutory assessment points, at key stages 4 and 5, relying almost entirely on standardised tests, primary schools have numerous statutory assessment points (currently five, soon to be joined by a sixth: the reception baseline) which rely mainly on teacher assessment. This would not be a problem if the data were used only for the internal purposes of supporting pupils and for transition to secondary schools. But that is not its only use. The data is also used to judge school performance, which creates tension and perverse incentives, with assessment exposed to competing pressures pulling it in opposite directions.

Consequently, schools may want to ‘err on the side of caution’ when it comes to data that may be viewed as a baseline (eg foundation stage profile and key stage 1), and give the benefit of the doubt when it comes to anything that is purely a result (eg writing at key stage 2, or key stage 1 results in infant and first schools). Even the results of the phonics check in year 1 – a fairly straightforward assessment – look odd when viewed at national level, with a veritable cliff edge between 32 marks (the pass mark) and 31. One has to wonder what the national distribution of phonics scores would look like if they weren’t collected.
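That cliff edge is the sort of thing anyone can check against the published distribution. A minimal sketch in Python, assuming a hypothetical CSV of national pupil counts at each mark out of 40 (the file name and column names are invented for illustration):

```python
import pandas as pd

# Hypothetical file: national counts of pupils at each phonics check mark (0-40).
# Assumed columns: 'mark', 'pupil_count'.
scores = pd.read_csv("phonics_mark_distribution.csv")

# Share of the cohort at each mark, so adjacent marks can be compared directly.
scores["share"] = scores["pupil_count"] / scores["pupil_count"].sum()

# The pass mark is 32: compare the share just below the threshold with the share at it.
just_below = scores.loc[scores["mark"] == 31, "share"].iloc[0]
at_pass = scores.loc[scores["mark"] == 32, "share"].iloc[0]

print(f"Share scoring 31: {just_below:.2%}")
print(f"Share scoring 32: {at_pass:.2%}")
print(f"Ratio (32 vs 31): {at_pass / just_below:.1f}x")
```

In an unremarkable distribution the two shares would be similar; a large ratio at exactly the pass mark is what the ‘cliff edge’ refers to.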

Using data for multiple purposes is a problem, and surely everyone can see the risks of using teacher assessment to measure school performance. Teacher assessment is a vital component of school data, but it is essentially human opinion: broad and subjective at best. And using human opinion to measure the performance of humans is probably not going to work out well. This is the paradox of accountability in education: the high stakes undermine the very thing the system relies on to function.

To put it bluntly, you can have reliable teacher assessment or you can use it to measure performance. That’s the choice.

So, what should the DfE do?

The answer is not to increase the use of standardised tests in primary schools or even to reduce the number of statutory assessments. The answer is to reduce the amount of data that is collected and reduce what is done with it.

In January 2020, the DfE published guidance on the new engagement model, which is due to replace P scales 1–4 in 2020/21. The model is intended to assess those pupils with severe or profound and multiple learning difficulties who are not engaged in subject-specific study. One of the striking things about this new method of assessment, from an accountability point of view, is that whilst there is a statutory duty to follow the model (for settings with applicable pupils of primary age), there is no statutory requirement to submit any data (nor, incidentally, is there any prescribed format for that data).

Could such an approach be taken with other assessments made in primary schools, such as the phonics check or the foundation stage profile? Again, that question: what might the data look like if no one were collecting it, if the tension were removed from the system, if there were no three-year trends in ASP or IDSR? What if phonics were done purely for diagnostic purposes?

Primary schools would have a statutory responsibility to administer and collate the outcomes of the assessments, but would have no requirement to make annual data submissions. Instead, the DfE could carry out an annual sampling exercise, collecting results from a proportion of schools to monitor national trends.

And what of key stage 1?

Key stage 1 assessment ceases to be statutory in 2022/23. As with other assessments, there will be no data this year, and next year’s submission is likely to be of dubious value. The DfE should therefore seriously consider scrapping it now. Yes, this will mean three cohorts without a baseline for progress measures at key stage 2, but perhaps a moratorium is no bad thing. And if we find that we can live in a world without progress measures, then maybe we don’t need a reception baseline for such a purpose. A reception baseline can (and no doubt will) be carried out, but there would be no cloak-and-dagger ‘black boxing’ of data for a shadowy measure seven years later. It would be a diagnostic assessment to inform next steps.

Where does that leave key stage 2?

Scrapping key stage 2 tests is probably not going to happen anytime soon, and removing them would signal the end of Progress 8 (unless secondary schools administered their own baseline tests). They do provide useful (and quite reliable) information on school standards, the success (or otherwise) of national curriculum reforms, and gaps between vulnerable pupils and their peers, but there are certainly issues that need to be addressed and solutions that are worthy of consideration:

  • Assessment of writing should be statutory but the data should not be collected as a matter of course, and the results should not be used for any key measures of school performance. It is worth noting that writing no longer forms part of the baseline for Progress 8 (it was dropped once the cohorts with KS2 writing test scores ran out), which speaks volumes about how the data is perceived by the DfE.
  • Progress measures are flawed and should be reviewed. All-through primary schools have a perverse incentive to achieve lower results at the start point, whilst two-tier primary systems have the opposite problem. Schools being in charge of their own baselines is as problematic as athletes timing their own races (a simplified worked example follows this list). The desire for progress measures is understandable, but they need to be reliable and game-proof. We also need to consider whether they can lead to lower expectations for those with lower start points. And, of course, you can’t have a progress measure without a baseline: the measure has to start somewhere, and right now that somewhere is the beginning of reception.
  • Performance tables need some serious thought: complex data presented in an attractively simple format, with very little narrative and with context hidden away from view. This needs turning on its head. There is a place for providing parents with information about schools, but I’m not convinced that the current data-centric approach is the right one. Audiences need more narrative, and it could be argued that the school website and Ofsted report already provide that.
  • Test practice. Some schools do a little, some schools do a lot. Test practice is inevitable when the stakes are so high. A radical solution could be to not publish the test papers and scaled score conversion tables online. Instead, put all the questions into an online question bank for schools to use more formatively.
  • Sampling. It’s already done for science. Could it be done for all subjects? The tests would then be used solely for monitoring national trends, not individual school performance. This would spell the end of the performance tables and annual reporting, but would that be a disaster?
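On the progress measures point above, here is a simplified sketch of why self-reported baselines are so easy to game. This is not the DfE’s actual value-added methodology; it simply treats progress as a pupil’s KS2 scaled score minus an invented national average for pupils in the same baseline band, which is enough to show how a ‘cautious’ baseline flatters the progress score.

```python
# Simplified illustration of the perverse incentive around baselines.
# Not the DfE's methodology: progress here is just KS2 scaled score minus the
# national average KS2 score for pupils with the same baseline band.
# All numbers are invented.

national_avg_ks2_by_baseline_band = {
    "low": 98.0,      # average KS2 scaled score for pupils with a low baseline
    "middle": 104.0,  # ... for pupils with a middle baseline
    "high": 110.0,    # ... for pupils with a high baseline
}

def progress_score(ks2_scaled_score: float, baseline_band: str) -> float:
    """Pupil's KS2 result relative to others with the same recorded starting point."""
    return ks2_scaled_score - national_avg_ks2_by_baseline_band[baseline_band]

# The same pupil, the same KS2 result (105) - only the recorded baseline differs.
print(progress_score(105, "middle"))  # +1.0: modest progress
print(progress_score(105, "low"))     # +7.0: the cautious baseline looks far better
```

The pupil and the KS2 result are identical in both cases; only the school’s own judgement of the starting point changes, and the progress measure moves with it.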

This year has been a year like no other, and one which we hope we will never see again, but perhaps good things can come out of it.

A radical overhaul of the accountability system is desperately needed and the time to do it is now.
