It shows how junior school pupils are at a huge disadvantage in the progress race because the school has no control over the baseline, and how pupils who make good progress in reality can end up with negative scores when compared against supposedly similar pupils nationally. It’s like entering a fun run only to discover that the other competitors are elite athletes in disguise.
And so we need a better baseline, and this is the hot topic in the recently launched consultation on the future of primary assessment. Most seem to favour a baseline taken early in the reception year, and this is most likely the direction of travel. After all, it surely makes sense to measure progress from the point pupils start primary school rather than from a point three sevenths of the way through. Whatever the start point, any future baseline assessment needs to be principled, robust and refined enough to provide a suitable number of prior attainment groups. Unfortunately, and inevitably, the perverse incentives to secure a low start point will still exist, so how do we avoid them?
Moderation

We could continue with the current arrangement of moderating a sample of schools each year, but I would argue that this has not proved particularly effective. If it had, we wouldn’t have all these issues and I wouldn’t be writing this blog post. It’s probably time to consider other options. Alternatively, moderation could be carried out after submission of data, which might encourage schools to err more on the side of caution. More likely, though, it would just create resentment.
Peer involvement

This could take a number of forms: schools moderating each other’s baseline assessments (this already happens a lot anyway), teachers from a neighbouring school invigilating the assessment in the classroom (think national lottery independent adjudicator with a clipboard), or even administering the assessment themselves. I’m not sure how popular that last option would be, either with staff or with children.
Use of technology
If pupils were to take the assessment via an iPad app, there would be benefits in terms of instant data collection and feedback. Plus – and here’s the sinister bit – algorithms can spot unusual patterns (think betting apps), which could help discourage gaming. However, there would no doubt be access issues for some pupils, and what if they struggle to complete tasks at the first attempt? Do they get another go? It would also mean purchasing a lot of iPads. I recall that one of the six providers of the last attempt at a baseline assessment had such a solution and evidently it wasn’t particularly popular – it didn’t make it to the final three – but that doesn’t mean it’s not worth another look.
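To make the “spot unusual patterns” idea concrete, here is a minimal sketch of the kind of check such a system might run. Everything here is hypothetical – the scores, the standardised scale (national mean 100, SD 15) and the flagging threshold are illustrative assumptions, not any provider’s actual method:

```python
import statistics

# Hypothetical standardised baseline scale: national mean 100, SD 15.
NATIONAL_MEAN, NATIONAL_SD = 100, 15

def flag_school(pupil_scores, z_threshold=-2.0):
    """Flag a school whose mean baseline score is implausibly low
    relative to the national distribution (illustrative rule only)."""
    n = len(pupil_scores)
    school_mean = statistics.mean(pupil_scores)
    # Standard error of the mean for a cohort of n pupils
    se = NATIONAL_SD / n ** 0.5
    z = (school_mean - NATIONAL_MEAN) / se
    return z < z_threshold, round(z, 2)

# A 32-pupil cohort scoring well below the national average on entry
flagged, z = flag_school([82, 85, 79, 88, 84, 81, 86, 80] * 4)
```

A cohort averaging in the low 80s on this scale would be flagged for a closer look, whereas one averaging near 100 would not; a real system would of course need to account for genuine variation in intake before treating a flag as evidence of gaming.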
Random checks

This would probably only work if the assessment were carried out in all schools on the same day, which I’m assuming won’t happen. It is more likely that assessment will take place over a number of days, which would mean schools submitting their assessment dates in advance, like an athlete declaring their whereabouts. And who would carry out the random checks? This is probably a non-starter: it would be massively unpopular.
Unlike levels, which were broad, vague and non-standardised, and therefore lacked an accurate reference point (yes, 2B was the ‘expected’ outcome, but no one could really agree on what a 2B looked like), a standardised assessment based on sample testing would provide a more reliable measure. Schools or areas with consistently low baseline scores, where all or nearly all pupils are below average, may then warrant further investigation.
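A rough sense of why that last point holds: if the baseline really is standardised, roughly half of pupils nationally fall below the average, so the probability of an entire cohort doing so by chance alone shrinks very quickly with cohort size. A back-of-envelope sketch (assuming, unrealistically, that pupils’ scores are independent):

```python
# Chance that every pupil in a cohort of n scores below the national
# average, if each pupil independently has a 50% chance of doing so.
# Real cohorts cluster by intake, so this overstates the surprise,
# but the basic point stands: whole-cohort results this low are rare.
for n in (5, 10, 30):
    p = 0.5 ** n
    print(f"cohort of {n}: p = {p:.10f}")
```

Even for a small cohort the probability is only a few per cent, and for a typical reception class it is vanishingly small, which is why a whole cohort below average is a reasonable trigger for further investigation rather than proof of anything.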
I understand that all of this sounds rather Big Brother, but the alternative is that we carry on as we are, with unreliable progress measures against which critical judgements of school performance are made. If we are going to have progress measures – and who wants their performance judged on attainment alone? – then they absolutely have to be based on credible data. That means having an awkward conversation about the gaming that arises from perverse incentives, and about the steps that can be taken to avoid it, because the current situation – high-stakes performance measures, floor standards and coasting thresholds based on utterly unreliable data – is unsustainable.