Last month the schools causing concern guidance was updated. This guidance sets out local authorities’ statutory responsibilities for tackling underperformance, and includes details on the issuing of warning notices and on LAs’ powers of intervention. My fear is that it’s more about stick than carrot and effectively turns LAs into boot boys, going out to round up as many schools as possible on the basis of flimsy evidence and their own interpretation of vague guidance. Ironically, this round-up will result in more schools being taken out of local authority control, thus making LAs the architects of their own downfall. I was going to blog about it at the time, but life got in the way and it faded from view. But my concerns have been reignited by one particular local authority’s approach to rating its schools, and I feel compelled to discuss its methodology in the public domain.
So, let’s return to the schools causing concern guidance, and to the definition of low standards of performance contained therein. Page 11 states the following:
The definition of what constitutes “low standards of performance” is set out in section 60(3) of the 2006 Act. This is where they are low by reference to any one or more of the following:
I. the standards that the pupils might in all the circumstances reasonably be expected to attain; or,
II. where relevant, the standards previously attained by them; or
III. the standards attained by pupils at comparable schools.
For the purpose of this guidance, “unacceptably low standards of performance” includes: standards below the floor, on either attainment or progress of pupils; low standards achieved by disadvantaged pupils; a sudden drop in performance; sustained historical underperformance, performance of pupils (including disadvantaged pupils) unacceptably low in relation to expected achievement or prior attainment, or performance of a school not meeting the expected standards of comparable schools.
This is so vague and open to interpretation that I reckon I could make half the schools I work with fit the criteria. Take ‘below floor’, for example. Officially, to be below floor a school has to be below both the attainment threshold and the progress medians, but here it’s a case of either/or. A sudden drop in performance can, of course, be down to a drop in prior attainment; one would hope that LAs would always take that into account, but that depends on the quality and depth of their analysis. Another thing that bothers me is ‘comparable schools’. What are comparable schools? The DfE and Ofsted have their own definitions of similar schools based on prior attainment (see the Ofsted dashboard and DfE performance tables); the Perspective system used by LAs also had a similar schools measure, but one based on contextual factors; and FFT use a complex range of factors in their CVA analysis in order to compare like with like. There is no single definition of ‘comparable schools’.
And all this means that LAs can devise their own methods of identifying ‘schools causing concern’ based on their own interpretation of the guidance. The model I’m going to deal with here is one such example of an oversimplified and flawed approach to risk rating that definitely warrants further scrutiny.
LAx
When I wrote the first draft of this blog I named the LA but realised that this is likely to cause problems for certain schools, so I’ve decided against it. However, I still think it’s important to share the method because stuff like this should be exposed and challenged. Hopefully it’ll encourage others to do likewise.
This particular LA – let’s call them LAx – attempted to resist FOI requests from the local media for its risk rating methodology, on the grounds that releasing it was not in the public interest. That argument has been rejected, so the method has now been shared and will be made public in the next week or so. The LA are obviously not too happy about the decision, and neither are many schools, particularly those that will have to share their ‘red’ RAG rating with parents.
And so we get to the important bit: the methodology that LAx is using to quantify, categorise and RAG rate the performance of its schools. Here we go!
The Model
The following table shows an example of the method. There are 11 indicators, and a score is given depending on how the school’s results compare against each indicator. Generally speaking this means -1 if below the national average and +1 if equal to or above it. For L4+RWM, schools below floor score -1 and those equal to or above the national average score +1, whilst those above floor but below the national average get 0. For the 2 levels of progress measures, a school scores -2 if below the national average and +2 if above the floor standard median; schools between the national average and the floor standard medians get 0. Hope that all makes sense.
Finally, the school’s latest result is compared to the previous year and a trend indicator is assigned in order to show apparent improvement or a decline in standards.
*note: 2LP/3LP ‘averages’ in RAISE aren’t really averages; they are national proportions (as if all pupils were in one school). The floor is the only real average (i.e. a median), but that’s beside the point.
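For the sake of clarity, here’s a rough sketch in Python of the scoring rules as I understand them from the table. To be clear, this is my own reconstruction, not LAx’s spreadsheet: the function names are placeholders and the full set of 11 indicators isn’t reproduced here.

```python
# My reconstruction of the indicator scoring described above, not LAx's actual
# spreadsheet. All values are percentages; thresholds are passed in explicitly.

def score_generic(school_pct, national_avg):
    """Most indicators: -1 if below the national average, +1 if equal to or above it."""
    return 1 if school_pct >= national_avg else -1

def score_l4_rwm(school_pct, floor, national_avg):
    """L4+RWM: -1 below floor, +1 at or above the national average, 0 in between."""
    if school_pct < floor:
        return -1
    if school_pct >= national_avg:
        return 1
    return 0

def score_2lp(school_pct, national_avg, floor_median):
    """2 levels of progress: -2 below the national figure, +2 above the
    floor-standard median, 0 in between (for 2LP the floor medians sit above
    the national proportions, hence the ordering)."""
    if school_pct < national_avg:
        return -2
    if school_pct > floor_median:
        return 2
    return 0

def trend(latest, previous):
    """Year-on-year trend indicator, reported alongside the score."""
    return "up" if latest > previous else ("down" if latest < previous else "level")
```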
The schools’ scores and trend indicators are then collated and assigned a RAG rating thus:
Schools with scores of +4 or more are ‘green’, those with scores between -4 and +3 are ‘amber’, and those below -4 are ‘red’. I’m not entirely sure whether the trend influences the RAG rating in any way, but considering School H has seen a decline in standards two years running and still gets a ‘green’, I assume not.
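And the banding itself, using the thresholds above (again a sketch of my own; as noted, the trend indicator is collated alongside the score but doesn’t appear to affect the band):

```python
def rag_rating(total_score):
    """RAG band from the summed indicator scores: +4 or more is green,
    -4 to +3 is amber, below -4 is red (so -4 is amber and -5 is red)."""
    if total_score >= 4:
        return "green"
    if total_score >= -4:
        return "amber"
    return "red"

# For example, the 'coasting' school discussed below scores +8 and comes out green.
print(rag_rating(8))   # green
print(rag_rating(-4))  # amber
print(rag_rating(-5))  # red
```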
Of this methodology, the LA says:
In the recent 2015 Ofsted inspection, Ofsted said the risk tool “places the local authority in a stronger position to identify schools at risk of deteriorating early enough to prevent their decline.” (Ofsted, May 2015).
and
It is important to note that the risk tool is intended to be the start of a process that is used to inform discussion with Headteachers. The core function of the risk tool is to indicate where schools are at risk of not being judged as Good or Outstanding at their next Ofsted inspection.
So, what’s wrong with this model then? This leads neatly on to the next section, entitled…
So, what’s wrong with this model then?
Well, a few things really:
1) It takes no account of cohort size.
This is obvious. Pupils in smaller schools have a bigger percentage impact on results than those in larger schools. RAISE deals with this in two ways: the confidence intervals used in statistical significance testing, and the (rather flawed) methodology in the closing the gap section that produces those lovely red boxes. However, if we borrow the latter for this process, then any gap that is smaller than the percentage value of one child should be ignored. Actually, any data that is not significantly above or below the national figure should be treated as statistically in line, and also be ignored.
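To illustrate the point, here’s a quick sketch of what a one-pupil-gap check and a crude significance test might look like. The confidence interval below is a simple normal approximation for illustration only; RAISE’s actual significance tests are more involved than this.

```python
import math

def one_pupil_pct(cohort_size):
    """The percentage value of a single pupil in this cohort."""
    return 100.0 / cohort_size

def gap_matters(school_pct, national_pct, cohort_size):
    """Ignore any gap smaller than one pupil's worth of percentage points
    (borrowing the idea from RAISE's closing-the-gap convention)."""
    return abs(school_pct - national_pct) >= one_pupil_pct(cohort_size)

def roughly_in_line(school_pct, national_pct, cohort_size, z=1.96):
    """Crude check: treat the school as 'in line' if the national figure falls
    inside a normal-approximation 95% interval around the school's proportion.
    RAISE's real tests differ; this only illustrates why small cohorts
    shouldn't score -1 off the back of tiny gaps."""
    p = school_pct / 100.0
    half_width = z * math.sqrt(p * (1 - p) / cohort_size) * 100
    return abs(school_pct - national_pct) <= half_width

# A 15-pupil cohort 5 percentage points below national: that's less than one
# pupil's worth (100/15 ≈ 6.7), so it shouldn't drag the school to -1.
print(gap_matters(74, 79, 15))      # False
print(roughly_in_line(74, 79, 15))  # True
```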
2) It takes no account of pupils’ start points
Schools with high proportions of pupils that were L2C and L3 at KS1 are far less likely to have high or even average percentages making 3 levels of progress. We all know this.
3) It takes no account of context
Schools with high percentages of SEN and lower attaining pupils are seriously disadvantaged by this model. For schools with high percentages of SEN, FFT’s CVA analysis is much fairer.
4) It takes no account of VA
Why? Next year this will be the only progress measure, so it may as well be built in now to ensure the whole thing is future-proofed. VA is fairer for schools that have high percentages of lower attainers; CVA is even fairer.
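For anyone unfamiliar with the mechanics, the gist of VA is simple enough: compare each pupil’s outcome with the national average outcome for pupils with the same starting point, then average across the cohort. The sketch below is a toy version with invented benchmark figures, not the DfE’s published methodology (which, among other differences, is centred on 100 rather than 0), but it shows why conditioning on prior attainment is fairer to schools with low-attaining intakes.

```python
from statistics import mean

# Toy illustration of the VA idea, not the DfE's published calculation.
# Hypothetical national average KS2 point scores by KS1 prior-attainment band.
NATIONAL_EXPECTED = {"L1": 23.0, "L2C": 25.5, "L2B": 27.5, "L2A": 29.5, "L3": 32.5}

def school_va(pupils):
    """pupils: list of (prior_band, actual_points) tuples. Returns the mean of
    actual minus expected, so 0 means 'in line' in this toy version."""
    return mean(actual - NATIONAL_EXPECTED[band] for band, actual in pupils)

# A cohort full of L1/L2C starters can still post positive VA despite low
# headline attainment:
cohort = [("L2C", 27.0), ("L2C", 26.0), ("L1", 24.0), ("L2B", 28.0)]
print(round(school_va(cohort), 2))  # 0.88
```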
5) It takes no account of progress across KS1
The model just compares L2B+ against national averages. What about those schools with low ability intakes, where pupils make good progress but %L2B is still below average? FFT KS1 VA/CVA percentile ranking would be a much fairer measure that would recognise the good progress made by pupils in schools where KS1 attainment is relatively low.
6) It favours coasting schools
Schools that are above the attainment comparators and above all the national averages and floor standards for 2LP in reading, writing and maths, but below the 3LP measures, will get a score of +8. Schools with smaller cohorts, and high percentages of SEN and low attaining pupils, will get lower scores despite pupils making good progress in real terms. Incorporating VA and carrying out a comparison of FFT attainment and progress ranking at both KS1 and KS2 would help to even things out.
7) It doesn’t take account of phonics
This surprises me. It’s the only test carried out during KS1 and, judging by the pages devoted to it in RAISE, the DfE and Ofsted evidently take it seriously. Also, it is quite common to see low phonics results sandwiched between high attainment at EYFS and KS1.
8) It doesn’t take account of gaps
Considering the importance of the closing the gap agenda, this seems like a glaring omission. There are plenty of relatively low attaining schools where FSM pupils make fantastic progress, and there are plenty of high attaining schools where FSM pupils make relatively poor progress. This should be recognised.
9) The thresholds are arbitrary
Why is -4 amber and -5 red? Is it statistically relevant? I’m interested.
10) “You’re red!” doesn’t seem like the ideal start point for a conversation
Just sayin’.
A little more information, a little less action
I once attempted to create a risk rating tool for an LA. It took account of cohort sizes and percentages of SEN, FSM and EAL, alongside at least 12 other key performance measures including VA. Schools were ranked on the various key measures and those with the highest average rank scores were supposedly the most at risk. It was an interesting academic exercise, but I became increasingly uneasy with it because I only had to make a few minor tweaks and the top 10 most ‘at risk’ schools could completely change. In the end I realised that the only effective way to evaluate school performance was to go through every single RAISE report and FFT dashboard for every school and, most importantly, combine that with our own intelligence about those schools from recent visits (e.g. new Headteacher, changes in demographics, addition of a new SEN resource, increase in EAL pupils, etc.) to arrive at a qualified judgement on each school’s position. And that’s what we did. Very, very hard work during the autumn term when the data came out, but it is the best way to fully understand the subtleties of school performance. No shortcuts.
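For what it’s worth, the core of that average-rank approach boils down to something like the sketch below. This is an illustration of the general idea with invented column names and data, not the actual tool (which has long since been retired), and it makes the sensitivity problem easy to see: change which measures go in, or which direction they’re ranked, and the order reshuffles.

```python
import pandas as pd

# Sketch of the average-rank idea: rank every school on each measure so that
# the riskiest value gets the highest rank, then average the ranks. The school
# with the highest average rank is supposedly the most at risk.

def risk_ranking(df, low_is_risky, high_is_risky):
    ranks = pd.DataFrame(index=df.index)
    for col in low_is_risky:
        ranks[col] = df[col].rank(ascending=False)  # lowest value -> highest rank
    for col in high_is_risky:
        ranks[col] = df[col].rank(ascending=True)   # highest value -> highest rank
    out = df.copy()
    out["avg_rank"] = ranks.mean(axis=1)
    return out.sort_values("avg_rank", ascending=False)

# Toy data for three hypothetical schools.
schools = pd.DataFrame(
    {"l4_rwm": [82, 68, 75], "va": [100.3, 99.1, 99.8], "fsm_pct": [8, 34, 21]},
    index=["School A", "School B", "School C"],
)
print(risk_ranking(schools, low_is_risky=["l4_rwm", "va"], high_is_risky=["fsm_pct"]))
```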
As mentioned above, the Schools Causing Concern guidance concerns me because it gives LAs the green light to develop flimsy, oversimplified methods for identifying supposedly weak and vulnerable schools, such as the one presented here. We therefore end up with a model that takes no account of context and is biased in favour of higher attaining schools, rewarding those that are above average even when the difference is not significant in any way. It certainly does no favours for those schools with lower ability intakes. So, I do not agree with Ofsted’s assertion that the ‘tool’ ‘places the local authority in a stronger position to identify schools at risk of deteriorating early enough to prevent their decline’. It’s way too basic and retrospective for such grand statements.
This probably makes me a marked man in LAx territory now, and I may have to work under a pseudonym in future, but if it causes LAs to reconsider their approaches then it’s been worth it.
And until it changes, my RAG rating for this particular methodology is obviously…
RED