AI and Marking

Given the concerns around teacher workload, and you need only take a quick look at the Teacher Wellbeing Index reports to see them, it is clear that we need to find solutions to the workload challenge. Artificial intelligence (AI) is one potential piece of this puzzle, although it is by no means a silver bullet. The issue I have come across on a number of occasions is concern about some of AI's challenges, such as inaccuracies. I avoid talking of hallucinations as it anthropomorphises AI; the reality is that a probability algorithm has output something which is wrong, so why can't we simply say AI gets it wrong occasionally? And we are right to have concerns about where an AI solution might provide inaccurate information, especially where it relates to the marks given to student work or the feedback provided to parents on a student's progress. But maybe we need to stop for a moment, step back and look at what we do currently. Are our current human-based approaches devoid of errors?

A quick look on Google Scholar turned up a piece of AQA research from 2005 looking at marking reliability, and the below is the first line of the conclusion section of the report:

“The literature reviewed has made clear the inherent unreliability associated with assessment in general, and associated with marking in particular”

We are not talking about AI-based marking here; we are talking about human marking of work. We are by no means the highly accurate marking and assessing machines we convince ourselves we are. And there are lots of other studies which point to how easily we might be influenced. I remember one study focussed on decision making by judges where, when the timing of different decisions was analysed, proximity to the courtroom lunch break had a statistically significant impact on the judges' decisions. Like marking, we would expect a judge's decision to be independent of the time of day, and to be consistent, yet the evidence suggests this isn't quite the case. Other studies have looked at how the sequence in which papers are marked can have an impact, so the marking of a paper following a really good or really poor paper will be affected by the paper which preceded it. Again this points to inconsistency in marking. And if the same paper is presented to the same marker on different occasions over a period of time, different marks result, where if we were truly accurate in our marking the marks for the same paper should surely be the same.
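To make the scale of this concrete, here is a purely illustrative Python sketch. The numbers are my own invention, not drawn from any of the studies mentioned: the point is simply that if a marker's score for the same script varies by a few marks from one occasion to the next, a script sitting near a grade boundary will land on different sides of that boundary on different occasions.

```python
import random

random.seed(1)

TRUE_MARK = 58   # hypothetical "true" quality of one script
BOUNDARY = 60    # hypothetical grade boundary
NOISE = 4        # assumed occasion-to-occasion marker variation, in marks

def mark_script(true_mark: int, noise: int) -> int:
    """One marking occasion: the true mark plus random marker variation."""
    return true_mark + random.randint(-noise, noise)

# Mark the same script on 10,000 separate occasions.
occasions = [mark_script(TRUE_MARK, NOISE) for _ in range(10_000)]
above = sum(m >= BOUNDARY for m in occasions)
print(f"Same script, 10,000 occasions: {above / len(occasions):.0%} "
      f"of occasions award the higher grade")
```

Even with this modest level of marker noise, roughly a third of occasions put the script over the boundary, despite its "true" mark sitting below it.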

It seems clear to me that we are not as accurate in our marking and assessment decisions as we think we are. I suspect calling out AI's inaccuracies is also easier than calling out our own human inaccuracy, as AI doesn't argue back or try to justify its errors, either to us or to itself. And this is where a significant part of the challenge lies: we justify and convince ourselves of our accuracy and consistency, where any objective study would show we aren't as good as we think we are. When presented with such quantifiable evidence, we proceed to generate narratives and explanations to justify or explain away any errors or inconsistencies, so our overall perception is that we are very good and accurate at assessing and marking student work. AI doesn't engage in such self-delusion.

Conclusion

In seeking to address workload, and in considering the use of AI in this process, we need to be wary of demanding that things be 100% right. Yes, that is the ideal, but our current process is far from 100% right, so surely we need only match our current accuracy levels while reducing the workload for teachers. It may be that the AQA research presents the answer, in that "a pragmatic and effective way of improving marking reliability might be to have each script marked by a human marker and by software". Maybe rather than looking for AI to do the marking for us, it is about working with AI to do the marking, using it as an assistant but ensuring human insight and checking are part of the process.

And the above applies not just to the marking of student work but also to the use of generative AI in the creation of parental reports, another area of significant workload for teachers. Here too, an approach of accepting the frailties of our current process, then seeking to use AI to achieve at least the same level of consistency while reducing workload, seems appropriate.

Maybe we need to stop talking about Artificial Intelligence and talk more about using AI to create Intelligent Assistants (IA)?

References:

Meadows, M. and Billington, L. (2005) A Review of the Literature on Marking Reliability. National Assessment Agency / AQA.

Is using AI cheating?

Ever since ChatGPT burst onto the scene in November 2022, various people in education have cited concerns about how LLMs such as ChatGPT, Gemini and Claude might be misused by students. But for AI to be misused, there must also be legitimate uses, yet the sense I get is that students should simply be prevented from using it at all. And who decides what is an appropriate or inappropriate use? Those invested in change and evolution, who may understand AI, its benefits and its risks, or those invested in retaining the status quo, with limited understanding or experience of using AI, let alone of using it in a classroom?

Concerns, concerns and more concerns

Concerns have been raised regarding plagiarism and cheating, where students might use generative AI to complete assignments, tests or essays, undermining the authenticity of their work and misrepresenting their "true" abilities. The notion of ascertaining "true" abilities is itself interesting. My spelling and grammar need work, but through spelling and grammar checkers they appear better than they are; given such checkers are so common, does this matter when writing an assignment, blog post or other piece of content? And does a piece of written coursework or an exam expose the "true" abilities of students, or is it simply a convenient proxy? Concerns have also been raised about dependency and over-reliance on AI tools, which may hinder the development of critical thinking and problem-solving skills if students use them to bypass challenging tasks. But in a world of search engines and recommendation algorithms shaping our TV, shopping and music habits, is this dependency or simply convenience? Access disparities and digital divides have also been raised, given that not all students have equal access to generative AI tools, leading to disparities in academic performance and opportunities. I suspect this is the most troublesome of the concerns, where the arguments some make against generative AI may simply fuel an increasing divide between those who can and do use it and those who can't or won't.

Solution or not?

In relation to assessment, some have therefore suggested that the best solution is to bring back simple pen and paper assessment. I am not sure how this would work, as students could still use generative AI to create their coursework before simply copying it out by hand. It also feels a bit like "we've always done it this way".

AI detection tools have been suggested, however I simply don't believe these will ever be reliable. The key aim of generative AI is to create content which looks like it was created by a human, so this will result in a race between the AI vendors and those creating the AI detectors, with only one likely to win. And it ain't the vendors providing the AI detectors (or the schools spending money on said detectors). And let's not forget the poor students who will be accused of cheating simply because their writing style is highly typical and therefore falsely flagged by these so-called AI detectors.

But maybe we need to take a step back and ask ourselves what is the purpose of education and of assessment?  

What is education about and why assess?

If part of the purpose of education is to provide students with the knowledge, skills and experiences which will allow them to flourish and thrive in the world post-compulsory education, then shouldn't we be looking to provide them with knowledge, skills and experience in using generative AI? I can only see the use of generative AI increasing across different job types and careers, just as I have seen my own use increase since November 2022. As such, it seems clear to me that we should be engaging and working with students on the proper and effective use of generative AI.

And what is the purpose of assessment? Is it to test memorisation? If so, is this as important in a world of search engines and generative AI? Or, in the case of coursework, is it to test the student's ability to apply knowledge or demonstrate skills? And if this is the case, shouldn't students be encouraged to use the tools available to them, which surely must include generative AI? We now, for example, support the use of calculators in Maths exams, and we don't ban the use of spelling and grammar checkers when creating coursework. And if a student with a learning difficulty uses technology to level the playing field by allowing them to type or dictate, why should it be different for a second-language speaker of English using AI translation tools, or indeed any student using generative AI to help them create better work: to get started, to refine, or to seek feedback? Why would we want students to create lesser work than they are capable of, when using the tools now so widely available to them could allow them to achieve more? Should we not be empowering students to achieve their very best using the tools readily available to them?

Maybe we need to question our current model of assessment, namely tests and coursework, accepting that in a world of generative AI these are no longer suitable or appropriate. Focussing on assessing the outcome, the product such as a piece of coursework, is no longer viable as students will all be able to create similar output using generative AI tools, so instead I would suggest we need to look towards exploring and assessing the processes students undertake.

I also note lots of discussion of teachers using generative AI to help with workload challenges, using it to create lesson plans and lesson materials, to help with marking, and so on. How is this OK for a teacher, but for a student to use the same tools, in largely the same way, it isn't acceptable?

Time for change, finally?

This does feel like a time when education, and in particular assessment, needs to change significantly. Generative AI is here to stay, so how can education, how can we, make the most of it, preparing our students and providing them with the skills and experiences needed to thrive and flourish?

AI and assessment (Part 1)

I recently spoke at an AI event for secondary schools, where one of the topics I covered was AI and its impact on assessment. I thought I would share some of my thoughts here, with this being the first of two blogs on the first of the sessions I delivered.

Exams

Exams, in the form of terminal GCSE and A-Level exams, still form a fairly large part of our focus in schools. We might talk about curriculum content and learning, but at the end of the day, for students in Years 10 and 11, Lower 6 and Upper 6, the key thing is preparing for their terminal exams, as the results will determine the options available to them in the next stage of their educational journey. The issue, though, is that these terminal exams have changed little. I showed a photo of an exam being taken by students in 1940 alongside a similar exam in recent times, and there is little difference between the photos other than one being black and white and the other colour. The intervening period has seen the invention of DNA sequencing, the mobile phone, the internet and social media, and more recently public access to generative AI, but in terms of education and terminal exams little has changed.

One of the big challenges with exams is scalability: any new solution needs to scale to exams taken in schools across the world. Paper and pencil exams, sat by students across the world at the same time, accommodate this. If we found life on Mars and wanted the Martians to do a GCSE, we would simply need to translate the papers into Martian, stick the exams along with paper and pencils on a rocket, and fire them to Mars. But being the way we have always done things, and the most easily scalable solution, doesn't make paper and pencil exams the best solution. So what is the alternative?

I think we need to acknowledge that a technology solution has to be introduced at some point, and the key issue is scalability across schools with differing resources. We need a solution which can be delivered in schools with only one or two IT labs, rather than requiring enough PCs to accommodate 200 students being examined at once, as is the case with paper-based exams. So we need a solution which allows students to sit the exams in groups, but without compromising the academic integrity of the exams should students share the questions they were presented with. The solution, in my view, is adaptive testing, as used for the ALIS and MIDYIS tests from CEM. Here students complete the test online but are presented with different questions which adapt to their performance as they progress. This means the testing experience is adapted to the student, rather than being one-size-fits-all as with paper exams. This helps keep students motivated and within what CEM describe as the "learning zone". It also means that, as students receive different questions, they can sit the exam at different times, which solves the logistical issue of access to school devices. Taken a step further, it might allow students to complete their exams when they are ready, rather than on a date and time set for all students irrespective of their readiness.
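For illustration, the adaptive idea can be sketched in a few lines of Python. This is my own simplification, not the actual CEM algorithm (real adaptive tests use more sophisticated item-selection models): a correct answer nudges the next question's difficulty up, an incorrect answer nudges it down, keeping the student answering questions pitched near their ability.

```python
def next_difficulty(current: int, answered_correctly: bool,
                    minimum: int = 1, maximum: int = 10) -> int:
    """Step difficulty up after a correct answer, down after an incorrect one,
    staying within the available difficulty band."""
    step = 1 if answered_correctly else -1
    return max(minimum, min(maximum, current + step))

# A student starting mid-band who gets three questions right, then one wrong:
difficulty = 5
for correct in [True, True, True, False]:
    difficulty = next_difficulty(difficulty, correct)
print(difficulty)  # rose to 8, then dropped back to 7
```

Because each student's sequence of questions depends on their own answers, no two students need see the same paper, which is what makes sitting the exam in separate groups (or at separate times) workable.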

AI also raises the question of our current limited pathways through education, with students doing GCSEs and then A-Levels, BTECs or T-Levels and then on to university. I believe there are around 60 GCSE options available, however most schools will offer only a fraction of these. So what's the alternative? Well, Caltech may provide a possible solution. They require calculus as an entry requirement, yet lots of US schools don't offer calculus, possibly due to a lack of staff or for other reasons. Caltech's solution has been to allow students to evidence their mastery of calculus through completion of an online Khan Academy programme. What if we were more accepting of online platforms as evidence of learning and subject mastery? There is also the question of the size of courses. GCSEs, A-Levels and BTEC qualifications are all two years long, but why couldn't we recognise smaller qualifications and thereby support more flexibility and personalisation in learning programmes? In working life we might complete a short online course to develop a skill or piece of knowledge on a "just-in-time" basis, so why couldn't this work for schools and formal education? The Open University already does this through micro-credentials, so there is evidence as to how it might work. I suspect the main challenges here are logistical, in terms of managing a larger number of courses at exam board level, plus agreeing the equivalence between courses: is introductory calculus the same as digital number systems, for example?

Coursework

Coursework is also a staple part of the current education system and of summative assessment. Ever since generative AI made its big entrance in terms of public accessibility, we have worried about students cheating on homework and coursework. I suspect the challenge runs deeper, as a key part of coursework is its originality, the fact that it is the student's own work, but what does that look like in a world of generative AI? If a student has special educational needs and struggles to get started, so uses ChatGPT to help them begin, but then adjusts and modifies the work over a period of time based on their own learning and views, is this the student's own work? And what about the student who does the work independently but, before submitting, asks ChatGPT for feedback and advice, then adjusts the work and submits it? Again, is this the student's own work?

There is a significant challenge in relation to originality of work, and independent of AI this challenge has been growing. As the speed of new content generation, in the form of blogs, YouTube videos, TikTok and more, has increased year on year, and as world populations continue to grow, it becomes ever more difficult to be original. Consider being original in a room of 2 people compared with a room of 1,000 people: the more people and the more content, the more difficult it is to create something original. So what does it really mean for a piece of work to be truly original, or a student's own work?

The challenge of originality and students' own work relates to our choice of coursework as a proxy for learning. It isn't necessarily the best method of measuring learning, but it is convenient and scalable, allowing for easy standardisation and moderation to ensure parity across schools all over the world. It is easy to look at ten pieces of work and ensure they have been marked fairly and in a similar fashion; having been a moderator myself, visiting schools and carrying out moderation of coursework for IT qualifications was part of my job. If, however, generative AI means that submitted content is no longer suitable evidence of student learning, maybe we need to look at the process students go through in creating their coursework. This has its own challenges, though, in terms of how we would record our assessment of process and how we would standardise or moderate this across schools.

Questions

I don't have solutions to the concerns or challenges I have outlined; the purpose of my session was to stimulate some thought and to pose some questions to consider. The key questions I posed during the first part of my session were:

  1. Do we need an annual series of terminal exams?
  2. Does there need to be [such] a limited number of routes through formal education?
  3. Why are courses 2+ years long?
  4. Should we assess the process rather than product [in relation to coursework]?
  5. How can we assess the process in an internationally scalable form?

These are all pretty broad questions, however as we start to explore the impact of AI in education I think we need to look broadly to the future. In technology the future has a tendency to come upon us quickly, due to rapid technological advancement and change, while education tends to be slow to adapt. The sooner we seek to answer these broad questions, or at least think about them, the better.

100+ years of exam halls and paper exams

And so the exam season is in full flow, with students across the world once again sitting in rows in exam halls, which are often simply school sports halls, with pen and paper to complete their end-of-course GCSE and A-Level exams. Looking at the halls, the setup might be very similar to exams from 100 years ago or more, albeit education is now more accessible to the masses and exam halls now contain posters about "mobile devices" and how these are prohibited. How is it possible that the exam process has changed so little?

Let's consider the wider world. I asked ChatGPT for the significant technology advancements of the last 100 years and it came up with the below:

Computing and Information Technology:

The development of electronic computers and the birth of modern computing including the emergence of the internet and the World Wide Web, revolutionizing communication, information sharing, and commerce.

Transportation:

The rise of commercial aviation, making air travel accessible to millions and facilitating global connectivity along with the development of high-speed trains and advanced railway systems, enhancing transportation efficiency and connectivity.   Also, the proliferation of automobiles and the continuous improvement of electric vehicles and autonomous driving technologies.

Medicine and Healthcare:

The discovery and widespread use of antibiotics, dramatically reducing mortality rates from bacterial infections along with the development of vaccines against various diseases, leading to the eradication of smallpox and the control of many others.   Additionally, advancements in medical imaging technologies, such as X-rays, MRI, and CT scans, enabling non-invasive diagnosis and improved treatment planning plus progress in genetic research and biotechnology, including the mapping of the human genome and the development of gene therapies.

Space Exploration:

The first human-made object in space, the launch of Sputnik 1 in 1957, and subsequent manned space missions, culminating in the moon landing in 1969.    The establishment of space agencies like NASA, ESA, and others, leading to significant advancements in space technology, satellite communications, and planetary exploration.   And more recently the development of reusable rockets, such as SpaceX’s Falcon 9, reducing the cost of space travel and opening up opportunities for commercial space exploration.

Energy and Sustainability:

The expansion of renewable energy sources, including solar and wind power, as alternatives to fossil fuels plus improvements in energy storage technologies, such as lithium-ion batteries, facilitating the growth of electric vehicles and renewable energy integration.   This combined with a greater focus on sustainability and environmental awareness, driving innovations in energy-efficient buildings, green technologies, and eco-friendly practices.

Communication and Connectivity:

The evolution of telecommunications, from landline telephones to mobile phones, and the subsequent development of smartphones with advanced features and internet connectivity.   Also, the introduction of social media platforms, changing the way people connect, share information, and communicate globally and the advancement of wireless communication technologies, such as 4G and 5G, enabling faster data transfer, enhanced mobile connectivity, and the Internet of Things (IoT).

Conclusion

A lot has changed over the last 100 years, with a lot of the above occurring maybe in the last 10 to 20 years, yet in education we are still focussed on terminal exams like we were over 100 years ago.   We still take students in batches based on their date of birth and make them sit the same exam at the same time.    These exams are still provided as a paper document with students completing them with pen or pencil while sat in rows and columns in sports halls in near utter silence.  The papers are then gathered up and sent away to be marked with results not available for almost 3 months.

The above might have been OK 100 years ago, but with the modern technology now available to us, surely we should have made some progress. I suspect that although there have been those who have suggested change, there hasn't been a catalyst to drive it forward. My current hope is that recent advancements in Artificial Intelligence (AI), and the recent discussion regarding its use and potential, may be the catalyst we need. Here's to not still using the same exam processes 10 years from now, never mind 100!

Thoughts from the Bryanston Education Summit

I attended the 2nd Bryanston Education Summit during the week just past, on 6th June. I went to the inaugural event last year and I must admit to having found both years interesting and useful. The weather both years has been glorious, which helps add to the event and the beautiful surroundings of the school. Here's hoping Bryanston keep it up and run another event next year.

During the day I attended a number of different presentations on different topics so I thought I would share some of my thoughts from these sessions.

The first presentation of the day was from Daisy Christodoulou, who was discussing assessment. She drew a really useful analogy, comparing preparing students for their exams with preparing to run a marathon. It isn't something where you can jump straight into a marathon distance on day 1 of training; you need to slowly build up your preparations, focusing on developing certain skills and approaches. You need to have a plan and then work to it, amending it as needed as you progress, should injury arise or due to weather conditions and so on. I found myself wondering how much time we actually spend with our students discussing this plan, the proposed goal of the subject or year, and how we will all, teachers, students, support staff and others, work towards those goals.

Daisy also spent some time discussing summative versus formative assessment, suggesting that the use of grades should be kept to a minimum of only once or twice per year. My first reaction was concern, as this seemed to disregard the potential benefits of spaced retrieval testing, which ultimately results in a score representing the number of correct answers. On further thought, my conclusion was that spaced retrieval is very focussed on knowledge and simply indicates where an answer is right or wrong, as opposed to grading, which is more a judgement of a student's ability. As such it may be possible to reduce overall summative grading while still making regular use of testing of student knowledge. I think this also highlights that assessment and testing are actually different things, even though they are often used as interchangeable terms.

Mary Myatt was the second presenter, discussing how we might make learning high challenge but low threat. As she discussed Sudoku I couldn't help but draw parallels with computer gaming: in both cases we engage, of our own free will, in a form of testing, and in both cases the key is the low-threat nature of that testing. For me the question is therefore how we make classroom learning and assessment low threat. Mary suggested a path towards this in discussing our expectations with students, such as setting reading outside their current ability level, which is therefore challenging, but telling them this and then promising to work through it with them in future lessons. I think this links to building an appropriate classroom culture and climate, such that students feel able to share the difficulties they face and work through them with the class. It is very much about developing an open culture and a positive, warm climate in which mistakes and difficulties are not seen as something to be feared or embarrassed by, but to be embraced, shared and worked through together. Another thing I took away from Mary's session was a list of books to read; my bookshelf will shortly gain some of her recommendations.

The third of the sessions which I found most useful was by Andy Buck. He discussed leadership, drawing a number of concepts from Thinking, Fast and Slow by Daniel Kahneman, one of my favourite books. I particularly enjoyed the practical demonstrations in which he showed how we all exhibit bias in our decision making. This is a fact of being human and of the way the brain works: we bring to decision making assumptions and viewpoints based on previous experiences, upbringing and so on. Linked to this, he also demonstrated anchoring, managing to influence a whole room of education professionals to get a question about the number of Year 11 students in the UK wrong. Statistics suggest that a percentage of the audience should have got the question correct, based on a normal distribution of responses, however using anchoring Andy influenced the audience away from the correct answer. I have since used a very similar approach in a lesson with Lower 6 students, to show how easily I can influence their answers and to suggest that Google, Amazon, Facebook and others, with their huge amounts of data on individuals, may be able to influence us to a far greater extent.

There was also a presentation on VR in education which has opened my mind up a little to the possible applications of VR.   This might therefore be something we experiment with at school in the year ahead.

Microsoft's Ian Fordham presented on the various things Microsoft are currently working on. I continue to find the areas Microsoft are looking at, such as using AI to help individuals with accessibility and in addressing SEN, very interesting indeed. I was also very interested in his mention of Power BI, as I see significant opportunities in using Power BI within schools to build dashboards of data which are easy to interrogate and explore. This removes the need for complex spreadsheets, allowing teachers and school leaders to do more with the data available, with less effort and time required. I believe this hits two key needs in relation to data use in schools: the need to do more with the vast amounts of data held within schools, and the need to do it more efficiently, so that teachers' workload in relation to data can be reduced.

I also saw a presentation by Crispin Weston on data use in schools. His suggestion that we need to use technology more, to allow us to more easily analyse and use data, is one I very much agree with. This got me thinking about the Insights functionality in Power BI as a possible way to make progress in this area. He also talked about causation and correlation, suggesting his belief that there is a link between the two and that the traditional call of "correlation is not causation" is in fact incorrect. At first I was sceptical, however the key here lies in the type of data. Where the data is simple and results in a simple linear trend line, the reliability of an argument that correlation equals causation is likely to be very low; the world is seldom simple enough to present us with linear trends. If, however, the data varies significantly and randomly over a period of time, and a second data series follows it closely, the case that the correlation reflects causation is likely to be significantly stronger. I think the main message I took from Crispin's session was to take data and findings with a pinch of salt and to ensure context is taken into account: if it looks simple and clear, then there is something which hasn't been considered.
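The point about trend lines is easy to demonstrate. In this illustrative Python sketch (the series names and all the numbers are entirely made up, not taken from Crispin's session), two unrelated series which both happen to drift upwards over time show a very strong correlation, which is exactly why a simple linear trend is weak evidence of causation.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(42)
n = 50
# Two causally unrelated series that each drift upwards over time
# (hypothetical data, invented for illustration):
ice_cream_sales = [2.0 * t + random.gauss(0, 3) for t in range(n)]
swimming_accidents = [0.5 * t + random.gauss(0, 1) for t in range(n)]

r = pearson(ice_cream_sales, swimming_accidents)
print(round(r, 2))  # typically very close to 1 despite no causal link
```

The shared upward trend alone produces a near-perfect correlation; only when a second series tracks an irregular, random pattern in the first does the correlation start to carry real evidential weight.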

Overall the day was a very useful one and the above is a summary of just some of the things I took away.   I must admit to taking 5 or 6 pages of tightly written notes, hastily scribbled on an iPad during the course of the day.

I hope that Bryanston decide to repeat the conference next year and, if the quality of the presenters and their sessions continues, that it becomes a reliable yearly event. Here's hoping the trend of good weather also continues should they run the summit again.


Standardized Testing

I have written a number of times about my feelings with regard to standardized testing (you can read some of my previous postings here: Some thoughts on Data, Building Test Machines). Having worked internationally in schools in the Middle East, I am particularly aware of standardized testing and the weight put on the results from such testing. Within the UAE there is a focus on ensuring that education is of an international standard, with the measure of this being the results from the PISA and EMSA testing regimes. As a result, individual schools and their teachers are expected to pore over the EMSA results and analyse what they mean. I feel this focus on standardized testing regimes such as PISA is misplaced: how can we on one hand seek differentiated learning, tailored to students as individuals, while measuring all students with a single standardized measure?

As such it was with great interest I read the article in the TES titled “‘Ignore Pisa entirely,’ says world expert”.   The article refers to comments by Professor Yong Zhao, whom I was lucky enough to see at an SSAT conference event back in 2009.   Back then I found Professor Zhao to be an engaging and inspiring presenter, with some of his thoughts echoing my own and others shaping ideas I came to develop.   Again I find myself in agreement with Professor Zhao.   I particularly liked his comment regarding the need for “creativity, not uniformity”.

I feel the focus on PISA is the result of valuing what is measurable as opposed to measuring what is valued.   Measuring student performance in a standardized test is easy, and various statistical methods then allow for what appears to be complex analysis of the data, seemingly letting us prove or disprove various theories or beliefs.   Newspapers and other publishers then sensationalize the data and create causal explanations.   Education in Finland was recently heralded as excellent on the strength of its PISA results.   Teaching in the UAE was deemed to be below the world average, albeit better than in most other Middle East countries.   Did PISA really provide a measure of the quality of education?   I think not!

Can education be boiled down to a simple test?   Is a student’s ability to do well in the PISA test what we value?   Does it take into consideration students’ pathways through learning, given that these pathways differ from one country to another?   Does it take into consideration local needs?   Does it take into consideration the cultural, religious or other contexts within which the learning is taking place?   Does it take into account students as individuals?   Now, I acknowledge that it may be difficult or even impossible to measure the above, however does that mean we accept a lesser measure such as PISA just because it is easier?

There may be some place for the PISA results in education, however I feel we would be much better off focusing on the micro level, on our own individual schools and on seeking to continually improve, as opposed to what Professor Zhao described as little more than a “beer drinking contest”.

 

Some thoughts on Data

A recent article in the Telegraph (read it here) got me thinking once more about data.   It also got me thinking about the book “Thinking, Fast and Slow” by Daniel Kahneman, which I have only recently finished reading.   The book highlighted a number of issues which I feel have implications for education and need to be considered by school leaders.

Firstly, the small numbers effect.   The Bill and Melinda Gates Foundation commissioned a study to identify the most effective schools.   It found, unsurprisingly, that schools which were small in terms of student numbers achieved the best results, outperforming larger schools.   Paradoxically, it also found that small schools achieved the worst results.   The reason for this, as explained by Kahneman, is that where a data set contains only a small number of items the potential for variability is high.   As such, due to a variety of random variables and possibly a little helping of luck, some small schools do particularly well, out-achieving big schools.   Other small schools are not so lucky, the variables don’t fall so well, and the worst results follow.

To clarify this, consider throwing three darts at a dartboard, aiming for the centre.   This represents the results of a school with a small number of students, with higher scores being darts nearer the centre and lower scores being those ending further from it.   In the case of student results an average would then be calculated for the school, and the same can be done with the positions of the darts.   Assuming you are not a professional darts player, you may do well or you may not, due to a variety of random variables.   Given the limited number of darts, the potential for variability is high, hence a very high or very low average is quite possible.   Next consider throwing sixty darts at the board and taking the average across all the throws.   Given the number of darts, the average will regress towards your mean darts-throwing ability.   The increased number of data items means that variability is reduced, as each exceptionally good or poor throw is averaged out among the other throws.
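The darts analogy can be simulated directly. The sketch below is my own illustration in Python (the score scale and the "true ability" of 50 are arbitrary values I have chosen, not figures from the study): every "school" has identical underlying ability, yet the averages of the three-dart schools spread far more widely than those of the sixty-dart schools, so both the best and the worst averages come from the small ones.

```python
import random
import statistics

def average_score(n_throws, rng):
    """Mean of n dart 'scores', each drawn around the same true ability."""
    # Each throw scores between 0 and 100, centred on a true ability of 50
    # with plenty of random spread (all numbers are illustrative).
    return statistics.mean(
        min(100, max(0, rng.gauss(50, 20))) for _ in range(n_throws)
    )

rng = random.Random(42)
small = [average_score(3, rng) for _ in range(1000)]    # 3 darts ~ small school
large = [average_score(60, rng) for _ in range(1000)]   # 60 darts ~ large school

# Same underlying ability, very different spread of averages.
print("spread of small-school averages:", round(statistics.stdev(small), 1))
print("spread of large-school averages:", round(statistics.stdev(large), 1))
print("best and worst averages overall:",
      round(max(small + large), 1), round(min(small + large), 1))
```

Run it and the extreme averages at both ends of the league table come from the three-dart schools, purely because a small sample lets chance dominate.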

Within schools a great deal of value is attached to statistical analysis of school data, including standardised testing, however care must be taken.   As I have suggested above, a statistical analysis showing school A to be better than school B could easily be the result of random factors such as school size, school resourcing and funding, and so on, as much as it may be related to better quality teaching and learning and improved student outcomes.

Another issue is how we respond to the results.   Kahneman suggests that we commonly look for causal factors.   We seek to associate the data with a cause, which in schools could be any number of things, however our tendency is to focus on whatever comes most easily to mind.   As such, poorer (and, less often, better) results are most frequently attributed to teachers and the quality of their teaching, as this is what is most often on the minds of school leaders.   We arrive at this conclusion without considering other possible explanations such as the variable difficulty of the assessments, how they were implemented, the specific cohort concerned, the sample size discussed earlier and a multitude of other potential factors.

Because we arrive so quickly at a causal factor which clearly must be to blame and therefore needs to be rectified, we also fail to consider the statistical validity of our data.   We fail to consider the margins for error which may exist in our data, including what we might consider acceptable margins for error.   We also fail to consider a number of other factors which influence our interpretation of the data, including the tendency to focus more on addressing the results which are perceived to be negative.   This constant focus on the negative can lead to a blame culture developing, which in turn can produce increasingly negative results and increasing levels of blame.   An alternative approach which may work would be to focus more on the marginally positive results, how they were achieved and how they could be built upon.

The key issue, in my belief, is that we need to take care with data and the conclusions we infer from it.   We cannot abandon the use of data, as how else would we measure how we are doing?   Equally, we cannot take it as fully factual.   The world is a complex place filled with variables, randomness and luck, and we need to examine school data bearing this in mind.   We also need to remember that data is a tool to help us deliver the best learning opportunities for students; data is not an end in itself!

 

Gaming

The subject of schools “gaming” league tables and performance measures such as Progress 8 has made the news recently, so I have decided to contribute my opinion to the mix.   Before doing so I need to be clear that I don’t have any particularly strong views with regard to this issue, and I therefore believe that my points represent a balanced viewpoint.   I will, however, acknowledge that my assessment of my viewpoint as balanced is based on the context set by my viewpoint, perception and the paradigms within which I operate as an individual.   As such, from the point of view of those reading, including yourself, this may not be balanced after all.   I make no apologies for this, as all I can offer is my opinion, which is never wrong in that it is my opinion and is therefore formed based on my viewpoint and context.

Back on the subject of “gaming”, the discussion seems to have two opposing viewpoints.   One of these is that a school should try to offer its students the best opportunities for success in the future.   As such it is important to enable them to achieve as many successful qualifications as possible.   These schools therefore look to enroll students in qualifications which return a successful qualification for minimal effort, such as the ECDL.

The other viewpoint is that schools enrolling students in bulk in the ECDL are doing so in order to influence league tables and performance measures such as Progress 8.   Educators taking this position are of the opinion that these qualifications are of lesser value than others which take longer or are more difficult to achieve, yet have a comparable impact on league tables and other performance measures.

For me there may be truth in both viewpoints.   If the studying of specific exams is in the interest of students’ futures then surely it is the correct thing to do.   Consider two schools which are identical in outcomes except that students in one achieve an additional ECDL qualification.   Surely this puts the students who leave with the additional qualification in a more positive position.   I myself worked in a school where we delivered the OCR National in IT to all students.   We did this because of the vocational nature of the qualification, which suited our student cohort, plus the breadth of study and options available, which allowed us to accommodate individual student needs and interests.

Equally, there is truth in the other viewpoint in that a school which put all students in for the ECDL or the OCR National may have done so purely in the interest of achieving a better league table position than other schools.   This may put students under stress where the qualification is additional, or may represent an unfair advantage where an “easy” subject has been substituted for a more difficult or valued subject with an equivalent or near-equivalent league table or performance measurement points worth.

Both viewpoints involve the identical action of batch-enrolling students in a given qualification, yet they result in totally opposing opinions.   The key is not so much what schools do but why they do it.   In one viewpoint it is about the students and the benefit to them, while in the other it is about the school and getting the best league table or performance measure result.

If OFSTED are to clamp down on “gaming” they are therefore going to have to identify why a school took the chosen action.   How are they going to do this?   How are they going to measure the “intentions” of school leaders?   Are we going to start seeing OFSTED inspectors administering polygraph tests to school leaders?

I also feel that this discussion has a less-discussed aspect to it: the value of differing qualifications.   Debate has raged for some time over the value of so-called “core” subjects and the perceived lesser value of the arts and creative subjects.   The new “gaming” discussion adds differing values in terms of the perceived difficulty level of a course along with the time taken to deliver it, with shorter courses perceived to have lesser value.   Who will decide the relative worth of each course and the total worth of any individual student’s curriculum of study?

We should all be working in the interests of our students, trying to provide them with every competitive advantage with regard to Further Education or Higher Education options, options into employment, or even more generally their future lives.   A key part of this is the qualifications they achieve, so we need to get them everything reasonably possible.   In teaching we use every trick in the book to try to make sure students are learning and are ready and able to succeed in whatever assessment is required to achieve a given qualification.   If this is “gaming” then maybe we are all involved.

Mandatory Testing?

As I was heading to work on Friday I heard a BBC news story regarding new proposals for the “testing” of four-year-old children at the start of their school experience.   This immediately had me asking about the differences between assessment and testing.   I am not sure there is a difference, however I am quite happy to listen to anyone who is able to explain one.

For me, independent of the age of the students, one of the first things I need to do is “test” or assess them.   I need to find out a little about them: the things they like, the things they are good at and the areas in which they still need to develop.   I have worked in secondary education, further education and higher education, and at each stage the first thing I have done with new students is to assess or test them in order to help plan their learning experience.

So this led me to ask why the story was so newsworthy.   My first assumption was that it related to differing perspectives on, and definitions of, the terms “assessment” and “testing”.   It could be that some see the two terms as meaning the same thing, as I do, while others see each as meaning something different.   This differing perspective leads to the debate around whether the proposal in question is a good or bad thing, and therefore to a newsworthy story.

Upon thinking on it further, and accepting the commonality of the two terms, I came to think that the issue is not what the two words mean or “are” but why we undertake them.   In the case of my testing at the start of working with new students, it is done because I know the benefit such testing will have in terms of providing the best learning experience possible; it is done because good teaching demands it.   In the case of the news story, they are discussing mandated testing.   The reason for mandating such testing may be linked to the reasoning I used in deciding to test, however the fact that it is mandated detracts from this.

The other issue is what is done with the results.   In my example the data is solely for me and to inform learning; there needn’t be a score or a rubric attached.   In the case of mandated data collection, those mandating it want the data, which therefore requires quantifiable and comparable scores and grades, or at least we might assume this is the case.

Maybe we need to trust teachers more rather than mandating what must be done as the act of mandating something changes the activity being mandated!

 

Fitness Fail!

Day 7 and almost a quarter of the way through #29daysofwriting.  I am actually quite impressed with myself that at this point I am still going.  It’s also Sunday, which means a bit of a relaxing day, including the wife’s birthday, all finished off with #mltchat and #sltchat at the end of the day.

My posting today will focus loosely on assessment as a result of the below message which appeared on my phone this morning:

So although I may be doing OK at #29daysofwriting, my phone is unimpressed with my fitness levels.   I have never been a particularly fit person, and recently I have noted how much I struggle with the health and fitness aspects of my life.   As such this was something I was trying to build upon, and up until this morning I felt I was making some progress; then my phone provided me with this assessment of my performance.

I liken this message to the large and often standardized tests which we give students.   I would suggest that students may end up feeling as I did today: dejected, de-motivated, disappointed and disengaged, to name but a few words beginning with “de” or “dis”.

Prior to receiving this message I thought I had been making progress, as daily I was seeing an upward trend in the amount of exercise I was doing, my measurement of exercise being steps taken as recorded by my phone.   I had also built up a bit of understanding of how my exercise developed over the week, noting that my worst performance was at the beginning and end of the week, peaking with my best performance in the middle.

This brought the realization that maybe I would have to target the start and end of the week with focused activities to improve my performance, whereas in the middle of the week, when things were going OK, it might equally be fine to continue as at present.

Again looking at students, this daily or regular feedback might be akin to assessment for learning, with assessment data provided frequently and students required to use it to drive improvement.   At least for me, this regular data did not dishearten or de-motivate as I attempted to improve.

This makes me think that it is important to consider the frequency of testing and assessment, plus how we frame feedback.   I will admit that this isn’t anything new.

The issue here, though, is how I can get back to exercising following the de-motivational impact of my phone’s message.   The good thing is that I consider myself to be quite resilient, although I will leave that discussion for a later posting.