A new study from the University of Georgia explores how AI might ease the grading burden on teachers, promising faster scoring and more timely feedback for students.
Grading assignments can be one of the most time-consuming aspects of a teacher’s job. New research from the University of Georgia suggests that artificial intelligence (AI) could significantly alleviate this burden, allowing educators to focus more on teaching and less on administrative tasks.
With the adoption of the Next Generation Science Standards in many states, grading has grown more complex. The standards emphasize students’ abilities to argue, investigate and analyze data — open-ended skills that are far harder to score than multiple-choice answers.
“Asking kids to draw a model, to write an explanation, to argue with each other are very complex tasks,” corresponding author Xiaoming Zhai, an associate professor in UGA’s Mary Frances Early College of Education, said in a news release. “Teachers often don’t have enough time to score all the students’ responses, which means students will not be able to receive timely feedback.”
The study, published in Technology, Knowledge and Learning, explored the performance of Large Language Models (LLMs) like Mixtral in grading student work.
Unlike earlier AI systems built for narrow, single-purpose tasks, LLMs are trained on vast amounts of text from the internet, which allows them to generate human-like language.
For this study, Mixtral was presented with middle school students’ written responses to questions about particle behavior at different temperatures. The AI then created rubrics to assess student performance and assigned scores.
However, the research revealed the limitations of relying solely on AI for grading. While LLMs can process responses rapidly, they often use shortcuts, such as identifying specific keywords, which can lead to incorrect assumptions about a student’s understanding.
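To make the shortcut problem concrete, here is a deliberately simplified toy scorer, not the study's actual pipeline, that credits a response for each rubric keyword it contains. A student can mention a temperature increase and earn partial credit without ever explaining how the particles behave:

```python
import re

# Hypothetical keyword list for the particle-motion question; invented
# for illustration, not taken from the study's rubrics.
KEYWORDS = {"temperature", "increase", "particles", "faster"}

def keyword_score(response: str) -> int:
    """Count how many rubric keywords appear in a student's response."""
    words = set(re.findall(r"[a-z]+", response.lower()))
    return len(KEYWORDS & words)

# Mentions a temperature increase but gives no mechanism:
vague = "The temperature will increase when you heat the water."
# Actually explains the particle behavior:
complete = "When temperature rises, the particles move faster."

print(keyword_score(vague))     # partial credit despite no understanding
print(keyword_score(complete))
```

The vague response still scores 2 of 4 keywords, which is exactly the kind of incorrect inference about student understanding the researchers describe.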
“Students could mention a temperature increase, and the large language model interprets that all students understand the particles are moving faster when temperatures rise,” Zhai added. “But based upon the student writing, as a human, we’re not able to infer whether the students know whether the particles will move faster or not.”
The study’s key finding was that AI’s grading accuracy significantly improves when combined with human-created rubrics. Without these guidelines, the LLMs had an accuracy rate of just 33.5%. But with access to human-made rubrics, this rate jumped to over 50%.
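The accuracy figures above are exact-agreement rates between the model's scores and human graders' scores. A minimal sketch of how such a rate is computed, using invented scores rather than the study's data:

```python
def accuracy(machine_scores, human_scores):
    """Fraction of responses where the model's score matches the human score."""
    matches = sum(m == h for m, h in zip(machine_scores, human_scores))
    return matches / len(human_scores)

# Invented example scores for six student responses (0-3 scale assumed):
human       = [2, 3, 1, 0, 2, 3]
no_rubric   = [1, 3, 2, 1, 2, 1]  # model scoring without a rubric
with_rubric = [2, 3, 1, 1, 2, 2]  # model guided by a human-made rubric

print(accuracy(no_rubric, human))    # lower agreement with human graders
print(accuracy(with_rubric, human))  # higher agreement
```

With these toy numbers the rubric-guided scores agree with the human grader twice as often, mirroring the direction (though not the exact magnitude) of the study's finding.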
“The train has left the station, but it has just left the station,” added Zhai. “It means we still have a long way to go when it comes to using AI, and we still need to figure out which direction to go in.”
Despite these challenges, the potential benefits of AI in education are significant. If AI tools are refined to provide more accurate and nuanced grading, they could save teachers countless hours usually spent on grading and feedback.
Some educators have already expressed enthusiasm for this potential development.
“Many teachers told me, ‘I had to spend my weekend giving feedback, but by using automatic scoring, I do not have to do that. Now, I have more time to focus on more meaningful work instead of some labor-intensive work,’” Zhai added. “That’s very encouraging for me.”
Source: University of Georgia