On Automated Essay Grading using Large Language Models

Pei Yee Liew, Ian K. T. Tan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Automated Essay Grading (AEG), which combines Automated Essay Scoring (AES) and Automated Writing Evaluation (AWE), is a time-saving response to the challenges of manual essay evaluation. It aims to reduce educators' workload while offering a more consistent grading approach. Inspired by ChatGPT’s language comprehension and generation capabilities, this study explored the potential of several Large Language Models (LLMs) in AEG tasks: GPT-4, GPT-3.5, PaLM, and LLaMA2. Tailored prompts were designed through prompt engineering, and their performance was assessed with each LLM. The study shows that LLMs can achieve substantial agreement with human markers in AES, with a Quadratic Weighted Kappa (QWK) score of 0.68. In AWE, the feedback on the essay was assessed qualitatively and achieved an agreement level score of 4.9 (out of 5) with a standard deviation of 0.05, closely aligned with human assessment. The study provides insight into the effectiveness of LLMs in automated essay grading and highlights their potential to enhance educational assessment practices.
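
For readers unfamiliar with the AES metric named in the abstract, the following is a minimal sketch of how Quadratic Weighted Kappa is commonly computed between human and model scores. It is not taken from the paper: the function name `quadratic_weighted_kappa`, its parameters, and the example scores are all hypothetical, and the authors may have used a different implementation. On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are described as "substantial" agreement.

```python
def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Illustrative QWK between two lists of integer scores (assumed helper)."""
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("score range must contain at least two values")

    n = max_score - min_score + 1
    total = len(human)

    # Observed confusion matrix: rows are human scores, columns are model scores.
    observed = [[0] * n for _ in range(n)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1

    # Marginal histograms for each rater.
    hist_human = [sum(row) for row in observed]
    hist_model = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty grows with the squared distance between scores.
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_human[i] * hist_model[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        # Both raters gave one identical score to every essay.
        return 1.0
    return 1.0 - numerator / denominator


# Hypothetical scores on a 1-3 scale, for illustration only.
human_scores = [1, 2, 3]
model_scores = [3, 2, 1]
qwk = quadratic_weighted_kappa(human_scores, model_scores, 1, 3)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Because the weights are quadratic, a model score that is two points from the human score is penalised four times as heavily as one that is a single point away.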

Original language: English
Title of host publication: CSAI '24: Proceedings of the 2024 8th International Conference on Computer Science and Artificial Intelligence
Publisher: Association for Computing Machinery
Pages: 204-211
Number of pages: 8
ISBN (Print): 9798400718182
Publication status: Published - 15 Feb 2025
Event: 8th International Conference on Computer Science and Artificial Intelligence 2024 - Beijing, China
Duration: 6 Dec 2024 to 8 Dec 2024

Conference

Conference: 8th International Conference on Computer Science and Artificial Intelligence 2024
Abbreviated title: CSAI 2024
Country/Territory: China
City: Beijing
Period: 6/12/24 to 8/12/24

Keywords

  • automated essay scoring
  • automated writing evaluation
  • large language models
  • prompt engineering

ASJC Scopus subject areas

  • Information Systems and Management
  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
