
JIET OPEN ACCESS
Journal of Intelligence and Engineering Technology
ISSN: Pending (print) | ISSN: Pending (online) | Publication Frequency: Quarterly
A Comparative Empirical Evaluation of Single-Agent and Multi-Agent LLM Prompting Strategies for Automated Formative Feedback in Education
* Corresponding Author: Jiawen Lai, E-mail: timi5645@gmail.com
Publication
Accepted 2026 May 15; Published 2026 May 18
Journal of Intelligence and Engineering Technology, 2026, 1(2), Pending.
Abstract
Automated formative feedback has emerged as a focal point in educational technology research, as large language models (LLMs) offer the prospect of providing personalized commentary on student writing at a scale that human instructors alone cannot match. What is less well examined, however, is how the underlying prompting design—particularly the choice between single-agent and multi-agent setups—shapes the pedagogical value of the feedback produced. To examine this question, we conducted a controlled comparison across four prompting configurations on a corpus of 200 undergraduate argumentative essays: a zero-shot single-agent baseline, a chain-of-thought single-agent variant, a dual-role multi-agent pipeline in which one model drafts feedback and another critiques it, and a tri-role multi-agent pipeline that introduces a dedicated revision stage on top of the draft-and-critique loop. Each set of feedback outputs was assessed along a multi-dimensional rubric covering accuracy, specificity, constructiveness, and tone, with three trained raters scoring independently. We also computed automated textual similarity metrics against expert-authored reference feedback to complement the human ratings and provide a more independent check. The tri-role multi-agent configuration produced the highest composite quality scores and, notably, the lowest rates of over-praise and hallucinated claims about essay content. The chain-of-thought single-agent variant, while not topping the rankings, delivered surprisingly close quality at a fraction of the inference cost, making it an attractive option when computational budget or latency matters. We close by discussing what these patterns mean in practice for educators and developers looking to integrate LLM-based feedback agents into higher-education writing workflows at scale.
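The draft, critique, and revise stages described in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the function names, and the prompt wording are all hypothetical, and the abstract does not specify how the dual-role pipeline combines the critique with the draft, so the sketch simply returns both.

```python
from typing import Callable, Dict

# Hypothetical interface: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def draft_feedback(llm: LLM, essay: str) -> str:
    """Stage 1: a drafter role writes initial feedback on the essay."""
    return llm(f"ROLE: drafter\nWrite formative feedback on this essay.\n\nESSAY:\n{essay}")


def critique_feedback(llm: LLM, essay: str, draft: str) -> str:
    """Stage 2: a critic role reviews the draft feedback against the essay."""
    return llm(
        "ROLE: critic\nPoint out inaccurate, vague, or over-praising feedback.\n\n"
        f"ESSAY:\n{essay}\n\nDRAFT FEEDBACK:\n{draft}"
    )


def revise_feedback(llm: LLM, essay: str, draft: str, critique: str) -> str:
    """Stage 3: a reviser role rewrites the draft in light of the critique."""
    return llm(
        "ROLE: reviser\nRewrite the feedback so that it addresses the critique.\n\n"
        f"ESSAY:\n{essay}\n\nDRAFT FEEDBACK:\n{draft}\n\nCRITIQUE:\n{critique}"
    )


def dual_role_feedback(llm: LLM, essay: str) -> Dict[str, str]:
    """Dual-role sketch: draft, then critique (merging step is an assumption left open)."""
    draft = draft_feedback(llm, essay)
    critique = critique_feedback(llm, essay, draft)
    return {"draft": draft, "critique": critique}


def tri_role_feedback(llm: LLM, essay: str) -> str:
    """Tri-role sketch: draft, critique, then a dedicated revision stage."""
    draft = draft_feedback(llm, essay)
    critique = critique_feedback(llm, essay, draft)
    return revise_feedback(llm, essay, draft, critique)
```

In this sketch the tri-role pipeline makes three model calls per essay where a single-agent baseline would make one, which is consistent with the cost trade-off noted in the abstract. Any model client can be plugged in by wrapping it as a function from prompt string to completion string.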
Keywords
Large Language Model, AI Agent, Formative Feedback, Multi-Agent Prompting.
Metadata
Pages: 29-38
References: 28
Disciplines: Intelligent Systems
Subjects: Other
Cite This Article
APA Style
Lai, J., & Li, Z. (2026). A comparative empirical evaluation of single-agent and multi-agent LLM prompting strategies for automated formative feedback in education. Journal of Intelligence and Engineering Technology, 1(2), 29-38. https://doi.org/10.70393/6a696574.343135
Acknowledgments
Not Applicable.
FUNDING
Not Applicable.
INSTITUTIONAL REVIEW BOARD STATEMENT
Not Applicable.
DATA AVAILABILITY STATEMENT
Not Applicable.
INFORMED CONSENT STATEMENT
Not Applicable.
CONFLICT OF INTEREST
Not Applicable.
AUTHOR CONTRIBUTIONS
Not Applicable.
PUBLISHER'S NOTE
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Copyright © 2025 The Author(s). Published by Southern United Academy of Sciences. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.