A Comparative Empirical Evaluation of Single-Agent and Multi-Agent LLM Prompting Strategies for Automated Formative Feedback in Education

Jiawen Lai; Zan Li

doi:10.70393/6a696574.343135

OPEN ACCESS | Research Article | 18 May 2026

Jiawen Lai

University of California, US, timi5645@gmail.com

Zan Li

Peking University, CN, timi5645@gmail.com

* Corresponding Author: Jiawen Lai, E-Mail: timi5645@gmail.com

Publication

Accepted 2026 May 15; Published 2026 May 18

Journal of Intelligence and Engineering Technology, 2026, 1(2), 29-38.

Abstract

Automated formative feedback has emerged as a focal point in educational technology research, as large language models (LLMs) offer the prospect of providing personalized commentary on student writing at a scale that human instructors alone cannot match. What is less well examined, however, is how the underlying prompting design—particularly the choice between single-agent and multi-agent setups—shapes the pedagogical value of the feedback produced. To examine this question, we conducted a controlled comparison across four prompting configurations on a corpus of 200 undergraduate argumentative essays: a zero-shot single-agent baseline, a chain-of-thought single-agent variant, a dual-role multi-agent pipeline in which one model drafts feedback and another critiques it, and a tri-role multi-agent pipeline that introduces a dedicated revision stage on top of the draft-and-critique loop. Each set of feedback outputs was assessed along a multi-dimensional rubric covering accuracy, specificity, constructiveness, and tone, with three trained raters scoring independently. We also computed automated textual similarity metrics against expert-authored reference feedback to complement the human ratings and provide a more independent check. The tri-role multi-agent configuration produced the highest composite quality scores and, notably, the lowest rates of over-praise and hallucinated claims about essay content. The chain-of-thought single-agent variant, while not topping the rankings, delivered surprisingly close quality at a fraction of the inference cost, making it an attractive option when computational budget or latency matters. We close by discussing what these patterns mean in practice for educators and developers looking to integrate LLM-based feedback agents into higher-education writing workflows at scale.
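The tri-role configuration described above (draft, critique, revise) can be illustrated with a minimal sketch. It is not the authors' implementation: the `LLM` callable type, the function name `tri_role_feedback`, the prompt wording, and the `stub_llm` stand-in are all assumptions introduced here for illustration, and a real deployment would replace the stub with an actual model client.

```python
from typing import Callable, Dict

# Hypothetical interface: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def tri_role_feedback(essay: str, llm: LLM) -> Dict[str, str]:
    """Run draft -> critique -> revise, one model call per role."""
    # Stage 1: a drafter writes initial feedback on the essay.
    draft = llm(
        "ROLE: drafter\n"
        "Write formative feedback on the essay below.\n\n"
        f"ESSAY:\n{essay}"
    )
    # Stage 2: a critic checks the draft against the essay (prompt wording is assumed).
    critique = llm(
        "ROLE: critic\n"
        "Review the feedback below against the essay. Flag over-praise and "
        "any claim about the essay that the text does not support.\n\n"
        f"ESSAY:\n{essay}\n\nFEEDBACK:\n{draft}"
    )
    # Stage 3: a reviser rewrites the draft using the critique.
    revised = llm(
        "ROLE: reviser\n"
        "Rewrite the feedback so that it addresses the critique.\n\n"
        f"ESSAY:\n{essay}\n\nFEEDBACK:\n{draft}\n\nCRITIQUE:\n{critique}"
    )
    return {"draft": draft, "critique": critique, "revised": revised}


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call: echoes the role line of the prompt."""
    return f"[{prompt.splitlines()[0]}] stub output"


result = tri_role_feedback("Example essay text.", stub_llm)
print(result["revised"])
```

Under these assumptions the pipeline makes three model calls per essay where a single-agent prompt makes one, which is consistent with the cost gap the abstract attributes to the chain-of-thought single-agent variant.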

Keywords

Large Language Model, AI Agent, Formative Feedback, Multi-Agent Prompting.

Metadata

DOI: 10.70393/6a696574.343135

ARK: ark:/40704/JIET.v1n2a05

Pages: 29-38

References: 28

Disciplines: Intelligent Systems

Subjects: Other

Cite This Article

APA Style

Lai, J., & Li, Z. (2026). A comparative empirical evaluation of single-agent and multi-agent LLM prompting strategies for automated formative feedback in education. Journal of Intelligence and Engineering Technology, 1(2), 29-38. https://doi.org/10.70393/6a696574.343135

Acknowledgments

Not Applicable.

FUNDING

Not Applicable.

INSTITUTIONAL REVIEW BOARD STATEMENT

Not Applicable.

DATA AVAILABILITY STATEMENT

Not Applicable.

INFORMED CONSENT STATEMENT

Not Applicable.

CONFLICT OF INTEREST

Not Applicable.

AUTHOR CONTRIBUTIONS

Not Applicable.

PUBLISHER'S NOTE

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2025 The Author(s). Published by Southern United Academy of Sciences.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Journal of Intelligence and Engineering Technology (JIET)
Published by Southern United Academy of Sciences Limited ISNI: 0000000512776460, operated by the Publications Division.
