JETBM OPEN ACCESS

Journal of Economic Theory and Business Management

ISSN:3006-4953 (print) | ISSN:3006-4961 (online) | Publication Frequency: Bimonthly

OPEN ACCESS | Research Article | 18 February 2026

Offline Conservative RL for Transaction Authorization: Smartly Balancing Fraud Risk and Customer Friction

* Corresponding Author: Yang Ximeng, E-mail: Cocoliu898@gmail.com

Publication

Accepted: 14 February 2026; Published: 18 February 2026

Journal of Economic Theory and Business Management, 2026, 3(1), 1-9.

Abstract

This study instantiates credit strategy optimization at the transaction authorization layer, with three actions: approve, review, and decline. Within an offline conservative RL framework based on Conservative Q-Learning (CQL), we co-optimize fraud loss, the operational burden of manual reviews, and the customer friction caused by false positives and delays via a unified multi-objective cost function. On a public credit-card transaction dataset with severe class imbalance, the learned policy reduces total cost relative to cost-sensitive supervised baselines while offering favorable trade-offs along a Pareto frontier spanning risk, operations, and friction. We detail the MDP design (state featurization, action space, and cost weights) and show that CQL mitigates out-of-distribution value overestimation in the offline setting. The results indicate that conservative RL is a practical path toward transaction-level credit decision-making that balances fraud risk with operational efficiency and user impact.
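The abstract's unified multi-objective cost, combining fraud loss, review burden, and decline friction across the three authorization actions, can be sketched as follows. This is a minimal illustration only: the weight values, the helper name `transaction_cost`, and the assumption that friction and review costs scale with transaction amount are all assumptions for exposition, not the paper's actual design.

```python
# Hypothetical per-transaction cost for the three-action authorization MDP.
# Weights below are illustrative placeholders, not the paper's values.
W_FRAUD = 1.0      # weight on realized fraud loss when fraud is approved
W_REVIEW = 0.1     # weight on the operational burden of a manual review
W_FRICTION = 0.05  # weight on customer friction from a false decline

def transaction_cost(action: str, is_fraud: bool, amount: float) -> float:
    """Cost of one authorization decision, given the offline fraud label."""
    if action == "approve":
        # Approving fraud incurs the full fraud loss; approving a
        # legitimate transaction costs nothing.
        return W_FRAUD * amount if is_fraud else 0.0
    if action == "review":
        # Manual review intercepts fraud but always adds operational
        # cost and customer delay, regardless of the true label.
        return W_REVIEW * amount
    if action == "decline":
        # Declining fraud is free; declining a legitimate customer
        # creates friction, assumed here to scale with the amount.
        return 0.0 if is_fraud else W_FRICTION * amount
    raise ValueError(f"unknown action: {action}")
```

An offline RL agent trained under such a cost sees the Pareto tension directly: raising `W_FRICTION` pushes the policy toward approving more borderline transactions, while raising `W_FRAUD` pushes it toward review or decline.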

Keywords

Offline Reinforcement Learning, Cost-Sensitive Credit Risk Optimization, User-Centric Financial Decision Systems, Conservative Q-Learning (CQL).

Metadata

Pages: 1-9

References: 20

Disciplines: Business Analytics

Subjects: Econometric Modeling

Cite This Article

APA Style

Ximeng, Y., & Yiming, Z. (2026). Offline conservative RL for transaction authorization: Smartly balancing fraud risk and customer friction. Journal of Economic Theory and Business Management, 3(1), 1-9. https://doi.org/10.70393/6a6574626d.333932

Acknowledgments

Not Applicable.

FUNDING

Not Applicable.

INSTITUTIONAL REVIEW BOARD STATEMENT

Not Applicable.

DATA AVAILABILITY STATEMENT

Not Applicable.

INFORMED CONSENT STATEMENT

Not Applicable.

CONFLICT OF INTEREST

Not Applicable.

AUTHOR CONTRIBUTIONS

Not Applicable.

References

1.
Khraishi, R., & Okhrati, R. (2022, November). Offline deep reinforcement learning for dynamic pricing of consumer credit. In Proceedings of the Third ACM International Conference on AI in Finance (pp. 325–333).

2.
So, M. M., & Thomas, L. C. (2011). Modelling the profitability of credit cards by Markov decision processes. European Journal of Operational Research, 212(1), 123–130.

3.
Sewak, M. (2019). Temporal difference learning, SARSA, and Q-learning: Some popular value approximation-based reinforcement learning approaches. In Deep reinforcement learning: Frontiers of artificial intelligence (pp. 51–63). Springer.

4.
Sha, F., Ding, C., Zheng, X., et al. (2025). Weathering the policy storm: How trade uncertainty shapes firm financial performance through innovation and operations. International Review of Economics & Finance, 104274.

5.
Deng, X. (2025). Cooperative optimization strategies for data collection and machine learning in large-scale distributed systems. In 2025 4th International Symposium on Computer Applications and Information Technology (ISCAIT) (pp. 2151–2154). IEEE.

6.
Trench, M. S., Pederson, S. P., Lau, E. T., Ma, L., Wang, H., & Nair, S. K. (2003). Managing credit lines and prices for Bank One credit cards. Interfaces, 33(5), 4–21.

7.
Wiesemann, W., Kuhn, D., & Rustem, B. (2013). Robust Markov decision processes. Mathematics of Operations Research, 38(1), 153–183.

8.
Tan, C., Gao, F., Song, C., Xu, M., Li, Y., & Ma, H. (2024). Highly reliable CI-JSO based densely connected convolutional networks using transfer learning for fault diagnosis. Journal of Information Systems Engineering and Management. https://doi.org/10.52783/jisem.v10i4.12207

9.
Tan, C., Gao, F., Song, C., Xu, M., Li, Y., & Ma, H. (2024). Proposed damage detection and isolation from limited experimental data based on a deep transfer learning and an ensemble learning classifier. Journal of Information Systems Engineering and Management. https://doi.org/10.52783/jisem.v10i4.12206

10.
Han, X., & Dou, X. (2025). User recommendation method integrating hierarchical graph attention network with multimodal knowledge graph. Frontiers in Neurorobotics, 19, 1587973.

11.
Zhuang, R. (2025). Evolutionary logic and theoretical construction of real estate marketing strategies under digital transformation. Economics and Management Innovation, 2(2), 117–124.

12.
Yang, Z., et al. (2025). RLHF fine-tuning of LLMs for alignment with implicit user feedback in conversational recommenders. arXiv. https://arxiv.org/abs/2508.05289

13.
Deng, X., & Yang, J. (2025). Multi-layer defense strategies and privacy-preserving enhancements for membership reasoning attacks in a federated learning framework. In 2025 5th International Conference on Computer Science and Blockchain (CCSB) (pp. 278–282). IEEE.

14.
Tan, C. (2024). The application and development trends of artificial intelligence technology in automotive production. Artificial Intelligence Technology Research, 2(5).

15.
Zhang, L., & Meng, Q. (2025, September). User portrait-driven smart home device deployment optimization and spatial interaction design. In 2025 5th International Conference on Artificial Intelligence, Automation and High Performance Computing (AIAHPC) (pp. 724–728). IEEE.

16.
Yang, H., Tian, Y., Yang, Z., Wang, Z., Zhou, C., & Li, D. (2025). Research on model parallelism and data parallelism optimization methods in large language model-based recommendation systems. arXiv. https://arxiv.org/abs/2506.17551

17.
Gonzalez, J., Tran, V., Meredith, J., Xu, I., Penchala, R., Vilar-Ribó, L., et al. (2025). How it begins: Initial response to opioids strongly predicts self-reported opioid use disorder. medRxiv.

18.
Wozabal, D., & Hochreiter, R. (2012). A coupled Markov chain approach to credit risk modeling. Journal of Economic Dynamics and Control, 36(3), 403–415.

19.
Kumar, A., Zhou, A., Tucker, G., & Levine, S. (2020). Conservative Q-learning for offline reinforcement learning. Advances in Neural Information Processing Systems, 33, 1179–1191.

20.
Mendonca, R., Geng, X., Finn, C., & Levine, S. (2020). Meta-reinforcement learning that is robust to distributional shifts via model identification and experience relabeling. arXiv. https://arxiv.org/abs/2006.07178

PUBLISHER'S NOTE

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2025 The Author(s). Published by Southern United Academy of Sciences.
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.