Benchmarking Learned Cardinality Estimation Techniques for Analytical Query Processing in Data Warehouses

Jiacheng Hu; Xu Wang; Jiawen Lai

doi:10.70393/6a6374616d.343134

Open Access | Research Article | 18 May 2026

Jiacheng Hu

University of New South Wales

jessicar@gmail.com

Master’s Degree in Information Technology

Xu Wang

Beijing University of Posts and Telecommunications

jessicar@gmail.com

Computer Science

Jiawen Lai

University of California

jessicar@gmail.com

Computer Engineering

* Corresponding Author¹: Jiacheng Hu, E-Mail: jessicar@gmail.com

Publication

Accepted 2026 May 14; Published 2026 May 18

Journal of Computer Technology and Applied Mathematics, 2026, 3(3), 3007-4126.

Abstract

Cardinality estimation remains one of the most critical yet error-prone components of query optimization in modern data warehouses. Recent advances in machine learning have produced a diverse family of learned cardinality estimators that demonstrate substantial accuracy improvements on standard benchmarks. Yet existing evaluations predominantly rely on third-normal-form schemas, leaving their effectiveness on star and snowflake schemas—the backbone of analytical data warehousing—largely unexplored. This paper presents a systematic empirical evaluation of seven representative learned cardinality estimation methods spanning three paradigmatic categories: query-driven, data-driven, and hybrid approaches. All methods are benchmarked against the PostgreSQL histogram-based estimator on three complementary datasets: TPC-DS with its native snowflake schema, STATS-CEB with real-world relational data, and IMDB/JOB as the established cross-study reference. The evaluation encompasses estimation accuracy measured by Q-Error and P-Error, inference latency, training cost, model compactness, end-to-end query execution time, and robustness under simulated ETL batch insertions. Results indicate that hybrid methods, particularly FactorJoin, achieve the strongest accuracy on data warehouse workloads with a median Q-Error of 1.74 on TPC-DS, while data-driven methods such as FLAT and BayesCard offer a favorable balance between accuracy and inference speed. BayesCard and FactorJoin exhibit the highest resilience to data updates, with median Q-Error increasing by fewer than 1.5 points after a 50% data insertion. These findings provide actionable guidance for practitioners seeking to deploy learned cardinality estimation in production data warehouse environments.
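The accuracy figures above are reported as median Q-Error. As a minimal sketch, assuming the usual definition of Q-Error as the larger of the two ratios between estimated and true cardinality, the metric could be computed as shown below. The paper's exact formulation is not given in this abstract; the clamping of counts to at least 1, the function names, and the sample pairs are all assumptions made for illustration.

```python
from statistics import median


def q_error(estimated: float, actual: float) -> float:
    """Return max(est/act, act/est).

    Both values are clamped to at least 1 so that empty results do not
    cause a division by zero (an assumption, not taken from the paper).
    """
    est = max(float(estimated), 1.0)
    act = max(float(actual), 1.0)
    return max(est / act, act / est)


def median_q_error(pairs):
    """Median Q-Error over an iterable of (estimated, actual) pairs."""
    return median(q_error(e, a) for e, a in pairs)


# Hypothetical (estimated, actual) row counts, for illustration only.
sample = [(100, 100), (50, 200), (900, 300), (10, 0)]
print(median_q_error(sample))
```

A Q-Error of 1 means a perfect estimate, and the metric penalizes over- and underestimation symmetrically, which is why a median of 1.74 can be read as "typically within a factor of 1.74 of the true cardinality".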

Keywords

Learned Cardinality Estimation, Data Warehouse, Query Optimization, Benchmark Evaluation.

Metadata

DOI: 10.70393/6a6374616d.343134

ARK: ark:/40704/JCTAM.v3n3a01

Pages: 1-8

References: 21

Disciplines: Software Systems

Subjects: Other

Cite This Article

APA Style

Hu, J., Wang, X., & Lai, J. (2026). Benchmarking learned cardinality estimation techniques for analytical query processing in data warehouses. Journal of Computer Technology and Applied Mathematics, 3(3), 1–8. https://doi.org/10.70393/6a6374616d.343134

Acknowledgments

Not Applicable.

FUNDING

Not Applicable.

INSTITUTIONAL REVIEW BOARD STATEMENT

Not Applicable.

DATA AVAILABILITY STATEMENT

Not Applicable.

INFORMED CONSENT STATEMENT

Not Applicable.

CONFLICT OF INTEREST

Not Applicable.

AUTHOR CONTRIBUTIONS

Not Applicable.

References

1. Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. (2015). How good are query optimizers, really? Proceedings of the VLDB Endowment, 9(3), 204–215.

2. Zhou, X., Chai, C., Li, G., & Sun, J. (2022). Database meets artificial intelligence: A survey. IEEE Transactions on Knowledge and Data Engineering, 34(3), 1096–1116.

3. Han, Y., Wang, H., Chen, L., Dong, Y., Chen, X., Yu, B., Yang, C., & Qian, W. (2024). ByteCard: Enhancing ByteDance's data warehouse with learned cardinality estimation. In Proceedings of the 2024 ACM SIGMOD International Conference on Management of Data.

4. Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., & Kemper, A. (2019). Learned cardinalities: Estimating correlated joins with deep learning. In Proceedings of the 9th Biennial Conference on Innovative Data Systems Research (CIDR).

5. Negi, P., Marcus, R., Kipf, A., Mao, H., Tatbul, N., Kraska, T., & Alizadeh, M. (2021). Flow-Loss: Learning cardinality estimates that matter. Proceedings of the VLDB Endowment, 14(11), 2019–2032.

6. Yang, Z., Liang, E., Kamsetty, A., Wu, C., Duan, Y., Chen, X., Abbeel, P., Hellerstein, J. M., Krishnan, S., & Stoica, I. (2019). Deep unsupervised cardinality estimation. Proceedings of the VLDB Endowment, 13(3), 279–292.

7. Yang, Z., Kamsetty, A., Luan, S., Liang, E., Duan, Y., Chen, X., & Stoica, I. (2020). NeuroCard: One cardinality estimator for all tables. Proceedings of the VLDB Endowment, 14(1), 61–73.

8. Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., & Binnig, C. (2020). DeepDB: Learn from data, not from queries! Proceedings of the VLDB Endowment, 13(7), 992–1005.

9. Zhu, R., Wu, Z., Han, Y., Zeng, K., Pfadler, A., Qian, Z., Zhou, J., & Cui, B. (2021). FLAT: Fast, lightweight and accurate method for cardinality estimation. Proceedings of the VLDB Endowment, 14(9), 1489–1502.

10. Wu, P., & Cong, G. (2021). A unified deep model of learning from both data and queries for cardinality estimation. In Proceedings of the 2021 ACM SIGMOD International Conference on Management of Data (pp. 2009–2022).

11. Wu, Z., Negi, P., Alizadeh, M., Kraska, T., & Madden, S. (2023). FactorJoin: A new cardinality estimation framework for join queries. Proceedings of the ACM on Management of Data, 1(1).

12. Wang, X., Qu, C., Wu, W., Wang, J., & Zhou, Q. (2021). Are we ready for learned cardinality estimation? Proceedings of the VLDB Endowment, 14(9), 1640–1654.

13. Han, Y., Wu, Z., Wu, P., Zhu, R., Yang, J., Tan, L. W., Zeng, K., Cong, G., Qin, Y., Pfadler, A., Qian, Z., Zhou, J., Li, J., & Cui, B. (2022). Cardinality estimation in DBMS: A comprehensive benchmark evaluation. Proceedings of the VLDB Endowment, 15(4), 752–765.

14. Kim, K., Jung, J., Seo, I., Han, W.-S., Choi, K., & Chong, J. (2022). Learned cardinality estimation: An in-depth study. In Proceedings of the 2022 ACM SIGMOD International Conference on Management of Data (pp. 1214–1227).

15. Zhang, J., Zhang, C., Li, G., & Chai, C. (2021). Learned cardinality estimation: A design space exploration and a comparative evaluation. Proceedings of the VLDB Endowment, 15(1), 85–97.

16. Wu, Z., Shaikhha, A., Zhu, R., Zeng, K., Han, Y., & Zhou, J. (2020). BayesCard: Revitalizing Bayesian frameworks for cardinality estimation. arXiv preprint arXiv:2012.14743.

17. Li, P., Wei, W., Zhu, R., Ding, B., Zhou, J., & Lu, H. (2023). ALECE: An attention-based learned cardinality estimator for SPJ queries on dynamic workloads. Proceedings of the VLDB Endowment, 17(2), 197–210.

18. Wang, J., Chai, C., Liu, J., & Li, G. (2021). FACE: A normalizing flow based cardinality estimator. Proceedings of the VLDB Endowment, 15(1), 72–84.

19. Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., & Kraska, T. (2021). Bao: Making learned query optimization practical. In Proceedings of the 2021 ACM SIGMOD International Conference on Management of Data (pp. 1275–1288).

20. Sun, J., & Li, G. (2019). An end-to-end learning-based cost estimator. Proceedings of the VLDB Endowment, 13(3), 307–319.

21. Negi, P., Marcus, R., Kipf, A., Mao, H., Tatbul, N., Kraska, T., & Alizadeh, M. (2023). Robust query driven cardinality estimation under changing workloads. Proceedings of the VLDB Endowment, 16(7), 1520–1533.

PUBLISHER'S NOTE

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2025 The Author(s). Published by Southern United Academy of Sciences.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Journal of Computer Technology and Applied Mathematics (JCTAM)
Published by Southern United Academy of Sciences Limited ISNI: 0000000512776460, operated by the Publications Division.
