Data Quality Control in Semiconductor Manufacturing through Automated ETL Processes and Class Imbalance Handling Techniques

Min Yin

doi:10.70393/6a69656173.333532

OPEN ACCESS | Research Article | 3 December 2025

University of California-Berkeley

gmiayinc@gmail.com

University of California-Berkeley, 94720, USA.

* Corresponding Author: Min Yin, E-mail: gmiayinc@gmail.com

Publication

Accepted 29 November 2025; Published 3 December 2025

Journal of Industrial Engineering and Applied Science, 2025, 3(6), 3005-6071.

Abstract

In semiconductor manufacturing, data quality is crucial for maintaining high production efficiency and product consistency. However, missing values, noise, and class imbalance in sensor data complicate quality control. This paper proposes a framework that automates data cleaning and quality control by integrating ETL (extract, transform, load) processes, advanced interpolation techniques, and class imbalance handling methods. A feature selection mechanism based on a voting strategy is introduced to optimize model predictions. Experiments on real semiconductor manufacturing data indicate that the proposed method improves data quality and the accuracy of yield and defect-detection predictions. The work advances data quality control in semiconductor manufacturing and offers a practical approach for future research in industrial data management and predictive maintenance.
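The three stages named in the abstract (interpolation of missing values, class imbalance handling, and voting-based feature selection) can be sketched as follows. This is only a minimal illustration using the Python standard library, not the implementation used in the paper. All function names, the SMOTE-style interpolation between nearest minority neighbours, and the three scoring rules that cast votes (variance, class mean gap, absolute correlation) are assumptions made for the example.

```python
import random
from statistics import mean, pvariance

def interpolate_missing(series):
    """Fill None gaps linearly; gaps at the edges copy the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled

def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling: each synthetic row lies on the segment between
    a minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def vote_select(X, y, top_k=2, min_votes=2):
    """Keep features that at least min_votes of three scorers rank in their top_k.
    The three scorers are illustrative choices, not the paper's."""
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]

    def variance(c):
        return pvariance(c)

    def mean_gap(c):
        return abs(mean([c[i] for i in pos]) - mean([c[i] for i in neg]))

    def abs_corr(c):
        mc, my = mean(c), mean(y)
        cov = sum((a - mc) * (b - my) for a, b in zip(c, y))
        denom = (sum((a - mc) ** 2 for a in c) * sum((b - my) ** 2 for b in y)) ** 0.5
        return abs(cov / denom) if denom else 0.0

    votes = [0] * n_features
    for scorer in (variance, mean_gap, abs_corr):
        scores = [scorer(c) for c in cols]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            votes[j] += 1
    return [j for j in range(n_features) if votes[j] >= min_votes]

if __name__ == "__main__":
    # Hypothetical toy sensor data: label 1 marks the (rare) defect class.
    print(interpolate_missing([1.0, None, 3.0]))
    print(len(smote_like([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]], n_new=4)))
    X = [[0.0, 5.0, 10.0], [0.1, 5.0, -10.0], [1.0, 5.0, 9.0], [1.1, 5.0, -9.0]]
    print(vote_select(X, [0, 0, 1, 1], top_k=1, min_votes=2))
```

In a real pipeline these steps would typically run inside the transform stage of the ETL job, with oversampling applied only to the training split so that synthetic rows do not leak into evaluation data.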

Keywords

Automated ETL Processes, Data Quality Control, Missing Data Imputation, Class Imbalance Handling, Synthetic Minority Over-sampling Technique, Feature Selection, Yield Prediction, Predictive Maintenance.

Metadata

DOI: 10.70393/6a69656173.333532

ARK: ark:/40704/JIEAS.v3n6a03

Pages: 13-22

References: 26

Disciplines: Information Science

Subjects: Data Management

Cite This Article

APA Style

Yin, M. (2025). Data quality control in semiconductor manufacturing through automated ETL processes and class imbalance handling techniques. Journal of Industrial Engineering and Applied Science, 3(6), 13-22. https://doi.org/10.70393/6a69656173.333532

Acknowledgments

The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.

FUNDING

Not applicable.

INSTITUTIONAL REVIEW BOARD STATEMENT

Not applicable.

DATA AVAILABILITY STATEMENT

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

INFORMED CONSENT STATEMENT

Not applicable.

CONFLICT OF INTEREST

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

AUTHOR CONTRIBUTIONS

Not applicable.

PUBLISHER'S NOTE

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2025 The Author(s). Published by Southern United Academy of Sciences.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Journal of Industrial Engineering and Applied Science (JIEAS)
Published by Southern United Academy of Sciences Limited ISNI: 0000000512776460, operated by the Publications Division.
