AJNS OPEN ACCESS

Academic Journal of Natural Science

ISSN:3078-5170 (print) | ISSN:3078-5189 (online) | Publication Frequency: Quarterly

Research Article | 15 April 2025

Research on Text Classification Methods Based on Decision Trees: A Case Study on the Recognition of the Entity Category 'Position'

* Corresponding Author: Qiming Xing, E-mail: xqm200104@gmail.com

Publication

Accepted 10 April 2025; Published 15 April 2025

Academic Journal of Natural Science, 2025, 2(2), 10-15.

Abstract

This paper investigates the application of a decision tree model to the binary classification of the 'Position' category in the CLUENER2020 dataset, aiming to provide a lightweight and efficient method for named entity recognition. CLUENER2020 contains multiple label categories, and accurate identification of the 'Position' category matters for information extraction and text processing. The study evaluates the decision tree model on this task through data preprocessing, feature extraction, model training, and testing. The experimental results show an overall accuracy of 98%. For the 'Non-Position' category, precision is 98%, recall is 100%, and the F1 score is 99%; for the 'Position' category, precision is 100%, recall is 85%, and the F1 score is 92%. Although the model performs very well on the 'Non-Position' category, the lower recall for the 'Position' category indicates that some position entities are missed, which is primarily attributed to class imbalance in the dataset and the complexity of position-related text features. The contribution of this paper lies in validating the applicability of traditional machine learning models to specific named entity recognition tasks; in resource-constrained scenarios in particular, the decision tree model offers a feasible solution. Future research could improve the accuracy and robustness of named entity recognition through data augmentation, more complex model architectures, deeper feature engineering, and hyperparameter optimization.
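The reported F1 scores can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch in Python. The names `f1_score` and `reported` are illustrative and do not come from the paper, and the inputs are the rounded percentages quoted in the abstract rather than the underlying confusion-matrix counts, so agreement is only expected to two decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class figures as quoted in the abstract (rounded values, an assumption
# of this sketch; the exact counts are not given on this page).
reported = {
    "Non-Position": {"precision": 0.98, "recall": 1.00, "f1": 0.99},
    "Position": {"precision": 1.00, "recall": 0.85, "f1": 0.92},
}

for label, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    # Compare the recomputed value with the reported one at two decimals.
    consistent = abs(round(computed, 2) - m["f1"]) < 1e-9
    print(f"{label}: computed F1 = {computed:.2f}, "
          f"reported F1 = {m['f1']:.2f}, consistent = {consistent}")
```

Because precision for 'Position' is 100%, its F1 score is driven entirely by the 85% recall, which is why the missed detections dominate that category's result.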

Keywords

Decision Tree, Named Entity Recognition, CLUENER2020, Word Classification.

Metadata

Pages: 10-15

References: 14

Disciplines: Computer Science

Subjects: Data Science

Cite This Article

APA Style

Xing, Q., & Wang, Y. (2025). Research on text classification methods based on decision trees: A case study on the recognition of the entity category 'Position'. Academic Journal of Natural Science, 2(2), 10-15. https://doi.org/10.70393/616a6e73.323835

Acknowledgments

The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.

FUNDING

Not applicable.

INSTITUTIONAL REVIEW BOARD STATEMENT

Not applicable.

DATA AVAILABILITY STATEMENT

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

INFORMED CONSENT STATEMENT

Not applicable.

CONFLICT OF INTEREST

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

AUTHOR CONTRIBUTIONS

Not applicable.

References

1. Mohit, B. (2014). Named entity recognition. In Natural Language Processing of Semitic Languages (pp. 221-245). Berlin, Heidelberg: Springer Berlin Heidelberg.

2. Chowdhary, K. R. (2020). Natural language processing. Fundamentals of Artificial Intelligence, 603-649.

3. Xu, L., Dong, Q., Liao, Y., Yu, C., Tian, Y., Liu, W., ... & Zhang, X. (2020). CLUENER2020: Fine-grained named entity recognition dataset and benchmark for Chinese. arXiv preprint arXiv:2001.04351.

4. Rabiner, L., & Juang, B. (1986). An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1), 4-16.

5. Sutton, C., & McCallum, A. (2012). An introduction to conditional random fields. Foundations and Trends® in Machine Learning, 4(4), 267-373.

6. Salehinejad, H., Sankar, S., Barfett, J., Colak, E., & Valaee, S. (2017). Recent advances in recurrent neural networks. arXiv preprint arXiv:1801.01078.

7. Graves, A. (2012). Long short-term memory. Supervised Sequence Labelling with Recurrent Neural Networks, 37-45.

8. Koroteev, M. V. (2021). BERT: A review of applications in natural language processing and understanding. arXiv preprint arXiv:2103.11943.

9. Qaiser, S., & Ali, R. (2018). Text mining: Use of TF-IDF to examine the relevance of words to documents. International Journal of Computer Applications, 181(1), 25-29.

10. Yu, T., & Zhu, H. (2020). Hyper-parameter optimization: A review of algorithms and applications. arXiv preprint arXiv:2003.05689.

11. Song, Q., Xia, S., & Wu, Z. (2024, May). Automatic optimization of hyperparameters for deep convolutional neural networks: Grid search enhanced with coordinate ascent. In Proceedings of the 2024 International Conference on Machine Intelligence and Digital Applications (pp. 300-306).

12. Wu, J., Chen, X. Y., Zhang, H., Xiong, L. D., Lei, H., & Deng, S. H. (2019). Hyperparameter optimization for machine learning models based on Bayesian optimization. Journal of Electronic Science and Technology, 17(1), 26-40.

13. He, R., Li, B., Li, F., & Song, Q. (2024). A review of feature engineering methods in regression problems. Academic Journal of Natural Science, 1(1), 32-40.

14. Song, Q., & Xia, S. (2024). Research on the effectiveness of different outlier detection methods in common data distribution types. Journal of Computer Technology and Applied Mathematics, 1(1), 13-25.

PUBLISHER'S NOTE

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2025 The Author(s). Published by Southern United Academy of Sciences.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.