Risk Labeling and Classification Prediction of Logistics Claims Based on Machine Learning

Yu Chen, Cheng Liang, Biaoyong Liang
Article
2026 / Volume 9 / Pages 3073-3095
Published 25 April 2026

Abstract

For logistics claim governance, this paper proposes a risk labeling and prediction framework that incorporates business proportion constraints. First, missing values are processed, and two core features, "claim gap" and "claim amount ratio", are constructed and standardized. A capacitated clustering method is then employed to generate three risk labels. Under the hard constraints of "reasonable demands proportion ≥ 85%" and "serious excess proportion ≤ 3%", the sample proportions of the three categories in the final clustering solution are 0.850004, 0.119996, and 0.029999, respectively. Next, with the "actual compensation amount" as the regression target, Linear Regression, Decision Tree, XGBoost, LightGBM, CatBoost, and Random Forest models are compared. The Random Forest model achieves R^2 = 0.7706, RMSE = 135.5171, and MAE = 92.1084 on the test set and is used for the first-stage prediction in indirect classification. Finally, two risk classification routes are compared: direct classification (training classifiers with the clustering labels as supervision signals) and indirect classification (predicting the compensation amount first, reconstructing the two-dimensional features from the prediction, and assigning the risk category by the nearest-centroid rule). The results indicate that indirect classification remains consistent with the labeling rules while exhibiting more stable overall performance and better business interpretability.
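
The indirect classification route can be illustrated with a minimal Python sketch. It is not the authors' implementation. The feature definitions (gap = claimed amount minus predicted compensation, ratio = claimed amount divided by predicted compensation), the standardization statistics, the centroid coordinates, and the name of the middle category ("intermediate") are all hypothetical placeholders, since the abstract does not specify them. In practice the predicted compensation would come from the first-stage regressor, and the centroids from the capacitated clustering step.

```python
from math import dist

# Hypothetical standardization statistics (mean, std) for (gap, ratio).
# In a real pipeline these would be estimated on the training set.
MEANS = (100.0, 1.5)
STDS = (200.0, 1.0)

# Hypothetical cluster centroids in standardized (gap, ratio) space.
# "intermediate" is a placeholder name for the unnamed middle category.
CENTROIDS = {
    "reasonable": (-0.3, -0.3),
    "intermediate": (1.0, 1.2),
    "serious excess": (3.5, 4.0),
}


def build_features(claimed, predicted_comp, eps=1e-9):
    """Rebuild the two features from a predicted compensation amount.

    Assumed definitions (not given in the abstract):
    gap = claimed - predicted, ratio = claimed / predicted.
    """
    gap = claimed - predicted_comp
    ratio = claimed / max(predicted_comp, eps)  # guard against division by zero
    return gap, ratio


def standardize(point, means=MEANS, stds=STDS):
    """Apply z-score standardization to each feature."""
    return tuple((x - m) / s for x, m, s in zip(point, means, stds))


def nearest_centroid(point, centroids=CENTROIDS):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))


def classify(claimed, predicted_comp):
    """Indirect route: predicted amount -> features -> nearest centroid."""
    features = standardize(build_features(claimed, predicted_comp))
    return nearest_centroid(features)


if __name__ == "__main__":
    print(classify(500.0, 450.0))   # claim close to the predicted compensation
    print(classify(2000.0, 300.0))  # claim far above the predicted compensation
```

Because the final label depends only on the distance to fixed centroids, the decision for any single claim can be traced back to two numbers, which is one way to read the interpretability argument made above.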

Keywords

k-means clustering, linear regression, random forest, decision tree, XGBoost