Table 2 CatBoost models with different feature-selection cutoffs were evaluated. The feature-selection parameters yielding the best-performing model are shown in bold

From: A machine learning model accurately identifies glycogen storage disease Ia patients based on plasma acylcarnitine profiles

| Cutoff percentile of feature importance | Mean ROC AUC | Mean PR AUC | Mean F4 score | Number of features selected after nested CV |
|---|---|---|---|---|
| 100% | 0.950 [0.928–0.973] | 0.648 [0.603–0.694] | 0.416 [0.337–0.495] | 64 |
| 90% | 0.951 [0.931–0.972] | 0.661 [0.597–0.726] | 0.481 [0.396–0.567] | 64 |
| 80% | 0.949 [0.927–0.970] | 0.672 [0.604–0.741] | 0.476 [0.395–0.558] | 62 |
| 70% | 0.955 [0.934–0.975] | 0.674 [0.604–0.743] | 0.488 [0.423–0.552] | 51 |
| 60% | 0.948 [0.920–0.975] | 0.663 [0.596–0.731] | 0.466 [0.395–0.538] | 43 |
| 50% | 0.953 [0.928–0.978] | 0.674 [0.601–0.746] | 0.488 [0.404–0.573] | 37 |
| 40% | 0.957 [0.934–0.981] | 0.666 [0.580–0.752] | 0.462 [0.382–0.542] | 28 |
| 30% | 0.953 [0.934–0.972] | 0.624 [0.538–0.710] | 0.390 [0.276–0.505] | 18 |
| 20% | 0.951 [0.934–0.968] | 0.637 [0.584–0.691] | 0.420 [0.313–0.528] | 10 |
| 10% | 0.922 [0.897–0.946] | 0.579 [0.494–0.663] | 0.430 [0.321–0.539] | 6 |
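
For readers reimplementing this comparison, the sketch below (Python, assuming NumPy arrays `X` and `y` and the `catboost` and `scikit-learn` packages) illustrates one plausible way to apply a feature-importance percentile cutoff and score it with the metrics reported in the table (ROC AUC, PR AUC, and an F-beta score with beta = 4). It is not the authors' code: it uses a single stratified CV loop rather than the paper's nested CV with hyperparameter tuning, and the selection rule (keep features whose importance exceeds the (100 − cutoff)th percentile of the training-fold importances) is an assumption.

```python
import numpy as np
from catboost import CatBoostClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, average_precision_score, fbeta_score


def evaluate_cutoff(X, y, cutoff_percentile, n_splits=5, random_state=0):
    """Mean ROC AUC, PR AUC, and F4 score for one feature-importance cutoff (sketch)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    roc, pr, f4, n_selected = [], [], [], []

    for train_idx, test_idx in cv.split(X, y):
        X_tr, X_te = X[train_idx], X[test_idx]
        y_tr, y_te = y[train_idx], y[test_idx]

        # Rank features on the training fold only, to avoid leakage into the test fold.
        ranker = CatBoostClassifier(iterations=300, verbose=False, random_seed=random_state)
        ranker.fit(X_tr, y_tr)
        importance = ranker.get_feature_importance()

        # Assumed rule: keep features above the (100 - cutoff)th importance percentile,
        # so cutoff_percentile=100 keeps all features and cutoff_percentile=10 keeps the top ~10%.
        threshold = np.percentile(importance, 100 - cutoff_percentile)
        selected = importance >= threshold
        n_selected.append(int(selected.sum()))

        # Refit on the selected features and score the held-out fold.
        model = CatBoostClassifier(iterations=300, verbose=False, random_seed=random_state)
        model.fit(X_tr[:, selected], y_tr)
        proba = model.predict_proba(X_te[:, selected])[:, 1]

        roc.append(roc_auc_score(y_te, proba))
        pr.append(average_precision_score(y_te, proba))
        # F4 weights recall four times as heavily as precision, matching the "Mean F4 score" column.
        f4.append(fbeta_score(y_te, (proba >= 0.5).astype(int), beta=4))

    return np.mean(roc), np.mean(pr), np.mean(f4), n_selected
```

Calling, for example, `evaluate_cutoff(X, y, 50)` yields fold-averaged metrics analogous to the 50% row; the bracketed intervals in the table would additionally require repeated CV or bootstrapping, which is omitted here for brevity.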