APPLICATION OF THE SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE (SMOTE) TO IMBALANCED DATA IN BUILDING A JAMU COMPOSITION MODEL
Over time, many people have come to use herbal remedies
(jamu) to address health issues. Herbal medicines are made
from plants in a specific composition to produce certain
properties, so a model is needed to find the right formula
for an herbal medicine with the desired properties. In this
study, the response under investigation is whether an herbal
medicine is efficacious in treating mood and behavior disorders.
In this analysis, the model is developed using logistic regression.
The accuracy of the model can be assessed using the Area Under
the Curve (AUC). Imbalanced data in the response variable can
cause the AUC value to be low. One way to address this
is the Synthetic Minority Oversampling Technique
(SMOTE). In this analysis, the Nagelkerke R² value of
the model with SMOTE is 3.2% lower than that of the model without
SMOTE. Nonetheless, the model with SMOTE is more accurate
than the model without SMOTE because it has a higher AUC value.
The resulting AUC is 0.976 for the model with SMOTE
and 0.908 for the model without SMOTE. The results show that
SMOTE can increase the accuracy of the model for imbalanced data.
Keywords: imbalanced data, logistic regression, SMOTE
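
To make the two technical ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes numeric feature vectors and Euclidean distance. The function names `smote` and `auc`, the parameters `n_synthetic`, `k`, and `seed`, and the toy data are all hypothetical and chosen only for illustration. As a simplification, the base minority sample is drawn at random for each synthetic point rather than looping over every minority sample a fixed number of times.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours.

    Illustrative sketch only: the names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a base minority sample at random (a simplification).
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between x and the neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three minority-class points in two dimensions.
toy_minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
toy_synthetic = smote(toy_minority, n_synthetic=4, k=2)
```

In a workflow like the one described, the synthetic points would be appended to the training data before fitting the logistic regression, and the AUC of the fitted model would then be compared with that of a model fitted on the original data. Because each synthetic point is a convex combination of two existing minority samples, it always stays within the range spanned by the observed minority class.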