Achievements
2024 · Competition
4th Place Final — Data Analytics FINDIT!
Gadjah Mada University · National
Placed 4th in the Finals of the Data Analytics competition at FINDIT! 2024, organized by Gadjah Mada University — a national competition with over 200 participating teams. The task: predict customer promotion acceptance for a supermarket using demographic and behavioral data.
The problem
- Multiclass classification: predict jumlah_promosi (0–6), i.e., which of the six promotions a customer accepts, with class 0 meaning the customer accepted none.
- Dataset: 3,817 customers × 17 features — demographics (age, education, marital status, income), shopping behavior (per-category spend, purchase channel), and complaint history.
- Key challenge: missing values, ordinal categorical features, and class imbalance across 7 promotion classes.
EDA and feature engineering
- Found that income range strongly correlates with promotion acceptance, so income was discretized into 5 quantile groups (pendapatan_score) to help the model capture this pattern; the preprocessing steps in this list are sketched in code below.
- Discovered that customers born before 1930 always had jumlah_promosi = 3 — flagged for potential age-group discretization.
- Identified that jumlah_anak_balita and jumlah_anak_remaja had similar patterns relative to jumlah_promosi — merged into a single jumlah_anak feature.
- Handled missing values using MICE (Multivariate Imputation by Chained Equations) with XGBoost as the base estimator.
- Replaced impossible values ('5') in pendidikan and status_pernikahan with NaN before imputation.
- Mapped ordinal categoricals (SMP→0, SMA→1, Sarjana→2, Magister→3, Doktor→4) so the encoding preserves the natural order of education levels.
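These steps chain together into one preprocessing pass. The sketch below is a minimal, non-authoritative reconstruction: column names like pendidikan, status_pernikahan, jumlah_anak_balita, and jumlah_anak_remaja come from the write-up, while the raw income column name (pendapatan), the file name, and all imputer settings are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from xgboost import XGBRegressor

df = pd.read_csv("train.csv")  # hypothetical file name

# Impossible code '5' in these columns becomes NaN so the imputer fills it in.
for col in ["pendidikan", "status_pernikahan"]:
    df[col] = df[col].replace([5, "5"], np.nan)

# Ordinal mapping keeps education levels in their natural order.
df["pendidikan"] = df["pendidikan"].map(
    {"SMP": 0, "SMA": 1, "Sarjana": 2, "Magister": 3, "Doktor": 4}
)
# status_pernikahan would be integer-encoded similarly; its categories are
# not listed in the write-up, so that step is omitted here.

# Merge the two child-count features that behaved alike vs. jumlah_promosi.
df["jumlah_anak"] = df["jumlah_anak_balita"] + df["jumlah_anak_remaja"]

# MICE-style imputation with XGBoost as the base estimator
# (IterativeImputer is scikit-learn's chained-equations imputer).
num_cols = df.select_dtypes(include="number").columns
imputer = IterativeImputer(estimator=XGBRegressor(n_estimators=100), random_state=42)
df[num_cols] = imputer.fit_transform(df[num_cols])
df["pendidikan"] = df["pendidikan"].round().clip(0, 4)  # snap back to valid codes

# Income discretized into 5 quantile groups -> pendapatan_score.
df["pendapatan_score"] = pd.qcut(df["pendapatan"], q=5, labels=False)
```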
Modeling
- Tested multiple treatments: baseline, income/age discretization, jumlah_anak feature creation, and Spearman-correlation-based feature clustering (a sketch of the clustering recipe follows this list).
- Best single treatment: discretization (Macro F1: 0.761 ± 0.026 vs baseline 0.748).
- Final model: a stacking ensemble with CatBoost, XGBoost, LightGBM, Hist Gradient Boosting, Random Forest, and Extra Trees as base estimators and Logistic Regression as the meta-learner (sketched after this list).
- Stacking outperformed all individual models: Macro F1 0.782 ± 0.009, Accuracy 0.779 ± 0.009.
- Evaluated using Stratified K-Fold cross-validation to preserve class proportions across folds.
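The write-up does not detail the Spearman-based treatment, so the sketch below follows the common recipe of hierarchical clustering of features on a 1 − |ρ| distance; the linkage method and the 0.5 cut threshold are hypothetical choices, not the team's.

```python
import pandas as pd
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

features = df.drop(columns=["jumlah_promosi"])  # df from the sketch above
corr = features.corr(method="spearman")         # rank-correlation matrix
dist = 1.0 - corr.abs()                         # similar features -> small distance
condensed = squareform(dist.values, checks=False)
linkage = hierarchy.linkage(condensed, method="average")
cluster_ids = hierarchy.fcluster(linkage, t=0.5, criterion="distance")
clusters = pd.Series(cluster_ids, index=features.columns)
print(clusters.sort_values())  # features grouped by correlation cluster
```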
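A minimal sketch of the final ensemble and its evaluation, assuming scikit-learn's StackingClassifier: the base-estimator line-up, the Logistic Regression meta-learner, and the Stratified K-Fold / Macro F1 evaluation follow the write-up, while every hyperparameter and random seed is an assumption.

```python
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import (
    ExtraTreesClassifier,
    HistGradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

# df comes from the preprocessing sketch above.
X = df.drop(columns=["jumlah_promosi"])
y = df["jumlah_promosi"]

stack = StackingClassifier(
    estimators=[
        ("catboost", CatBoostClassifier(verbose=0, random_state=42)),
        ("xgboost", XGBClassifier(random_state=42)),
        ("lightgbm", LGBMClassifier(random_state=42)),
        ("hgb", HistGradientBoostingClassifier(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("et", ExtraTreesClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold base predictions train the meta-learner
)

# Stratified folds keep the 7 imbalanced promotion classes proportioned;
# Macro F1 weights every class equally, which matters for the rare ones.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(stack, X, y, cv=cv, scoring="f1_macro")
print(f"Macro F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Note that cv=5 inside StackingClassifier means each base model contributes out-of-fold predictions, not training-set fits, as features for the meta-learner; that is what keeps the stack from overfitting to its own base models.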
Results
- 4th place out of 200+ teams as team LabtekV.
- Stacking was the best model overall — outperforming soft voting (0.769), Random Forest (0.768), CatBoost (0.754), and XGBoost (0.734).
- Main error pattern: most misclassifications involved label 0 (customers who never accepted any promotion), as expected for the dominant class.