A Comparison of Multiple Imputation Methods in Multiple Imputation by Chained Equations for Longitudinal Continuous Data: A Simulation Study

Tuncay YANARATEŞ; Erdem KARABULUT

doi:10.5336/biostatic.2025-108815

Turkiye Klinikleri Journal of Biostatistics

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Tuncay YANARATEŞ^a, Erdem KARABULUT^a
^aHacettepe University Faculty of Medicine, Department of Biostatistics, Ankara, Türkiye

Turkiye Klinikleri J Biostat. 2025;17(2):72-80

Article Language: EN

ABSTRACT
Objective: Missing data are a common problem in longitudinal studies, and imputation is one way to address it. Multiple imputation is preferred to single imputation because it accounts for the uncertainty around the true value and yields nearly unbiased estimates. This study compares 5 multiple imputation methods within multiple imputation by chained equations (MICE) for longitudinal continuous data. Material and Methods: We evaluated the 5 methods on data generated from a multivariate distribution in the R programming language. We deleted 10%, 20%, and 30% of the complete data under the missing completely at random (MCAR) and missing at random (MAR) mechanisms and ran 1,000 replications. The evaluation criterion was the root mean squared error (RMSE). Results: When the correlation between time points was weak, MICE-random forest (MICE-RF) gave the least biased results. When the correlation between time points was strong, MICE-predictive mean matching (MICE-PMM) and MICE using linear regression with bootstrap (MICE-BOOT) gave the least biased results. MICE-classification and regression trees and MICE using Bayesian linear regression gave the most biased results. Conclusion: MICE-RF, MICE-PMM, and MICE-BOOT can be used for longitudinal continuous data with missing observations. Researchers can also generate data from different multivariate distributions in simulation studies to determine the optimal method.

Keywords: Longitudinal data; missing data; missing completely at random; multiple imputation
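
The simulation design described in the abstract (generate complete longitudinal data, delete a fixed share of values, impute, score with root mean squared error) can be illustrated with a minimal sketch. This is not the authors' R code. Several details are assumptions made only for illustration: the AR(1)-style correlation between time points, the sample size of 200 subjects and 4 time points, the 50 replications, and scoring RMSE on the deleted cells (the abstract does not say whether RMSE was computed on imputed values or on parameter estimates). The column-mean fill is a hypothetical single-imputation stand-in, not one of the five MICE methods compared in the study, and only the missing completely at random mechanism is shown.

```python
import math
import random


def simulate_subject(n_times, rho, rng):
    """One subject's trajectory: unit-variance normals, adjacent correlation rho (assumed AR(1))."""
    values = [rng.gauss(0.0, 1.0)]
    for _ in range(n_times - 1):
        noise = rng.gauss(0.0, 1.0)
        values.append(rho * values[-1] + math.sqrt(1.0 - rho ** 2) * noise)
    return values


def delete_mcar(data, rate, rng):
    """Copy of data with round(rate * cells) entries set to None, chosen uniformly at random."""
    cells = [(i, j) for i, row in enumerate(data) for j in range(len(row))]
    missing = set(rng.sample(cells, round(rate * len(cells))))
    return [[None if (i, j) in missing else v for j, v in enumerate(row)]
            for i, row in enumerate(data)]


def impute_column_mean(data):
    """Stand-in imputer (NOT a MICE method): fill each gap with the observed mean of its time point."""
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        # Fall back to 0.0 only if a time point has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def rmse(complete, incomplete, imputed):
    """Root mean squared error between true and imputed values over the deleted cells."""
    squared = [(imputed[i][j] - complete[i][j]) ** 2
               for i, row in enumerate(incomplete)
               for j, v in enumerate(row) if v is None]
    if not squared:
        raise ValueError("no deleted cells to score")
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(2025)  # fixed seed so the sketch is reproducible
for rate in (0.10, 0.20, 0.30):
    scores = []
    for _ in range(50):  # the study used 1,000 replications; 50 keeps the sketch fast
        complete = [simulate_subject(4, 0.8, rng) for _ in range(200)]
        incomplete = delete_mcar(complete, rate, rng)
        scores.append(rmse(complete, incomplete, impute_column_mean(incomplete)))
    print(f"MCAR {rate:.0%}: mean RMSE = {sum(scores) / len(scores):.3f}")
```

To compare real methods, `impute_column_mean` would be replaced by a multiple imputation routine. In R, the `mice` package offers methods named `rf`, `pmm`, `norm.boot`, `cart`, and `norm`, which plausibly correspond to the five methods named in the abstract, although the abstract does not confirm the exact settings used.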
