A Comparison of Penalized Regression Methods on Model Estimation and Variable Selection: A Simulation Study

Ali Türker ÇİFTÇİ^a , Didem DERİCİ YILDIRIM^b , Damla Hazal SUCU^b
^aNiğde Ömer Halisdemir University Faculty of Medicine, Department of Biostatistics and Medical Informatics, Niğde, Türkiye
^bMersin University Faculty of Medicine, Department of Biostatistics and Medical Informatics, Mersin, Türkiye

Turkiye Klinikleri J Biostat. 2025;17(1):1-15

doi: 10.5336/biostatic.2024-106402

Article Language: EN

Free Access

ABSTRACT
Objective: This study aims to determine which variables the Least Angle Regression (LAR), Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic Net (EN) regression methods select for inclusion in the model, and to examine how well these methods, which are biased alternatives to unbiased methods, estimate the model. Material and Methods: The variables selected by the LAR, LASSO and EN regression methods and their model prediction success were compared. For this purpose, data sets were simulated in R from the standard normal distribution under different scenarios: sample sizes n=50, n=100 and n=200; numbers of independent variables p=16, p=18 and p=20; and correlation coefficients r=0.10, r=0.60 and r=0.90. The results obtained for each scenario were recorded. Results: The model predictions of the LAR and LASSO methods were close to each other, whereas those of the EN regression method differed. In terms of Mean Square Error (MSE) and coefficient of determination, the methods yielded similar values. Conclusion: Model prediction success was high in data sets with small sample sizes; as the sample size increased, prediction success decreased and MSE values increased. These methods were therefore observed to be more useful, and to provide better model prediction, when the data sets have a multicollinearity problem and the sample size is small.
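The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the study was run in R, whereas this example uses plain Python, and it fits LASSO and EN by coordinate descent (LAR is omitted). The true coefficient vector, the fixed penalty lam=0.1, the EN mixing weight alpha=0.5, the unit noise variance, the absence of an intercept and the use of a single replicate with in-sample MSE and R2 are all assumptions made for illustration; the abstract does not state them.

```python
import math
import random


def simulate(n, p, r, beta, rng):
    """Draw n rows of p standard-normal predictors with pairwise correlation r."""
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor that induces the correlation
        row = [math.sqrt(r) * shared + math.sqrt(1.0 - r) * rng.gauss(0.0, 1.0)
               for _ in range(p)]
        X.append(row)
        # Response: linear signal plus N(0, 1) noise (noise level is an assumption).
        y.append(sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, 1.0))
    return X, y


def soft_threshold(z, g):
    """Shrink z toward zero by g; values within [-g, g] become zero."""
    return math.copysign(max(abs(z) - g, 0.0), z)


def elastic_net(X, y, lam, alpha, max_sweeps=300, tol=1e-6):
    """Coordinate descent (no intercept) for
    (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    alpha = 1 is the LASSO penalty; 0 < alpha < 1 is the elastic net."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_ss = [sum(v * v for v in c) / n for c in cols]
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    for _ in range(max_sweeps):
        max_step = 0.0
        for j, c in enumerate(cols):
            # Correlation of column j with the partial residual (b_j removed).
            rho = sum(ci * ri for ci, ri in zip(c, resid)) / n + col_ss[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1.0 - alpha))
            step = new - b[j]
            if step != 0.0:
                resid = [ri - ci * step for ci, ri in zip(c, resid)]
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:  # stop once a full sweep barely changes b
            break
    return b


def mse_r2(X, y, b):
    """In-sample mean square error and coefficient of determination."""
    pred = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
    ybar = sum(y) / len(y)
    var_y = sum((yi - ybar) ** 2 for yi in y) / len(y)
    return mse, 1.0 - mse / var_y


def run_grid(p=16, lam=0.1, seed=1):
    """One replicate per (n, r) scenario from the abstract, for a single p."""
    rng = random.Random(seed)
    beta = [3.0, 1.5, 2.0, 1.0] + [0.0] * (p - 4)  # hypothetical true model
    rows = []
    for n in (50, 100, 200):
        for r in (0.10, 0.60, 0.90):
            X, y = simulate(n, p, r, beta, rng)
            for name, alpha in (("LASSO", 1.0), ("EN", 0.5)):
                b = elastic_net(X, y, lam, alpha)
                mse, r2 = mse_r2(X, y, b)
                selected = sum(v != 0.0 for v in b)  # variables kept in the model
                rows.append((n, r, name, selected, mse, r2))
    return rows


if __name__ == "__main__":
    for n, r, name, k, mse, r2 in run_grid():
        print(f"n={n:3d} r={r:.2f} {name:5s} selected={k:2d} MSE={mse:.3f} R2={r2:.3f}")
```

Calling run_grid(p=18) or run_grid(p=20) covers the other predictor counts named in the abstract. A fuller replication would repeat each scenario many times, choose the penalty by cross-validation, and add a LAR fit, none of which this sketch attempts.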

Keywords: Penalized regression; variable selection; least angle regression; least absolute shrinkage selection operator; elastic net regression


