Turkiye Klinikleri Journal of Biostatistics

.: ORIGINAL RESEARCH
Sınıf Dengesizliği Varlığında Hastalık Tanısı için Kolektif Öğrenme Yöntemlerinin Karşılaştırılması: Diyabet Tanısı Örneği
Comparison of Ensemble Learning Methods for Disease Diagnosis in the Presence of Class Imbalance: The Case of Diabetes
Sultan TURHAN (a), Yüksel ÖZKAN (b), B. Sarer YÜREKLİ (c), Aslı SUNER (b), Eralp DOĞU (a)
(a) Muğla Sıtkı Koçman University, Faculty of Science, Department of Statistics, Muğla, TURKEY
(b) Ege University, Faculty of Medicine, Department of Biostatistics and Medical Informatics, İzmir, TURKEY
(c) Ege University, Faculty of Medicine, Division of Endocrinology, İzmir, TURKEY
Turkiye Klinikleri J Biostat. 2020;12(1):16-26
doi: 10.5336/biostatic.2019-66816
Article Language: TR
ABSTRACT
Objective: Machine learning methods are now widely used in disease diagnosis. However, because health data are large in volume, high-dimensional and complex, class imbalance is often encountered, and applying these methods directly in that situation degrades performance. In this study, several resampling methods were used to address the imbalance in an unbalanced data set of diabetes patients and were integrated into ensemble learning algorithms, and the resulting classification performances were compared on the diagnosis of diabetes.
Material and Methods: The data were obtained from 185 patients older than 18 years who were admitted to the Endocrinology and Metabolism Diseases outpatient clinic of Izmir Bozkaya Training and Research Hospital between June and September 2013. Under-sampling, over-sampling and the synthetic minority over-sampling technique (SMOTE) were used to address the class imbalance problem in classifying the diagnosis of diabetes. Their effects on classification performance were compared by integrating them into bagging-based and boosting-based ensemble learning methods. Accuracy, the Kappa statistic, sensitivity and specificity were used to compare the classification performance of the algorithms. All statistical analyses were performed in R, an open-source programming language.
Results: On the unbalanced data set, classification of the diabetes diagnosis from the raw data performed very poorly. Classifications made with the over-sampling method were found to have much greater predictive power than those made with the original unbalanced data set, under-sampling or SMOTE.
Conclusion: In the presence of class imbalance, it is recommended to balance the data with resampling methods before applying classification algorithms.

Keywords: Ensemble learning; classification; unbalanced data; disease diagnosis; diabetes
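
The abstract names the resampling strategies and evaluation measures but gives no code, and the study itself was carried out in R. The following is only an illustrative Python sketch, under stated assumptions, of random over-sampling, random under-sampling and the four reported measures (accuracy, Kappa, sensitivity, specificity) for a binary diagnosis. The function names and the toy data are hypothetical and are not taken from the study. SMOTE is not shown; it differs in that it interpolates new minority cases between neighbouring minority cases instead of duplicating existing ones.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def binary_metrics(actual, predicted, positive=1):
    """Accuracy, sensitivity, specificity and Cohen's kappa.

    Assumes `actual` contains at least one positive and one negative case.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    n = len(pairs)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal class frequencies.
    expected = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (accuracy - expected) / (1 - expected),
    }


# Toy data (hypothetical): 8 negative cases and 2 positive cases.
labels = [0] * 8 + [1] * 2
rows = [[i] for i in range(10)]

_, over_labels = random_oversample(rows, labels)
_, under_labels = random_undersample(rows, labels)
print(Counter(over_labels), Counter(under_labels))

# A classifier that always predicts the majority class.
print(binary_metrics(labels, [0] * 10))
```

On the toy data, a classifier that always predicts the majority class reaches an accuracy of 0.8 while its sensitivity and Kappa are both zero, which illustrates why measures beyond accuracy are reported when classes are imbalanced. In practice, resampling would be applied to the training portion only, so that evaluation data keep the original class distribution.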