Turkiye Klinikleri Journal of Biostatistics

ORIGINAL RESEARCH
A New Class-Weighting Formulation for the Class Imbalance Problem: A Methodological Research
Batuhan BAKIRARAR (a), Selen YILMAZ IŞIKHAN (b)
(a) Department of Biostatistics, Ankara University Faculty of Medicine, Ankara, Türkiye
(b) Department of Economics and Administrative Programs, Hacettepe University Vocational School of Social Sciences, Ankara, Türkiye
Turkiye Klinikleri J Biostat. 2023;15(2):79-90
doi: 10.5336/biostatic.2023-96293
Article Language: EN
ABSTRACT
Objective: Many machine learning classification algorithms are not robust to imbalanced classes and produce inaccurate, biased models. One way to address class imbalance is to assign weights to the classes. This article proposes a new class-weighting approach to improve classification when the two classes are imbalanced.

Material and Methods: The performance of the new formulation was compared with that of the previously proposed Inverse of Square Root of Number of Samples formula, the effective number of samples weighting formula and unweighted Random Forest. A simulation study with 1,000 repetitions was performed for 3 imbalance rates (0.10, 0.20, 0.30), 6 sample sizes (250, 300, 350, 400, 450, 500) and 4 methods. The methods were also applied to a lung cancer dataset with 39 samples in the minority group and 270 samples in the majority group.

Results: The experiments showed that our proposed weighting formula, least number of ratio and range multiplier, performed as well as or better than Inverse of Square Root of Number of Samples in both the simulations and the real data. In general, the minority-class accuracy and balanced accuracy of our formulation were either very close to or higher than those of Inverse of Square Root of Number of Samples.

Conclusion: The new formulation kept the accuracy estimates of the 2 classes balanced at every sample size and imbalance rate. Additionally, as the sample size increased from 250 to 500, the weights obtained for the patient and control groups decreased consistently.

Keywords: Class imbalance; class-weighting methods; classification; Random Forest algorithm
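The two existing weighting formulas used as comparators can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the authors' code: it takes the Inverse of Square Root of Number of Samples weight as w_c ∝ 1/√n_c and the effective number of samples weight (Cui et al.) as w_c ∝ 1/E_c with E_c = (1 − β^n_c)/(1 − β). The normalization (weights summing to the number of classes), the value β = 0.999 and all function names are illustrative assumptions. The proposed least number of ratio and range multiplier formula is not defined in the abstract, so it is not reproduced here. The class counts are those of the lung cancer dataset (39 minority, 270 majority).

```python
import math


def normalize(raw):
    """Rescale weights so they sum to the number of classes (assumed convention)."""
    total = sum(raw.values())
    k = len(raw)
    return {c: k * w / total for c, w in raw.items()}


def isns_weights(counts):
    """Inverse of Square Root of Number of Samples: w_c proportional to 1 / sqrt(n_c)."""
    raw = {c: 1.0 / math.sqrt(n) for c, n in counts.items()}
    return normalize(raw)


def effective_number_weights(counts, beta=0.999):
    """Effective number of samples: E_c = (1 - beta**n_c) / (1 - beta), w_c proportional to 1 / E_c.

    beta must lie in [0, 1); beta = 0 gives equal weights, beta -> 1 approaches
    inverse-frequency weighting.
    """
    raw = {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()}
    return normalize(raw)


# Class sizes of the lung cancer dataset reported in the abstract.
counts = {"minority": 39, "majority": 270}

for name, fn in [("ISNS", isns_weights), ("Effective number", effective_number_weights)]:
    w = fn(counts)
    print(f"{name}: minority={w['minority']:.3f}, majority={w['majority']:.3f}")
```

Either dictionary can be passed, after mapping its keys to the model's class labels, to a weighted Random Forest such as scikit-learn's `RandomForestClassifier(class_weight=...)`; which implementation the study itself used is not stated here.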
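The results are reported as minority-class accuracy and balanced accuracy rather than overall accuracy. A minimal sketch, assuming the usual definitions (minority-class accuracy as the share of minority samples classified correctly, balanced accuracy as the mean of the two per-class accuracies), shows why: at an imbalance rate of 0.10, read here as the minority-class proportion, a classifier that always predicts the majority class reaches 90% overall accuracy but only 50% balanced accuracy. The labels and function names are hypothetical, not study data or study code.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred, label):
    """Fraction of samples whose true class is `label` that were predicted as `label`."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies (recall of each class)."""
    labels = sorted(set(y_true))
    return sum(per_class_accuracy(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical sample: n = 250 at imbalance rate 0.10 -> 25 minority (1), 225 majority (0).
y_true = [1] * 25 + [0] * 225
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 250

print(overall_accuracy(y_true, y_pred))       # 0.9
print(per_class_accuracy(y_true, y_pred, 1))  # 0.0
print(balanced_accuracy(y_true, y_pred))      # 0.5
```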