Turkiye Klinikleri Journal of Biostatistics

Data Mining Genome-Based Algorithm for Optimal Gene Selection and Prediction of Colorectal Carcinoma
Alabi Waheed BANJOKO
Department of Statistics, University of Ilorin, Ilorin, Kwara State, NIGERIA
Turkiye Klinikleri J Biostat. 2020;12(3):261-71
doi: 10.5336/biostatic.2020-77341
Article Language: EN
Objective: This study presents a method for selecting optimal gene subsets to improve the non-clinical diagnostic classification and prediction of colorectal cancer from gene expression profiles obtained with an Affymetrix oligonucleotide array. Material and Method: A hybrid multi-objective support vector machine (SVM) feature selection and classification algorithm was employed to determine the biomarker gene subsets that are statistically and clinically most relevant to the 62 responses (tumour or normal) associated with the gene expression levels. Gene selection was done in two stages: the first stage used the Bayesian t-test to prune non-informative genes, and the second stage employed a multi-objective optimization method that adds genes sequentially to determine the optimal subset of the pre-selected genes. The SVM with RBF kernel (SVM_RBF) was fitted sequentially to select the set of near-optimal genes that are correlated with the response class. Results: The optimally selected gene subset yielded an accuracy of 90.1% on test data that were never used in building the algorithm. Furthermore, the results of principal component analysis and complete-linkage hierarchical clustering indicated near-perfect discrimination between the two clinical response groups (colorectal cancer status) of the patients. Conclusion: This work demonstrates that non-clinical diagnosis and prediction of colon cancer from patients' gene signatures in microarray expression data is feasible when appropriate data mining techniques are used.

Keywords: Support vector machines; feature selection; multi-objective optimization; principal component analysis; clustering
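
The two-stage selection described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: a Welch t statistic stands in for the Bayesian t-test, a simple RBF kernel mean-difference rule stands in for the fitted RBF SVM, the multi-objective criterion is approximated by "add a gene only while leave-one-out accuracy strictly improves", and the data are synthetic. All function names and parameter values (`filter_genes`, `forward_select`, `gamma=0.5`, `keep=8`) are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic (assumed stand-in for the Bayesian t-test)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def filter_genes(X, y, keep):
    """Stage 1: rank genes by |t| and return the indices of the top `keep`."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:keep]]


def rbf(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def predict(train, labels, x, gamma=0.5):
    """Assign x to the class with the larger mean RBF similarity.

    This kernel mean-difference rule is NOT an SVM; it only keeps the
    sketch free of third-party dependencies.
    """
    pos = [rbf(x, t, gamma) for t, lab in zip(train, labels) if lab == 1]
    neg = [rbf(x, t, gamma) for t, lab in zip(train, labels) if lab == 0]
    return 1 if mean(pos) >= mean(neg) else 0


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy using only the columns listed in `genes`."""
    Z = [[row[j] for j in genes] for row in X]
    hits = 0
    for i in range(len(Z)):
        train, labs = Z[:i] + Z[i + 1:], y[:i] + y[i + 1:]
        hits += predict(train, labs, Z[i]) == y[i]
    return hits / len(Z)


def forward_select(X, y, candidates, max_genes):
    """Stage 2: add one gene at a time while accuracy strictly improves."""
    chosen, best = [], 0.0
    while len(chosen) < max_genes:
        trials = [(loo_accuracy(X, y, chosen + [j]), j)
                  for j in candidates if j not in chosen]
        if not trials:
            break
        acc, j = max(trials)
        if acc <= best:  # no gain: prefer the smaller gene subset
            break
        chosen.append(j)
        best = acc
    return chosen, best


# Synthetic data: 40 samples, 30 genes; only genes 0 and 1 carry signal.
random.seed(0)
y = [i % 2 for i in range(40)]
X = [[random.gauss(3.0 * lab if j < 2 else 0.0, 1.0) for j in range(30)]
     for lab in y]

kept = filter_genes(X, y, keep=8)                       # stage 1
genes, acc = forward_select(X, y, kept, max_genes=4)    # stage 2
print("pre-selected:", kept)
print("selected:", genes, "LOO accuracy:", acc)
```

Stopping as soon as an added gene fails to improve accuracy is one simple way to trade classification accuracy against subset size; a real analysis would also evaluate the final subset on held-out test samples, as the abstract reports.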