A New Preprocessing Method for Diabetes and Biomedical Data Classification


  • Sarbast CHALO, Harran University, Engineering Faculty, Department of Computer Engineering, Şanlıurfa, Turkey
  • İbrahim Berkan AYDİLEK, Harran University, Engineering Faculty, Department of Computer Engineering, Şanlıurfa, Turkey




People of all ages and socioeconomic levels around the world are being diagnosed with type 2 diabetes at higher rates than ever before. The disease can be the root cause of a wide variety of conditions, most notably blindness, kidney (renal) disease, and heart disease. It is therefore important to devise a system that can reliably detect patients with diabetes from medical information. We present a method for identifying diabetes that trains a deep neural network on the dataset features using cross-validation with between five and ten folds. The Pima Indian Diabetes (PID) dataset was obtained from the UCI Machine Learning Repository. With ten-fold cross-validation, the random forest (RF) algorithm achieves an accuracy of 97.8%, a recall of 97.8%, and a precision of 97.8% on the PID dataset. This research also examines a variety of other biomedical datasets to demonstrate that machine learning can be used to build an efficient system that accurately predicts diabetes. Several machine learning classifiers, including KNN, J48, RF, and DT, were used in the experiments on these biomedical datasets. The results show that our trained model classifies biomedical data accurately, achieving accuracy, recall, and precision above 99% on the Parkinson's dataset.
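The abstract evaluates classifiers such as KNN under five- and ten-fold cross-validation and reports accuracy, recall, and precision. The sketch below illustrates that kind of evaluation protocol in plain Python, assuming a stratified fold split, min-max feature scaling fitted on each training fold only, a k-nearest-neighbour classifier with k = 5, and support-weighted averaging of precision and recall. None of these choices, nor the function names (`stratified_k_folds`, `cross_validate`, and so on), come from the paper; in particular, the min-max step is a generic placeholder and not the authors' preprocessing method, which the abstract does not describe. The data in the usage section is synthetic.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_k_folds(labels, k, seed=0):
    """Split row indices into k folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # running counter so fold sizes stay balanced across classes
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            folds[pos % k].append(i)
            pos += 1
    return folds


def min_max_fit(rows):
    """Per-feature min and max from training rows only.

    Assumption: generic placeholder scaling, NOT the paper's preprocessing method.
    """
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_apply(rows, lo, hi):
    """Scale each feature with the training min/max (constant features map to 0)."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision and recall over all classes."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = 0.0
    for c, support in Counter(y_true).items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted_c = sum(p == c for p in y_pred)
        precision += (support / n) * (tp / predicted_c if predicted_c else 0.0)
        recall += (support / n) * (tp / support)
    return accuracy, precision, recall


def cross_validate(X, y, k_folds=10, k_neighbors=5, seed=0):
    """Pool out-of-fold KNN predictions over k folds, then score them once."""
    y_true, y_pred = [], []
    for test_idx in stratified_k_folds(y, k_folds, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        lo, hi = min_max_fit([X[i] for i in train_idx])  # fit on training fold only
        train_X = min_max_apply([X[i] for i in train_idx], lo, hi)
        train_y = [y[i] for i in train_idx]
        test_X = min_max_apply([X[j] for j in test_idx], lo, hi)
        for x, i in zip(test_X, test_idx):
            y_pred.append(knn_predict(train_X, train_y, x, k_neighbors))
            y_true.append(y[i])
    return weighted_metrics(y_true, y_pred)


if __name__ == "__main__":
    # Hypothetical synthetic two-class data standing in for a real dataset.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
    X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(50)]
    y = [0] * 50 + [1] * 50
    for folds in (5, 10):
        acc, prec, rec = cross_validate(X, y, k_folds=folds)
        print(f"{folds}-fold: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Fitting the scaler inside each fold keeps test-fold statistics out of training, which is a common safeguard against optimistic cross-validation scores. Note also that support-weighted recall reduces algebraically to accuracy (each class's support cancels against the denominator of its recall), so identical accuracy and recall figures are expected whenever recall is averaged this way.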





How to Cite

CHALO, S., & AYDİLEK, İ. B. (2023). A New Preprocessing Method for Diabetes and Biomedical Data Classification. Qubahan Academic Journal, 2(4), 6–18. https://doi.org/10.48161/qaj.v2n4a135