Lung cancer Prediction and Classification based on Correlation Selection method Using Machine Learning Techniques


  • Dakhaz Mustafa Abdullah, Duhok Polytechnic University
  • Adnan Mohsin Abdulazeez, Presidency of Duhok Polytechnic University, Duhok Polytechnic University, Duhok, Iraq
  • Amira Bibo Sallow, College of Engineering, Nawroz University, Duhok, Iraq



Lung cancer is one of the leading causes of mortality in every country, affecting both men and women. Its prognosis is poor, resulting in a high death rate. The computing sector is becoming fully automated, and the medical industry is likewise automating its work with the aid of image recognition and data analytics. This paper examines the accuracy of three classifiers, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Convolutional Neural Network (CNN), in classifying lung cancer at an early stage so that many lives can be saved. The datasets used in this study are taken from the UCI repository and describe patients affected by lung cancer. The main aim of the paper is to analyze the accuracy of these classification algorithms using the WEKA tool. The experimental results show that SVM gives the best result with an accuracy of 95.56%, followed by CNN with 92.11% and KNN with 88.40%.
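As a rough illustration of the workflow the abstract describes (correlation-based attribute selection followed by a classifier whose accuracy is measured), the following minimal Python sketch may help. It is not the authors' WEKA pipeline: the data are small synthetic values invented for the example, the selection step is assumed to be a ranking by absolute Pearson correlation with the class label, only KNN is shown, and every function name is hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def rank_attributes(rows, labels):
    """Attribute indices sorted by |correlation with the label|, strongest first."""
    n_attrs = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_attrs)]
    return sorted(range(n_attrs), key=lambda j: scores[j], reverse=True)

def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    dists = sorted((math.dist(r, query), y) for r, y in zip(train_rows, train_labels))
    votes = [y for _, y in dists[:k]]
    return max(set(votes), key=votes.count)

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Synthetic toy data (an assumption, NOT the UCI lung cancer data):
# attributes 0 and 2 track the class label, attribute 1 is noise.
train_rows = [
    [1.0, 5.0, 0.2], [1.2, 1.0, 0.1], [0.9, 3.0, 0.3],
    [3.0, 4.0, 0.8], [3.2, 2.0, 0.9], [2.9, 3.0, 0.7],
]
train_labels = [0, 0, 0, 1, 1, 1]
test_rows = [[1.1, 4.0, 0.2], [3.1, 1.0, 0.8], [1.0, 2.0, 0.25], [3.0, 5.0, 0.75]]
test_labels = [0, 1, 0, 1]

# Keep the two attributes most correlated with the class label.
ranking = rank_attributes(train_rows, train_labels)
selected = ranking[:2]

def project(rows):
    """Restrict each row to the selected attributes."""
    return [[r[j] for j in selected] for r in rows]

predictions = [knn_predict(project(train_rows), train_labels, q)
               for q in project(test_rows)]
acc = accuracy(predictions, test_labels)
print("attribute ranking:", ranking)
print("KNN accuracy on the toy test set:", acc)
```

The same accuracy function could be applied to the predictions of any other classifier, which is how a side-by-side comparison such as the one reported above would be tabulated.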







How to Cite

Mustafa Abdullah, D., Mohsin Abdulazeez, A., & Bibo Sallow, A. (2021). Lung cancer Prediction and Classification based on Correlation Selection method Using Machine Learning Techniques. Qubahan Academic Journal, 1(2), 141–149.



