Effective Analysis of Air Pollution Using Decision Tree, Naive Bayes and ZeroR Classifiers
Abstract
Air is essential to the survival of living organisms on Earth. Over the last twenty years, technological advancement has negatively affected the atmosphere, polluting the air. This explains why many researchers place value on accurately forecasting pollution levels in the air. Effective air quality management also depends greatly on accurate air quality prediction. Recently, machine learning techniques have been widely used in knowledge discovery. This study therefore analyzed the effectiveness of three machine learning techniques on an air pollution dataset downloaded from the Kaggle repository. The dataset consists of 15 attributes and 29532 instances, which were divided into 70% for training and 30% for testing. The evaluation metrics were classification accuracy, error rate, execution time, mean absolute error (MAE), root mean squared error (RMSE), confusion matrix and area under the curve (AUC). The simulation was done using the WEKA statistical tool. The results showed error rates of 0%, 0.7% and 64.5% for Decision Tree, Naive Bayes and ZeroR respectively. The corresponding AUC values were 1, 0.992 and 0.499; the MAE values were 0, 0.0034 and 0.2457; the RMSE values were 0, 0.0427 and 0.3505; and the Kappa statistic values were 1, 0.9903 and 0. Based on the analysis, the study concluded that the decision tree algorithm recorded the highest prediction accuracy, followed by Naive Bayes and ZeroR, on the dataset used. The study therefore recommends that the performance of other classification algorithms be tested on the dataset.
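To illustrate why ZeroR works as a baseline, the following minimal Python sketch predicts the majority class for every test instance and then computes the error rate and the Kappa statistic from the confusion matrix. It is not the study's WEKA workflow. The class names ("Good", "Moderate", "Poor"), the toy label list and all function names are hypothetical, chosen only for illustration, and the split is a plain 70/30 slice rather than whatever procedure the authors used.

```python
from collections import Counter

# Hypothetical toy labels (NOT the Kaggle data used in the study).
labels = ["Moderate", "Good", "Moderate", "Poor", "Moderate",
          "Good", "Moderate", "Poor", "Moderate", "Good"] * 2

# Plain 70/30 split by position (an assumption; no shuffling).
split = len(labels) * 7 // 10
train_labels, test_labels = labels[:split], labels[split:]


def zero_r_fit(y_train):
    """ZeroR ignores all attributes and returns the most frequent class."""
    return Counter(y_train).most_common(1)[0][0]


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Kappa = (observed - expected agreement) / (1 - expected agreement)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    row_sums = [sum(matrix[i]) for i in range(n)]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    if expected == 1.0:  # degenerate case: only one class present
        return 0.0
    return (observed - expected) / (1.0 - expected)


classes = sorted(set(labels))
majority = zero_r_fit(train_labels)
predictions = [majority] * len(test_labels)  # same answer for every instance

matrix = confusion_matrix(test_labels, predictions, classes)
error_rate = 1.0 - accuracy(matrix)
kappa = cohen_kappa(matrix)

print("majority class:", majority)
print("confusion matrix:", matrix)
print("error rate:", error_rate)
print("kappa:", kappa)
```

Because every prediction falls in a single column of the confusion matrix, the observed agreement equals the agreement expected by chance, so Kappa comes out as 0 whatever the class distribution. This matches the Kappa of 0 reported for ZeroR.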