OBJECTIVE: The objective of this study is to present the concrete impact of air pollution and tobacco use on lung disease by applying a data engineering approach to acquired datasets.
MATERIAL AND METHODS: To demonstrate the relationship of air pollution and tobacco use with lung diseases, various relevant datasets are acquired. These datasets cover not only Turkey but also the worldwide situation. They describe population, industrial growth, number of motor vehicles, forest area size, tobacco use rate, air pollution, and the number of deaths due to asthma, lung disease, tobacco use, and air pollution. In total, 10 different datasets are gathered to support our objective. To achieve this objective with the acquired materials, a data engineering approach is adopted. From a data engineering point of view, each dataset represents a variable in the calculation. With the data engineering techniques used in this study, the existing relations between these variables are clearly stated, and a cause-consequence matching is achieved as well. Covariance and correlation analyses are performed on the datasets, and multiple linear regression is used for forecasting.
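The abstract does not state which tools or data layout were used, so the following is only a minimal, dependency-free sketch of the three named techniques (covariance, Pearson correlation, and multiple linear regression via the normal equations). All function names, variable names, and numbers are hypothetical; the data are synthetic placeholders constructed for illustration and are not taken from the study's datasets.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_multiple_linear(rows, ys):
    """Ordinary least squares via (X^T X) beta = X^T y.

    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

# SYNTHETIC placeholder data (hypothetical, NOT from the study).
tobacco_rate = [20, 22, 25, 27, 30, 33]      # e.g. % of population
pollution_index = [40, 38, 45, 50, 47, 55]   # arbitrary units
# Response built as an exact linear function so the fit can be checked.
deaths = [2 + 3 * t + 0.5 * p for t, p in zip(tobacco_rate, pollution_index)]

cov_td = covariance(tobacco_rate, deaths)
r_td = correlation(tobacco_rate, deaths)
beta = fit_multiple_linear(list(zip(tobacco_rate, pollution_index)), deaths)
forecast = predict(beta, (35, 60))  # forecast for a hypothetical new observation

print(f"covariance={cov_td:.2f}, correlation={r_td:.3f}")
print("coefficients:", [round(b, 4) for b in beta])
print(f"forecast={forecast:.2f}")
```

In a setup like this, each dataset plays the role of one list (one variable), which matches the description above of each dataset representing a variable in the calculation. A library such as NumPy or pandas would normally replace the hand-written linear algebra; it is written out here only to keep the sketch self-contained.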
RESULTS: Relations between the various datasets are explored, and the results are divided into 3 clusters based on these relations. Among the explored relations, the most significant is found between the tobacco use rate and death rates. This relation is measured at around 93-94%, which can be considered a high risk.
CONCLUSION: The results show the concrete impact of deforestation on air pollution and indicate that increased tobacco use, especially at early ages, causes lung disease worldwide. These results serve as a global warning on several fronts: maintaining forest area size to balance air quality, regulating the number of motor vehicles, and restricting tobacco sales to young people are all highly required.
Cite this article as: Pinarer O. Analyzing the relationship between air pollution, tobacco use with lung diseases via data engineering approach.