PhenomDetect: Detection of Air Hazards in the U.S. - SVM, Random Forest, Gradient Boost, XGBoost, KNN, LSTM, GRU, Tableau
This project was a team effort by Team ‘Vesper’, which participated in the NASA International Space Apps Challenge 2020. We took on the automatic air hazard detection challenge because we were inspired by the idea of building a tool that could potentially save many lives by automatically analyzing data from a variety of sources and putting that analysis into the hands of key decision-makers and the general public.
Our approach to solving the problem involved investigating several machine learning models to automate the detection of hazards (e.g., air hazards), building a dashboard to visualize the detections, and incorporating ancillary data to show the scope and impact of the detected hazards.
To develop the machine learning model, we used popular Python libraries such as scikit-learn and TensorFlow, which allowed us to explore the data, construct new features, and evaluate several machine learning models. The work involved the following steps:
- Exploratory data analysis was performed.
- Two new features (wind_direction and wind_speed) were engineered.
- Regional demographic and environmental data (population density, number of COPD patients, forest percentage, number of registered vehicles) were also integrated as features.
- A correlation heatmap was generated to investigate inter-feature correlation.
- Several models were built, including Linear Regression, SVM, Random Forest, Gradient Boosting Regression, XGBoost Regression, K-Nearest Neighbour, a Bidirectional LSTM network, a Bidirectional GRU network, a Multilayer Perceptron network, and a 1-D Convolutional network.
- The highest performance was achieved by Random Forest (R² = 0.56).
- A dashboard was created to visualize the best model’s predicted air quality index (AQI) for 2019.
- Analyzing the predicted data uncovered factors related to the health impacts (e.g., chronic obstructive pulmonary disease) and environmental issues (e.g., number of vehicles) associated with air hazards.
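The feature-engineering and model-comparison steps above can be illustrated with a minimal sketch. It is not the team's actual code: it assumes the raw data provides eastward and northward wind components in hypothetical columns named wind_u and wind_v, and it compares only the scikit-learn regressors from the list, scoring each by R² on a held-out split. The column names, the 80/20 split, and all hyperparameters are assumptions chosen for illustration, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor


def add_wind_features(df, u_col="wind_u", v_col="wind_v"):
    """Derive wind_speed and wind_direction from u/v components.

    Assumption: u is the eastward and v the northward wind component.
    The column names are hypothetical, not taken from the project data.
    """
    out = df.copy()
    out["wind_speed"] = np.hypot(out[u_col], out[v_col])
    # Meteorological convention: the direction the wind blows FROM,
    # in degrees clockwise from north, wrapped into [0, 360).
    out["wind_direction"] = np.degrees(np.arctan2(-out[u_col], -out[v_col])) % 360.0
    return out


def compare_models(X, y, random_state=0):
    """Fit several regressors and return their test-set R², best first."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    models = {
        "Linear Regression": LinearRegression(),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=random_state),
        "Gradient Boosting": GradientBoostingRegressor(random_state=random_state),
        "K-Nearest Neighbour": KNeighborsRegressor(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return pd.Series(scores).sort_values(ascending=False)


if __name__ == "__main__":
    # Synthetic stand-in data; the real project used observed measurements.
    rng = np.random.default_rng(0)
    raw = pd.DataFrame({
        "wind_u": rng.normal(size=300),
        "wind_v": rng.normal(size=300),
        "population_density": rng.uniform(10, 1000, size=300),
    })
    features = add_wind_features(raw)
    target = 50 - 5 * features["wind_speed"] + 0.02 * features["population_density"]
    target = target + rng.normal(scale=2.0, size=300)

    # The pairwise correlation matrix is what a correlation heatmap plots.
    print(features.corr().round(2))
    print(compare_models(features, target))
```

Sorting the scores makes the best-performing model easy to read off; on the synthetic data above the ranking says nothing about the project's real results.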
This Project’s GitHub Repository