High Accuracy Seasonal Hurricane Intensity Prediction Using Outgoing Longwave Radiation Maps
Background Information: North Atlantic Hurricane Season. (n.d.). Retrieved July 5, 2021, from https://www.cpc.ncep.noaa.gov/products/outlooks/Background.html
Climate prediction center—Atlantic hurricane outlook archive. (n.d.). Retrieved March 1, 2022, from https://www.cpc.ncep.noaa.gov/products/outlooks/hurricane-archive.shtml
Comaniciu, A. (2020). A Novel Seasonal Prediction of Atlantic Hurricanes using Neural Networks. https://mercersec.org/fair/projects/project/187
Dataset: Noaa ncep-ncar cdas-1 monthly. (n.d.). Retrieved July 5, 2021, from https://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP-NCAR/.CDAS-1/.MONTHLY/
HURDAT comparison table. (n.d.). Retrieved July 5, 2021, from https://www.aoml.noaa.gov/hrd/hurdat/comparison_table.html
Hurricane damage | center for science education. (n.d.). Retrieved March 1, 2022, from https://scied.ucar.edu/learning-zone/storms/hurricane-damage
Karnauskas, K. B., & Li, L. (2016). Predicting Atlantic seasonal hurricane activity using outgoing longwave radiation over Africa: African OLR and Atlantic Hurricanes. Geophysical Research Letters, 43(13), 7152–7159. https://doi.org/10.1002/2016GL069792
sklearn.linear_model.LogisticRegression. (n.d.). Scikit-learn. Retrieved March 11, 2022, from https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
sklearn.neighbors.KNeighborsClassifier. (n.d.). Scikit-learn. Retrieved March 11, 2022, from https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
Udemy: Machine Learning A-Z: Hands on Python & R in Data Science (2019). https://www.udemy.com/course/data-science-and-machine-learning-bootcamp-with-python-and-r/
National Center for Atmospheric Research Staff (Eds.). (2014, September 5). The Climate Data Guide: Outgoing Longwave Radiation (OLR): AVHRR. Retrieved from https://climatedataguide.ucar.edu/climate-data/outgoing-longwave-radiation-olr-avhrr
Tropical cyclone climatology. (n.d.). Retrieved March 1, 2022, from https://www.nhc.noaa.gov/climo/
What are el nino and la nina? (n.d.). https://oceanservice.noaa.gov/facts/ninonina.html
USA: Expected costs of damage from hurricane winds and storm-related flooding. (n.d.). Retrieved March 1, 2022, from https://www.preventionweb.net/publication/usa-expected-costs-damage-hurricane-winds-and-storm-related-flooding
Visualizing your convolutional neural network predictions with saliency maps. (2019, June 21). Open Data Science. https://odsc.medium.com/visualizing-your-convolutional-neural-network-predictions-with-saliency-maps-9604eb03d766
What is the relation between Logistic Regression and Neural Networks and when to use which? (2022, March 4). Dr. Sebastian Raschka. https://sebastianraschka.com/faq/docs/logisticregr-neuralnet.html
Additional Project Information
Question being addressed:
Will a Machine Learning algorithm with Outgoing Longwave Radiation (OLR) Maps as input result in a high accuracy seasonal hurricane prediction with three intensity classes: high, medium, and low intensity?
Every year, the hurricane season impacts millions of people in the United States, creating billions of dollars in property damage and severe disruptions in people’s lives. The ability to accurately predict the strength of the upcoming hurricane season ahead of time is critical to allow the authorities to prepare for potentially devastating events.
This project is a continuation of the project I presented in 2020, in which I used neural networks and Sea Surface Temperature (SST) data to predict the upcoming hurricane season as high intensity or medium-low intensity (Comaniciu, 2020). The SST-based predictor achieved 83% accuracy (on average) for two classes, which was a significant improvement over the 65% accuracy of NOAA’s equivalent two-class prediction for the past 20 years.
NOAA’s prediction classifies the upcoming hurricane season using 3 classes: high intensity, medium intensity, and low intensity. However, using SST data as input, my neural network was not able to accurately distinguish between 3 classes, and achieved accuracy just over random chance. My conclusion was that the SST data lacked enough information. In this project, I will try to implement predictors for 3 classes (high, medium, and low) using Outgoing Longwave Radiation (OLR) data as input instead of SST data. The National Oceanic and Atmospheric Administration (NOAA) interprets the corresponding number of storms, hurricanes, and major hurricanes based on the 3 intensity classes it predicts, so having the same 3-class prediction is important because it offers a better understanding of the upcoming hurricane season’s outlook (“Background Information: North Atlantic Hurricane Season,” n.d.).
- Download and preprocess the OLR data maps.
- Create ACE class labels.
- Generate training and test sets.
- Train the machine learning algorithms.
- Use Machine Learning algorithms to predict classes for the test set.
- Determine the accuracy of prediction and compare the different predictors.
- Analyze and interpret the results.
The OLR maps will be downloaded from NOAA NCEP-NCAR CDAS-1 MONTHLY, using the variable name “Diagnostic top upward longwave flux” (“Dataset: Noaa ncep-ncar cdas-1 monthly,” n.d.).
I will create the class labels using ACE intensity values downloaded from NOAA’s website, and classify them based on the NOAA classification into high, medium, and low intensity seasons: if ACE > 111 × 10^4 kt^2, the season’s activity is considered high; if ACE < 66 × 10^4 kt^2, the season’s activity is considered low; and the season is considered to be medium intensity for values in between (“HURDAT comparison table,” n.d.).
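The labeling rule above can be sketched as a small function. This is only an illustration: the function and constant names are my own, the example ACE values are made up rather than real seasons, and the class codes (0 for high, 1 for low, 2 for medium) follow the encoding used in the experimental design.

```python
# ACE thresholds in units of 10^4 kt^2, as stated in the text.
HIGH_THRESHOLD = 111.0
LOW_THRESHOLD = 66.0

# Class codes used in the experimental design.
CLASS_HIGH, CLASS_LOW, CLASS_MEDIUM = 0, 1, 2


def ace_to_class(ace):
    """Map a seasonal ACE value (10^4 kt^2) to an intensity class code."""
    if ace > HIGH_THRESHOLD:
        return CLASS_HIGH
    if ace < LOW_THRESHOLD:
        return CLASS_LOW
    # Values from 66 to 111 inclusive count as medium.
    return CLASS_MEDIUM


# Illustrative ACE values only (not taken from the NOAA table).
example_ace = [150.0, 50.0, 90.0]
labels = [ace_to_class(a) for a in example_ace]
print(labels)
```

Values exactly equal to a threshold fall into the medium class, because the rule uses strict inequalities for high and low.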
Experimental Design and Data Analysis:
The machine learning algorithms will be trained using data from the month of July for each year, paired with an ACE class label (0 for high, 1 for low, 2 for medium). A July OLR-based index was shown in a previous paper to be highly correlated with the annual number of storms (Karnauskas & Li, 2016). The test data will be given to the predictor without the class labels, the predicted class will be compared with the actual class label, and the accuracy of the predictor will be computed as the percentage of correct predictions.
Similarly to my previous project, which used SST data for prediction, I plan to use a simple Convolutional Neural Network with 5 layers and 5 filters in the convolutional layer in order to predict the upcoming hurricane season’s intensity, but now the input data will be OLR data map images.
Since the available OLR data set is limited to 73 samples, I plan to use image augmentation to reduce overfitting (Udemy, 2019).
I will implement additional Machine Learning Algorithms: k-Nearest Neighbors (k-NN) and Logistic Regression and I will compare the accuracy of the predictions for the three models. k-NN is one of the simplest algorithms for classification, but it is shown to perform well for many datasets. Logistic Regression is a classic machine learning algorithm and it is often used for classification and prediction because of its very good performance for many applications. For k-NN, I plan to experiment with the number of neighbors to get the best prediction accuracy.
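To show what experimenting with the number of neighbors could look like, here is a minimal from-scratch k-NN sketch. It is not the scikit-learn KNeighborsClassifier cited in the references; the helper names are hypothetical, and the two-dimensional toy points stand in for flattened OLR maps.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Toy data: three well-separated clusters, one per class (illustration only).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (1.1, 0.9), (0.9, 1.1),
           (2.0, 0.0), (2.1, 0.1), (1.9, 0.2)]
train_y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
test_x = [(0.05, 0.1), (1.0, 0.95), (2.0, 0.1)]
test_y = [0, 1, 2]

# Try several neighbor counts and record the accuracy for each one.
accuracy_by_k = {k: knn_accuracy(train_x, train_y, test_x, test_y, k) for k in (1, 3, 5)}
print(accuracy_by_k)
```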
All the classifiers for 3 class prediction will be trained on the same training set (with image augmentation for CNN) and will be tested on the last 15 years for prediction, which is the testing method that was also used in the previous paper that studied an OLR based index for predicting the number of storms.
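The chronological split and the accuracy measure described above can be sketched as follows. The sample layout (year, features, label) and the function names are assumptions made for this illustration, and the features and labels are placeholders rather than real data.

```python
def split_last_n_years(samples, n_test=15):
    """Split (year, features, label) samples so the last n_test years form the test set."""
    ordered = sorted(samples, key=lambda s: s[0])
    return ordered[:-n_test], ordered[-n_test:]


def accuracy_percent(predicted, actual):
    """Percentage of predictions that match the actual class labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# One sample per year from 1949 to 2021 (73 samples).
# Features are None and labels are arbitrary placeholders.
samples = [(year, None, year % 3) for year in range(1949, 2022)]
train_set, test_set = split_last_n_years(samples)

print(len(train_set), len(test_set))
print(test_set[0][0], test_set[-1][0])
print(accuracy_percent([0, 1, 2, 2], [0, 1, 1, 2]))
```

Splitting by year rather than at random keeps every test season later than every training season, which matches how a forecast would be used in practice.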
Finally, I plan to use saliency maps to better understand the prediction models and to validate them using geophysical properties that are known to influence the hurricane intensity for a given season (“Visualizing your convolutional neural network,” 2019, June 21).
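As a rough illustration of the idea behind a saliency map, the sketch below estimates how strongly each pixel influences a class score. Saliency maps for neural networks are normally computed from gradients inside the deep learning framework; here I swap in a finite-difference approximation that works with any scoring function. The function names and the toy linear score are assumptions, not the project's model.

```python
import numpy as np


def saliency_map(score_fn, image, eps=1e-4):
    """Approximate |d score / d pixel| for every pixel by finite differences."""
    image = np.asarray(image, dtype=float)
    base = score_fn(image)
    saliency = np.zeros_like(image)
    for idx in np.ndindex(image.shape):
        bumped = image.copy()
        bumped[idx] += eps  # nudge one pixel and measure the change in score
        saliency[idx] = abs(score_fn(bumped) - base) / eps
    return saliency


# Toy score: a linear model in which only two pixels matter (illustration only).
weights = np.zeros((4, 4))
weights[1, 2] = 3.0
weights[3, 0] = -1.0


def toy_score(img):
    return float(np.sum(weights * img))


toy_image = np.ones((4, 4))
sal = saliency_map(toy_score, toy_image)
most_important = tuple(int(i) for i in np.unravel_index(np.argmax(sal), sal.shape))
print(most_important)
```

For a linear score the saliency equals the absolute value of the weights, so the brightest pixel is the one with the largest weight. On a real OLR map, bright regions would then be compared with regions known to influence hurricane activity.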
Materials: Macbook laptop.
Risk Assessment: There is no risk associated with this project.
Questions and Answers
1. What was the major objective of your project and what was your plan to achieve it?
The major objective of my project was to predict the upcoming hurricane season’s intensity with high accuracy by analyzing Outgoing Longwave Radiation data using Artificial Intelligence.
a. Was that goal the result of any specific situation, experience, or problem you encountered?
I wanted to predict seasonal hurricane intensity with as much accuracy as possible, because more confidence in the prediction will prompt more resource allocations and better preparation for upcoming hurricanes in the case of a high intensity season, which will save lives and billions of dollars in property damage.
b. Were you trying to solve a problem, answer a question, or test a hypothesis?
I wanted to solve the problem of designing an accurate predictor for the upcoming hurricane season.
2. What were the major tasks you had to perform in order to complete your project?
I had to perform data acquisition and processing, and I tested different machine learning algorithms: Convolutional Neural Networks (CNN), Logistic Regression classifier, and k-Nearest Neighbors (k-NN). I also tried to interpret my results using saliency maps to understand and validate which regions are important for prediction.
3. What is new or novel about your project?
The idea of using Outgoing Longwave Radiation maps as input for Machine Learning predictors is novel.
a. Is there some aspect of your project's objective, or how you achieved it that you haven't done before?
In my previous project, I used Convolutional Neural Networks (CNN) as a predictor with a different data type: Sea Surface Temperature maps. In this project, I still considered a CNN architecture, but applied it to Outgoing Longwave Radiation (OLR) maps. I also implemented two other machine learning techniques, a Logistic Regression classifier and k-Nearest Neighbors, and I compared the prediction skill of all three models.
b. Is your project's objective, or the way you implemented it, different from anything you have seen?
Using OLR maps for prediction is completely novel. My idea stemmed from a previous paper published in Geophysical Research Letters, whose authors calculated an index based on OLR data and showed it to be correlated with the number of storms expected in the upcoming season, but did not succeed in accurately predicting seasonal hurricane intensity. I hypothesized that using the entire OLR map will provide more of the information needed for an accurate prediction of the upcoming hurricane season’s intensity.
c. If you believe your work to be unique in some way, what research have you done to confirm that it is?
I conducted a literature search, and the only paper I found that uses Outgoing Longwave Radiation data calculated an index based on OLR data and did not use the entire map.
4. What was the most challenging part of completing your project?
The most challenging part of completing my project was the small amount of Outgoing Longwave Radiation data available, since only the years 1949 to 2021 were accessible. Convolutional Neural Networks learn complex models and require a large number of samples to be able to generalize well and not overfit.
a. What problems did you encounter, and how did you overcome them?
The main problem was the small amount of data available for training the neural network. A CNN is a complex model, so without enough data it overfits: it learns the training data too closely and is not able to generalize patterns. I overcame this by adding noisy images to the training set. Noisy images are slightly altered copies of the original images, for example, random combinations of translations, rotations, and zoom-ins. Another way I alleviated overfitting for all three models was to reduce the dimensionality of the data by resizing each image to a 36 by 36 square (CNNs also work best with square images), because the complexity of a machine learning model is proportional to the dimensionality of the data times the number of classes.
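A minimal sketch of the resizing and noisy-image steps is shown below, using NumPy only. This is not the augmentation tool used in the project: the function names, the parameter ranges (shifts up to 2 pixels, rotations up to 5 degrees, zoom up to 1.1), the nearest-pixel sampling, and the 60 by 120 input size are all assumptions chosen for illustration.

```python
import numpy as np


def resize_square(image, size=36):
    """Downsample a 2-D map to size x size by picking evenly spaced pixels."""
    h, w = image.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return image[np.ix_(rows, cols)]


def transform(image, angle_deg=0.0, zoom=1.0, shift=(0, 0)):
    """Rotate, zoom in, and translate an image using nearest-pixel sampling."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    angle = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # For each output pixel, find the source pixel it should copy from.
    y = (ys - cy) / zoom
    x = (xs - cx) / zoom
    src_y = np.cos(angle) * y - np.sin(angle) * x + cy - shift[0]
    src_x = np.sin(angle) * y + np.cos(angle) * x + cx - shift[1]
    # Clip to the image so edge pixels are repeated where the source is outside.
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    return image[src_y, src_x]


def random_augment(image, rng, max_shift=2, max_angle_deg=5.0, max_zoom=1.1):
    """Apply a random combination of translation, rotation, and zoom-in."""
    shift = tuple(int(s) for s in rng.integers(-max_shift, max_shift + 1, size=2))
    angle = rng.uniform(-max_angle_deg, max_angle_deg)
    zoom = rng.uniform(1.0, max_zoom)
    return transform(image, angle_deg=angle, zoom=zoom, shift=shift)


rng = np.random.default_rng(0)
full_map = rng.normal(size=(60, 120))  # stand-in for one OLR map
small_map = resize_square(full_map)    # 36 x 36 input for all three models
noisy_copies = [random_augment(small_map, rng) for _ in range(5)]
print(small_map.shape, len(noisy_copies))
```

Each noisy copy keeps the class label of the image it was made from, so the training set grows without any new seasons being needed.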
b. What did you learn from overcoming these problems?
I learned that reducing the dimensions of the images and using noisy images help control overfitting for the CNN. However, for the Logistic Regression classifier, which is not able to interpret images, running classifications with noisy images led to poor results, so I did not use the augmented set for this classifier. Logistic Regression still overfit to some extent in my final result, but its accuracy on the test set was close to the accuracy obtained by the CNN model. I measured overfitting by looking at the difference between the accuracy on the training set and the accuracy on the test set. I also learned that after using noisy data, my neural network had more variability in its results, so I had to run the model 10 times and compute the mean, standard deviation, and confidence intervals in order to interpret the results.
5. If you were going to do this project again, are there any things you would do differently the next time?
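The summary statistics over repeated runs and the overfitting measure can be sketched as below. The accuracy values are made-up placeholders, not my results, and the choice of a 95% confidence interval based on Student's t distribution is an assumption, since the type of interval is not stated above.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
# It is only valid for exactly 10 runs.
T_CRIT_95_DF9 = 2.262


def summarize_runs(accuracies, t_crit=T_CRIT_95_DF9):
    """Return the mean, sample standard deviation, and confidence interval."""
    mean = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)
    half_width = t_crit * sd / math.sqrt(len(accuracies))
    return mean, sd, (mean - half_width, mean + half_width)


def overfitting_gap(train_accuracy, test_accuracy):
    """Difference between training and test accuracy; larger means more overfitting."""
    return train_accuracy - test_accuracy


# Placeholder test accuracies from 10 hypothetical runs.
run_accuracies = [0.60, 0.67, 0.73, 0.67, 0.60, 0.73, 0.67, 0.80, 0.67, 0.60]
mean_acc, sd_acc, (ci_low, ci_high) = summarize_runs(run_accuracies)
print(round(mean_acc, 3), round(sd_acc, 3), round(ci_low, 3), round(ci_high, 3))
print(round(overfitting_gap(0.95, mean_acc), 3))
```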
Initially I did experiments with the full images, which were not performing as well as the resized 36 by 36 square image and were taking more time to run, so if I were to repeat this experiment, I would resize the image dimensions earlier.
6. Did working on this project give you any ideas for other projects?
My accurate results made me think about my impact and how I can reach out to people to share my predictions. I created a website on which I plan to post an update each year with the intensity prediction for the upcoming hurricane season.
7. How did COVID-19 affect the completion of your project?
My work was based on computer simulations, and I was able to keep contact with my advisor through email, so my project was not affected by the pandemic.