Predicting Alzheimer’s Disease: Development and Validation of Machine Learning Models
Abstract: Alzheimer’s disease is currently ranked as the fourth leading cause of death in the United States, and over 5.5 million people in the US are currently diagnosed with the disease. One critical step in reducing the burden of Alzheimer’s is to detect it swiftly and accurately. Despite the growing prevalence of Alzheimer’s, accurate clinical diagnosis remains a significant challenge for American doctors. The average clinical diagnosis accuracy among American doctors in rural areas is only 50-60%, little better than flipping a coin. Detecting Alzheimer’s accurately at an early stage is critical because it allows doctors to devise effective strategies to mitigate symptoms and manage long-term care. It also gives patients access to early intervention that can slow disease progression, such as memory deterioration, and, most importantly, prolong the patient’s life.
The purpose of this research is to empower doctors to diagnose Alzheimer’s more swiftly and accurately by using publicly available MRI and demographic data from 373 MRI imaging sessions to build machine learning models that predict Alzheimer’s in patients. Five machine learning models were developed: Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Random Forest, and Neural Network. Data were divided into training and testing sets; the training sets were used to build the predictive models, and the testing sets were used to assess prediction accuracy. Key risk factors were identified, and the models were compared to select the best prediction model. Among these models, the Random Forest model performed best, with a prediction accuracy of 90.34%. This accuracy is higher than the average clinical diagnosis accuracy for Alzheimer’s among rural American doctors. All models indicate that Mini-Mental State Examination (MMSE) score, normalized whole brain volume (nWBV), and gender were the three most influential factors in detecting Alzheimer’s. Pairwise concordance was calculated and showed high concordance among the models. At least 4 of the 5 models gave the same diagnosis for 90.42% of the testing inputs. The machine learning models built in this study could allow doctors to detect Alzheimer’s more accurately than conventional clinical diagnosis, which could ultimately lead to earlier intervention, slower disease progression, and prolonged patient survival.
- Cobb, B. R., Wells, K. R., & Cataldo, L. J. (2012). Alzheimer's Disease. In K. Key (Ed.), The Gale Encyclopedia of Mental Health (3rd ed., Vol. 1, pp. 59-73). Detroit, MI: Gale. Retrieved from https://link.gale.com/apps/doc/CX4013200025/GPS?u=watchunghrhs&sid=GPS&xid=9f274526
- Martone, R. L., & Piotrowski, N. A., PhD. (2019). Alzheimer’s disease. Magill’s Medical Guide (Online Edition).
- Alzheimer’s Disease. (2018). Funk & Wagnalls New World Encyclopedia, 1.
- Yiğit, A., & Işik, Z. (2020). Applying deep learning models to structural MRI for stage prediction of Alzheimer’s disease. Turkish Journal Of Electrical Engineering & Computer Sciences, 28(1), 196-210. doi:10.3906/elk-1904-172
- Kolata, G. (2019, August 01). A Blood Test for Alzheimer's? It's Coming, Scientists Report. Retrieved from https://www.nytimes.com/2019/08/01/health/alzheimers-blood-test.html
- Reinberg, S. (2016, July 26). 2 in 10 Alzheimer's Cases May Be Misdiagnosed. Retrieved from https://www.webmd.com/alzheimers/news/20160726/2-in-10-alzheimers-cases-may-be-misdiagnosed
- Warren, J. D., Schott, J. M., Fox, N. C., Thom, M., Revesz, T., Holton, J. L., Scaravilli, F., Thomas, D. G. T., Plant, G. T., Rudge, P., & Rossor, M. N. (2005). Brain biopsy in dementia. Brain, 128(9), 2016–2025. doi:10.1093/brain/awh543
- Marcus, D. S., Fotenos, A. F., Csernansky, J. G., Morris, J. C., & Buckner, R. L. (2010). Open Access Series of Imaging Studies: Longitudinal MRI Data in non-demented and Demented Older Adults. Journal of Cognitive Neuroscience, 22(12), 2677-2684. doi:10.1162/jocn.2009.21407
- Frisoni, G. B., Fox, N. C., Jack, C. R., Scheltens, P., & Thompson, P. M. (2010). The clinical use of structural MRI in Alzheimer disease. Nature Reviews Neurology, 6(2), 67-77. doi:10.1038/nrneurol.2009.215
- Sargolzaei, S., Sargolzaei, A., Cabrerizo, M., Chen, G., Goryawala, M., Noei, S., . . . Adjouadi, M. (2015). A practical guideline for intracranial volume estimation in patients with Alzheimer’s disease. BMC Bioinformatics, 16(S7). doi:10.1186/1471-2105-16-s7-s8
- Khan, T. (2016). Biomarkers in Alzheimer’s disease. Amsterdam: Academic Press.
- Medical Tests. (n.d.). Retrieved from https://www.alz.org/alzheimers-dementia/diagnosis/medical_tests
- Magnin, B., Mesrob, L., Kinkingnéhun, S., et al. (2009). Support vector machine-based classification of Alzheimer’s disease from whole-brain anatomical MRI. Neuroradiology, 51, 73–83. doi:10.1007/s00234-008-0463-x
Additional Project Information
Rationale: Alzheimer’s disease is ranked as the fourth leading cause of death in the US, with 65,800 fatalities attributable to the disease each year. Currently, over 50 million people worldwide have been diagnosed with Alzheimer’s. The disease is caused by the degeneration and eventual death of a large number of neurons in several areas of the brain. Early detection and treatment of Alzheimer’s are critical: they help doctors devise effective strategies to manage symptoms and long-term care, give patients early treatment that delays disease progression, lower the cost of treatment, and prolong the patient’s life. Despite the prevalence of Alzheimer’s, especially among the elderly population, diagnosing it remains a major challenge. Studies have shown that the average clinical diagnosis accuracy for all American doctors is only 78%, which can have a severe effect on the roughly 1 in 5 patients who are misdiagnosed. Community doctors in rural areas are only about 55 percent accurate in clinically diagnosing Alzheimer’s, which is about the same as tossing a coin. Therefore, the objective of this study is to use MRI imaging data to build several predictive machine learning models (Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Random Forest, and Neural Network) to help doctors clinically diagnose Alzheimer’s more accurately.
When machine learning models are adequately trained with clinical data including MRI imaging, cognitive assessment, and demographic/baseline characteristics data, they can diagnose Alzheimer’s in patients more accurately and swiftly than conventional methods used by doctors.
Data: The Open Access Series of Imaging Studies (OASIS) data set of MRI in Nondemented and Demented Older Adults is the principal source of data used to train and build the machine learning models. The individuals in the data set were selected from a larger pool of participants in MRI studies at Washington University. The data set contains 373 MRI imaging sessions. In each session, demographic, MRI, and cognitive assessment data were collected from each individual. The independent variables are sex, age, years of education, estimated total intracranial volume, normalized whole-brain volume, and Mini-Mental State Examination score. In addition, a conclusive result (the dependent variable) of whether each individual has Alzheimer’s was obtained through a brain biopsy.
Procedure: The same general procedure is used to train each of the five models: Logistic Regression, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest, and Neural Network. The R programming language and the RStudio IDE are used to carry out this procedure. The first step is to randomly split the data into a training set (60% of the data) and a testing set (40% of the data). The second step is to normalize and prepare the data for model training. The third step is to train the model using the training set data. The training process differs for each model.
Logistic regression is modeled by this equation:
\[ \ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n \]
Here $P$ is the probability of being diagnosed with Alzheimer’s, $x_1$ through $x_n$ represent the independent variables included in the model, $\beta_0$ is the intercept, and $\beta_1$ through $\beta_n$ are the corresponding regression coefficients for $x_1$ through $x_n$. The logistic regression training process uses the training set data to estimate the regression coefficients that give the most accurate model possible.
In the K-Nearest-Neighbor training process, each MRI imaging session in the training data set is treated as a data point and plotted in an n-dimensional space (where n is the number of independent variables). In the testing stage, the KNN model assigns a new data point to the class held by the majority of its K nearest data points.
When training the Support Vector Machine model, as with the K-Nearest-Neighbor model, all the data points are plotted in an n-dimensional space. Classification is then performed by finding the hyperplane that best separates the classes, that is, the one that maximizes the margin.
In the training process for Random Forest, a bootstrapped data set is created for each decision tree by randomly selecting observations from the training data set. Then, a decision tree is built based on that bootstrapped data set.
The random forest in this study will have up to 2,000 decision trees. Training the Neural Network involves forward propagation and backward propagation. In forward propagation, the weights in the neural network, which are initially assigned random values, are used to calculate the final output. In backward propagation, the weights and biases are adjusted to minimize the cost function and move the network toward its optimum.
The fourth step is to assess the prediction accuracy of each model using the 40% testing set. The model receives the testing data as input and outputs a prediction of whether each patient has Alzheimer’s. Each prediction is then compared to the patient’s actual state. The percentage of patients in the testing data for whom the model’s prediction matches the patient’s actual Alzheimer’s status is taken as the overall accuracy of the model.
The fifth step is to fine-tune the model by adjusting its hyperparameters to optimize accuracy. For the K-Nearest-Neighbor model, the K value is adjusted. For the Support Vector Machine, the cost parameter C (which sets the penalty for data points that fall inside the margin) is adjusted. For the Random Forest, hyperparameters such as the number of decision trees and the number of independent variables used in each decision tree are adjusted. For the Neural Network, hyperparameters such as the type of cost function, the type of backpropagation algorithm, the hidden layer layout, and the type of activation function are adjusted. Steps 3-5 are then repeated until a model with the optimal hyperparameters is established.
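Steps 1, 2, and 4 can be illustrated with a minimal sketch, using K-Nearest-Neighbor as the model. The study itself was carried out in R, so the Python below is only an illustrative stand-in and not the code used in this research. The toy values, the min-max scaling, the fixed random seed, and every function name are assumptions made for the example; only the 60/40 split, the majority vote among the K nearest points, and the accuracy definition come from the procedure described here.

```python
import math
import random
from collections import Counter

def split_train_test(rows, train_fraction=0.6, seed=0):
    """Step 1: randomly split rows into a training set and a testing set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_min_max(feature_rows):
    """Step 2: record each feature's minimum and maximum on the training set."""
    return [(min(column), max(column)) for column in zip(*feature_rows)]

def apply_min_max(features, ranges):
    """Rescale one feature vector using the ranges learned from training data."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(features, ranges)]

def knn_predict(train, query, k):
    """Return the majority class among the k training points nearest to query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k):
    """Step 4: share of testing points whose prediction matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Hypothetical (MMSE, nWBV) pairs; these are NOT values from the OASIS data set.
toy_rows = (
    [((18 + i % 3, 0.67 + 0.01 * (i % 3)), "Demented") for i in range(10)]
    + [((28 + i % 3, 0.75 + 0.01 * (i % 3)), "Nondemented") for i in range(10)]
)

train_rows, test_rows = split_train_test(toy_rows)
ranges = fit_min_max([x for x, _ in train_rows])
train = [(apply_min_max(x, ranges), y) for x, y in train_rows]
test = [(apply_min_max(x, ranges), y) for x, y in test_rows]
test_accuracy = accuracy(train, test, k=3)
```

The scaling ranges are learned from the training set only and then reused on the testing set, so that no information from the testing data influences how the model is prepared.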
Finally, in Step 6, the results are analyzed, and the pairwise concordance rate between each pair of models is calculated to assess consistency among the five models.
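One way to compute these agreement measures is sketched below in Python (the study used R, so this is an illustration rather than the original code). The model names and the predictions are hypothetical placeholders; the sketch assumes that pairwise concordance means the fraction of testing cases on which two models output the same diagnosis.

```python
from collections import Counter
from itertools import combinations

def pairwise_concordance(predictions):
    """Fraction of testing cases on which each pair of models agrees."""
    rates = {}
    for a, b in combinations(predictions, 2):
        agree = sum(p == q for p, q in zip(predictions[a], predictions[b]))
        rates[(a, b)] = agree / len(predictions[a])
    return rates

def agreement_rate(predictions, min_models=4):
    """Fraction of testing cases where at least min_models share one diagnosis."""
    per_case = list(zip(*predictions.values()))
    hits = sum(max(Counter(case).values()) >= min_models for case in per_case)
    return hits / len(per_case)

# Hypothetical outputs (1 = Alzheimer's, 0 = no Alzheimer's) for five testing cases.
toy_predictions = {
    "logistic": [1, 0, 1, 1, 0],
    "knn":      [1, 0, 0, 1, 0],
    "svm":      [1, 0, 1, 1, 1],
    "forest":   [1, 0, 1, 0, 0],
    "network":  [1, 0, 0, 0, 1],
}

concordance = pairwise_concordance(toy_predictions)
four_of_five = agreement_rate(toy_predictions, min_models=4)
```

With five models there are ten model pairs, so `concordance` holds ten rates, one per pair.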
To analyze the performance of the machine learning models, a confusion matrix and a Receiver Operating Characteristic (ROC) plot are created for each model. From the confusion matrix, each model’s true positive rate, false negative rate, and overall accuracy are computed. The ROC plot (sensitivity versus 1 − specificity) is used to help visualize and compare the performance of the models. Once all the models are finalized, their overall accuracies are compared to find the best model for Alzheimer’s diagnosis. All the independent variables in this study are ranked by their importance in Alzheimer’s diagnosis (in other words, by how much the accuracy of the diagnosis would decrease if the variable were removed). The pairwise concordance rates between each pair of models are computed using the testing set data. The rate at which at least four of the five models in the study output the same result for the testing set data is also computed. Finally, the accuracy of the most accurate model is compared with the accuracy of conventional clinical diagnosis for Alzheimer’s.
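The confusion-matrix rates and the points of an ROC curve can be sketched as follows. This is an illustrative Python example with hypothetical labels and scores, not the R code or results of this study; the function names and the 0.5 cutoff are assumptions. The ROC points pair the false positive rate (1 − specificity) with the true positive rate (sensitivity) at each score threshold.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives (1 = Alzheimer's, 0 = not)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": pairs.count((1, 1)),
        "fn": pairs.count((1, 0)),
        "fp": pairs.count((0, 1)),
        "tn": pairs.count((0, 0)),
    }

def summary_rates(matrix):
    """True positive rate, false negative rate, and overall accuracy."""
    positives = matrix["tp"] + matrix["fn"]
    total = sum(matrix.values())
    return {
        "true_positive_rate": matrix["tp"] / positives,
        "false_negative_rate": matrix["fn"] / positives,
        "accuracy": (matrix["tp"] + matrix["tn"]) / total,
    }

def roc_points(actual, scores):
    """(false positive rate, true positive rate) at each score threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        predicted = [int(score >= threshold) for score in scores]
        matrix = confusion_matrix(actual, predicted)
        points.append((matrix["fp"] / negatives, matrix["tp"] / positives))
    return points

# Hypothetical true labels and model scores for eight testing cases.
toy_actual = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
toy_predicted = [int(score >= 0.5) for score in toy_scores]

rates = summary_rates(confusion_matrix(toy_actual, toy_predicted))
curve = roc_points(toy_actual, toy_scores)
```

Plotting `curve` with the false positive rate on the horizontal axis and the true positive rate on the vertical axis gives the ROC plot; a curve closer to the top-left corner indicates a better model.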
There are no safety or health issues for this research. This research was entirely carried out using RStudio IDE on a computer.
Questions and Answers
1. What was the major objective of your project and what was your plan to achieve it?
a. Was that goal the result of any specific situation, experience, or problem you encountered?
b. Were you trying to solve a problem, answer a question, or test a hypothesis?
The major objective of my project was to use available demographic/baseline characteristics and MRI imaging data to build several machine learning models (Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Random Forest, and Neural Network) to help doctors clinically diagnose Alzheimer’s more swiftly and accurately. My plan was to obtain a data set containing demographic and clinical patient data from 373 MRI imaging sessions, together with a conclusive result of whether each patient has Alzheimer’s; randomly split the data set into a training set (60% of the data) and a testing set (40% of the data); train each model using the training set; evaluate the performance of each model with the testing set; fine-tune each model by adjusting its hyperparameters; and then analyze the results using confusion matrices and ROC plots. A more detailed plan can be found in my research plan.
My goal of helping doctors diagnose Alzheimer’s more accurately through machine learning was a result of my own personal experience. Three years ago, the father of one of my family friends was diagnosed with Alzheimer’s disease. His disease progressed rapidly, and many incidents occurred, such as forgetting where he lived and being unable to find his way home. Often, when he turned on the stove, he completely forgot to turn it off, which was very dangerous. Eventually, he could not even recognize his family members and close friends. He lost his ability to live independently and was ultimately sent to a nursing home so that someone could attend to him 24/7. I can truly feel the pain and suffering that he and his family went through, which motivates me to contribute to the accurate, early diagnosis of Alzheimer’s.
My research is trying to help solve the current problem of poor clinical diagnosis accuracy. As mentioned in my research plan, the average clinical diagnosis accuracy for all American doctors is only 78%, which can have a severe effect on the roughly 1 in 5 patients who are misdiagnosed. Community doctors in rural areas are only about 55 percent accurate in clinically diagnosing Alzheimer’s, which is about the same as tossing a coin. Thus, the main goal of my research is to improve upon the average clinical diagnosis accuracy using the power of machine learning.
2. What were the major tasks you had to perform in order to complete your project?
a. For teams, describe what each member worked on.
The major tasks I had to perform in order to complete my project were to pre-process the data set and to build, train, test, and fine-tune five different machine learning models. In pre-processing the data set, I normalized the data, modified the data types of each variable, removed individuals with missing values, and removed unrelated variables (like patient ID) to prepare the data for model training. Each model had its own training process, which is described in further detail in my research plan and presentation. I tested each model by evaluating the percentage of individuals in the testing data set for whom the model’s prediction was correct. Then, I fine-tuned each model by adjusting its hyperparameters to optimize accuracy and repeating the training/testing steps. Once the optimal models were created, I analyzed the results by creating a confusion matrix and ROC plot for each, comparing the models’ accuracies to each other, computing concordance rates, and comparing the best model’s accuracy to the accuracy of conventional Alzheimer’s diagnosis. I developed all programming work in RStudio.
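The pre-processing steps described above could look roughly like the following sketch. It is written in Python for illustration only (the project itself was done in R), and the field names and records are hypothetical placeholders rather than the actual columns or values of the data set. Normalization is left out here to keep the example short.

```python
def preprocess(records, unrelated=("subject_id",)):
    """Drop incomplete records, remove unrelated fields, and encode sex as 0/1."""
    cleaned = []
    for record in records:
        # Remove individuals with any missing value.
        if any(value is None for value in record.values()):
            continue
        # Remove variables unrelated to the diagnosis, such as the patient ID.
        row = {key: value for key, value in record.items() if key not in unrelated}
        # Change the data type of the categorical sex variable to numeric.
        row["sex"] = 1 if row["sex"] == "M" else 0
        cleaned.append(row)
    return cleaned

# Hypothetical records; None marks a missing value.
toy_records = [
    {"subject_id": "S1", "sex": "M", "age": 75, "mmse": 23, "demented": 1},
    {"subject_id": "S2", "sex": "F", "age": 80, "mmse": None, "demented": 0},
    {"subject_id": "S3", "sex": "F", "age": 71, "mmse": 29, "demented": 0},
]

cleaned_records = preprocess(toy_records)
```

In this toy example the second record is dropped because its MMSE value is missing, and the remaining records no longer carry the patient ID.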
3. What is new or novel about your project?
a. Is there some aspect of your project's objective, or how you achieved it that you haven't done before?
b. Is your project's objective, or the way you implemented it, different from anything you have seen?
c. If you believe your work to be unique in some way, what research have you done to confirm that it is?
Here is some past research that has also used machine learning for Alzheimer’s diagnosis:
- Korolev and his team used 3D convolutional neural networks (VoxCNN) and residual neural networks (ResNet) to diagnose Alzheimer’s with an 80% accuracy. The entire MRI image was required for the model to make the prediction. (https://ieeexplore.ieee.org/abstract/document/7950647)
- Lu and his team used a deep learning model to predict Alzheimer’s with an 84.6% accuracy. The model requires MRI and FDG-PET images from the patient to make the prediction. (https://www.nature.com/articles/s41598-018-22871-z)
- Salvatore and her team created a support vector machine model to diagnose Alzheimer’s with a 76% accuracy. The model requires MRI images from the patient to make the prediction. (https://www.frontiersin.org/articles/10.3389/fnins.2015.00307/full)
Although projects using machine learning for Alzheimer’s diagnosis have been done before, my project is novel because it involves the creation of five different types of machine learning models to see which has the highest accuracy and is best suited to a clinical setting. In my research, I used brain MRI imaging data plus other key demographic and baseline characteristics data to build more sophisticated models, with multiple rounds of fine-tuning that led to much-improved predictions. Past research, like the studies mentioned above, typically involved only one or two types of machine learning models. Furthermore, my models work more swiftly in practice than the models used in past research because they require only numerical and categorical input rather than an entire MRI or PET scan. Lastly, my project is unique because the accuracy that my best model reached (90.34%) is higher than that of the past Alzheimer’s diagnosis projects listed above.
4. What was the most challenging part of completing your project?
a. What problems did you encounter, and how did you overcome them?
b. What did you learn from overcoming these problems?
The most challenging part of my project was finding a high-quality data set, including brain MRI imaging data, that I could use to train my models. Because of the general lack of free, publicly available data for Alzheimer’s diagnosis, it was very difficult to find a data set that contained 1) a large enough sample size to build practical machine learning models, 2) MRI and cognitive assessment data for each individual, and 3) a verified result of each individual’s true Alzheimer’s status.
I encountered various obstacles when pre-processing my data and building my models. For instance, if the data are pre-processed incorrectly, the models simply will not work well. When building my support vector machine and neural network models specifically, it took me about 10-15 attempts, as well as a great deal of time reading online blogs and forums in search of an answer, to prepare the data correctly for model training. In overcoming this obstacle, I learned how to better manipulate data for model training. This knowledge was especially useful during this research, as the time I spent manipulating data decreased each time I finished building a model. Another obstacle I encountered was my neural network failing to converge, even when using the default hyperparameters in R. To overcome this obstacle, I had to try many different hyperparameter values just to get the neural network to converge properly. Through overcoming this problem, I learned that machine learning models are not just “magical tools” that you can throw data into to create a working model. Rather, building and training machine learning models involves constant adjustment of hyperparameters and algorithmic thinking. A final obstacle I encountered was overfitting the training data when building the models. To overcome it, I adjusted hyperparameters to prevent overfitting as much as possible. In overcoming this problem, I learned how to build better models that avoid overfitting or underfitting the data. Overall, overcoming obstacles in research has taught me significant lessons that will be useful in future research, and every problem I encounter provides an opportunity for me to expand my thinking and knowledge.
5. If you were going to do this project again, are there any things you would do differently the next time?
If I did this project again, rather than focusing on finding a free, publicly available data set, I would try contacting researchers and institutions for their data sets. By obtaining a data set from a researcher or institution directly, I might be able to get a larger sample size for model training and testing, as well as more independent variables and fewer missing values than the publicly available data sets offer. Furthermore, I would try to include better visualizations to demonstrate my research. For instance, for my Support Vector Machine model, I would try adding a visualization of the hyperplane separating the data points in the model. For the K-Nearest-Neighbor model, I would try adding a visualization showing all the data points plotted in the multi-dimensional space. For my Random Forest model, a diagram of the decision trees in the model would illustrate it better than my current visualizations.
6. Did working on this project give you any ideas for other projects?
Yes, working on this project really taught me first-hand how machine learning can make significant contributions to the medical field and improve people’s lives in ways that could never be done before. Through working on this project, I got the idea to do another project where I created a mobile app called FarmNet. FarmNet uses a convolutional neural network to predict what disease a plant has from a photo of the plant that the user takes with his or her smartphone.
7. How did COVID-19 affect the completion of your project?
Since my project was done entirely on a computer, COVID-19 did not affect the completion of my project.