Analysis of semantics and early linguistic symptoms to develop machine learning predictive modeling of Alzheimer's Disease

Table: COMP8
Experimentation location: Home
Regulated Research (Form 1c): No
Project continuation (Form 7): No







Additional Project Information


Research Plan:

I prepared for the project by contacting research facilities and universities that had run trials and collected data comparing cognitive impairment with normal cognitive function. I chose semantic and linguistic variables as metrics because they are inexpensive, noninvasive, and easy to collect, and because collecting them requires minimal exposure to COVID-19; this project was developed during the COVID-19 pandemic. I ultimately used fluency datasets from the University of California San Diego (UCSD).

Over 6 million Americans are living with Alzheimer’s Disease, and dementia deaths increased by 16% during the COVID-19 pandemic. Dementia was predicted to cost Americans $355 billion by the end of 2021. With cases increasing and healthcare costs rising, accessible dementia prediction based on the analysis of digital biomarkers is needed. The majority of individuals who experience the symptoms of Alzheimer’s are 65 years or older. From the age of 65, the risk of developing Alzheimer’s increases significantly, doubling every 5 years. The high-risk age groups for COVID-19 and Alzheimer’s Disease clearly overlap, so the demand for digital prediction tools has increased. The purpose of this research is to use digital, contact-free technology to build an artificial intelligence model that identifies early symptoms of Alzheimer’s Disease. The data came from fluency datasets collected at UCSD. A Random Forest machine learning model was the most effective at predicting progression to Alzheimer’s Disease in patients who still function cognitively normally, with an accuracy of 93%. Further research into building digital Alzheimer’s Disease prediction tools will aid the early diagnosis of dementia, lower the burden on medical professionals, and help meet the rising healthcare needs of countries with underdeveloped healthcare systems.
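As a point of reference for the accuracy figure above, the sketch below shows how accuracy and sensitivity are commonly computed from a model's predictions. It is a minimal illustration, not the project's evaluation code: the function names and the toy labels are assumptions invented for this example and are not drawn from the UCSD data or the project's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("need at least one label")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def sensitivity(y_true, y_pred, positive=1):
    """Fraction of truly positive cases that the model also flagged."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("no positive cases in y_true")
    hits = sum(p == positive for p in preds_for_positives)
    return hits / len(preds_for_positives)


# Toy labels invented for illustration only (1 = positive class, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("sensitivity:", sensitivity(y_true, y_pred))
```

For a screening tool, sensitivity is worth reporting alongside accuracy, because a model can score high accuracy on an imbalanced dataset while still missing many true cases.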


Questions and Answers

1. What was the major objective of your project and what was your plan to achieve it? 

       a. Was that goal the result of any specific situation, experience, or problem you encountered?  

       b. Were you trying to solve a problem, answer a question, or test a hypothesis?

The major objective of my project was to create a predictive modeling algorithm that can predict Alzheimer’s Disease. I had initially planned to measure the progression of Alzheimer’s Disease with a different metric, one that required extensive brain imaging and was quite expensive. I was worried that brain imaging would not address the main problem I wanted to focus on: the lack of access to healthcare in developing countries, which grew worse during the COVID-19 pandemic. As a result, I decided to use a metric that did not require extensive imaging, costly visits, or exposure to the coronavirus. I therefore reached out to research facilities and universities that had data on semantic and linguistic variables, cognitive disorders, and the correlation between them.

2. What were the major tasks you had to perform in order to complete your project?

       a. For teams, describe what each member worked on.

The major tasks I had to perform were, first, finding a database whose access was not restricted due to the COVID-19 pandemic and, second, developing a novel approach that uses metrics such as semantic and linguistic variables and lends itself to real-world reimplementation. I did not work with a team.

3. What is new or novel about your project?

       a. Is there some aspect of your project's objective, or how you achieved it that you haven't done before?

       b. Is your project's objective, or the way you implemented it, different from anything you have seen?

       c. If you believe your work to be unique in some way, what research have you done to confirm that it is?

The metrics I used, semantic and early linguistic variables, and the way I analyzed them offer a novel approach compared with typical machine learning modeling. Additionally, these metrics are simple to collect, and the algorithm can easily be reimplemented in a real-world setting, allowing for a greater positive impact on our community.
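To illustrate why such metrics are simple to collect, here is a minimal sketch of turning one participant's fluency list into numeric features. The specific features (total responses, unique responses, repetitions) and the function name `fluency_features` are assumptions chosen for illustration; the features actually used in this project are not specified here, and the sample word list is invented.

```python
def fluency_features(responses):
    """Summarize one participant's fluency list as simple counts."""
    # Normalize case and whitespace so "Dog" and "dog " count as the same word.
    cleaned = [r.strip().lower() for r in responses if r.strip()]

    seen = set()
    repetitions = 0
    for word in cleaned:
        if word in seen:
            repetitions += 1  # the participant repeated an earlier response
        else:
            seen.add(word)

    total = len(cleaned)
    return {
        "total_responses": total,
        "unique_responses": len(seen),
        "repetitions": repetitions,
        "repetition_rate": repetitions / total if total else 0.0,
    }


# Invented sample list, not taken from any dataset.
sample = ["dog", "cat", "Dog", "horse", "cat ", "lion"]
features = fluency_features(sample)
print(features)
```

A feature dictionary like this, computed once per participant, could then serve as one row of input to a classifier such as a Random Forest.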

4. What was the most challenging part of completing your project?

      a. What problems did you encounter, and how did you overcome them?

      b. What did you learn from overcoming these problems?

The most challenging part of completing my project was finding suitable datasets. Due to COVID-19, many institutions and research facilities were closed to high school researchers, which limited the kinds of data I could access. However, this challenge gave me insight into a problem that had grown during the pandemic: I saw firsthand how difficult it had become to access data on cognitive disorders and their progression, and I sought a unique metric to overcome that difficulty.

5. If you were going to do this project again, are there any things you would do differently the next time?

If I were given a chance to do this project again, I would try to build an algorithm on another noninvasive, inexpensive, easy-to-collect data type, similar to semantic and linguistic variables, that would encourage people to take the first step in seeking medical care. I would also extend the algorithm to other cognitive disorders and larger datasets to further analyze population-level trends in cognitive impairment.

6. Did working on this project give you any ideas for other projects? 

Yes, working on this project has inspired me to develop plans for other projects. In the future, I hope to implement an algorithm that uses larger datasets to predict cognitive disorder trends at a regional, state, and ultimately national level. I plan to start locally by contacting local institutions for data that represents populations at a regional level.

7. How did COVID-19 affect the completion of your project?

As a high school student, I was initially interested in researching cognitive disorders and detection methods in a lab, but COVID-19 closed virtually all research opportunities at institutions. This encouraged me to seek metrics, analyses, and research methods that could be carried out virtually, in order to address the lack of access that had become so prevalent during the COVID-19 pandemic.