Mercer Science and Engineering Fair | Fostering STEM Education in Mercer County, NJ
Affiliated with Regeneron ISEF and Thermo Fisher Junior Innovators Challenge

Designing and Evaluating the Use of Machine Learning Models and Nearest Neighbor Algorithms to Identify Colors for People Who Have Difficulty Identifying Them.

Fair: 2021 Mercer Science and Engineering Fair

Event: Senior Division 2021

Category: Software and Embedded Systems

Student: Lucas Zapata-Sanin

Table: COMP1905

Experimentation location: Home

Regulated Research (Form 1c): No

Project continuation (Form 7): Yes

Abstract:

According to www.color-blindness.com, roughly 8% of all men and 0.5% of all women in the world are color blind. The purpose of my project is to design and evaluate different algorithms for identifying colors for people who have difficulty identifying them. In the past, I attempted to identify colors with a Scratch program whose conditional statements mapped ranges of RGB values to color names. This model was not effective in some cases, which led me to think that other approaches could perform better.

I explored two different approaches to achieve this goal: machine learning classifiers and the Nearest Neighbors algorithm. My hypothesis is that machine learning classifiers will provide a more effective solution than the Nearest Neighbors algorithm. For the first approach, I used a machine learning classifier (neural networks) with different datasets. I started with an existing dataset of 5000 instances of 11 basic colors, which tested well when a random 80% of the dataset was used as training data and the remaining 20% as testing data (accuracy >0.9). However, it did not perform well when tested with colors that are difficult for people with different colorblindness conditions to identify. Hence, I decided to create a new dataset with more colors (22), including those that are difficult for colorblind individuals to identify. Initially, this dataset had approximately 10 instances of each color, and the classifier did not perform well on it (accuracy <0.5 with an 80% training and 20% testing split). I found that this was due to the small size of the dataset, so I increased it from 10 instances per color to approximately 60 instances per color (1309 instances), which produced better results (accuracy >0.9). After this, I created a separate testing dataset (141 instances) made up of colors that are difficult for people with different colorblindness conditions to identify and colors that are frequently used in color blindness diagnostic tests. With this new testing dataset, the classification accuracy decreased.
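The 80%/20% evaluation described above can be sketched in plain Python. This is only an illustration of the split and the accuracy measure, not the neural network itself; the function names, the fixed random seed, and the placeholder rows are assumptions and do not come from the project files.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Placeholder rows (r, g, b, label): random values that only mimic the
# size of the 1309-instance dataset, not its actual contents.
rng = random.Random(1)
rows = [
    (rng.randrange(256), rng.randrange(256), rng.randrange(256), "color_%d" % (i % 22))
    for i in range(1309)
]

train, test = train_test_split(rows)
print(len(train), len(test))
```

A classifier would be fitted on `train` only, and `accuracy` would then compare its predictions on `test` with the true labels.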

I also implemented a Nearest Neighbors algorithm that used the distance between two points in 3D space as the distance measure. The coordinates of each point were the RGB values of the color. I tried this approach with two different datasets as centers and tested both with the last testing dataset (141 instances): the first dataset had the original 22 colors as centers, and the second was the 1309-instance training dataset used earlier. The Nearest Neighbor algorithm outperformed the machine learning model in accuracy.
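A minimal sketch of this idea, assuming one center per color and straight-line (Euclidean) distance between RGB triples. The five centers below are standard web color values chosen for illustration; they are not the project's 22-color dataset, and `nearest_color` is a hypothetical name.

```python
import math

# Illustrative centers only (standard web RGB values), not the project's data.
CENTERS = {
    "red": (255, 0, 0),
    "lime": (0, 255, 0),
    "blue": (0, 0, 255),
    "gold": (255, 215, 0),
    "silver": (192, 192, 192),
}

def nearest_color(rgb, centers=CENTERS):
    """Return the name of the center closest to rgb in RGB space."""
    # math.dist computes Euclidean distance (requires Python 3.8+).
    return min(centers, key=lambda name: math.dist(rgb, centers[name]))

print(nearest_color((250, 200, 30)))
```

With the larger dataset, each color would have many centers instead of one, and the label of the single closest point would be returned.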

In this project, I elaborate on the strengths and weaknesses of these approaches in terms of algorithm complexity, the amount of data required, and execution speed. I also elaborate on future work in this field.

Bibliography/Citations:

Format: Author (if applicable). Title. URL

Chavan, A. (2020). Building RGB Color Classifier. https://medium.com/analytics-vidhya/building-rgb-color-classifier-part-1-af58e3bcfef7
RGB Color Table. https://www.rapidtables.com/web/color/RGB_Color.html
Nearest Neighbor Search. https://en.wikipedia.org/wiki/Nearest_neighbor_search
Jupyter. http://jupyterlab.io/
TensorFlow Library. https://www.tensorflow.org/
Pandas Library. https://pandas.pydata.org/
Anaconda. https://www.anaconda.com/
Python multidict example – Map single key to multiple values in dictionary. https://howtodoinjava.com/python/datatypes/multidict-key-to-multiple-values/
Fincher, J. Reading and Writing CSV Files in Python. https://realpython.com/python-csv/
Python Casting. https://www.w3schools.com/python/python_casting.asp
Color blindness. https://en.wikipedia.org/wiki/Color_blindness
NumPy Library. https://numpy.org/
Matplotlib Library. https://matplotlib.org/
Git Library. https://git-scm.com/
Seaborn Library. https://seaborn.pydata.org/
Sklearn Library. https://scikit-learn.org/stable/index.html
Plotly Library. https://plotly.com/
CSV Library. https://docs.python.org/3/library/csv.html
Math Library. https://docs.python.org/3/library/math.html
Flück, D. Color Blind Essentials. https://www.color-blindness.com/wp-content/documents/Color-Blind-Essentials.pdf

Additional Project Information

Project website: -- No project website --

Presentation files:

Science Fair 2021.pptx

Research paper:

Project Journal 2021.pdf

Additional Resources: -- No resources provided --

Research Plan:

Do research on color blindness and machine learning approaches
Implement and evaluate Machine Learning Classifier

Research current machine learning classifiers that can be adjusted for the project
Create the datasets
Modify the Python program to use the new datasets
Observe the performance of the classifier during the training and testing periods

Implement and evaluate Nearest Neighbor algorithm with various datasets

Research Python dictionaries
Convert the dataset CSV files to a dictionary in the Python program
Implement the distance formula and calculate the distances between test colors and dictionary centers
Document performance of the Nearest Neighbor algorithm with different datasets

Compare Machine Learning Classifier and Nearest Neighbor algorithm results
Analyze results and write conclusions
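The Nearest Neighbor steps in this plan (CSV to dictionary, distance formula, documenting performance) could look roughly like the sketch below. The column names `red`, `green`, `blue`, and `label`, the function names, and the four sample rows are assumptions for illustration; the project's actual file layout is not shown in this document.

```python
import csv
import io
import math
from collections import defaultdict

def load_centers(csv_file):
    """Read rows of red,green,blue,label into {label: [(r, g, b), ...]}."""
    centers = defaultdict(list)
    for row in csv.DictReader(csv_file):
        rgb = (int(row["red"]), int(row["green"]), int(row["blue"]))
        centers[row["label"]].append(rgb)
    return dict(centers)

def classify(rgb, centers):
    """Return the label of the single closest center across all labels."""
    best_label, best_dist = None, math.inf
    for label, points in centers.items():
        for point in points:
            d = math.dist(rgb, point)  # Euclidean distance in RGB space
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label

def evaluate(test_rows, centers):
    """Fraction of (rgb, label) test rows that are classified correctly."""
    hits = sum(classify(rgb, centers) == label for rgb, label in test_rows)
    return hits / len(test_rows)

# Hypothetical in-memory CSV standing in for a dataset file.
SAMPLE_CSV = """red,green,blue,label
255,0,0,red
220,20,60,red
0,128,0,green
128,128,0,olive
"""

centers = load_centers(io.StringIO(SAMPLE_CSV))
test_rows = [((240, 10, 30), "red"), ((10, 120, 10), "green"), ((120, 130, 10), "olive")]
print(evaluate(test_rows, centers))
```

Storing a list of points per label lets the same code handle both one center per color and many instances per color.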

Questions and Answers

1. What was the major objective of your project and what was your plan to achieve it?

The major objective of my project was to design and evaluate different algorithms for identifying colors, especially those that are difficult for color-blind individuals to identify. My plan included doing research on color blindness and machine learning approaches, implementing and evaluating a Machine Learning Classifier, implementing and evaluating the Nearest Neighbor algorithm with various datasets, comparing the results of the Machine Learning Classifier and the Nearest Neighbor algorithm, and finally analyzing results and writing conclusions.

a. Was that goal the result of any specific situation, experience, or problem you encountered?

For the last three years, I have been doing research to help colorblind individuals identify colors. For my first project in this field, I created a Scratch program that recognized colors for colorblind individuals and tested it both with colorblind individuals and with individuals who wore colored glasses to simulate colorblindness. The results were encouraging, but the program could be improved. I have been thinking about future work in this area since then, and for this project, I decided to explore machine learning and the nearest neighbor algorithm to identify colors.

b. Were you trying to solve a problem, answer a question, or test a hypothesis?

I was trying to solve the problem of identifying colors for colorblind individuals by comparing two algorithms, and my hypothesis was that machine learning algorithms would perform better than the nearest neighbor algorithm. However, at this point the nearest neighbor algorithm seems to provide better results. In the future, I think that with a better dataset the machine learning algorithm could improve and perhaps outperform the nearest neighbor algorithm. This is not certain, though, because with more centers the nearest neighbor algorithm could also improve.

2. What were the major tasks you had to perform in order to complete your project?

The most important tasks of my project were creating and revising datasets, creating models and algorithms, documenting work, and testing models and algorithms.

3. What is new or novel about your project?

I think that some aspects of my project are new. When I made my first color detection program for colorblind individuals in 2019, I used conditions based on ranges of RGB values. When I came back to this project in 2020, I was surprised to see documentation from several people who use different methods to identify colors, including machine learning classifiers. However, the majority of them use color labels that are not tailored to the needs of colorblind individuals. My models, on the other hand, focus on detecting pairs of colors that tend to be difficult for colorblind individuals to tell apart, such as lime and red, royal blue and silver, green and gold, olive and magenta, and cyan and dark gold. Also, I used the Nearest Neighbor algorithm, which resulted in improvements in performance.

a. Is there some aspect of your project's objective, or how you achieved it that you haven't done before?

Before doing this project, I didn’t have any experience with or knowledge of machine learning classifiers or algorithms such as nearest neighbor search. I also was not familiar with training and testing datasets for machine learning. I had done some programming in Python, but never for data science. I also learned how to evaluate the results of my models using a confusion matrix and metrics such as precision, recall, and F1 score.
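As a rough illustration of these evaluation measures, the sketch below counts (true, predicted) label pairs, which is a confusion matrix in sparse form, and derives precision, recall, and F1 for one label. The function names and the five sample labels are hypothetical and are not results from the project.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) label pairs: a sparse confusion matrix."""
    return Counter(zip(true_labels, predicted_labels))

def per_class_scores(true_labels, predicted_labels, label):
    """Return (precision, recall, f1) for one label."""
    counts = confusion_counts(true_labels, predicted_labels)
    tp = counts[(label, label)]
    # False positives: predicted as label, but truly something else.
    fp = sum(n for (t, p), n in counts.items() if p == label and t != label)
    # False negatives: truly label, but predicted as something else.
    fn = sum(n for (t, p), n in counts.items() if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels for illustration only.
true_labels = ["red", "red", "lime", "lime", "gold"]
predicted = ["red", "lime", "lime", "lime", "gold"]

precision, recall, f1 = per_class_scores(true_labels, predicted, "lime")
print(precision, recall, f1)
```

Here one red sample is mistaken for lime, so lime keeps perfect recall but loses precision, which is the kind of pairwise confusion a confusion matrix makes visible.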

b. Is your project's objective, or the way you implemented it, different from anything you have seen?

Yes, my project differs from other projects because my program is focused on detecting pairs of colors that tend to be difficult for colorblind individuals to identify. Also, I am comparing two different approaches to solving the problem (a machine learning classifier and the nearest neighbor search algorithm).

c. If you believe your work to be unique in some way, what research have you done to confirm that it is?

I have searched the web for existing machine learning solutions. I have also read documentation on colorblindness and existing applications. There are a few machine learning programs that work with generic colors, for example the one by Chavan (2020), among others. However, these authors do not focus specifically on colorblind conditions. Also, I’m not aware of other people using the nearest neighbor algorithm to identify colors by treating RGB values as points in 3D space and measuring the distance between them.

4. What was the most challenging part of completing your project?

The most challenging part of completing my project was learning about machine learning and getting familiar with tools like Anaconda, the Python language, and libraries such as TensorFlow, Pandas, and NumPy. Creating datasets that were appropriate for the purpose of my project was also challenging.

a. What problems did you encounter, and how did you overcome them?

When working on datasets, I lost data several times because I forgot to save, which forced me to remake multiple datasets. I overcame this by spending more time on the datasets, being patient with mistakes, and saving more frequently. When running programs, I encountered errors from importing libraries improperly and from not having them installed in my Anaconda environment. Most of these issues were resolved by troubleshooting, looking up errors online, and researching how to install the libraries. I also had errors with parameters in Python, which mostly resulted from dataset dimensions. I resolved these issues by being patient, finding the errors, and debugging them. Debugging meant reading through programs thoroughly, looking up errors, and learning from forums where programmers had similar problems. When updating datasets and adding them to the program, I encountered situations in which the program detected extra values that were not intended to be included. These extra values turned out to be ghost values in my CSV dataset, and I solved this by deleting empty cells in my dataset after every update.
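One way to guard against such ghost values in code, assuming they appear as trailing empty cells or entirely empty rows left behind by spreadsheet edits, is to filter them while reading the CSV. This is a sketch of an alternative to manual cleanup, not what the project did; `read_clean_rows` and the sample text are hypothetical.

```python
import csv
import io

def read_clean_rows(csv_file):
    """Yield rows with stripped cells, skipping rows that are blank or all-empty."""
    for row in csv.reader(csv_file):
        cells = [cell.strip() for cell in row]
        # Drop trailing empty cells left over from spreadsheet edits.
        while cells and cells[-1] == "":
            cells.pop()
        if cells:
            yield cells

# Made-up CSV text with trailing empty cells, an all-empty row, and a blank line.
SAMPLE = "255,0,0,red,,\n,,,\n\n0,255,0,lime\n"

rows = list(read_clean_rows(io.StringIO(SAMPLE)))
print(rows)
```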

b. What did you learn from overcoming these problems?

From these problems I learned to be more patient, careful, and systematic. When browsing forums with similar problems, I had to be patient while trying multiple suggestions and reading through my program and documentation to solve problems. When making datasets, I learned to categorize RGB values in a systematic and organized manner. This allowed me to make datasets much faster and avoid many errors. When leaving datasets or taking breaks to work later, I forgot to save my work on multiple occasions. Each time this happened I had to repeat a long task, which made me much more cautious and careful when working on my project. Also, I learned to understand procedures by reading documentation. Documentation made it easier to adjust programs and install libraries in my Anaconda environment.

5. If you were going to do this project again, are there any things you would do differently the next time?

I would try different types of datasets because I think there is still room for improvement. I would also spend more time learning about the functions of the libraries that I used and other libraries that are available. I would also like to learn more about different types of classifiers and algorithms.

6. Did working on this project give you any ideas for other projects?

While researching portions of this project, I learned about other ways to apply machine learning to different problems. For example, it is possible to identify objects and use sound to say the names of colors for people who have no vision or reduced vision. Also, I want to continue learning about machine learning to improve the models that I created. These improvements include increasing the diversity and size of the dataset by generating additional datapoints from the Nearest Neighbor algorithm. Crowdsourcing can also be used to generate more datapoints. Finally, in the future I would implement an application that uses my models and algorithms to identify colors in images from a phone camera or in pixels on a computer screen.

7. How did COVID-19 affect the completion of your project?

Initially, I wanted to test the results of my project with colorblind individuals. In an earlier version of my project (2019), I was able to test a different approach with colorblind individuals. I was planning to do similar testing this time, in addition to implementing and evaluating new algorithms. However, due to COVID-19, this portion of the project wasn’t included. COVID-19 also made it difficult for me to find time to work on the project, since my classes were online and I needed to meet online with classmates to complete work and with after-school clubs.