Mercer Science and Engineering Fair | Fostering STEM Education in Mercer County, NJ | Affiliated with Regeneron ISEF and Thermo Fisher Junior Innovators Challenge

Eject, crash, or survive: Using machine learning to predict orbital instability of exoplanetary systems

Fair: 2021 Mercer Science and Engineering Fair

Event: Senior Division 2021

Category: Mathematics, Physics and Astronomy

Student: N A

Table: MATH1700

Experimentation location: Research Institution, Home

Regulated Research (Form 1c): No

Project continuation (Form 7): No

Abstract:

Astronomers throughout history, including titans like Kepler and Newton, have tackled planetary dynamics and orbital instability. Despite the strides taken in research, understanding the evolution of and interplay between planetary orbits remains an intricate, computationally expensive, and analytically unsolved problem. I apply machine learning classification methods to numerical simulations of planetary systems in order to predict the long-term fate of each planet: whether it remains in a stable orbit or not. My method uses the first 41.1 years (≤500 orbits) of data from a planet’s simulation to calculate 17 dynamically motivated metrics; I trained my classifier on these features to predict a planet's stability after 10^7 years. With an accuracy of 84.33%, my classifier was comparable to pre-existing literature, despite using significantly less computational power than most other methods. I found the standard deviation of eccentricity, the mass ratios with neighboring planets, and the semi-major axis ratio with the outer neighboring planet to be the most predictive features of instability. I propose reasons for the importance of these features and their role in planetary dynamics, as well as a possible explanation for why some planets were misclassified. By understanding the important metrics of instability and the reasons for misclassification, we can begin to learn more about system architectures, orbital motion and dynamics, and the formation and evolution of exoplanetary systems. This is applicable not only to our own Solar System; with exoplanet discovery missions such as TESS, this research becomes especially relevant to understanding the new exoplanetary systems we discover.
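The three most predictive features named in the abstract could be computed for a single planet along the following lines. This is a minimal sketch, not the project's code: the function name `planet_features`, the input layout, and the exact definitions (planet mass divided by each neighbor's mass; the outer neighbor's semi-major axis divided by the planet's) are assumptions, since the abstract names the features but not their formulas. Only three of the 17 metrics are shown.

```python
from statistics import pstdev


def planet_features(ecc_series, masses, semi_major_axes, i):
    """Hypothetical feature extractor for planet i in one system.

    ecc_series      -- eccentricity of planet i sampled over the short
                       integration window (e.g. the first ~500 orbits)
    masses          -- planet masses, ordered from innermost to outermost
    semi_major_axes -- initial semi-major axes, in the same order
    Neighbor-based features are None when that neighbor does not exist.
    """
    n = len(masses)
    features = {
        # Population standard deviation of eccentricity over the window,
        # a rough proxy for how strongly the orbit is being perturbed.
        "std_ecc": pstdev(ecc_series),
        "mass_ratio_inner": None,
        "mass_ratio_outer": None,
        "a_ratio_outer": None,
    }
    if i > 0:
        # Assumed definition: planet mass relative to its inner neighbor.
        features["mass_ratio_inner"] = masses[i] / masses[i - 1]
    if i < n - 1:
        # Assumed definition: planet mass relative to its outer neighbor.
        features["mass_ratio_outer"] = masses[i] / masses[i + 1]
        # Orbital spacing to the outer neighbor (> 1 when sorted by distance).
        features["a_ratio_outer"] = semi_major_axes[i + 1] / semi_major_axes[i]
    return features


# Example: middle planet of a toy three-planet system (made-up values).
print(planet_features([0.05, 0.07, 0.05, 0.07], [1e-5, 2e-5, 4e-5], [0.1, 0.2, 0.5], 1))
```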

Bibliography/Citations:

Chambers J. E., Wetherill G., Boss A. P., 1996, Icarus, 119, 261

Cranmer M., Tamayo D., Rein H., Battaglia P., Hadden S., Armitage P. J., Ho S., Spergel D. N., 2021, arXiv:2101.04117 [astro-ph, stat]

Gladman B., 1993, Icarus, 106, 247

Lam C., Kipping D., 2018, Monthly Notices of the Royal Astronomical Society, 476, 5692

Mordasini C., Alibert Y., Benz W., 2009a, Astronomy & Astrophysics, 501, 1139

Mordasini C., Alibert Y., Benz W., Naef D., 2009b, Astronomy & Astrophysics, 501, 1161

Obertas A., Van Laerhoven C., Tamayo D., 2017, Icarus, 293, 52

Pedregosa F., et al., 2011, Journal of Machine Learning Research, 12, 2825

Rein H., Tamayo D., 2015, , 452, 376

Smullen R. A., Volk K., 2020, Monthly Notices of the Royal Astronomical Society, 497, 1391

Smullen R. A., Kratter K. M., Shannon A., 2016, Monthly Notices of the Royal Astronomical Society, 461, 1288

Tamayo D., et al., 2016, The Astrophysical Journal, 832, L22

Tamayo D., et al., 2020, Proceedings of the National Academy of Sciences, 117, 18194

Additional Project Information

Project website: -- No project website --

Presentation files:

doc.pdf

Research paper:

doc_1.pdf

Additional Resources: -- No resources provided --


Research Plan:

Rationale:

For hundreds of years, scientists have explored the problem of orbital instability, but many questions remain about how planetary systems organize themselves, the evolution of their orbits, and the laws of planetary motion. Assessing the stability of such systems is usually done through computationally expensive N-body integrations. There have been efforts to apply machine learning to astrophysics to study these chaotic systems in a faster, more reliable, and less computationally intensive way. Despite these efforts, there is more to be explored about how computers can study orbital dynamics. New exoplanets are discovered regularly by missions such as Kepler and TESS (Transiting Exoplanet Survey Satellite). Further research in this area could help us better understand the dynamics of planetary systems, not only in the context of the new systems we discover, but also in our own solar system. This research could help us learn more about planet compositions, configurations, and system architectures, more precisely and on a much larger scale than before.

Research questions:

How early can planetary instability be detected? With which time slice is the classifier most accurate?
What impact does time averaging have on the accuracy of the classifier?
Which dynamical features are most indicative of instability?
What hyperparameters are most helpful in improving the classifier’s accuracy?
How does removing certain features the classifier is trained on impact accuracy?
Is there a correlation between the dynamical features and the planet’s eventual fate?

Engineering goal:

A machine learning classifier that can predict the orbital instability of exoplanets using initial dynamical features derived from N-body integrations

Procedure:

To obtain initial training data for the classifier, select an exoplanetary population with N-body integration data.
Track dynamical features and derived values for several time slices. Since it is not known when the classifier will be able to recognize instability, or with what time slice it will be most accurate, tracking several time slices allows for experimentation with various classifiers.
Train and refine a gradient boosting random forest classifier using the scikit-learn library in Python.
1. Use cross-validation techniques to determine the accuracy of the model.
2. Test the impact of time averaging on classifier accuracy to establish the most accurate time slice.
3. Refine hyperparameters to optimize the classifier.
4. Test the importance of each feature to overall classifier accuracy to determine which features have the most predictive power.
Based on results from the prior step, as needed:
1. Using REBOUND, generate more N-body integrations for testing and cross-validation.
2. Test the classifier with N-body integration data for exoplanetary populations initialized from different initial conditions, to determine whether the features indicating instability in the original exoplanetary population are universal.
Explore further refinement of the classifier to recognize different origins of instability and planet fates. Work towards building a predictive model of orbital instability in exoplanetary systems.
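The training and refinement steps above (cross-validation, hyperparameter tuning, feature importance) could be sketched in scikit-learn roughly as follows. This is an illustration under stated assumptions, not the project's code: the source does not name the exact estimator, so scikit-learn's `GradientBoostingClassifier` stands in for the "gradient boosting random forest classifier"; the data are synthetic placeholders rather than real simulation features; and the grid values are arbitrary examples.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for a feature table: 300 hypothetical planets with
# 4 features each; label 1 = unstable, 0 = stable.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
# The toy label depends mostly on feature 0, so there is something to learn.
y = (X[:, 0] + 0.3 * rng.normal(size=300) > 0).astype(int)

# Cross-validated accuracy of a default model.
base = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(base, X, y, cv=5, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Small hyperparameter search; the candidate values are illustrative only.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={
        "learning_rate": [0.05, 0.1],
        "n_estimators": [50, 100],
        "max_depth": [2, 3],
    },
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)  # refits the best combination on all data by default
print("best params:", grid.best_params_)

# Impurity-based feature importances of the refitted best model.
best = grid.best_estimator_
for idx, imp in enumerate(best.feature_importances_):
    print(f"feature {idx}: {imp:.3f}")
```

The importances printed at the end are scikit-learn's impurity-based values; `sklearn.inspection.permutation_importance` is a common alternative that measures the drop in score when a feature is shuffled.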

Risk and Safety:

There are no safety concerns for this project.

Questions and Answers

1. What was the major objective of your project and what was your plan to achieve it?

The major objective of my project was to develop a less computationally expensive model that
can predict the long-term stability of planets in exoplanetary systems. To achieve it, I used early
dynamical features from N-body simulations of diverse planet populations. I used this data to
train, test, and refine a gradient boosting random forest classifier.

a. Was that goal the result of any specific situation, experience, or problem you
encountered?

Orbital dynamics problems face a computational bottleneck: assessing long-term stability usually
requires long, expensive N-body integrations, and it is a notoriously complicated problem that
scientists have worked on for centuries. In the past, such work would have taken thousands of
hours, but current literature has made notable breakthroughs in accelerating the process, and I
believe there is room for further research to optimize it.

b. Were you trying to solve a problem, answer a question, or test a hypothesis?

Since most models still require a significant amount of simulation data, I wanted to see if I could
develop a high-accuracy classifier with less computational effort. Beyond just the development of this
classifier, this research could tell us more about the signs of orbital instability, system
architectures, and dynamics of planetary systems.

2. What were the major tasks you had to perform in order to complete your project?

To complete this research, I first had to process all of the data. With N-body simulations of
100 different systems, each with 10 planets and integrated to 10 million years, there were nearly
75 GB of raw data. After processing it, I used current literature and my own exploration of the
data to determine which dynamical features may be indicative of a planet’s long-term fate. I then
trained several classifiers on quantities derived from these features for various time slices, to
understand how the length of the integration window affects accuracy. This allowed me to make an
informed decision about which time slice to use in my final classifier, to maximize efficiency and
minimize computational cost. I then refined the hyperparameters and optimized the training
features for my classifier, keeping in mind specific metrics that are helpful in evaluating a
machine learning method (e.g. class probabilities). With the classifier trained and tested, I
analyzed the results for causes of misclassification, feature importance and correlations,
distributions of probability, and signs of orbital instability.
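Training one classifier per time slice requires summarizing each planet's early evolution separately for every slice. A minimal way to organize that is sketched below. The helper name, the choice of nested windows that all start at t = 0, and the two summary statistics are assumptions for illustration; "time averaging" here simply means taking the mean over each window.

```python
from statistics import mean, pstdev


def features_by_time_slice(times, ecc, slice_ends):
    """Hypothetical helper: summarize eccentricity over nested time slices.

    times      -- sample times in years, ascending
    ecc        -- eccentricity sampled at those times
    slice_ends -- end time (years) of each slice; every slice starts at t = 0
    Returns {slice_end: {"mean_ecc": ..., "std_ecc": ...}}, so that a separate
    classifier can be trained on each slice and their accuracies compared.
    """
    out = {}
    for t_end in slice_ends:
        # Keep only the samples that fall inside this slice.
        window = [e for t, e in zip(times, ecc) if t <= t_end]
        if len(window) < 2:
            raise ValueError(f"too few samples before t = {t_end}")
        out[t_end] = {
            "mean_ecc": mean(window),   # time-averaged eccentricity
            "std_ecc": pstdev(window),  # spread within the slice
        }
    return out


# Example with made-up samples every 10 years and two slices.
print(features_by_time_slice([0, 10, 20, 30, 40], [0.1, 0.1, 0.1, 0.3, 0.3], [20, 40]))
```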

a. For teams, describe what each member worked on.

N/A

3. What is new or novel about your project?

a. Is there some aspect of your project's objective, or how you achieved it that you haven't
done before?

This entire research project was new to me. I had not previously worked with exoplanets,
orbital dynamics, or N-body simulations. I had only begun looking into these topics shortly before
settling on this project; I learned how to define the objective and how to achieve it along the way.

b. Is your project's objective, or the way you implemented it, different from anything you
have seen?

There are a few things about my research that differ from previous literature I am familiar
with. For one, the planet populations I used are far less common in similar studies. Most similar
work uses systems with three or five planets, whereas my systems have ten planets. Additionally,
the dynamical feature distributions (inclination, eccentricity, etc.) of my systems are
more diverse. Furthermore, my classifier uses short-term integration data from the first 41.1
years (≤500 orbits) of the systems, which is significantly less data than most other models use,
despite achieving similar overall accuracies.
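To put the 41.1-year window in perspective, the arithmetic below compares it with the 10-million-year integrations. Reading the "≤500 orbits" bound as applying to the innermost (fastest) planet is an assumption; the source does not say which planet sets it.

```python
window_years = 41.1   # short integration window used for the features
full_years = 1e7      # length of the full N-body integrations
max_orbits = 500      # stated upper bound on orbits within the window

# Fraction of the full integration the classifier actually needs.
fraction = window_years / full_years

# Assumption: the 500-orbit bound refers to the innermost planet, so its
# orbital period is roughly window / orbits (365.25 days per Julian year).
period_days = window_years / max_orbits * 365.25

print(f"fraction of integration used: {fraction:.2e}")  # 4.11e-06
print(f"implied innermost period: {period_days:.1f} days")
```

Under that assumption, the innermost planets orbit in roughly a month, and the classifier sees about four millionths of each full simulation.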

c. If you believe your work to be unique in some way, what research have you done to
confirm that it is?

I have thoroughly explored the academic literature on the subject. Additionally, my research
mentor (who works in the field) helped me find resources to contextualize my own work with the
broader research field.

4. What was the most challenging part of completing your project?

The most challenging part of my project was its conceptual side. Building a conceptual
framework was difficult for several reasons. For one, research in the orbital dynamics of
exoplanets, especially with machine learning, is a new and niche field. Much of the existing
literature comes from a fairly small group of researchers who have worked on such problems, so it
was not very clear what I should expect from my research and how it might be similar to or
different from other models. Furthermore, while my research suggested ideas that were supported by
other literature, my findings did not quite match what previous researchers had found in developing
their own models. And since orbital dynamics problems are so complicated, there are many
things one could examine in the research and analysis processes; I ended up looking at things
that are fairly unexplored in the machine learning for orbital dynamics niche, so understanding
the underlying physics in my work, and drawing conclusions from it, was new territory that I had
to navigate. I had to be very careful in my work and not overlook the small things that could
affect the physics behind the systems and my understanding and analysis of them.

a. What problems did you encounter, and how did you overcome them?

In addition to the above, there were some problems with the data and programming. Because of
the large amount of data I needed to process, my personal computer was less than ideal, and
due to Los Alamos National Laboratory’s policies, I was unable to get access to their
supercomputers. This also meant I couldn’t generate my own simulations (which are so
computationally expensive that it would have been impossible on my computer), which is part of
the reason why I opted to use my mentor’s Mordasini population. Another problem I faced was during
the development of my classifier: studies in the existing literature follow quite different machine
learning practices, and reconciling the properties of a ‘good’ classifier (e.g. a low learning rate,
high predicted probabilities for correct classifications, and low probabilities for misclassified
planets) with accuracy required careful balancing, further reading on machine learning, and
numerous rounds of refinement.
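One way to check the probability-based properties described above is to compare the classifier's confidence on correctly classified planets with its confidence on misclassified ones. The sketch below is hypothetical: the function name, the 0.5 decision threshold, and the label convention (1 = unstable) are assumptions, and the probabilities would typically come from column 1 of a scikit-learn `predict_proba()` output for a binary classifier with classes `[0, 1]`.

```python
from statistics import mean


def probability_diagnostics(y_true, p_unstable, threshold=0.5):
    """Compare confidence on correct vs. misclassified planets.

    y_true     -- true labels (1 = unstable, 0 = stable)
    p_unstable -- predicted probability of instability for each planet
    Returns overall accuracy and the mean probability assigned to the
    predicted class, separately for correct and wrong predictions.
    """
    correct, wrong = [], []
    for y, p in zip(y_true, p_unstable):
        pred = 1 if p >= threshold else 0
        # Probability the model gave to the class it actually predicted.
        confidence = p if pred == 1 else 1.0 - p
        (correct if pred == y else wrong).append(confidence)
    return {
        "accuracy": len(correct) / len(y_true),
        # Ideally high: confident when right.
        "mean_conf_correct": mean(correct) if correct else None,
        # Ideally close to 0.5: unsure when wrong.
        "mean_conf_wrong": mean(wrong) if wrong else None,
    }


print(probability_diagnostics([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6]))
```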

b. What did you learn from overcoming these problems?

Throughout my research, I learned a lot about the physics behind my work and the fields of
exoplanets and orbital dynamics, in addition to data analytics and machine learning. Because
these were all fairly new to me, there was a bit of a learning curve at first. But overcoming the
problems I faced throughout the process is what ultimately taught me the most about them. Of
course, I had support from my mentor, but I often worked to figure things out myself: what went
wrong, how to fix it, what to look at next, etc. It taught me a lot about the experimental, curious,
and resilient process that is research itself, and improved my critical thinking and problem-solving
skills. It has made me much more thoughtful and meticulous in the way I approach and
solve problems, and has given me a greater sense of research intuition.

5. If you were going to do this project again, are there any things you would do differently
the next time?

While I did use a fairly diverse planetary population, training and testing my classifier on
populations with different numbers of planets could have been a more comprehensive approach,
and more representative of the diversity of real exoplanetary systems. Furthermore, I would have
used additional metrics beyond overall accuracy to evaluate my classifier, such as precision and
recall rates. These are often useful metrics for machine learning methods, and using them would
have made it easier to contextualize the performance of my own classifier relative to existing
methods.
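Precision and recall are simple ratios of confusion-matrix counts. The sketch below assumes "unstable" is labeled 1 and treated as the positive class, which is a choice rather than something the source specifies; the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall, with `positive` (e.g. 1 = unstable)
    treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Of the planets flagged unstable, the fraction that really were.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the planets that really were unstable, the fraction flagged.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```

scikit-learn provides the same quantities as `accuracy_score`, `precision_score`, and `recall_score` in `sklearn.metrics`.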

6. Did working on this project give you any ideas for other projects?

The progression of this research project has naturally led to more inquiries that offer the potential
for other work. Notably, I could apply similar concepts to develop a classifier for circumbinary
systems, to better understand their dynamics and drivers of instability. This is an emerging field
of research, and is fairly unexplored so far; it offers a lot of potential to learn about a wider range
of exoplanetary systems and system architectures, as well as how planetary motion and orbital
dynamics vary between single and binary star systems.

7. How did COVID-19 affect the completion of your project?

Due to COVID-19, my entire research experience was virtual. While this did enable me to
work with my mentor (who is based at Los Alamos National Laboratory in New Mexico), it
made completing the project a little more challenging and limited my research to something
computational that could be done virtually.