Projects
School Projects
Predicting Baseball Game Outcomes
This paper builds a regression model to predict the winners of baseball games in the 2018 MLB season. The model draws on two separate datasets: baseball statistics from baseball-reference.com and weather data from the Global Historical Climatology Network. The paper first presents initial results from exploring the combined dataset. The regression model is then fit and evaluated recursively: for each day of the season, it is trained on the previous games before predicting that day's games. The model had a predictive power of 55.77%, which is better than a coin flip. However, it did not outperform simply choosing the team with the higher win percentage or Pythagorean score.
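For context, the Pythagorean score baseline mentioned above can be sketched as follows. This is a minimal illustration, assuming the classic Bill James form with an exponent of 2; the function names, the tie-breaking rule, and the sample run totals are hypothetical and are not taken from the paper.

```python
def pythagorean_score(runs_scored: float, runs_allowed: float,
                      exponent: float = 2.0) -> float:
    """Expected win fraction from runs scored and runs allowed.

    Assumes the classic form RS^k / (RS^k + RA^k) with k = 2.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        return 0.5  # no runs recorded yet; treat as a coin flip
    return rs / (rs + ra)


def pick_winner(home: tuple, away: tuple) -> str:
    """Pick the team with the higher Pythagorean score.

    Each argument is (runs_scored, runs_allowed) to date.
    Ties go to the home team, which is an arbitrary choice for this sketch.
    """
    home_score = pythagorean_score(*home)
    away_score = pythagorean_score(*away)
    return "home" if home_score >= away_score else "away"


# Hypothetical season-to-date totals, for illustration only.
print(pick_winner(home=(800, 700), away=(700, 800)))
```

A baseline like this needs only each team's cumulative runs before the game, so it can be evaluated day by day in the same way as the regression model.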
Modeling the Effectiveness of British Columbia Fire Control
Forest fires are destructive, costly events that can decimate towns and destroy roads. Understanding and developing best practices to minimize their impact on communities and infrastructure is of paramount importance given the increasing forest fire potential driven by climate change[1]. Using the British Columbia Wildland Fire Management Strategy (WFMS)[2] as a guideline, we created a model of best-practice fire risk reduction techniques. The model evaluated multiple strategies under a range of budget constraints to identify the most effective combination of strategies for three different wind scenarios.
Detecting Shocking Events on Twitter
I looked into "shocking" events on Twitter to analyze which types of events are talked about more than usual for a period of time. Analyzing this phenomenon can give insight into the cycle of information on social media platforms and the collective "memory" of certain events. My aims were to characterize the most shocking words of 2018 and then extract the stories surrounding the sudden rise of a given word. The data come from the daily rank plots for the top 100,000 words, curated by Josh Minot from the Twitter Decahose.
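One way to make "sudden rise" concrete is to score each word by its largest one-day improvement in rank. The sketch below is only an assumption about how such a score could be computed and is not the method used in the project; the function names, the log10 scaling, and the sample ranks are hypothetical.

```python
import math


def biggest_rank_jump(ranks):
    """Return (day_index, jump) for the largest one-day rise in rank.

    ranks: list of daily ranks for one word (1 = most used).
    jump is measured as log10(previous rank) - log10(current rank),
    so moving from rank 48000 to rank 120 scores higher than 300 to 120.
    Returns (None, 0.0) if the word never rises.
    """
    best_day, best_jump = None, 0.0
    for day in range(1, len(ranks)):
        jump = math.log10(ranks[day - 1]) - math.log10(ranks[day])
        if jump > best_jump:
            best_day, best_jump = day, jump
    return best_day, best_jump


def most_shocking(rank_series, top_n=3):
    """Rank words by their largest one-day jump, largest first."""
    scored = [(word, *biggest_rank_jump(r)) for word, r in rank_series.items()]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_n]


# Made-up daily ranks, for illustration only.
sample = {
    "steady": [100, 95, 98, 97],
    "spike": [50000, 48000, 120, 300],
}
print(most_shocking(sample, top_n=1))
```

The day index returned for each word gives a starting point for looking up the story behind its rise.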