Projects

DATA 602 Final Project: Time Series Analysis & Machine Learning

For the final project in my Business Analytics course, I worked with two data sets: 1) a weather data set containing weather information for the NY Botanical Gardens and 2) a credit risk data set. Details of both analyses follow, and more can be found in the presentation below.

For the time series analysis portion of my project, I analyzed and visualized air temperature, humidity, and solar radiation. I looked at weekly and daily trends and at how these variables relate to time of day and daylight, and I also built forecasts.
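The write-up doesn't include the analysis code. As a rough illustration of what "daily trends" and "relationships with time of day" mean computationally, the plain-Python sketch below averages timestamped readings by calendar date and by hour of day. The function names and the `(timestamp, value)` input format are hypothetical, not taken from the project.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical helpers: input is assumed to be a list of (datetime, float) pairs,
# e.g. hourly air temperature readings.

def daily_means(readings):
    """Average the readings for each calendar date (the daily trend)."""
    by_day = defaultdict(list)
    for ts, value in readings:
        by_day[ts.date()].append(value)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}

def hourly_profile(readings):
    """Average the readings for each hour of day across all dates (a typical daily cycle)."""
    by_hour = defaultdict(list)
    for ts, value in readings:
        by_hour[ts.hour].append(value)
    return {hour: mean(vals) for hour, vals in sorted(by_hour.items())}

if __name__ == "__main__":
    # Made-up readings: two days, one morning and one afternoon value each.
    readings = [
        (datetime(2020, 6, 1, 6), 18.0),
        (datetime(2020, 6, 1, 14), 28.0),
        (datetime(2020, 6, 2, 6), 20.0),
        (datetime(2020, 6, 2, 14), 30.0),
    ]
    print(daily_means(readings))
    print(hourly_profile(readings))  # {6: 19.0, 14: 29.0}
```

Weekly trends follow the same pattern, grouping on a week key such as `ts.isocalendar()[:2]` (ISO year and week number) instead of the date.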

For the machine learning portion of my project, I used linear regression to predict loan amount and several classification techniques to predict loan default status (whether or not the customer defaulted on their loan). The classifiers included Logistic Regression, Naïve Bayes, Decision Trees, Support Vector Machines (SVM), and K-Nearest Neighbors, and I also applied the clustering methods K-Means and Hierarchical Clustering. I used Pipelines, Principal Component Analysis (PCA), and Feature Selection to improve my models. My best model used the SVM algorithm and achieved an accuracy of 91%.
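A pipeline chains preprocessing steps and a model so that every step is fit on the training data only and then applied unchanged to the test data, which avoids leaking test-set statistics into the model. The sketch below shows that idea in plain Python under stated assumptions: the `Pipeline`, `Standardizer`, and `NearestCentroid` classes are hypothetical minimal stand-ins (loosely mirroring the idea behind scikit-learn's `Pipeline`), a nearest-centroid classifier replaces the SVM used in the project, and the toy data is made up.

```python
from math import dist
from statistics import mean, pstdev

class Standardizer:
    """Z-score scaling; learns column means and std devs from training data only."""
    def fit(self, X):
        cols = list(zip(*X))
        self.means = [mean(c) for c in cols]
        # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
        self.stds = [pstdev(c) or 1.0 for c in cols]
        return self

    def transform(self, X):
        return [[(v - m) / s for v, m, s in zip(row, self.means, self.stds)] for row in X]

class NearestCentroid:
    """Stand-in classifier (not SVM): predicts the class whose mean vector is closest."""
    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids = {lbl: [mean(c) for c in zip(*rows)] for lbl, rows in groups.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda lbl: dist(row, self.centroids[lbl])) for row in X]

class Pipeline:
    """Hypothetical two-step pipeline: one transformer followed by one classifier."""
    def __init__(self, transformer, classifier):
        self.transformer = transformer
        self.classifier = classifier

    def fit(self, X, y):
        # Both steps see only the training data.
        Xt = self.transformer.fit(X).transform(X)
        self.classifier.fit(Xt, y)
        return self

    def predict(self, X):
        # New data is scaled with statistics learned during fit, not refit.
        return self.classifier.predict(self.transformer.transform(X))

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In a fuller version, PCA or a feature-selection step would sit between the scaler and the classifier as additional transformers; this sketch supports only one transformer to stay short.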

An Evaluation of Recent Large Disasters in the U.S. and FEMA’s Support of Those Affected

For the final project in my Python Programming course, I used an open FEMA data set that contains information on Individual Assistance Housing Registrants from large disasters in the United States. I used Python in Jupyter Notebook to clean the data set, which contained over 6 million rows. I also used statistical and exploratory analysis techniques to extract insights about the populations affected, disaster severity, and FEMA's financial support of those affected by large disasters. More details on this project can be found in the presentation below.

Predicting Ramus Height in Kangaroo Skulls Using Regression

For the final project in my Applied Regression Analysis course, I used a data set that contained skull measurements for different species of kangaroos. My goal was to predict the ramus height of the skull. I built six models, and the best one achieved an adjusted R-squared of 93.6%. More details on this project can be found in the presentation below.
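Adjusted R-squared is the model-comparison metric quoted above. It starts from R-squared, 1 - SS_res / SS_tot, and penalizes it for the number of predictors p relative to the number of observations n: adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1). The small sketch below computes both; the function names and example numbers are illustrative only and are not from the kangaroo analysis.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for p predictors fit on n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

if __name__ == "__main__":
    # Hypothetical model: R^2 = 0.94 with 5 predictors on 100 observations.
    print(round(adjusted_r_squared(0.94, 100, 5), 4))  # 0.9368
```

Because the penalty grows with p, adding a predictor that barely improves the fit can lower adjusted R-squared, which makes it a fairer basis than plain R-squared for comparing models with different numbers of terms.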

Using Classification to Predict Kangaroo Characteristics

For the final project in my Statistical Machine Learning course, I used a data set that contained skull measurements for different species of kangaroos. My goal was to predict two things from these measurements: kangaroo sex and kangaroo species. I used four classification methods: logistic regression, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and k-nearest neighbors (KNN). The best testing error rate was 20.6% for sex and 14.7% for species. More details on this project can be found in the presentation below.
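KNN, one of the four methods above, classifies a new observation by a majority vote among its k closest training observations, and the testing error rate is simply the fraction of held-out observations that are misclassified. The sketch below shows both in plain Python. It assumes Euclidean distance, and the two-feature toy data and species labels "A"/"B" are made up for illustration; they are not the kangaroo measurements.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def error_rate(y_true, y_pred):
    """Testing error rate: fraction of held-out observations that were misclassified."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

if __name__ == "__main__":
    # Hypothetical training data: two measurements per skull, two species.
    train_X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
    train_y = ["A", "A", "A", "B", "B", "B"]
    test_X = [[1.1, 1.0], [5.0, 4.9], [3.2, 3.0]]
    test_y = ["A", "B", "A"]
    preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
    print(preds, error_rate(test_y, preds))
```

In practice k is tuned (for example by cross-validation), and features are usually standardized first so that measurements on larger scales don't dominate the distance.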
