Recent Projects
US Energy Production by State
We were tasked with building an effective data visualization that tells a story and lets a reader explore trends or patterns. I built the project on D3.js, with lean HTML and CSS for the formatting. I used a dataset on energy production by state from the US Energy Information Administration site. After spending some time with the data, it became clear that energy production alone didn't give much perspective: larger states like California completely overshadowed smaller states because the gap in their total production is so great. Comparing states sensibly required some kind of ratio. I found a dataset showing US population by state for the same time period, 1990 to 2014, on the US Census website, and a dataset on the US Dept. of Commerce Bureau of Economic Analysis website showing GDP by state for the same period. I joined both of these datasets to the energy dataset by state and year, so that states can be compared on energy production per capita or per dollar of GDP. The map data was in GeoJSON format and was drawn using D3 and SVG. I used a timer and some user controls to provide animation and interactivity with the data. The source code is posted here.
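The join described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual code (the visualization itself was written in D3.js): the field names, the helper functions, and the sample values are all hypothetical placeholders, not real EIA, Census, or BEA figures.

```python
# Hypothetical sample rows; the numbers are placeholders, not real data.
energy = [
    {"state": "CA", "year": 1990, "energy": 3000.0},
    {"state": "WY", "year": 1990, "energy": 8000.0},
    {"state": "TX", "year": 1990, "energy": 9000.0},  # no matching population row
]
population = [
    {"state": "CA", "year": 1990, "population": 30.0},
    {"state": "WY", "year": 1990, "population": 0.5},
]
gdp = [
    {"state": "CA", "year": 1990, "gdp": 600.0},
    {"state": "WY", "year": 1990, "gdp": 16.0},
    {"state": "TX", "year": 1990, "gdp": 400.0},
]


def index_by_state_year(rows, value_key):
    """Build a lookup table keyed on (state, year)."""
    return {(row["state"], row["year"]): row[value_key] for row in rows}


def join_ratios(energy_rows, population_rows, gdp_rows):
    """Inner-join the three datasets on (state, year) and compute both ratios."""
    pop_by_key = index_by_state_year(population_rows, "population")
    gdp_by_key = index_by_state_year(gdp_rows, "gdp")
    joined = []
    for row in energy_rows:
        key = (row["state"], row["year"])
        # Skip rows that are missing from either of the other datasets.
        if key not in pop_by_key or key not in gdp_by_key:
            continue
        joined.append({
            "state": row["state"],
            "year": row["year"],
            "energy_per_capita": row["energy"] / pop_by_key[key],
            "energy_per_gdp_dollar": row["energy"] / gdp_by_key[key],
        })
    return joined


ratios = join_ratios(energy, population, gdp)
```

With the placeholder values, the smaller state produces less than the largest one in absolute terms but far more per capita, which is the kind of comparison the raw totals hide.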
Predicting NYC Subway Ridership
The purpose of this project was to analyze a dataset containing timestamped readings from the turnstiles of the NYC Subway system, combined with Weather Underground data from the same time period, to see whether there is a significant difference in subway ridership between rainy and non-rainy days. The readings are stored in four-hour bins, with total entry and exit counts per bin, per Unit ID, which is effectively the same as a station ID. The dataset includes several other features related to the station and weather conditions, including fog and wind conditions, and the latitude and longitude of both the station and the weather reading. Another goal was to explore the data for other features that correlate with subway ridership and to report any other interesting findings.
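One way to test for a difference between rainy and non-rainy ridership is a Mann-Whitney U test, sketched below with only the standard library. The choice of test is an assumption made for illustration; the text above does not say which test the project used. The row layout (`unit`, `entries`, `rain`) and the counts are hypothetical, and the normal approximation shown here ignores tie corrections and is not reliable for samples this small.

```python
import math
from statistics import NormalDist

# Hypothetical four-hour-bin rows; the counts are placeholders, not real data.
rows = [
    {"unit": "R001", "entries": 10, "rain": 1},
    {"unit": "R001", "entries": 12, "rain": 1},
    {"unit": "R002", "entries": 14, "rain": 1},
    {"unit": "R001", "entries": 11, "rain": 0},
    {"unit": "R002", "entries": 15, "rain": 0},
    {"unit": "R002", "entries": 16, "rain": 0},
    {"unit": "R003", "entries": 18, "rain": 0},
]


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, z score, two-sided p) via the normal approximation."""
    # U counts the pairs where a value from sample_a exceeds one from sample_b;
    # ties count as half a pair.
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    n_a, n_b = len(sample_a), len(sample_b)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return u, z, p_two_sided


rainy = [row["entries"] for row in rows if row["rain"] == 1]
dry = [row["entries"] for row in rows if row["rain"] == 0]
u_stat, z_score, p_value = mann_whitney_u(rainy, dry)
```

A rank-based test is a reasonable default here because entry counts per bin tend to be heavily skewed, but a t-test on daily totals would be an equally valid starting point.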
Predicting the Compressive Strength of Concrete
This report is an exploratory data analysis of a dataset provided by the University of California, Irvine Machine Learning Repository. The dataset contains 1030 observations of 9 quantitative variables. It records the compressive strength of concrete in MPa (megapascals) along with 8 input variables: the amounts of the components in kg and the age of the sample in days. The readme for the dataset can be downloaded here. This study aims to define what a relatively strong concrete is in terms of compressive strength, and then to see whether the compressive strength of concrete can be predicted from the relative proportions of its ingredients and its age. I selected this dataset not only because it met the requirements for the Udacity Exploratory Data Analysis course final project, but also because, in a former job, I advised on computer datalogger setup for a chemical engineer who performed a similar experiment. That experiment tested the compressive strength of concrete with varying quantities of synthetic and organic fiber additives. I did a little research and found the results of that specific study here. I worked for a few months at Buckeye Cellulose during the product development cycle of what became known as the UltraFiber 500 concrete reinforcement product. This assignment was one of the specific moments in my IT career that led me to pursue a path more related to science and data analytics.
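Since the analysis works with the relative proportions of the ingredients rather than their raw amounts, a first step is to normalize each mix. The sketch below shows that step only; the component names, the helper function, and the sample mix are assumptions made for illustration and are not taken from the report.

```python
# Assumed component names for one mix; the amounts (in kg) are placeholders.
COMPONENTS = (
    "cement", "slag", "fly_ash", "water",
    "superplasticizer", "coarse_aggregate", "fine_aggregate",
)

sample_mix = {
    "cement": 300.0,
    "slag": 0.0,
    "fly_ash": 0.0,
    "water": 150.0,
    "superplasticizer": 0.0,
    "coarse_aggregate": 450.0,
    "fine_aggregate": 300.0,
    "age_days": 28,
}


def to_proportions(mix):
    """Convert component amounts to fractions of the total mix; age is kept as-is."""
    total = sum(mix[name] for name in COMPONENTS)
    features = {name: mix[name] / total for name in COMPONENTS}
    features["age_days"] = mix["age_days"]
    return features


features = to_proportions(sample_mix)
```

The resulting fractions sum to one for every observation, so mixes of different batch sizes become directly comparable before any model is fitted.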
Identification of Flower Species Using PyTorch
The goal of this project was to build an image classifier capable of predicting the species of a flower from an input image. For input data I used a dataset of labeled images covering 102 flower species from the University of Oxford, which is available here. I transformed and normalized the image data using the Torchvision library that accompanies the PyTorch framework, and built a custom image classifier based on a pre-built image classifier provided by the Torchvision project. I added a custom classifier output layer and trained it on the flower species in this dataset using supervised learning in PyTorch on an NVIDIA GPU. Finally, the model was used to predict the species of flowers in a held-out test dataset, with an accuracy of approximately 80%.