A small collection of work and project examples to peruse.
Expanding on the file API templates, this is a fully fleshed-out file processing pipeline for AWS. It still uses a containerized Python Flask application with Postgres integration for record management, but is expanded to use AWS SQS queues to run Lambda processing jobs on the files that come through. In addition, there is a full Terraform deployment setup that uses AWS Secrets Manager. I've also included locally runnable bash scripts that automate running the Terraform deployment to AWS, which could easily be converted to GitHub Actions for a CI/CD pipeline.
Languages:
Python
Utilizes:
Docker
Postgres
Redis / Celery
AWS S3, SQS and Lambda
Terraform
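To give a concrete sense of the SQS-to-Lambda handoff described above, here is a minimal Python sketch of a Lambda handler that pulls S3 object keys out of an SQS message batch and updates a Postgres record. The bucket field, table name, and DATABASE_URL variable are hypothetical stand-ins; the actual pipeline wires credentials through Secrets Manager and Terraform.

import json
import os

import boto3
import psycopg2  # assumes a psycopg2 layer is attached to the Lambda

s3 = boto3.client("s3")

def handler(event, context):
    """Process each file referenced in the incoming SQS batch."""
    conn = psycopg2.connect(os.environ["DATABASE_URL"])  # hypothetical env var
    try:
        for record in event["Records"]:          # standard SQS event shape
            body = json.loads(record["body"])    # e.g. {"bucket": "...", "key": "..."}
            obj = s3.get_object(Bucket=body["bucket"], Key=body["key"])
            size = len(obj["Body"].read())       # stand-in for the real processing step

            with conn.cursor() as cur:
                cur.execute(
                    "UPDATE file_records SET status = %s, size_bytes = %s WHERE s3_key = %s",
                    ("processed", size, body["key"]),
                )
        conn.commit()
    finally:
        conn.close()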
Every business has to work with files, moving them around and deriving insight from them. They're also a good foundation for more interesting work with NLP and text analytics. I've built two general-purpose containerized file management API templates in Python, one using Django and the other using Flask. Both can store files in either AWS or Azure depending on the request, and both use Postgres for record management. To make the record management asynchronous, they both use Celery, with the Flask app also taking advantage of Redis. Both are set up to be easily expanded with richer file metadata, or with processing functionality like text extraction or file manipulation.
Languages:
Python
Utilizes:
Django / Flask
Docker
Postgres
AWS S3
Azure Blob
Redis / Celery
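For flavor, here is a rough Flask-side sketch of the request-driven storage choice and the Celery hand-off; the endpoint shape, environment variables, and container name are hypothetical, not the template's actual code.

import os
import uuid

import boto3
from azure.storage.blob import BlobServiceClient
from celery import Celery
from flask import Flask, jsonify, request

app = Flask(__name__)
celery = Celery(__name__, broker=os.environ.get("REDIS_URL", "redis://localhost:6379/0"))

@celery.task
def record_upload(file_id, filename, provider):
    """Async Postgres insert for the file record would live here."""
    ...

@app.route("/files", methods=["POST"])
def upload_file():
    uploaded = request.files["file"]
    provider = request.form.get("provider", "aws")   # "aws" or "azure", chosen per request
    file_id = str(uuid.uuid4())

    if provider == "aws":
        boto3.client("s3").upload_fileobj(uploaded, os.environ["S3_BUCKET"], file_id)
    else:
        service = BlobServiceClient.from_connection_string(os.environ["AZURE_CONN_STR"])
        service.get_blob_client(container="files", blob=file_id).upload_blob(uploaded)

    record_upload.delay(file_id, uploaded.filename, provider)  # Celery + Redis handle the DB record
    return jsonify({"id": file_id, "provider": provider}), 201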
Another general-purpose file management API template, but this one is implemented in C#/.NET with clean architecture principles. Like the Python versions, it is containerized, stores files in either AWS or Azure depending on the request, and uses Postgres for async, queue-based record management. It's also set up to be easily expanded with more metadata or more advanced file processing.
Languages:
C#
Utilizes:
.NET 8
Docker
Postgres / Entity Framework
AWS S3
Azure Blob
Like Bruce Lee said: "I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times." So to get some practice with Go, I made yet another general-purpose file management API template, this time focusing on clean architecture principles that fit a more standard Go codebase. Like the other versions, it is containerized, but I chose to use only AWS for file storage to keep things a bit simpler. It still uses Postgres for record management and can be expanded with more metadata, etc., but I really just wanted to build something with Go.
Languages:
Golang
Utilizes:
Docker
Postgres
AWS S3
Qualtrics is one of the most widely used academic/scientific survey platforms, but one thing it's missing is the ability to track user attention to the current task. This project demonstrates how to integrate JavaScript within Qualtrics to monitor user activity and track task engagement. The script detects when a participant leaves or re-enters the browser window and records these events in real time. All captured data is stored in a JSON object and saved as an embedded variable within the Qualtrics platform itself, so the data comes out as part of the survey results. I've also provided base scripts for unpacking the JSON for analysis in both Python and R.
Languages:
JavaScript
Utilizes:
Qualtrics
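As a taste of the analysis side, here is a rough Python sketch of unpacking the embedded JSON from a survey export with pandas; the focus_events column name and event fields are hypothetical stand-ins for whatever the embedded variable is called in the actual survey.

import json

import pandas as pd

# Survey export with one row per respondent; the embedded-data column holds
# the JSON string that the in-survey JavaScript built up.
survey = pd.read_csv("qualtrics_export.csv")

def unpack_events(raw):
    """Turn the embedded JSON string into a list of event dicts."""
    if pd.isna(raw) or not raw:
        return []
    return json.loads(raw)

events = (
    survey[["ResponseId", "focus_events"]]                      # hypothetical column name
    .assign(events=lambda df: df["focus_events"].map(unpack_events))
    .explode("events")
    .dropna(subset=["events"])
)

# One row per focus/blur event, e.g. {"type": "blur", "timestamp": 1690000000}
events = pd.concat(
    [events[["ResponseId"]].reset_index(drop=True),
     pd.json_normalize(events["events"].tolist())],
    axis=1,
)
print(events.head())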
The goal of this project was to deploy a recommender system app online that lets a user choose between a basic filter-based approach and a statistical approach built on user data. The app uses the classic public MovieLens dataset, which has been a staple of recommendation research for a long time. Since it's built with R's Shiny library, it has a pretty boilerplate appearance, but I'm happy with the final result: the algorithms run as expected and nothing breaks :)
Languages:
R
Utilizes:
Shiny App
UBCF and IBCF recommendation methods
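The app itself is R/Shiny, but as a quick illustration of what IBCF boils down to (item-item cosine similarity on a ratings matrix, then similarity-weighted averaging), here is a toy Python version; the ratings below are made up and this is not the app's code.

import numpy as np
import pandas as pd

# Toy user x movie ratings matrix (NaN = unrated); the real data is MovieLens.
ratings = pd.DataFrame(
    {"Movie A": [5, 4, np.nan, 1],
     "Movie B": [4, np.nan, 2, 1],
     "Movie C": [np.nan, 5, 4, np.nan]},
    index=["u1", "u2", "u3", "u4"],
)

# Item-item cosine similarity on mean-centered ratings (missing values treated as 0 after centering).
centered = ratings.sub(ratings.mean(axis=0), axis=1).fillna(0.0)
norms = np.linalg.norm(centered.values, axis=0)
sim = pd.DataFrame(
    centered.values.T @ centered.values / np.outer(norms, norms),
    index=ratings.columns, columns=ratings.columns,
)

def predict(user, movie, k=2):
    """IBCF prediction: similarity-weighted average of the user's other ratings."""
    rated = ratings.loc[user].dropna().drop(movie, errors="ignore")
    neighbors = sim.loc[movie, rated.index].nlargest(k)
    if neighbors.abs().sum() == 0:
        return ratings[movie].mean()
    return float((neighbors * rated[neighbors.index]).sum() / neighbors.abs().sum())

print(predict("u3", "Movie A"))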
Accumulates and analyzes news for a specified company via multiple APIs and creates relevancy-weighted news sentiment scores with advanced NLP. This was one of my favorite projects to build, as it brings together a lot of different pieces.
Languages:
Python
Utilizes:
Multiple free news APIs to accumulate news
BM25 algorithm adjusted with Query Expansion
Retrained neural network for financial sentiment analysis
Plotly Dash app
Check out the repo for more information. There are multiple ERDs to help understand the codebase and methodology, and a link to a usage tutorial.
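As a rough illustration of the relevance-weighting piece (not the project's exact pipeline), here's how BM25 scoring with a naively expanded query might look using the rank_bm25 package; the articles and expansion terms below are made up.

from rank_bm25 import BM25Okapi

# Toy corpus of news snippets about a hypothetical company.
articles = [
    "Acme Corp beats earnings expectations as revenue climbs",
    "Analysts downgrade Acme Corp on supply chain concerns",
    "Local bakery wins award for best sourdough in town",
]
tokenized = [doc.lower().split() for doc in articles]
bm25 = BM25Okapi(tokenized)

# Query expansion: append related terms to the base query before scoring.
base_query = ["acme", "corp", "earnings"]
expansion = ["revenue", "profit", "guidance"]        # would come from an expansion step
scores = bm25.get_scores(base_query + expansion)

# Relevance scores can then weight per-article sentiment before aggregation.
for article, score in sorted(zip(articles, scores), key=lambda x: -x[1]):
    print(f"{score:6.3f}  {article}")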
Although I have done tons of coding assignments, including them all here would be cumbersome, redundant, and frankly super boring. However, this one from CS 598 Practical Statistical Learning was pretty challenging, and it's a good showcase of some coding in R. The HTML from the notebook is embedded in the site, so if you want to check it out and judge me on my loop structure or bad R vectorization, feel free.
Languages:
R
Shows:
Coding the EM algorithm for Gaussian mixtures
Mathematical Derivations
Coding the Baum-Welch algorithm for Hidden Markov Models
Coding the Viterbi algorithm
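The assignment itself is in R, but to give a sense of what the Viterbi piece involves, here's a minimal NumPy version of the algorithm in log space with toy parameters (a generic sketch, not the assignment code):

import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state path for an HMM, computed in log space.

    obs: observation indices, pi: initial probs (K,), A: transitions (K, K), B: emissions (K, M).
    """
    K, T = len(pi), len(obs)
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)

    delta = np.zeros((T, K))           # best log-prob of any path ending in state k at time t
    psi = np.zeros((T, K), dtype=int)  # argmax backpointers
    delta[0] = log_pi + log_B[:, obs[0]]

    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A        # scores[i, j]: come from i, move to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]

    # Backtrack from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Toy 2-state, 2-symbol HMM.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 1, 1, 0], pi, A, B))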
For the final project of CS 484 Parallel Programming, the task was to code a repeating histogram sort algorithm in parallel using different paradigms on the University of Illinois's campus cluster. This was a fun way to test my coding chops, because if there's one thing sure to add a challenge to coding or to an algorithm, it's having it run efficiently in parallel.
Languages:
C++
Utilizes:
MPI message passing protocols
Slurm cluster processing
If you're bored enough to check out the repo, see the solution.cpp file for the actual algorithm.
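The real solution is C++ with raw MPI calls, but the core idea (sample the data, pick splitters so buckets come out balanced, exchange buckets all-to-all, then sort locally) fits in a short mpi4py sketch; this is an illustration of the approach, not solution.cpp.

# Run with e.g.: mpiexec -n 4 python histogram_sort_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank starts with its own chunk of unsorted keys.
rng = np.random.default_rng(seed=rank)
local = rng.integers(0, 1_000_000, size=100_000)

# 1. Sample locally, gather samples everywhere, and pick splitters so each
#    bucket gets a roughly equal share (the "histogram" step).
samples = np.concatenate(comm.allgather(rng.choice(local, 64)))
splitters = np.quantile(samples, np.linspace(0, 1, size + 1)[1:-1])

# 2. Partition local data into one bucket per destination rank.
dest = np.searchsorted(splitters, local)
buckets = [local[dest == r] for r in range(size)]

# 3. All-to-all exchange, then sort what each rank received.
received = comm.alltoall(buckets)
mine = np.sort(np.concatenate(received))

print(f"rank {rank}: {len(mine)} keys, range [{mine[0]}, {mine[-1]}]")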
To get extra practice building neural networks from scratch, I decided to participate in eBay's 2022 University Machine Learning Competition to see how I stacked up. The goal of the competition was to build a Named Entity Recognition model to label a massive dataset of handbag listings. I obviously didn't win anything, but I don't think I did half bad training some models in my spare time. All in all, it was good practice!
Benchmark F1 Score: 0.800
Best F1 Score: 0.8488
Languages:
Python
Models tried with various embedding strategies and architectures:
Bi-Directional LSTM
Transformer (scratch)
Transformer (re-trained DialoGPT)
Utilizes:
Tensorflow and Keras
LSTM and Transformer network architectures
Some examples are in the repo (not all, because they're huge and who actually cares)
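For a sense of the Bi-directional LSTM variant, here's a minimal Keras token-classification sketch; the vocabulary size, tag count, and sequence length are placeholders, and the actual competition models used different embeddings and far more data.

import tensorflow as tf

# Placeholder sizes; the real models used a much larger vocabulary and varied embeddings.
VOCAB_SIZE, MAX_LEN, N_TAGS = 20_000, 64, 10

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_LEN,), dtype="int32"),
    tf.keras.layers.Embedding(VOCAB_SIZE, 128, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(N_TAGS, activation="softmax")),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",   # per-token integer tag labels
    metrics=["accuracy"],
)
model.summary()

# Training would look like: model.fit(X_tokens, y_tags, validation_split=0.1, epochs=5)
# where X_tokens is (n_samples, MAX_LEN) token ids and y_tags is (n_samples, MAX_LEN) tag ids.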
For the culmination of the IBM Data Science certification, I took a value-analysis approach to neighborhood value in the Chicagoland area for new home buyers. To add an extra challenge (and because I was curious), I wanted to see whether the amount of tree cover had any effect on clustering with regard to housing prices and neighborhood value.
Languages:
Python
Utilizes:
Clustering analysis
Geospatial polygon mapping
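A rough sketch of how the clustering and mapping pieces fit together in Python (the file names and feature columns here are hypothetical; the real notebook pulls actual housing, venue, and tree-cover data):

import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-neighborhood features joined to neighborhood polygons.
neighborhoods = gpd.read_file("chicago_neighborhoods.geojson")   # placeholder file
features = pd.read_csv("neighborhood_features.csv")              # placeholder file
df = neighborhoods.merge(features, on="neighborhood")

cols = ["median_home_price", "venue_density", "tree_cover_pct"]  # hypothetical columns
X = StandardScaler().fit_transform(df[cols])

# Cluster neighborhoods, then compare results with and without the tree-cover feature.
df["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Choropleth of cluster membership over the neighborhood polygons.
df.plot(column="cluster", categorical=True, legend=True, figsize=(8, 10))
plt.show()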
Designed to be used as an extension to Excel, this takes an assembly-line approach to routine data processing to create and validate large import files. I originally created the first prototype to help less technically savvy colleagues work with data faster, without needing to learn complex Excel functions or coding. It grew into a decent Excel tool for speeding up data manipulation, validation, and importing while keeping human operators in the loop. No repo for this one, mostly because I couldn't be bothered to clean up the VBA code to put it in a repo.
Languages:
VBA
Utilizes:
Excel with VBA
User defined templates for routine file processing
Assembly line procedure for quality checking and import creation
Creating custom Excel report generators and script runners with VBA is a bit of a passion of mine.
The baked-in integration with Excel and the simple UserForm-to-code workflow allow for incredibly swift development of tools to help with a plethora of routine data tasks.
Whether it's reporting department metrics or calculating the grades for my wife's Psych 400 class, I say if you can put it into a spreadsheet, you might as well code a reusable solution.
While you're at it, making it look ridiculous is always a bonus.