Ryu Sonoda

Data Scientist, ML Engineer, and Software Engineer

Education

Columbia University, MS in Data Science

Grinnell College, BA in Computer Science with honors

I am currently pursuing a Master of Science in Data Science. I enjoy working with organizations that are committed to using their data effectively, from defining key metrics and designing data collection methods to data preparation, in-depth analysis, and the application of machine learning techniques. My ultimate goal is to present the resulting insights clearly and persuasively so that they support informed decision-making.

Portfolio

《View the code and paper》

Project Title 

Brazilian Jiu-Jitsu Image Recognition

In this research project, we compared several machine learning approaches, including Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and transfer learning. Our objective was to classify Brazilian Jiu-Jitsu images into 18 distinct positions, using a dataset of 120,279 labeled images. Our best models reached accuracies of up to 99.5%.

《View the paper》

Project Title 

Effect of Driving Alone on Poor Mental Health

In this research project, my group investigated the impact of long commutes on mental health using extra sums of squares analysis. Through model comparison, we developed a reduced model with an adjusted R² of 76.2% and identified significant predictors of poor mental health. Our findings suggest a potential connection between long commutes and poorer mental health after accounting for other factors. The project won an honorable mention in the Undergraduate Statistics Class Project Competition (USCLAP).

《View the code and paper》

Project Title 

Predicting Song Popularity

In this project, my group investigated the factors influencing song popularity and developed a predictive model using machine learning techniques. We explored several regression models, including ridge regression, lasso regression, and random forest regression, as well as principal component analysis. We identified associations between song attributes and popularity levels, along with a notable correlation between energy and loudness. Random forest proved to be the most effective model.

《View the app》

U.S. Health Map Dashboard

Users can explore trends in dozens of county-level health-related variables across the U.S., such as % of adults with obesity and % excessive drinking, and run a simple linear regression between variables.

Tech stack: R, ggplot, R Shiny

《View the app》

YouTube Comment Analysis Dashboard

Users can analyze YouTube video comments: categorize them into 8 emotions, visualize them with word clouds, and assess sentiment with a BERT-based neural network model.

Tech stack: Python, PyTorch, matplotlib, Flask, YouTube API

《View the app》

Course Assessment Dashboard

Users can explore the connections between classes and learning outcomes through interactive visualizations such as Sankey diagrams and stacked bar charts.

Tech stack: R, ggplot, R Shiny, JavaScript

《View the code》

Chinese Flashcard Generator

Users can generate multiple sets of Chinese flashcards from a typed paragraph using NLP, with already-memorized vocabulary excluded.

Tech stack: Python, tkinter, pandas, jieba

Work Experience

Research Assistant in DitecT Lab (Columbia University: Sep 2023 - present)

Teaching Assistant (Grinnell College: Jan 2023 - May 2023)

Software Engineer Intern (VoicePing: June 2022 - Aug 2022)

Leadership Experience

Financial Development Director (HLAB: Nov 2020 - Oct 2021)

President of Japanese Cultural Association (Grinnell College: Aug 2020 - May 2021)

My Skills

SQL

I am proficient in SQL Server and MySQL, and skilled in writing complex queries and stored procedures to extract and manipulate data efficiently.

Python

I have 3 years of experience in Python, creating games and desktop apps such as a Chinese flashcard generator. I am familiar with major libraries such as pandas and Flask.

R

I have been using R for a year, mainly for statistical analysis and for building interactive visualization apps. I am familiar with several libraries such as plotly, ggplot, and R Shiny.

Software Engineering

I gained hands-on software engineering experience during my internship, where I developed a web application for VR trips. My strong background in data structures and computer architecture helps me build efficient applications.

Machine Learning

I am familiar with various machine learning algorithms and with libraries such as scikit-learn, Matplotlib, and PyTorch. My favorite project was Brazilian Jiu-Jitsu position image classification, where the fine-tuned model achieved 99.5% accuracy on the test data.

Statistical Analysis

Statistical analysis is my passion. The topics I have worked on range from economic data and health data to sports data. One of my projects, "Effect of Driving Alone on Poor Mental Health," received an honorable mention in USCLAP.

Blog

《Read》

R package ecosystem

Explored the factors influencing package popularity and characteristics within the R package ecosystem.

《Read》

College tuition & acceptance rate

Analyzed two decades of trends in college tuition and acceptance rates in the U.S. through various visualizations.

Coming soon...

Interested in hiring me? Let's have a chat!

CONTACT ME