About me
Hui (Henry) Chen
Love container technologies, ML/DL, Big Data, AWS, and anything related to that!

I enjoy working at the intersection of software and data, applying my knowledge and skills to make an impact on real-world problems. Machine/deep learning and the cloud are awesome, and I would love to learn more about them!

😅 Some fun facts about me: I was born in China but grew up in South Africa. After high school, I came to the US on my own to pursue higher education as a first-generation college student. In my free time during college, I worked as a laundry general assistant, construction worker, after-school tutor, and open-source contributor.

Skills
Java
Python
R
PHP
MySQL
AWS EC2
TensorFlow
scikit-learn
Flask
Apache Spark
Apache Hadoop
Selenium
Postman
Git
Firebase/Firestore
MongoDB
pandas
Matplotlib
Plotly
seaborn
Shiny
ggplot2
Apache Superset
Keras
NumPy
Docker
Node.js
RapidMiner Studio
PyTest
Shell
Streamlit
Linux OS
Jinja2
HTML5
CSS3
JavaScript
spaCy
NLTK
MapReduce
Education
New York Institute of Technology
Master of Science - Data Science
Sep 2021 – May 2022
CGPA: 3.96/4.00

Courses:
Data Visualization, Computational Statistics, Optimization, Big Data Analytics, Machine Learning, Deep Learning

Involvement:
Maintained and updated Stanford University journalism scraping scripts with up-to-date Python code.
New York Institute of Technology
Bachelor of Science - Computer Science
Jan 2018 - May 2021
CGPA: 3.51/4.00

Concentration:
Big Data Management and Analytics

Courses:
Data Structures & Algorithms, Probability and Statistics, Data Mining, Information Retrieval, Distributed Database Systems

Involvement:
Tech Lead, Google Developer Student Club (DSC)
Work Experience
Software Engineer (Volunteer)
Aug 2021 - Present
Invisible Hands Deliver Inc, NY
  • - Maintained and upgraded an existing legacy system serving 16.6K users to prevent potential security breaches.
  • - Implemented a real-time interactive geolocation map serving 16.6K+ NYC users via the Google Maps API.
  • - Designed and integrated new scalable modules to improve application usability using JavaScript, TypeScript, React Native, and Redux.
Data Scientist (Intern)
Feb 2022 - Jul 2022
Joblogic-X Corporation Inc, NY
  • - Led the design and development of the Product and Marketing Team's enterprise-level product recommendation model, collaborating with cross-functional partners to improve product targeting and driving $1.3 million in revenue.
  • - Analyzed needfinding results from 90+ people from a product management perspective to gain insight into business challenges and optimize staff-management objectives, improving ROI by 63.5%.
  • - Conducted exploratory data analysis (EDA) on client data with 46 features to investigate trends, outliers, missing data, anomalies, and bias.
  • - Built pipelines to enrich training data using image augmentation and sampling, improving the model's AUC by 5%.
  • - Enhanced the recommendation engine by combining an unsupervised algorithm with computer vision, achieving 94% recall (XGBoost) on product segmentation.
Lead Project Coordinator & Graduate Assistant
Sep 2021 - May 2022
New York Institute of Technology, NY
  • - Built and deployed an end-to-end mobile app using AWS Lambda, DynamoDB, React Native, and Node.js, and showcased it at a conference to raise awareness of native lands around the globe.
  • - Scraped and aggregated 4.2K data points from multiple sources and cleaned the data for analysis and visualization in Tableau.
  • - Implemented an internal tool with Git and JUnit to automate grading and plagiarism checking for 100+ students, cutting the time required by 70%.
Tech Lead
Sep 2020 - Sep 2021
Developer Student Club, NYIT, NY
  • - Helped host workshops on various Google technologies and topics, such as Introduction to Product Management and Introduction to DevOps with CI/CD.
  • - Published newsletters for club members on the latest tech news.
  • - Collaborated with various clubs to host and lead intensive sessions on DevOps and hackathons on sustainable transport challenges.
Full-Stack Developer
Aug 2019 - Mar 2020
SkyMobile Inc, NYC, NY
  • - Redesigned all existing websites for cross-device compatibility, along with a relational database of 10K+ records, using Agile and Waterfall models.
  • - Mitigated website SQL injection vulnerabilities by implementing server-side scripts with PHP OOP and database IAM.
  • - Implemented a secure online-shopping payment system covering 200+ business partners across NYC using the Stripe SDK, increasing sales by 20%.
  • - Reported directly to the CEO as engineering lead for the design and development of a fintech transaction dashboard, powered by Google Charts, that gives a rich visual summary of daily business partner transactions and is used by multiple departments in decision-making.
  • - Reduced website latency by 10% by configuring DNS records and integrating with Cloudflare DNS.

  • - Leveraged knowledge in Git, Bootstrap UI, jQuery, Apache Server, MySQL, HTML5, CSS3, Ajax, Apache JMeter, and cPanel; programmed in PHP using the WebStorm IDE and structured the project in MVC.
Full-Stack Developer (Volunteer)
May 2019 - Aug 2019
The Artists Forum Inc, NYC, NY
  • - Reported directly to the Founder; led a team of two developers to redesign an existing website serving 1K+ clients for cross-device compatibility using Agile and a feature-based model.
  • - Migrated all existing server data to a new server and reconfigured the production server environment.
  • - Implemented an email system for the administrator and webmaster using PHPMailer and the SocketLabs API.
  • - Reduced website latency by 12% by configuring DNS records and integrating with Cloudflare DNS.
  • - Redesigned website UX based on business needs.

  • - Leveraged knowledge in Git, Bootstrap UI, Apache Server, HTML5, CSS3, Ajax, jQuery, Apache JMeter, and cPanel; programmed in PHP using the WebStorm IDE and structured the project in MVC.
IT Administrator
Apr 2014 - Nov 2017
Best Price (CC), Kimberley, Northern Cape, South Africa
  • - Analyzed POS transaction and bad-debt data.
  • - Troubleshot network routers, printers, and biometric devices.
  • - Improved transaction search by 90% by implementing a GUI transaction system for relational databases with over 100K records.

  • - Leveraged knowledge in Java, MySQL, NetBeans IDE, Java GUI, and the UCanAccess JDBC driver.
Projects
Open Source Project Contributions
achoes
Founder and Creator | July 2021 - Present
  • - An open-source, lightweight data exploration tool built with Python, Streamlit, Plotly, and pandas.

Project URL: here
Image Classification: Feature Selection, Data Augmentation, and Transfer Learning
Feb 2022 - May 2022
  • - Analyzed imbalanced data using different handcrafted features (LBP, HOG, SIFT) and feature selection (PCA) with an SVM for medical and facial expression recognition, using GPU acceleration to achieve 93.3% accuracy and 0.93 AUC.
  • - Applied image augmentation to enrich the training data, boosting the SVM model's accuracy by 18.5%.
  • - Fine-tuned the model using stratified k-fold cross-validation with grid and random search to avoid overfitting.
  • - Compared performance metrics (e.g., AUC) across different feature extraction and selection methods, such as LBP and PCA.
  • - Documented the results of the stratified cross-validation and feature engineering experiments using Excel, Plotly, and Weights & Biases.

  • - Utilized: TensorFlow, scikit-learn, NVIDIA RAPIDS (GPU), cuML, NumPy, OpenCV, scikit-image, pandas, Plotly, Matplotlib

Other Creators: Michael Trzaskoma

Project URL: here
Data Visualization: Job Skillset Seeking
Feb 2022 - May 2022
  • - Built an interactive data visualization dashboard with R, Plotly, and Shiny to better understand the job datasets.
  • - Applied spaCy and NLTK to job documents for keyword extraction, tokenization, and lemmatization, using utility packages, to better understand NLP.

  • - Utilized: spaCy, NLTK, Python, Plotly, ggplot2, Matplotlib, R, Shiny, pandas

Other Creators: Michael Trzaskoma, Bofan He

Project URL: here
Deep Learning: Stochastic Optimization Methods Analysis
Sep 2021 - Dec 2021
  • - Comparatively analyzed optimization methods such as Adagrad, Adadelta, RMSprop, Adam, and Adamax by training a CNN with TensorFlow and Keras on a 20K-image dataset.
  • - Fine-tuned the optimizers by adjusting the batch size, learning rate, and the moving weighted averages used by momentum and RMSprop, reaching 90% accuracy with Adam on a GPU.

  • - Utilized: Git, Python, TensorFlow-gpu, scikit-learn, Keras, NumPy, pandas, Jupyter Notebook, matplotlib

Other Creators: Michael Trzaskoma, Bofan He, Ephraim Hallford

Project URL: here
Machine Learning: Multinomial Classification Split Ratio Analysis
Sep 2021 - Dec 2021
  • - Analyzed the impact of various dataset split ratios on the accuracy of Naïve Bayes, Decision Tree, and Random Forest classifiers.
  • - Performed grid searches during model selection to tune hyperparameters for the imbalanced dataset.
  • - Evaluated and presented the effect of different train/test split ratios on model accuracy, per-class accuracy, confusion matrices, and overfitting and underfitting.

  • - Utilized: Git, Python, scikit-learn, NumPy, pandas, Jupyter Notebook, matplotlib

Project URL: here
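
The split-ratio experiment above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's code: the built-in wine dataset (three imbalanced classes) stands in for the project's data, Gaussian Naïve Bayes is chosen as the Naïve Bayes variant, and the `split_ratio_study` helper is hypothetical.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def split_ratio_study(X, y, test_sizes=(0.1, 0.2, 0.3, 0.4, 0.5), seed=0):
    """Return {(model_name, test_size): test accuracy} for every classifier/split pair.

    Hypothetical helper for illustration; the dataset and models are assumptions.
    """
    models = {
        "naive_bayes": GaussianNB(),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for test_size in test_sizes:
        # stratify=y keeps class proportions the same in train and test sets,
        # which matters for imbalanced multi-class data
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        for name, model in models.items():
            model.fit(X_train, y_train)  # fit() retrains from scratch each time
            results[(name, test_size)] = accuracy_score(y_test, model.predict(X_test))
    return results


if __name__ == "__main__":
    X, y = load_wine(return_X_y=True)
    for (name, test_size), acc in sorted(split_ratio_study(X, y).items()):
        print(f"{name:>14}  test={test_size:.1f}  accuracy={acc:.3f}")
```

Fixing `random_state` keeps each split reproducible, so differences in accuracy come from the split ratio rather than from a different random shuffle.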
Unsupervised Learning: K-Sample and Clustering Level Analysis
Sep 2021 - Dec 2021
  • - Performed model selection to find the optimal K and clustering level using the elbow method for K-means and dendrograms for hierarchical clustering.
  • - Analyzed the impact of various features and of the K value/clustering level on the models using contingency matrices.

  • - Utilized: Git, Python, scikit-learn, NumPy, pandas, Jupyter Notebook, matplotlib

Project URL: here
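
The model-selection steps above can be sketched with scikit-learn and SciPy. Assumptions: synthetic `make_blobs` data stands in for the project's dataset, Ward linkage is chosen for the hierarchical clustering, and the helper names `elbow_inertias` and `compare_clusterings` are hypothetical. The elbow is read off where the inertia curve stops dropping sharply; `linkage` builds the merge tree that a dendrogram plots.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics.cluster import contingency_matrix


def elbow_inertias(X, k_values=range(1, 9), seed=0):
    """Fit K-means for each k and return the within-cluster sum of squares (inertia)."""
    inertias = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertias.append(km.inertia_)
    return np.array(inertias)


def compare_clusterings(X, k, seed=0):
    """Contingency matrix of K-means labels vs. a Ward hierarchy cut into at most k clusters."""
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    Z = linkage(X, method="ward")  # the merge tree a dendrogram visualizes
    hier_labels = fcluster(Z, t=k, criterion="maxclust")  # cut the tree
    return contingency_matrix(km_labels, hier_labels)


if __name__ == "__main__":
    # Synthetic data with three well-separated groups (assumption for illustration)
    X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
    for k, inertia in zip(range(1, 9), elbow_inertias(X)):
        print(f"k={k}: inertia={inertia:.1f}")
    print(compare_clusterings(X, 3))
```

When both methods agree, each row of the contingency matrix has a single nonzero entry; off-diagonal mass shows where the two clusterings disagree.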
ML Regressions: Combined Cycle Power Plant
July 2021
  • - Implemented various scalable machine learning regression models, such as multiple linear regression (MLR), polynomial regression, SVR, decision tree regression, and random forest regression, to predict the plant's net hourly electrical energy output.
  • - Cleaned and feature-scaled 9.5K records to improve model accuracy.
  • - Performed model selection, choosing Decision Tree Regression as the best performer with 96% R2, 96% adjusted R2, 11.2% MSE, and 3.4% RMSE.

  • - Utilized: Python, scikit-learn, pandas, numpy, matplotlib, and Jupyter Notebook

Project URL: here
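
The evaluation metrics above can be computed as in this sketch. Adjusted R2 penalizes R2 for the number of predictors p using 1 - (1 - R2)(n - 1)/(n - p - 1). The helper names `adjusted_r2` and `regression_report` are hypothetical, and the toy values are for illustration only. Note that MSE and RMSE are expressed in the target's units (squared units for MSE) rather than as percentages.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


def regression_report(y_true, y_pred, n_features):
    """Collect R2, adjusted R2, MSE, and RMSE for one fitted model (hypothetical helper)."""
    r2 = r2_score(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    return {
        "r2": r2,
        "adj_r2": adjusted_r2(r2, len(y_true), n_features),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),  # same units as the target
    }


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_pred = [1.0, 2.0, 3.0, 4.0, 6.0]
    # One residual of -1: MSE = 0.2, R2 = 1 - 1/10 = 0.9
    print(regression_report(y_true, y_pred, n_features=1))
```

Comparing this report across the candidate regressors is one way to carry out the model selection step.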
Scholar Recommendation App: Scholar Seek
Feb 2021 - May 2021
Academic Project - Senior Design/ Spring 2021
  • - Showcased to the NYIT engineering department a cross-platform mobile app that lets students easily create a profile and receive modular, content-based recommendations for scholarships, colleges, and majors.
  • - Implemented web scraping with Selenium, anti-CAPTCHA, and real-time authentication to collect 2.7M rows of semi-structured scholarship data and 3.5K+ rows of unstructured US college data.
  • - Designed and documented RESTful APIs for the backend server with Flask and JWT, enabling secure, token-based integration of client devices, recommendation models, and MongoDB.
  • - Configured the backend for production and test environments, performed functional testing, and containerized the recommendation models, RESTful APIs, and web scrapers with Docker for deployment on AWS EC2.

  • - Utilized: Agile methodology, Waterfall Model, React Native, Python, Git, JavaScript, MongoDB, Flask, Jinja2, Selenium, AWS EC2, Docker, RESTful API, JWT, PyTest, Postman, Ngrok, and shell programming.

Other Creators: Jungi Park, Michael Trzaskoma, Zakaria Khan, Greg Salvesen

Project URL: here
Cross-Platform Scholarship Recommendation App
Sep 2020 - Dec 2020
Academic Project - Information Retrieval/ Fall 2020
  • - Created a cross-platform app that allows clients to easily create their profile and get scholarship recommendations.
  • - Implemented web scraping with Selenium and real-time login to collect 2.7M rows of scholarship data.
  • - Designed a RESTful API backend server enabling integration of the React Native app and the recommendation model.
  • - Built cross-platform mobile app and UI with React Native.
  • - Utilized Google Authentication and Firestore API to build user role control module.
  • - Set up and configured project production environment in AWS EC2.
  • - Deployed recommendation model, RESTful API and web-scraping application on AWS EC2.

  • - Utilized: React Native, Git, Python, JavaScript, Firestore, Selenium, AWS EC2, Flask, RESTful API, Agile, and Waterfall Model

Other Creators: Michael Trzaskoma, Francis Cheng, Wentao Yang

Project URL: here
Cross-Platform Food Nutrition App
Sep 2020 - Dec 2020
Academic Project - Introduction to Software Engineering/ Fall 2020
  • - Created a cross-platform mobile app and UI with React Native that lets clients easily create an allergy profile and receive food allergy and FDA recall notifications.
  • - Utilized Google Authentication and Firestore API to build user role control module.
  • - Set up and configured project production environment on AWS EC2.
  • - Designed RESTful API through Flask and then deployed on AWS EC2.
  • - Contributed to user allergy profile check implementation.
  • - Integrated a recall module that uses food UPC codes to look up FDA food recall reports.

  • - Utilized: React Native, Git, Python, JavaScript, Firestore, AWS EC2, Flask, Agile, and Waterfall Model

Other Creators: Gregory Salvesen, Michael Trzaskoma, Zakaria M Khan

Project URL: here
Big Data: MR Movie
Sep 2020 - Dec 2020
Academic Project - Introduction to Big Data/ Fall 2020
  • - Implemented movie collaborative filtering on user ratings using cosine similarity, Spark, and MapReduce.
  • - Performed parallel computations on 1 million rows of MovieLens data across the nodes of a local cluster.
  • - Created a shell script to download the MovieLens dataset for project environment setup.
  • - Analyzed and interpreted the recommendation model's results by comparing different threshold and co-occurrence threshold values.

  • - Utilized: Git, Python, Apache Spark, Hortonworks Sandbox, shell script, numpy

Other Creators: Ge Ding, Jinghua Li

Project URL: here
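
The item-based collaborative filtering in MR Movie can be sketched on a single machine with NumPy. This is a minimal illustration, not the project's Spark/MapReduce code: the toy rating matrix, the `item_cosine_similarity` and `recommend` helper names, and the convention that 0 means "unrated" are all assumptions. A distributed version would compute the same per-item-pair dot products and norms across the cluster.

```python
import numpy as np


def item_cosine_similarity(ratings):
    """Cosine similarity between every pair of item (movie) columns.

    ratings: users x items array, where 0 means "not rated" (assumption).
    """
    ratings = np.asarray(ratings, dtype=float)
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # items nobody rated become zero vectors, not NaN
    unit = ratings / norms   # scale each item column to unit length
    return unit.T @ unit     # items x items matrix of cosine similarities


def recommend(ratings, user, top_n=3):
    """Rank a user's unrated items by the similarity-weighted sum of their ratings."""
    ratings = np.asarray(ratings, dtype=float)
    sim = item_cosine_similarity(ratings)
    user_ratings = ratings[user]
    scores = sim @ user_ratings
    scores[user_ratings > 0] = -np.inf  # never recommend items already rated
    return np.argsort(scores)[::-1][:top_n].tolist()


if __name__ == "__main__":
    # Toy data: rows are users, columns are movies (hypothetical values)
    ratings = np.array([
        [5, 4, 0, 1],
        [4, 5, 1, 0],
        [1, 0, 5, 4],
        [0, 1, 4, 5],
        [5, 0, 0, 0],  # this user has rated only movie 0
    ])
    print(recommend(ratings, user=4, top_n=1))  # → [1]
```

Movie 1 wins for the last user because its rating column is most similar to movie 0's, the only movie that user rated.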
Linear Regression: Airbnb Open Data (NYC)
Sep 2020 - Dec 2020
Academic Project - Introduction to Data Mining/ Fall 2020
  • - Built linear regression models for Airbnb price prediction through feature relationship analysis, exploratory data analysis, feature engineering, and hyperparameter tuning with grid search and k-fold cross-validation.
  • - Created an interactive density map of 44K+ rows of semi-structured data for visualization and analytics using Folium.
  • - Created a feature correlation matrix and feature importance graph for data preprocessing and feature selection.
  • - Evaluated and interpreted models’ prediction results with an MAE of 28%, R2 of 34%, and RMSE of 36%.

  • - Utilized: Git, Python, scikit-learn, linear regression, Folium, seaborn, Jupyter Notebook, Scipy, numpy, matplotlib, RapidMiner Studio

Other Creators: RajAshokbhai Khunt, PoojakumariRameshbhai Patel, VishalkumarRajeshbhai Gabani

Project URL: here
Cross-Platform In-Person Lecture Attendance App
Sep 2020 - Dec 2020
Academic Project - Programming Concept/ Fall 2020
  • - Built a cross-platform app using React Native that lets students and professors easily create profiles and check in to in-person lectures.
  • - Used Selenium to scrape the entire NYIT engineering directory.
  • - Designed a user role control module using Google Authentication and the Firestore API.
  • - Used the Firestore API to implement CRUD operations for attendance lists.

  • - Utilized: Git, JavaScript, Python, Selenium, Firestore

Other Creators: Julia Capuana, Jungi Park, Jeonghyun Seo, Sakshi Rambhia, Rikin Shah

Project URL: here
Autonomous RC Car + Virtual Driving
Feb 2019 - Apr 2019
Personal Project
  • - Used a PCA9685 servo driver and a Raspberry Pi, integrated with the Donkey Car API, to control the RC car's steering and speed.
  • - Showcased the project and its results to faculty and students at the NYIT Ventures Pitch Contest.
  • - Collected image data by remotely driving the RC car while wirelessly displaying and controlling the camera's field of view (FOV) and direction.
  • - Trained an 11-layer supervised CNN autopilot classification model in Keras on the image data, producing an HDF5 (hierarchical data format) model file that outputs steering and throttle values.
  • - Evaluated and interpreted the model's predictions, achieving 92% accuracy.

  • - Utilized: Git, PCA9685 Driver, Raspberry Pi, Arduino, Python, Keras, TensorFlow, Donkey Car, Flask, 3D Printer

Project URL: here