Erich Henrique
Data Science and Analytics | Azure Data Engineering | MLOps
Awards
- PDD Data Analytics: graduated with distinction and the highest GPA in the cohort (4.10).
- Applied Research Day fair prize for “Project that best communicates its results”.
- BEng Mechanical Engineering: high class rank award for top-3 academic performance.
Experience
Data Analyst
TransLink | New Westminster / BC – Canada
2022.11 – Present
Data Analyst on the Technology & Enablement (T&E) Project, ERP Program.
- Design and development of data models for data migration from legacy Finance and Asset Management systems to Infor FSM and EAM.
- Development and documentation of data validation rules for data migration and audit.
- Lead communication with Product Owners to identify and assess data model requirements, use cases, integrations, and workflow analyses.
- Support for System Integration Testing (SIT) and User Acceptance Testing (UAT) execution.
- Liaise with business stakeholders and support internal audit teams.
Research Assistant, Data Science
Langara College | Vancouver / BC – Canada
2022.09 – 2023.04
Research on nonstationary univariate time series modelling for missing value imputation and forecasting of air quality monitoring data.
Course Assistant, Computer Science
Langara College | Vancouver / BC – Canada
2021.09 – 2022.12
- Data Mining and Machine Learning for Data Analytics CPSC-4830 (Fall 2022)
- Data Mining and Machine Learning for Bioinformatics CPSC-4160 (Fall 2022)
- Data Base Systems CPSC-2221 (Spring and Summer 2022)
- Computer and Information Security CPSC-2810 (Fall 2021)
Education
PDD Data Analytics
Langara College | Vancouver / BC – Canada
2021.01 – 2022.04
BEng Mechanical Engineering
Centro Universitário FEI | São Paulo / SP – Brazil
2011.01 – 2015.12
Projects
TabularCompare
Open-source tabular data comparison tool, available as a command-line interface (CLI) and a Python library, with enhanced reporting on tabular data updates.
Air We Breathe
Study in progress with faculty members of Langara College. Implements nonstationary univariate time series imputation with an XGBoost meta-learner and LSTM layers for low- and high-volatility air quality monitoring data.
Elderly Wellbeing
To identify concerns in Canadian communities about the well-being of the elderly, our team scraped social media data using the Twitter API v2.0 Academic Access and analyzed it with NLP methods. Our Tableau dashboard won our team a prize at Langara’s Applied Research Day fair.
Time Series Stationarity Test Microservice
Containerized microservice for time series stationarity testing with the Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. Template repository following CI/CD best practices for cloud or local deployment.
Google Landmark Recognition Challenge 2020
A Google-sponsored competition hosted on Kaggle, with the goal of building a model that recognizes landmarks in photos. The project is my implementation of a metric learning solution: cosine similarity search over image embeddings from an EfficientNet backbone, paired with a DELF module that reranks candidates based on local image features.
Semi-Supervised Learning on Disasters Tweets
To improve annotation efficiency for text data, the project proposes a method called “Representative Labeling”. Results show that, with K-Means clustering, our baseline classification algorithms achieved similar metric scores using up to twenty times less labeled data.
Tech Stack
Programming Languages
Python, SQL, R, Bash/shell scripting.
Data Scraping and Wrangling
Python: pandas, NumPy, Beautiful Soup, SQLite, and others.
R: dplyr, tidyr, reshape2, RMySQL.
Visualization and BI
Python: Matplotlib, Seaborn, Plotly.
R: ggplot2, naniar, and others.
Tableau Prep and Desktop, PowerBI.
Machine Learning
Python: scikit-learn, XGBoost, LightGBM, statsmodels, Darts.
R: caret, car, rpart, randomForest, xgboost, and others.
Deep Learning
TensorFlow, Keras, PyTorch, Hugging Face, OpenCV, spaCy, and others.
MLOps and Cloud
AWS: Cloud9, S3, EC2, EBS, Lambda.
Docker, Kubernetes, Snowflake.
Data Engineering and Big Data
AWS: Redshift.
Azure: Databricks, Data Factory, Blob Storage.
Apache Airflow, PySpark.
Version Control and Agile
Git, GitHub, Azure DevOps, Jira, and Confluence.
Contact
📞: +1 (672) 513 3761
✉️: erich@esilva.io
in: linkedin.com/in/erich-henrique/