About Me

Erich Henrique

Data Science and Analytics | Azure Data Engineering | MLOps

Awards

  • PDD Data Analytics: graduated with distinction and the highest GPA in the cohort (4.10).
  • Applied Research Day fair prize for “Project that best communicates its results”.
  • BEng Mechanical Engineering: high class rank award for top-3 academic performance.

Experience

Data Analyst

TransLink | New Westminster / BC – Canada

2022.11 — Present

Data Analyst at the Technology & Enablement (T&E) Project — ERP Program.

  • Design and development of data models for data migration from legacy Finance and Asset Management systems to Infor FSM and EAM.
  • Development and documentation of data validation rules for data migration and audit.
  • Lead communication with Product Owners to identify and assess data model requirements, use cases, integrations, and workflow analyses.
  • System Integration Testing and User Acceptance Testing execution support.
  • Liaise with business stakeholders and support internal audit teams.

Research Assistant, Data Science

Langara College | Vancouver / BC – Canada

2022.09 — 2023.04

Research on nonstationary univariate time series modelling for missing value imputation and forecasting on air quality monitoring data.

Course Assistant, Computer Science

Langara College | Vancouver / BC - Canada

2021.09 — 2022.12

  • Data Mining and Machine Learning for Data Analytics CPSC-4830 (Fall 2022)
  • Data Mining and Machine Learning for Bioinformatics CPSC-4160 (Fall 2022)
  • Data Base Systems CPSC-2221 (Spring and Summer 2022)
  • Computer and Information Security CPSC-2810 (Fall 2021)

Education

PDD Data Analytics

Langara College | Vancouver / BC - Canada

2021.01 – 2022.04

BEng Mechanical Engineering

Centro Universitario FEI | Sao Paulo / SP - Brazil

2011.01 – 2015.12


Projects

TabularCompare

Open Source tabular data comparison tool. Command Line Interface and Python library for enhanced reporting capabilities on tabular data updates.

Air We Breathe

Study in progress with faculty members of Langara College. Implementation of nonstationary univariate time series imputation with an XGBoost meta-learner and LSTM layers for low- and high-volatility air quality monitoring data.

Elderly Wellbeing

To identify the concerns of communities in Canada about well-being among the elderly, social media data was scraped using the Twitter API v2.0 Academic Access and investigated through NLP methods. Our Tableau dashboard earned the team a prize at Langara’s Applied Research Day fair.

Time Series Stationarity Test Microservice

Containerized microservice for Time Series Stationarity testing with Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. Template repository with best CI/CD practices for cloud or local deployment.

Google Landmark Recognition Challenge 2020

Competition sponsored by Google and hosted on Kaggle. The goal was to build a model that recognizes landmarks in photos. The project is my implementation of a metric learning solution with cosine similarity search (using an EfficientNet backbone for image embedding), paired with a DELF module for reranking based on local features of the images.
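A minimal sketch of the cosine similarity search step, for illustration only. The toy three-dimensional vectors stand in for image embeddings, and the names `cosine_similarity`, `top_k_similar`, and the `landmark_*` labels are hypothetical; this is not the project's actual code and it leaves out the embedding backbone and the reranking stage.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, index, k=3):
    """Return the k (label, score) pairs from `index` most similar to `query`."""
    scored = [(label, cosine_similarity(query, emb)) for label, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy index: label -> embedding (hypothetical values, not real model output).
index = {
    "landmark_a": [1.0, 0.0, 0.0],
    "landmark_b": [0.0, 1.0, 0.0],
    "landmark_c": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]
ranked = top_k_similar(query, index, k=2)
print(ranked)
```

With real embeddings, the candidates returned by a search like this would typically be passed on to a reranking step.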

Semi-Supervised Learning on Disasters Tweets

To improve data annotation efficiency on text data, the project proposes a method called “Representative Labeling”. Results show that, with K-Means clustering, our baseline classification algorithms achieved similar metric scores with up to twenty times less labeled data.
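One way a clustering-based labeling strategy could look, sketched under stated assumptions: samples are clustered with K-Means and only the sample closest to each centroid is sent for annotation. The selection rule, the function names, and the toy 2-D points (standing in for text embeddings) are all assumptions for illustration; the blurb above does not specify how representatives are chosen.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=20):
    """Plain K-Means with a deterministic init (first k points), for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        assignments = assign(points, centroids)
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assign(points, centroids)


def representative_indices(points, centroids, assignments):
    """Assumed rule: per cluster, pick the index of the point nearest the centroid."""
    reps = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignments) if a == c]
        if members:
            reps.append(min(members, key=lambda i: squared_distance(points[i], centroid)))
    return reps


# Toy data: two well-separated groups (hypothetical stand-ins for text embeddings).
points = [[0.0, 0.0], [5.0, 5.0], [0.2, 0.1], [5.2, 4.9], [0.1, 0.3], [4.9, 5.3]]
centroids, assignments = kmeans(points, k=2)
reps = representative_indices(points, centroids, assignments)
print(reps)  # indices of the samples that would be sent for annotation
```

Here only two of the six samples would be labeled by hand; how those labels are then used to train a classifier is outside the scope of this sketch.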


Tech Stack

Programming Languages

Python, SQL, R, Bash, Shell scripting.

Data Scraping and Wrangling

Python: Pandas, NumPy, Beautiful-Soup, SQLite, and others.

R: dplyr, tidyr, reshape2, RMySQL.

Visualization and BI

Python: Matplotlib, Seaborn, Plotly.

R: ggplot2, naniar, and others.

Tableau Prep and Desktop, PowerBI.

Machine Learning

Python: SciKit-Learn, XGBoost, LightGBM, statsmodels, Darts.

R: caret, car, rpart, randomForest, xgboost, and others.

Deep Learning

TensorFlow, Keras, PyTorch, Hugging Face, OpenCV, spaCy, and others.

MLOps and Cloud

AWS: Cloud9, S3, EC2, EBS, Lambda.

Docker, Kubernetes, Snowflake.

Data Engineering and Big Data

AWS: Redshift, Apache Airflow, PySpark.

Azure: Databricks, Data Factory, Blob Storage.

Version Control and Agile

Git, GitHub, Azure DevOps, Jira, and Confluence.


Contact

📞: +1 (672) 513 3761

✉️: erich@esilva.io

in: linkedin.com/in/erich-henrique/


Garibaldi Lake, Canada | Unsplash, by Bryce Evans