Available for ML Engineering roles · Global

Mohammad
Umar Haris

Machine Learning Engineer

Building production-ready ML systems — not just models. Focused on NLP pipelines, predictive automation, and cloud-ready deployment across enterprise environments.

4+
Years of experience
22%
Forecast accuracy improvement
35%
Reporting latency reduction
MSc
Data Science · Distinction · Essex

Systems thinking.
Not just modelling.

Most ML work stops at the notebook. Mine starts there and ends in production — with monitored pipelines, versioned models, and measurable business outcomes.

With a background spanning insurance, logistics, and pharmaceuticals, I've built ML systems where correctness and reliability are non-negotiable constraints, not afterthoughts.

01

Pipeline-first engineering

Every model is a component of a larger system. I design for data ingestion, transformation, training, evaluation, and monitoring from the outset.

02

Cloud-native deployment mindset

Azure Databricks, Data Factory, and SQL are my production environment. I build ETL workflows and ML pipelines that scale without manual intervention.

03

Business outcome orientation

Accuracy metrics matter. So do cost reductions, turnaround time, and stakeholder adoption. I connect model performance to operational impact.

04

Precision over volume

I build fewer systems and build them well, with clean code, reproducible experiments, and deployment documentation that survives handover.

Featured
Systems

Framed as engineering projects — with architecture, data flow, and deployment strategy.

NLP · Clinical Prediction

Gout Prediction Using NLP

Clinical gout diagnosis relies heavily on subjective symptom descriptions in free-text patient records. This project applies NLP to extract structured clinical signals from unstructured text and trains a prediction model to flag gout risk — reducing dependence on manual clinical interpretation.

Productionising
01

Text Ingestion & Cleaning

Clinical text preprocessing — tokenisation, stopword removal, medical entity normalisation.

02

NLP Feature Extraction

TF-IDF and n-gram features. Symptom keyword extraction for domain-specific signal engineering.

03

Prediction Model

Classification pipeline (Logistic Regression / Random Forest) trained on labelled clinical records.

04

Evaluation

Precision, recall, F1 and AUC-ROC scoring. Cross-validated on held-out clinical split.

05

Deployment Roadmap

FastAPI inference endpoint → Docker image → Azure Container Instances. CI/CD via GitHub Actions.

Python NLTK spaCy Scikit-learn TF-IDF Pandas Jupyter Matplotlib
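
Steps 01 and 02 above can be sketched roughly as follows. This is a minimal, standard-library-only illustration of text cleaning followed by TF-IDF weighting over unigrams and bigrams, not the project's code. The stopword set, the function names, and the two sample notes are assumptions made for the example; a real pipeline would more likely use the NLTK or spaCy stopword lists and Scikit-learn's vectoriser.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "with", "is", "has", "no"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def ngrams(tokens, n_max=2):
    """Return all 1..n_max grams as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_max=2):
    """Compute TF-IDF weights per document using a smoothed IDF."""
    term_lists = [ngrams(tokenize(d), n_max) for d in docs]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Made-up example notes, not real clinical data.
notes = [
    "Sudden pain and swelling in the big toe",
    "No joint pain, routine check",
]
vectors = tfidf(notes)
```

Terms that appear in only one note, such as the bigram "big toe", receive a higher weight than terms shared by both notes, such as "pain". That is the property that makes TF-IDF useful for surfacing symptom-specific signal.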

Refactor notebook to modular Python

Separate data, features, training, and evaluation into clean modules.

Wrap in FastAPI inference endpoint

POST /predict accepts raw text, returns risk score + confidence.

Containerise & deploy to Azure

Docker image → Azure Container Apps. GitHub Actions CI/CD on push.
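
The request and response shape of the planned POST /predict endpoint could look something like the sketch below. It is framework-free so that it runs on its own; in FastAPI the two dataclasses would typically be Pydantic models and `predict` would be registered as a POST route. The field names, the keyword weights standing in for a trained classifier, and the definition of confidence are all assumptions for illustration.

```python
import math
from dataclasses import asdict, dataclass


@dataclass
class PredictRequest:
    text: str  # raw clinical free text


@dataclass
class PredictResponse:
    risk_score: float  # probability-like score in [0, 1]
    confidence: float  # distance from the 0.5 boundary, scaled to [0, 1]


# Hypothetical weights standing in for the trained model.
KEYWORD_WEIGHTS = {"toe": 1.2, "swelling": 0.8, "uric": 1.5, "pain": 0.4}
BIAS = -1.5


def predict(request):
    """Score raw text and build the response payload."""
    if not request.text.strip():
        raise ValueError("text must not be empty")
    tokens = [tok.strip(".,;:") for tok in request.text.lower().split()]
    logit = BIAS + sum(KEYWORD_WEIGHTS.get(tok, 0.0) for tok in tokens)
    risk = 1.0 / (1.0 + math.exp(-logit))  # logistic function
    return PredictResponse(
        risk_score=round(risk, 4),
        confidence=round(abs(risk - 0.5) * 2, 4),
    )


response = asdict(predict(PredictRequest(text="Swelling and pain in the big toe.")))
```

Keeping the scoring function separate from the web framework means it can be unit-tested without starting a server, which suits the modular refactor described above.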

Bioinformatics · Python Web App

DNA Sequence Analysis & Visualisation Platform

Bioinformatics researchers working with raw sequence data lacked a fast, accessible tool for computing and visualising key metrics — sequence length distributions, GC content, and compositional profiles — without writing custom scripts for each dataset.

Live · Productionising
01

File Upload & Parsing

Users upload FASTA/sequence files. Python backend parses records and validates format.

02

Sequence Metrics Engine

Computes sequence length, GC content, nucleotide frequency distributions, and compositional stats.

03

Interactive Visualisation

Dynamic charts rendered in-browser. Histogram, bar, and line views per metric.

04

Web Interface

Clean Python-powered web app. Users do not need to install anything.

05

Deployment Roadmap

Containerise with Docker → deploy to Azure App Service with CI/CD via GitHub Actions.

Python BioPython Pandas Matplotlib Plotly Streamlit Docker
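
As a rough illustration of steps 01 and 02, the sketch below parses FASTA text and computes sequence length and GC content using only the standard library. The function names and the sample records are assumptions for the example. The stack above lists BioPython, whose parser would be the more likely choice in the app itself.

```python
from collections import Counter


def parse_fasta(text):
    """Parse FASTA text into {record_id: sequence}; raise on malformed input."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            current = header[0]  # the ID is the first word of the header
            if current in records:
                raise ValueError(f"duplicate record id: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("sequence data before first header")
        else:
            records[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def gc_content(seq):
    """Fraction of G and C among A/C/G/T bases; ambiguous bases are ignored."""
    counts = Counter(seq)
    acgt = sum(counts[base] for base in "ACGT")
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0


# Made-up sample input with two records.
sample = ">seq1 example\nATGC\nGGCC\n>seq2\nATAT\n"
records = parse_fasta(sample)
metrics = {
    rid: {"length": len(seq), "gc": gc_content(seq)}
    for rid, seq in records.items()
}
```

Here `metrics["seq1"]` is `{"length": 8, "gc": 0.75}`, since six of the eight bases in ATGCGGCC are G or C.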

Containerise with Docker

App + dependencies packaged into a reproducible image.

Deploy to Azure App Service

Publicly accessible endpoint with auto-scaling and health monitoring.

GitHub Actions CI/CD

Automated test and deploy pipeline on every push to main.

Time Series · Forecasting · Enterprise

Demand Forecasting & Logistics Routing Engine

Recovery operations at Call Assist relied on manual demand estimation — planners were reacting to shortfalls rather than anticipating them. No predictive layer existed, leading to inefficient vendor allocation and avoidable transportation costs at scale.

Production · Call Assist Ltd
01

Historical Data ETL

Python + Azure Data Factory pipeline ingesting 3 years of operational records into Databricks.

02

Feature Engineering

Lag features, rolling averages, seasonality decomposition, and vendor capacity signals.

03

Ensemble Forecasting

XGBoost + LightGBM ensemble with Prophet baseline. Cross-validated on 6-month holdout.

04

Routing Optimisation

Forecast outputs feed a constraint-based vendor allocation model, eliminating idle capacity.

05

Power BI Dashboard

Forecast results surfaced to operations team with 7-day rolling prediction window.

Python XGBoost LightGBM Prophet Scikit-learn PySpark Azure Databricks Azure Data Factory Power BI
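
The lag and rolling-average features from step 02 can be sketched as below. This is a simplified standard-library version on a single series; the lag choices, window size, and daily counts are illustrative assumptions, and the production pipeline presumably builds these at scale in PySpark or Pandas.

```python
from statistics import mean


def lag_features(series, lags=(1, 7), window=3):
    """Build one feature row per time step that has a full history.

    Each row uses only values strictly before the target, so nothing
    from the period being predicted leaks into its features.
    """
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        row[f"rolling_mean_{window}"] = mean(series[t - window:t])
        row["target"] = series[t]
        rows.append(row)
    return rows


# Illustrative daily job counts, not real operational data.
daily_jobs = [12, 15, 11, 14, 18, 20, 9, 13, 16, 12]
rows = lag_features(daily_jobs)
```

The first usable row is day 8 (index 7), because a 7-day lag needs seven prior observations. Dropping the incomplete early rows, rather than filling them, keeps the training set free of invented values.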

22% planning accuracy improvement

Measured against 3-month rolling average baseline.

14% transportation cost reduction

Via demand-matched vendor allocation eliminating over-provisioning.

18–20% service cost reduction

Through pipeline automation across operational and financial datasets.
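
One way a figure like "improvement against a 3-month rolling average baseline" can be computed is as the relative reduction in forecast error. The sketch below assumes mean absolute error as the metric, which the description above does not specify, and uses made-up monthly numbers purely to show the arithmetic.

```python
from statistics import mean


def rolling_baseline(actuals, window=3):
    """Forecast each period as the mean of the previous `window` periods."""
    return [mean(actuals[t - window:t]) for t in range(window, len(actuals))]


def mae(actuals, forecasts):
    """Mean absolute error between paired actuals and forecasts."""
    return mean(abs(a - f) for a, f in zip(actuals, forecasts))


def relative_improvement(actuals, model_forecasts, window=3):
    """Fractional MAE reduction of the model versus the rolling baseline."""
    scored = actuals[window:]  # periods that have a baseline forecast
    baseline_error = mae(scored, rolling_baseline(actuals, window))
    model_error = mae(scored, model_forecasts)
    return (baseline_error - model_error) / baseline_error


# Illustrative values only.
monthly_demand = [100, 110, 120, 130, 125, 140]
model_forecasts = [126, 124, 136]  # forecasts for months 4 to 6
improvement = relative_improvement(monthly_demand, model_forecasts)
```

With these toy numbers the baseline MAE is 40/3 and the model MAE is 3, so `improvement` is 0.775. Real results depend on the data and on the error metric chosen.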

Technical Stack

Machine Learning
  • Predictive Modelling
  • Classification & Regression
  • Time Series Forecasting
  • Feature Engineering
  • Hyperparameter Tuning
  • A/B Testing & Experimentation
  • Cross-Validation
NLP & Deep Learning
  • Text Classification
  • NLP Pipelines (spaCy)
  • HuggingFace Transformers
  • DistilBERT / BERT fine-tuning
  • TensorFlow (Projects)
  • Scikit-learn
  • GenAI Workflows
Data Engineering
  • ETL Pipeline Design
  • Azure Databricks
  • Apache Spark / PySpark
  • Azure Data Factory
  • Azure SQL / PostgreSQL
  • Data Warehousing
  • Python Automation
MLOps
  • Model Monitoring & Logging
  • Docker Containerisation
  • FastAPI Inference APIs
  • CI/CD Pipelines
  • MLflow (Experiment Tracking)
  • Model Versioning
  • Azure Monitor
Cloud & Infrastructure
  • Azure (AZ-900 Certified)
  • Azure Container Instances
  • GCP (Exposure)
  • GitHub Actions
  • Vercel (Frontend)
Languages & Tooling
  • Python (Primary)
  • SQL (PostgreSQL, MySQL)
  • R (Academic)
  • Power BI / DAX
  • Tableau
  • Git / GitHub
  • Agile / Scrum

Experience

Sep 2023 – Jan 2026
Call Assist Ltd
Colchester, UK
Operations Analyst · ML & Data
  • Engineered automated data pipelines in Python and Azure, standardising operational datasets and reducing service costs by 18–20%.
  • Built predictive forecasting models for demand and recovery operations in Databricks, improving planning accuracy by 22% through time-series regression techniques.
  • Implemented MLOps monitoring and logging in Azure to ensure model reliability across high-volume CRM analytics workflows.
  • Applied feature engineering and hypothesis testing on CRM datasets to identify behavioural drivers, improving response efficiency by 15%.
  • Engineered cloud-based ETL pipelines (Azure SQL + Databricks), reducing reporting turnaround by 35% and saving 10+ analyst hours per week.
Mar 2024 – Jan 2025
System Engineering
London, UK · Contract
Analytics Engineer
  • Designed end-to-end data transformation pipelines in Azure Databricks for large-scale datasets, reducing reporting latency by 35%.
  • Optimised Apache Spark SQL workflows, improving BI tool performance by 40% during peak reporting cycles.
  • Developed feature engineering scripts in Python to support predictive analytics and statistical modelling for manufacturing stakeholders.
  • Delivered interactive Power BI dashboards in Agile sprints, increasing stakeholder data adoption by 40%.
Dec 2021 – Sep 2022
Cognizant Technology Solutions
Gurugram, India
Associate Analyst
  • Developed an Excel-based decision automation system, reducing policy evaluation time by 30%.
  • Automated policy validation workflows, achieving 99% data accuracy and eliminating manual errors at scale.
  • Conducted A/B testing on ad placements, driving a 15–20% increase in conversion rates.
Jan 2020 – Dec 2021
Inletware Ltd
Gurugram, India · Part-Time
Data Analyst
  • Led analytics for a pharmaceutical e-commerce platform, integrating Azure and GCP data sources to support expansion into 3 regions.
  • Developed time-series forecasting and demand planning models, reducing stockouts by 12%.
  • Built 30+ BI dashboards in Power BI and Tableau, improving reporting efficiency by 20%.

Education

MSc Data Science and its Applications
University of Essex, United Kingdom
2022 – 2023
Distinction
Bachelor of Pharmacy
Jamia Hamdard University, India
2017 – 2021
First Class / 2:1 Equivalent

Certifications

Microsoft Certified: Power BI Data Analyst Associate
Microsoft Certified: Azure Fundamentals (AZ-900)

Currently productionising
ML systems.

Projects are in active development, moving from the notebook stage to Docker-containerised, CI/CD-deployed, monitored pipelines on Azure.

01

Docker containerisation

FastAPI + model weights packaged into reproducible images

02

CI/CD via GitHub Actions

Automated test → build → deploy pipeline on every push

03

Azure Container Apps deployment

Scalable serverless hosting for inference endpoints

04

MLflow experiment tracking

Model versioning, parameter logging, and metric dashboards

05

Drift monitoring

Automated alerts on feature and prediction distribution shifts
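
A drift check of the kind described in item 05 could be built on the Population Stability Index (PSI). The sketch below is one common formulation using equal-width bins over the reference sample. The bin count, the 0.2 alert threshold (a widely used rule of thumb), and the function names are assumptions, not details of the monitoring actually in use.

```python
import math


def psi(reference, current, bins=5, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are equal-width over the reference range; current values
    outside that range are counted in the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def drift_alert(reference, current, threshold=0.2):
    """Return True when PSI exceeds the alert threshold."""
    return psi(reference, current) > threshold


# Synthetic samples: a training-time feature and a shifted live feature.
reference = list(range(100))
shifted = [v + 50 for v in reference]
alert = drift_alert(reference, shifted)
```

The same function can be applied to model outputs as well as input features, which covers both kinds of shift named above.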

Open to the right
ML Engineering role.

Looking for production-focused ML positions at companies where systems thinking and deployment quality matter. FAANG, growth-stage tech, and enterprise SaaS are all of interest.
