
MLOps Principles

MLOps (a compound of "machine learning" and "operations") is a relatively new term that refers to the need for close collaboration between data scientists and the operations or production team. The objective is to eliminate waste, automate as much of the ML lifecycle as possible, and produce richer, more consistent insights.

As machine learning and AI spread through software products and services, we need to establish best practices and tools to test, deploy, manage, and monitor ML models in real-world production. In short, with MLOps we strive to avoid "technical debt" in machine learning applications.

SIG MLOps defines "an optimal MLOps experience [as] one where Machine Learning assets are treated consistently with all other software assets within a CI/CD environment. Machine Learning models can be deployed alongside the services that wrap them and the services that consume them as part of a unified release process." By codifying these practices, we hope to accelerate the adoption of ML/AI in software systems and the fast delivery of intelligent software.

The complete ML development pipeline includes three levels where changes can occur: Data, ML Model, and Code. This means that in machine learning-based systems, the trigger for a build might be a code change, a data change, a model change, or any combination of them. The following summary lists the MLOps principles for building ML-based software, broken down by Data, ML Model, and Code:

Versioning
- Data: 1) Data preparation pipelines, 2) Feature store, 3) Datasets, 4) Metadata
- ML Model: 1) ML model training pipeline, 2) ML model (object), 3) Hyperparameters, 4) Experiment tracking
- Code: 1) Application code, 2) Configurations
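To make data versioning concrete, here is a minimal sketch: it fingerprints a dataset with a content hash plus basic metadata, much as Git versions code. The file name, record fields, and helper name are illustrative, not a specific tool's API.

```python
import hashlib
import json
from pathlib import Path

def fingerprint_dataset(path: Path) -> dict:
    """Return a simple version record for a dataset file: a content hash
    plus basic metadata. A changed hash signals that downstream steps
    (feature creation, training) should be re-triggered."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return {"file": path.name, "sha256": digest, "size_bytes": path.stat().st_size}

# Toy example: fingerprint a small CSV and log the record alongside the run.
data = Path("toy_dataset.csv")
data.write_text("sepal_length,species\n5.1,setosa\n")
record = fingerprint_dataset(data)
print(json.dumps(record, indent=2))
```

In practice the same idea is applied by dedicated tools (e.g., DVC or a feature store), which also version the pipelines and hyperparameters listed above.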
Testing
- Data: 1) Data validation (error detection), 2) Feature creation unit testing
- ML Model: 1) Model specification is unit tested, 2) ML model training pipeline is integration tested, 3) ML model is validated before being operationalized, 4) ML model staleness test (in production), 5) Testing ML model relevance and correctness, 6) Testing non-functional requirements (security, fairness, interpretability)
- Code: 1) Unit testing, 2) Integration testing for the end-to-end pipeline
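A small sketch of the first two Data items, data validation and feature-creation unit testing, with hypothetical helper names; in CI these assertions would run on every commit:

```python
import math

def normalize(values):
    """Feature creation step: min-max scale a numeric column into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("column is constant; cannot scale")
    return [(v - lo) / (hi - lo) for v in values]

def validate_column(values):
    """Data validation step: reject missing or non-finite entries
    before they reach training."""
    return all(v is not None and math.isfinite(v) for v in values)

# Unit-test style checks, as would run in CI on every commit.
assert validate_column([1.0, 2.0, 3.0])
assert not validate_column([1.0, None, float("nan")])
assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
```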
Automation
- Data: 1) Data transformation, 2) Feature creation and manipulation
- ML Model: 1) Data engineering pipeline, 2) ML model training pipeline, 3) Hyperparameter/parameter selection
- Code: 1) ML model deployment with CI/CD, 2) Application build
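The essence of automation is that each stage is callable code rather than a manual step, so a CI/CD trigger can run the whole chain. A minimal sketch, with stand-in stage names and logic (not any specific orchestrator's API):

```python
# Minimal sketch of an automated pipeline: each stage is a plain function,
# and a runner executes them in order so no step depends on a manual action.

def ingest():
    # Stand-in for pulling versioned raw data.
    return [1.0, 2.0, 3.0, 4.0]

def engineer_features(raw):
    # Center the feature on its mean.
    mean = sum(raw) / len(raw)
    return [x - mean for x in raw]

def train(features):
    # Stand-in "model": remember the spread of the training features.
    return {"scale": max(features) - min(features)}

def pipeline():
    raw = ingest()
    feats = engineer_features(raw)
    return train(feats)

model = pipeline()
print(model)
```

Real orchestrators (Airflow, Kubeflow Pipelines, etc.) add scheduling, retries, and triggering on data/code/model changes on top of this same stage-as-function idea.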
Reproducibility
- Data: 1) Backup data, 2) Data versioning, 3) Extract metadata, 4) Versioning of feature engineering
- ML Model: 1) Hyperparameter tuning is identical between dev and prod, 2) The order of features is the same, 3) Ensemble learning: the combination of ML models is the same, 4) The model pseudo-code is documented
- Code: 1) Versions of all dependencies in dev and prod are identical, 2) Same technical stack for dev and production environments, 3) Reproducing results by providing container images or virtual machines
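The simplest reproducibility lever is seeding every source of randomness, so that a run can be repeated bit-for-bit in dev and prod. A sketch with a stand-in "training" routine:

```python
import random

def train_run(seed: int):
    """Sketch of a reproducible training run: every source of randomness
    is seeded, so the same seed yields identical 'weights'."""
    rng = random.Random(seed)  # in real code also seed numpy, torch, etc.
    weights = [rng.gauss(0, 1) for _ in range(3)]
    return weights

# Two runs with the same seed must match exactly in dev and in prod.
assert train_run(42) == train_run(42)
# A different seed produces a different model, which the run metadata records.
assert train_run(42) != train_run(7)
```

Seeding only helps if the dependency versions and technical stack are pinned too, hence the container images and identical dev/prod environments in the list above.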
Deployment
- Data: 1) Feature store is used in dev and prod environments
- ML Model: 1) Containerization of the ML stack, 2) REST API, 3) On-premise, cloud, or edge
- Code: 1) On-premise, cloud, or edge
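A deployed model is typically a containerized service exposing predictions over a REST API. A framework-agnostic sketch of the service layer, with an illustrative stand-in model; a thin Flask/FastAPI route (not shown) would pass the request body straight through:

```python
import json

# Stand-in for a model artifact loaded at service start-up.
MODEL = {"intercept": 0.5, "coef": 2.0}

def predict_handler(request_body: str) -> str:
    """Turn a JSON request into a JSON prediction response."""
    payload = json.loads(request_body)
    x = payload["x"]
    y = MODEL["intercept"] + MODEL["coef"] * x
    return json.dumps({"prediction": y})

response = predict_handler('{"x": 3.0}')
print(response)
```

Keeping the handler separate from the web framework makes it unit-testable and lets the same container image run on-premise, in the cloud, or at the edge.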
Monitoring
- Data: 1) Data distribution changes (training vs. serving data), 2) Training vs. serving features
- ML Model: 1) ML model decay, 2) Numerical stability, 3) Computational performance of the ML model
- Code: 1) Predictive quality of the application on serving data
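Detecting a training-vs-serving distribution change can start very simply. This sketch scores drift as the shift of the serving mean measured in training standard deviations, a crude stand-in for proper statistical tests such as Kolmogorov-Smirnov; thresholds are illustrative:

```python
import statistics

def drift_score(train_col, serve_col):
    """Crude drift signal: how many training standard deviations the
    serving mean has moved away from the training mean."""
    mu = statistics.mean(train_col)
    sigma = statistics.stdev(train_col)
    return abs(statistics.mean(serve_col) - mu) / sigma

train = [10.0, 11.0, 9.0, 10.5, 9.5]
serving_ok = [10.2, 9.8, 10.1]
serving_drifted = [15.0, 16.0, 14.5]

assert drift_score(train, serving_ok) < 1.0       # within normal variation
assert drift_score(train, serving_drifted) > 3.0  # alert: consider retraining
```

A monitoring job would compute such scores per feature on a schedule and alert (or trigger retraining) when they cross a threshold, which is how model decay is caught in production.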

Along with the MLOps principles, following the set of best practices below, again broken down by Data, ML Model, and Code, should help reduce the "technical debt" of an ML project:

Documentation
- Data: 1) Data sources, 2) Decisions on how/where to get data, 3) Labelling methods
- ML Model: 1) Model selection criteria, 2) Design of experiments, 3) Model pseudo-code
- Code: 1) Deployment process, 2) How to run locally
Project Structure
- Data: 1) A data folder for raw and processed data, 2) A folder for the data engineering pipeline, 3) A test folder for data engineering methods
- ML Model: 1) A folder that contains the trained model, 2) A folder for notebooks, 3) A folder for feature engineering, 4) A folder for ML model engineering
- Code: 1) A folder for bash/shell scripts, 2) A folder for tests, 3) A folder for deployment files (e.g., Dockerfiles)
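The suggested layout can be bootstrapped in a few lines. The folder names below follow the lists above but are only one reasonable mapping; adapt them to your team's conventions:

```python
from pathlib import Path

# Illustrative layout derived from the Data / ML Model / Code lists above.
LAYOUT = [
    "data/raw", "data/processed",
    "pipelines/data_engineering",
    "models", "notebooks", "features",
    "scripts", "tests", "deployment",
]

root = Path("ml_project")
for sub in LAYOUT:
    (root / sub).mkdir(parents=True, exist_ok=True)

created = sorted(p.relative_to(root).as_posix()
                 for p in root.rglob("*") if p.is_dir())
print(created)
```

Project templates such as Cookiecutter Data Science automate the same idea and keep the structure consistent across projects.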

