Module 15 of 15 · 📖 4 min read · ⏱ 30 min total
FI-DPA 15 MLOps — Operating Models in Production
In this module, you will learn the concepts and practical implementation steps for reliably operating machine learning models in production. You will understand how to version, monitor, and automatically update models when needed, while also considering ethical aspects and model explainability.
Concepts and Background
- Model Registry: A central storage location for all models in various versions, managing metadata and lifecycle.
- Versioning: The systematic storage of models with unique identifiers to enable reproducibility and rollbacks.
- A/B Testing: A procedure for comparing two models where different user groups receive different versions to objectively evaluate performance.
- Drift Detection: The continuous monitoring of model outputs and input data for deviations from expected behavior that could lead to performance degradation.
- Explainable AI (XAI): Methods for explaining predictions from machine learning models to create transparency and trust in the results.
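To make the registry and versioning concepts concrete, here is a minimal in-memory sketch. It is a toy stand-in, not the MLflow API: the class names, the stage labels, the artifact paths, and the metric values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str      # hypothetical location of the serialized model
    metrics: dict          # metadata stored alongside the version
    stage: str = "None"    # assumed labels: "None", "Production", "Archived"


@dataclass
class InMemoryRegistry:
    """Toy registry: versions are numbered per model name, starting at 1."""
    models: dict = field(default_factory=dict)

    def register(self, name, artifact_uri, metrics):
        versions = self.models.setdefault(name, [])
        mv = ModelVersion(len(versions) + 1, artifact_uri, metrics)
        versions.append(mv)
        return mv

    def promote(self, name, version, stage="Production"):
        # Archive whichever version currently holds the stage, then promote.
        for mv in self.models[name]:
            if mv.stage == stage:
                mv.stage = "Archived"
        target = self.models[name][version - 1]
        target.stage = stage
        return target

    def current(self, name, stage="Production"):
        for mv in self.models[name]:
            if mv.stage == stage:
                return mv
        return None


registry = InMemoryRegistry()
# Made-up paths and metric values, for illustration only.
registry.register("customer_churn_model", "models/churn/v1.pkl", {"auc": 0.81})
registry.register("customer_churn_model", "models/churn/v2.pkl", {"auc": 0.84})
registry.promote("customer_churn_model", 2)
# Rollback: promoting version 1 again archives version 2.
registry.promote("customer_churn_model", 1)
```

Because every version keeps its own identifier and metadata, a rollback is just another promotion of an older version; nothing is overwritten.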
Architecture Diagram
```mermaid
flowchart LR
    A[Data Source] --> B[Data Preprocessing]
    B --> C[Model Training]
    C --> D[MLflow Registry]
    D --> E[Model Deployment]
    E --> F[A/B Testing]
    F --> G[Monitoring]
    G --> H[Drift Detection]
    H --> I{Drift Detected?}
    I -->|Yes| J[Retraining]
    I -->|No| K[Production]
    J --> C
```
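The decision node at the end of the diagram can be sketched as a small function. This is only an illustration of the control flow: the function names, the threshold of 0.2, and the drift scores are assumptions, and the retraining step is a placeholder callback rather than a real training job.

```python
from typing import Callable


def next_step(drift_score: float, threshold: float) -> str:
    """Mirror the 'Drift Detected?' node: pick the branch to follow."""
    return "retraining" if drift_score > threshold else "production"


def run_cycle(drift_score: float, threshold: float,
              retrain: Callable[[], None]) -> str:
    """One monitoring pass; calls retrain() only on the 'Yes' branch."""
    step = next_step(drift_score, threshold)
    if step == "retraining":
        retrain()  # in the diagram, this branch leads back to Model Training
    return step


retrain_calls = []
steps = [
    run_cycle(score, threshold=0.2,
              retrain=lambda: retrain_calls.append("retrain"))
    for score in (0.05, 0.35)  # made-up drift scores for illustration
]
```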
Practical Steps
- Install and configure the MLflow tracking server. It serves as the central platform for managing your models.
- Create an MLflow experiment for your project to group all runs and models.
- Train a model and automatically register it in the MLflow Registry.
- Mark a version of the model for A/B testing and deploy it.
- Implement a monitoring pipeline for data and concept drift.
- Create a retraining workflow using Apache Airflow or similar tools.
- Implement SHAP integration for model prediction explainability.
- Establish ethical guidelines for model evaluation and selection.
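```shell
pip install mlflow
mlflow server --host 0.0.0.0 --port 5000
# In a second terminal, point the CLI at the server and create the experiment
export MLFLOW_TRACKING_URI=http://localhost:5000
mlflow experiments create --experiment-name "Customer Classification"
```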
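```python
import mlflow.sklearn

# `model` is assumed to be a fitted scikit-learn estimator. Passing
# registered_model_name also creates a new version in the registry.
mlflow.sklearn.log_model(
    sk_model=model,
    artifact_path="model",
    registered_model_name="customer_churn_model",
)
```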
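```python
from mlflow.tracking import MlflowClient

# Move version 1 to the "Staging" stage. Registry stages are deprecated
# in recent MLflow releases in favor of aliases and tags.
client = MlflowClient()
client.transition_model_version_stage(
    name="customer_churn_model",
    version=1,
    stage="Staging",
)
```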
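Once a version is marked for A/B testing, incoming requests have to be split between the two versions. One common approach is deterministic hashing of a user identifier, sketched below. The function name, the 10% treatment share, the salt, and the user IDs are assumptions for illustration; this is not an MLflow feature.

```python
import hashlib


def assign_variant(user_id: str, treatment_share: float = 0.1,
                   salt: str = "churn-ab-test") -> str:
    """Deterministically map a user to the 'production' or 'staging' model."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # First 8 hex digits -> integer in [0, 2**32) -> fraction in [0, 1)
    bucket = int(digest[:8], 16) / 2**32
    return "staging" if bucket < treatment_share else "production"


# Hypothetical user IDs; the same ID always lands in the same group.
assignments = {uid: assign_variant(uid) for uid in ("user-1", "user-2", "user-3")}
```

Hashing instead of drawing a random number per request keeps each user on one model version for the whole test, which makes the comparison between groups cleaner. Changing the salt reshuffles the groups for a new experiment.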
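```python
# Report API of older Evidently releases; import paths changed in later versions
from evidently.report import Report
from evidently.metrics import DatasetDriftMetric

# reference_df: data from training time; new_data: recent production data
report = Report(metrics=[DatasetDriftMetric()])
report.run(reference_data=reference_df, current_data=new_data)
```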
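To show what a data drift score measures, here is a minimal pure-Python sketch of the Population Stability Index (PSI) for a single numeric feature. It is a simple stand-in and does not reproduce Evidently's internals. The bin count, the smoothing constant `eps`, and the sample data are assumptions.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp outliers into edge bins
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic data for illustration: a uniform grid and a shifted copy of it.
reference = [i / 100 for i in range(100)]
same_score = psi(reference, reference)
shifted_score = psi(reference, [v + 0.5 for v in reference])
```

Identical distributions give a score of zero, and the score grows as the current data moves away from the reference. A common rule of thumb treats values above roughly 0.2 as a shift worth investigating, but a suitable threshold depends on the feature and the use case.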
```python
import shap

# TreeExplainer expects a tree-based model (e.g. random forest, gradient boosting)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
```
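The quantity that SHAP estimates can be illustrated with a brute-force computation of Shapley values for a tiny model. This sketch is not how `shap.TreeExplainer` works internally, since that uses a tree-specific algorithm. Enumerating all feature orderings is only feasible for a handful of features. The toy model, its coefficients, and the input values are made up for illustration.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        z = list(baseline)       # start from the baseline input
        prev = predict(z)
        for i in order:
            z[i] = x[i]          # switch feature i to its actual value
            cur = predict(z)
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    return [p / len(orders) for p in phi]


def toy_model(features):
    """Hypothetical linear churn score; coefficients are made up."""
    tenure, monthly_charge, support_calls = features
    return 0.5 - 0.02 * tenure + 0.01 * monthly_charge + 0.05 * support_calls


x = [12, 70, 3]          # the instance to explain
baseline = [24, 50, 1]   # reference input, e.g. feature averages
phi = shapley_values(toy_model, x, baseline)
```

The contributions in `phi` add up to the difference between the model output for `x` and the output for the baseline, which is the property that makes Shapley values useful as per-feature explanations.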
Common Pitfalls
Further Resources
- MLflow Official Documentation
- Evidently - ML Monitoring Platform
- SHAP Documentation for Model Explainability
- Book: Designing Machine Learning Systems
- Coursera Course: Machine Learning Engineering for Production (MLOps) Specialization
Knowledge Check
Four questions for self-assessment. Each question is followed by the correct answer and an explanation.
What is the primary purpose of a Model Registry in MLOps?
- A) The automatic training of models
- B) A central storage location for models in various versions with metadata management
- C) The visualization of model outputs
- D) The data collection for training
Correct Answer: B. A Model Registry serves as central storage for models in various versions and manages their metadata and lifecycle. Option A describes training, not storage. Option C is a task for visualization and XAI tools, and Option D belongs to the data collection stage that precedes training.
What is the main purpose of A/B Testing in the context of ML models?
- A) The automatic updating of models
- B) The continuous monitoring of model outputs
- C) The objective comparison of two models by deploying them to different user groups
- D) The explanation of model outputs
Correct Answer: C. A/B Testing compares two models by having different user groups receive different versions to objectively evaluate performance. Option A describes retraining, Option B is Drift Detection and Option D is the task of XAI.
What is Explainable AI (XAI) in the context of MLOps?
- A) A procedure for automatic model optimization
- B) Methods for explaining model outputs for transparency and trust
- C) A system for version management of training data
- D) A protocol for model deployment
Correct Answer: B. XAI encompasses methods for explaining predictions from machine learning models to create transparency and trust. Option A describes hyperparameter optimization, Option C is part of data management and Option D refers to deployment processes.
What happens in the MLOps lifecycle when drift is detected?
- A) The model is automatically archived in the registry
- B) The system automatically performs retraining
- C) The model is flagged for human review
- D) The system alerts the development team
Correct Answer: B. In the architecture described in this module, detected drift triggers an automatic retraining process to address the performance degradation. Option A is part of lifecycle management but not the immediate response to drift. Options C and D may also occur depending on the implementation, but in this pipeline retraining is the designed response to detected drift.