Bring Machine Learning to Production at Scale: Machine Learning Operations (MLOps)

by Prachi Sinha and Tyler Femenella

What is MLOps?

Machine learning (ML) is becoming increasingly integrated with software packages. A McKinsey survey highlighted that by 2022, over 50% of organizations had implemented machine learning in at least one area of their operations, a significant increase from just 20% in 2017. In healthcare, for instance, the machine learning market, valued at $15.1 billion in 2022, is expected to surge to $187.95 billion by 2030, reflecting how deeply machine learning is being woven into healthcare applications. Just as CI/CD and DevOps emerged to standardize software delivery, ML development needs standardized practices to streamline development, testing, and deployment. The goal of MLOps is to take the concepts and ideas behind those existing best practices and adapt them to ML development. Through automation and integration with existing deployment pipelines, MLOps lets data scientists spend more time on the latest problem rather than on maintaining and deploying existing models.

Model Development

Machine learning development is an intensive process that can cycle through many iterations. Data scientists test different algorithms and search for optimal input parameters. By creating a repeatable workflow, a team can quickly test and compare many model experiments at once. Consider, for example, a team of data scientists building an entity resolution model for an asset management company. Their goal is to identify and match records across multiple data sources to create a unified view of assets, locations, and transactions. To achieve this, they may need to test several algorithms, such as clustering techniques and probabilistic matching, and fine-tune numerous parameters to handle variations in names, addresses, and other identifying information accurately. Without a structured, repeatable workflow, this process could quickly become overwhelming. With hundreds or even thousands of records to analyze and multiple models to compare, it is easy for the team to lose track of what works best. By establishing a consistent, automated workflow, they can streamline testing and efficiently compare results, finding the optimal model faster. A data science team working on a new model would leverage the following practices to produce ML artifacts that are consistent with other software packages.

  • 49% of organizations ranked basic integration issues as a concern, and the survey found that cross-functional alignment continues to be a major blocker to organizations achieving AI/ML maturity.
  • 64% of organizations take a month or longer to deploy a machine learning model into production.
  • 38% of all organizations are spending more than 50% of their data scientists’ time on model deployment.

Source: Algorithmia, 2021 enterprise AI/ML trends survey

Version Control: FI’s model development team uses versioning tools such as GitHub not just for the source code, but also for the data artifacts used to train the model. By versioning the training data, each team member can iterate on feature engineering exploration without impacting other team members. Having these objects versioned also allows the team to revisit old experiments.
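
As a minimal sketch of the idea, the snippet below ties a training run to both the Git commit of the code and a content hash of the training data so the exact experiment can be reproduced later. The file paths, manifest layout, and helper names are illustrative assumptions rather than FI's actual tooling; a dedicated tool such as DVC could serve the same purpose.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def hash_file(path: Path) -> str:
    """Content hash of a data artifact, so the exact training data can be pinned."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def current_git_commit() -> str:
    """Commit of the model code at training time."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

def record_training_snapshot(data_path: str, out_path: str = "training_snapshot.json") -> dict:
    """Write a small manifest linking the code version and data version for this run."""
    snapshot = {
        "git_commit": current_git_commit(),
        "data_file": data_path,
        "data_sha256": hash_file(Path(data_path)),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    Path(out_path).write_text(json.dumps(snapshot, indent=2))
    return snapshot

if __name__ == "__main__":
    # Hypothetical training file; in practice this would be the team's real dataset.
    print(record_training_snapshot("data/entities_train.csv"))
```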

Container Registry: By leveraging a containerization tool, the team can use cloud computing to spin up separate resources, giving them the ability to run multiple experiments or model training activities at once. With FI’s environment codified in container images, team members can test different versions of tools already in use, as well as try out new tools, without impacting existing resources.
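
The sketch below shows one way this could look in practice, using the Docker SDK for Python to build a training image and push it to a registry. The registry URL, image name, and tag are placeholder assumptions, and it presumes the machine is already authenticated to the registry.

```python
import docker

# Hypothetical registry and image name; substitute the team's actual container registry.
REGISTRY = "123456789012.dkr.ecr.us-east-1.amazonaws.com"
IMAGE = f"{REGISTRY}/entity-resolution-training"
TAG = "experiment-42"

client = docker.from_env()

# Build the training environment from the codified Dockerfile in the repo root.
image, build_logs = client.images.build(path=".", tag=f"{IMAGE}:{TAG}")

# Push the image so any experiment can pull the exact same environment.
# Assumes `docker login` to the registry has already been performed.
for line in client.images.push(IMAGE, tag=TAG, stream=True, decode=True):
    if "status" in line:
        print(line["status"])
```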

Experiment Tracking: With all of these tests being run across the team, it becomes imperative to have a method to track which tests have been performed with which models. This is vital for maintaining a collaborative working environment for the FI team and for the model over potentially years of production activity. This can be achieved with version control tools like GitHub, which provide detailed change histories, or platform solutions like AWS SageMaker, which allow users to tag and catalog each model version, keeping a clear record of what has been tested and which parameters were used.
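
As an illustration of the concept, the sketch below logs one run's parameters and metrics with MLflow so runs can be compared side by side. The experiment name, parameters, and scores are placeholders, and SageMaker Experiments or a similar platform could fill the same role.

```python
import mlflow

# Hypothetical experiment and run names; real values would come from the training job.
mlflow.set_experiment("entity-resolution")

with mlflow.start_run(run_name="probabilistic-matching-v3"):
    # Record the knobs that were turned for this run...
    mlflow.log_param("algorithm", "probabilistic_matching")
    mlflow.log_param("name_similarity_threshold", 0.85)
    mlflow.log_param("address_blocking_key", "zip_code")

    # ...and the resulting quality metrics, so the team can see which settings won.
    mlflow.log_metric("precision", 0.97)
    mlflow.log_metric("recall", 0.93)
```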

Model Registry: Produced models that are ready to be integrated with the wider software solution are published to a model registry. Our team leverages tools such as SageMaker Model Registry or existing CI/CD artifact management tools like Nexus or Artifactory, creating a registry where trained models are stored along with metadata for each model. This metadata includes links to governance approvals, bias reports, and other relevant audit and testing information. According to the Algorithmia 2021 report, 56% of all organizations rank governance, security, and auditability issues as a concern, and 67% of all organizations report needing to comply with multiple regulations for their AI/ML.
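
A minimal sketch of the registry step, using the MLflow Model Registry for illustration (SageMaker Model Registry offers equivalent concepts). The model name, run URI, and governance links are assumptions made up for the example.

```python
import mlflow
from mlflow.tracking import MlflowClient

# Hypothetical URI of a model logged by an earlier tracked run.
MODEL_URI = "runs:/abc123def456/model"
MODEL_NAME = "entity-resolution"

# Register the trained model as a new version in the registry.
version = mlflow.register_model(MODEL_URI, MODEL_NAME)

# Attach governance metadata so auditors can trace approvals and testing for this version.
client = MlflowClient()
client.set_model_version_tag(MODEL_NAME, version.version, "governance_approval", "TICKET-1234")
client.set_model_version_tag(MODEL_NAME, version.version, "bias_report", "s3://reports/bias/v3.pdf")
```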

Model Pipeline: A model pipeline can be used to automate the various steps of model development, from data cleanup and ingestion to training and tuning. Pipelines also allow models to be deployed to their endpoints, which can be integrated with CI/CD pipelines. Using orchestration tools like Apache Airflow or SageMaker Pipelines, the FI team can introduce continuous training, which allows this process to run on a set frequency. By doing so, models can be trained on the most up-to-date data. These pipelines also make it easy to recreate any production issue by giving the team the ability to import existing models into lower environments and quickly diagnose inference problems. Consider, for example, a fraud detection model that constantly learns from the latest transaction data. With a model pipeline, it can be retrained automatically each week or even daily, enhancing its ability to detect subtle changes in fraudulent behaviors. Once trained, the model can be swiftly deployed to its endpoint and integrated into CI/CD pipelines, ensuring seamless updates without downtime.
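
The sketch below outlines what such a continuous-training pipeline could look like as an Apache Airflow DAG, assuming a recent Airflow 2.x release. The task functions are placeholders standing in for the team's real ingestion, training, evaluation, and deployment code.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder steps; each would call the team's actual pipeline code in practice.
def ingest_data():
    print("pull and clean the latest records")

def train_model():
    print("train and tune candidate models")

def evaluate_model():
    print("score the new model against the current production model")

def deploy_model():
    print("register the approved model and update the inference endpoint")

with DAG(
    dag_id="entity_resolution_continuous_training",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",  # retrain on a set frequency so the model learns from fresh data
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_data", python_callable=ingest_data)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    evaluate = PythonOperator(task_id="evaluate_model", python_callable=evaluate_model)
    deploy = PythonOperator(task_id="deploy_model", python_callable=deploy_model)

    ingest >> train >> evaluate >> deploy
```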

Model Monitoring

Once the model has been deployed into our production environments, we introduce effective model monitoring that provides a real-time view of performance over time. By building a process to capture input and output data, we can evaluate performance and detect changes in the data schema over time. Once the production model is properly monitored and all data around its performance is being captured, our teams can create alerting and develop automated responses to various model behaviors.
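
A minimal sketch of the capture step, assuming a thin wrapper around the model's predict call that logs each input/output pair for later evaluation. The logging destination, field names, and the model's predict() interface are illustrative assumptions.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("model_monitoring")
logging.basicConfig(level=logging.INFO)

def monitored_predict(model, features: dict) -> dict:
    """Run inference and capture the input/output pair for downstream monitoring."""
    prediction = model.predict(features)  # assumes the model exposes a predict() method
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input": features,
        "output": prediction,
        "schema_fields": sorted(features.keys()),  # track schema so new or missing fields stand out
    }
    # In production this would land in a data store (e.g., S3 or a warehouse); here we just log it.
    logger.info(json.dumps(record, default=str))
    return prediction
```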

Anomaly Detection: Monitoring for abnormalities in data patterns or unexpected shifts in model performance ensures that any deviations are flagged, allowing for proactive intervention and maintenance.

For example, we have deployed an entity resolution model for an asset management company. This model is designed to identify and link records referring to the same property across multiple databases. Accurate entity resolution is critical to prevent duplicate records, streamline property modifications, and ensure regulatory compliance.

Example Scenario:

Under normal conditions, the model maintains an accuracy rate of 98% in linking entities correctly. However, during a major data migration, the following pattern is observed:

| Time  | Accuracy Rate | Records Processed per Hour |
|-------|---------------|----------------------------|
| 8 AM  | 98%           | 10,000                     |
| 12 PM | 95%           | 30,000                     |
| 4 PM  | 90%           | 50,000                     |
| 8 PM  | 85%           | 60,000                     |

As the volume of records processed increases during the migration, the accuracy rate begins to decline. This could be due to variations in data quality, formatting inconsistencies, or changes in database schemas that make entity matching more complex. By monitoring these metrics in real-time, we can quickly detect the issue and take corrective actions, such as adjusting the model’s parameters to handle new data formats or adding preprocessing steps to clean and standardize incoming records.
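
As a sketch of how such a check could be automated, the snippet below flags monitoring windows where accuracy drops below an expected baseline, especially while throughput spikes. The threshold values and the alerting hook are illustrative assumptions.

```python
# Hypothetical hourly monitoring records, mirroring the scenario above.
observations = [
    {"time": "8 AM", "accuracy": 0.98, "records_per_hour": 10_000},
    {"time": "12 PM", "accuracy": 0.95, "records_per_hour": 30_000},
    {"time": "4 PM", "accuracy": 0.90, "records_per_hour": 50_000},
    {"time": "8 PM", "accuracy": 0.85, "records_per_hour": 60_000},
]

ACCURACY_FLOOR = 0.95        # assumed acceptable accuracy for entity matching
THROUGHPUT_SPIKE = 40_000    # assumed records/hour level that signals unusual load

def detect_anomalies(records):
    """Flag windows where accuracy degrades, especially under heavy load."""
    alerts = []
    for rec in records:
        if rec["accuracy"] < ACCURACY_FLOOR:
            reason = "accuracy below floor"
            if rec["records_per_hour"] >= THROUGHPUT_SPIKE:
                reason += " during throughput spike"
            alerts.append(f"{rec['time']}: {reason} (accuracy={rec['accuracy']:.0%})")
    return alerts

for alert in detect_anomalies(observations):
    print(alert)  # in production this would page the team or open an incident
```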

Drift Detection and Automated Retraining: Over time, a model’s performance will slowly decay as the data it sees in production drifts away from the data it was trained on. In response, we can set a threshold that, when crossed, triggers the model pipeline to run and retrain the model, restoring its performance.
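
The sketch below illustrates one common approach, the Population Stability Index (PSI), computed over a feature's training versus production distributions. The threshold value and the retraining hook are assumptions; the hook could just as easily kick off the Airflow DAG or SageMaker pipeline described earlier.

```python
import numpy as np

def population_stability_index(expected, actual, bins: int = 10) -> float:
    """PSI between the training (expected) and production (actual) distributions of a feature."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid division by zero / log(0) for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

PSI_THRESHOLD = 0.2  # a commonly used rule of thumb for "significant" drift

def check_and_retrain(training_feature, production_feature):
    psi = population_stability_index(training_feature, production_feature)
    if psi > PSI_THRESHOLD:
        # Hypothetical hook; in practice this would trigger the continuous-training pipeline.
        print(f"PSI={psi:.3f} exceeds {PSI_THRESHOLD}; triggering retraining pipeline")
    else:
        print(f"PSI={psi:.3f} within tolerance; no retraining needed")

# Example with synthetic data standing in for a monitored feature.
rng = np.random.default_rng(0)
check_and_retrain(rng.normal(0, 1, 5_000), rng.normal(0.5, 1.2, 5_000))
```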

Conclusion

Like the adoption of DevOps and CI/CD practices, MLOps adoption will become a differentiator in companies’ technology maturity. FI sees MLOps as integral to model deployment and to managing ML/AI adoption. Treating ML and AI consistently with other software packages creates an environment where models can be deployed automatically, at scale, and under the same scrutiny as existing applications are today.

Sources

  1. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2022-and-a-half-decade-in-review
  2. https://www.itransition.com/machine-learning/statistics
  3. https://www.globenewswire.com/news-release/2020/12/10/2143134/0/en/Algorithmia-Report-Reveals-2021-Enterprise-AI-ML-Trends.html