09/01/2023
read 5 minutes

Tools for building an MLOps approach in a company


MLOps, which stands for machine learning operations, helps automate machine learning processes and improves collaboration between data scientists and ML engineers. While MLOps is still considered a relatively new approach to working with data, numerous tools have already emerged for its implementation.

Why MLOps Is Needed

According to an IDC study, global spending on artificial intelligence is projected to exceed $300 billion by 2026. Yet, only 13% of ML/DL projects transition to production. This indicates that significant investments in AI may not yield the desired profitability for many companies. MLOps aims to enhance the efficiency of machine learning processes, increase the number of models, and elevate their quality to address business challenges more effectively.

The data community adapted DevOps principles to machine learning, which led to MLOps. This methodology integrates the development, deployment, and monitoring of models into a single structured workflow. Previously, model development and operations were handled separately, and releases were strictly manual. MLOps brings these elements together, making the operational process more consistent and stable.

Choosing Tools for MLOps: Open Source vs Proprietary Software

Any ML project raises several critical questions: how to manage models, which tools to use across the ML project lifecycle, and how to make the process more efficient. Companies can opt for vendor solutions like SageMaker or build their own platforms from open-source tools. Several criteria help compare the two approaches:

  1. Code quality. While vendor solutions are often more reliable and optimized, they aren't infallible. Most proprietary tools restrict full access, forcing companies to await vendor updates when issues arise. Conversely, open-source tools grant everyone access to the source code, allowing for continuous improvements and error corrections by the community.
  2. Universality. Vendors commonly build a "one-size-fits-all" product, but such a solution can be either overly comprehensive or not fully applicable, so supplemental solutions may need to be integrated. Open-source tools are notable for their adaptability and modular nature, much like a building set. However, tailoring them to specific needs often demands an experienced development team.
  3. Entry threshold. Companies using proprietary MLOps software might have difficulty recruiting specialists familiar with those specific platforms. With open-source tools it is easier to find data experts, since many are already acquainted with well-known services like JupyterHub and MLflow.
  4. Flexibility. Open source provides an extensive range of frameworks and libraries that can be plugged in or removed as needed.
  5. Support. Maintaining a large open-source stack demands substantial resources. As a project grows, troubleshooting can consume time, and scaling further increases the management effort.

Proprietary options may suit teams that are unfamiliar with open-source tools, especially for large-scale projects or when the budget allows for licensed software. On the other hand, open source becomes the favored choice when customization to company requirements is paramount, access to best practices is essential, vendor lock-in risks need mitigation, or the latest features are sought after.

Existing Tools

Open-source solutions cover the entire ML stack in five stages:

  • Data collection, preparation, and labeling;
  • Model training;
  • Model quality assessment;
  • Model deployment to production;
  • Monitoring.

Overall, there are over 50 MLOps tools out there. The MyMLOps team has even developed a toolkit that lets you craft your own MLOps stack using open-source options. An example is provided below.

[Figure: Tools for MLOps]

Metaflow

Useful across diverse ML projects, Metaflow offers features like:

  • Managing external dependencies;
  • Controlling computational resources;
  • Reproducing and resuming workflow execution;
  • Switching between local and remote execution modes;
  • Launching steps in containers.

It acts as an intermediary layer between data scientists, Kubernetes, and infrastructure. An engineer can use Metaflow as a Python library to describe ML workflows as directed acyclic graphs (DAGs) of steps.
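To make the DAG idea concrete, here is a minimal sketch that uses only the Python standard library. It is not Metaflow's API: Metaflow has its own flow classes and step decorators, which are not shown here. The step names (load, train, evaluate) and the run_workflow helper are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, dependencies):
    """Run every step after all of its upstream steps have finished."""
    state = {}  # shared artifacts passed from step to step
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        steps[name](state)
    return order, state


# Hypothetical steps of a toy ML workflow.
def load(state):
    state["data"] = [1.0, 2.0, 3.0, 4.0]


def train(state):
    # The "model" is just the mean of the data.
    state["model"] = sum(state["data"]) / len(state["data"])


def evaluate(state):
    # Largest absolute deviation from the mean.
    state["error"] = max(abs(x - state["model"]) for x in state["data"])


steps = {"load": load, "train": train, "evaluate": evaluate}
# Each key maps a step to the set of steps it depends on.
dependencies = {"load": set(), "train": {"load"}, "evaluate": {"train"}}

order, state = run_workflow(steps, dependencies)
print(order, state["model"], state["error"])
```

Declaring dependencies rather than call order is what lets a workflow tool decide where and when each step runs, for example locally or on a cluster.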

MLflow

The service streamlines the essential phases of the ML lifecycle. While commonly used for experiment tracking, it also handles reproducible runs, model deployment, and a model registry. It also integrates with leading machine learning libraries like TensorFlow and PyTorch.
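Experiment tracking boils down to recording the parameters and metrics of every run so that runs can be compared later. The sketch below shows that idea with the standard library only. It is not MLflow's API; the Run and Tracker classes and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training run with its hyperparameters and resulting metrics."""
    name: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


class Tracker:
    """Toy in-memory experiment tracker (illustrative only)."""

    def __init__(self):
        self.runs = []

    def log_run(self, name, params, metrics):
        run = Run(name, dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def best(self, metric, higher_is_better=True):
        """Return the run with the best value of the given metric."""
        pick = max if higher_is_better else min
        return pick(self.runs, key=lambda run: run.metrics[metric])


tracker = Tracker()
tracker.log_run("baseline", {"lr": 0.1}, {"accuracy": 0.81})
tracker.log_run("lower-lr", {"lr": 0.01}, {"accuracy": 0.86})

best = tracker.best("accuracy")
print(best.name, best.params)
```

A real tracking service adds persistent storage, artifacts, and a UI on top of this basic record-and-compare loop.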

Kubeflow

Simplifying ML operations, Kubeflow aids in:

  • data preparation;
  • model training and optimization;
  • predictions;
  • real-time model management.

It integrates tools like JupyterLab, RStudio, and Visual Studio Code. Additionally, Kubeflow provides a UI for processing and tracking experiments, tasks, and runs.

Pachyderm

Pachyderm automates data transformations and versions the data they operate on. Its concepts mirror Git. The repository is the top-level object in Pachyderm, and within it you use commits, branches, files, history, and provenance to track the data.
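The Git-like model can be illustrated with a toy data repository in which every commit snapshots the files and points to its parent. This is a conceptual sketch using only the standard library, not Pachyderm's API or storage format; the DataRepo class, the branch name, and the file contents are hypothetical.

```python
import hashlib
import json


class DataRepo:
    """Toy versioned data repository (illustrative only)."""

    def __init__(self):
        self.commits = {}   # commit id -> {"parent": ..., "files": ...}
        self.branches = {}  # branch name -> id of its latest commit

    def commit(self, branch, files):
        """Snapshot the files on a branch and return the new commit id."""
        parent = self.branches.get(branch)
        payload = json.dumps({"parent": parent, "files": files}, sort_keys=True)
        commit_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self.commits[commit_id] = {"parent": parent, "files": dict(files)}
        self.branches[branch] = commit_id
        return commit_id

    def history(self, branch):
        """Return the branch's commit ids from newest to oldest."""
        result = []
        current = self.branches.get(branch)
        while current is not None:
            result.append(current)
            current = self.commits[current]["parent"]
        return result


repo = DataRepo()
first = repo.commit("master", {"train.csv": "a,b\n1,2\n"})
second = repo.commit("master", {"train.csv": "a,b\n1,2\n3,4\n"})
print(repo.history("master"))
```

Because each commit id is derived from the data and the parent commit, any earlier state of the dataset can be located by walking the history.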

Seldon Core

Seldon Core is a popular open-source platform for deploying ML models on Kubernetes. It supports the REST and gRPC protocols, as well as manual and automatic scaling.
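As a rough illustration of the REST side, the sketch below builds (but does not send) a prediction request. It assumes the Seldon Core v1 REST protocol, in which input is wrapped as {"data": {"ndarray": ...}} and posted to a /seldon/<namespace>/<deployment>/api/v1.0/predictions path behind an ingress. The host, namespace, deployment name, and feature values are hypothetical, and the exact path should be checked against your own deployment.

```python
import json
from urllib.request import Request


def build_prediction_request(host, namespace, deployment, rows):
    """Build a POST request for a model served over REST.

    Assumes the Seldon Core v1 URL layout and payload shape.
    """
    url = f"http://{host}/seldon/{namespace}/{deployment}/api/v1.0/predictions"
    body = json.dumps({"data": {"ndarray": rows}}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical deployment details and a single row of four features.
request = build_prediction_request(
    "localhost:8080", "default", "my-model", [[5.1, 3.5, 1.4, 0.2]]
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it; that step is left out here because it needs a running deployment.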

The end-to-end workflow with Seldon Core is shown below.

[Figure: end-to-end workflow with Seldon Core]

Conclusion

MLOps, inspired by DevOps, streamlines machine learning workflows. Although AI investments are growing, most projects still fail to reach production, which highlights the need for MLOps. Proprietary tools like SageMaker offer reliability for large projects, but they lack the flexibility and adaptability of open-source alternatives. Tools like Metaflow, MLflow, and Kubeflow simplify various stages of the ML lifecycle and support efficient model development and deployment. The choice between open-source and proprietary software depends on the project's needs and the organization's goals.
