Compare 1000s of AI experiments at once.
Works well with: AWS Lambda, Azure Functions, Google Cloud Run
Competes with: Seldon Core (to some extent), KFServing (to some extent)
BentoML is a flexible, high-performance framework for serving, managing, and deploying machine learning models.
BentoML tries to bridge the gap between Data Science and DevOps. By providing a standard interface for describing a prediction service, BentoML abstracts away how to run model inference efficiently and how model serving workloads can integrate with cloud infrastructures.
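As a concrete illustration of that standard interface, here is a minimal sketch using BentoML's classic (0.13-era, pre-1.0) service API; the service name and the packed scikit-learn model are hypothetical:

```python
import pandas as pd
from bentoml import BentoService, api, artifacts, env
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact("model")])
class IrisClassifier(BentoService):  # hypothetical service name
    @api(input=DataframeInput(), batch=True)
    def predict(self, df: pd.DataFrame):
        # Inference is just a method call on the packed model artifact
        return self.artifacts.model.predict(df)
```

Packing a trained estimator with `IrisClassifier().pack("model", clf)` and calling `.save()` yields a self-contained bundle that BentoML can serve locally or containerize for the cloud targets above.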
A tool for building feature stores. Transform your raw data into beautiful features.
The library is centered on the following concepts:
Open-source library for implementing CI/CD in machine learning projects.
On every pull request, CML helps you automatically train and evaluate models, then generates a visual report with results and metrics.
Comet enables data scientists and teams to track, compare, explain, and optimize experiments and models across the model's entire lifecycle, from training to production.
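A minimal sketch of Comet's logging API, with placeholder credentials and illustrative metric values:

```python
from comet_ml import Experiment

# api_key and project_name are placeholders
experiment = Experiment(api_key="YOUR_API_KEY", project_name="example-project")

experiment.log_parameter("learning_rate", 0.01)
for epoch in range(3):
    # Illustrative values; in practice these come from your training loop
    experiment.log_metric("accuracy", 0.80 + 0.05 * epoch, step=epoch)
experiment.end()
```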
DAGsHub enables data scientists and ML engineers to work together effectively, integrating open-source tools like Git, DVC, MLflow, and Jenkins so that you can track and version code, data, models, pipelines, and experiments in one place.
Validate and monitor your data and models during training, production, and new version releases.
Features:
ELI5 is a Python library which allows you to visualize and debug various machine learning models using a unified API. It has built-in support for several ML frameworks and provides a way to explain black-box models.
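For example, ELI5 can summarize the learned weights of a scikit-learn model through that unified API (a minimal sketch):

```python
import eli5
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Render the model's learned weights as a plain-text explanation
print(eli5.format_as_text(eli5.explain_weights(clf)))
```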
Competes with: Hopsworks
Feast is an operational data system for managing and serving machine learning features to models in production.
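A minimal sketch of fetching online features with recent Feast releases; it assumes a feature repository in the current directory, and the feature view and entity names are hypothetical:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# "driver_hourly_stats" is a hypothetical registered feature view
features = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```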
Flyte makes it easy to create concurrent, scalable, and maintainable workflows for machine learning and data processing.
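A minimal sketch of a Flyte workflow using the flytekit decorators; locally it runs as plain Python, while on a Flyte cluster each task becomes its own containerized, scalable unit:

```python
from flytekit import task, workflow

@task
def double(x: int) -> int:
    return x * 2

@workflow
def pipeline(x: int = 3) -> int:
    # Tasks are invoked with keyword arguments inside a workflow
    return double(x=double(x=x))

if __name__ == "__main__":
    print(pipeline(x=5))
```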
Works well with: GitHub Actions
Provides API-powered synthetic data test fixtures for your tabular-data-enabled features, enabling regression and integration tests to be easily built and deployed for your production ML models and data science components.
InterpretML is an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof. With this package, you can train interpretable glassbox models and explain blackbox systems. InterpretML helps you understand your model's global behavior, or understand the reasons behind individual predictions.
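A minimal sketch of training a glassbox model and viewing its global explanation with InterpretML:

```python
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

ebm = ExplainableBoostingClassifier()
ebm.fit(X, y)

# Opens an interactive dashboard describing the model's global behavior
show(ebm.explain_global())
```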
Interpretability is essential for:
Works well with: Jupyter Notebooks, Cloud, Python, R-Studio, Tensorflow, Spark
Katonic MLOps Platform is a collaborative platform with a Unified UI to manage all data science activities in one place and introduce MLOps practice into the production systems of customers and developers. It is a collection of cloud-native tools for all of these stages of MLOps:
Katonic is for both data scientists and data engineers looking to build production-grade machine learning implementations, and it can be run either locally in your development environment or on a production cluster. Katonic provides a unified system, leveraging Kubernetes for containerization and scalability, which keeps its pipelines portable and repeatable.
This project is about explaining what machine learning classifiers (or models) are doing. At the moment, we support explaining individual predictions for text classifiers or classifiers that act on tables (numpy arrays of numerical or categorical data) or images, with a package called lime (short for local interpretable model-agnostic explanations). Lime is based on the work presented in this paper (bibtex here for citation).
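A minimal sketch of explaining a single tabular prediction with lime:

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
clf = RandomForestClassifier().fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
)

# Locally approximate the black-box model around one instance
exp = explainer.explain_instance(data.data[0], clf.predict_proba, num_features=2)
print(exp.as_list())
```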
MLRun is an end-to-end open-source MLOps orchestration framework to manage and automate your entire analytics and machine learning lifecycle, from data ingestion through model development to full pipeline deployment. MLRun eases the development of machine learning pipelines at scale and helps ML teams build a robust process for moving from the research phase to fully operational production deployments.
OpenPAI is an open-source platform that provides complete AI model training and resource management capabilities. It is easy to extend and supports on-premise, cloud, and hybrid environments at various scales.
Works well with: Jupyter Notebooks, Self-Hosted, Cloud
Build data pipelines, the easy way!
No framework. No YAML. Just write Python and R code in Notebooks.
Features:
Works well with: Kubernetes, AWS Batch, Airflow
Develop and test workflows locally, then seamlessly execute them in a distributed environment.
Features:
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.
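A minimal sketch of computing and plotting SHAP values for a tree-based model:

```python
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgboost.XGBClassifier().fit(X, y)

# Shapley-value attributions for every feature of every prediction
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```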
Competes with: KFServing, BentoML (to some extent)
Seldon Core makes it easier and faster to deploy your machine learning models and experiments at scale on Kubernetes.
Seldon handles scaling to thousands of production machine learning models and provides advanced machine learning capabilities out of the box including Advanced Metrics, Request Logging, Explainers, Outlier Detectors, A/B Tests, Canaries and more.
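Models are wrapped for Seldon's Python server as a plain class exposing a predict method (a minimal sketch; the class name and the echo-style inference logic are hypothetical):

```python
# model.py
class Model:
    def __init__(self):
        # Load weights or other artifacts here in a real service
        self.ready = True

    def predict(self, X, features_names=None):
        # Echo the input; a real model would return predictions for the batch
        return X
```

The class can be served with the seldon-core-microservice command from the seldon-core Python package, then deployed to Kubernetes as a SeldonDeployment resource.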
Competes with: Hydra
spock is a framework that helps manage complex parameter configurations during research and development of Python applications. spock lets you focus on the code you need to write instead of re-implementing boilerplate code like creating ArgParsers, reading configuration files, implementing traceability etc. In short, spock configurations are defined by simple and familiar class-based structures. This allows spock to support inheritance, read from multiple markdown formats, and allow hierarchical configuration by composition.
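A minimal sketch of spock's class-based configuration pattern; import paths and builder names have varied across spock releases, so treat the exact identifiers here as assumptions:

```python
from spock import SpockBuilder, spock

@spock
class TrainConfig:
    # Typed parameters with defaults, overridable from config files or the CLI
    lr: float = 0.01
    n_epochs: int = 10

def main():
    config = SpockBuilder(TrainConfig, desc="Training parameters").generate()
    print(config.TrainConfig.lr)

if __name__ == "__main__":
    main()
```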
Features:
Works well with: Kubernetes, Grafana, Nvidia Triton, Intel OpenVINO, MLflow, ClearML
Syndicai is a cloud platform that deploys, manages, and scales any trained AI model in minutes, with no configuration or infrastructure setup.
Competes with: Triton Inference Server
TorchServe is a flexible and easy-to-use tool for serving PyTorch models.
Valohai is an MLOps platform that handles machine orchestration, automatic reproducibility and deployment.
The WhyLabs Observability Platform enables any AI practitioner to set up AI monitoring in three easy steps. It follows the standard DevOps model of installing a lightweight logging agent (whylogs) alongside your model and sending data profiles to a fully self-service SaaS platform (WhyLabs). On the platform, you can analyze your profiles to see how your model is performing and automatically get alerted on changes. The platform includes:
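The agent step itself is tiny; a minimal sketch, assuming the whylogs v1 API (only the statistical profile, never the raw data, leaves your environment):

```python
import pandas as pd
import whylogs as why

df = pd.DataFrame({"feature_a": [1, 2, 3], "feature_b": ["x", "y", "z"]})

# Profile the batch locally; the profile is what gets shipped to WhyLabs
results = why.log(df)
print(results.view().to_pandas())
```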
lakeFS is an open-source data lake management platform that transforms your object storage into a Git-like repository. lakeFS enables you to manage your data lake the way you manage your code. Run parallel pipelines for experimentation and CI/CD for your data.
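For instance, branching the lake for an experiment looks Git-like through the lakefs_client Python package (a sketch; the host, credentials, and repository name are placeholders):

```python
import lakefs_client
from lakefs_client import models
from lakefs_client.client import LakeFSClient

# Placeholder endpoint and credentials for a lakeFS installation
configuration = lakefs_client.Configuration(host="http://localhost:8000/api/v1")
configuration.username = "ACCESS_KEY_ID"
configuration.password = "SECRET_ACCESS_KEY"
client = LakeFSClient(configuration)

# Create an isolated branch of the data lake off main, Git-style
client.branches.create_branch(
    repository="example-repo",
    branch_creation=models.BranchCreation(name="experiment-1", source="main"),
)
```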
Features:
With Aporia, data scientists and ML engineers can easily build monitoring for their ML models running in production.
Features:
Bodywork deploys machine learning projects developed in Python to Kubernetes. It helps you:
On demand, or on a schedule. It automates repetitive DevOps tasks and frees machine learning engineers to focus on what they do best: solving data problems with machine learning.
An easy-to-use feature store.
The Bytehub Feature Store is designed to:
It is built on Dask to support large datasets and cluster compute environments.
ClearML is an open source suite of tools that automates preparing, executing, and analyzing machine learning experiments.
Features:
Cortex makes it simple to deploy machine learning models in production.
Deploy
Manage
Scale
DVC is an open-source tool for data science and machine learning projects.
Key features:
DVC aims to replace spreadsheet and document sharing tools (such as Excel or Google Docs) which are frequently used as both knowledge repositories and team ledgers. DVC also replaces ad-hoc scripts for tracking, moving, and deploying different model versions, as well as ad-hoc data file suffixes and prefixes.
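Besides its Git-like CLI, DVC exposes a small Python API for reading versioned data; a minimal sketch with a hypothetical repository and file path:

```python
import dvc.api

# Stream a DVC-tracked file pinned to a specific Git revision
with dvc.api.open(
    "data/train.csv",                            # hypothetical tracked path
    repo="https://github.com/example/project",   # hypothetical repo
    rev="v1.0",
) as f:
    header = f.readline()
```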
Determined is an open-source deep learning training platform that makes building models fast and easy.
Determined integrates these features into an easy-to-use, high-performance deep learning environment — which means you can spend your time building models instead of managing infrastructure.
Evidently helps analyze machine learning models during validation or production monitoring. It generates interactive reports from pandas DataFrames or CSV files.
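A minimal sketch of generating a data drift report with Evidently's newer Report API; the reference and current datasets are placeholders:

```python
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

reference = pd.read_csv("reference.csv")  # placeholder file names
current = pd.read_csv("current.csv")

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift_report.html")
```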
Features:
Continuously monitor, explain, and analyze AI systems at scale. Build trustworthy, fair, and responsible AI with actionable monitoring insights.
Competes with: Tecton, Feast
The Hopsworks Feature Store manages your features for training and serving models.
The Iguazio Data Science Platform accelerates and scales development, deployment and management of your AI applications with MLOps and end-to-end automation of machine learning pipelines. The platform includes an online and offline feature store, fully integrated with automated model monitoring and drift detection, model serving and dynamic scaling capabilities, all packaged in an open and managed platform.
Competes with: Seldon Core, BentoML (to some extent)
KFServing enables serverless inferencing on Kubernetes to solve production model serving use cases.
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
Kubeflow's goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures.
Anywhere you are running Kubernetes, you should be able to run Kubeflow.
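A minimal sketch of a pipeline built with the KFP v1 SDK; the container image is hypothetical:

```python
import kfp
from kfp import dsl

@dsl.pipeline(name="train-pipeline", description="Toy Kubeflow pipeline")
def train_pipeline(data_path: str = "/data"):
    dsl.ContainerOp(
        name="train",
        image="example.com/train:latest",  # hypothetical image
        arguments=["--data", data_path],
    )

if __name__ == "__main__":
    # Compile to a workflow spec that can be uploaded to Kubeflow Pipelines
    kfp.compiler.Compiler().compile(train_pipeline, "pipeline.yaml")
```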
MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models.
It offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc.), wherever you currently run ML code (e.g., in your notebook).
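A minimal sketch of MLflow's tracking API inside a run:

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=1000).fit(X, y)
    mlflow.log_param("max_iter", 1000)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Package the model so it can be shared and deployed later
    mlflow.sklearn.log_model(model, "model")
```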
Features:
Neptune is a lightweight experiment logging/tracking tool that helps you with your machine learning experiments.
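A minimal sketch using the neptune.new API; the project path and token are placeholders:

```python
import neptune.new as neptune

run = neptune.init(project="my-workspace/my-project", api_token="YOUR_TOKEN")

run["parameters/lr"] = 0.01
for epoch in range(3):
    # Illustrative values; log real metrics from your training loop
    run["metrics/accuracy"].log(0.80 + 0.05 * epoch)
run.stop()
```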
Features:
Competes with: Triton Inference Server, Tensorflow Serving, TorchServe
OpenVINO™ Model Server (OVMS) is a scalable, high-performance solution for serving machine learning models optimized for Intel® architectures.
Pachyderm is a tool for version-controlled, automated, end-to-end data pipelines for data science.
Features:
PrimeHub is an open-source, pluggable MLOps platform built on top of Kubernetes for teams of data scientists and administrators. PrimeHub equips enterprises with consistent yet flexible tools to develop, train, and deploy ML models at scale. By improving the iterative process of data science, data teams can collaborate closely and innovate fast.
A command-line utility to train and deploy Machine Learning and Deep Learning models on AWS SageMaker in a few simple steps.
Key features:
Works well with: Jupyter Notebooks, Kubernetes, Grafana, Weights and Biases, Arize
Spell is an end-to-end deep learning platform that automates complex ML infrastructure and operational work required to train and deploy AI models. Spell is fully hybrid-cloud, and can deploy easily into any cloud or on-prem hardware.
Competes with: HuggingFace Accelerate, PyTorch Lightning (Accelerate)
stoke is a lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices (e.g. CPU, GPU), distributed modes, mixed-precision, and PyTorch extensions. This allows you to switch from local full-precision CPU to mixed-precision distributed multi-GPU with extensions (like optimizer state sharding) by simply changing a few declarative flags. Additionally, stoke exposes configuration settings for every underlying backend for those that want configurability and raw access to the underlying libraries. In short, stoke is the best of PyTorch Lightning Accelerators disconnected from the rest of PyTorch Lightning. Write whatever PyTorch code you want, but leave device and backend context switching to stoke.
Supports:
Competes with: Triton Inference Server
TensorFlow Serving is a flexible, high-performance serving system for TF models, designed for production environments.
Works well with: Azure ML, Google CAIP
Competes with: Tensorflow Serving, TorchServe
Triton Inference Server simplifies the deployment of AI models at scale in production.
Track and visualize all the pieces of your machine learning pipeline, from datasets to production models.
Works well with: Kubeflow, Airflow, AWS/GCP/Azure, MLflow, Tensorflow, Pytorch
ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces/abstractions that are catered towards ML workflows.
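A sketch of ZenML's decorator-based syntax; step and pipeline signatures have changed across early releases, so treat the exact invocation pattern as an assumption:

```python
from zenml.pipelines import pipeline
from zenml.steps import step

@step
def importer() -> float:
    return 0.42  # placeholder data-loading step

@step
def trainer(value: float) -> float:
    return value * 2  # placeholder training step

@pipeline
def my_pipeline(importer, trainer):
    trainer(value=importer())

my_pipeline(importer=importer(), trainer=trainer()).run()
```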
Features:
Why should I use ZenML: