Virtualization Technology News and Information
Latest Releases of Open Source Tools Extend Traditional Software Tools for Machine Learning Engineers

The latest releases of the Data Version Control (DVC) and Continuous Machine Learning (CML) open source projects have been announced. DVC and CML are intended to remove the need for proprietary AI platforms (such as AWS SageMaker and Microsoft Azure Machine Learning) by extending traditional software tools like Git and CI/CD to meet the needs of ML engineers.

ML engineers, who work with unstructured data, need GitHub for collaboration and CI/CD systems to resolve issues among themselves and between the team and the production system. Because adequate tools for versioning data and models have been lacking, the company has built open source tools, DVC and CML, on top of GitHub, GitLab and Bitbucket to fill this gap.

"AI Platforms are siloed and require everything to go into their own systems, creating vendor lock-in," said Dmitry Petrov, CEO and founder of the company behind DVC and CML. According to Petrov, the company's approach allows users to stay within their application development space and effectively extend the familiar dev environments with tools to support Machine Learning Engineers and Data Scientists.

DVC brings agility, reproducibility, and collaboration into the existing data science workflow. It provides users with a Git-like interface for versioning data and models, bringing version control to machine learning and addressing the challenges of reproducibility. DVC is built on top of Git: it creates lightweight metafiles that Git tracks in place of large files, so the large files themselves are not stored in Git. It works with remote storage for large files in the cloud or on on-premises network storage.
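The metafile idea can be illustrated with a short, standard-library-only Python sketch. It is a conceptual illustration, not DVC's implementation or API: the function names `track_file` and `restore_file`, the `.pointer.json` suffix, and the cache layout are all hypothetical choices made for this example. The large file is copied into a content-addressed cache, and only a small pointer file (the part a tool like Git would track) stays next to it.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_dir: Path, md5: str) -> Path:
    """Hypothetical cache layout: <cache>/<first two hex chars>/<rest>."""
    return cache_dir / md5[:2] / md5[2:]


def track_file(data_path: Path, cache_dir: Path) -> Path:
    """Copy the data into the cache and write a small pointer file beside it."""
    md5 = file_md5(data_path)
    cached = cache_path(cache_dir, md5)
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_path, cached)
    meta = {"md5": md5, "size": data_path.stat().st_size, "path": data_path.name}
    pointer = data_path.parent / (data_path.name + ".pointer.json")
    pointer.write_text(json.dumps(meta, indent=2))
    return pointer


def restore_file(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file from the cache using only the pointer file."""
    meta = json.loads(pointer.read_text())
    target = pointer.parent / meta["path"]
    shutil.copy2(cache_path(cache_dir, meta["md5"]), target)
    return target


def demo() -> dict:
    """Track a file, delete it, then restore it from the pointer alone."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp) / "workspace"
        cache = Path(tmp) / "cache"
        workspace.mkdir()
        cache.mkdir()
        data = workspace / "train.csv"
        data.write_bytes(b"x,y\n" * 10000)
        original = data.read_bytes()
        pointer = track_file(data, cache)
        pointer_bytes = pointer.stat().st_size
        data.unlink()
        restored = restore_file(pointer, cache)
        return {
            "data_bytes": len(original),
            "pointer_bytes": pointer_bytes,
            "round_trip_ok": restored.read_bytes() == original,
        }


if __name__ == "__main__":
    print(demo())
```

In this sketch the cache directory stands in for the remote storage mentioned above; a real setup would push the cached objects to cloud or network storage and commit only the pointer files.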

CML is an open-source library for implementing continuous integration and delivery (CI/CD) in machine learning projects. Users can automate parts of their development workflow, including model training and evaluation, comparing ML experiments across their project history, and monitoring changing datasets. CML will also auto-generate reports with metrics and plots in each Git pull request.
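As a rough illustration of the reporting step, the following Python sketch builds a Markdown table comparing the metrics of two experiment runs, the kind of text a CI job could attach to a pull request. It does not use CML itself: the function `metric_report` and the metric names are hypothetical, and how the report reaches the pull request is left out.

```python
from typing import Mapping


def metric_report(baseline: Mapping[str, float], candidate: Mapping[str, float]) -> str:
    """Render a Markdown table of the metrics present in both runs."""
    lines = [
        "| metric | baseline | candidate | change |",
        "|---|---|---|---|",
    ]
    # Only metrics reported by both runs can be compared.
    for name in sorted(baseline.keys() & candidate.keys()):
        old, new = baseline[name], candidate[name]
        lines.append(f"| {name} | {old:.4f} | {new:.4f} | {new - old:+.4f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical metrics from the main branch and from an experiment branch.
    main_run = {"accuracy": 0.90, "loss": 0.30}
    experiment_run = {"accuracy": 0.92, "loss": 0.25}
    print(metric_report(main_run, experiment_run))
```

Keeping the report as plain Markdown means the same text can be reviewed locally or rendered by GitHub or GitLab without extra tooling.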

Together, CML and DVC provide ML engineers with a number of features and benefits that support data provenance, machine learning model management, and automation, including:

  • GitFlow for data science - Use GitLab or GitHub to manage ML experiments and ML models and to track modified data.
  • Repository & knowledge library - Maintain a code repository with data files, ML model files, and model metrics. Keep track of ML experiments to share knowledge about successful ideas as well as failures. No Git repo required.
  • Collaboration - Collaborate on ML experiments with pipeline and workflow visualization. Data scientists, ML engineers, and DevOps teams work concurrently instead of waiting for handoffs.
  • Reporting - Auto-generate ML experiment reports with metrics and plots in each Git Pull Request.

DVC and CML are available today via GitHub and GitLab.

Published Wednesday, March 03, 2021 3:14 PM by David Marshall