Virtualization Technology News and Information
Major vulnerabilities found in the Hugging Face open source AI/ML repository's model conversion service

HiddenLayer researchers show that hundreds of thousands of downloads could be corrupted or attacked

Hugging Face - often referred to as the GitHub of machine learning - is an ML and data science community, platform and repository for developers, offering open source resources for building, deploying and training machine learning models.

Its Transformers Python library and deployment tools let users download and train ML models with far less complexity, and the platform hosts ML models that can be integrated into workflows.

Hugging Face lets users share models, research and resources to accelerate model training, which also meaningfully reduces AI's resource consumption and environmental impact.

Researchers with HiddenLayer, a provider of security for artificial intelligence (AI) models and assets, have published new research - "Silent Sabotage: Hijacking Safetensors Conversion on Hugging Face" - showing that Hugging Face's widely used SFconvertbot, designed to convert insecure machine learning model formats to the more secure Safetensors format, has inadvertently become a vector for potential security breaches.
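The "insecure formats" at issue are pickle-based: a conventional PyTorch checkpoint is a Python pickle, and unpickling can execute arbitrary attacker-chosen code. A minimal sketch of the mechanism - the Payload class here is hypothetical, standing in for a backdoored checkpoint, and a harmless eval stands in for a real attacker's payload:

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to reconstruct an object: it returns
    # a callable plus arguments, which pickle invokes at load time.
    # An attacker would return os.system or similar instead of eval.
    def __reduce__(self):
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # the expression runs during deserialization
print(result)  # → 42
```

Merely loading such a file runs the embedded code, which is why converting models away from pickle is worthwhile in the first place.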

Malicious actors can exploit the Safetensors conversion process to submit pull requests containing malicious code or backdoored models to any company or individual with a public repository on the platform.

Their research also finds that any user who enters their user token to convert a private repository could have that token stolen and, consequently, their private model repositories and datasets accessed.

Unlike changes caught by conventional code review, these malicious modifications are exceptionally challenging and time-consuming for affected companies to identify and mitigate.

Chris "Tito" Sestito, Co-Founder and CEO of HiddenLayer, said: "The compromise of the conversion service has the potential to rapidly affect the millions of users who rely on these models to kick-start their AI projects, creating a full supply chain issue. Users of the Hugging Face platform place trust not only in the models hosted there but also in the reputable companies behind them, such as Google and Microsoft, making them all the more susceptible to this type of attack. This vulnerability extends beyond any single company hosting a model."

Of the ten most downloaded models from Google and Microsoft combined, those that had accepted the merge from the Safetensors bot totaled a staggering 16,342,855 downloads in the last month. While this is only a small subset of the 500,000+ models hosted on Hugging Face, they reach an incredible number of users. The bot itself has made over 42,657 pull requests to repositories on the site to date, any of which could have been compromised.

HiddenLayer researchers demonstrated how the token the official Safetensors conversion bot uses to submit pull requests could be stolen, and how, from there, an attacker could take over the service and automatically hijack any model submitted to it.

The potential consequences of such an attack are huge: an adversary could implant their own model in place of a legitimate one, push malicious models out to repositories en masse, or access private repositories and datasets. Even where a repository has already been converted, a malicious actor could still submit a new pull request; and where a new iteration of a PyTorch binary is uploaded and then converted using a compromised conversion service, repositories with hundreds of thousands of downloads could be affected.

Despite the best intentions to secure machine learning models in the Hugging Face ecosystem, the conversion service has proven to be vulnerable, with the potential to enable a widespread supply chain attack via Hugging Face's official service. Furthermore, the researchers also showed how an attacker could gain a foothold in the container running the service and compromise any model it converts.
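By contrast, the Safetensors format the bot converts to is safe by construction: a file is an 8-byte little-endian header length, a JSON header mapping tensor names to dtypes, shapes and byte offsets, then raw tensor bytes, so loading is pure parsing with no code execution. A minimal standard-library sketch of that on-disk layout follows; the helper names are illustrative, not the safetensors library's API:

```python
import json
import struct

def write_safetensors(path, tensors):
    """tensors: name -> (dtype string, shape list, raw little-endian bytes)."""
    header, payload, offset = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        payload += raw
        offset += len(raw)
    encoded = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(encoded)))  # 8-byte header size
        f.write(encoded)                          # JSON metadata
        f.write(payload)                          # raw tensor data

def read_tensor(path, name):
    """Return (metadata, raw bytes) for one tensor; never executes code."""
    with open(path, "rb") as f:
        (hsize,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(hsize))
        meta = header[name]
        start, end = meta["data_offsets"]
        f.seek(8 + hsize + start)
        return meta, f.read(end - start)
```

Writing a tensor and reading it back only ever parses JSON and copies bytes - which is exactly the property the conversion service was built to deliver, and why compromising the converter, rather than the format, is the attack that matters.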


Published Friday, February 23, 2024 7:18 AM by David Marshall