Summary

Open source models are powerful tools for data scientists, but they also come with risks. If your team downloads models from sources like Hugging Face without security checks, you could introduce security threats into your organization. You can mitigate this risk by introducing a process that scans models for vulnerabilities before they enter your organization and are used by data scientists. By combining HiddenLayer’s Model Scanner with your CI/CD platform, you can ensure that only safe models are used. In this blog, we’ll walk you through how to set up a system where data scientists request models, security checks run automatically, and approved models are stored in a safe location such as cloud storage, a model registry, or Databricks Unity Catalog.

Introduction

Data scientists download open source AI models from open repositories like Hugging Face and Kaggle every day. Today, the security scans built into these platforms are rudimentary and limited to specific model formats, so proper security checks often do not take place. If a model contains malicious code, it could expose sensitive company data, cause system failures, or create security vulnerabilities.

Organizations need a way to ensure that the models they use are safe before deploying them. However, blocking access to open source models isn’t the answer—after all, these models provide huge benefits. Instead, companies should establish a secure process that allows data scientists to use open source models while protecting the organization from hidden threats.

In this blog, we’ll explore how you can implement a secure model approval workflow using HiddenLayer’s Model Scanner and GitHub Actions. This approach enables data scientists to request models through a simple GitHub form, have them automatically scanned for threats, and—if they pass—store them in a trusted location.

The Risk of Downloading Open Source Models

Downloading models directly from public repositories like Hugging Face might seem harmless, but it can introduce serious security risks:

  • Malicious Code Injection: Some models may contain hidden backdoors or harmful scripts that execute when loaded.
  • Unauthorized Data Access: A compromised model could expose your company’s sensitive data or leak information.
  • System Instability: Poorly built or tampered models might crash systems, leading to downtime and productivity loss.
  • Compliance Violations: Using unverified models could put your company at risk of breaking security and privacy regulations.

To prevent these issues, organizations need a structured way to approve and distribute open source models safely.

A Secure Process for Open Source Models

The key to safely using open source models is implementing a secure workflow. Here’s how you can do it:

  1. Model Request Form in GitHub

Instead of allowing direct downloads, require data scientists to request models through a GitHub form. This ensures that every model is reviewed before use.

This can be enforced by globally blocking direct API access to Hugging Face at the network level.
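As a starting point, the request form can be a GitHub issue form. The sketch below is illustrative; the field IDs, labels, and file name are assumptions, not a prescribed schema:

```yaml
# .github/ISSUE_TEMPLATE/model-request.yml (illustrative)
name: Model Request
description: Request an open source model for security review
title: "[Model Request]: "
labels: ["model-request"]
body:
  - type: input
    id: model_id
    attributes:
      label: Hugging Face model ID
      description: For example, org/model-name
    validations:
      required: true
  - type: input
    id: revision
    attributes:
      label: Revision (commit SHA or tag)
  - type: textarea
    id: justification
    attributes:
      label: Business justification
```

Requiring a pinned revision means the exact artifact that was scanned is the one that gets stored, not whatever the repository points to later.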

  2. Automated Security Scan with HiddenLayer Model Scanner

Once a request is submitted, a CI/CD pipeline (using GitHub Actions) automatically scans the model using HiddenLayer’s Model Scanner, which checks for malicious code, security vulnerabilities, and compliance issues.
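In the pipeline, the scan result has to be turned into a pass/fail decision. Below is a minimal sketch of such a gate. The results-file shape (a `detections` list with a `severity` field) is an assumption for illustration, not HiddenLayer’s documented output format:

```python
# Illustrative CI gate: decide pass/fail from a scanner results file.
# The JSON shape assumed here ("detections" list with a "severity" field)
# is a placeholder, not HiddenLayer's documented output format.
import json
import sys

BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}

def model_is_safe(results: dict) -> bool:
    """Return True only if the scan reported no blocking detections."""
    detections = results.get("detections", [])
    return not any(
        d.get("severity", "").upper() in BLOCKING_SEVERITIES
        for d in detections
    )

if __name__ == "__main__" and len(sys.argv) > 1:
    # Exit non-zero so the CI job fails and the model request is rejected.
    with open(sys.argv[1]) as f:
        safe = model_is_safe(json.load(f))
    print("approved" if safe else "rejected")
    sys.exit(0 if safe else 1)
```

A non-zero exit code is all GitHub Actions needs to mark the job failed and stop the model from reaching the storage step.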

  3. Secure Storage for Approved Models

If a model passes the security scan, it is pushed to a trusted location, such as: 

  • Cloud storage (AWS S3, Google Cloud Storage, etc.) 
  • A model registry (MLflow, Databricks Unity Catalog, etc.) 
  • A secure internal repository

Now, data scientists can safely access and use only the approved models.
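For the cloud storage option, the push can be a short boto3 script. The bucket name and key layout below are assumptions for this sketch, not a required convention:

```python
# Illustrative push of an approved model directory to S3.
# APPROVED_BUCKET and the key layout are placeholders, not a convention.
import os

APPROVED_BUCKET = "approved-models"  # hypothetical bucket name

def s3_prefix_for(model_id: str, revision: str) -> str:
    """Deterministic key prefix, e.g. 'org/model-name/abc123/'."""
    return f"{model_id.strip('/')}/{revision}/"

def upload_model_dir(local_dir: str, model_id: str, revision: str) -> None:
    """Upload every file in the scanned model directory under one prefix."""
    import boto3  # AWS SDK for Python; imported here to keep the helper pure

    s3 = boto3.client("s3")
    prefix = s3_prefix_for(model_id, revision)
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_dir)
            s3.upload_file(path, APPROVED_BUCKET, prefix + rel)
```

Keying objects by model ID and pinned revision gives data scientists a stable, auditable path for every approved artifact.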

Benefits of This Process

Implementing this structured model approval process offers several advantages:

  • Leverages Existing MLOps & GitOps Infrastructure: The workflow integrates seamlessly with existing CI/CD pipelines and security controls, reducing operational overhead.
  • Single Entry Point for Open Source Models: This system ensures that all open source models entering the organization go through a centralized and tightly controlled approval process.
  • Automated Security Checks: HiddenLayer’s Model Scanner automatically scans every model request, ensuring that no unverified models make their way into production.
  • Compliance and Governance: The process ensures adherence to regulatory requirements by providing a documented trail of all approved and rejected models.
  • Improved Collaboration: Data scientists can access secure, organization-approved models without delays while security teams maintain full visibility and control.

Implementing the Secure Model Workflow

Here’s a step-by-step guide to setting up this workflow:

  1. Create a GitHub Form: Data scientists submit requests for open source models through this form.
  2. Trigger a CI/CD Pipeline: The form submission kicks off an automated workflow using GitHub Actions.
  3. Scan the Model with HiddenLayer: The HiddenLayer Model Scanner runs security checks on the requested model.
  4. Store or Reject the Model:
  • If safe, the model is pushed to a secure storage location.
  • If unsafe, the request is flagged for review and triage.
  5. Access Approved Models: Data scientists can retrieve and use models from a secure storage location.
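The steps above can be sketched as a single GitHub Actions workflow. The file name, script paths, and secret names here are placeholders, and the scanner invocation is left as a stub rather than guessing at the exact command:

```yaml
# .github/workflows/model-approval.yml (illustrative skeleton)
name: Model Approval
on:
  issues:
    types: [opened]

jobs:
  scan-and-store:
    if: contains(github.event.issue.labels.*.name, 'model-request')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Download requested model
        # hypothetical script that reads the model ID from the issue body
        run: python scripts/download_model.py
      - name: Scan with HiddenLayer Model Scanner
        # invoke the Model Scanner here; see HiddenLayer's docs for the command
        run: echo "run scanner"
      - name: Push approved model to secure storage
        if: success()
        # hypothetical upload script; runs only if the scan step passed
        run: python scripts/upload_model.py
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```

Because each step only runs if the previous one succeeded, a failed scan automatically blocks the storage step, and the issue itself becomes the audit record for the request.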

Figure 1 – Secure Model Workflow

Conclusion

Open source models have moved the needle for AI development, but they come with risks that organizations can’t ignore. By routing every incoming model through a single point of access where it is scanned by HiddenLayer, you can allow data scientists to use these models safely. This process ensures that only verified, threat-free models make their way into your systems, protecting your organization from potential harm.

By taking this proactive approach, you create a balance between innovation and security, allowing your Data Scientists to work with open source models, while keeping your organization safe.