Deployment Guide Using GitHub Repositories and Workflows

Technical requirements

  • GitHub as the source control repository
  • GitHub Actions as the DevOps orchestration tool
  • GitHub CLI
  • Azure CLI
  • The Terraform extension for Azure DevOps, if you are using Azure DevOps + Terraform to spin up infrastructure
  • One or more Azure subscriptions, depending on whether you are deploying Prod only or both Prod and Dev environments
    • Important: As mentioned in the Prerequisites at the beginning, Free/Trial or similar learning-purpose subscriptions may impose 'Usage + quotas' limits in the default Azure region used for deployment. Read the instructions below carefully to execute this deployment successfully.
  • Azure service principals to access and create Azure resources from Azure DevOps or GitHub Actions (or the ability to create them)
  • Git Bash, WSL or another shell script runner on your local machine
  • When using WSL:
    • Work entirely within the Unix environment (cloning the repo, defining the file paths, and so on). If VS Code is your editor, you can connect it to this environment by installing the "WSL" extension (formerly "Remote - WSL").
    • Install dos2unix: sudo apt-get install dos2unix
    • Set up the GitHub CLI (listed above), for example with sudo apt-get install gh
    • Log in to GitHub: gh auth login
    • Configure Git locally: git config --global user.email "you@example.com" and git config --global user.name "Your Name"

Note:

Git version 2.27 or newer is required. See these instructions to upgrade.
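To check the 2.27 minimum from the shell, the sketch below compares the output of git --version against the required version. It is an illustrative helper, not part of the mlops-v2 tooling; the function names version_at_least and check_git_version are hypothetical, and it assumes the version number is the third field of the git --version output (for example, git version 2.39.2).

```bash
#!/usr/bin/env bash
# Illustrative helpers (hypothetical, not part of mlops-v2) to check the Git 2.27 minimum.

# version_at_least HAVE WANT
# Succeeds if the dotted version HAVE is >= WANT, comparing numeric fields
# left to right. Only as many fields as WANT has are compared, so suffixes
# such as ".windows.1" from Git for Windows are ignored.
version_at_least() {
  local -a have want
  local i h w
  IFS=. read -ra have <<< "$1"
  IFS=. read -ra want <<< "$2"
  for ((i = 0; i < ${#want[@]}; i++)); do
    h=${have[i]:-0}   # a missing field counts as 0
    w=${want[i]}
    ((10#$h > 10#$w)) && return 0
    ((10#$h < 10#$w)) && return 1
  done
  return 0   # every compared field is equal
}

# check_git_version: reports the installed Git version; returns 1 if it is too old
check_git_version() {
  local have
  if ! command -v git >/dev/null 2>&1; then
    echo "git not found on PATH" >&2
    return 1
  fi
  have=$(git --version | awk '{print $3}')   # "git version 2.39.2" -> "2.39.2"
  if version_at_least "$have" 2.27; then
    echo "git $have OK"
  else
    echo "git $have is older than 2.27; please upgrade" >&2
    return 1
  fi
}
```

Paste both functions into your shell and run check_git_version. The comparison is numeric field by field, so 2.9 correctly counts as older than 2.27, which a plain string comparison would get wrong.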

Configure the GitHub Environment


  1. Replicate the MLOps-V2 template repositories in your GitHub organization
    Go to https://github.com/Azure/mlops-templates/fork to fork the mlops-templates repository into your GitHub organization. This repository contains reusable MLOps code that can be shared across multiple projects.


    Go to https://github.com/Azure/mlops-project-template/generate to create a repository in your GitHub organization from mlops-project-template. This is the monorepo from which you will pull example projects in a later step.


  2. Clone the mlops-v2 repository to your local system
    On your local machine, select or create a root directory (ex: 'mlprojects') to hold your project repository as well as the mlops-v2 repository. Change to this directory.

    Clone the mlops-v2 repository to this directory. This provides the documentation and the sparse_checkout.sh script. This repository and folder will be used to bootstrap your projects:
    # git clone https://github.com/Azure/mlops-v2.git

  3. Configure sparse checkout
    From your local project root directory, open mlops-v2/sparse_checkout.sh in an editor and set the following variables to select the infrastructure management tool used by your organization, the type of AI workload, the way you interact with Azure Machine Learning, and the CI/CD orchestration:

    Note:

When running the script in a default ("vanilla") WSL installation, you will most likely get unexpected errors caused by Windows (CRLF) line endings. In that case, it may suffice to convert the file with dos2unix: in WSL, run dos2unix sparse_checkout.sh from the mlops-v2 repository folder.

  • infrastructure_version selects the tool that will be used to deploy cloud resources.
  • project_type selects the AI workload type for your project (classical ml, computer vision, or nlp)
  • mlops_version selects your preferred interaction approach with Azure Machine Learning
  • git_folder_location points to the root project directory into which you cloned mlops-v2 in step 2
  • project_name is the name (case sensitive) of your project. A GitHub repository will be created with this name
  • github_org_name is your GitHub organization (or GitHub username)
  • project_template_github_url is the URL of the original mlops-project-template repository or of the repository you generated from it in step 1
  • orchestration specifies the CI/CD orchestration to use

    A sparse_checkout.sh example is below:
   #options: terraform / bicep
   infrastructure_version=terraform

   #options: classical / cv / nlp
   project_type=classical
   
   #options: python-sdk / aml-cli-v2
   mlops_version=aml-cli-v2   
   
   #replace with the local root folder into which you cloned mlops-v2 (step 2)
   git_folder_location='/home/<username>/mlprojects'    
   
   #replace with your project name
   project_name=taxi-fare-regression   
   
   #replace with your github org name
   github_org_name=<orgname>
   
   #replace with the URL of the project template repository for your organization created in step 1
   project_template_github_url=https://github.com/azure/mlops-project-template   
   
   #options: github-actions / azure-devops
   orchestration=github-actions 

Currently, the following pipelines are supported:

  • classical
  • cv (computer-vision)
  • nlp (natural language processing)
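A typo in one of these variables usually surfaces only partway through the script. As a hedged sketch (not part of sparse_checkout.sh), the function below validates the values against the options listed in the example's #options: comments. The function names one_of and check_sparse_checkout_vars are hypothetical, and the directory and placeholder checks are assumptions about what a sensible value looks like.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the variables edited in sparse_checkout.sh.
# Allowed values mirror the "#options:" comments in the example configuration.

# one_of VALUE OPTION...: succeeds if VALUE equals one of the OPTIONs
one_of() {
  local value=$1 opt
  shift
  for opt in "$@"; do
    [[ "$value" == "$opt" ]] && return 0
  done
  return 1
}

# check_sparse_checkout_vars: prints one line per problem; returns 1 if any were found
check_sparse_checkout_vars() {
  local status=0
  one_of "$infrastructure_version" terraform bicep ||
    { echo "infrastructure_version must be terraform or bicep"; status=1; }
  one_of "$project_type" classical cv nlp ||
    { echo "project_type must be classical, cv or nlp"; status=1; }
  one_of "$mlops_version" python-sdk aml-cli-v2 ||
    { echo "mlops_version must be python-sdk or aml-cli-v2"; status=1; }
  one_of "$orchestration" github-actions azure-devops ||
    { echo "orchestration must be github-actions or azure-devops"; status=1; }
  [[ -d "$git_folder_location" ]] ||
    { echo "git_folder_location is not an existing directory: $git_folder_location"; status=1; }
  # Catch placeholders such as <orgname> that were left unedited
  [[ -n "$project_name" && "$project_name" != *"<"* ]] ||
    { echo "project_name is empty or still a placeholder"; status=1; }
  [[ -n "$github_org_name" && "$github_org_name" != *"<"* ]] ||
    { echo "github_org_name is empty or still a placeholder"; status=1; }
  return "$status"
}
```

One way to use it: paste the variable assignments from your edited sparse_checkout.sh into a shell, define the two functions, and run check_sparse_checkout_vars before running the script.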
  4. Run sparse checkout
    The sparse_checkout.sh script uses SSH to authenticate to your GitHub organization. If SSH is not yet configured in your environment, follow the steps below or refer to the documentation at GitHub Key Setup.

    GitHub Key Setup

    On your local machine, create a new ssh key:
    # ssh-keygen -t ed25519 -C "<your_email@example.com>"
    You can press Enter at all three prompts to create a new key in /home/<username>/.ssh/id_ed25519.

    Add your SSH key to your SSH agent:
    # eval "$(ssh-agent -s)"
    # ssh-add ~/.ssh/id_ed25519

    Get your public key to add to GitHub:
    # cat ~/.ssh/id_ed25519.pub
    It will be a string of the format 'ssh-ed25519 ... your_email@example.com'. Copy this string.

    Add your SSH key to GitHub. Under your account menu, select "Settings", then "SSH and GPG keys". Select "New SSH key" and enter a title. Paste your public key into the key box and click "Add SSH key".

    From your root project directory (ex: mlprojects/), execute the sparse_checkout.sh script:

    # bash mlops-v2/sparse_checkout.sh

    This will run the script, using git sparse checkout to build a local copy of your project repository based on your choices configured in the script. It will then create the GitHub repository and push the project code to it.

    Monitor the script execution for any errors. If there are errors, you can safely remove the local copy of the repository (ex: taxi-fare-regression/) and delete the GitHub project repository. After addressing the errors, run the script again.

    After the script runs successfully, the GitHub project will be initialized with your project files.
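If the script stops with an SSH authentication error, you can confirm whether GitHub accepts your key before running it again. The sketch below wraps ssh -T git@github.com. That command exits with a non-zero status even when authentication succeeds, because GitHub does not provide shell access, so the helper checks the greeting text instead. The function name github_ssh_ok is hypothetical, and matching on the phrase "successfully authenticated" assumes GitHub's current greeting wording.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if GitHub accepts your SSH key.
github_ssh_ok() {
  local out
  # -T: do not request a terminal; stderr is captured because the greeting is printed there
  out=$(ssh -T git@github.com 2>&1)
  # GitHub replies "Hi <user>! You've successfully authenticated, but GitHub
  # does not provide shell access." and exits non-zero, so test the message.
  [[ "$out" == *"successfully authenticated"* ]]
}
```

For example, github_ssh_ok && bash mlops-v2/sparse_checkout.sh runs the script only once the key is accepted. The first connection may still prompt you to confirm GitHub's host key.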

  5. Configure GitHub Actions Secrets

    This step creates a service principal and GitHub secrets to allow the GitHub action workflows to create and interact with Azure Machine Learning Workspace resources.

    From the command line, execute the following Azure CLI command with your choice of a service principal name:

    # az ad sp create-for-rbac --name <service_principal_name> --role contributor --scopes /subscriptions/<subscription_id> --sdk-auth

    You will get output similar to below:

    {
    "clientId": "<service principal client id>",
    "clientSecret": "<service principal client secret>",
    "subscriptionId": "<Azure subscription id>",
    "tenantId": "<Azure tenant id>",
    "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
    "resourceManagerEndpointUrl": "https://management.azure.com/",
    "activeDirectoryGraphResourceId": "https://graph.windows.net/",
    "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
    "galleryEndpointUrl": "https://gallery.azure.com/",
    "managementEndpointUrl": "https://management.core.windows.net/"
    }

    Copy all of this output, braces included.

    From your GitHub project, select Settings:


    Then select Secrets (labeled Secrets and variables in the current GitHub UI), then Actions:


    Select New repository secret. Name this secret AZURE_CREDENTIALS and paste the service principal output as the content of the secret. Select Add secret.

    Note:
    If you are deploying the infrastructure with Terraform, also add the following GitHub secrets, using the corresponding values (clientId, clientSecret, subscriptionId, tenantId) from the service principal output as their content:

    ARM_CLIENT_ID
    ARM_CLIENT_SECRET
    ARM_SUBSCRIPTION_ID
    ARM_TENANT_ID

    The GitHub configuration is complete.
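As an alternative to the web UI, the secrets can also be set from the command line with the GitHub CLI. This sketch assumes that you saved the az ad sp create-for-rbac output to a file (sp.json here, a hypothetical name) instead of copying it, that you run the commands from inside your local project repository so that gh targets it, and that the JSON is pretty-printed with one key per line, as in the output shown above. The helper names json_field and set_sp_secrets are hypothetical; jq is a more robust JSON parser if you have it installed. Treat sp.json as a credential and delete it when you are done.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that set the repository secrets from a saved service
# principal file, for example one created with:
#   az ad sp create-for-rbac --name <service_principal_name> --role contributor \
#     --scopes /subscriptions/<subscription_id> --sdk-auth > sp.json

# json_field FILE KEY: prints the string value of KEY from pretty-printed JSON
# that has one "key": "value" pair per line (the format shown above).
json_field() {
  sed -n "s/^[[:space:]]*\"$2\":[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1"
}

# set_sp_secrets FILE: sets AZURE_CREDENTIALS and the four ARM_* secrets
# on the GitHub repository of the current directory.
set_sp_secrets() {
  local sp_file=$1
  gh secret set AZURE_CREDENTIALS < "$sp_file"   # the whole JSON document
  # The ARM_* secrets are only needed for Terraform deployments.
  gh secret set ARM_CLIENT_ID --body "$(json_field "$sp_file" clientId)"
  gh secret set ARM_CLIENT_SECRET --body "$(json_field "$sp_file" clientSecret)"
  gh secret set ARM_SUBSCRIPTION_ID --body "$(json_field "$sp_file" subscriptionId)"
  gh secret set ARM_TENANT_ID --body "$(json_field "$sp_file" tenantId)"
}
```

Run set_sp_secrets sp.json from the project repository folder, then check Settings in GitHub to confirm that the five secrets are listed.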

Deploy Machine Learning Project Infrastructure Using GitHub Actions

  1. Configure Azure ML Environment Parameters

    In your GitHub project repository (ex: taxi-fare-regression), there are two configuration files in the root, config-infra-dev.yml and config-infra-prod.yml. These files are used to define and deploy Dev and Prod Azure Machine Learning environments. With the default deployment, config-infra-prod.yml will be used when working with the main branch of your project and config-infra-dev.yml will be used when working with any non-main branch.

    It is recommended to first create a dev branch from main and deploy the Dev environment.

Important:

Note that config-infra-prod.yml and config-infra-dev.yml use eastus as the default region for deploying the resource group and Azure ML workspace. If you are using a Free/Trial or similar learning-purpose subscription, you must do one of the following:

  1. If you decide to use the eastus region, ensure that your subscription(s) have a quota of at least 20 vCPUs for Standard DSv2 Family vCPUs. Visit your subscription's Usage + quotas page in the Azure Portal to validate this.
  2. If not, change the region to one where Standard DSv2 Family vCPUs have a quota of at least 20 vCPUs.
  3. You may also change the region and compute type used for deployment. To do this, change the region in these two files, and additionally search for STANDARD_DS3_V2 in the DevOps pipeline files listed below and replace it with a compute type that works for your setup.
    • mlops-templates/aml-cli-v2/mlops/devops-pipelines/deploy-model-training-pipeline.yml
    • mlops-project-template/classical/aml-cli-v2/mlops/devops-pipelines/deploy-batch-endpoint-pipeline.yml
    • mlops-project-template/classical/aml-cli-v2/mlops/azureml/deploy/online/online-deployment.yml
  4. Note that in the paths above you need to navigate to the right repository (e.g. mlops-templates) and the right ML interface (e.g. aml-cli-v2).

Edit each file to configure a namespace, postfix string, Azure location, and environment for deploying your Dev and Prod Azure ML environments. The default values and settings in the files are shown below:

namespace: mlopsv2 #Note: A namespace with many characters will cause storage account creation to fail due to storage account names having a limit of 24 characters.  
postfix: 0001  
location: eastus  
environment: dev  
enable_aml_computecluster: true  
enable_monitoring: false  

The first four values are used to create globally unique names for your Azure environment and the resources it contains. Edit these values as needed, then save, commit, and push (or open a pull request) to update these files in the project repository.
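Because the namespace feeds into the storage account name, it can help to check a candidate value before committing it. Azure storage account names must be 3 to 24 characters long and may contain only lowercase letters and digits. The sketch below assumes the templates build the storage account name as "st" + namespace + postfix + environment; this composition is an assumption, so check the infrastructure code in your project repository for the exact pattern. The function name storage_name_ok is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check for the namespace, postfix and environment values in
# config-infra-*.yml.
# ASSUMPTION: the storage account is named "st<namespace><postfix><environment>".
# Azure rule: storage account names are 3-24 chars, lowercase letters and digits only.
storage_name_ok() {
  local name="st$1$2$3"
  if (( ${#name} < 3 || ${#name} > 24 )); then
    echo "$name is ${#name} characters; storage account names allow 3-24" >&2
    return 1
  fi
  if [[ ! "$name" =~ ^[[:lower:][:digit:]]+$ ]]; then
    echo "$name may contain only lowercase letters and digits" >&2
    return 1
  fi
  echo "$name"   # the candidate name passed both checks
}
```

With the default values, storage_name_ok mlopsv2 0001 dev prints stmlopsv20001dev (16 characters), which leaves room for a somewhat longer namespace.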

If you are running a Deep Learning workload such as CV or NLP, ensure that GPU compute is available in your subscription and Azure location.

Note:

The enable_monitoring flag in these files defaults to false. Enabling this flag adds elements to the deployment to support Azure ML monitoring based on https://github.com/microsoft/AzureML-Observability. This includes an Azure Data Explorer (ADX) cluster and increases the deployment time and cost of the MLOps solution.

  2. Deploy Azure Machine Learning Infrastructure

    In your GitHub project repository (ex: taxi-fare-regression), select Actions.


    This will display the pre-defined GitHub workflows associated with your project. For a classical machine learning project, these include workflows to deploy the infrastructure, run the model training pipeline, and deploy the online and batch endpoints.


    Depending on the use case, the available workflows may vary. Select the workflow that deploys the infrastructure ('deploy-infra'). In this scenario, that is tf-gha-deploy-infra.yml, which deploys the Azure ML infrastructure using GitHub Actions and Terraform.


    On the right side of the page, select Run workflow and select the branch to run the workflow on. Running it on a dev branch deploys the Dev infrastructure; running it on main deploys the Prod infrastructure. Monitor the pipeline for successful completion.


    When the pipeline has completed successfully, you can find your Azure ML workspace and associated resources by logging in to the Azure Portal.

    Next, you will deploy the model training and scoring pipelines into the new Azure Machine Learning environment.
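If you prefer the command line to the Actions page, the GitHub CLI can trigger tf-gha-deploy-infra.yml (or any other workflow that offers a Run workflow button) and follow the run until it finishes. The sketch below uses gh workflow run, gh run list and gh run watch; the helper name run_workflow is hypothetical, and the five-second pause is an assumption meant to give GitHub time to register the new run before it is looked up. Run it from inside your local project repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: trigger a manually runnable (workflow_dispatch) workflow
# on a branch and wait for it to finish. Returns non-zero if the run fails.
# Usage: run_workflow tf-gha-deploy-infra.yml dev
run_workflow() {
  local workflow=$1 branch=$2 run_id
  gh workflow run "$workflow" --ref "$branch" || return 1
  sleep 5   # the new run can take a few seconds to appear in the run list
  # Picks the most recent run of this workflow on the branch
  run_id=$(gh run list --workflow "$workflow" --branch "$branch" --limit 1 \
    --json databaseId --jq '.[0].databaseId')
  if [[ -z "$run_id" ]]; then
    echo "no run of $workflow found on $branch" >&2
    return 1
  fi
  gh run watch "$run_id" --exit-status   # --exit-status: fail if the run fails
}
```

Because the helper takes the most recent run on the branch, it can pick up the wrong run if someone else starts the same workflow at the same moment; for a single user this is usually not a concern.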

Sample Training and Deployment Scenario

The solution accelerator includes code and data for a sample end-to-end machine learning pipeline that runs a linear regression to predict taxi fares in NYC. The pipeline is made up of components, each serving a different function, which can be registered with the workspace, versioned, and reused with various inputs and outputs. Sample pipelines and workflows for the Computer Vision and NLP scenarios have different training and deployment steps.

This training pipeline contains the following steps:

Prepare Data
This component takes multiple taxi datasets (yellow and green), merges and filters the data, and prepares the train/val and evaluation datasets.
Input: Local data under ./data/ (multiple .csv files)
Output: Single prepared dataset (.csv) and train/val/test datasets.

Train Model
This component trains a Linear Regressor with the training set.
Input: Training dataset
Output: Trained model (pickle format)

Evaluate Model
This component uses the trained model to predict taxi fares on the test set.
Input: ML model and Test dataset
Output: Performance of model and a deploy flag whether to deploy or not.
This component compares the performance of the model with all previously deployed models on the new test dataset and decides whether to promote the model into production. A model is promoted into production by registering it in the Azure ML workspace.

Register Model
This component registers the trained model in the Azure Machine Learning workspace if the deploy flag indicates that it should be promoted.
Input: Trained model and the deploy flag.
Output: Registered model in Azure Machine Learning.

Deploying the Model Training Pipeline to the Test Environment

Next, you will deploy the model training pipeline to your new Azure Machine Learning workspace. This pipeline will create a compute cluster, register a training environment defining the necessary Docker image and Python packages, register a training dataset, and then start the training pipeline described in the previous section. When the job is complete, the trained model will be registered in the Azure ML workspace and be available for deployment.

In your GitHub project repository (ex: taxi-fare-regression), select Actions.


Select deploy-model-training-pipeline from the workflows listed on the left, then click Run workflow to execute the model training workflow. This will take several minutes to run, depending on the compute size.


Once completed, a successful run will register the model in the Azure Machine Learning workspace.
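To confirm the registration from the command line, the Azure ML CLI (v2) can list the versions of a registered model. This is a sketch: the helper name latest_model_version is hypothetical, the model, resource group and workspace names are placeholders you need to look up in your workspace (this guide does not state the sample's model name), and it assumes the integer versions that Azure ML assigns when no explicit version is given.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the highest registered version of a model.
# Requires the Azure CLI "ml" extension and an active `az login` session.
# Usage: latest_model_version <model-name> <resource-group> <workspace-name>
latest_model_version() {
  local model=$1 group=$2 workspace=$3
  az ml model list --name "$model" \
    --resource-group "$group" --workspace-name "$workspace" \
    --query "[].version" --output tsv |
    sort -n | tail -n 1   # list order is not guaranteed, so sort numerically
}
```

An empty result means no model with that name was found in the workspace.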

Note: If you want to check the output of each individual step, for example to view output of a failed run, click a job output, and then click each step in the job to view any output of that step.


With the trained model registered in the Azure Machine Learning workspace, you are ready to deploy the model for scoring.

Deploying the Trained Model in Dev

This scenario includes prebuilt workflows for two approaches to deploying a trained model: batch scoring, or deploying the model to an online endpoint for real-time scoring. You may run either or both of these workflows in your dev branch to test the performance of the model in your Dev Azure ML workspace.

In your GitHub project repository (ex: taxi-fare-regression), select Actions.


Online Endpoint

Select deploy-online-endpoint-pipeline from the workflows listed on the left and click Run workflow to execute the online endpoint deployment pipeline workflow. The steps in this pipeline create an online endpoint in your Azure Machine Learning workspace, create a deployment of your model to this endpoint, then allocate traffic to the deployment.


Once completed, you will find the online endpoint deployed in the Azure ML workspace and available for testing.
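Besides testing from Azure ML studio, you can send a request to the endpoint from the command line with az ml online-endpoint invoke. In this sketch the endpoint name, resource group, workspace and request file are placeholders (this guide does not give the sample's endpoint name or request payload), and the helper name score_online is hypothetical. The request file must contain JSON in the format that the deployment's scoring script expects.

```bash
#!/usr/bin/env bash
# Hypothetical helper: send one scoring request to an Azure ML online endpoint.
# Requires the Azure CLI "ml" extension and an active `az login` session.
# Usage: score_online <endpoint-name> <request.json> <resource-group> <workspace-name>
score_online() {
  local endpoint=$1 request_file=$2 group=$3 workspace=$4
  if [[ ! -f "$request_file" ]]; then
    echo "request file not found: $request_file" >&2
    return 1
  fi
  # Without --deployment-name, the request follows the endpoint's traffic allocation
  az ml online-endpoint invoke --name "$endpoint" \
    --request-file "$request_file" \
    --resource-group "$group" --workspace-name "$workspace"
}
```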


Batch Endpoint

Select the deploy-batch-endpoint-pipeline from the workflows and click Run workflow to execute the batch endpoint deployment pipeline workflow. The steps in this pipeline will create a new AmlCompute cluster on which to execute batch scoring, create the batch endpoint in your Azure Machine Learning workspace, then create a deployment of your model to this endpoint.


Once completed, you will find the batch endpoint deployed in the Azure ML workspace and available for testing.
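A batch endpoint is tested by starting a scoring job against some input data. The sketch below uses az ml batch-endpoint invoke to start the job and az ml job stream to follow it; the endpoint name, input location, resource group and workspace are placeholders, and the helper name score_batch is hypothetical. The --input flag is as documented for recent versions of the Azure CLI ml extension; run az ml batch-endpoint invoke --help to confirm the flags and input forms your version accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start a batch scoring job on an Azure ML batch endpoint
# and stream its logs until the job finishes.
# Requires the Azure CLI "ml" extension and an active `az login` session.
# Usage: score_batch <endpoint-name> <input-path-or-uri> <resource-group> <workspace-name>
score_batch() {
  local endpoint=$1 input=$2 group=$3 workspace=$4 job
  # invoke returns the created job as JSON; keep only its name
  job=$(az ml batch-endpoint invoke --name "$endpoint" --input "$input" \
    --resource-group "$group" --workspace-name "$workspace" \
    --query name --output tsv) || return 1
  [[ -n "$job" ]] || { echo "no job name returned" >&2; return 1; }
  echo "started batch job $job" >&2
  az ml job stream --name "$job" \
    --resource-group "$group" --workspace-name "$workspace"
}
```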


Moving to Production

Example scenarios can be trained and deployed for both Dev and Prod branches and environments. When you are satisfied with the performance of the model training pipeline, model, and deployment during testing, the Dev pipelines and models can be replicated and deployed in the Production environment.

The sample training and deployment Azure ML pipelines and GitHub workflows can be used as a starting point to adapt your own modeling code and data.

Next Steps


This finishes the demo for the architectural pattern Azure Machine Learning Classical Machine Learning. Next, you can dive into your Azure Machine Learning service in the Azure Portal and see the inference results of this example model.

As elements of Azure Machine Learning are still in development, the following components are not part of this demo:

  • Model and pipeline promotion from Dev to Prod
  • Secure Workspaces
  • Model Monitoring for Data/Model Drift
  • Automated Retraining
  • Model and Infrastructure triggers

In the meantime, it is recommended to schedule the development deployment pipeline on a timed trigger to perform complete model retraining.

For questions, please submit an issue or reach out to the development team at Microsoft.