chore: bump transformers from 4.33.3 to 4.36.0 in /presets/models/falcon (#195)

Bumps [transformers](https://github.com/huggingface/transformers) from
4.33.3 to 4.36.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/transformers/releases">transformers's
releases</a>.</em></p>
<blockquote>
<h2>v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa
wide-spread support</h2>
<h2>New model additions</h2>
<h3>Mixtral</h3>
<p>Mixtral is the new open-source model from Mistral AI, announced in the blog post <a href="https://mistral.ai/news/mixtral-of-experts/">Mixtral of Experts</a>. According to the benchmark results shared in that post, the model has capabilities comparable to ChatGPT.</p>
<p>The architecture is a sparse Mixture of Experts with a Top-2 routing strategy, similar to the <code>NllbMoe</code> architecture in transformers. You can use it through the <code>AutoModelForCausalLM</code> interface:</p>
<pre lang="py"><code>&gt;&gt;&gt; import torch
&gt;&gt;&gt; from transformers import AutoModelForCausalLM,
AutoTokenizer
<p>&gt;&gt;&gt; model =
AutoModelForCausalLM.from_pretrained(&quot;mistralai/Mixtral-8x7B&quot;,
torch_dtype=torch.float16, device_map=&quot;auto&quot;)
&gt;&gt;&gt; tokenizer =
AutoTokenizer.from_pretrained(&quot;mistralai/Mistral-8x7B&quot;)</p>
<p>&gt;&gt;&gt; prompt = &quot;My favourite condiment is&quot;</p>
<p>&gt;&gt;&gt; model_inputs = tokenizer([prompt],
return_tensors=&quot;pt&quot;).to(device)
&gt;&gt;&gt; model.to(device)</p>
<p>&gt;&gt;&gt; generated_ids = model.generate(**model_inputs,
max_new_tokens=100, do_sample=True)
&gt;&gt;&gt; tokenizer.batch_decode(generated_ids)[0]
</code></pre></p>
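<p>For intuition, the Top-2 routing mentioned above can be sketched as follows. This is an illustrative toy implementation, not Mixtral's actual code: the hidden size, expert count, and the tiny MLP experts are all placeholders.</p>
<pre lang="py"><code>import torch
import torch.nn as nn
import torch.nn.functional as F

def top2_moe_forward(x, router, experts):
    """Route each token to its 2 highest-scoring experts and mix their outputs."""
    logits = router(x)                               # (tokens, num_experts)
    weights, idx = torch.topk(logits, k=2, dim=-1)   # top-2 experts per token
    weights = F.softmax(weights, dim=-1)             # renormalize over the two
    out = torch.zeros_like(x)
    for slot in range(2):                            # each token's 1st/2nd choice
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e                 # tokens routed to expert e
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

# Toy usage: 8 small MLP "experts" over a 16-dim hidden state
hidden, num_experts = 16, 8
router = nn.Linear(hidden, num_experts)
experts = nn.ModuleList(
    nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
    for _ in range(num_experts)
)
print(top2_moe_forward(torch.randn(4, hidden), router, experts).shape)  # (4, 16)
</code></pre>
<p>Only the two selected experts run per token, which is what keeps inference cost well below that of a dense model with the same total parameter count.</p>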
<p>The model is compatible with existing optimisation tools such as Flash Attention 2, <code>bitsandbytes</code>, and the PEFT library. The checkpoints are released under the <a href="https://huggingface.co/mistralai"><code>mistralai</code></a> organisation on the Hugging Face Hub.</p>
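<p>As a hedged illustration of the Flash Attention 2 and <code>bitsandbytes</code> integrations (this assumes a CUDA machine with the <code>bitsandbytes</code> and <code>flash-attn</code> packages installed; the instruct checkpoint name is an assumption, not taken from the notes above):</p>
<pre lang="py"><code>import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed checkpoint name

# 8-bit weights via bitsandbytes, attention via Flash Attention 2 kernels
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,                        # bitsandbytes quantization
    attn_implementation="flash_attention_2",  # needs the flash-attn package
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
</code></pre>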
<h3>Llava / BakLlava</h3>
<p>Llava is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture; in other words, a multi-modal LLM fine-tuned for chat and instruction following.</p>
<p>The Llava model was proposed in <a
href="https://arxiv.org/pdf/2310.03744">Improved Baselines with Visual
Instruction Tuning</a> by Haotian Liu, Chunyuan Li, Yuheng Li and Yong
Jae Lee.</p>
<ul>
<li>[<code>Llava</code>] Add Llava to transformers by <a
href="https://github.com/younesbelkada"><code>@​younesbelkada</code></a>
in <a
href="https://redirect.github.com/huggingface/transformers/issues/27662">#27662</a></li>
<li>[LLaVa] Some improvements by <a
href="https://github.com/NielsRogge"><code>@​NielsRogge</code></a> in <a
href="https://redirect.github.com/huggingface/transformers/issues/27895">#27895</a></li>
</ul>
<p>The integration also includes <a href="https://github.com/SkunkworksAI/BakLLaVA"><code>BakLlava</code></a>, a Llava model trained with a Mistral backbone.</p>
<p>The model is compatible with the <code>&quot;image-to-text&quot;</code> pipeline:</p>
<pre lang="py"><code>from transformers import pipeline
from PIL import Image
import requests

model_id = "llava-hf/llava-1.5-7b-hf"
</code></pre>
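<p>The pipeline snippet above is cut off in the quoted notes; a minimal end-to-end version might look like the following (the image URL and generation parameters are placeholders):</p>
<pre lang="py"><code>import requests
from PIL import Image
from transformers import pipeline

model_id = "llava-hf/llava-1.5-7b-hf"
pipe = pipeline("image-to-text", model=model_id)

url = "https://example.com/cat.png"  # placeholder image URL
image = Image.open(requests.get(url, stream=True).raw)

# Llava 1.5 checkpoints expect a USER/ASSISTANT prompt with an <image> token
prompt = "USER: <image>\nWhat is shown in this picture?\nASSISTANT:"
outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 50})
print(outputs[0]["generated_text"])
</code></pre>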
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/huggingface/transformers/commit/14666775a296a76c88e1aa686a9547f393d322e2"><code>1466677</code></a>
Release: v4.36.0</li>
<li><a
href="https://github.com/huggingface/transformers/commit/accccdd0087263a1e494e9c9ec30a43043ff3905"><code>accccdd</code></a>
[<code>Add Mixtral</code>] Adds support for the Mixtral MoE (<a
href="https://redirect.github.com/huggingface/transformers/issues/27942">#27942</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/0676d992a5c1f6107a611018494ec952613a4d7f"><code>0676d99</code></a>
[<code>from_pretrained</code>] Make from_pretrained fast again (<a
href="https://redirect.github.com/huggingface/transformers/issues/27709">#27709</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/9f18cc6df0b7e0d50f78b9e9fcb3aafa7b5160fe"><code>9f18cc6</code></a>
Fix SDPA dispatch &amp; make SDPA CI compatible with torch&lt;2.1.1 (<a
href="https://redirect.github.com/huggingface/transformers/issues/27940">#27940</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/7ea21f1f035d683cc39a0c0f4b2605175e1dcfdf"><code>7ea21f1</code></a>
[LLaVa] Some improvements (<a
href="https://redirect.github.com/huggingface/transformers/issues/27895">#27895</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/5e620a92cf7e6c312435db86ec55e13b75dece75"><code>5e620a9</code></a>
Fix <code>SeamlessM4Tv2ModelIntegrationTest</code> (<a
href="https://redirect.github.com/huggingface/transformers/issues/27911">#27911</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/e96c1de1913c307fddcb3e5881388a6dbb5b00b1"><code>e96c1de</code></a>
Skip <code>UnivNetModelTest::test_multi_gpu_data_parallel_forward</code>
(<a
href="https://redirect.github.com/huggingface/transformers/issues/27912">#27912</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/8d8970efdd0e21b54f1c82dec21e8a5eeba609a1"><code>8d8970e</code></a>
[BEiT] Fix test (<a
href="https://redirect.github.com/huggingface/transformers/issues/27934">#27934</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/235be08569000a5361354f766972e653212bf0d3"><code>235be08</code></a>
[DETA] fix backbone freeze/unfreeze function (<a
href="https://redirect.github.com/huggingface/transformers/issues/27843">#27843</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/df5c5c62ae253055336f5bb0828ca8e3e15ab6bd"><code>df5c5c6</code></a>
Fix typo (<a
href="https://redirect.github.com/huggingface/transformers/issues/27918">#27918</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/huggingface/transformers/compare/v4.33.3...v4.36.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=transformers&package-manager=pip&previous-version=4.33.3&new-version=4.36.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts page](https://github.com/Azure/kaito/network/alerts).

</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ishaan Sehgal <ishaanforthewin@gmail.com>
Co-authored-by: Heba <31887807+helayoty@users.noreply.github.com>
3 people authored Jan 9, 2024
1 parent b9abdb4 commit 3d1754f
Showing 6 changed files with 35 additions and 83 deletions.
80 changes: 18 additions & 62 deletions .github/workflows/e2e-preset-test.yml
@@ -12,78 +12,34 @@ on:
required: true
env:
GO_VERSION: "1.20"
+ VERSION: 0.0.1

permissions:
id-token: write
contents: read

jobs:
setup:
- if: false
- # if: github.event_name == 'workflow_dispatch' || github.event.workflow_run.conclusion == 'success'
+ if: github.event_name == 'workflow_dispatch' || github.event.workflow_run.conclusion == 'success'
runs-on: self-hosted
outputs:
- IMG_TAG: ${{ steps.set_final_tag.outputs.IMG_TAG }}
- steps:
- - name: Determine tag from dispatch
- if: github.event_name == 'workflow_dispatch'
- id: determine_tag
- run: echo "IMG_TAG=${{ github.event.inputs.image_tag }}" >> $GITHUB_OUTPUT
-
- - name: Download tag artifact
- if: github.event_name == 'workflow_run'
- uses: actions/github-script@v7
- with:
- github-token: ${{ secrets.KAITO_ACCESS_TOKEN_READ }}
- script: |
- let allArtifacts = await github.rest.actions.listWorkflowRunArtifacts({
- owner: context.repo.owner,
- repo: context.repo.repo,
- run_id: context.payload.workflow_run.id,
- });
- let matchArtifact = allArtifacts.data.artifacts.filter((artifact) => {
- return artifact.name == "artifacts"
- })[0];
- let download = await github.rest.actions.downloadArtifact({
- owner: context.repo.owner,
- repo: context.repo.repo,
- artifact_id: matchArtifact.id,
- archive_format: 'zip',
- });
- let fs = require('fs');
- fs.writeFileSync(`/tmp/artifacts.zip`, Buffer.from(download.data));
- - name: Unzip tag artifact
- if: github.event_name == 'workflow_run'
- run: |
- mkdir -p /tmp/artifacts
- unzip -o /tmp/artifacts.zip -d /tmp/artifacts
- shell: bash
- - name: Display downloaded aritifacts
- if: github.event_name == 'workflow_run'
- run: |
- echo "Downloaded artifacts:"
- ls -ablh /tmp/artifacts
- shell: bash
- - name: Parse artifacts and assign GA environment variables
- if: github.event_name == 'workflow_run'
- id: get_image_tag
- run: |
- tag=$(tail -n 1 /tmp/artifacts/tag.txt)
- echo "IMG_TAG=$tag" >> $GITHUB_OUTPUT
- - name: Set final image tag
- id: set_final_tag
+ image_tag: ${{ steps.set_tag.outputs.image_tag }}
+ steps:
+ - name: Set Image Tag
+ id: set_tag
run: |
if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
echo "IMG_TAG=${{ steps.determine_tag.outputs.IMG_TAG }}" >> $GITHUB_OUTPUT
else
echo "IMG_TAG=${{ steps.get_image_tag.outputs.IMG_TAG }}" >> $GITHUB_OUTPUT
fi
if [[ "${{ github.event_name }}" == "workflow_dispatch" && -n "${{ github.event.inputs.image_tag }}" ]]; then
echo "Using workflow dispatch to set image tag"
echo "image_tag=${{ github.event.inputs.image_tag }}" >> $GITHUB_OUTPUT
else
echo "Setting image tag based on version set"
echo "image_tag=${{ env.VERSION }}" >> $GITHUB_OUTPUT
fi
e2e-preset-tests:
if: github.event_name == 'workflow_dispatch' || github.event.workflow_run.conclusion == 'success'
needs: setup
- runs-on: self-hosted
+ runs-on: [self-hosted, 'username:runner-3']
strategy:
fail-fast: false
matrix:
@@ -151,7 +107,7 @@ jobs:
id: get_acr_name
run: |
# Set the ACR based on the tag value
if [[ "${{ needs.setup.outputs.IMG_TAG }}" == "latest" ]]; then
if [[ "${{ needs.setup.outputs.image_tag }}" == "latest" ]]; then
echo "ACR_NAME=aimodelsregistry" >> $GITHUB_OUTPUT
else
echo "ACR_NAME=aimodelsregistrytest" >> $GITHUB_OUTPUT
@@ -169,7 +125,7 @@
- name: 'Az CLI login'
uses: azure/login@v1.5.1
with:
- client-id: ${{ secrets.AZURE_KDM_PRESET_SELF_RUNNER_CLIENT_ID }}
+ client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
allow-no-subscriptions: true

@@ -181,7 +137,7 @@
run: |
ACR_NAME=${{ steps.get_acr_name.outputs.ACR_NAME }}
IMAGE_NAME=${{ matrix.image.name }}
- TAG=${{ needs.setup.outputs.IMG_TAG }}
+ TAG=${{ needs.setup.outputs.image_tag }}
TAGS=$(az acr repository show-tags -n $ACR_NAME --repository $IMAGE_NAME --output tsv)
@@ -266,7 +222,7 @@ jobs:
if: steps.check_image.outputs.IMAGE_EXISTS == 'true'
run: |
sed -i "s/MASTER_ADDR_HERE/${{ steps.get_ip.outputs.SERVICE_IP }}/g" presets/test/manifests/${{ matrix.image.name }}/${{ matrix.image.name }}-statefulset.yaml
sed -i "s/TAG_HERE/${{ needs.setup.outputs.IMG_TAG }}/g" presets/test/manifests/${{ matrix.image.name }}/${{ matrix.image.name }}-statefulset.yaml
sed -i "s/TAG_HERE/${{ needs.setup.outputs.image_tag }}/g" presets/test/manifests/${{ matrix.image.name }}/${{ matrix.image.name }}-statefulset.yaml
sed -i "s/REPO_HERE/${{ steps.get_acr_name.outputs.ACR_NAME }}/g" presets/test/manifests/${{ matrix.image.name }}/${{ matrix.image.name }}-statefulset.yaml
kubectl apply -f presets/test/manifests/${{ matrix.image.name }}/${{ matrix.image.name }}-statefulset.yaml
2 changes: 1 addition & 1 deletion .github/workflows/kind-cluster/docker-job-template.yaml
@@ -3,7 +3,7 @@ kind: Job
metadata:
name: docker-build-job-{{JOB_ID}}
spec:
- ttlSecondsAfterFinished: 600 # Job and its pods are deleted 10 min after job completion
+ ttlSecondsAfterFinished: 10800 # Job and its pods are deleted 3 hr after job completion
backoffLimit: 3 # Number of retries before marking the job as failed
template:
spec:
2 changes: 2 additions & 0 deletions .github/workflows/kind-cluster/kind.yaml
@@ -5,3 +5,5 @@ nodes:
extraMounts:
- hostPath: /home
containerPath: /home
+ - hostPath: /datadrive
+ containerPath: /datadrive
6 changes: 5 additions & 1 deletion .github/workflows/kind-cluster/main.py
@@ -71,6 +71,10 @@ def main():
job_name = f"{model}-{unique_id}"
job_yaml = populate_job_template(model, img_tag, job_name, os.environ)
write_job_file(job_yaml, job_name)

+ output = run_command(f"ls {get_weights_path(model)}")
+ print("Model Weights:", output)

run_command(f"kubectl apply -f {job_name}-job.yaml")
job_names.append(job_name)

@@ -150,7 +154,7 @@ def check_job_status(job_name):
else:
return "running"

- def wait_for_jobs_to_complete(job_names, timeout=10800):
+ def wait_for_jobs_to_complete(job_names, timeout=21600):
"""Wait for all jobs to complete with a timeout."""
start_time = time.time()
while time.time() - start_time < timeout:
26 changes: 8 additions & 18 deletions presets/models/falcon/inference-api.py
@@ -1,46 +1,37 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
- # System
- import os
import argparse

- # API
+ import os
from typing import List, Optional
- from pydantic import BaseModel
- from fastapi import FastAPI, HTTPException
- import uvicorn

- # ML
- from transformers import AutoTokenizer, AutoModelForCausalLM
- import transformers
import torch
- # import torch.distributed as dist
+ import transformers
+ import uvicorn
+ from fastapi import FastAPI, HTTPException
+ from pydantic import BaseModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser(description='Falcon Model Configuration')
parser.add_argument('--load_in_8bit', default=False, action='store_true', help='Load model in 8-bit mode')
parser.add_argument('--disable_trust_remote_code', default=False, action='store_true', help='Disable trusting remote code when loading the model')
- # parser.add_argument('--model_id', required=True, type=str, help='The Falcon ID for the pre-trained model')
args = parser.parse_args()

app = FastAPI()

tokenizer = AutoTokenizer.from_pretrained("/workspace/tfs/weights")
tokenizer = AutoTokenizer.from_pretrained("/workspace/tfs/weights", local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
"/workspace/tfs/weights", # args.model_id,
device_map="auto",
torch_dtype=torch.bfloat16,
trust_remote_code=not args.disable_trust_remote_code, # Use NOT since our flag disables the trust
load_in_8bit=args.load_in_8bit,
- # offload_folder="offload",
- # offload_state_dict = True
+ local_files_only=True
)

pipeline = transformers.pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto",
)

@@ -85,7 +76,6 @@ class GenerationParams(BaseModel):
forced_eos_token_id: Optional[int] = None
remove_invalid_values: Optional[bool] = None


@app.post("/chat")
def generate_text(params: GenerationParams):
sequences = pipeline(
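For reference, the server above exposes its generation endpoint at `/chat`. A minimal client call might look like the sketch below; the host and port, and the `prompt`/`max_new_tokens` field names, are assumptions, since the full `GenerationParams` schema is truncated in this diff.

<pre lang="py"><code>import requests

# Hypothetical client for the Falcon inference server above. The port and
# the request fields ("prompt", "max_new_tokens") are assumptions; consult
# the full GenerationParams model for the real schema.
resp = requests.post(
    "http://localhost:5000/chat",
    json={"prompt": "The falcon is", "max_new_tokens": 50},
)
resp.raise_for_status()
print(resp.json())
</code></pre>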
2 changes: 1 addition & 1 deletion presets/models/falcon/requirements.txt
@@ -1,5 +1,5 @@
# Dependencies for TFS
- transformers==4.33.3
+ transformers==4.36.0
# torch==2.1.0a0+4136153 Already included in base image
accelerate==0.23.0
fastapi==0.103.2
