chore: bump transformers from 4.33.3 to 4.36.0 in /presets/models/falcon (#195)

Bumps [transformers](https://github.com/huggingface/transformers) from
4.33.3 to 4.36.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/transformers/releases">transformers's
releases</a>.</em></p>
<blockquote>
<h2>v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa
wide-spread support</h2>
<h2>New model additions</h2>
<h3>Mixtral</h3>
<p>Mixtral is the new open-source model from Mistral AI, announced in the blog post <a href="https://mistral.ai/news/mixtral-of-experts/">Mixtral of Experts</a>. According to the benchmark results shared in that post, the model has capabilities comparable to ChatGPT.</p>
<p>The architecture is a sparse Mixture of Experts with a Top-2 routing strategy, similar to the <code>NllbMoe</code> architecture in transformers. You can use it through the <code>AutoModelForCausalLM</code> interface:</p>
<pre lang="py"><code>&gt;&gt;&gt; import torch
&gt;&gt;&gt; from transformers import AutoModelForCausalLM,
AutoTokenizer
<p>&gt;&gt;&gt; model =
AutoModelForCausalLM.from_pretrained(&quot;mistralai/Mixtral-8x7B&quot;,
torch_dtype=torch.float16, device_map=&quot;auto&quot;)
&gt;&gt;&gt; tokenizer =
AutoTokenizer.from_pretrained(&quot;mistralai/Mistral-8x7B&quot;)</p>
<p>&gt;&gt;&gt; prompt = &quot;My favourite condiment is&quot;</p>
<p>&gt;&gt;&gt; model_inputs = tokenizer([prompt],
return_tensors=&quot;pt&quot;).to(device)
&gt;&gt;&gt; model.to(device)</p>
<p>&gt;&gt;&gt; generated_ids = model.generate(**model_inputs,
max_new_tokens=100, do_sample=True)
&gt;&gt;&gt; tokenizer.batch_decode(generated_ids)[0]
</code></pre></p>
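<p>For intuition, the Top-2 routing mentioned above can be sketched as follows. This is an illustrative toy implementation, not Mixtral's actual code: the hidden size, expert count, and the tiny MLP experts are all placeholders.</p>
<pre lang="py"><code>import torch
import torch.nn as nn
import torch.nn.functional as F

def top2_moe_forward(x, router, experts):
    """Route each token to its 2 highest-scoring experts and mix their outputs."""
    logits = router(x)                               # (tokens, num_experts)
    weights, idx = torch.topk(logits, k=2, dim=-1)   # top-2 experts per token
    weights = F.softmax(weights, dim=-1)             # renormalize over the two
    out = torch.zeros_like(x)
    for slot in range(2):                            # each token's 1st/2nd choice
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e                 # tokens routed to expert e
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

# Toy usage: 8 small MLP "experts" over a 16-dim hidden state
hidden, num_experts = 16, 8
router = nn.Linear(hidden, num_experts)
experts = nn.ModuleList(
    nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
    for _ in range(num_experts)
)
print(top2_moe_forward(torch.randn(4, hidden), router, experts).shape)  # (4, 16)
</code></pre>
<p>Only the two selected experts run per token, which is what keeps inference cost well below that of a dense model with the same total parameter count.</p>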
<p>The model is compatible with existing optimisation tools such as Flash Attention 2, <code>bitsandbytes</code>, and the PEFT library. The checkpoints are released under the <a href="https://huggingface.co/mistralai"><code>mistralai</code></a> organisation on the Hugging Face Hub.</p>
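<p>As a hedged illustration of the Flash Attention 2 and <code>bitsandbytes</code> integrations (this assumes a CUDA machine with the <code>bitsandbytes</code> and <code>flash-attn</code> packages installed; the instruct checkpoint name is an assumption, not taken from the notes above):</p>
<pre lang="py"><code>import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed checkpoint name

# 8-bit weights via bitsandbytes, attention via Flash Attention 2 kernels
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,                        # bitsandbytes quantization
    attn_implementation="flash_attention_2",  # needs the flash-attn package
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
</code></pre>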
<h3>Llava / BakLlava</h3>
<p>Llava is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture; in other words, a multi-modal LLM fine-tuned for chat and instruction following.</p>
<p>The Llava model was proposed in <a
href="https://arxiv.org/pdf/2310.03744">Improved Baselines with Visual
Instruction Tuning</a> by Haotian Liu, Chunyuan Li, Yuheng Li and Yong
Jae Lee.</p>
<ul>
<li>[<code>Llava</code>] Add Llava to transformers by <a
href="https://github.com/younesbelkada"><code>@​younesbelkada</code></a>
in <a
href="https://redirect.github.com/huggingface/transformers/issues/27662">#27662</a></li>
<li>[LLaVa] Some improvements by <a
href="https://github.com/NielsRogge"><code>@​NielsRogge</code></a> in <a
href="https://redirect.github.com/huggingface/transformers/issues/27895">#27895</a></li>
</ul>
<p>The integration also includes <a href="https://github.com/SkunkworksAI/BakLLaVA"><code>BakLlava</code></a>, a Llava model trained with a Mistral backbone.</p>
<p>The model is compatible with the <code>&quot;image-to-text&quot;</code> pipeline:</p>
<pre lang="py"><code>from transformers import pipeline
from PIL import Image
import requests

model_id = "llava-hf/llava-1.5-7b-hf"
</code></pre>
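<p>The pipeline snippet above is cut off in the quoted notes; a minimal end-to-end version might look like the following (the image URL and generation parameters are placeholders):</p>
<pre lang="py"><code>import requests
from PIL import Image
from transformers import pipeline

model_id = "llava-hf/llava-1.5-7b-hf"
pipe = pipeline("image-to-text", model=model_id)

url = "https://example.com/cat.png"  # placeholder image URL
image = Image.open(requests.get(url, stream=True).raw)

# Llava 1.5 checkpoints expect a USER/ASSISTANT prompt with an <image> token
prompt = "USER: <image>\nWhat is shown in this picture?\nASSISTANT:"
outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 50})
print(outputs[0]["generated_text"])
</code></pre>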
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/huggingface/transformers/commit/14666775a296a76c88e1aa686a9547f393d322e2"><code>1466677</code></a>
Release: v4.36.0</li>
<li><a
href="https://github.com/huggingface/transformers/commit/accccdd0087263a1e494e9c9ec30a43043ff3905"><code>accccdd</code></a>
[<code>Add Mixtral</code>] Adds support for the Mixtral MoE (<a
href="https://redirect.github.com/huggingface/transformers/issues/27942">#27942</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/0676d992a5c1f6107a611018494ec952613a4d7f"><code>0676d99</code></a>
[<code>from_pretrained</code>] Make from_pretrained fast again (<a
href="https://redirect.github.com/huggingface/transformers/issues/27709">#27709</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/9f18cc6df0b7e0d50f78b9e9fcb3aafa7b5160fe"><code>9f18cc6</code></a>
Fix SDPA dispatch &amp; make SDPA CI compatible with torch&lt;2.1.1 (<a
href="https://redirect.github.com/huggingface/transformers/issues/27940">#27940</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/7ea21f1f035d683cc39a0c0f4b2605175e1dcfdf"><code>7ea21f1</code></a>
[LLaVa] Some improvements (<a
href="https://redirect.github.com/huggingface/transformers/issues/27895">#27895</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/5e620a92cf7e6c312435db86ec55e13b75dece75"><code>5e620a9</code></a>
Fix <code>SeamlessM4Tv2ModelIntegrationTest</code> (<a
href="https://redirect.github.com/huggingface/transformers/issues/27911">#27911</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/e96c1de1913c307fddcb3e5881388a6dbb5b00b1"><code>e96c1de</code></a>
Skip <code>UnivNetModelTest::test_multi_gpu_data_parallel_forward</code>
(<a
href="https://redirect.github.com/huggingface/transformers/issues/27912">#27912</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/8d8970efdd0e21b54f1c82dec21e8a5eeba609a1"><code>8d8970e</code></a>
[BEiT] Fix test (<a
href="https://redirect.github.com/huggingface/transformers/issues/27934">#27934</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/235be08569000a5361354f766972e653212bf0d3"><code>235be08</code></a>
[DETA] fix backbone freeze/unfreeze function (<a
href="https://redirect.github.com/huggingface/transformers/issues/27843">#27843</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/df5c5c62ae253055336f5bb0828ca8e3e15ab6bd"><code>df5c5c6</code></a>
Fix typo (<a
href="https://redirect.github.com/huggingface/transformers/issues/27918">#27918</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/huggingface/transformers/compare/v4.33.3...v4.36.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=transformers&package-manager=pip&previous-version=4.33.3&new-version=4.36.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts page](https://github.com/Azure/kaito/network/alerts).

</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ishaan Sehgal <ishaanforthewin@gmail.com>
Co-authored-by: Heba <31887807+helayoty@users.noreply.github.com>
3 people authored Jan 9, 2024
1 parent b9abdb4 commit 3d1754f
Showing 6 changed files with 35 additions and 83 deletions.
80 changes: 18 additions & 62 deletions .github/workflows/e2e-preset-test.yml
@@ -12,78 +12,34 @@ on:
required: true
env:
GO_VERSION: "1.20"
+ VERSION: 0.0.1

permissions:
id-token: write
contents: read

jobs:
setup:
- if: false
- # if: github.event_name == 'workflow_dispatch' || github.event.workflow_run.conclusion == 'success'
+ if: github.event_name == 'workflow_dispatch' || github.event.workflow_run.conclusion == 'success'
runs-on: self-hosted
outputs:
- IMG_TAG: ${{ steps.set_final_tag.outputs.IMG_TAG }}
- steps:
- - name: Determine tag from dispatch
- if: github.event_name == 'workflow_dispatch'
- id: determine_tag
- run: echo "IMG_TAG=${{ github.event.inputs.image_tag }}" >> $GITHUB_OUTPUT
-
- - name: Download tag artifact
- if: github.event_name == 'workflow_run'
- uses: actions/github-script@v7
- with:
- github-token: ${{ secrets.KAITO_ACCESS_TOKEN_READ }}
- script: |
- let allArtifacts = await github.rest.actions.listWorkflowRunArtifacts({
- owner: context.repo.owner,
- repo: context.repo.repo,
- run_id: context.payload.workflow_run.id,
- });
- let matchArtifact = allArtifacts.data.artifacts.filter((artifact) => {
- return artifact.name == "artifacts"
- })[0];
- let download = await github.rest.actions.downloadArtifact({
- owner: context.repo.owner,
- repo: context.repo.repo,
- artifact_id: matchArtifact.id,
- archive_format: 'zip',
- });
- let fs = require('fs');
- fs.writeFileSync(`/tmp/artifacts.zip`, Buffer.from(download.data));
- - name: Unzip tag artifact
- if: github.event_name == 'workflow_run'
- run: |
- mkdir -p /tmp/artifacts
- unzip -o /tmp/artifacts.zip -d /tmp/artifacts
- shell: bash
- - name: Display downloaded aritifacts
- if: github.event_name == 'workflow_run'
- run: |
- echo "Downloaded artifacts:"
- ls -ablh /tmp/artifacts
- shell: bash
- - name: Parse artifacts and assign GA environment variables
- if: github.event_name == 'workflow_run'
- id: get_image_tag
- run: |
- tag=$(tail -n 1 /tmp/artifacts/tag.txt)
- echo "IMG_TAG=$tag" >> $GITHUB_OUTPUT
- - name: Set final image tag
- id: set_final_tag
+ image_tag: ${{ steps.set_tag.outputs.image_tag }}
+ steps:
+ - name: Set Image Tag
+ id: set_tag
run: |
if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
echo "IMG_TAG=${{ steps.determine_tag.outputs.IMG_TAG }}" >> $GITHUB_OUTPUT
else
echo "IMG_TAG=${{ steps.get_image_tag.outputs.IMG_TAG }}" >> $GITHUB_OUTPUT
fi
if [[ "${{ github.event_name }}" == "workflow_dispatch" && -n "${{ github.event.inputs.image_tag }}" ]]; then
echo "Using workflow dispatch to set image tag"
echo "image_tag=${{ github.event.inputs.image_tag }}" >> $GITHUB_OUTPUT
else
echo "Setting image tag based on version set"
echo "image_tag=${{ env.VERSION }}" >> $GITHUB_OUTPUT
fi
e2e-preset-tests:
if: github.event_name == 'workflow_dispatch' || github.event.workflow_run.conclusion == 'success'
needs: setup
- runs-on: self-hosted
+ runs-on: [self-hosted, 'username:runner-3']
strategy:
fail-fast: false
matrix:
@@ -151,7 +107,7 @@ jobs:
id: get_acr_name
run: |
# Set the ACR based on the tag value
if [[ "${{ needs.setup.outputs.IMG_TAG }}" == "latest" ]]; then
if [[ "${{ needs.setup.outputs.image_tag }}" == "latest" ]]; then
echo "ACR_NAME=aimodelsregistry" >> $GITHUB_OUTPUT
else
echo "ACR_NAME=aimodelsregistrytest" >> $GITHUB_OUTPUT
@@ -169,7 +125,7 @@
- name: 'Az CLI login'
uses: azure/login@v1.5.1
with:
- client-id: ${{ secrets.AZURE_KDM_PRESET_SELF_RUNNER_CLIENT_ID }}
+ client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
allow-no-subscriptions: true

@@ -181,7 +137,7 @@
run: |
ACR_NAME=${{ steps.get_acr_name.outputs.ACR_NAME }}
IMAGE_NAME=${{ matrix.image.name }}
- TAG=${{ needs.setup.outputs.IMG_TAG }}
+ TAG=${{ needs.setup.outputs.image_tag }}
TAGS=$(az acr repository show-tags -n $ACR_NAME --repository $IMAGE_NAME --output tsv)
@@ -266,7 +222,7 @@ jobs:
if: steps.check_image.outputs.IMAGE_EXISTS == 'true'
run: |
sed -i "s/MASTER_ADDR_HERE/${{ steps.get_ip.outputs.SERVICE_IP }}/g" presets/test/manifests/${{ matrix.image.name }}/${{ matrix.image.name }}-statefulset.yaml
sed -i "s/TAG_HERE/${{ needs.setup.outputs.IMG_TAG }}/g" presets/test/manifests/${{ matrix.image.name }}/${{ matrix.image.name }}-statefulset.yaml
sed -i "s/TAG_HERE/${{ needs.setup.outputs.image_tag }}/g" presets/test/manifests/${{ matrix.image.name }}/${{ matrix.image.name }}-statefulset.yaml
sed -i "s/REPO_HERE/${{ steps.get_acr_name.outputs.ACR_NAME }}/g" presets/test/manifests/${{ matrix.image.name }}/${{ matrix.image.name }}-statefulset.yaml
kubectl apply -f presets/test/manifests/${{ matrix.image.name }}/${{ matrix.image.name }}-statefulset.yaml
2 changes: 1 addition & 1 deletion .github/workflows/kind-cluster/docker-job-template.yaml
@@ -3,7 +3,7 @@ kind: Job
metadata:
name: docker-build-job-{{JOB_ID}}
spec:
- ttlSecondsAfterFinished: 600 # Job and its pods are deleted 10 min after job completion
+ ttlSecondsAfterFinished: 10800 # Job and its pods are deleted 3 hr after job completion
backoffLimit: 3 # Number of retries before marking the job as failed
template:
spec:
2 changes: 2 additions & 0 deletions .github/workflows/kind-cluster/kind.yaml
@@ -5,3 +5,5 @@ nodes:
extraMounts:
- hostPath: /home
containerPath: /home
+ - hostPath: /datadrive
+ containerPath: /datadrive
6 changes: 5 additions & 1 deletion .github/workflows/kind-cluster/main.py
@@ -71,6 +71,10 @@ def main():
job_name = f"{model}-{unique_id}"
job_yaml = populate_job_template(model, img_tag, job_name, os.environ)
write_job_file(job_yaml, job_name)

+ output = run_command(f"ls {get_weights_path(model)}")
+ print("Model Weights:", output)

run_command(f"kubectl apply -f {job_name}-job.yaml")
job_names.append(job_name)

@@ -150,7 +154,7 @@ def check_job_status(job_name):
else:
return "running"

- def wait_for_jobs_to_complete(job_names, timeout=10800):
+ def wait_for_jobs_to_complete(job_names, timeout=21600):
"""Wait for all jobs to complete with a timeout."""
start_time = time.time()
while time.time() - start_time < timeout:
26 changes: 8 additions & 18 deletions presets/models/falcon/inference-api.py
@@ -1,46 +1,37 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
- # System
- import os
import argparse

- # API
+ import os
from typing import List, Optional
- from pydantic import BaseModel
- from fastapi import FastAPI, HTTPException
- import uvicorn

- # ML
- from transformers import AutoTokenizer, AutoModelForCausalLM
- import transformers
import torch
- # import torch.distributed as dist
+ import transformers
+ import uvicorn
+ from fastapi import FastAPI, HTTPException
+ from pydantic import BaseModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser(description='Falcon Model Configuration')
parser.add_argument('--load_in_8bit', default=False, action='store_true', help='Load model in 8-bit mode')
parser.add_argument('--disable_trust_remote_code', default=False, action='store_true', help='Disable trusting remote code when loading the model')
- # parser.add_argument('--model_id', required=True, type=str, help='The Falcon ID for the pre-trained model')
args = parser.parse_args()

app = FastAPI()

tokenizer = AutoTokenizer.from_pretrained("/workspace/tfs/weights")
tokenizer = AutoTokenizer.from_pretrained("/workspace/tfs/weights", local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
"/workspace/tfs/weights", # args.model_id,
device_map="auto",
torch_dtype=torch.bfloat16,
trust_remote_code=not args.disable_trust_remote_code, # Use NOT since our flag disables the trust
load_in_8bit=args.load_in_8bit,
- # offload_folder="offload",
- # offload_state_dict = True
+ local_files_only=True
)

pipeline = transformers.pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto",
)

@@ -85,7 +76,6 @@ class GenerationParams(BaseModel):
forced_eos_token_id: Optional[int] = None
remove_invalid_values: Optional[bool] = None


@app.post("/chat")
def generate_text(params: GenerationParams):
sequences = pipeline(
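For reference, the server above exposes its generation endpoint at `/chat`. A minimal client call might look like the sketch below; the host and port, and the `prompt`/`max_new_tokens` field names, are assumptions, since the full `GenerationParams` schema is truncated in this diff.

<pre lang="py"><code>import requests

# Hypothetical client for the Falcon inference server above. The port and
# the request fields ("prompt", "max_new_tokens") are assumptions; consult
# the full GenerationParams model for the real schema.
resp = requests.post(
    "http://localhost:5000/chat",
    json={"prompt": "The falcon is", "max_new_tokens": 50},
)
resp.raise_for_status()
print(resp.json())
</code></pre>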
2 changes: 1 addition & 1 deletion presets/models/falcon/requirements.txt
@@ -1,5 +1,5 @@
# Dependencies for TFS
- transformers==4.33.3
+ transformers==4.36.0
# torch==2.1.0a0+4136153 Already included in base image
accelerate==0.23.0
fastapi==0.103.2
