vlm

Here are 175 public repositories matching this topic...

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

cuda inference pytorch transformer moe llama vlm llm llm-serving llava llama2 deepseek-llm deepseek llama3 llama3-1 deepseek-v3

Updated Jan 1, 2025
Python

NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.

audio sdk transformers tts language-model whisper asr vlm sdk-python edge-computing on-device-ml on-device-ai llm stable-diffusion

Updated Dec 31, 2024
Python

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Nov 7, 2024
Python

QiuYannnn / Local-File-Organizer

An AI-powered file management tool that preserves privacy by organizing local text files and images on-device. Using the Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it scans, restructures, and organizes files for quick access and easy retrieval.

vlm file-organizer on-device-ai llm llama3

Updated Oct 21, 2024
Python

xlang-ai / OSWorld

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

agent cli benchmark natural-language-processing gui reinforcement-learning artificial-intelligence code-generation language-model vlm rpa multimodal llm large-action-model

Updated Dec 20, 2024
Python

om-ai-lab / OmAgent

A Multimodal Language Agent Framework for Problem Solving and More

Updated Dec 31, 2024
Python

coderonion / awesome-yolo-object-detection

🚀🚀🚀 A collection of some awesome public YOLO object detection series projects.

app qt cuda yolo awesome-list llama object-detection flutter autonomous-driving vlm tensorrt snn onnx spiking-neural-network yolov5 ultralytics llm yolov8 yolov11

Updated Dec 30, 2024

heshengtao / comfyui_LLM_party

An LLM agent framework in ComfyUI that includes Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, provides access to Feishu and Discord, and adapts to all LLMs with OpenAI / aisuite-style interfaces, such as o1, Ollama, Gemini, Grok, Qwen, GLM, DeepSeek, Moonshot, and Doubao. It is also adapted to local LLMs, VLMs, and GGUF models such as llama-3.3, with GraphRAG / RAG linkage.

Updated Dec 27, 2024
Python

ThuCCSLab / Awesome-LM-SSP

A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

nlp security privacy jailbreak safety awesome-list language-model vlm adversarial-attacks diffusion-models llm

Updated Dec 31, 2024

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

english chinese vlm gpt-4 chatgpt mllm multimodal-large-language-models

Updated Nov 18, 2024
Python

peterdsharpe / AeroSandbox

Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

python analysis simulation optimization aerospace automatic-differentiation airplane cfd aircraft aerodynamics vlm xfoil aerospace-engineering aircraft-design mdo mdao aerodynamic-analysis 3d-panel

Updated Dec 17, 2024
Jupyter Notebook

zubair-irshad / Awesome-Robotics-3D

A curated list of 3D vision papers related to robotics in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites.

computer-vision robotics navigation benchmarks simulations manipulation scene-graph grasping nerf 3d pointclouds vlm diffusion-models pretraining policy-learning foundation-models llm vision-language-model gaussian-splatting

Updated Nov 4, 2024

coderonion / awesome-llm-and-aigc

🚀🚀🚀 A collection of some awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and the related datasets and applications.

Updated Dec 30, 2024

gokayfem / awesome-vlm-architectures

Famous Vision Language Models and Their Architectures

awesome awesome-list kosmos clip image-encoder vlm blip multimodal text-encoder vision-language-model llava internlm cogvlm qwen-vl

Updated Sep 8, 2024
Markdown

mbzuai-oryx / GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing

remote-sensing vlm

Updated Nov 28, 2024
Python

gokayfem / ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

image-captioning nodes vlm custom-nodes img2text llm mllm llava comfyui siglip phi15 joytag img2sfx

Updated Nov 6, 2024
Python

yueliu1999 / Awesome-Jailbreak-on-LLMs

Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel jailbreak methods for LLMs. It contains papers, code, datasets, evaluations, and analyses.

security privacy ai jailbreak safety vlm llm llms vlms

Updated Dec 27, 2024

THUDM / CogAgent

An open-sourced end-to-end VLM-based GUI Agent

agent glm vlm computer-use gui-agent

Updated Dec 28, 2024
Python

niuzaisheng / ScreenAgent

ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)

agent ai vlm llm

Updated Nov 25, 2024
Python

haoranD / Awesome-Embodied-AI

A curated list of awesome papers on Embodied AI and related research/industry-driven resources.

Updated Nov 29, 2024
