Which tool did you use as the token calculator?
> Which tool did you use as the token calculator?

The result computed via `tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)` is consistent with the web-based token calculator: https://dashscope.console.aliyun.com/tokenizer
Model Series
Qwen2.5
What are the models used?
Qwen/Qwen2.5-3B-Instruct
What is the scenario where the problem happened?
transformers
Is this badcase known and can it be solved using available techniques?
Information about environment
OS: Ubuntu 22.04
Python: Python 3.10
GPUs: 8 x NVIDIA A100
NVIDIA driver: 535 (from nvidia-smi)
CUDA compiler: 12.1 (from nvcc -V)
PyTorch: 2.2.1+cu121 (from python -c "import torch; print(torch.__version__)")
Description
Steps to reproduce
```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer
model_name_or_path = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# Example text
text = " 那明天呢?"

# Inspect each token
tokens = tokenizer.tokenize(text)
print("Tokenized Text:", tokens)

# Convert to token IDs
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print("Token IDs:", token_ids)

# Decode back to text
decoded_text = tokenizer.decode(token_ids)
print("Decoded Text:", decoded_text)
```
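For context on the token strings in the output below: Qwen's tokenizer is based on GPT-2-style byte-level BPE, so a single multi-byte CJK character can be split across several tokens. A minimal sketch of the byte-to-unicode pre-mapping (reimplemented here from the GPT-2 scheme; not the actual tokenizer internals) shows how `" 那"` becomes the visible pieces `Ġ`, `é`, `Ĥ`, `£`:

```python
# Sketch of the GPT-2-style byte-to-unicode mapping that byte-level BPE
# tokenizers apply before merging. Each UTF-8 byte maps to one printable
# unicode character, so a 3-byte CJK character appears as three mapped
# characters that BPE may or may not merge into a single token.

def bytes_to_unicode():
    # Printable bytes map to themselves; the rest are shifted above U+0100.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

mapping = bytes_to_unicode()
text = " 那"                      # one space + one CJK character (3 UTF-8 bytes)
mapped = "".join(mapping[b] for b in text.encode("utf-8"))
print(mapped)  # 'ĠéĤ£' — 'Ġ' is the space, the next three characters are 那
```

This is why the reported token list starts with `'Ġé', 'Ĥ', '£'`: the character 那 happens to be split across three tokens at the byte level.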
```
Encoded Text: " 那明天呢?"
Tokenized Text: ['Ġé', 'Ĥ', '£', 'æĺİå¤©', 'åĳ¢', '?']
Token IDs: [18137, 224, 96, 104807, 101036, 30]
Decoded Text: 那明天呢?
```
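To check that the decode step is consistent (a sketch, using the corrected byte-level token strings and the GPT-2-style mapping rather than the real tokenizer): mapping each character of the concatenated tokens back to its byte and decoding as UTF-8 recovers the input text, even though individual tokens like `'Ĥ'` are not valid UTF-8 on their own.

```python
# Invert the GPT-2-style byte-to-unicode mapping and decode the
# concatenated token strings back to the original UTF-8 text.

def bytes_to_unicode():
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Reverse mapping: printable stand-in character -> original byte value
byte_of = {c: b for b, c in bytes_to_unicode().items()}

tokens = ['Ġé', 'Ĥ', '£', 'æĺİå¤©', 'åĳ¢', '?']
raw = bytes(byte_of[c] for c in "".join(tokens))
print(raw.decode("utf-8"))  # ' 那明天呢?' — note 'Ġ' restores the leading space
```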