Implement a caching mechanism for tokenized sequences
jshuadvd committed Jul 11, 2024
1 parent 287957e commit 1c1a00b
Showing 1 changed file with 1 addition and 0 deletions.
train.py

```diff
@@ -132,6 +132,7 @@ def preprocess_data(data, tokenizer, max_length, overlap):
         end = start + max_length
         chunk = data[start:end]
         # tokenized_chunk = tokenizer.encode(chunk)
+        # Cache the tokenized chunk
         tokenized_chunk = cached_tokenize(chunk, tokenizer)
 
         # Create sliding window sequences from the tokenized chunk
```
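The hunk calls `cached_tokenize`, whose definition is not shown in this diff. As a rough sketch of what such a helper might look like (the memoization details below are assumptions, not the repository's actual implementation), a module-level dictionary keyed by the chunk text avoids re-encoding text that recurs across overlapping windows:

```python
# Hypothetical sketch of cached_tokenize; the real definition is not
# part of this diff. A dict keyed by the chunk string memoizes
# tokenizer.encode() results so repeated chunks are encoded only once.
_token_cache = {}

def cached_tokenize(chunk, tokenizer):
    # Keyed on the text alone, since tokenizer objects are often
    # unhashable and the tokenizer is fixed for the whole run.
    if chunk not in _token_cache:
        _token_cache[chunk] = tokenizer.encode(chunk)
    return _token_cache[chunk]
```

A cache like this trades memory for speed and grows without bound over a long run; if that matters, a bounded alternative would be to wrap a string-keyed helper in `functools.lru_cache(maxsize=...)` so old entries are evicted automatically.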
