fixing _split_by_lang in rag_tokenizer.py #3209

HeyPhiS · 2026-01-09T19:49:09Z

What problem does this PR solve?

Guarding the final slice in _split_by_lang ensures we never slice past the end of the segment when e temporarily exceeds len(a) (e.g., after the last run sets e = s + 1). The change clamps the upper bound with min(e, len(a)), so we always append a valid substring and avoid the occasional “list index out of range” crash the tokenizer was seeing.

Type of change

Bug Fix (non-breaking change which fixes an issue)
Python SDK impacted, Need to update PyPI

KevinHuSh · 2026-01-12T08:03:12Z

python/infinity_sdk/infinity/rag_tokenizer.py

```diff
                zh = _zh
-            if s >= len(a):
-                continue
-            txt_lang_pairs.append((a[s:e], zh))
```


Thanks!
I don't see any side effect if e exceeds the length of a:
a[s:min(e, len(a))] is equal to a[s:e].
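The reviewer's equivalence follows from Python's slice semantics: out-of-range slice bounds are clamped to the sequence length, whereas indexing past the end raises `IndexError`. A minimal check, where `clamped_slice` is a hypothetical helper that mirrors the PR's form:

```python
from collections.abc import Sequence


def clamped_slice(a: Sequence, s: int, e: int) -> Sequence:
    # Hypothetical helper mirroring the PR's form: clamp the upper bound explicitly.
    return a[s:min(e, len(a))]


for seq in ("hello", [1, 2, 3, 4, 5]):
    for s, e in [(0, 5), (3, 6), (4, 10), (5, 7), (7, 9)]:
        # Slicing already clamps out-of-range bounds, so both forms agree.
        assert seq[s:e] == clamped_slice(seq, s, e)

# Indexing, unlike slicing, raises when the position is past the end.
items = [1, 2, 3]
try:
    items[len(items)]
except IndexError as exc:
    print(type(exc).__name__, exc)  # prints: IndexError list index out of range
```

So a "list index out of range" error would typically originate from an indexing expression, not from a slice.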

fixing_split_by_lang (e766ebf)

yingfeng requested a review from KevinHuSh January 12, 2026 05:26

KevinHuSh reviewed Jan 12, 2026

HeyPhiS commented Jan 9, 2026 (edited by JinHai-CN)

KevinHuSh Jan 12, 2026 (edited)