Commit ca40970

Change empty-match exit from break 2 to break and add regression test (#46)
The break 2 was harder to reason about and could incorrectly abort parsing. It exits two loop levels at once, so an empty match mid-buffer ended iteration even when more stream data was available. With a single break, only the inner loop is exited and the iterator loads the next chunk. That can resolve the empty match into a positive-length one, e.g. when a fixed-width alternative needs more characters than remain in the current buffer.
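The sketch below shows the difference between the two exits. It is a minimal, self-contained chunked scanner with the same two-level loop shape. It is hypothetical: `scanChunks()` and its `$abortOnEmpty` switch are illustration-only names, not the real `PatternIterator` code, and the stream is modelled as a plain array of string chunks. Passing `true` reproduces the old `break 2`, and `false` the new single `break`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch, not the real PatternIterator: scans an anchored pattern
 * over a list of chunks. The outer loop appends the next chunk to the unconsumed
 * tail of the buffer; the inner loop collects matches from the current buffer.
 *
 * @param list<string> $chunks
 * @return list<string>
 */
function scanChunks(array $chunks, string $pattern, bool $abortOnEmpty): array
{
    $results = [];
    $buffer = '';
    $offset = 0;
    $next = 0;

    while (true) {
        $eof = $next >= count($chunks);
        if (!$eof) {
            // Keep only the unconsumed tail and append the next chunk.
            $buffer = substr($buffer, $offset) . $chunks[$next++];
            $offset = 0;
        }

        while (preg_match($pattern, $buffer, $m, 0, $offset) === 1) {
            if (strlen($m[0]) === 0) {
                if ($abortOnEmpty) {
                    break 2; // old exit: leaves both loops, ending the scan even if chunks remain
                }
                break;       // new exit: back to the outer loop, which loads more data
            }
            $results[] = $m[0];
            $offset += strlen($m[0]);
        }

        if ($eof) {
            break; // no more chunks to load: stop scanning
        }
    }

    return $results;
}

echo implode(',', scanChunks(['abXY', 'Zmore'], '~.{3}|~A', false)), "\n"; // abX,YZm,ore
echo implode(',', scanChunks(['abXY', 'Zmore'], '~.{3}|~A', true)), "\n";  // abX
```

This sketch uses the chunks and pattern from the regression test. With the single `break`, it yields all three matches. With `break 2`, it stops after `abX`. The sketch also simplifies one case: it accepts any positive-length match immediately, so a greedy pattern could match a token that is cut off at the end of the buffer. The real iterator may handle that differently.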
1 parent a40b76d commit ca40970

2 files changed: 15 additions, 1 deletion

src/PatternIterator.php

Lines changed: 1 addition & 1 deletion
@@ -82,7 +82,7 @@ public function getIterator(): Iterator
 			}
 
 			if (strlen($matches[0]) === 0) {
-				break 2;
+				break;
 			}
 
 			yield $matches;

tests/cases/PatternIteratorTest.phpt

Lines changed: 14 additions & 0 deletions
@@ -524,6 +524,20 @@ class PatternIteratorTest extends TestCase
 	}
 
 
+	public function testZeroLengthMatchMidBufferLoadsMoreData(): void
+	{
+		// Pattern matches exactly 3 chars, or empty. When only 1 char remains in the
+		// buffer but the stream has more data, the empty match mid-buffer must not abort;
+		// loading the next chunk makes the 3-char alternative succeed.
+		$iter = new PatternIterator($this->stream('abXY', 'Zmore'), '~.{3}|~A');
+		$results = $this->collect($iter);
+		Assert::count(3, $results);
+		Assert::same('abX', $results[0][0]);
+		Assert::same('YZm', $results[1][0]);
+		Assert::same('ore', $results[2][0]);
+	}
+
+
 	public function testZeroLengthMatchAcrossChunks(): void
 	{
 		$pattern = '~\s*(?:(?<query>[^;]+);|\z)~As';
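The empty match that the new test relies on can be reproduced with `preg_match()` directly. In `'~.{3}|~A'`, `~` is the delimiter and the `A` modifier anchors the match at the start offset. The pattern therefore matches exactly three characters at that position or, failing that, the empty string. The helper `matchAt()` below is a hypothetical wrapper around `preg_match()` and its `$offset` argument, not part of the library.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the match found at $offset in $subject.
// With the pattern below a match always succeeds, thanks to the empty alternative.
function matchAt(string $pattern, string $subject, int $offset): string
{
    preg_match($pattern, $subject, $m, 0, $offset);
    return $m[0];
}

// Delimiter `~`, body `.{3}|`, modifier `A` (anchored at the start offset).
$pattern = '~.{3}|~A';

// First chunk only: after "abX" is consumed, a single "Y" remains,
// so `.{3}` fails and the empty alternative wins.
var_dump(matchAt($pattern, 'abXY', 0));   // string(3) "abX"
var_dump(matchAt($pattern, 'abXY', 3));   // string(0) ""

// Once the second chunk is appended to the leftover "Y", the same
// position yields a full three-character match.
var_dump(matchAt($pattern, 'YZmore', 0)); // string(3) "YZm"
```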
