Different block extraction results between Windows (local) and Docker using DocstrumBoundingBoxes.Instance #1200

I get inconsistent results when extracting text blocks from a PDF with DocstrumBoundingBoxes.Instance as the page segmenter. The same PDF processing code produces different blocks when run locally on Windows and in production in Docker.
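A minimal sketch of the kind of pipeline described above, assuming PdfPig's documented layout-analysis API. The file name and the choice of word extractor are placeholders, not the exact production code:

```csharp
using System;
using UglyToad.PdfPig;
using UglyToad.PdfPig.DocumentLayoutAnalysis.PageSegmenter;
using UglyToad.PdfPig.DocumentLayoutAnalysis.WordExtractor;

// Placeholder path: pass the real PDF as the first argument.
string path = args.Length > 0 ? args[0] : "EXAMPLE.pdf";

using var document = PdfDocument.Open(path);
foreach (var page in document.GetPages())
{
    // Group letters into words, then words into blocks with Docstrum.
    var words = NearestNeighbourWordExtractor.Instance.GetWords(page.Letters);
    var blocks = DocstrumBoundingBoxes.Instance.GetBlocks(words);

    foreach (var block in blocks)
    {
        Console.WriteLine("--- block ---");
        foreach (var line in block.TextLines)
        {
            Console.WriteLine(line.Text);
        }
    }
}
```

Running the same program on Windows and inside the container and diffing the output should show the split described below.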

**Expected behavior**
The block should be extracted as five lines, as it is locally on Windows:

```
D*****************S
5 RUE P*****L
93200 SAINT-DENIS
REPRENTE PAR D******* I**********
SIRET : 989 288 774 00015
```

**Actual behavior**
When running inside Docker, the extracted blocks are incorrectly split and partially scrambled:

```
D*****************S
5 RUE P*****L
93200
SAINT-DENIS
REPRENTE PAR D******* I**********
SIRET : 288 00015
989 774
```

**Environment**

- UglyToad.PdfPig version: 0.1.11
- OS (local): Windows 11
- Docker base image: mcr.microsoft.com/dotnet/aspnet:9.0
- .NET runtime version: 9.0

**Additional information**

Both environments use the same code and dependencies.
The attached PDF exhibits the problem reproducibly.
I tried both DefaultWordExtractor and NearestNeighbourWordExtractor.
My guess is a difference in floating-point precision, font rendering, or locale behavior between the Windows and Linux environments.
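To narrow down which of these guesses applies, the word-level geometry can be dumped in both environments and diffed. The following is only a diagnostic sketch under the same assumptions as the PdfPig API docs; the file name is a placeholder and the output format is arbitrary:

```csharp
using System;
using System.Globalization;
using System.Runtime.InteropServices;
using UglyToad.PdfPig;
using UglyToad.PdfPig.DocumentLayoutAnalysis.WordExtractor;

// Placeholder path: pass the real PDF as the first argument.
string path = args.Length > 0 ? args[0] : "EXAMPLE.pdf";

// Record the environment so the two dumps can be told apart.
Console.WriteLine($"OS: {RuntimeInformation.OSDescription}");
Console.WriteLine($"Culture: {CultureInfo.CurrentCulture.Name}");

using var document = PdfDocument.Open(path);
foreach (var page in document.GetPages())
{
    foreach (var word in NearestNeighbourWordExtractor.Instance.GetWords(page.Letters))
    {
        var box = word.BoundingBox;
        // Font name of the first letter, if the word has any letters.
        string font = word.Letters.Count > 0 ? word.Letters[0].FontName : "";

        // Invariant culture and round-trip format keep the numbers comparable
        // across machines regardless of locale settings.
        Console.WriteLine(string.Format(
            CultureInfo.InvariantCulture,
            "{0}\t{1:R}\t{2:R}\t{3:R}\t{4:R}\t{5}",
            word.Text, box.Left, box.Bottom, box.Right, box.Top, font));
    }
}
```

If the bounding boxes already differ between Windows and Docker, the discrepancy is introduced before segmentation (for example in glyph metrics). If they are identical, the difference is more likely inside the segmenter itself.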

[EXAMPLE (2) (1).pdf](https://github.com/user-attachments/files/23355763/EXAMPLE.2.1.pdf)
