GitHub - Nevermetyou65/Mangosteen-documents

This is part of the Mangosteen Dataset project.

To use this repo

Install uv
Run bash setup.sh
Check that the documents directory exists at the project root
Put PDF documents in documents/inputs. Each PDF source should have its own folder, like this:

documents
|____inputs/
     |___<source_1>
     |___<source_2>
     ...
     |___<source_n>
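
As a quick sanity check of the layout above, the following is a minimal sketch that lists the PDF files found in each source folder. It is not part of this repo: the function name `list_pdfs_by_source` is hypothetical, and it assumes PDFs sit directly inside each source folder rather than in nested subfolders.

```python
from pathlib import Path


def list_pdfs_by_source(inputs_dir: Path) -> dict[str, list[Path]]:
    """Map each source folder name to the sorted PDF files directly inside it.

    Hypothetical helper, assuming the documents/inputs/<source>/ layout.
    """
    if not inputs_dir.is_dir():
        raise FileNotFoundError(f"Expected an inputs directory at {inputs_dir}")

    result: dict[str, list[Path]] = {}
    # Every direct subfolder of inputs_dir is treated as one source.
    for source_dir in sorted(p for p in inputs_dir.iterdir() if p.is_dir()):
        # Match the extension case-insensitively so .PDF files are included.
        result[source_dir.name] = sorted(
            f
            for f in source_dir.iterdir()
            if f.is_file() and f.suffix.lower() == ".pdf"
        )
    return result
```

Calling `list_pdfs_by_source(Path("documents/inputs"))` from the project root returns one entry per source folder, so an empty list points to a source that has no PDFs yet.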

We encourage users to implement the PDF downloading pipeline and the export-to-JSON pipeline themselves. This repo only shows experimental code.
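
One possible starting point for a downloading step is sketched below, using only the Python standard library. This is an illustration, not code from this repo: the names `save_pdf` and `download_pdf`, the 30-second timeout, and the default `documents/inputs` path are all assumptions, and a real pipeline would likely add retries, content-type checks, and rate limiting.

```python
import urllib.request
from pathlib import Path

# Assumed default location, matching the layout described in this README.
DEFAULT_INPUTS = Path("documents/inputs")


def save_pdf(data: bytes, source: str, filename: str,
             inputs_dir: Path = DEFAULT_INPUTS) -> Path:
    """Write PDF bytes to <inputs_dir>/<source>/<filename> (hypothetical helper)."""
    target_dir = inputs_dir / source
    # Create the per-source folder on first use.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so the file stays inside target_dir.
    target = target_dir / Path(filename).name
    target.write_bytes(data)
    return target


def download_pdf(url: str, source: str, filename: str,
                 inputs_dir: Path = DEFAULT_INPUTS) -> Path:
    """Fetch a URL and store the response body as a PDF under its source folder."""
    with urllib.request.urlopen(url, timeout=30) as response:
        data = response.read()
    return save_pdf(data, source, filename, inputs_dir)
```

Keeping the write step separate from the network call makes it easy to test the folder layout without network access.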

Repository contents:

notebooks
tests
utils
.gitignore
.python-version
LICENSE
README.md
env_example
marker_run_example_cli.sh
pyproject.toml
pytest.ini
run_marker_loop.py
setup.sh
uv.lock
