| 1 | -# DOI Scraper
| 2 | -[](https://zenodo.org/badge/latestdoi/640054736)
| 3 | -
| 4 | -The DOI Scraper is a Python script that reads a `.bib` file, searches for articles without a DOI (Digital Object Identifier), and retrieves the missing DOIs using the [Crossref API](https://www.crossref.org/documentation/retrieve-metadata/rest-api/). It then updates the `.bib` file with the retrieved DOIs.
| 5 | -
| 6 | -## Prerequisites
| 7 | -
| 8 | -* Python
| 9 | -* `requests` library
| 10 | -
| 11 | -## Installation
| 12 | -
| 13 | -1. Clone the repository or download the `doi_scraper.py` file.
| 14 | -
| 15 | -2. Install the required dependencies by running the following command:
| 16 | -
| 17 | -```shell
| 18 | -pip install requests
| 19 | -```
| 20 | -
| 21 | -# Usage
| 22 | -
| 23 | -Place your input `.bib` file in the same directory as the `doi_scraper.py` script.
| 24 | -
| 25 | -Open the `doi_scraper.py` file and modify the following variables according to your needs:
| 26 | -
| 27 | -```python
| 28 | -input_file = 'input.bib' # Name of the input .bib file
| 29 | -output_file = 'output.bib' # Name of the output .bib file
| 30 | -INDENT_PRE = 4 # Number of spaces before the field name
| 31 | -INDENT_POST = 16 # Number of spaces after the field name
| 32 | -```
| 33 | -
| 34 | -Run the script using the following command:
| 35 | -
| 36 | -```shell
| 37 | -python doi_scraper.py
| 38 | -```
| 39 | -
| 40 | -The script will search for articles without a DOI and retrieve the missing DOIs using the Crossref API. It will then update the output .bib file with the retrieved DOIs.
| 41 | -
| 42 | -Once the script completes, you will find the updated .bib file with the retrieved DOIs in the same directory.
| 43 | -
| 44 | -# Example
| 45 | -
| 46 | -## Before
| 47 | -
| 48 | -```bibtex
| 49 | -@article{Cuadra2020,
| 50 | -title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
| 51 | -author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
| 52 | -year = 2020,
| 53 | -journal = {Journal of Fluid Mechanics},
| 54 | -publisher = {Cambridge University Press},
| 55 | -volume = 903,
| 56 | -pages= {A30 1--39}
| 57 | -}
| 58 | -```
| 59 | -
| 60 | -## After
| 61 | -
| 62 | -```bibtex
| 63 | -@article{Cuadra2020,
| 64 | - title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
| 65 | - author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
| 66 | - year = 2020,
| 67 | - journal = {Journal of Fluid Mechanics},
| 68 | - publisher = {Cambridge University Press},
| 69 | - volume = 903,
| 70 | - pages = {A30 1--39},
| 71 | - doi = {10.1017/jfm.2020.651}
| 72 | -}
| 73 | -```
| 74 | -
| 75 | -# License
| 76 | -
| 77 | -This project is licensed under the [MIT License](LICENSE).
| 1 | +# DOI Scraper |
| 2 | + |
| 3 | +The DOI Scraper is a Python script that reads a `.bib` file, searches for entries missing required fields (such as a DOI), retrieves the missing information using the [Crossref API](https://www.crossref.org/documentation/retrieve-metadata/rest-api/), and reformats the file with consistent indentation. The refactored design supports different entry types (e.g., articles, books, inproceedings, tech reports), with each type defining its own required fields. |
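
The idea of each entry type defining its own required fields could be sketched roughly as follows. This is only an illustration: the `REQUIRED_FIELDS` mapping, its contents, and the `missing_fields` helper are hypothetical names chosen here, and the field lists actually used in `doi_scraper.py` may differ.

```python
# Hypothetical mapping from entry type to required fields.
# The real lists in doi_scraper.py may be different.
REQUIRED_FIELDS = {
    "article": ["title", "author", "year", "journal", "volume", "doi"],
    "book": ["title", "author", "year", "publisher"],
}


def missing_fields(entry_type, fields):
    """Return the required fields that are absent or empty in `fields`."""
    required = REQUIRED_FIELDS.get(entry_type.lower(), [])
    return [name for name in required if not fields.get(name)]


# Fields of the "Before" entry from the README example.
entry = {
    "title": "Effect of equivalence ratio fluctuations on planar detonation discontinuities",
    "author": "Cuadra, Alberto and Huete, C{\\'e}sar and Vera, Marcos",
    "pages": "A30 1--39",
}
print(missing_fields("article", entry))
```

Under these assumed field lists, the fields reported as missing are the ones that a Crossref lookup would then try to fill in.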
| 4 | + |
| 5 | +## Prerequisites |
| 6 | + |
| 7 | +- Python 3.x |
| 8 | +- `requests` library |
| 9 | +- `tqdm` library |
| 10 | + |
| 11 | +## Installation |
| 12 | + |
| 13 | +1. Clone the repository or download the `doi_scraper.py` file. |
| 14 | + |
| 15 | +2. Install the required dependencies by running the following command: |
| 16 | + |
| 17 | +```shell |
| 18 | +pip install -r requirements.txt |
| 19 | +``` |
| 20 | + |
| 21 | +# Usage |
| 22 | + |
| 23 | +Place your input `.bib` file in the same directory as the `doi_scraper.py` script. |
| 24 | + |
| 25 | +Open the `doi_scraper.py` file and modify the following variables according to your needs: |
| 26 | + |
| 27 | +```python |
| 28 | +input_file = 'input.bib' # Name of the input .bib file |
| 29 | +output_file = 'output.bib' # Name of the output .bib file |
| 30 | +INDENT_PRE = 4 # Number of spaces before the field name |
| 31 | +INDENT_POST = 16 # Number of spaces after the field name |
| 32 | +``` |
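
To make the two indentation settings more concrete, here is a minimal sketch of how a single field line might be laid out. It assumes that `INDENT_POST` acts as the width the field name is padded to before the `=` sign; the README comment is ambiguous on this point, and `format_field` is a hypothetical helper, not a function taken from the script.

```python
INDENT_PRE = 4    # spaces before the field name
INDENT_POST = 16  # assumed here: width the field name is padded to


def format_field(name, value, indent_pre=INDENT_PRE, indent_post=INDENT_POST):
    """Format one BibTeX field as '<indent><padded name>= {value},'."""
    # ljust pads the name with spaces on the right up to indent_post characters.
    return " " * indent_pre + name.ljust(indent_post) + "= {" + value + "},"


for name, value in [("year", "2020"), ("doi", "10.1017/jfm.2020.651")]:
    print(format_field(name, value))
```

With this interpretation the `=` signs of all fields line up in the same column, provided no field name is longer than the padded width.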
| 33 | + |
| 34 | +Run the script using the following command: |
| 35 | + |
| 36 | +```shell |
| 37 | +python doi_scraper.py |
| 38 | +``` |
| 39 | + |
| 40 | +The script searches for entries that are missing required fields (such as a DOI), retrieves the missing information from the Crossref API, and writes the updated entries to the output `.bib` file.
| 41 | +
| 42 | +Once the script completes, the updated `.bib` file is in the same directory.
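
A Crossref lookup of this kind can be sketched as below. This is an assumption about how such a query might look, not the script's actual code: it uses the public `https://api.crossref.org/works` endpoint with the `query.bibliographic` and `rows` parameters and takes the first hit, and the function names are hypothetical.

```python
from urllib.parse import urlencode

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_query_url(title, author=None):
    """Build a Crossref works query for a title and, optionally, an author."""
    bibliographic = title if not author else f"{title} {author}"
    params = {"query.bibliographic": bibliographic, "rows": 1}
    return f"{CROSSREF_WORKS_URL}?{urlencode(params)}"


def extract_doi(payload):
    """Return the DOI of the first item in a Crossref works response, or None."""
    items = payload.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None


def lookup_doi(title, author=None, timeout=10):
    """Query Crossref over the network and return the best-match DOI, or None."""
    import requests  # imported here so the helpers above work without it

    response = requests.get(build_query_url(title, author), timeout=timeout)
    response.raise_for_status()
    return extract_doi(response.json())
```

The first search hit is not guaranteed to be the right work, so a real implementation would probably compare the returned title with the entry's title before accepting the DOI.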
| 43 | + |
| 44 | +## Optional Arguments |
| 45 | + |
| 46 | +* `--format-only`: Reformat the file without performing any Crossref lookups:
| 47 | + |
| 48 | +```shell |
| 49 | +python doi_scraper.py --format-only |
| 50 | +``` |
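
A flag like this is typically wired up with `argparse`. The sketch below shows one plausible way to do it; `build_parser` and the help text are assumptions made for illustration, and the script's real argument handling may differ.

```python
import argparse


def build_parser():
    """Create a parser with a boolean --format-only switch."""
    parser = argparse.ArgumentParser(
        description="Fill in missing BibTeX fields and reformat a .bib file."
    )
    parser.add_argument(
        "--format-only",
        action="store_true",  # False unless the flag is passed
        help="reformat the file without performing any Crossref lookups",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--format-only"])
if args.format_only:
    print("Skipping Crossref lookups; reformatting only.")
```

`argparse` exposes the flag as `args.format_only`, converting the hyphen in the option name to an underscore.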
| 51 | + |
| 52 | +# Example |
| 53 | + |
| 54 | +## Before |
| 55 | + |
| 56 | +```bibtex |
| 57 | +@article{Cuadra2020, |
| 58 | +title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities}, |
| 59 | +author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos}, |
| 60 | +pages= {A30 1--39} |
| 61 | +} |
| 62 | +``` |
| 63 | + |
| 64 | +## After |
| 65 | + |
| 66 | +```bibtex |
| 67 | +@article{Cuadra2020, |
| 68 | + title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities}, |
| 69 | + author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos}, |
| 70 | + pages = {A30 1--39}, |
| 71 | + year = {2020}, |
| 72 | + journal = {Journal of Fluid Mechanics}, |
| 73 | + volume = {903}, |
| 74 | + doi = {10.1017/jfm.2020.651}, |
| 75 | +} |
| 76 | +``` |
| 77 | + |
| 78 | +# License |
| 79 | + |
| 80 | +This project is licensed under the [MIT License](LICENSE). |