Commit 7ac94e9

Merge pull request #9 from AlbertoCuadra/develop
Update: improve performance, capabilities, and scalability
2 parents 30c4c68 + 1a02db2 commit 7ac94e9

3 files changed (+660 / -210 lines)


README.md

Lines changed: 80 additions & 77 deletions
@@ -1,77 +1,80 @@
Removed (previous version, lines 1-77):

# DOI Scraper
[![DOI](https://zenodo.org/badge/640054736.svg)](https://zenodo.org/badge/latestdoi/640054736)

The DOI Scraper is a Python script that reads a `.bib` file, searches for articles without a DOI (Digital Object Identifier), and retrieves the missing DOIs using the [Crossref API](https://www.crossref.org/documentation/retrieve-metadata/rest-api/). It then updates the `.bib` file with the retrieved DOIs.

## Prerequisites

* Python
* `requests` library

## Installation

1. Clone the repository or download the `doi_scraper.py` file.

2. Install the required dependencies by running the following command:

```shell
pip install requests
```

# Usage

Place your input `.bib` file in the same directory as the `doi_scraper.py` script.

Open the `doi_scraper.py` file and modify the following variables according to your needs:

```python
input_file = 'input.bib' # Name of the input .bib file
output_file = 'output.bib' # Name of the output .bib file
INDENT_PRE = 4 # Number of spaces before the field name
INDENT_POST = 16 # Number of spaces after the field name
```

Run the script using the following command:

```shell
python doi_scraper.py
```

The script will search for articles without a DOI and retrieve the missing DOIs using the Crossref API. It will then update the output .bib file with the retrieved DOIs.

Once the script completes, you will find the updated .bib file with the retrieved DOIs in the same directory.

# Example

## Before

```bibtex
@article{Cuadra2020,
title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
year = 2020,
journal = {Journal of Fluid Mechanics},
publisher = {Cambridge University Press},
volume = 903,
pages= {A30 1--39}
}
```

## After

```bibtex
@article{Cuadra2020,
title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
year = 2020,
journal = {Journal of Fluid Mechanics},
publisher = {Cambridge University Press},
volume = 903,
pages = {A30 1--39},
doi = {10.1017/jfm.2020.651}
}
```

# License

This project is licensed under the [MIT License](LICENSE).

Added (new version, lines 1-80):

# DOI Scraper

The DOI Scraper is a Python script that reads a `.bib` file and finds entries that are missing required fields (such as a DOI). It retrieves the missing information from the [Crossref API](https://www.crossref.org/documentation/retrieve-metadata/rest-api/) and rewrites the file with consistent indentation. It supports several entry types (e.g., articles, books, inproceedings, and tech reports), and each type defines its own set of required fields.
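The sketch below illustrates per-type required fields. It keeps a table that maps each entry type to its required field names and reports which of those an entry lacks. `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, not the script's actual API. The field lists are also assumptions: the `article` list was chosen to match the fields added in the Example section.

```python
from __future__ import annotations

# Hypothetical table: entry type -> fields this sketch treats as required.
# The real lists in doi_scraper.py may differ.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "article": ("title", "author", "journal", "year", "volume", "doi"),
    "book": ("title", "author", "publisher", "year"),
    "inproceedings": ("title", "author", "booktitle", "year"),
    "techreport": ("title", "author", "institution", "year"),
}


def missing_fields(entry_type: str, fields: dict[str, str]) -> list[str]:
    """Return the required fields absent from `fields`, in table order.

    BibTeX entry types and field names are case-insensitive, so both are
    lower-cased before comparing. Unknown entry types have no requirements.
    """
    required = REQUIRED_FIELDS.get(entry_type.lower(), ())
    present = {name.lower() for name in fields}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    before = {"title": "...", "author": "...", "pages": "A30 1--39"}
    print(missing_fields("article", before))  # → ['journal', 'year', 'volume', 'doi']
```

Keeping the requirements in data rather than in code means a new entry type only needs a new table row.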

## Prerequisites

- Python 3.x
- `requests` library
- `tqdm` library

## Installation

1. Clone the repository or download the `doi_scraper.py` file.

2. Install the required dependencies listed in `requirements.txt` by running the following command (if you downloaded only `doi_scraper.py`, run `pip install requests tqdm` instead):

```shell
pip install -r requirements.txt
```

# Usage

Place your input `.bib` file in the same directory as the `doi_scraper.py` script.

Open the `doi_scraper.py` file and modify the following variables according to your needs:

```python
input_file = 'input.bib'    # Name of the input .bib file
output_file = 'output.bib'  # Name of the output .bib file
INDENT_PRE = 4              # Number of spaces before the field name
INDENT_POST = 16            # Number of spaces after the field name
```
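The sketch below shows one way the two indentation settings might be applied. It assumes `INDENT_POST` is the width the field name is padded to, so that the `=` signs line up in one column; the script may interpret it differently. `format_field` and `format_entry` are hypothetical helpers. Every field, including the last, ends with a comma, as in the After example.

```python
INDENT_PRE = 4    # spaces before the field name (from the README)
INDENT_POST = 16  # assumed: width the field name is padded to, so '=' signs align


def format_field(name: str, value: str) -> str:
    """Render one field as `<indent><padded name>= {value},`.

    Names of INDENT_POST characters or more get no padding, so their '='
    follows the name immediately.
    """
    return f"{' ' * INDENT_PRE}{name.ljust(INDENT_POST)}= {{{value}}},"


def format_entry(entry_type: str, key: str, fields: dict[str, str]) -> str:
    """Render a whole entry; every field ends with a comma, as in the After example."""
    lines = [f"@{entry_type}{{{key},"]
    lines += [format_field(name, value) for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)
```

With the values above, each `=` lands at index `INDENT_PRE + INDENT_POST` (20) of its line.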

Run the script using the following command:

```shell
python doi_scraper.py
```

The script searches the input file for entries that are missing required fields (such as a DOI). It then retrieves the missing information from the Crossref API and writes the completed, reformatted entries to the output `.bib` file.

When the script finishes, the updated file (named by `output_file`) is in the same directory.
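Below is a minimal sketch of a single Crossref lookup by title. It assumes the public `/works` endpoint with the `query.bibliographic` and `rows` parameters, and reads the `DOI` of the top-ranked item.

- It uses the standard library's `urllib` so it runs without extra packages; the script itself lists `requests` as a dependency.
- The helper names are hypothetical.
- The top hit is not guaranteed to be the right work, so the sketch only accepts it when its normalized title equals the one in the `.bib` entry.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(title: str, mailto: str | None = None) -> str:
    """Ask Crossref for the single best bibliographic match for `title`."""
    params = {"query.bibliographic": title, "rows": "1"}
    if mailto:
        # Crossref's etiquette guidelines ask clients to identify themselves,
        # e.g. with a `mailto` parameter.
        params["mailto"] = mailto
    return f"{CROSSREF_WORKS}?{urllib.parse.urlencode(params)}"


def _normalize(text: str) -> str:
    """Lower-case and keep only letters and digits, ignoring braces and punctuation."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def doi_from_response(payload: dict, title: str) -> str | None:
    """Return the top item's DOI if its title matches `title`, else None."""
    items = payload.get("message", {}).get("items", [])
    if not items:
        return None
    top = items[0]
    top_titles = top.get("title") or [""]
    if _normalize(top_titles[0]) != _normalize(title):
        return None  # the best hit is a different work; do not guess
    return top.get("DOI")


def lookup_doi(title: str, mailto: str | None = None, timeout: float = 10.0) -> str | None:
    """Send one HTTP request to Crossref and extract a DOI (needs network access)."""
    request = urllib.request.Request(
        build_query_url(title, mailto),
        headers={"User-Agent": "doi-scraper-sketch"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return doi_from_response(payload, title)
```

For example, `lookup_doi("Effect of equivalence ratio fluctuations on planar detonation discontinuities", mailto="you@example.org")` sends one request and returns the DOI string or `None`.

The exact-title check is deliberately conservative. Titles containing LaTeX accent commands such as `{\'e}` will not match Crossref's Unicode titles, so a fuzzy comparison (for instance, `difflib.SequenceMatcher`) may be a better trade-off.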

## Optional Arguments

- `--format-only`: reformat the file without performing any Crossref lookups:

```shell
python doi_scraper.py --format-only
```
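The sketch below wires up such a flag with the standard `argparse` module. `parse_args` is a hypothetical helper; the way `doi_scraper.py` actually reads its arguments may differ. With `action="store_true"`, `args.format_only` defaults to `False` and becomes `True` when the flag is passed. `argparse` turns the dash in the option name into an underscore in the attribute name.

```python
from __future__ import annotations

import argparse


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse command-line options; `argv=None` means "use sys.argv[1:]"."""
    parser = argparse.ArgumentParser(
        description="Complete missing BibTeX fields via Crossref and reformat a .bib file."
    )
    parser.add_argument(
        "--format-only",
        action="store_true",
        help="reformat the file without performing any Crossref lookups",
    )
    return parser.parse_args(argv)
```

A caller would then check `if args.format_only:` and skip the lookup step, reformatting the entries as they are.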

# Example

## Before

```bibtex
@article{Cuadra2020,
title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
pages= {A30 1--39}
}
```

## After

```bibtex
@article{Cuadra2020,
title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
pages = {A30 1--39},
year = {2020},
journal = {Journal of Fluid Mechanics},
volume = {903},
doi = {10.1017/jfm.2020.651},
}
```
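Before the script can work on an entry like the Before example, it has to split the entry into fields. The sketch below is a minimal, hypothetical parser for a single entry, not the script's actual parser. It:

- tracks brace depth, so nested groups like `C{\'e}sar` stay intact;
- tolerates irregular spacing such as `pages= {...}`;
- accepts bare values like `2020`.

It does not handle quoted (`"..."`) values, `@string` macros, or `#` concatenation.

```python
from __future__ import annotations

import re

_HEADER = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")    # @type{key,
_FIELD_NAME = re.compile(r"\s*([A-Za-z][\w-]*)\s*=\s*")  # name = (any spacing)
_BARE_VALUE = re.compile(r"[^,}\s]+")                    # e.g. 2020 or 903
_SEPARATOR = re.compile(r"\s*,?")                        # optional comma between fields


def _read_braced(text: str, start: int) -> tuple[str, int]:
    """Read the {...} group opening at text[start]; return (contents, index after '}')."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1 : i], i + 1
    raise ValueError("unbalanced braces in field value")


def parse_entry(text: str) -> tuple[str, str, dict[str, str]]:
    """Split one BibTeX entry into (entry type, citation key, {field: value})."""
    header = _HEADER.search(text)
    if header is None:
        raise ValueError("no '@type{key,' header found")
    entry_type, key = header.group(1).lower(), header.group(2)
    fields: dict[str, str] = {}
    pos = header.end()
    # Stop when no "name =" follows, e.g. at the entry's closing brace.
    while (match := _FIELD_NAME.match(text, pos)) is not None:
        name, pos = match.group(1).lower(), match.end()
        if pos < len(text) and text[pos] == "{":
            value, pos = _read_braced(text, pos)
        else:
            bare = _BARE_VALUE.match(text, pos)
            if bare is None:
                raise ValueError(f"unsupported value for field {name!r}")
            value, pos = bare.group(0), bare.end()
        fields[name] = value.strip()
        pos = _SEPARATOR.match(text, pos).end()
    return entry_type, key, fields
```

The resulting field dictionary is what a missing-field check and a formatter would then consume.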

# License

This project is licensed under the [MIT License](LICENSE).
