
Commit 5001951

Updated Readme
1 parent fe0122e commit 5001951

2 files changed: 26 additions, 22 deletions

README.md

Lines changed: 22 additions & 17 deletions
@@ -1,3 +1,4 @@
+
 ## Introduction:
 #### Here is the official repository of our team's software project at the **Ruprecht Karl University of Heidelberg**.
 
@@ -30,7 +31,7 @@ Such examples are commonly represented using _:mod_-relation in AMR. The idea is
 
 ## Tools:
 ### Root:
-In the root directory you can find different scripts which follow the pipeline, which is presented below. For more information on how to combine all the scripts to do the magic, please consult our Jupyter Notebook [`walkthrough.ipynb`](https://gitlab.com/denlogv/measuring-variation-in-amr/-/blob/master/walkthrough.ipynb). <br>
+In the root directory you can find different scripts which follow the pipeline, which is presented below. For more information on how to combine all the scripts to do the magic, please consult our _Jupyter Notebook_ [`walkthrough.ipynb`](https://gitlab.com/denlogv/measuring-variation-in-amr/-/blob/master/walkthrough.ipynb). <br>
 In order to run this pipeline you'll need to ensure that the following criteria are met (it is unfortunate that one has to employ multiple venvs to ensure multiple penman versions; alternatively, one could try to install one of the versions directly in the project folder and rename it):
 
 | Script|Prerequisites|
@@ -41,30 +42,33 @@ In order to run this pipeline you'll need to ensure that the following criteria are
 ### Pipeline:
 1. Convert a corpus (a _.txt_-file with a SICK dataset or a folder with an STS dataset) to a _.tsv_ (tab-separated values)-file. <br> <br> **Functionalities:** <br> <br>
 - `sts2tsv.py` converts a folder with STS-dataset to a single easily readable _.tsv_-file. <br> <br>
-- `sick2tsv.py` filters a file (.txt file which has a tab-separated-values-layout with 12 columns) with a SICK-dataset to create a .tsv with columns "sent1", "sent2", "sick" (i.e. relatedness-score) <br> <br>
-In our experiments we filtered the dataset to exclude examples, where sentence pairs have entailment label 'CONTRADICTION'
+- `sick2tsv.py` filters a file (.txt file which has a tab-separated-values-layout with 12 columns) with a SICK-dataset to create a .tsv with columns "sent1", "sent2", "sick" (i.e. relatedness-score). <br> <br>
+In our experiments we filtered the dataset to exclude examples where sentence pairs have the entailment label 'CONTRADICTION'.
 ```
 Usage examples:
 
 python3 sick2tsv.py -i datasets/sick/SICK2014_full.txt -o data/SICK2014.tsv --entailment_exclude contradiction
 python3 sts2tsv.py -i datasets/sts/sts2016-english-with-gs-v1.0 -o data/STS2016_full.tsv
 ```
+---
 2. Use the created _.tsv_-file to generate 2 _AMR_-files (for each sentence column). <br> <br>
 **Functionalities:** <br> <br>
-- `tsv2amr.py` converts a _.tsv_-file to 2 _AMR_-files
+- `tsv2amr.py` converts a _.tsv_-file to 2 _AMR_-files.
 ```
 Usage example:
 
 python3 tsv2amr.py -i data/SICK2014.tsv -o data/amr/SICK2014_corpus
 ```
+---
 3. Create alignment files for each _AMR_-file using the _AMR2Text_-alignment tool (TAMR/JAMR) presented in [**HIT-SCIR CoNLL2019 Unified Transition-Parser**](https://github.com/DreamerDeo/HIT-SCIR-CoNLL2019).<br> <br>
 **Functionalities:** <br> <br>
-- `amr_pipeline.py` converts a _.tsv_-file to 2 _AMR_-files
+- `amr_pipeline.py` <br> 1) converts .amr-files to MRP-format. <br> 2) runs the _AMR2Text_-alignment tool on the MRP-corpora.
 ```
 Usage example:
 
-python3 AMR2text/amr_pipeline.py -o data/amr/STS2016_corpus
+amr_pipeline.py -t AMR2Text -o data/amr/STS2016_corpus
 ```
+---
 4. Analyse the alignment files and either transform the _AMR_-graphs according to **Method 1** or add metadata to them according to **Method 2**. <br> <br>
 **Functionalities:** <br> <br>
 - `AMRAnalysis.py` takes 1 or 2 _AMR2Text_-alignment-files and either transforms the graphs, or adds metadata to these files. Outputs 1 or 2 _AMR_-files.
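Step 1 describes `sick2tsv.py` as turning the 12-column SICK file into a _.tsv_ with the columns `sent1`, `sent2`, `sick`, optionally dropping 'CONTRADICTION' pairs. The following is a minimal sketch of that filtering with Python's standard `csv` module, not the project's actual script. It assumes the SICK header names `sentence_A`, `sentence_B`, `entailment_label` and `relatedness_score`; the helper name `filter_sick` is hypothetical.

```python
import csv
import io

# Assumed SICK column names (as in the SICK_full.txt header); the project's
# sick2tsv.py may select or name them differently.
SENT_A, SENT_B = "sentence_A", "sentence_B"
LABEL, SCORE = "entailment_label", "relatedness_score"


def filter_sick(src, dst, exclude=("CONTRADICTION",)):
    """Write SICK pairs from `src` to `dst` as a TSV with the columns
    sent1, sent2, sick, skipping pairs whose entailment label is in `exclude`.

    `src` and `dst` are text streams; returns the number of pairs written.
    Labels are compared case-insensitively, so "contradiction" also works.
    """
    excluded = {label.upper() for label in exclude}
    # QUOTE_NONE keeps quote characters inside sentences as literal text.
    reader = csv.DictReader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(["sent1", "sent2", "sick"])
    written = 0
    for row in reader:
        if row[LABEL].upper() in excluded:
            continue
        writer.writerow([row[SENT_A], row[SENT_B], row[SCORE]])
        written += 1
    return written


# Tiny illustrative input with only 5 of the 12 SICK columns; the function
# only looks up the four columns it needs.
sample = (
    "pair_ID\tsentence_A\tsentence_B\tentailment_label\trelatedness_score\n"
    "1\tA dog runs.\tA dog is running.\tENTAILMENT\t4.9\n"
    "2\tA man sings.\tNobody sings.\tCONTRADICTION\t3.2\n"
)
out = io.StringIO()
print(filter_sick(io.StringIO(sample), out))  # 1
print(out.getvalue(), end="")
```

Reading by header name rather than column index keeps the sketch robust to the exact column order, at the cost of depending on the header row being present.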
@@ -73,24 +77,25 @@ In order to run this pipeline you'll need to ensure that the following criteria are
 
 python3 AMRAnalysis.py -i data/amr/SICK2014_corpus_a_aligned.mrp data/amr/SICK2014_corpus_b_aligned.mrp --output_prefix analysis/sick/SICK2014 --extended_meta
 ```
+---
 5. Run $`S^2Match`$ on the resulting _AMR_-files.
 6. Evaluate by computing _Spearman rank_ and _Pearson correlation coefficients_ + Visualise the results. <br> <br>
 **Functionalities:** <br> <br>
 - for steps 5 and 6 please consult our Jupyter Notebook [`walkthrough.ipynb`](https://gitlab.com/denlogv/measuring-variation-in-amr/-/blob/master/walkthrough.ipynb). Standalone scripts will be added soon.
-
+---
 ### Folders:
 We have been working with a lot of data, so we feel that a good overview would facilitate working with this repository. <br>
 The file structure is as follows:
-- `amr_suite` -- folder with $`S^2Match`$, authored by Juri Opitz, Letitia Parcalabescu and Anette Frank, visit [their repo](https://github.com/Heidelberg-NLP/amr-metric-suite/) for more details. It contains our extensions to the existing codebase of $`S^2Match`$. You can find those extensions under `amr_suite/py3-Smatch-and-S2match/smatch/`. The relevant files are:
+- `amr_suite` folder with $`S^2Match`$, authored by Juri Opitz, Letitia Parcalabescu and Anette Frank, visit [**their repo**](https://github.com/Heidelberg-NLP/amr-metric-suite/) for more details. It contains our extensions to the existing codebase of $`S^2Match`$. You can find those extensions under `amr_suite/py3-Smatch-and-S2match/smatch/`. The relevant files are:
 - `s2matchdev_glove.py`,
 - `s2matchdev_sbert.py` <br><br>
-- `AMR2Text` -- our code is heavily dependent on the _AMR2Text_-alignment tool from the repo [**HIT-SCIR CoNLL2019 Unified Transition-Parser**](https://github.com/DreamerDeo/HIT-SCIR-CoNLL2019), visit their repo for installation details. <br><br>
-- `datasets` -- this folder contains all of the datasets in their original form, which we used for our experiments, namely: **SICK** and **STS**. We also have **MSRP** in there, but we haven't conducted any experiments on it.<br><br>
-- `data` -- this folder contains the datasets in the form, in which they are used later by our algorithms: _.tsv_-files, _AMR_-files:
-- **_.amr_** -- _original AMR-graphs-output format used by [amrlib](https://amrlib.readthedocs.io/)_,
-- **_.mrp_, __aligned.mrp_** -- _formats used by the AMR2Text-alignment tool_<br><br>
-- `analysis` -- this folder contains the output of `AMRAnalysis.py` (for 3 datasets, namely **SICK**, **STS** and a small corpus (~30 sentences) compiled mainly from STS-sentences and some sentences, which we added by hand to facilitate the testing)<br>
+- `AMR2Text` our code is heavily dependent on the _AMR2Text_-alignment tool from the repo [**HIT-SCIR CoNLL2019 Unified Transition-Parser**](https://github.com/DreamerDeo/HIT-SCIR-CoNLL2019), visit their repo for installation details. <br><br>
+- `datasets` this folder contains all of the datasets in their original form, which we used for our experiments, namely: **SICK** and **STS**. We also have **MSRP** in there, but we haven't conducted any experiments on it.<br><br>
+- `data` this folder contains the datasets in the form, in which they are used later by our algorithms: _.tsv_-files, _AMR_-files:
+- **_.amr_** _original AMR-graphs-output format used by [**amrlib**](https://amrlib.readthedocs.io/)_,
+- **_.mrp_, __aligned.mrp_** _formats used by the AMR2Text-alignment tool_<br><br>
+- `analysis` this folder contains the output of `AMRAnalysis.py` (for 3 datasets, namely **SICK**, **STS** and a small corpus (~30 sentences) compiled mainly from STS-sentences and some sentences, which we added by hand to facilitate the testing)<br>
 All these data are used later by $`S^2Match`$, the outputs and the evaluation results of which are also there (**for our GloVe- and [SBERT](https://www.sbert.net)-extensions of $`S^2Match`$**)<br><br>
-- `papers` -- this folder contains some papers, which were relevant to our work.<br><br>
-- `presentation` -- for those interested this folder contains the presentations held at **Ruprecht Karl University of Heidelberg**, where we presented our approach. Not relevant for the functionality of our tools.<br><br>
-- `experiments` -- during the development and testing phase we experimented a lot with different things and created *Jupyter Notebooks* for some scenarios. Not relevant for the functionality of our tools.
+- `papers` this folder contains some papers, which were relevant to our work.<br><br>
+- `presentation` for those interested this folder contains the presentations held at **Ruprecht Karl University of Heidelberg**, where we presented our approach. Not relevant for the functionality of our tools.<br><br>
+- `experiments` during the development and testing phase we experimented a lot with different things and created *Jupyter Notebooks* for some scenarios. Not relevant for the functionality of our tools.
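Step 6 evaluates by computing Spearman rank and Pearson correlation coefficients, which the README delegates to `walkthrough.ipynb`. As a reminder of what the two numbers measure, here is a dependency-free sketch (the notebook may well use `scipy.stats.pearsonr` / `scipy.stats.spearmanr` instead): Pearson correlates the raw scores, while Spearman is the Pearson correlation of their ranks, with tied values sharing the mean of their rank positions. The function names and the sample scores below are illustrative only, not project results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based sorted positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


gold = [4.9, 3.2, 1.0, 2.5]      # illustrative gold relatedness scores
pred = [0.91, 0.70, 0.40, 0.55]  # illustrative metric scores
print(round(spearman(gold, pred), 6))  # 1.0 (same ordering)
print(round(pearson(gold, pred), 3))
```

Spearman only rewards getting the ordering of pairs right, whereas Pearson also rewards a linear relation between the metric's scores and the gold scores, which is why reporting both is informative.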

amr_pipeline.py

Lines changed: 4 additions & 5 deletions
@@ -1,13 +1,12 @@
 """
 This script does the following:
-
-1) It converts a .tsv corpus with 2 columns corresponding to sentences A and B
-to 2 AMR corpora of these sentences
-2) It runs the AMR2Text-alignment tool on the AMR corpora.
+
+1) It converts .amr-files to MRP-format
+2) It runs the AMR2Text-alignment tool on the MRP-corpora.
 
 Usage example:
 
-python3 AMR2text/amr_pipeline.py -o data/amr/STS2016_corpus
+amr_pipeline.py -t AMR2Text -o data/amr/STS2016_corpus
 """
 
 """
