## Introduction:
#### This is the official repository of our team's software project at the **Ruprecht Karl University of Heidelberg**.
## Tools:
### Root:
In the root directory you can find different scripts that follow the pipeline presented below. For more information on how to combine all the scripts, please consult our _Jupyter Notebook_ [`walkthrough.ipynb`](https://gitlab.com/denlogv/measuring-variation-in-amr/-/blob/master/walkthrough.ipynb). <br>
In order to run this pipeline, you'll need to ensure that the following criteria are met (it is unfortunate that one has to employ multiple venvs to satisfy multiple `penman` version requirements; alternatively, one could try to install one of the versions directly in the project folder and rename it):
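The multi-venv workaround could look roughly like this (venv names and version pins are placeholders, not taken from this repo; pick the `penman` versions your scripts actually require):

```shell
# Placeholder sketch: one venv per penman version.
python3 -m venv venv_penman_old
python3 -m venv venv_penman_new
./venv_penman_old/bin/pip install "penman==0.6.2"
./venv_penman_new/bin/pip install "penman>=1.1"

# Then run each script with the interpreter whose penman version it expects:
# ./venv_penman_old/bin/python some_script.py
```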
| Script|Prerequisites|
|---|---|
### Pipeline:
1. Convert a corpus (a _.txt_-file with a SICK dataset or a folder with an STS dataset) to a _.tsv_ (tab-separated values) file. <br> <br> **Functionalities:** <br> <br>
- `sts2tsv.py` converts a folder with an STS dataset to a single easily readable _.tsv_-file. <br> <br>
- `sick2tsv.py` filters a SICK dataset file (a _.txt_-file with a tab-separated-values layout and 12 columns) to create a _.tsv_ with the columns "sent1", "sent2" and "sick" (i.e. the relatedness score). <br> <br>
In our experiments we filtered the dataset to exclude examples where the sentence pairs have the entailment label 'CONTRADICTION'.
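The filtering step can be sketched with the standard library alone. The input column names below follow the public SICK release and the logic mirrors the description above; this is an illustration, not the actual `sick2tsv.py` code:

```python
import csv
import io

# Tiny inline stand-in for a SICK .txt file (real files have 12 columns).
sick_txt = (
    "pair_ID\tsentence_A\tsentence_B\tentailment_label\trelatedness_score\n"
    "1\tA man is cooking\tA person is cooking\tENTAILMENT\t4.5\n"
    "2\tA dog runs\tNo dog is running\tCONTRADICTION\t3.2\n"
)

reader = csv.DictReader(io.StringIO(sick_txt), delimiter="\t")
rows = [
    {"sent1": r["sentence_A"], "sent2": r["sentence_B"], "sick": r["relatedness_score"]}
    for r in reader
    if r["entailment_label"] != "CONTRADICTION"  # drop contradictory pairs
]

# Write the reduced three-column .tsv.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["sent1", "sent2", "sick"], delimiter="\t")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```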
3. Create alignment files for each _AMR_-file using the _AMR2Text_-alignment tool (TAMR/JAMR) presented in [**HIT-SCIR CoNLL2019 Unified Transition-Parser**](https://github.com/DreamerDeo/HIT-SCIR-CoNLL2019).<br> <br>
**Functionalities:** <br> <br>
- `amr_pipeline.py` <br> 1) converts _.amr_-files to MRP format. <br> 2) runs the _AMR2Text_-alignment tool on the MRP corpora.
4. Analyse the alignment files and either transform the _AMR_-graphs according to **Method 1** or add metadata to them according to **Method 2**. <br> <br>
**Functionalities:** <br> <br>
- `AMRAnalysis.py` takes 1 or 2 _AMR2Text_-alignment files and either transforms the graphs or adds metadata to these files. Outputs 1 or 2 _AMR_-files.
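The "add metadata" variant (Method 2) builds on the AMR convention that metadata lives in `# ::key value` comment lines above each serialized graph. A minimal stdlib-only illustration of that idea (the key name `alignments` and its value format are hypothetical, not the actual `AMRAnalysis.py` output):

```python
def add_metadata(amr_graph: str, key: str, value: str) -> str:
    """Prepend a '# ::key value' metadata line to a serialized AMR graph."""
    return f"# ::{key} {value}\n{amr_graph}"

amr = "(w / want-01\n   :ARG0 (b / boy)\n   :ARG1 (g / go-02 :ARG0 b))"
# Hypothetical token-to-node alignment string:
print(add_metadata(amr, "alignments", "0-1|w 2-3|b"))
```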
6. Evaluate by computing _Spearman rank_ and _Pearson correlation coefficients_, and visualise the results. <br> <br>
**Functionalities:** <br> <br>
- for steps 5 and 6 please consult our Jupyter Notebook [`walkthrough.ipynb`](https://gitlab.com/denlogv/measuring-variation-in-amr/-/blob/master/walkthrough.ipynb). Standalone scripts will be added soon.
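Until the standalone scripts land, the step-6 correlations are easy to reproduce: Pearson on the raw scores, Spearman as Pearson on the ranks. A NumPy-only sketch with made-up gold/model numbers (SciPy's `pearsonr`/`spearmanr` would work just as well; the rank trick below ignores ties):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman rank correlation: Pearson on the ranks (no tie handling)."""
    rank = lambda a: np.argsort(np.argsort(np.asarray(a)))
    return pearson(rank(x), rank(y))

gold = [4.5, 3.6, 2.9, 1.2]       # e.g. SICK relatedness scores (made up)
model = [0.91, 0.85, 0.40, 0.05]  # e.g. S2Match similarity scores (made up)
print(f"Pearson:  {pearson(gold, model):.3f}")
print(f"Spearman: {spearman(gold, model):.3f}")
```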
---
### Folders:
We have been working with a lot of data, so we feel that a good overview would facilitate working with this repository. <br>
The file structure is as follows:
- `amr_suite` – folder with $`S^2Match`$, authored by Juri Opitz, Letitia Parcalabescu and Anette Frank; visit [**their repo**](https://github.com/Heidelberg-NLP/amr-metric-suite/) for more details. It contains our extensions to the existing $`S^2Match`$ codebase. You can find those extensions under `amr_suite/py3-Smatch-and-S2match/smatch/`. The relevant files are:
- `s2matchdev_glove.py`,
- `s2matchdev_sbert.py` <br><br>
- `AMR2Text` – our code is heavily dependent on the _AMR2Text_-alignment tool from the repo [**HIT-SCIR CoNLL2019 Unified Transition-Parser**](https://github.com/DreamerDeo/HIT-SCIR-CoNLL2019); visit their repo for installation details. <br><br>
- `datasets` – this folder contains all of the datasets we used for our experiments in their original form, namely **SICK** and **STS**. We also have **MSRP** in there, but we haven't conducted any experiments on it.<br><br>
- `data` – this folder contains the datasets in the form in which they are later used by our algorithms, i.e. _.tsv_-files and _AMR_-files:
- **_.amr_** – _the original AMR-graph output format used by [**amrlib**](https://amrlib.readthedocs.io/)_,
- **_.mrp_**, **_\_aligned.mrp_** – _formats used by the AMR2Text-alignment tool_<br><br>
- `analysis` – this folder contains the output of `AMRAnalysis.py` for 3 datasets, namely **SICK**, **STS** and a small corpus (~30 sentences) compiled mainly from STS sentences plus some sentences we added by hand to facilitate testing.<br>
All these data are later used by $`S^2Match`$, whose outputs and evaluation results are also there (**for our GloVe- and [SBERT](https://www.sbert.net)-extensions of $`S^2Match`$**).<br><br>
- `papers` – this folder contains some papers that were relevant to our work.<br><br>
- `presentation` – for those interested, this folder contains the presentations held at **Ruprecht Karl University of Heidelberg**, where we presented our approach. Not relevant for the functionality of our tools.<br><br>
- `experiments` – during the development and testing phase we experimented a lot with different things and created *Jupyter Notebooks* for some scenarios. Not relevant for the functionality of our tools.
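The GloVe/SBERT extensions mentioned above rest on one idea: instead of requiring string-equal concept labels, $`S^2Match`$ scores concept pairs by the cosine similarity of their embeddings. A toy illustration of that soft match (the 3-d vectors stand in for real GloVe/SBERT embeddings and are made up):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings for three AMR concept labels:
emb = {
    "kid":   np.array([0.8, 0.1, 0.3]),
    "child": np.array([0.7, 0.2, 0.3]),
    "house": np.array([0.1, 0.9, 0.0]),
}

print(cosine(emb["kid"], emb["child"]))  # high: counts as a (soft) match
print(cosine(emb["kid"], emb["house"]))  # low: no match
```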