Commit e6e0b37

Started working with the SICK-dataset
1 parent 78446bf commit e6e0b37

8 files changed

+18504
-7866
lines changed

data/SICK2014.tsv

Lines changed: 8417 additions & 0 deletions
Large diffs are not rendered by default.

datasets/sick/SICK2014_full.txt

Lines changed: 9841 additions & 0 deletions
Large diffs are not rendered by default.

datasets/sick/readme.txt

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+::::::::::::::::::::::: University of Trento - Italy ::::::::::::::::::::::::::::::::
+
+:::::::::::: SICK (Sentences Involving Compositional Knowledge) data set ::::::::::::
+
+:::::::::::::::::::::http://clic.cimec.unitn.it/composes/sick/ ::::::::::::::::::::::
+
+
+The SICK data set consists of 10,000 English sentence pairs, built starting from two existing
+paraphrase sets: the 8K ImageFlickr data set (http://nlp.cs.illinois.edu/HockenmaierGroup/data.html)
+and the SEMEVAL-2012 Semantic Textual Similarity Video Descriptions data set
+(http://www.cs.york.ac.uk/semeval-2012/task6/index.php?id=data). Each sentence pair is annotated
+for relatedness in meaning and for the entailment relation between the two elements.
+
+
+
+The SICK data set is released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0
+Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US)
+
+When using SICK in published research, please cite:
+M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi and R. Zamparelli. 2014. A SICK cure
+for the evaluation of compositional distributional semantic models. Proceedings of LREC 2014,
+Reykjavik (Iceland): ELRA.
+
+
+
+The SICK data set is used in SemEval 2014 - Task 1: Evaluation of compositional distributional
+semantic models on full sentences through semantic relatedness and textual entailment
+
+
+
+File Structure: tab-separated text file
+
+Fields:
+
+- pair_ID: sentence pair ID
+
+- sentence_A: sentence A
+
+- sentence_B: sentence B
+
+- entailment_label: textual entailment gold label (NEUTRAL, ENTAILMENT, or CONTRADICTION)
+
+- relatedness_score: semantic relatedness gold score (on a 1-5 continuous scale)
+
+- entailment_AB: entailment for the A-B order (A_neutral_B, A_entails_B, or A_contradicts_B)
+
+- entailment_BA: entailment for the B-A order (B_neutral_A, B_entails_A, or B_contradicts_A)
+
+- sentence_A_original: original sentence from which sentence A is derived
+
+- sentence_B_original: original sentence from which sentence B is derived
+
+- sentence_A_dataset: dataset from which the original sentence A was extracted (FLICKR vs. SEMEVAL)
+
+- sentence_B_dataset: dataset from which the original sentence B was extracted (FLICKR vs. SEMEVAL)
+
+- SemEval_set: set including the sentence pair in SemEval 2014 Task 1 (TRIAL, TRAIN, or TEST)
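
A minimal sketch of reading a file with the field layout listed in the readme above. It is not code from this commit. It assumes the file starts with a header row carrying exactly those field names (the readme does not say so explicitly), and the helper name read_sick and the sample row are invented for illustration.

```python
import csv
import io

# Field names as listed in the readme.
FIELDS = [
    "pair_ID", "sentence_A", "sentence_B", "entailment_label",
    "relatedness_score", "entailment_AB", "entailment_BA",
    "sentence_A_original", "sentence_B_original",
    "sentence_A_dataset", "sentence_B_dataset", "SemEval_set",
]


def read_sick(handle):
    """Yield one dict per sentence pair from a tab-separated file.

    Assumption: the first line of the file is a header row with the
    field names above. QUOTE_NONE keeps quote characters inside
    sentences as literal text.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # The gold relatedness score is a number on a 1-5 continuous scale.
        row["relatedness_score"] = float(row["relatedness_score"])
        yield row


# Invented sample row, used here only so the sketch runs without the data file.
sample_row = [
    "1", "A dog is running", "A dog is sleeping", "NEUTRAL", "3.5",
    "A_neutral_B", "B_neutral_A",
    "A dog is running in a field", "A dog is sleeping on a couch",
    "FLICKR", "FLICKR", "TRAIN",
]
sample = "\t".join(FIELDS) + "\n" + "\t".join(sample_row) + "\n"

pairs = list(read_sick(io.StringIO(sample)))
train_pairs = [p for p in pairs if p["SemEval_set"] == "TRAIN"]
```

To read the file added in this commit instead of the in-memory sample, something like open("datasets/sick/SICK2014_full.txt", encoding="utf-8", newline="") could be passed to read_sick, provided the header assumption holds for that file.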

datasets/sts/SICK/SICK_trial.txt

Lines changed: 0 additions & 501 deletions
This file was deleted.

datasets/sts/SICK/SICK_trial_AMR_SMATCH.tsv

Lines changed: 0 additions & 501 deletions
This file was deleted.
