| 1 | +::::::::::::::::::::::: University of Trento - Italy :::::::::::::::::::::::::::::::: |
| 2 | + |
| 3 | +:::::::::::: SICK (Sentences Involving Compositional Knowledge) data set :::::::::::: |
| 4 | + |
| 5 | +:::::::::::::::::::::http://clic.cimec.unitn.it/composes/sick/ :::::::::::::::::::::: |
| 6 | + |
| 7 | + |
| 8 | +The SICK data set consists of 10,000 English sentence pairs, built starting from two existing |
| 9 | +paraphrase sets: the 8K ImageFlickr data set (http://nlp.cs.illinois.edu/HockenmaierGroup/data.html) |
| 10 | +and the SEMEVAL-2012 Semantic Textual Similarity Video Descriptions data set |
| 11 | +(http://www.cs.york.ac.uk/semeval-2012/task6/index.php?id=data). Each sentence pair is annotated |
| 12 | +for relatedness in meaning and for the entailment relation between the two elements. |
| 13 | + |
| 14 | + |
| 15 | + |
| 16 | +The SICK data set is released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 |
| 17 | +Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US). |
| 18 | + |
| 19 | +When using SICK in published research, please cite: |
| 20 | +M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi and R. Zamparelli. 2014. A SICK cure |
| 21 | +for the evaluation of compositional distributional semantic models. Proceedings of LREC 2014, |
| 22 | +Reykjavik (Iceland): ELRA. |
| 23 | + |
| 24 | + |
| 25 | + |
| 26 | +The SICK data set is used in SemEval 2014 - Task 1: Evaluation of compositional distributional |
| 27 | +semantic models on full sentences through semantic relatedness and textual entailment. |
| 28 | + |
| 29 | + |
| 30 | + |
| 31 | +File Structure: tab-separated text file |
| 32 | + |
| 33 | +Fields: |
| 34 | + |
| 35 | +- pair_ID: sentence pair ID |
| 36 | + |
| 37 | +- sentence_A: sentence A |
| 38 | + |
| 39 | +- sentence_B: sentence B |
| 40 | + |
| 41 | +- entailment_label: textual entailment gold label (NEUTRAL, ENTAILMENT, or CONTRADICTION) |
| 42 | + |
| 43 | +- relatedness_score: semantic relatedness gold score (on a 1-5 continuous scale) |
| 44 | + |
| 45 | +- entailment_AB: entailment for the A-B order (A_neutral_B, A_entails_B, or A_contradicts_B) |
| 46 | + |
| 47 | +- entailment_BA: entailment for the B-A order (B_neutral_A, B_entails_A, or B_contradicts_A) |
| 48 | + |
| 49 | +- sentence_A_original: original sentence from which sentence A is derived |
| 50 | + |
| 51 | +- sentence_B_original: original sentence from which sentence B is derived |
| 52 | + |
| 53 | +- sentence_A_dataset: dataset from which the original sentence A was extracted (FLICKR or SEMEVAL) |
| 54 | + |
| 55 | +- sentence_B_dataset: dataset from which the original sentence B was extracted (FLICKR or SEMEVAL) |
| 56 | + |
| 57 | +- SemEval_set: set including the sentence pair in SemEval 2014 Task 1 (TRIAL, TRAIN, or TEST) |
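
The field list above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it assumes the file starts with a header row carrying exactly the field names listed above, and the helper name read_sick is hypothetical. The two rows in SAMPLE are invented placeholders used only to make the sketch runnable; they are not taken from SICK.

```python
import csv
import io
from collections import Counter

# Field names as listed in this README (assumed to match the header row).
FIELDS = [
    "pair_ID", "sentence_A", "sentence_B", "entailment_label",
    "relatedness_score", "entailment_AB", "entailment_BA",
    "sentence_A_original", "sentence_B_original",
    "sentence_A_dataset", "sentence_B_dataset", "SemEval_set",
]


def read_sick(handle):
    """Yield one dict per sentence pair from a tab-separated, SICK-style file.

    Hypothetical helper: assumes the first line is a header row.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        # The relatedness gold score is continuous on a 1-5 scale.
        row["relatedness_score"] = float(row["relatedness_score"])
        yield row


# Invented placeholder rows (NOT real SICK data), in the documented layout.
SAMPLE = "\n".join("\t".join(cols) for cols in [
    FIELDS,
    ["1", "A dog is running", "A dog is moving", "ENTAILMENT", "4.5",
     "A_entails_B", "B_neutral_A", "A dog runs in a field",
     "A dog runs in a field", "FLICKR", "FLICKR", "TRAIN"],
    ["2", "A man is singing", "Nobody is singing", "CONTRADICTION", "3.5",
     "A_contradicts_B", "B_contradicts_A", "A man sings",
     "A man sings", "SEMEVAL", "SEMEVAL", "TEST"],
]) + "\n"

pairs = list(read_sick(io.StringIO(SAMPLE)))
label_counts = Counter(pair["entailment_label"] for pair in pairs)
train_pairs = [pair for pair in pairs if pair["SemEval_set"] == "TRAIN"]
```

To read the released file instead of the placeholder sample, pass an open file handle to the same helper, for example open("SICK.txt", encoding="utf-8", newline=""), where the file name and encoding are assumptions to adjust for the copy you downloaded.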