observations.sick
sick(path)
Load the Sentences Involving Compositional Knowledge (SICK) data set (Marelli et al., 2014). It consists of ~10,000 English sentence pairs, each annotated for relatedness and entailment. By relatedness score, there are 923 pairs in the [1,2) range, 1373 pairs in the [2,3) range, 3872 pairs in the [3,4) range, and 3672 pairs in the [4,5] range. By entailment judgment, there are 5595 neutral pairs, 1424 contradiction pairs, and 2821 entailment pairs.
Args:
path
: str. Path to the directory that either already stores the files or into which they will be downloaded and extracted. Filenames are SICK_train.txt, SICK_test_annotated.txt, and SICK_trial.txt.
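As a rough illustration of the "already stored, otherwise downloaded" behaviour, the sketch below checks which of the three filenames are absent from a directory. The helper `missing_sick_files` is hypothetical and not part of the library; how `sick` itself performs this check is not stated here.

```python
import os

# Filenames as listed in the description of `path`.
SICK_FILES = ("SICK_train.txt", "SICK_test_annotated.txt", "SICK_trial.txt")


def missing_sick_files(path):
    """Return the expected SICK filenames not present in `path`.

    Hypothetical helper for illustration only.
    """
    path = os.path.expanduser(path)
    return [name for name in SICK_FILES
            if not os.path.isfile(os.path.join(path, name))]
```

An empty result would suggest that all three files are already in place.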
Returns:
Tuple of dict x_train, x_test, x_valid. Each dict has keys ‘relatedness_score’, ‘pair_ID’, ‘sentence_A’, ‘sentence_B’, ‘entailment_judgment’. The kth value under each key belongs to the kth sentence pair and its annotations.
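Because each dict is column-oriented, it can help to regroup the columns into one record per sentence pair. The following is a minimal sketch: `iter_pairs` is a hypothetical helper, and `toy_split` is an invented stand-in with the same keys as a returned dict (its values and their types are assumptions, not actual SICK data).

```python
def iter_pairs(split):
    """Yield one dict per sentence pair from a column-oriented split."""
    keys = list(split)
    # zip over the columns so the kth items of every key stay together.
    for values in zip(*(split[key] for key in keys)):
        yield dict(zip(keys, values))


# Invented stand-in for a dict such as x_train; not real SICK records.
toy_split = {
    "pair_ID": [1, 2],
    "sentence_A": ["A dog is running", "A man is cooking"],
    "sentence_B": ["A dog runs", "Nobody is cooking"],
    "relatedness_score": [4.8, 3.2],
    "entailment_judgment": ["ENTAILMENT", "CONTRADICTION"],
}

pairs = list(iter_pairs(toy_split))
first = pairs[0]
```

With the real data, the same helper would presumably be applied to each element of the tuple returned by `sick(path)`.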
Marelli, M., Menini, S., Baroni, M., Bentivogli, L., Bernardi, R., & Zamparelli, R. (2014). A SICK cure for the evaluation of compositional distributional semantic models. In LREC (pp. 216–223).