observations.wikitext2
wikitext2(path, raw=False)
Load the Wikitext-2 data set (Merity, Xiong, Bradbury, & Socher, 2016). The dataset consists of Wikipedia articles fitting the Good or Featured article criteria and has a vocabulary of 33,278 words. There are 2,088,628 training, 217,646 validation, and 245,569 test tokens.
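The token and vocabulary figures above come from the paper. To sanity-check the strings this function returns, here is a minimal sketch. It assumes tokens are whitespace-separated, as in the tokenized WikiText-2 files. The helper name corpus_stats is hypothetical and not part of observations. Because it drops newlines rather than counting them as tokens, its counts may not match the published figures exactly.

```python
from typing import Tuple


def corpus_stats(text: str) -> Tuple[int, int]:
    """Return (number of tokens, number of distinct tokens) for a string.

    Hypothetical helper: splits on any whitespace, so newlines are dropped
    rather than counted as an end-of-line token.
    """
    tokens = text.split()
    return len(tokens), len(set(tokens))


# Toy string standing in for x_train; the real split requires a download.
sample = "the cat sat on the mat\nthe end"
print(corpus_stats(sample))  # (8, 6)
```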
Args:
path: str. Path to a directory that either already stores the file or, otherwise, where the file will be downloaded and extracted. Filename is wikitext-2/.
raw: bool, optional. Whether to load the raw data, which does not preprocess any tokens into <unk> and newlines into <eos>.
Returns:
Tuple of str x_train, x_valid, x_test.
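Since the function returns plain strings, a language model typically needs them mapped to integer ids. The following is a minimal sketch. It assumes whitespace tokenization and builds the vocabulary from x_train alone. The names build_vocab and encode are hypothetical helpers, not part of observations. The short stand-in strings replace the real call x_train, x_valid, x_test = observations.wikitext2(path), which needs the package installed and a download.

```python
from collections import Counter
from typing import Dict, List


def build_vocab(text: str) -> Dict[str, int]:
    """Assign each whitespace-separated token an id, most frequent first."""
    counts = Counter(text.split())
    # Ties are broken alphabetically so the mapping is deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(ordered)}


def encode(text: str, vocab: Dict[str, int], unk: str = "<unk>") -> List[int]:
    """Map tokens to ids; unseen tokens fall back to the unk id if one exists."""
    ids = []
    for tok in text.split():
        if tok in vocab:
            ids.append(vocab[tok])
        elif unk in vocab:
            ids.append(vocab[unk])
        else:
            raise KeyError(f"token {tok!r} is unseen and vocab has no {unk!r} entry")
    return ids


# Hypothetical stand-ins for strings returned by observations.wikitext2(path).
x_train = "the cat sat on the mat <unk>"
x_valid = "the dog sat"

vocab = build_vocab(x_train)
print(encode(x_valid, vocab))  # [0, 1, 5]
```

Building the vocabulary from the training split only keeps validation and test ids consistent with it. If the training text contains no <unk> entry, encode raises on unseen tokens instead of silently inventing an id.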
Merity, S., Xiong, C., Bradbury, J., & Socher, R. (2016). Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843.