observations.wikitext103

wikitext103(
    path,
    raw=False
)

Load the Wikitext-103 data set (Merity, Xiong, Bradbury, & Socher, 2016). The dataset consists of Wikipedia articles fitting the Good or Featured article criteria and has a vocabulary of 267,735 words. There are 103,227,021 training, 217,646 validation, and 245,569 test tokens.

Args:

  • path: str. Path to the directory that either already stores the data set or into which it will be downloaded and extracted. Filename is wikitext-103/.
  • raw: bool, optional. Whether to load the raw data, which does not preprocess any tokens into <unk> and newlines into <eos>.

Returns:

Tuple of str x_train, x_valid, x_test.
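
As an illustration of how the returned strings might be consumed, the following is a minimal sketch that turns them into integer ids. It assumes the non-raw strings are whitespace-delimited and contain the <unk> and <eos> markers; the helpers build_vocab and encode are hypothetical and not part of observations, and a toy string stands in for the real data so the snippet runs without downloading anything.

```python
from collections import Counter


def build_vocab(text, max_size=None):
    """Map each whitespace-delimited token to an integer id, most frequent first."""
    counts = Counter(text.split())
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if max_size is not None:
        ordered = ordered[:max_size]
    return {token: idx for idx, (token, _) in enumerate(ordered)}


def encode(text, vocab, unk="<unk>"):
    """Convert a string into a list of ids, falling back to the <unk> id."""
    unk_id = vocab[unk]
    return [vocab.get(token, unk_id) for token in text.split()]


# Toy stand-ins for the strings returned by wikitext103 (assumed format).
# With the real data one would instead call, for example:
#   from observations import wikitext103
#   x_train, x_valid, x_test = wikitext103("~/data")
x_train = "the cat sat <eos> the <unk> sat <eos>"
x_valid = "the dog sat <eos>"

vocab = build_vocab(x_train)
train_ids = encode(x_train, vocab)
valid_ids = encode(x_valid, vocab)  # "dog" is unseen, so it maps to <unk>
```

Building the vocabulary from x_train only, and mapping unseen validation or test tokens to <unk>, keeps the evaluation splits from influencing the token ids.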

Merity, S., Xiong, C., Bradbury, J., & Socher, R. (2016). Pointer sentinel mixture models. arXiv Preprint arXiv:1609.07843.