observations.enwik8

小牛编辑
2023-12-01
enwik8(path)

Load the enwik8 dataset from the Hutter Prize (Hutter, 2012). The dataset is preprocessed, contains 100 million characters, and has a vocabulary of 205 distinct characters.

Args:

  • path: str. Path to the directory that stores the file. If the file is not found there, it is downloaded and extracted into this directory. The filename is enwik8.

Returns:

Tuple of str: x_train, x_test, x_valid (in that order).
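
A minimal usage sketch, assuming the observations package is installed and that the returned splits are plain character sequences as described above. The helper names load_splits, build_vocab, encode and decode are hypothetical and not part of the library. The vocabulary helpers are demonstrated on a short stand-in string so the example runs without downloading the full dataset.

```python
def load_splits(path):
    """Return (x_train, x_test, x_valid) as documented above.

    Assumption: the `observations` package is installed. The import is kept
    inside the function so the rest of this sketch runs without it.
    """
    from observations import enwik8
    return enwik8(path)


def build_vocab(text):
    """Map each distinct character in `text` to an integer id (sorted order)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def encode(text, vocab):
    """Convert a string into a list of integer ids using `vocab`."""
    return [vocab[ch] for ch in text]


def decode(ids, vocab):
    """Invert `encode`: turn a list of integer ids back into a string."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


# Stand-in text; with the real data one would build the vocabulary from x_train.
sample = "[[Anarchism]] is a political philosophy"
vocab = build_vocab(sample)
ids = encode(sample, vocab)
```

With the real dataset, one would call x_train, x_test, x_valid = load_splits(path), paying attention to the return order (test before validation), and build the vocabulary from x_train only.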

Hutter, M. (2012). The human knowledge compression contest. Retrieved from http://prize.hutter1.net