
observations.text8

2023-12-01
text8(path)

Load the text8 data set (Mahoney, 2011). The data set is preprocessed, has a vocabulary of 27 characters, and contains 100 million characters.

Args:

  • path: str. Path to the directory that stores the file. If the file is not found there, it is downloaded and extracted into that directory. The filename is text8.

Returns:

Tuple of str: x_train, x_test, x_valid.
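
Example:

The following is a minimal sketch of how the returned strings might be turned into integer ids for a character-level model. It assumes the 27-character vocabulary is the 26 lowercase letters plus the space character; the names VOCAB, CHAR_TO_ID, and encode are hypothetical helpers, not part of the library. A short stand-in string is used so the sketch runs without downloading the data; the call to text8 is shown only as a comment.

```python
import string

# Assumption: the 27-character vocabulary is the space character
# followed by the 26 lowercase ASCII letters.
VOCAB = " " + string.ascii_lowercase
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}


def encode(text):
    """Map a text8-style string to a list of integer ids in [0, 27)."""
    return [CHAR_TO_ID[ch] for ch in text]


# With the real data one would first load the three splits, for example:
#   from observations import text8
#   x_train, x_test, x_valid = text8("~/data")
# Here a short stand-in string keeps the sketch runnable offline.
sample = "anarchism originated as a term of abuse"
ids = encode(sample)
```

Because each split is returned as a single str, the same encode helper could be applied to x_train, x_test, and x_valid in turn.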

Mahoney, M. (2011). Large text compression benchmark. Retrieved from http://mattmahoney.net/dc/text.html