文件(2)

优质

小牛编辑

147浏览

2023-12-01

你这论断人的，无论你是谁，也无可推诿，你在甚么事上论断人，就在甚么事上定自己的罪。因你这论断人的，自己所行却和别人一样。我们知道这样行的人，神必照真理审判他。(ROMANS 2:1-2)

文件(2)

文件，也是对象。

在Python 2中还是一种内建的类型，在Python 3中被取消了，可是，这并不意味这其地位降低。

文件的状态

很多时候，我们需要获取一个文件的有关状态（也称为属性），比如创建日期，访问日期，修改日期，大小，等等。在os模块中，有这样一个方法，专门让我们查看文件的这些状态参数的。

>>> import os>>> file_stat = os.stat("131.txt")      #查看这个文件的状态>>> file_stat                           #文件状态是这样的。从下面的内容，有不少从英文单词中可以猜测出来。posix.stat_result(st_mode=33204, st_ino=5772566L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=69L, st_atime=1407897031, st_mtime=1407734600, st_ctime=1407734600)>>> file_stat.st_ctime                  #这个是文件创建时间1407734600.0882277

这是什么时间？看不懂！别着急，换一种方式。在Python中，有一个模块time，是专门针对时间设计的。

>>> import time                         >>> time.localtime(file_stat.st_ctime)  #这回看清楚了。time.struct_time(tm_year=2014, tm_mon=8, tm_mday=11, tm_hour=13, tm_min=23, tm_sec=20, tm_wday=0, tm_yday=223, tm_isdst=0)

read/readline/readlines

用open()能够打开文件，在用for循环，可以将文件的内容读取出来。

但是，在查看文件的属性和方法的时候，会看到三个方法：read, readline, readlines。

从名称上看，它们应该都是跟“读”有关的，但是，又应该有差别。

的确如此。

还是对比着看一看，并且还要Python 2和Python 3的文档对比。

Python 2：

>>> help(file.read)Help on method_descriptor:read(...)    read([size]) -> read at most size bytes, returned as a string.    If the size argument is negative or omitted, read until EOF is reached.    Notice that when in non-blocking mode, less data than what was requested    may be returned, even if no size parameter was given.>>> help(file.readline)Help on method_descriptor:readline(...)    readline([size]) -> next line from the file, as a string.    Retain newline.  A non-negative size argument limits the maximum    number of bytes to return (an incomplete line may be returned then).    Return an empty string at EOF.>>> help(file.readlines)Help on method_descriptor:readlines(...)    readlines([size]) -> list of strings, each a line from the file.    Call readline() repeatedly and return a list of the lines so read.    The optional size argument, if given, is an approximate bound on the    total number of bytes in the lines returned.

Python 3：

>>> f = open("130.txt")>>> help(f.read)Help on built-in function read:read(size=-1, /) method of _io.TextIOWrapper instance    Read at most n characters from stream.    Read from underlying buffer until we have n characters or we hit EOF.    If n is negative or omitted, read until EOF.>>> help(f.readline)Help on built-in function readline:readline(size=-1, /) method of _io.TextIOWrapper instance    Read until newline or EOF.    Returns an empty string if EOF is hit immediately.>>> help(f.readlines)Help on built-in function readlines:readlines(hint=-1, /) method of _io.TextIOWrapper instance    Return a list of lines from the stream.    hint can be specified to control the number of lines read: no more    lines will be read if the total size (in bytes/characters) of all    lines so far exceeds hint.

对照一下上面的说明，三个的异同就显现了。而且，要想了解Python 2和Python 3之间的不同，还可以将两个版本的对照一下。好像也没有什么不同哦。

EOF什么意思？End-of-file。在维基百科中居然有对它的解释：

In computing, End Of File (commonly abbreviated EOF[1]) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream. In general, the EOF is either determined when the reader returns null as seen in Java's BufferedReader,[2] or sometimes people will manually insert an EOF character of their choosing to signal when the file has ended.

明白EOF之后，就对比一下：

read：如果指定了参数size，就按照该指定长度从文件中读取内容，否则，就读取全文。被读出来的内容，全部塞到一个字符串里面。这样有好处，就是东西都到内存里面了，随时取用，比较快捷；“成也萧何败萧何”，也是因为这点，如果文件内容太多了，内存会吃不消的。文档中已经提醒注意在“non-blocking”模式下的问题，关于这个问题，不是本节的重点，暂时不讨论。
readline：那个可选参数size的含义同上。它则是以行为单位返回字符串，也就是每次读一行，依次循环，如果不限定size，直到最后一个返回的是空字符串，意味着到文件末尾了(EOF)。
readlines：size同上。它返回的是以行为单位的列表，即相当于先执行readline()，得到每一行，然后把这一行的字符串作为列表中的元素塞到一个列表中，最后将此列表返回。

依次演示操作，即可明了。有这样一个文档，名曰：you.md，其内容和基本格式如下：

You Raise Me UpWhen I am down and, oh my soul, so weary;When troubles come and my heart burdened be;Then, I am still and wait here in the silence,Until you come and sit awhile with me.You raise me up, so I can stand on mountains;You raise me up, to walk on stormy seas;I am strong, when I am on your shoulders;You raise me up: To more than I can be.

分别用上述三种函数读取这个文件。

>>> f = open("you.md")>>> content = f.read()>>> content'You Raise Me Up\nWhen I am down and, oh my soul, so weary;\nWhen troubles come and my heart burdened be;\nThen, I am still and wait here in the silence,\nUntil you come and sit awhile with me.\nYou raise me up, so I can stand on mountains;\nYou raise me up, to walk on stormy seas;\nI am strong, when I am on your shoulders;\nYou raise me up: To more than I can be.\n'>>> f.close()

提示：养成一个好习惯，只要打开文件，不用该文件了，就一定要随手关闭它。

注意：在python中，'\n'表示换行，这也是UNIX系统中的规范。但是，在奇葩的windows中，用'\r\n'表示换行。Python在处理这个的时候，会自动将'\r\n'转换为'\n'。

请仔细观察，得到的就是一个大大的字符串，但是这个字符串里面包含着一些符号\n，因为原文中有换行符。如果用print输出这个字符串，就是这样的了，其中的\n起作用了。

>>> print content    #Python 3: print(content)You Raise Me UpWhen I am down and, oh my soul, so weary;When troubles come and my heart burdened be;Then, I am still and wait here in the silence,Until you come and sit awhile with me.You raise me up, so I can stand on mountains;You raise me up, to walk on stormy seas;I am strong, when I am on your shoulders;You raise me up: To more than I can be.

用readline()读取，则是这样的：

>>> f = open("you.md")>>> f.readline()'You Raise Me Up\n'>>> f.readline()'When I am down and, oh my soul, so weary;\n'>>> f.readline()'When troubles come and my heart burdened be;\n'>>> f.close()

显示出一行一行读取了，每操作一次f.readline()，就读取一行，并且将指针向下移动一行，如此循环。显然，这种是一种循环，或者说可迭代的。因此，就可以用循环语句来完成对全文的读取。

#!/usr/bin/env python# coding=utf-8f = open("you.md")while True:    line = f.readline()    if not line:         #到EOF，返回空字符串，则终止循环        break    print line ,         #Python 3: print(line, end='')f.close()                #别忘记关闭文件

将其和文件"you.md"保存在同一个目录中，我这里命名的文件名是12701.py，然后在该目录中运行python 12701.py，就看到下面的效果了：

~/Documents$ python 12701.py You Raise Me UpWhen I am down and, oh my soul, so weary;When troubles come and my heart burdened be;Then, I am still and wait here in the silence,Until you come and sit awhile with me.You raise me up, so I can stand on mountains;You raise me up, to walk on stormy seas;I am strong, when I am on your shoulders;You raise me up: To more than I can be.

也用readlines()来读取此文件：

>>> f = open("you.md")>>> content = f.readlines()>>> content['You Raise Me Up\n', 'When I am down and, oh my soul, so weary;\n', 'When troubles come and my heart burdened be;\n', 'Then, I am still and wait here in the silence,\n', 'Until you come and sit awhile with me.\n', 'You raise me up, so I can stand on mountains;\n', 'You raise me up, to walk on stormy seas;\n', 'I am strong, when I am on your shoulders;\n', 'You raise me up: To more than I can be.\n']

返回的是一个列表，列表中每个元素都是一个字符串，每个字符串中的内容就是文件的一行文字，含行末的符号。显而易见，它是可以用for来循环的。

>>> for line in content:...     print line ,         #Python 3: print(line, end='')... You Raise Me UpWhen I am down and, oh my soul, so weary;When troubles come and my heart burdened be;Then, I am still and wait here in the silence,Until you come and sit awhile with me.You raise me up, so I can stand on mountains;You raise me up, to walk on stormy seas;I am strong, when I am on your shoulders;You raise me up: To more than I can be.>>> f.close()

读很大的文件

前面已经说明了，如果文件太大，就不能用read()或者readlines()一次性将全部内容读入内存，可以使用while循环和readline()来完成这个任务。

此外，还有一个方法：fileinput模块

>>> import fileinput>>> for line in fileinput.input("you.md"):...     print line ,        #Python 3: print(line, end='')... You Raise Me UpWhen I am down and, oh my soul, so weary;When troubles come and my heart burdened be;Then, I am still and wait here in the silence,Until you come and sit awhile with me.You raise me up, so I can stand on mountains;You raise me up, to walk on stormy seas;I am strong, when I am on your shoulders;You raise me up: To more than I can be.

我比较喜欢这个，用起来是那么得心应手，简洁明快，还用for。

对于这个模块的更多内容，读者可以自己在交互模式下利用dir()，help()去查看明白。

还有一种方法，更为常用：

>>> for line in f:...     print line ,      #Python 3:  print(line, end='')... You Raise Me UpWhen I am down and, oh my soul, so weary;When troubles come and my heart burdened be;Then, I am still and wait here in the silence,Until you come and sit awhile with me.You raise me up, so I can stand on mountains;You raise me up, to walk on stormy seas;I am strong, when I am on your shoulders;You raise me up: To more than I can be.

之所以能够如此，是因为文件是可迭代的对象，直接用for来迭代即可。

seek

这个函数的功能就是让指针移动。比如：

>>> f = open("you.md")>>> f.readline()'You Raise Me Up\n'>>> f.readline()'When I am down and, oh my soul, so weary;\n'

现在来看seek()的能力：

>>> f.seek(0)

意图是要回到文件的最开头，那么如果用f.readline()应该读取第一行。

>>> f.readline()'You Raise Me Up\n'

果然如此。此时指针所在的位置，还可以用tell()来显示，如

>>> f.tell()17L

这不是在低17行。在Python 2中返回的是17L，很容易让人产生如此误解。但是在Python 3中返回的是17，又可能迷茫，单位是什么呢？

读者不妨数一数，是按照字符，'You Raise Me Up\n'从0开始，到最后，是不是正好为17。提醒注意的是，如果你用汉语的等文字，就需要注意编码问题了。请查看本教程有关编码的讨论。

>>> f.seek(4)

f.seek(4)就将位置定位到从开头算起的第四个字符后面，也就是"You "之后，字母"R"之前的位置。

>>> f.tell()4L                     #Python 3返回：4

tell()也是这么说的。这时候如果使用readline()，得到就是从当前位置开始到行末。

>>> f.readline()'Raise Me Up\n'>>> f.close()

seek()还有别的参数，具体如下：

seek(...) seek(offset[, whence]) -> None. Move to new file position.
Argument offset is a byte count. Optional argument whence defaults to 0 (offset from start of file, offset should be >= 0); other values are 1 (move relative to current position, positive or negative), and 2 (move relative to end of file, usually negative, although many platforms allow seeking beyond the end of a file). If the file is opened in text mode, only offsets returned by tell() are legal. Use of other offsets causes undefined behavior. Note that not all file objects are seekable.

whence的值：

默认值是0，表示从文件开头开始计算指针偏移的量（简称偏移量）。这是offset必须是大于等于0的整数。
是1时，表示从当前位置开始计算偏移量。offset如果是负数，表示从当前位置向前移动，整数表示向后移动。
是2时，表示相对文件末尾移动。

前面已经提到了，文件是可迭代的，并且还学过其它可迭代的对象，看来迭代是一个有必要讨论的问题。