Cookbook: 4. Iterators and Generators

裴嘉良

2023-12-01

0. Iterators and Generators

https://www.runoob.com/python3/python3-iterator-generator.html

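An iterator starts at the first element of a collection and keeps going until every element has been visited. An iterator can only move forward, never backward.

Iterators are driven by two basic built-in functions: iter() and next().

Strings, lists, and tuples can all be used to create an iterator:

>>> nums = [1, 2, 3, 4]
>>> it = iter(nums)   # create an iterator object
>>> print(next(it))   # print the next element of the iterator
1
>>> print(next(it))
2

The elements can be printed with a for loop, or with a while loop that calls next() inside a try/except block.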

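Creating an iterator

To use a class as an iterator, the class must implement two methods: __iter__() and __next__().

__iter__() returns a special iterator object. That object implements __next__() and signals the end of iteration by raising StopIteration.

__next__() returns the next element of the iteration.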
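As a minimal sketch of the two methods described above, the class below counts from 1 up to a limit. The class name CountUpTo and its attributes are made up for illustration and do not come from the notes.

```python
class CountUpTo:
    """Hypothetical example class: iterates over 1, 2, ..., limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration  # signals that iteration is finished
        self.current += 1
        return self.current


result = list(CountUpTo(3))
print(result)  # [1, 2, 3]
```

A for loop calls iter() once and then next() repeatedly, stopping quietly when StopIteration is raised.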

1. Manually consuming an iterator

The next() function

with open('/etc/passwd') as f:
    try:
        while True:
            line = next(f)
            print(line, end='')
    except StopIteration:
        pass

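2. Delegating iteration

3. Creating new iteration patterns with generators

The goal is a custom iteration pattern that differs from what built-ins such as range() and reversed() provide.

It can be defined with a generator function: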

def frange(start, stop, increment):
    x = start
    while x < stop:
        yield x
        x+=increment

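To use this function, iterate over it with a for loop, or pass it to a function that consumes an iterable, such as sum() or list():

for n in frange(0, 4, 0.5):
    print(n)

The mere presence of a yield statement turns a function into a generator function.

c = frange(0, 4, 0.5)   # c is now a generator object; no code has run yet

next(c)                 # runs the body up to the first yield and returns 0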
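The sketch below repeats frange() so that it runs on its own, then shows the three ways of consuming it mentioned above: list(), sum(), and manual next() calls. The argument values are chosen only for illustration.

```python
def frange(start, stop, increment):
    x = start
    while x < stop:
        yield x
        x += increment


values = list(frange(0, 2, 0.5))   # [0, 0.5, 1.0, 1.5]
total = sum(frange(0, 2, 0.5))     # 3.0

c = frange(0, 1, 0.5)
first = next(c)     # 0
second = next(c)    # 0.5
# One more next(c) would raise StopIteration, because the loop has ended.
```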

4. Implementing the iterator protocol

Depth-first traversal with an iterator

class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node({!r})'.format(self._value)

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        return iter(self._children)

    def depth_first(self):
        yield self
        for c in self:
            yield from c.depth_first()

if __name__ =='__main__':
    root=Node(0)
    child1=Node(1)
    child2=Node(2)
    root.add_child(child1)
    root.add_child(child2)
    child1.add_child(Node(3))
    child1.add_child(Node(4))
    child2.add_child(Node(5))
    
    for ch in root.depth_first():
        print(ch)

Output:
Node(0)
Node(1)
Node(3)
Node(4)
Node(2)
Node(5)

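Python's iterator protocol requires __iter__() to return a special iterator object, and that object must implement a __next__() method.

There is more on this on Cookbook p. 119.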
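To show what the generator is saving us from, here is a rough sketch of the same traversal written as an explicit iterator class with __iter__() and __next__(). It uses a simple stack and is not the version printed in the Cookbook; the class name DepthFirstIterator and the stack approach are assumptions made for this example.

```python
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node({!r})'.format(self._value)

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        return iter(self._children)

    def depth_first(self):
        return DepthFirstIterator(self)


class DepthFirstIterator:
    """Explicit iterator: all traversal state lives in self._stack."""

    def __init__(self, start_node):
        self._stack = [start_node]

    def __iter__(self):
        return self

    def __next__(self):
        if not self._stack:
            raise StopIteration
        node = self._stack.pop()
        # Push children in reverse so the leftmost child is visited first
        self._stack.extend(reversed(list(node)))
        return node


root = Node(0)
child1 = Node(1)
child2 = Node(2)
root.add_child(child1)
root.add_child(child2)
child1.add_child(Node(3))
child1.add_child(Node(4))
child2.add_child(Node(5))

order = [repr(n) for n in root.depth_first()]
print(order)
```

The visiting order matches the generator version, but the state that yield keeps implicitly has to be managed by hand.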

5. Iterating in reverse

The reversed() function

a = [1,2,3,4]
for x in reversed(a):
    print(x)
    
4
3
2
1

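This only works if the object has a size that can be determined, or if it implements the __reversed__() special method. If neither holds, convert the object to a list first:

f = open('file')
for line in reversed(list(f)):
    print(line, end='')
# Note: this can consume a lot of memory for a large file

Implementing __reversed__() on your own class is more efficient, because no intermediate list has to be built:

class Countdown:
    def __init__(self, start):
        self.start = start

    # Forward iterator
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1

    # Reverse iterator
    def __reversed__(self):
        n = 1
        while n <= self.start:
            yield n
            n += 1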
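A short usage sketch, repeating the class so that it runs on its own: iterating normally goes through __iter__(), while reversed() picks up __reversed__(). The start value 3 is arbitrary.

```python
class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1

    def __reversed__(self):
        n = 1
        while n <= self.start:
            yield n
            n += 1


forward = list(Countdown(3))              # [3, 2, 1]
backward = list(reversed(Countdown(3)))   # [1, 2, 3]
```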

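6. Defining generator functions with extra state

The goal is a generator that exposes extra state to the user. The simplest way is to implement it as a class and put the generator code in __iter__().

# filename: iteration.py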
from collections import deque

class linehistory:
    def __init__(self,lines,histlen=3):
        self.lines = lines
        self.history=deque(maxlen=histlen)
        
    def __iter__(self):
        for lineno,line in enumerate(self.lines,1):
            self.history.append((lineno, line))
            yield line
            
    def clear(self):
        self.history.clear()

from iteration import linehistory

with open('somefile.txt') as f:
    lines = linehistory(f)          # wrap the file in a linehistory object
    for line in lines:              # iterate over the file line by line
        if 'python' in line:        # when a line contains this string
            for lineno, hline in lines.history:   # print the recent lines with their line numbers
                print('{}:{}'.format(lineno, hline), end='')

Output:
1:we sant sd as we do
2:as python in les

Cookbook p. 123
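The same class can be tried without a file on disk. In the sketch below the list named sample stands in for the file and its contents are made up; histlen=2 is also an arbitrary choice.

```python
from collections import deque


class linehistory:
    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)

    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line

    def clear(self):
        self.history.clear()


# Made-up data standing in for the lines of a file
sample = ['first line\n', 'second line\n', 'python is here\n', 'last line\n']

lines = linehistory(sample, histlen=2)
matches = []
for line in lines:
    if 'python' in line:
        # history holds the current line plus the one before it
        matches = list(lines.history)

print(matches)  # [(2, 'second line\n'), (3, 'python is here\n')]
```

Because the generator lives in __iter__(), a for loop works directly, but calling next() by hand requires iter(lines) first.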

7. Slicing an iterator

itertools.islice()

>>> def count(n):
...     while True:
...         yield n
...         n += 1
...
>>> c = count(0)
>>> c[10:20]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: 'generator' object is not subscriptable

>>> import itertools
>>> for x in itertools.islice(c, 10, 20):
...     print(x)
...
10
11
12
13
14
15
16
17
18
19
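One thing worth keeping in mind is that islice() consumes the underlying iterator: the skipped items are gone and cannot be revisited. A small sketch, with the slice bounds chosen only for illustration:

```python
import itertools


def count(n):
    while True:
        yield n
        n += 1


c = count(0)
first = list(itertools.islice(c, 10, 15))   # [10, 11, 12, 13, 14]
after = next(c)                             # 15: items 0..14 were consumed
```

If the data has to be revisited, convert it to a list first.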

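8. Skipping the first part of an iterable

itertools.dropwhile()

Just supply a function and an iterable. dropwhile() discards items for as long as the function returns True, then yields everything that follows.

from itertools import dropwhile

with open('') as f:   # file name left blank in the original notes
    for line in dropwhile(lambda line: line.startswith('#'), f):
        print(line, end='')

islice() also works, provided you know how many items to skip:

import itertools

items = ['a', 'b', 'c', 1, 2, 3, 4]
for x in itertools.islice(items, 3, None):   # from the fourth element to the end
    print(x)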
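The difference from a plain filter is that dropwhile() only skips the leading run of matching items. The sketch below uses a made-up list of lines in place of a file to show that a later comment line is kept.

```python
from itertools import dropwhile

# Made-up lines standing in for a file with a comment header
lines = [
    '# comment 1\n',
    '# comment 2\n',
    'data 1\n',
    '# inline comment\n',
    'data 2\n',
]

kept = list(dropwhile(lambda line: line.startswith('#'), lines))
print(kept)  # ['data 1\n', '# inline comment\n', 'data 2\n']
```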

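9. Iterating over all possible combinations and permutations

itertools.permutations()

Takes a collection of items and produces every possible rearrangement of them, each returned as a tuple.

>>> items = ['a', 'b', 'c']
>>> from itertools import permutations
>>> for p in permutations(items):
...     print(p)
...
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')

# An optional second argument sets the length of each permutation: permutations(items, 2)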

itertools.combinations()

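Produces all combinations of the items. No item is repeated within a combination, and order is not considered, so ('a', 'b') and ('b', 'a') count as the same combination.

>>> from itertools import combinations
>>> for c in combinations(items, 3):
...     print(c)
...
('a', 'b', 'c')

>>> for c in combinations(items, 2):
...     print(c)
...
('a', 'b')
('a', 'c')
('b', 'c')

itertools.combinations_with_replacement() lifts the no-repeat restriction, so the same item may be chosen more than once.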
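The two variants mentioned only in passing above, a length argument for permutations() and combinations_with_replacement(), can be sketched as follows with the same items list:

```python
from itertools import permutations, combinations_with_replacement

items = ['a', 'b', 'c']

# Permutations of length 2: order matters, no item is reused
pairs = list(permutations(items, 2))
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

# Combinations of length 2 where an item may be picked more than once
with_repeats = list(combinations_with_replacement(items, 2))
# [('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
```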

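10. Iterating over a sequence as index-value pairs

enumerate()

Returns the index of each element along with the element while iterating.

>>> my_list = ['a', 'c', 'e']
>>> for idx, val in enumerate(my_list):
...     print(idx, val)
...
0 a
1 c
2 e

For conventional line numbers:

enumerate(my_list, 1) starts counting from 1.

Exercise: track the line number of error messages while reading a file.

Cookbook p. 130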
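One possible take on the exercise is sketched below. The function name parse_counts, the "name count" line format, and the sample data are all assumptions made for illustration; a list of strings stands in for the file.

```python
def parse_counts(lines):
    """Hypothetical helper: sum the second field of each line.

    Lines that cannot be parsed are reported with their 1-based line number.
    """
    total = 0
    errors = []
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        try:
            total += int(fields[1])
        except (IndexError, ValueError) as e:
            errors.append('Line {}: Parse error: {}'.format(lineno, e))
    return total, errors


# Made-up input: lines 2 and 4 are malformed
sample = ['apples 3', 'pears x', 'plums 4', 'nofield']
total, errors = parse_counts(sample)
print(total)
for message in errors:
    print(message)
```

Passing 1 as the second argument to enumerate() makes the reported numbers match what a text editor shows.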

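11. Iterating over multiple sequences simultaneously

zip()

zip(x, y) works by creating an iterator that produces tuples (a, b), where a is taken from x and b from y.

i1 = [1, 2, 3, 4, 5, 6]   # illustrative values; the original notes left both lists empty
i2 = ['a', 'b', 'c']

for x, y in zip(i1, i2):
    print(x, y)
# The output is only as long as the shortest sequence.
# To match the longest sequence instead, use itertools.zip_longest().

# zip() can be used to build a dict
s = dict(zip(list1, list2))
# zip() also accepts more than two sequences, e.g. zip(a, b, c)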
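A small sketch contrasting zip() with itertools.zip_longest(), plus the dict idiom. The sample values and the fillvalue of 0 are chosen only for illustration.

```python
from itertools import zip_longest

a = [1, 2, 3]
b = ['w', 'x', 'y', 'z']

short = list(zip(a, b))
# [(1, 'w'), (2, 'x'), (3, 'y')]  stops at the shortest input

padded = list(zip_longest(a, b))
# [(1, 'w'), (2, 'x'), (3, 'y'), (None, 'z')]  missing values become None

filled = list(zip_longest(a, b, fillvalue=0))
# [(1, 'w'), (2, 'x'), (3, 'y'), (0, 'z')]

# Pairing a list of keys with a list of values to build a dict
mapping = dict(zip(['name', 'shares'], ['ACME', 100]))
# {'name': 'ACME', 'shares': 100}
```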

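12. Iterating over items in separate containers

itertools.chain()

It accepts a series of iterables and returns an iterator that yields the items of each one in turn, as if they were a single sequence.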
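A minimal sketch of chain() with two made-up lists. Unlike a + b, it does not build a new combined list, and the inputs do not have to be of the same type.

```python
from itertools import chain

a = [1, 2, 3, 4]
b = ('x', 'y', 'z')   # a tuple, to show that the types may differ

combined = list(chain(a, b))
print(combined)  # [1, 2, 3, 4, 'x', 'y', 'z']
```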

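13. Creating data processing pipelines

The situation: there is a huge amount of data that cannot be loaded into memory all at once.

Generator functions are a good way to implement a pipeline. Suppose we have a very large directory of log files: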

foo/
	access-log-0127007.gz
	access-log-0227007.gz
	access-log-0327007.gz
	...
bar/
	access-log-0117007.gz2
	access-log-0127022.gz

Define a set of small generator functions, each performing one specific, self-contained task:

import os
import fnmatch
import gzip
import bz2
import re

def gen_find(filepat, top):
    '''
    Find all filenames in a directory tree that match a shell wildcard pattern
    '''
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)

def gen_opener(filenames):
    '''
    Open a sequence of filenames one at a time, producing a file object.
    Each file is closed as soon as the consumer asks for the next one.
    '''
    for filename in filenames:
        if filename.endswith('.gz'):
            f = gzip.open(filename, 'rt')
        elif filename.endswith('.bz2'):
            f = bz2.open(filename, 'rt')
        else:
            f = open(filename, 'rt')
        yield f
        f.close()

(Unfinished)

Cookbook p. 136
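Since the notes stop here, the following is only a rough sketch of how further stages might be stacked, not a reconstruction of the missing part. The stage names gen_concatenate and gen_grep, the search pattern, and the in-memory fake_files data are assumptions for illustration; lists of strings stand in for opened log files.

```python
import re


def gen_concatenate(iterators):
    """Chain a sequence of iterators together into one stream of items."""
    for it in iterators:
        yield from it


def gen_grep(pattern, lines):
    """Yield only the lines that match a regular expression."""
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line


# Made-up data standing in for the file objects an opener stage would produce
fake_files = [
    ['GET /index.html 200\n', 'GET /python.html 200\n'],
    ['GET /about.html 404\n', 'GET /python/docs 200\n'],
]

lines = gen_concatenate(fake_files)
pylines = gen_grep(r'(?i)python', lines)

result = list(pylines)
print(result)
```

Nothing is processed until the final stage is consumed. Each item is pulled through the whole pipeline one at a time, which is what keeps memory use low.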

14. Flattening a nested sequence

15. Iterating in sorted order over merged sorted iterables

heapq.merge()

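import heapq
a = [1, 4, 7, 10]
b = [2, 5, 6, 11]
for c in heapq.merge(a, b):
    print(c)

Because heapq.merge() is iterative, it never reads any of the supplied sequences all at once, so it can handle very long sequences with very little overhead. The inputs must already be sorted.

import heapq

# File names were left blank in the original notes
with open('', 'rt') as file1, \
     open('', 'rt') as file2, \
     open('', 'wt') as outf:
    for line in heapq.merge(file1, file2):
        outf.write(line)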
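The requirement that the inputs are already sorted is easy to overlook. heapq.merge() only compares the items at the front of each input, so unsorted input is not sorted for you. A small sketch, where the unsorted lists are made up to show the effect:

```python
import heapq

a = [1, 4, 7, 10]
b = [2, 5, 6, 11]
merged = list(heapq.merge(a, b))
# [1, 2, 4, 5, 6, 7, 10, 11]

# With unsorted input the result is simply not sorted
unsorted_result = list(heapq.merge([3, 1], [2]))
# [2, 3, 1]
```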

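16. Replacing a while loop with an iterator

In programs that involve I/O, code like this is very common:

CHUNKSIZE = 8192

def reader(s):
    while True:
        data = s.recv(CHUNKSIZE)
        if data == b'':
            break
        process_data(data)

It can be replaced using iter(). When given a callable and a sentinel value, iter() calls the callable repeatedly until it returns the sentinel:

def reader(s):
    for chunk in iter(lambda: s.recv(CHUNKSIZE), b''):
        process_data(chunk)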
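The two-argument form of iter() can be tried without a socket. In the sketch below an io.BytesIO object stands in for the connection and read() plays the role of recv(); the payload and the tiny chunk size are made up for illustration.

```python
import io

CHUNKSIZE = 4

# In-memory byte stream standing in for a socket or file
stream = io.BytesIO(b'abcdefghij')

# Call stream.read(CHUNKSIZE) repeatedly until it returns the sentinel b''
chunks = list(iter(lambda: stream.read(CHUNKSIZE), b''))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```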

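Appendix: the heapq module

The six functions listed below are the ones covered here; the first four operate directly on a heap. The heap itself must be represented as a plain list.

    **Some important functions in the heapq module**
Function                 Description
heappush(heap, x)        Push x onto the heap
heappop(heap)            Pop the smallest element off the heap
heapify(heap)            Rearrange a list in place so that it has the heap property
heapreplace(heap, x)     Pop the smallest element, then push x onto the heap
nlargest(n, iter)        Return the n largest elements of iter
nsmallest(n, iter)       Return the n smallest elements of iter

from heapq import heappush
from random import shuffle

data = list(range(10))
shuffle(data)
heap = []
for n in data:
    heappush(heap, n)

heappush(heap, 0.5)   # push 0.5; where it ends up is decided by the heap property

The order of the elements is not as arbitrary as it looks. They are not strictly sorted, but one guarantee always holds: with Python's 0-based indexing, the element at position i is never smaller than its parent at position (i - 1) // 2 (equivalently, it is never larger than its children at positions 2 * i + 1 and 2 * i + 2). This is the basis of the underlying heap algorithm and is called the heap property.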
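The functions in the table can be exercised with a short sketch. The data values are made up, and the property check uses 0-based list indices, where the parent of position i sits at (i - 1) // 2.

```python
import heapq

data = [8, 3, 5, 1, 9, 2]

heap = list(data)
heapq.heapify(heap)   # rearranges the list in place

# Every element is at least as large as its parent
has_heap_property = all(
    heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap))
)

smallest = heapq.heappop(heap)          # 1
replaced = heapq.heapreplace(heap, 4)   # pops 2, then pushes 4

top3 = heapq.nlargest(3, data)      # [9, 8, 5]
bottom2 = heapq.nsmallest(2, data)  # [1, 2]
```

heap[0] is always the smallest element, which is why heappop() and heapreplace() are cheap.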