Question:

zip_longest, but always by the length of the left list

袁枫涟

2023-03-14

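I know about zip (which zips to the length of the shortest list) and zip_longest (which zips to the length of the longest list), but how can I zip to the length of the first list, regardless of whether it is the longest?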

For example:

Input:  ['a', 'b', 'c'], [1, 2]
Output: [('a', 1), ('b', 2), ('c', None)]

And:

Input:  ['a', 'b'], [1, 2, 3]
Output: [('a', 1), ('b', 2)]

Is there a single function that does both?

3 answers

尹正奇

2023-03-14

Make the second one infinite, then use a plain zip:

from itertools import chain, repeat

a = ['a', 'b', 'c']
b = [1, 2]

b = chain(b, repeat(None))

print(*zip(a, b))
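Applied to both inputs from the question, this padding trick gives exactly the requested output: zip stops as soon as the first list runs out, and the padded second iterator never does. A minimal sketch wrapping it in a function; the name zip_pad_second is hypothetical:

```python
from itertools import chain, repeat

def zip_pad_second(a, b, fillvalue=None):
    # Pad b with an endless stream of fillvalue; zip then stops only when a is exhausted.
    return zip(a, chain(b, repeat(fillvalue)))

print(list(zip_pad_second(['a', 'b', 'c'], [1, 2])))
# [('a', 1), ('b', 2), ('c', None)]
print(list(zip_pad_second(['a', 'b'], [1, 2, 3])))
# [('a', 1), ('b', 2)]
```

Because zip pulls from its arguments left to right, it notices that a is exhausted before it touches the padded iterator, so no extra padding value is consumed.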

叶冥夜

2023-03-14

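You can repurpose the "roughly equivalent" Python code shown for zip_longest in the itertools docs to make a general version that goes by the length of the first argument: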

from itertools import repeat

def zip_by_first(*args, fillvalue=None):
    # zip_by_first('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    # zip_by_first('ABC', 'xyzw', fillvalue='-') --> Ax By Cz
    if not args:
        return
    iterators = [iter(it) for it in args]
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)

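You could probably make a few small improvements, such as caching repeat(fillvalue). The problem with this implementation is that it is written in Python, while most of itertools uses a much faster C implementation. You can see the effect of this by comparing it with Kelly Bundy's answer.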
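One way to apply the "cache repeat(fillvalue)" suggestion is to create the filler iterator once and reuse it for every input that runs out; a single repeat object can be shared safely because it yields the same value forever. This is only a sketch: the name zip_by_first_cached is hypothetical, and since it is still pure Python it should not be expected to catch up with the C-backed approaches.

```python
from itertools import repeat

def zip_by_first_cached(*args, fillvalue=None):
    """Zip to the length of the first iterable, padding the others with fillvalue."""
    if not args:
        return
    filler = repeat(fillvalue)  # created once, shared by every exhausted iterator
    first, *rest = [iter(it) for it in args]
    while True:
        try:
            head = next(first)
        except StopIteration:
            return  # only the first iterable decides when to stop
        values = [head]
        for i, it in enumerate(rest):
            try:
                values.append(next(it))
            except StopIteration:
                rest[i] = filler  # from now on this slot just yields fillvalue
                values.append(fillvalue)
        yield tuple(values)

print([''.join(t) for t in zip_by_first_cached('ABCD', 'xy', fillvalue='-')])
# ['Ax', 'By', 'C-', 'D-']
print([''.join(t) for t in zip_by_first_cached('ABC', 'xyzw', fillvalue='-')])
# ['Ax', 'By', 'Cz']
```

Pulling from the first iterator separately also removes the i == 0 check from the inner loop.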

江宏放

2023-03-14

Chain a repeated fillvalue onto every iterable except the first:

from itertools import chain, repeat

def zip_first(first, *rest, fillvalue=None):
    return zip(first, *map(chain, rest, repeat(repeat(fillvalue))))
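Why the one-liner works: map(chain, rest, repeat(repeat(fillvalue))) pairs each remaining iterable with a repeat(fillvalue) iterator (the same object every time, which is harmless because it only ever yields fillvalue), and map stops once rest is used up. Every argument after the first thus becomes infinite, so zip ends exactly when first does. A quick check with the question's inputs plus a third iterable:

```python
from itertools import chain, repeat

def zip_first(first, *rest, fillvalue=None):
    # Each iterable after the first is extended with an endless run of fillvalue.
    return zip(first, *map(chain, rest, repeat(repeat(fillvalue))))

print(list(zip_first(['a', 'b', 'c'], [1, 2], 'xyzw', fillvalue='-')))
# [('a', 1, 'x'), ('b', 2, 'y'), ('c', '-', 'z')]
print(list(zip_first(['a', 'b'], [1, 2, 3])))
# [('a', 1), ('b', 2)]
```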

Or use zip_longest and trim it with a compress-and-zip trick:

from itertools import compress, tee, zip_longest

def zip_first(first, *rest, fillvalue=None):
    a, b = tee(first)
    return compress(zip_longest(b, *rest, fillvalue=fillvalue), zip(a))
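The zip(a) selector is the key detail: it yields one 1-tuple per element of first, and a non-empty tuple is always truthy, so compress keeps exactly one zip_longest row per element of first and stops once the selectors run out. Passing a directly as the selector would silently drop rows whose first element is falsy. A sketch contrasting the two; zip_first_naive is a hypothetical variant shown only to illustrate that pitfall:

```python
from itertools import compress, tee, zip_longest

def zip_first(first, *rest, fillvalue=None):
    a, b = tee(first)
    # zip(a) yields (x,) for every x in first: always truthy, so no row is dropped.
    return compress(zip_longest(b, *rest, fillvalue=fillvalue), zip(a))

def zip_first_naive(first, *rest, fillvalue=None):
    # Pitfall: the elements themselves act as selectors, so falsy ones are skipped.
    a, b = tee(first)
    return compress(zip_longest(b, *rest, fillvalue=fillvalue), a)

print(list(zip_first([0, '', None], [1])))
# [(0, 1), ('', None), (None, None)]
print(list(zip_first_naive([0, '', None], [1])))
# []
```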

Like zip and zip_longest, these accept any number (at least one) of iterables of any kind, including infinite ones, and return an iterator (convert it to a list if needed).
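A quick way to check the "any iterables, including infinite ones" claim is to run both variants with itertools.count(). The names zip_first_chain and zip_first_compress are hypothetical labels for the two functions above, renamed so they can coexist in one snippet:

```python
from itertools import chain, compress, count, islice, repeat, tee, zip_longest

def zip_first_chain(first, *rest, fillvalue=None):
    return zip(first, *map(chain, rest, repeat(repeat(fillvalue))))

def zip_first_compress(first, *rest, fillvalue=None):
    a, b = tee(first)
    return compress(zip_longest(b, *rest, fillvalue=fillvalue), zip(a))

for zip_first in (zip_first_chain, zip_first_compress):
    # An infinite later iterable is fine: the finite first argument still ends the zip.
    print(list(zip_first('abc', count())))              # [('a', 0), ('b', 1), ('c', 2)]
    # An infinite first argument gives an endless iterator; take a prefix lazily.
    print(list(islice(zip_first(count(), 'xy'), 4)))    # [(0, 'x'), (1, 'y'), (2, None), (3, None)]
    # A single iterable works too.
    print(list(zip_first('ab')))                        # [('a',), ('b',)]
```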

Benchmarks against other equally general solutions (each row shows the three fastest of five runs):

10 iterables of 10,000 to 90,000 elements, first has 50,000:
------------------------------------------------------------
 49.9 ms   50.1 ms   50.4 ms  CrazyChucky
 74.3 ms   74.6 ms   75.1 ms  Mad_Physicist
  2.5 ms    2.6 ms    2.6 ms  Kelly_Bundy_chain
  3.2 ms    3.2 ms    3.3 ms  Kelly_Bundy_compress
  5.2 ms    5.3 ms    5.3 ms  Kelly_Bundy_3
  5.9 ms    6.0 ms    6.0 ms  Kelly_Bundy_4
  4.5 ms    4.6 ms    4.6 ms  Kelly_Bundy_5
  2.3 ms    2.3 ms    2.3 ms  limit_cheat

20,000 iterables of 0 to 100 elements, first has 50:
----------------------------------------------------
 54.8 ms   55.6 ms   56.2 ms  CrazyChucky
164.1 ms  165.4 ms  165.7 ms  Mad_Physicist
 18.6 ms   18.9 ms   19.0 ms  Kelly_Bundy_chain
 11.1 ms   11.1 ms   11.1 ms  Kelly_Bundy_compress
 11.2 ms   11.3 ms   11.4 ms  Kelly_Bundy_3
 11.6 ms   11.6 ms   11.8 ms  Kelly_Bundy_4
 11.6 ms   11.8 ms   11.8 ms  Kelly_Bundy_5
 10.8 ms   10.9 ms   10.9 ms  limit_cheat

The last one is a cheat that knows the length in advance; it is included to show the limit of how fast we could get.
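When the first argument happens to be a sized sequence such as a list, the same idea stops being a cheat: slice zip_longest to len(first). A sketch under that assumption; zip_first_sized is a hypothetical name, and unlike the general answers it rejects plain iterators and generators, which have no len():

```python
from itertools import islice, zip_longest

def zip_first_sized(first, *rest, fillvalue=None):
    # Assumes first supports len(); raises TypeError for generators and other iterators.
    n = len(first)
    return islice(zip_longest(first, *rest, fillvalue=fillvalue), n)

print(list(zip_first_sized(['a', 'b', 'c'], [1, 2])))
# [('a', 1), ('b', 2), ('c', None)]
print(list(zip_first_sized(['a', 'b'], [1, 2, 3])))
# [('a', 1), ('b', 2)]
```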

Benchmark code (Try it online!):

def CrazyChucky(*iterables, fillvalue=None):
    SENTINEL = object()
    
    for first, *others in zip_longest(*iterables, fillvalue=SENTINEL):
        if first is SENTINEL:
            return
        others = [i if i is not SENTINEL else fillvalue for i in others]
        yield (first, *others)

def Mad_Physicist(*args, fillvalue=None):
    # zip_by_first('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    # zip_by_first('ABC', 'xyzw', fillvalue='-') --> Ax By Cz
    if not args:
        return
    iterators = [iter(it) for it in args]
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)

def Kelly_Bundy_chain(first, *rest, fillvalue=None):
    return zip(first, *map(chain, rest, repeat(repeat(fillvalue))))

def Kelly_Bundy_compress(first, *rest, fillvalue=None):
    a, b = tee(first)
    return compress(zip_longest(b, *rest, fillvalue=fillvalue), zip(a))

def Kelly_Bundy_3(first, *rest, fillvalue=None):
    a, b = tee(first)
    return map(itemgetter(1), zip(a, zip_longest(b, *rest, fillvalue=fillvalue)))

def Kelly_Bundy_4(first, *rest, fillvalue=None):
    sentinel = object()
    for z in zip_longest(chain(first, [sentinel]), *rest, fillvalue=fillvalue):
        if z[0] is sentinel:
            break
        yield z

def Kelly_Bundy_5(first, *rest, fillvalue=None):
    stopped = False
    def stop():
        nonlocal stopped
        stopped = True
        return
        yield
    for z in zip_longest(chain(first, stop()), *rest, fillvalue=fillvalue):
        if stopped:
            break
        yield z

def limit_cheat(*iterables, fillvalue=None):
    # Cheat baseline: hardcodes the first iterable's length (50_000, as in args()).
    return islice(zip_longest(*iterables, fillvalue=fillvalue), 50_000)


import timeit
from itertools import chain, repeat, zip_longest, islice, tee, compress
from operator import itemgetter
from collections import deque

funcs = [
    CrazyChucky,
    Mad_Physicist,
    Kelly_Bundy_chain,
    Kelly_Bundy_compress,
    Kelly_Bundy_3,
    Kelly_Bundy_4,
    Kelly_Bundy_5,
    limit_cheat,
]

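def args():
    # First table: first has 50,000 elements, the other 9 have 10,000 to 90,000.
    first = repeat(0, 50_000)
    rest = [repeat(i, 10_000 * i) for i in range(1, 10)]
    return first, *rest

def args2():
    # Second table: first has 50 elements, the other 19,999 have 0 to 100.
    first = repeat(0, 50)
    rest = [repeat(i, i % 101) for i in range(1, 20_000)]
    return first, *rest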

expect = list(funcs[0](*args()))
for func in funcs:
    result = list(func(*args()))
    print(result == expect, func.__name__)
    
for _ in range(3):
    print()
    for func in funcs:
        # deque(..., 0) consumes the iterator without storing the rows.
        times = timeit.repeat(lambda: deque(func(*args()), 0), number=1)
        print(*('%4.1f ms ' % (t * 1e3) for t in sorted(times)[:3]), func.__name__)
