Python typing overload

秦伯寅

2023-12-01

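I have recently been reading the source code of the transformers package.

In the file src/transformers/tokenization_utils.py I came across a usage that I found rather interesting.

Among the imports in the first few lines of the file, overload is imported from typing.

Tracing it into the code, it turns out to decorate a method: convert_ids_to_tokens. This method is defined three times inside the class.

The first two definitions are both decorated with overload, and they differ only in their type signatures: the first takes ids as an int and returns a str, while the second takes a List[int] and returns a List[str].
The third definition, however, is not decorated with overload. Why is that?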

from typing import Any, Dict, List, Optional, Tuple, Union, overload

class PreTrainedTokenizer(PreTrainedTokenizerBase):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        # Added tokens - We store this for both slow and fast tokenizers
        # until the serialization of Fast tokenizers is updated
        self.added_tokens_encoder: Dict[str, int] = {}
        self.added_tokens_decoder: Dict[int, str] = {}
        self.unique_no_split_tokens: List[str] = []
        self.tokens_trie = Trie()

        self._decode_use_source_tokenizer = False

    @overload
    def convert_ids_to_tokens(self, ids: int, skip_special_tokens: bool = False) -> str:
        ...

    @overload
    def convert_ids_to_tokens(self, ids: List[int], skip_special_tokens: bool = False) -> List[str]:
        ...

    def convert_ids_to_tokens(
        self, ids: Union[int, List[int]], skip_special_tokens: bool = False
    ) -> Union[str, List[str]]:

        if isinstance(ids, int):
            if ids in self.added_tokens_decoder:
                return self.added_tokens_decoder[ids]
            else:
                return self._convert_id_to_token(ids)
        tokens = []
        for index in ids:
            index = int(index)
            if skip_special_tokens and index in self.all_special_ids:
                continue
            if index in self.added_tokens_decoder:
                tokens.append(self.added_tokens_decoder[index])
            else:
                tokens.append(self._convert_id_to_token(index))
        return tokens

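The official documentation says:

The @overload decorator allows describing functions and methods that support multiple different combinations of argument types. A series of @overload-decorated definitions must be followed by exactly one non-@overload-decorated definition (for the same function/method). The @overload-decorated definitions are for the benefit of the type checker only, since they will be overwritten by the non-@overload-decorated definition, while the latter is used at runtime but should be ignored by a type checker. At runtime, calling an @overload-decorated function directly will raise NotImplementedError. The documentation then gives an example of overloads that yield a more precise type than can be expressed using a union or a type variable.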
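The NotImplementedError behaviour described above can be observed directly. The following is a minimal sketch, not code from transformers: the function double and the variable stub are hypothetical names made up for illustration. The stub is captured before later definitions rebind the name, which is something you would not do in real code.

```python
from typing import List, Union, overload


@overload
def double(x: int) -> int: ...


# Illustration only: keep a reference to the overload stub before the
# name `double` is rebound by the definitions below.
stub = double


@overload
def double(x: List[int]) -> List[int]: ...


def double(x: Union[int, List[int]]) -> Union[int, List[int]]:
    # The undecorated definition is the one that actually runs.
    if isinstance(x, int):
        return x * 2
    return [v * 2 for v in x]


try:
    stub(1)  # calling the @overload-decorated stub directly
except NotImplementedError as exc:
    print("stub raised NotImplementedError:", exc)

print(double(2))
print(double([1, 2]))
```

Calling double goes through the last, undecorated definition, while the captured stub refuses to run at all.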

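Here overload means overloading: functions (or methods) can share the same name while accepting different combinations of argument types.

For example, convert_ids_to_tokens appears three times above.

The definitions decorated with overload exist only for the type checker.
The last definition, without overload, is the one used at runtime; the type checker does not use its signature when checking calls to the method.
The last definition must not be decorated with overload, and has no need to be: it is the implementation that actually runs.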
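One way to confirm which definition survives at runtime is to inspect the annotations of the final function object. This is a small sketch under my own assumptions: to_token is a hypothetical stand-in for convert_ids_to_tokens, and the "tok" strings are placeholder data rather than a real vocabulary.

```python
from typing import List, Union, get_type_hints, overload


@overload
def to_token(ids: int) -> str: ...


@overload
def to_token(ids: List[int]) -> List[str]: ...


def to_token(ids: Union[int, List[int]]) -> Union[str, List[str]]:
    # Hypothetical implementation: build placeholder token strings.
    if isinstance(ids, int):
        return f"tok{ids}"
    return [f"tok{i}" for i in ids]


# The name `to_token` now refers to the undecorated implementation,
# so its runtime annotations are the Union types, not the overloads.
hints = get_type_hints(to_token)
print(hints["ids"])
print(hints["return"])
```

The printed annotations are those of the third definition, which matches the statement that the overload-decorated definitions are overwritten.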

But what is the benefit of doing this? It costs six extra lines of code.

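Advantages

Overloads give a more precise type than a union or a type variable can express, so type checking becomes more accurate.
Some definitions serve the type checker and one serves the runtime; the two needs are cleanly separated.
When checking a call, the type checker looks at the definitions decorated with overload.
At runtime, only the definition without overload is used.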
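To make the precision argument concrete, here is a sketch comparing a union-only signature with an overloaded one. The names VOCAB, lookup_union and lookup are hypothetical and the vocabulary is made-up data; the comments describe what a type checker such as mypy would be expected to infer.

```python
from typing import Dict, List, Union, overload

# Hypothetical toy vocabulary, for illustration only.
VOCAB: Dict[int, str] = {0: "<pad>", 1: "hello", 2: "world"}


def lookup_union(ids: Union[int, List[int]]) -> Union[str, List[str]]:
    # Union-only signature: callers always get Union[str, List[str]].
    if isinstance(ids, int):
        return VOCAB[ids]
    return [VOCAB[i] for i in ids]


@overload
def lookup(ids: int) -> str: ...


@overload
def lookup(ids: List[int]) -> List[str]: ...


def lookup(ids: Union[int, List[int]]) -> Union[str, List[str]]:
    # Same runtime behaviour; only the declared signatures differ.
    return lookup_union(ids)


token = lookup(1)        # a type checker infers str here
print(token.upper())     # accepted without any narrowing

maybe = lookup_union(1)  # inferred as Union[str, List[str]]
# maybe.upper() would be flagged, since List[str] has no upper().
if isinstance(maybe, str):
    print(maybe.upper())
```

With the overloads, the caller passing an int gets a str back as far as the type checker is concerned; with the union-only version, the caller has to narrow the result by hand.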

Disadvantages

The only drawback I can see is having to write a little more code (though that is hardly a drawback).

References

https://docs.python.org/zh-cn/3/library/typing.html#typing.overload
