How to control padding of a Unicode string containing East Asian characters

楚瑞

2023-03-14

Question:

I have three UTF-8 strings:

hello, world
hello, 世界
hello, 世rld

I only want the first 10 ASCII-character widths of each, so that the brackets line up in one column:

[hello, wor]
[hello, 世 ]
[hello, 世r]

In the console:

width('世界')==width('worl')
width('世 ')==width('wor')  # one space after '世'

A Chinese character is three bytes in UTF-8, but when displayed in the console it is only 2 ASCII characters wide:

>>> bytes("hello, 世界", encoding='utf-8')
b'hello, \xe4\xb8\x96\xe7\x95\x8c'
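The mismatch between bytes, characters, and console columns can be sketched with the standard `unicodedata` module. The helper name `display_width` is hypothetical, and the rule it uses (East Asian Width classes `W` and `F` take two columns, everything else one) is an assumption that ignores combining marks, control characters, and ambiguous-width characters:

```python
import unicodedata

def display_width(text):
    """Approximate console columns: assume Wide/Fullwidth chars take 2, others 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)

s = 'hello, 世界'
print(len(s.encode('utf-8')))  # 13 UTF-8 bytes
print(len(s))                  # 9 code points, which is what format() counts
print(display_width(s))        # 11 approximate console columns
```

Under that assumption, `display_width('世界')` and `display_width('worl')` both come out as 4, matching the console behaviour described above.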

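Python's format() does not help when such characters are mixed in, because it pads and truncates by character count rather than by display width: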

>>> for s in ['[{0:<{1}.{1}}]'.format(s, 10) for s in ['hello, world', 'hello, 世界', 'hello, 世rld']]:
...    print(s)
...
[hello, wor]
[hello, 世界 ]
[hello, 世rl]

That is not very pretty.

So I wonder whether there is a standard way to pad UTF-8 strings like this?

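Answer:

When trying to line up ASCII text with Chinese in a fixed-width font, there is a set of full-width versions of the printable ASCII characters. Below I have made a translation table from ASCII to the full-width versions: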

    # coding: utf8

    # full width versions (SPACE is non-contiguous with ! through ~)
    SPACE = '\N{IDEOGRAPHIC SPACE}'
    EXCLA = '\N{FULLWIDTH EXCLAMATION MARK}'
    TILDE = '\N{FULLWIDTH TILDE}'

    # strings of ASCII and full-width characters (same order)
    # range() excludes its end point, so add 1 to include '~' and its full-width form
    west = ''.join(chr(i) for i in range(ord(' '), ord('~') + 1))
    east = SPACE + ''.join(chr(i) for i in range(ord(EXCLA), ord(TILDE) + 1))

    # build the translation table
    full = str.maketrans(west,east)

    data = '''\
    Butterfly (a song)
    Another song
    Support your lover (yet another song)
    Rooted seeds
    Cucurrucucu Palo whatever
    Between woodlands
    Blu ray
    In your eyes
    Chopin's farewell
    Journey to the West
    Deep in love
    Love the earth
    Time goes by
    Canon
    Serenade by Schubert
    Sweet lullaby
    '''

    # Replace the ASCII characters with full width, and create a song list.
    data = data.translate(full).rstrip().split('\n')

    # translate each printable line.
    print(' ----------Songs-----------'.translate(full))
    for i,song in enumerate(data):
        line = '|{:4}: {:20.20}|'.format(i+1,song)
        print(line.translate(full))
    print(' --------------------------'.translate(full))

Output:

　－－－－－－－－－－Ｓｏｎｇｓ－－－－－－－－－－－
｜　　　１：　Ｂｕｔｔｅｒｆｌｙ　（ａ　ｓｏｎｇ）　　｜
｜　　　２：　Ａｎｏｔｈｅｒ　ｓｏｎｇ　　　　　　　　｜
｜　　　３：　Ｓｕｐｐｏｒｔ　ｙｏｕｒ　ｌｏｖｅｒ　（｜
｜　　　４：　Ｒｏｏｔｅｄ　ｓｅｅｄｓ　　　　　　　　｜
｜　　　５：　Ｃｕｃｕｒｒｕｃｕｃｕ　Ｐａｌｏ　ｗｈａ｜
｜　　　６：　Ｂｅｔｗｅｅｎ　ｗｏｏｄｌａｎｄｓ　　　｜
｜　　　７：　Ｂｌｕ　ｒａｙ　　　　　　　　　　　　　｜
｜　　　８：　Ｉｎ　ｙｏｕｒ　ｅｙｅｓ　　　　　　　　｜
｜　　　９：　Ｃｈｏｐｉｎ＇ｓ　ｆａｒｅｗｅｌｌ　　　｜
｜　　１０：　Ｊｏｕｒｎｅｙ　ｔｏ　ｔｈｅ　Ｗｅｓｔ　｜
｜　　１１：　Ｄｅｅｐ　ｉｎ　ｌｏｖｅ　　　　　　　　｜
｜　　１２：　Ｌｏｖｅ　ｔｈｅ　ｅａｒｔｈ　　　　　　｜
｜　　１３：　Ｔｉｍｅ　ｇｏｅｓ　ｂｙ　　　　　　　　｜
｜　　１４：　Ｃａｎｏｎ　　　　　　　　　　　　　　　｜
｜　　１５：　Ｓｅｒｅｎａｄｅ　ｂｙ　Ｓｃｈｕｂｅｒｔ｜
｜　　１６：　Ｓｗｅｅｔ　ｌｕｌｌａｂｙ　　　　　　　｜
　－－－－－－－－－－－－－－－－－－－－－－－－－－

It's not overly pretty, but it lines up.
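If converting everything to full-width is not wanted, another option is to truncate and pad by display columns instead of by characters. The following is only a minimal sketch, not a standard-library feature: `char_width` and `fit_columns` are hypothetical helper names, and the code assumes that characters in East Asian Width classes `W` and `F` occupy two columns and all others one (combining marks and ambiguous-width characters are not handled).

```python
import unicodedata

def char_width(ch):
    # Assumption: Wide (W) and Fullwidth (F) characters take two console columns.
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1

def fit_columns(text, width):
    """Truncate text to at most `width` columns, then pad with ASCII spaces."""
    kept, used = [], 0
    for ch in text:
        w = char_width(ch)
        if used + w > width:
            break  # the next character would overflow the column budget
        kept.append(ch)
        used += w
    return ''.join(kept) + ' ' * (width - used)

for s in ['hello, world', 'hello, 世界', 'hello, 世rld']:
    print('[{}]'.format(fit_columns(s, 10)))
```

With the three strings from the question this prints `[hello, wor]`, `[hello, 世 ]` and `[hello, 世r]`. A wide character that would straddle the limit is dropped and replaced by a single space of padding, which is why the second line ends in a blank.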
