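The task is to count the words in several lines of English text, and also to count how many times each word occurs.
Note 1: Words are separated by spaces (one or more).
Note 2: Blank lines and lines containing only spaces are ignored.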
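As a small illustration of the two notes, Python's `str.split()` called with no argument already treats a run of spaces as one separator and returns an empty list for a blank or space-only line. The strings below are made up for the demonstration.

```python
# str.split() with no argument splits on runs of whitespace,
# so repeated spaces never produce empty strings.
words = "failure   is  probably".split()

# A line made only of spaces yields no words at all,
# so blank lines drop out without a special case.
blank = "     ".split()

# By contrast, split(" ") keeps an empty string for every extra space.
naive = "a  b".split(" ")

print(words, blank, naive)
```

This is why a solution that uses `split(" ")` needs an extra filtering step, while one that uses `split()` does not.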
Basic version:
Counting is case-sensitive, and the listed punctuation marks are not removed.
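Advanced version:
Before counting, remove the punctuation marks !.,:*? from the text. Note: "removing" here means replacing each such character with a single space.
Counting ignores letter case.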
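The advanced-version preprocessing could be sketched as follows. This is only one possible approach: it uses `str.maketrans` and `str.translate` to map every listed character to a space in one pass, and the names `REMOVE_CHARS`, `TO_SPACE` and `normalize` are hypothetical, chosen for this example.

```python
# Characters the advanced version treats as separators.
REMOVE_CHARS = "!.,:*?"

# Translation table mapping each listed character to one space.
TO_SPACE = str.maketrans(REMOVE_CHARS, " " * len(REMOVE_CHARS))

def normalize(line: str) -> list[str]:
    """Replace the punctuation with spaces, lowercase, and split into words."""
    return line.translate(TO_SPACE).lower().split()

print(normalize("background Because of: yOu?, then at"))
```

Characters outside the list are left alone, so a word such as hard-won keeps its hyphen.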
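Input: several lines of English text, terminated by a line of exclamation marks (the statement says !!!, while the samples and the solution below both use !!!!!).
Output: the number of words (as the samples show, this is the number of distinct words),
followed by the 10 most frequent words with their counts (sorted by count in descending order; words with the same count are sorted alphabetically in ascending order).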
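The ordering rule can be expressed as a single sort key. The sketch below uses a made-up `counts` dictionary purely to illustrate the rule; it is not taken from the samples.

```python
# Hypothetical word counts, only to illustrate the ordering rule.
counts = {"you": 3, "the": 4, "is": 2, "it": 3, "and": 2}

# Negating the count sorts it in descending order, while the word itself
# breaks ties in ascending alphabetical order.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Slicing never fails, even when fewer than 10 words exist.
for word, n in ranked[:10]:
    print(f"{word}={n}")
```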
Sample input 1:
failure is probably the fortification in your pole
it is like a peek your wallet as the thief when you
are thinking how to spend several hard-won lepta
when you are wondering whether new money it has laid
background because of you then at the heart of the
most lax alert and most low awareness and left it
godsend failed
!!!!!
Sample output 1:
46
the=4
it=3
you=3
and=2
are=2
is=2
most=2
of=2
when=2
your=2
(no trailing blank line)
Sample input 2:
Failure is probably The fortification in your pole!
It is like a peek your wallet as the thief when You
are thinking how to. spend several hard-won lepta.
when yoU are? wondering whether new money it has laid
background Because of: yOu?, then at the heart of the
Tom say: Who is the best? No one dare to say yes.
most lax alert and! most low awareness and* left it
godsend failed
!!!!!
(no trailing blank line)
Sample output 2:
54
the=5
is=3
it=3
you=3
and=2
are=2
most=2
of=2
say=2
to=2
(no trailing blank line)
# Helper that drops the empty strings produced by runs of spaces:
def HandleBlock(items):
    new_list = []
    for item in items:
        if item != "":
            new_list.append(item)
    return new_list
removechars = "!.,:*?"
total_list = []
while True:
    line = input()
    # A line of five exclamation marks ends the input
    if line == "!!!!!":
        break
    # Replace each of "!.,:*?" with a space
    for char in removechars:
        line = line.replace(char, ' ')
    # Split on spaces and drop the empty pieces left by repeated spaces;
    # blank or space-only lines simply contribute no words
    words = HandleBlock(line.split(" "))
    # Store every word in lowercase
    for item in words:
        total_list.append(item.lower())
# Record the number of occurrences
# Initialize the frequency table
statistical_table = {}
for item in total_list:
    statistical_table[item] = 0
# Count
for item in total_list:
    statistical_table[item] += 1
# First pass: sort by count, descending
temp_table = list(zip(statistical_table.values(), statistical_table.keys()))
ordered_table = sorted(temp_table, reverse=True)
# Second pass: within each group of equal counts, sort the words alphabetically
final_ordered_table = []
cut_list = []
# Guard against empty input, where there is no first entry to read
cur_times = ordered_table[0][0] if ordered_table else 0
for item in ordered_table:
    if item[0] == cur_times:
        cut_list.append((item[1], item[0]))
    else:
        ordered_cut_list = sorted(cut_list)
        final_ordered_table += ordered_cut_list
        # Start a new group for the next count
        cur_times = item[0]
        cut_list = []
        cut_list.append((item[1], item[0]))
# Flush the last group
ordered_cut_list = sorted(cut_list)
final_ordered_table += ordered_cut_list
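# Output
numb = len(final_ordered_table)
print(numb)
# Print at most 10 entries, so fewer than 10 distinct words does not raise IndexError
for i in range(min(10, numb)):
    print(f"{final_ordered_table[i][0]}={final_ordered_table[i][1]}")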
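The solution above covers only the advanced version. As a rough sketch of how both versions might share one routine, the function below takes the lines as a list (instead of reading `input()`) and an `advanced` flag. The names `count_words`, `REMOVE_CHARS` and `END_MARK` are hypothetical, and the two-pass ranking relies on Python's sort being stable, which is a different technique from the grouping loop used above.

```python
from collections import Counter

REMOVE_CHARS = "!.,:*?"
END_MARK = "!!!!!"  # terminator used by the samples

def count_words(lines, advanced=True):
    """Return (number of distinct words, up to 10 (word, count) pairs)."""
    counts = Counter()
    for line in lines:
        if line == END_MARK:
            break
        if advanced:
            # Advanced version: punctuation becomes spaces, case is ignored.
            for ch in REMOVE_CHARS:
                line = line.replace(ch, " ")
            line = line.lower()
        # split() skips runs of spaces and yields nothing for blank lines.
        counts.update(line.split())
    # Sort by word first, then by count descending. The second sort is
    # stable, so words with equal counts stay in alphabetical order.
    by_word = sorted(counts.items())
    ranked = sorted(by_word, key=lambda kv: kv[1], reverse=True)
    return len(counts), ranked[:10]

sample = ["Tom say: Who is the best? No one dare to say yes.", "", "!!!!!"]
total, top = count_words(sample)
print(total)
for word, n in top:
    print(f"{word}={n}")
```

Calling `count_words(sample, advanced=False)` gives the basic-version behaviour, where `say:` and `say` count as different words.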
The punctuation marks !.,:*? are handled with the string's built-in replace() method, which swaps each of them for a space.
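The sorted() function:
When sorting a list of tuples, the tuples are compared element by element: the first element is the primary key, and later elements are consulted only to break ties. sorted() sorts in ascending order by default; for descending order, pass reverse=True. Because reverse=True reverses the tie-breaking as well, the solution above needs a second pass to put words with equal counts back into ascending order.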
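A short demonstration of how tuples compare, using made-up (count, word) pairs:

```python
pairs = [(2, "is"), (3, "you"), (2, "and"), (3, "it")]

# Tuples are compared element by element: first the count, then the word.
ascending = sorted(pairs)

# reverse=True reverses the whole comparison, so words with equal counts
# also come out in descending order.
descending = sorted(pairs, reverse=True)

print(ascending)
print(descending)
```

In `descending`, "you" comes before "it" even though the task wants "it" first, which is the ordering problem the second pass has to correct.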
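The zip() function:
It can pair up a dictionary's contents in either (key, value) or (value, key) order, which makes it convenient to sort by key or by value respectively.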
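For illustration, here is how the two pairings look on a small made-up dictionary. The variable names are hypothetical.

```python
table = {"the": 4, "it": 3, "you": 3}

# (count, word) pairs: sorting these orders by count first.
by_count = list(zip(table.values(), table.keys()))

# (word, count) pairs: sorting these orders by word first.
# This is the same list that list(table.items()) would give.
by_word = list(zip(table.keys(), table.values()))

print(sorted(by_count))
print(sorted(by_word))
```

keys() and values() iterate in the same order as long as the dictionary is not modified in between, so the pairs always line up.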