Question:

How to efficiently remove consecutive duplicate words or phrases in a string [duplicate]

萧宏峻

2023-03-14

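I have a string that contains repeated phrases, or possibly just a single word that occurs several times in a row.

I have tried various approaches but could not find one that is better in terms of time and space.

Here are the approaches I tried:

groupby()
re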

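from itertools import groupby
import re
String = "what type of people were most likely to be able to be able to be able to be able to be 1.35 ?"
s1 = " ".join([k for k, v in groupby(String.replace("</Sent>", "").split())])  # collapse runs of identical adjacent words
s2 = re.sub(r'\b(.+)(\s+\1\b)+', r'\1', String)  # collapse a repeated group via a backreference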

Neither of them seems to work in my case.
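To illustrate why the groupby() attempt falls short, here is a minimal sketch. The helper name collapse_adjacent_words is hypothetical and not part of the original attempt. groupby() only merges runs of identical adjacent items, so it can collapse a word repeated back to back but leaves a repeated multi-word phrase untouched.

```python
from itertools import groupby

def collapse_adjacent_words(text):
    # groupby() yields one key per run of equal adjacent words
    return " ".join(word for word, _ in groupby(text.split()))

print(collapse_adjacent_words("to be be able"))          # the doubled "be" is merged
print(collapse_adjacent_words("to be able to be able"))  # the repeated phrase is left as-is
```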

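My expected result:

what type of people were most likely to be 1.35 ?

These are some of the posts I referred to:

Is there a way to remove duplicate and consecutive words/phrases in a string? - does not work
How can I remove duplicate words in a string with Python? - works partially, but I also need an optimal approach for large strings

Please do not mark my question as a duplicate of the posts above; I tried most of the implementations there and did not find an efficient solution.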

2 answers

闾丘京

2023-03-14

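I'm fairly sure this approach preserves word order on Python 3.7 and later, where dict is guaranteed to keep insertion order, but I'm not entirely sure about older versions.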

String = "what type of people were most likely to be able to be able to be able to be able to be 1.35 ?"
unique_words = dict.fromkeys(String.split())
print(' '.join(unique_words))
# Output: what type of people were most likely to be able 1.35 ?
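For interpreters older than 3.7, where the language does not guarantee dict ordering, the same idea can be sketched with collections.OrderedDict. The function name unique_in_order is an assumption made for illustration. Note that, like the dict.fromkeys version, this drops every later occurrence of a word, not only consecutive ones.

```python
from collections import OrderedDict

def unique_in_order(text):
    # OrderedDict.fromkeys keeps the first occurrence of each word, in order
    return " ".join(OrderedDict.fromkeys(text.split()))

sentence = "what type of people were most likely to be able to be able to be able to be able to be 1.35 ?"
print(unique_in_order(sentence))
# what type of people were most likely to be able 1.35 ?
```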

段曦

2023-03-14

I would go with this creative approach, which looks for duplicates of growing length:

text = "what type of people were most likely to be able to be able to be able to be able to be 1.35 ?"

def last_word_of(sequence, length):
    # return word number `length` (1-based) of a space-joined sequence
    return sequence.split(" ")[length - 1]

def combine_words(sequences, length):
    # grow every sequence by one word: append the last word of its right-hand
    # neighbour (an overlapping sequence), which is the next word in the original sentence
    if len(sequences) <= 1:
        return sequences, length + 1  # nothing left to combine
    combined = []
    for i in range(len(sequences) - 1):
        combined.append(sequences[i] + " " + last_word_of(sequences[i + 1], length))
    return combined, length + 1

def remove_duplicates(sequences, length):
    # two equal sequences that sit `length` positions apart mark a repeated piece of sentence
    i = 0
    while i < len(sequences) - length:
        if sequences[i] == sequences[i + length]:
            del sequences[i + 1:i + length + 1]  # remove the overlapping sequences
            i = 0  # the list changed, so look again for duplicates of the same length
        else:
            i += 1
    return sequences

# start from single words; every pass removes repeats of the current length,
# then builds the list of all sequences that are one word longer
sequences = text.split(" ")
word_length = 1
intermediate_output = False  # set to True to print the list after every pass

while len(sequences) > 1:
    sequences = remove_duplicates(sequences, word_length)
    sequences, word_length = combine_words(sequences, word_length)
    if intermediate_output:
        print(sequences)
        print(word_length)
output = sequences[0]  # one sequence remains, with repeats of every length removed
print(output)

This outputs the fluent sentence:

what type of people were most likely to be able to be 1.35 ?

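Even though this is not the desired output, I don't see how an algorithm would know to also remove the "to be" (length 2) that occurs three positions earlier, since the two occurrences are not adjacent.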
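The same idea can be sketched more compactly on a plain word list. This is not the code from the answer above but a minimal alternative, assuming phrases are at most max_phrase_len words long (the function name collapse_repeats and that parameter are hypothetical). For each phrase length it compares a block of words with the block right after it and deletes the second one when they are equal.

```python
def collapse_repeats(text, max_phrase_len=5):
    words = text.split()
    for n in range(1, max_phrase_len + 1):
        i = 0
        # compare the n-word block at i with the n-word block that follows it
        while i + 2 * n <= len(words):
            if words[i:i + n] == words[i + n:i + 2 * n]:
                del words[i + n:i + 2 * n]  # drop the repeated block, stay at i
            else:
                i += 1
    return " ".join(words)

sentence = "what type of people were most likely to be able to be able to be able to be able to be 1.35 ?"
print(collapse_repeats(sentence))
# what type of people were most likely to be able to be 1.35 ?
```

Like the answer above, this only collapses repeats that are directly adjacent, so it also stops at "to be able to be" rather than reaching the result the question asks for.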
