Question:

Algorithm to classify words for hangman difficulty levels as "Easy", "Medium", or "Hard"

于捷

2023-03-14

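What is a good algorithm to determine the "difficulty" of a word for a hangman game, so that the game can select words that match a specified difficulty level?

Difficulty would seem to be related to the number of guesses required, the relative frequency of the letters used (for example, words with many uncommon letters may be harder to guess), and possibly the length of the word.

There are also some subjective factors to (attempt to) compensate for, such as the likelihood that a word is in the player's vocabulary and can be recognised, which lets the player move from a guessing strategy based on letter frequencies alone to one based on the list of known matching words.

My attempt so far is below, in Ruby. Any suggestions on how to improve the categorisation?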

def classify_word(w)
  n = w.chars.to_a.uniq.length # Num. unique chars in w
  if n < 5 and w.length > 4
    return WordDifficulty::Easy
  end
  if n > w.length / 2
    return WordDifficulty::Hard
  else
    return WordDifficulty::Medium
  end
end

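I am writing a hangman game that I would like my children to play; I am rather too old to be attempting "homework", which may be why the question is receiving so many down votes... Words are drawn randomly from large word databases, which include many obscure words, and are then filtered by the difficulty level determined for each word.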

3 answers

孙阳舒

2023-03-14

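You can use the Monte Carlo method to estimate the difficulty of a word:

Simulate a game by guessing a random letter each time, weighted by the letter's frequency in your target language, and count how many guesses it takes your randomized player to arrive at the solution. Note that since each guess eliminates a letter, this process is finite, and it returns a number from 1 to 26, inclusive.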
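A minimal sketch of that simulation follows. The letter ordering and the rank-based weights are illustrative assumptions rather than measured frequencies, the names `simulate_game` and `estimate_difficulty` are hypothetical, and words are assumed to contain only lowercase a-z.

```python
import random

# Letters ordered from most to least common in English. The ordering and the
# rank-based weights are illustrative assumptions, not measured frequencies.
LETTERS = 'etaoinshrdlcumwfgypbvkjxqz'
WEIGHTS = [len(LETTERS) - i for i in range(len(LETTERS))]

def simulate_game(word, rng):
    """Play one game with weighted random guesses; return the number of guesses.

    Assumes 'word' contains only lowercase letters a-z.
    """
    remaining = dict(zip(LETTERS, WEIGHTS))  # letters not yet guessed
    needed = set(word)                       # letters still hidden
    guesses = 0
    while needed:
        letter = rng.choices(list(remaining),
                             weights=list(remaining.values()))[0]
        del remaining[letter]                # each letter is guessed at most once
        needed.discard(letter)
        guesses += 1
    return guesses

def estimate_difficulty(word, trials=1000, seed=0):
    """Average number of guesses over 'trials' simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(word, rng) for _ in range(trials)) / trials
```

A higher average suggests a harder word; counting only the wrong guesses instead of all guesses would be an easy variation.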

郁光熙

2023-03-14

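A really simple way would be to compute a score based on the lack of vowels in the word, the number of unique letters, and the commonness of each letter: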

letters = 'etaoinshrdlcumwfgypbvkjxqz'
vowels = set('aeiou')

def difficulty(word):
    unique = set(word)
    positions = sum(letters.index(c) for c in word)

    return len(word) * len(unique) * (7 - len(unique & vowels)) * positions

words = ['the', 'potato', 'school', 'egypt', 'floccinaucinihilipilification']

for word in words:
    print(difficulty(word), word)

And the output:

432 the
3360 potato
7200 school
7800 egypt
194271 floccinaucinihilipilification

You could then classify the words by score like this:

        score < 2000   # Easy
 2000 < score < 10000  # Medium
10000 < score          # Hard
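As a sketch, the thresholds above could be wrapped in a small helper. The function name `classify` and its parameters are hypothetical, and treating a score of exactly 2000 or 10000 as Medium is an assumption, since the ranges above leave those boundaries open.

```python
LETTERS = 'etaoinshrdlcumwfgypbvkjxqz'
VOWELS = set('aeiou')

def difficulty(word):
    """Same scoring formula as in the answer above."""
    unique = set(word)
    positions = sum(LETTERS.index(c) for c in word)
    return len(word) * len(unique) * (7 - len(unique & VOWELS)) * positions

def classify(word, easy_below=2000, hard_above=10000):
    """Map a score to a level (hypothetical helper).

    Scores of exactly 'easy_below' or 'hard_above' fall into 'Medium';
    that boundary handling is an assumption.
    """
    score = difficulty(word)
    if score < easy_below:
        return 'Easy'
    if score > hard_above:
        return 'Hard'
    return 'Medium'

for word in ['the', 'potato', 'egypt', 'floccinaucinihilipilification']:
    print(classify(word), word)
```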

章博耘

2023-03-14

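There is a way to approach this problem systematically: if you have an algorithm that plays hangman well, then you can take the difficulty of each word to be the number of wrong guesses your program would make when guessing that word.

There is an idea implicit in some of the other answers and comments that the optimal strategy for the solver is to base its decisions on the frequency of letters in English, or on the frequency of words in some corpus. This is a seductive idea, but it is not quite right. The solver does best if it accurately models the distribution of words chosen by the setter, and a human setter may well choose words for their rarity or for their avoidance of frequently used letters. For example, although E is the most frequently used letter in English, if the setter always chooses from the words JUGFUL, RHYTHM, SYZYGY, and ZYTHUM, then a perfect solver does not start by guessing E!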
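The four-word example can be checked with a few lines. This sketch assumes the setter picks uniformly from those four words, and the helper name `misses` is hypothetical; it simply counts the words in which a guessed letter would be wrong.

```python
from string import ascii_lowercase

def misses(guess, words):
    """Number of words in which 'guess' would be a wrong guess."""
    return sum(guess not in word for word in words)

# Assumption: the setter chooses uniformly from exactly these four words.
setter_words = ['jugful', 'rhythm', 'syzygy', 'zythum']

best = min(ascii_lowercase, key=lambda g: misses(g, setter_words))
print(best, misses(best, setter_words))   # the best first guess and its miss count
print('e', misses('e', setter_words))     # E is wrong for every one of these words
```

Against this setter, Y is wrong for only one word (JUGFUL), while E is wrong for all four.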

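The best approach to modelling the setter depends on the context, but I guess that some kind of Bayesian inductive inference would work well in a context where the solver plays many games against the same setter, or against a group of similar setters.

Here I'll outline a solver that is pretty good (but far from perfect). It models the setter as choosing words uniformly from a fixed dictionary. It's a greedy algorithm: at each stage it guesses the letter that minimizes the number of misses, that is, the number of words that do not contain the guess. For example, if no guesses have been made so far, and the possible words are DEED, DEAD and DARE, then: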

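if you guess D or E, there are no misses;
if you guess A, there is one miss (DEED);
if you guess R, there are two misses (DEED and DEAD);
if you guess any other letter, there are three misses.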

So either D or E is a good guess in this situation.
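The miss counts for this small example can be tabulated directly. This is only an illustration of the counting rule, grouping every letter of the alphabet by its cost; the variable names are hypothetical.

```python
from collections import defaultdict
from string import ascii_lowercase

def guess_cost(guess, words):
    """Number of candidate words that do not contain 'guess'."""
    return sum(guess not in word for word in words)

words = ['deed', 'dead', 'dare']

# Group all 26 letters by how many candidate words they would miss.
by_cost = defaultdict(list)
for letter in ascii_lowercase:
    by_cost[guess_cost(letter, words)].append(letter)

for cost in sorted(by_cost):
    print(cost, ''.join(by_cost[cost]))
```

Only D and E land in the zero-cost group, A costs one miss, R costs two, and the remaining 22 letters cost three.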

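(Thanks to Colonel Panic in the comments for pointing out that correct guesses are free in hangman; I completely forgot this in my first attempt!)

Here is an implementation of this algorithm in Python: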

from collections import defaultdict
from string import ascii_lowercase

def partition(guess, words):
    """Apply the single letter 'guess' to the sequence 'words' and return
    a dictionary mapping the pattern of occurrences of 'guess' in a
    word to the list of words with that pattern.

    >>> words = 'deed even eyes mews peep star'.split()
    >>> sorted(list(partition('e', words).items()))
    [(0, ['star']), (2, ['mews']), (5, ['even', 'eyes']), (6, ['deed', 'peep'])]

    """
    result = defaultdict(list)
    for word in words:
        key = sum(1 << i for i, letter in enumerate(word) if letter == guess)
        result[key].append(word)
    return result

def guess_cost(guess, words):
    """Return the cost of a guess, namely the number of words that don't
    contain the guess.

    >>> words = 'deed even eyes mews peep star'.split()
    >>> guess_cost('e', words)
    1
    >>> guess_cost('s', words)
    3

    """
    return sum(guess not in word for word in words)

def word_guesses(words, wrong = 0, letters = ''):
    """Given the collection 'words' that match all letters guessed so far,
    generate tuples (wrong, nguesses, word, guesses) where
    'word' is the word that was guessed;
    'guesses' is the sequence of letters guessed;
    'wrong' is the number of these guesses that were wrong;
    'nguesses' is len(guesses).

    >>> words = 'deed even eyes heel mere peep star'.split()
    >>> from pprint import pprint
    >>> pprint(sorted(word_guesses(words)))
    [(0, 1, 'mere', 'e'),
     (0, 2, 'deed', 'ed'),
     (0, 2, 'even', 'en'),
     (1, 1, 'star', 'e'),
     (1, 2, 'eyes', 'en'),
     (1, 3, 'heel', 'edh'),
     (2, 3, 'peep', 'edh')]

    """
    if len(words) == 1:
        yield wrong, len(letters), words[0], letters
        return
    best_guess = min((g for g in ascii_lowercase if g not in letters),
                     key = lambda g:guess_cost(g, words))
    best_partition = partition(best_guess, words)
    letters += best_guess
    for pattern, words in best_partition.items():
        for guess in word_guesses(words, wrong + (pattern == 0), letters):
            yield guess

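Using this strategy it is possible to evaluate the difficulty of guessing each word in a collection. Here I consider the six-letter words in my system dictionary: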

>>> words = [w.strip() for w in open('/usr/share/dict/words') if w.lower() == w]
>>> six_letter_words = set(w for w in words if len(w) == 6)
>>> len(six_letter_words)
15066
>>> results = sorted(word_guesses(six_letter_words))

The easiest words to guess in this dictionary (together with the sequence of guesses the solver needs to find them) are as follows:

>>> from pprint import pprint
>>> pprint(results[:10])
[(0, 1, 'eelery', 'e'),
 (0, 2, 'coneen', 'en'),
 (0, 2, 'earlet', 'er'),
 (0, 2, 'earner', 'er'),
 (0, 2, 'edgrew', 'er'),
 (0, 2, 'eerily', 'el'),
 (0, 2, 'egence', 'eg'),
 (0, 2, 'eleven', 'el'),
 (0, 2, 'enaena', 'en'),
 (0, 2, 'ennead', 'en')]

And the hardest words are these:

>>> pprint(results[-10:])
[(12, 16, 'buzzer', 'eraoiutlnsmdbcfg'),
 (12, 16, 'cuffer', 'eraoiutlnsmdbpgc'),
 (12, 16, 'jugger', 'eraoiutlnsmdbpgh'),
 (12, 16, 'pugger', 'eraoiutlnsmdbpcf'),
 (12, 16, 'suddle', 'eaioulbrdcfghmnp'),
 (12, 16, 'yucker', 'eraoiutlnsmdbpgc'),
 (12, 16, 'zipper', 'eraoinltsdgcbpjk'),
 (12, 17, 'tuzzle', 'eaioulbrdcgszmnpt'),
 (13, 16, 'wuzzer', 'eraoiutlnsmdbpgc'),
 (13, 17, 'wuzzle', 'eaioulbrdcgszmnpt')]

The reason these are hard is that after you have guessed -UZZLE, you still have seven possibilities left:

>>> ' '.join(sorted(w for w in six_letter_words if w.endswith('uzzle')))
'buzzle guzzle muzzle nuzzle puzzle tuzzle wuzzle'

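Of course, when preparing wordlists for your children you wouldn't start with your computer's system dictionary; you'd start with a list of words that you think they're likely to know. For example, you might have a look at Wiktionary's lists of the most frequently used words in various English corpora.

For example, among the 1,700 six-letter words in the 10,000 most common words in Project Gutenberg as of 2006, the ten most difficult are these: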

[(6, 10, 'losing', 'eaoignvwch'),
 (6, 10, 'monkey', 'erdstaoync'),
 (6, 10, 'pulled', 'erdaioupfh'),
 (6, 10, 'slaves', 'erdsacthkl'),
 (6, 10, 'supper', 'eriaoubsfm'),
 (6, 11, 'hunter', 'eriaoubshng'),
 (6, 11, 'nought', 'eaoiustghbf'),
 (6, 11, 'wounds', 'eaoiusdnhpr'),
 (6, 11, 'wright', 'eaoithglrbf'),
 (7, 10, 'soames', 'erdsacthkl')]

(Soames Forsyte is a character in the Forsyte Saga by John Galsworthy; the wordlist has been converted to lower case, so it wasn't possible for me to quickly remove proper names.)
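To get back to the three levels asked for in the question, one option is to bucket words by the solver's wrong-guess count. The sketch below assumes result tuples of the form (wrong, nguesses, word, guesses) as produced above; the function name `bucket` and the cut-offs of 2 and 6 wrong guesses are hypothetical and would need tuning for a real wordlist.

```python
def bucket(results, easy_max=2, medium_max=6):
    """Map each word to a level by its wrong-guess count.

    'results' holds (wrong, nguesses, word, guesses) tuples. The default
    cut-offs are assumptions chosen for illustration only.
    """
    levels = {}
    for wrong, _nguesses, word, _guesses in results:
        if wrong <= easy_max:
            levels[word] = 'Easy'
        elif wrong <= medium_max:
            levels[word] = 'Medium'
        else:
            levels[word] = 'Hard'
    return levels

# A few tuples copied from the listings above.
sample = [
    (0, 2, 'eleven', 'el'),
    (6, 10, 'monkey', 'erdstaoync'),
    (13, 17, 'wuzzle', 'eaioulbrdcgszmnpt'),
]
print(bucket(sample))
```

Choosing the cut-offs from percentiles of the wrong-guess counts, rather than fixed numbers, would keep the three levels roughly balanced across different wordlists.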
