AI 玩2048

储国发

2023-12-01

最近写了个AI 玩2048 的小游戏，简单记录一下过程

项目地址 : https://github.com/DylanFrank/Learn/tree/master/CSMM101AI/project/2048-game

核心代码是这一块儿 :

playerAI_3.py

其他代码都是系统给的

核心逻辑

最开始我是想写一个alpha-beta剪枝来着，但是效果不好，后面我改了一下逻辑，变成期望最大化computer 是在任意一个格子上随机放一格， $90\%$ 的概率是 $2$ , $10\%$ 的概率是 $4$ , 然后平均一下整体的效用 (utility), 关于这个效用评估函数，后面再说。人就是尝试各种方向每步找一个期望效用最大的方向滑动就好。

    def minimize(self, grid, dep):
        # self.dep +=1
        cells = grid.getAvailableCells()

        if self.terminal_test(cells, dep):
            return self.estimateFun(grid)

        utility = 0

        for cell in cells:
            child = grid.clone()  # grid is not change

            # for val in tileVal:
            child.setCellValue(cell, tileVal[0])
            u1 = self.maximize(child, dep - 1)[1]

            child.setCellValue(cell, tileVal[1])
            u2 = self.maximize(child, dep - 1)[1]

            utility += prob * u1 + (1 - prob) * u2

        return utility / len(cells)

    def maximize(self, grid, dep):
        # self.dep +=1
        moves = grid.getAvailableMoves()

        if self.terminal_test(moves, dep):
            return (None, self.estimateFun(grid))

        max_utility = -1
        mov = None
        shuffle(moves)
        for m in moves:

            child = grid.clone()
            if not child.move(m):
                continue

            utility = self.minimize(child, dep - 1)
            # print("minimize utility = ", utility)
            if utility > max_utility:
                max_utility = utility
                mov = m

        return (mov, max_utility)

效用评估函数

这里我用了这样一些特征:

第一个特征很简单，就是为了让滑块合并，你得对每个不同的值赋一个权重，而且你要让合并后的权重会增加，这样，AI才会选择去合并，设值为 $v$ 的快的权重是 $p^k$ , 两个这样的滑块合并为 $2 v$ 的权重就为 $p^{k+1}$ , 那么就有 :
$p^{k+1} > 2p^k$
所以至少要让 $p > 2$ 才行，函数见 estimate_score, 至于后面那段最大值大于 1024的时候权重有所增加是应为我观察到系统到了1024后合并有些缓慢，很多时候明明能达到2048取得胜利可就是不采取合并的举动，所以我增加了这个feature的咱的比重
penalty，这个想法很简单，尽量放值相近的在一起，所以给了一个penalty惩罚，相邻如果差距大就会得到一个大的惩罚。，同时为了让其快速合并，评估局面如果两值相等，会减少惩罚，这样在搜索时间不够的时候能很快采取一个快的方向
这个feature是因为室友告诉我，将大的快往顶上挪更容易得到胜利，所以加了这个featrue。同时我观察到由于 $2, 4$ 会是电脑随机给的结果，有时候AI会主动让位将2,4放在顶部获得巨大feature，所以我去除了这个 $2, 4$ 的收益

# some helper function

# some selected para

max_power = 20

weight = [2.5 ** 5] + [2.5 ** i for i in range(max_power)]
# weight matrix of position
weight_mat = [[13, 9, 6, 4],
              [9, 6, 4, 2],
              [6, 4, 2, 1],
              [4, 2, 1, 0],
              ]

# estimate function
def heuristics_fun(grid):
    return feature2(grid) - penalty(grid) + estimate_score(grid)


def estimate_score(grid, weight=weight):
    weight[1] = weight[2] = 0
    ret = 0
    max_v = 0
    for i in range(grid_size):
        for j in range(grid_size):
            idx = int(math.log2(grid.getCellValue((i, j)) + 0.0000001) + 0.5)
            if idx < 0:
                idx = 0
            if idx > max_v:
                max_v = idx
            ret += weight[idx]

    if idx >= 10:
        ret += (1 << idx)*idx/6
        ret =ret * idx /5
    return ret


def feature2(grid):
    ret = 0
    for i in range(grid_size):
        for j in range(grid_size):
            val = grid.getCellValue((i, j))
            if val > 4:
                ret += weight_mat[i][j] * val
    return ret


def penalty(grid):
    ret = 0
    for i in range(grid_size):
        for j in range(grid_size):

            cur_pos = (i, j)
            val = grid.getCellValue(cur_pos)
            for dir in directionVectors:
                pos = (cur_pos[0] + dir[0], cur_pos[1] + dir[1])

                if grid.crossBound(pos):
                    continue
                neibor_val = grid.getCellValue(pos)

                if neibor_val == val:
                    ret -= val
                ret += abs(val - neibor_val)

    return ret

实验结果

每步最多 $0.2 s e c$ ,

随机测试10次结果是 $20-60\% 2048, 其他1024$

其实系统还可以改进，1是各个feature 的参数，而是将速度提上来，因为运行速度太慢，很多节点根本没有达到最大搜索深度 $16$ 步，提升速度的方式我知道的有将整个 grid按位压缩成一个 64bit的整数，这样会更快。

最后完整代码见 github

AI 玩2048

核心逻辑

效用评估函数

实验结果

相关阅读

相关文章

相关问答

相关文档