Computers can tell what will matter (slightly) better than humans can
As 2019 draws to a close, prepare for endless roundups of the year’s most important news stories. But few of those stories may be remembered by 2039: new research shows the difficulty of predicting which events will make the history books.
Philosopher Arthur Danto argued in 1965 that even the most informed person, an “ideal chronicler,” cannot judge a recent event’s ultimate significance because it depends on chain reactions that have not happened yet. Duncan Watts, a computational social scientist at the University of Pennsylvania, had long wanted to test Danto’s idea. He got his chance when Columbia University historian Matthew Connelly suggested analyzing a set of two million declassified State Department cables sent between 1973 and 1979, along with a compendium of the 0.1 percent of them that turned out to be the most historically important (compiled by historians decades after their transmission).
Connelly, Watts and their colleagues first scored each cable’s “perceived contemporaneous importance” (PCI), based on metadata such as how urgent or secret it had been rated. This score corresponded only weakly with inclusion in the later compendium, they reported in September in Nature Human Behaviour: the highest-scoring cables were only four percentage points more likely to be included than the lowest-scoring ones. The most common prediction errors were false positives: cables that got high scores but later proved unimportant. “I do think there’s a kind of narcissism of the present,” Connelly says. “I’ve been struck by how many times sports fans say, ‘That’s one for the history books.’ ”
Next, Watts says, to approximate an ideal chronicler, the scientists decided to “build the beefiest, fanciest machine-learning model we could and throw everything into it: all the metadata, all the text.” The resulting AI algorithm significantly outperformed humans’ contemporaneous judgment. In one statistical measure of its ability to pick out cables later deemed significant, where 1 denotes no incorrect inclusions or exclusions, it scored 0.14, whereas the PCI scored 0.05. Although the algorithm’s performance was far from perfect, the researchers suggest that such an “artificial archivist” could help to narrow the field of events to highlight for posterity. When tuned for this purpose, their model weeded out 96 percent of the cables while retaining 80 percent of those that wound up in the compendium.
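To get a feel for what that filtering step means in absolute numbers, here is a rough back-of-the-envelope sketch in Python. It treats the article's rounded figures (two million cables, a 0.1 percent compendium, 96 percent discarded, 80 percent retained) as if they were exact, which is an assumption. The function name `triage_summary` and its parameters are made up for illustration and do not come from the study.

```python
from fractions import Fraction


def triage_summary(total, important_share, discarded_share, retained_share):
    """Turn the article's rounded percentages into approximate counts.

    All inputs are assumptions taken from the rounded figures in the text;
    the real study's exact counts may differ.
    """
    important = total * important_share        # cables in the compendium
    kept = total * (1 - discarded_share)       # cables left after filtering
    hits = important * retained_share          # compendium cables that survive
    precision = hits / kept                    # share of kept cables that matter
    return {
        "important": int(important),
        "kept": int(kept),
        "hits": int(hits),
        "missed": int(important - hits),
        "precision": float(precision),
        # How much denser in important cables the kept pile is
        # compared with the full, unfiltered set.
        "enrichment": float(precision / important_share),
    }


summary = triage_summary(
    total=2_000_000,
    important_share=Fraction(1, 1000),    # 0.1 percent
    discarded_share=Fraction(96, 100),    # 96 percent weeded out
    retained_share=Fraction(80, 100),     # 80 percent of the compendium kept
)
print(summary)
```

Under these assumptions, a reader would face roughly 80,000 cables instead of two million, and about 1,600 of the roughly 2,000 compendium cables would still be among them. Only about one kept cable in 50 would be historically important, but that is about 20 times the density of the unfiltered set, which is the sense in which such a tool narrows the field rather than making the final call.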
Emily Erikson, a sociologist at Yale University, who was not involved in the new research, says that despite its use of imperfect data—compendium inclusion was up to the subjective judgment of a few historians, for example—the study offers a practical tool and addresses Danto’s hypothesis. “To see a machine-learning empirical test of this conceptual puzzle is really exciting,” she says, “and just kind of fun to think through.”
—Matthew Hutson