计算机科学的三个错误观念

优质
小牛编辑
141浏览
2023-12-01

Not to rain on everybody's parade, but there are three important ideas from computer science which are, frankly, wrong, and people are starting to notice. Ignore them at your peril.

不是想毁了每个人的三观,不过坦白说计算机科学领域有三种错误的观念,人们已经开始注意到这点。忽视这些必然承受风险。

I'm sure there are more, but these are the three biggies that have been driving me to distraction:

1 . The difficult part about searching is finding enough results, 2 . Anti-aliased text looks better, and 3 . Network software should make resources on the network behave just like local resources.

肯定还有其他的,不过这三个是让我分心的最大三个:

1 . 搜索困难的部分是要发现足够多的结果集 2 . 反锯齿的文本看起来更漂亮 3 . 网络软件应该让网络资源看起来就像在本地一样

Well, all I can say is,

1 . Wrong, 2 . Wrong, 3 . WRONG!

Let us take a quick tour.

评论就是:

1 . 错 2 . 错 3 . 错误至极!

Searching

搜索

Most of the academic work on searching is positively obsessed with problems like "what happens if you search for 'car', and the document you want says 'automobile'".

Indeed there is an awful lot of academic research into concepts likestemming, in which the word you searched for is de-conjugated, so that searching for "searching" also finds documents containing the word "searched" or "sought".

So when the big Internet search engines like Altavista first came out, they bragged about how they found zillions of results. An Altavista search for Joel on Software yields 1,033,555 pages. This is, of course, useless. The known Internet contains maybe a billion pages. By reducing the search from one billion to one million pages, Altavista has done absolutely nothing for me.

The real problem in searching is how to sort the results. In defense of the computer scientists, this is something nobody even noticed until they starting indexing gigantic corpora the size of the Internet.

学术派对于搜索的研究过分执着于这样的问题:“如果你想搜的是‘汽车’,而返回的文件却关于‘轿车’”。

实际上有大量的学术研究对本源进行了研究,所谓本源就是你输入的单词会被去形态变化,例如搜索“正在搜索 ”也会返回包含“搜索过”的文档

所以当大的因特网搜索引擎如Altavisa首先推出的时候,他们吹嘘自己是如何发现数以亿计的结果的。 如果你用Altavisa搜索Joel谈软件会有1,033,555个结果页面。 当然这没有任何意义。 因特网已知共有才越10亿个页面 。 如果说要把搜索结果集从10亿精简到1百万,那么AltaVisa显然没干任何有意义的事情。

But somebody noticed. Larry Page and Sergey Brin over at Googlerealized that ranking the pages in the right order was more important than grabbing every possible page. Their PageRank algorithm is a great way to sort the zillions of results so that the one you want is probably in the top ten. Indeed, search for Joel on Software on Googleand you'll see that it comes up first. On Altavista, it's not even on the first five pages, after which I gave up looking for it.

不过有的人注意到了。 LarryPage和SergeyBrin的Google注意到了把结果页面集按照正确的顺序排列要比抓取每一个相关的页面要重要的多。 他们的PageRank算法是一种排序数以亿计结果的伟大方法,这样你也许就只要看一下前10个结果就好了。 事实上,用Google搜索Joel谈软件,你看到的第一个页面就是了。而在AltaVisa上,网站甚至都不在前五个页面里,所以从那以后我就放弃它了。

Anti-aliased text

反锯齿文本

Antialiasing was invented way back in 1972 at the Architecture Machine Group of MIT, which was later incorporated into the famous Media Lab. The idea is that if you have a color display that is low resolution, you might as well use shades of grey to create the "illusion" of resolution. Here's how that looks:

反锯齿技术早在1972年就由MIT的机器架构组发明了, 该组后来并入了著名的媒体实验室。 中心思想是如果你有一个低分辨率的彩色显示器,你可以用灰色的阴影来创建高分辨率的“幻觉”。 看起来大概就像这样:

Notice that the normal text on the left is nice and sharp, while the antialiased text on the right appears to be blurred on the edges. If you squint or step back a little bit, the normal text has weird "steps" due to the limited resolution of a computer display. But the anti-aliased text looks smoother and more pleasant.

So this is why everybody got excited about anti-aliasing. It's everywhere, now. Microsoft Windows even includes a checkbox to turn it on for all text in the system.

可以看到左边的正常字体通常尖锐而漂亮, 而右边的反锯齿过的字体则在边界上看起来比较模糊。 如果你斜着或者退后一点,因为低分辨率的缘故,正常的字体看起来像有一些奇怪的“阶梯”一样。而反锯齿的文字看起来则非常平滑而更舒服。

这就是为什么每个人都看到反锯齿都很激动。 到处都是,现在微软的Windows操作系统甚至包含了一个单选框来激活整个系统的反锯齿选项。

The problem? If you try to read a paragraph of antialiased text, it just looks blurry. There's nothing I can do about it, it's the truth. Compare these two paragraphs:

问题呢?如果想阅读一段反锯齿的文章,反正看起来就是很模糊。不是说我要做什么,这就是事实。 比较下面两段文字:

The paragraph on the left is not antialiased; the one on the right was antialiased using Corel PHOTO-PAINT. Frankly, antialiased text just looks bad.

左边的段落并没有反锯齿; 而右边的使用了Corel软件进行了反锯齿。 老实说反锯齿的文字看起来反而更糟糕。

Somebody finally noticed this: the Microsoft Typography group. They created several excellent fonts like Georgia and Verdana which are "designed for easy screen readability." Basically, instead of creating a high-resolution font and then trying to hammer it into the pixel grid, they finally accepted the pixel grid as a "given" and designed a font that fits neatly into it. Somebody didn't notice this: the Microsoft Reader group, which is using a form of antialiasing they call "ClearType" designed for color LCD screens, which, I'm sorry, still looks blurry, even on a color LCD screen.

(Before I get lots of irate responses for the graphics professionals among my readers, I should mention that anti-aliasing is still a great technique for two things: headlines and logos, where the overall appearance is more important than the sustained readability; and pictures. Antialiasing is a great way to scale photographic images to smaller sizes.)

终于有人还是注意到了这一点:微软字体工作组。 他们创建过几款超级棒的字体,如Georgia和Verdana。这些字体都是为了让屏幕阅读更轻松。 基本上,他们最后接受了给定大小的像素网格然后在该网格里进行字体设计,而不是设计高分辨率的字体最后再把字体塞进网格里。 有些人则没有注意到这一点: 微软读者工作组, 他们使用一种为彩色LCD屏幕设计的叫做“ClearType”的反锯齿技术, 很抱歉,我觉得看起来还是很模糊,哪怕是在彩色的LCD屏幕上。

Network Transparency

网络透明

Ever since the first networks, the "holy grail" of networking computing has been to provide a programming interface in which you can access remote resources the same way as you access local resources. The network becomes "transparent".

One example of network transparency is the famous RPC (remote procedure call), a system designed so that you can call procedures (subroutines) running on another computer on the network exactly as if they were running on the local computer. An awful lot of energy went into this. Another example, built on top of RPC, is Microsoft's Distributed COM (DCOM), in which you can access objects running on another computer as if they were on the current computer.

自从有了网络, 网络计算的“圣杯”就是要提供这样的一款编程接口,通过这种接口,你访问网络资源就像访问本地资源一样。 网络看起来就像“透明”(不存在)的一样。

网络透明的最著名的一个例子就是RPC(远程过程调用),这个设计的系统能够让你调用运行在另一台机器上的程序(子程序)就好像他们运行在本地一样。 大量的精力被花在了这上面, 另一个例子,就是基于RPC的微软的分布式COM(DCOM),通过这项技术你可以访问运行在另一台计算机上的对象,就好像这些对象运行在本地一样。

Sounds logical, right?

Wrong.

听起来很对,不是么?

错!

There are three very major differences between accessing resources on another machine and accessing resources on the local machine:

1 . Availability, 2 . Latency, and 3 . Reliability.

When you access another machine, there's a good chance that machine will not be available, or the network won't be available. And the speed of the network means that it's likely that the request will take a while: you might be running over a modem at 28.8kbps. Or the other machine might crash, or the network connection might go away while you are talking to the other machine (when the cat trips over the phone cord).

访问另一台机器上的资源和访问本地的资源有着三大本质的区别:

1 . 可用性 2 . 延迟 3 . 可靠性

当你访问另一台机器,很有可能那台机器就不可用,或者网络不可用。 还有网络的速度情况决定了请求可能要持续一段时间:你可能通过一台28.8K的猫上网。或者那台机器刚刚宕机了, 或者你在连接到另一台机器的时候网络连接断掉了 ( 猫踩到了电话线)

Any reliable software that uses the network absolutely must take this into account. Using programming interfaces that hide all this stuff from you is a great way to make a lousy software program.

所有使用网络的可靠软件都必须考虑到这些情况。 使用隐藏这些细节的编程接口只会让你更容易做出很差劲的软件。

A quick example: suppose I've got some software that needs to copy a file from one computer to another. On the Windows platform, the old "transparent" way to do this is to call the usual CopyFile method, using UNC names for the files such as \SERVER\SHARE\Filename.

快速的举个例子:我有个软件要把文件从一台电脑拷到另一台电脑。 在Windows平台上,老式的“透明”方法就是调用CopyFile方法拷贝用UNC命名的文件,如\SERVER\SHARE\Filename

If all is well with the network, this works nicely. But if the file is a megabyte long, and the network is being accessed over a modem, all kinds of things go wrong. The entire application freezes while a megabyte file is transferred. There is no way to make a progress indicator, because when CopyFile was invented, it was assumed that it would always be "fast". There is no way to resume the transfer if the phone connection is lost.

如果网络一切正常,那么拷贝也会非常顺利。 但是如果文件有1MB大,而访问的网络是一个拨号网络,则各种各样的事情都会出错: 在文件传输的时候整个程序都会卡住。 没有办法创建一个进度条显示进度,因为当初CopyFile实现的时候,就是假设这个过程会很快完成。 如果电话连接挂断的话根本没有办法断点续传。

Realistically, if you want to transfer a file over a network, it's better to use an API like FtpOpenFile and its related functions. No, it's not the same as copying a file locally, and it's harder to use, but this function was built with the knowledge that network programming is different than local programming, and it provides hooks to make a progress indicator, to fail gracefully if the network is unavailable or becomes unavailable, and to operate asynchronously.

Conclusion: the next time someone tries to sell you a programming product that lets you access network resources the same was as you access local resources, run full speed in the opposite direction.

实际一点,如果你要通过网络传输一个文件,最好使用像FtpOpenFile这样的编程接口。 不,它肯定不会像从本地拷贝文件一样,它用起来更复杂, 但是这个API接口的设计理念就是网络编程和本地编程是不一样的, 而且它提供了各种各样的回调让你方便的定制进度条, 如果网络不可用或变得不可用的时候能够优雅的退出,异步操作。

结论:下次有人尝试向你推销能够像访问本地资源一样访问远程资源的编程产品的时候,全速逃离。


后记:译者一点的讨论

我想Joel的这种质疑精神肯定是可取的,2000年的时候Joel就极其睿智的见证了Google的崛起无疑证明了他的眼光。 直至现在信息爆炸的结果从来就不是找不到相关的内容,而是找到的内容并不相关。

关于反锯齿的讨论,就今天来说,分辨率早已不是2000年的那个水平, Joel的辩解虽然没错,但是难免还是有他的时代局限性, 现在ClearType的字体已经几乎成了字体的标准,这跟显示设备分辨率的极大改观密不可分。 再者抗锯齿的主要应用还是在图形的显示上。

关于网络访问透明性的讨论: 就一般意义而言,如果程序员对网络资源编程,那么Joel的争辩也是完全正确的, 程序员的工作当然不是隐藏这些细节, 但是反过来说:这种网络设备透明性是不是完全不可取呢? UNIX的那种统一模型是不是完全不可取呢? 也未必,还是分具体的应用场合而言。