git bisect-lk2009

优质
小牛编辑
130浏览
2023-12-01

摘要

“git bisect”使软件用户和开发人员能够轻松找到引入回归的提交。我们将说明为什么有很好的工具来对抗回归很重要。我们描述了“git bisect”如何从外部工作以及内部使用的算法。然后我们解释如何利用“git bisect”来改进当前的实践。我们将讨论“git bisect”在将来如何改进。

“git bisect”介绍

Git 是由 Linus Torvalds 创建并由 Junio Hamano 维护的分布式版本控制系统(DVCS)。

在 Git 中,就像许多其他版本控制系统(VCS)一样,系统管理的数据的不同状态称为提交。而且,由于VCS主要用于管理软件源代码,因此在某些提交中引入了软件中行为的“有趣”更改。

实际上,人们对引入“坏”行为的提交特别感兴趣,称为 bug 或回归。他们对这些提交感兴趣,因为提交(希望)包含一小部分源代码更改。当你只需要检查一小部分变化时,比当你不知道放在哪里时更容易理解和正确地解决问题。

因此,为了帮助人们找到引入“不良”行为的提交,发明了“git bisect”命令集。当然,在“git bisect”的说法中,承认有趣行为出现的地方称为“坏”承诺,而其他承诺则称为“良好”承诺。而引入我们感兴趣的行为的提交被称为“第一次错误提交”。请注意,在我们正在搜索的提交空间中可能会有多个“第一个错误提交”。

所以“git bisect”旨在帮助找到“第一个错误的提交”。为了尽可能高效,它会尝试执行二分查找。

Fighting regressions overview(战斗回归概述)

回归:一个大问题

回归是软件行业的一个大问题。但是很难在这个说法背后加入一些真实的数字。

一般来说,有一些关于错误的数字,比如2002年的 NIST 研究[1]说:

根据美国商务部国家标准学会委托的一项新发布的研究显示,软件错误或错误非常普遍,因此对美国经济造成的损失每年约为595亿美元,约占国内生产总值的0.6%和技术(NIST)。在国家一级,超过一半的费用由软件用户承担,其余由软件开发商/供应商承担。该研究还发现,尽管所有错误都无法消除,但可以通过改进的测试基础设施消除超过三分之一的这些成本或估计的22.2亿美元成本,从而能够更早,更有效地识别和消除软件缺陷。这些节省与找到更接近引入它们的开发阶段的错误百分比(但不是100%)相关。目前,在开发过程或售后软件使用期间,直到“下游”才发现超过一半的错误。

接着:

软件开发人员已经花费了大约80%的开发成本来识别和纠正缺陷,而除软件之外的其他任何类型的产品都很少出现如此高的错误。

最终得出的结论是:

通向更高软件质量的途径是显着改进软件测试。

还有其他估计说,软件相关成本的80%是关于维护[2]。

尽管如此,根据维基百科[3]:

对维护的一种普遍看法是,它只是修复错误。然而,多年来的研究和调查表明,大部分超过80%的维护工作都用于非纠正措施(Pigosky,1997)。用户提交的问题报告实际上是对系统功能的增强,这种看法延续下去。

但是我们可以猜测,对现有软件的改进是非常昂贵的,因为你必须留意回归。至少这会使上述研究相互一致。

当然,某种软件是开发出来的,然后在一段时间内使用而没有多少改进,然后最终被扔掉。当然,在这种情况下,回归可能不是一个大问题。但另一方面,很多人在数年甚至数十年不断开发和维护大量软件。而且由于经常有很多人依赖(有时是批判地)这样的软件,所以回归是一个非常大的问题。

一个这样的软件是 Linux 内核。如果我们看看 Linux 内核,我们可以看到花了很多时间和精力来应对回归。发布周期以2周的合并窗口开始。然后标记第一个版本候选版本(rc)版本。之后,在最终版本发布之前,大约会有7或8个 rc 版本,每个版本之间会出现大约一周的时间。

第一次 rc 发布和最终发布之间的时间应该被用来测试 rc 版本和防止bug,尤其是回归。这一次超过发布周期的80%。但这还没有结束,因为它在发布之后当然会继续。

然后这就是 Ingo Molnar(一位知名的Linux内核开发人员)对他使用 git bisect 的说法:

我在合并窗口期间最积极地使用它(当很多树在上游合并并且 bug 流入量最高时) - 是的,我曾经有过多次使用它的情况。我的平均一天大概是一次。

因此,开发人员一直在进行回归测试,事实上,众所周知应尽快修复错误,以便尽快找到漏洞。这就是为什么有很好的工具来达到这个目的。

其他工具来对抗回归

那么用什么工具来对抗回归呢?它们几乎与用于防止常规错误的那些相同。唯一特定的工具是与“git bisect”类似的测试套件和工具。

测试套件非常好。但是,当它们单独使用时,应该使用它们,以便在每次提交后检查所有测试。这意味着它们效率不高,因为许多测试运行时没有有趣的结果,并且它们遭受了组合式爆炸。

事实上,问题在于大型软件通常有许多不同的配置选项,并且每次提交之后,每个测试用例都应该通过每个配置。因此,如果您对每个版本都有:N 配置,M 提交和 T 测试用例,则应执行:

N * M * T tests

其中 N,M 和 T 都随着软件的大小而增长。

所以很快就不可能完全测试一切。

如果一些错误通过你的测试套件,那么你可以添加一个测试到你的测试套件。但是如果你想使用你的新改进的测试套件来发现错误在哪里滑入,那么你将不得不模拟一个二分过程,否则你可能会直截了当地从每个提交的“坏”提交开始测试每个提交,这可能是非常浪费。

“git bisect”概述

开始一个平分点

要使用的第一个“git bisect”子命令是“git bisect start”来开始搜索。然后必须设置边界来限制提交空间。这通常是通过给予一个“不好的”和至少一个“好的”提交来完成的。他们可以在初次调用时传递给“git bisect start”,如下所示:

$ git bisect start [BAD [GOOD...]]

或者可以使用以下设置:

$ git bisect bad [COMMIT]

和:

$ git bisect good [COMMIT...]

BAD,GOOD 和 COMMIT 都是可以解析为提交的名称。

然后,“git bisect”将检出其选择的提交并要求用户对其进行测试,如下所示:

$ git bisect start v2.6.27 v2.6.25Bisecting: 10928 revisions left to test after this (roughly 14 steps)[2ec65f8b89ea003c27ff7723525a2ee335a2b393] x86: clean up using max_low_pfn on 32-bit

请注意,我们将使用的示例实际上就是一个玩具示例,我们将查找具有“2.6.26-something”版本的第一个提交,即提交的“SUBLEVEL = 26”行顶级 Makefile。这是一个玩具示例,因为使用 Git 比使用“git bisect”(例如“git blame”或“git log -S <string>”)更好地找到此提交。

手动驱动二等分

在这一点上,基本上有两种方法来驱动搜索。它可以由用户手动驱动,也可以由脚本或命令自动驱动。

如果用户正在驱动它,那么在搜索的每一步中,用户将不得不测试当前提交,并使用“git bisect good”或“git bisect bad”命令来判断它是“好”还是“坏”分别如上所述。例如:

$ git bisect bad
Bisecting: 5480 revisions left to test after this (roughly 13 steps)[66c0b394f08fd89236515c1c84485ea712a157be] KVM: kill file->f_count abuse in kvm

在经过几步之后,“git bisect”最终会发现第一个错误的提交:

$ git bisect bad
2ddcca36c8bcfa251724fe342c8327451988be0d is the first bad commit
commit 2ddcca36c8bcfa251724fe342c8327451988be0d
Author: Linus Torvalds <torvalds@linux-foundation.org>Date:   Sat May 3 11:59:44 2008 -0700

    Linux 2.6.26-rc1:100644 100644 5cf82581... 4492984e... M      Makefile

在这一点上,我们可以看到提交的内容,检查出它(如果它没有被检出)或修改它,例如:

$ git show HEAD
commit 2ddcca36c8bcfa251724fe342c8327451988be0d
Author: Linus Torvalds <torvalds@linux-foundation.org>Date:   Sat May 3 11:59:44 2008 -0700

    Linux 2.6.26-rc1

diff --git a/Makefile b/Makefile
index 5cf8258..4492984 100644--- a/Makefile+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6-SUBLEVEL = 25-EXTRAVERSION =+SUBLEVEL = 26+EXTRAVERSION = -rc1
 NAME = Funky Weasel is Jiggy wit it

 # *DOCUMENTATION*

当我们完成后,我们可以使用“git bisect reset”重新开始我们在开始平分前的分支:

$ git bisect reset
Checking out files: 100% (21549/21549), done.Previous HEAD position was 2ddcca3... Linux 2.6.26-rc1
Switched to branch 'master'

自动驱动二等分

驱动二等分过程的另一种方式是告诉“git bisect”在每个二等分步骤中启动脚本或命令,以确定当前提交是“好”还是“坏”。为此,我们使用“git bisect run”命令。例如:

$ git bisect start v2.6.27 v2.6.25Bisecting: 10928 revisions left to test after this (roughly 14 steps)[2ec65f8b89ea003c27ff7723525a2ee335a2b393] x86: clean up using max_low_pfn on 32-bit
$
$ git bisect run grep '^SUBLEVEL = 25' Makefile
running grep ^SUBLEVEL = 25 Makefile
Bisecting: 5480 revisions left to test after this (roughly 13 steps)[66c0b394f08fd89236515c1c84485ea712a157be] KVM: kill file->f_count abuse in kvm
running grep ^SUBLEVEL = 25 Makefile
SUBLEVEL = 25Bisecting: 2740 revisions left to test after this (roughly 12 steps)[671294719628f1671faefd4882764886f8ad08cb] V4L/DVB(7879): Adding cx18 Support for mxl5005s......running grep ^SUBLEVEL = 25 Makefile
Bisecting: 0 revisions left to test after this (roughly 0 steps)[2ddcca36c8bcfa251724fe342c8327451988be0d] Linux 2.6.26-rc1
running grep ^SUBLEVEL = 25 Makefile
2ddcca36c8bcfa251724fe342c8327451988be0d is the first bad commit
commit 2ddcca36c8bcfa251724fe342c8327451988be0d
Author: Linus Torvalds <torvalds@linux-foundation.org>Date:   Sat May 3 11:59:44 2008 -0700

    Linux 2.6.26-rc1:100644 100644 5cf82581... 4492984e... M      Makefile
bisect run success

在这个例子中,我们将“grep ^SUBLEVEL = 25Makefile”作为参数传递给“git bisect run”。这意味着在每一步中,我们通过的 grep 命令将会启动。如果代码0(表示成功)退出,那么git bisect会将当前状态标记为“良好”。如果它以代码1退出(或包含1到127之间的任何代码,除特殊代码125外),则当前状态将被标记为“不良”。

退出128到255之间的代码对于“git bisect run”是特别的。他们立即停止平分过程。例如,如果传递的命令需要很长的时间才能完成,那么这很有用,因为你可以用一个信号杀停它并停止平分过程。

如果检测到一些非常异常的情况,它也可以用于传递给“git bisect run”到“exit 255”的脚本。

避免不可测试的提交

有时会发生,当前状态不能被测试,例如,如果它不编译,因为当时有一个阻止它的错误。这是特殊退出 码125 的用途。它告诉“git bisect run”,当前提交应该被标记为不可测试,并且另一个应该被选中并签出。

如果平分过程是手动驱动的,则可以使用“git bisect skip”来做同样的事情。(事实上,特殊的退出码125使得“git bisect run”在后台使用“git bisect skip”。)

或者如果你想要更多的控制,你可以使用例如“git bisect visualize”来检查当前状态。它会启动 gitk(或者如果DISPLAY没有设置环境变量,则使用“git log” )来帮助您找到更好的对分点。

无论哪种方式,如果您有一串不可测试的提交,则可能发生您正在查找的回归已由其中一个不可测试的提交引入。在这种情况下,无法确定哪个提交引入了回归。

所以如果你使用“git bisect skip”(或者用特殊代 代码125 退出运行脚本),你可能会得到如下结果:

There are only 'skip'ped commits left to test.The first bad commit could be any of:15722f2fa328eaba97022898a305ffc8172db6b1
78e86cf3e850bd755bb71831f42e200626fbd1e0
e15b73ad3db9b48d7d1ade32f8cd23a751fe0ace
070eab2303024706f2924822bfec8b9847e4ac1b
We cannot bisect more!

保存日志并重复它

如果你想向其他人展示你的平分过程,你可以使用例如:

$ git bisect log > bisect_log.txt

并且可以使用以下方法进行重放:

$ git bisect replay bisect_log.txt

“git bisect”细节

二分算法

随着 Git 提交形成一个有向无环图(DAG),在每一步中找到最好的二等分提交并不是那么简单。无论如何, Linus 发现并实施了一个“真正的愚蠢”算法,后来由 Junio Hamano 改进,效果很好。

因此,当没有跳过提交时,“git bisect”使用的算法可以找到最佳的二等分提交,如下所示:

1)只保留提交:

a)是“坏”提交的祖先(包括“坏”提交本身),b)不是“好”提交的祖先(不包括“好”提交)。

这意味着我们摆脱了在 DAG 中无趣的提交。

例如,如果我们从这样的图表开始:

G-Y-G-W-W-W-X-X-X-X
           \ /
            W-W-B           /Y---G-W---W
 \ /   \
Y-Y     X-X-X-X-> time goes this way ->

其中 B 是“坏”提交,“G”是“好”提交,W,X 和 Y 是其他提交,我们将在第一步之后得到以下图表:

W-W-W
     \
      W-W-B     /W---W

所以只有 W 和 B 的提交将被保留。因为提交 X 和 Y 将分别被规则 a)和 b)删除,并且由于提交 G 也被规则b)删除。

注意 Git 用户,它相当于只保留以下提供的提交:

git rev-list BAD --not GOOD1 GOOD2...

还要注意,我们并不要求那些被认为是“良好”提交的后代的提交。所以在下面的例子中,提交 W 和 Z 将被保留:

G-W-W-W-B   /Z-Z

2)从图的“好”端开始,将每个提交的祖先的数量加上一个

例如下面的图表,其中 H 是“坏”提交,A 和 D 是一些“好”提交的双亲项:

A-B-C
     \
      F-G-H     /D---E

这会给:

1 2 3A-B-C
     \6 7 8
      F-G-H1   2/D---E

3)与每个提交相关联:min(X,N - X)

其中 X 是与步骤2中的提交相关联的值),N 是图中提交的总数。

在上面的例子中,我们有 N = 8,所以这会给出:

1 2 3A-B-C
     \2 1 0
      F-G-H1   2/D---E

4)最好的二等分点是具有最高关联号码的提交

所以在上面的例子中,最好的对分点是提交 C.

5)请注意,为了加快算法,实施了一些快捷方式

因为我们从一开始就知道 N,我们知道 min(X,N - X)不能大 于N / 2。因此,在步骤2)和3)中,如果我们将 N / 2与提交相关联,那么我们知道这是最好的对分点。所以在这种情况下,我们可以停止处理任何其他提交并返回当前提交。

对分算法调试

对于任何提交图,您可以使用“git rev-list --bisect-all”查看与每次提交相关的编号。

例如,对于上面的图,一个命令如:

$ git rev-list --bisect-all BAD --not GOOD1 GOOD2

会输出如下内容:

e15b73ad3db9b48d7d1ade32f8cd23a751fe0ace (dist=3)15722f2fa328eaba97022898a305ffc8172db6b1 (dist=2)78e86cf3e850bd755bb71831f42e200626fbd1e0 (dist=2)a1939d9a142de972094af4dde9a544e577ddef0e (dist=2)070eab2303024706f2924822bfec8b9847e4ac1b (dist=1)a3864d4f32a3bf5ed177ddef598490a08760b70d (dist=1)a41baa717dd74f1180abf55e9341bc7a0bb9d556 (dist=1)9e622a6dad403b71c40979743bb9d5be17b16bd6 (dist=0)

讨论了二分算法

首先让我们定义“最佳平分点”。我们会说如果知道它的状态(“好”或“坏”)提供了尽可能多的信息,那么提交X是最好的平分点还是最好的平分提交,无论提交的状态是“好”还是“坏”。

这意味着最好的二等分提交是以下函数最大的提交:

f(X) = min(information_if_good(X), information_if_bad(X))

其中 information_if_good(X)是我们得到的信息,如果X 很好,而 information_if_bad(X)是我们在X不好的情况下得到的信息。

现在我们假设只有一个“第一次错误提交”。这意味着它的所有后代都是“坏的”,其他所有的提交都是“好的”。我们假设所有的提交都有好的或坏的概率,或者是第一个坏提交的概率,所以知道 c 提交的状态总是提供相同数量的信息,无论这些 c 提交在图上,以及任何 c 是。(所以我们假设这些提交例如在一个分支上或接近一个好的或不好的提交时不会提供更多或更少的信息)。

我们还假设在上面的二分算法中,我们有一个像步骤1)之后的清理图。这意味着我们可以通过我们可以从图中删除的提交次数来衡量我们得到的信息。

让我们在图中提交一个 X 提交。

如果 X 被发现是“好的”,那么我们知道它的祖先都是“好的”,所以我们想说:

information_if_good(X) = number_of_ancestors(X)  (TRUE)

这是真实的,因为在步骤1)b)我们删除了“良好”提交的祖先。

如果发现 X 是“坏的”,那么我们知道它的后代都是“坏的”,所以我们想说:

information_if_bad(X) = number_of_descendants(X)  (WRONG)

但是这是错误的,因为在步骤1)a)我们只保留坏提交的祖先。所以当一个提交被标记为“不好”时,我们会得到更多的信息,因为我们也知道,先前的“坏”提交的祖先不是新的“坏”提交的祖先,它不是第一个错误提交。我们不知道它们是好还是坏,但我们知道它们不是第一个不好的提交,因为它们不是新的“坏”提交的祖先。

所以当一个提交被标记为“坏”时,我们知道我们可以删除图中的所有提交,除了那些是新的“坏”提交的祖先的提交。这意味着:

information_if_bad(X) = N - number_of_ancestors(X)  (TRUE)

其中 N 是(清理)图中的提交数。

所以最终这意味着要找到最佳的二等分提交,我们应该最大化该函数:

f(X) = min(number_of_ancestors(X), N - number_of_ancestors(X))

这很好,因为在步骤2)我们计算 number_of_ancestors(X),所以在步骤3)我们计算 f(X)。

我们以下面的图为例:

            G-H-I-J           /       \
A-B-C-D-E-F         O
           \       /
            K-L-M-N

如果我们计算下面的非最优函数:

g(X) = min(number_of_ancestors(X), number_of_descendants(X))

我们得到:

            4 3 2 1
            G-H-I-J1 2 3 4 5 6/       \0A-B-C-D-E-F         O
           \       /
            K-L-M-N            4 3 2 1

但用 git bisect 使用的算法,我们得到:

            7 7 6 5
            G-H-I-J1 2 3 4 5 6/       \0A-B-C-D-E-F         O
           \       /
            K-L-M-N            7 7 6 5

所以我们选择 G,H,K 或 L 作为最好的二等分点,这比 F 好。因为如果例如L不好,那么我们不仅会知道 L,M 和 N 是坏的,而且 G,H ,I 和 J 不是第一个错误提交(因为我们假设只有一个第一个错误提交,它必须是 L 的祖先)。

所以目前的算法似乎是我们最初设想的最好的。

跳过算法

当一些提交被跳过时(使用“git bisect skip”),那么对于步骤1)到3),二分算法是相同的。但是,我们大致使用以下步骤:

6)通过减少关联值来排序提交

7)如果第一次提交没有被跳过,我们可以返回并停在这里

8)否则过滤掉排序列表中的所有跳过的提交

9)使用伪随机数发生器(PRNG)产生一个介于0和1之间的随机数

10)将此随机数乘以其平方根以将其偏向0

11)将结果乘以过滤列表中的提交数,以获得该列表中的索引

12)在计算出的索引处返回提交

跳过算法讨论

在步骤7)之后(在跳过算法中),我们可以检查第二个提交是否被跳过,如果不是这样,则返回它。实际上,这是我们在 Git 1.5.4版(2008年2月1日发布)中开发“git bisect skip”之前使用的算法,直到 Git 1.6.4版(2009年7月29日发布)为止。

但 Ingo Molnar和H. Peter Anvin(另一位着名的Linux内核开发人员)都抱怨说,有时候最好的二等分点都发生在所有提交都无法测试的区域。在这种情况下,用户被要求测试许多不可测试的提交,这可能是非常低效的。

实际上,不可测试的提交通常是不可测试的,因为一次引入了破坏,并且只有在引入了其他许多提交之后,破坏才得以解决。

这种破坏当然大部分时间与我们试图在提交图中找到的破坏无关。但它阻止我们知道这个有趣的“不良行为”是否存在。

因此,接近无法测试的提交很有可能无法自行测试。最好的二等分提交也经常在一起发现(由于二分算法)。

这就是为什么当第一个被跳过的时候选择下一个最好的没有加载的分叉提交是一个坏主意。

我们发现图表上的大部分提交可能会在测试时提供相当多的信息。那些平均不会提供大量信息的提交是接近好的和不好的提交的提交。

因此,使用带有偏见的 PRNG 承诺从好的和不好的提交看起来是一个不错的选择。

对这种算法的一个明显的改进是在使用 PRNG 之前查找一个提交值,该提交值具有与最好的二等分提交之一相关的值,并且位于另一个分支上。因为如果存在这样的提交,那么它不太可能也是不可测试的,所以它可能比几乎随机选择的提供更多的信息。

检查合并基础

在二分法算法中还有一个调整,在上面的“二分算法”中没有描述过。

We supposed in the previous examples that the "good" commits were ancestors of the "bad" commit. But this is not a requirement of "git bisect".

Of course the "bad" commit cannot be an ancestor of a "good" commit, because the ancestors of the good commits are supposed to be "good". And all the "good" commits must be related to the bad commit. They cannot be on a branch that has no link with the branch of the "bad" commit. But it is possible for a good commit to be related to a bad commit and yet not be neither one of its ancestor nor one of its descendants.

For example, there can be a "main" branch, and a "dev" branch that was forked of the main branch at a commit named "D" like this:

A-B-C-D-E-F-G  <--main
       \
        H-I-J  <--dev

The commit "D" is called a "merge base" for branch "main" and "dev" because it’s the best common ancestor for these branches for a merge.

Now let’s suppose that commit J is bad and commit G is good and that we apply the bisection algorithm like it has been previously described.

As described in step 1) b) of the bisection algorithm, we remove all the ancestors of the good commits because they are supposed to be good too.

So we would be left with only:

H-I-J

But what happens if the first bad commit is "B" and if it has been fixed in the "main" branch by commit "F"?

The result of such a bisection would be that we would find that H is the first bad commit, when in fact it’s B. So that would be wrong!

And yes it can happen in practice that people working on one branch are not aware that people working on another branch fixed a bug! It could also happen that F fixed more than one bug or that it is a revert of some big development effort that was not ready to be released.

In fact development teams often maintain both a development branch and a maintenance branch, and it would be quite easy for them if "git bisect" just worked when they want to bisect a regression on the development branch that is not on the maintenance branch. They should be able to start bisecting using:

$ git bisect start dev main

To enable that additional nice feature, when a bisection is started and when some good commits are not ancestors of the bad commit, we first compute the merge bases between the bad and the good commits and we chose these merge bases as the first commits that will be checked out and tested.

If it happens that one merge base is bad, then the bisection process is stopped with a message like:

The merge base BBBBBB is bad.This means the bug has been fixed between BBBBBB and [GGGGGG,...].

where BBBBBB is the sha1 hash of the bad merge base and GGGGGG,… is a comma separated list of the sha1 of the good commits.

If some of the merge bases are skipped, then the bisection process continues, but the following message is printed for each skipped merge base:

Warning: the merge base between BBBBBB and [GGGGGG,...] must be skipped.So we cannot be sure the first bad commit is between MMMMMM and BBBBBB.We continue anyway.

where BBBBBB is the sha1 hash of the bad commit, MMMMMM is the sha1 hash of the merge base that is skipped and GGGGGG,… is a comma separated list of the sha1 of the good commits.

So if there is no bad merge base, the bisection process continues as usual after this step.

Best bisecting practices

Using test suites and git bisect together

If you both have a test suite and use git bisect, then it becomes less important to check that all tests pass after each commit. Though of course it is probably a good idea to have some checks to avoid breaking too many things because it could make bisecting other bugs more difficult.

You can focus your efforts to check at a few points (for example rc and beta releases) that all the T test cases pass for all the N configurations. And when some tests don’t pass you can use "git bisect" (or better "git bisect run"). So you should perform roughly:

c * N * T + b * M * log2(M) tests

where c is the number of rounds of test (so a small constant) and b is the ratio of bug per commit (hopefully a small constant too).

So of course it’s much better as it’s O(N * T) vs O(N * T * M) if you would test everything after each commit.

This means that test suites are good to prevent some bugs from being committed and they are also quite good to tell you that you have some bugs. But they are not so good to tell you where some bugs have been introduced. To tell you that efficiently, git bisect is needed.

The other nice thing with test suites, is that when you have one, you already know how to test for bad behavior. So you can use this knowledge to create a new test case for "git bisect" when it appears that there is a regression. So it will be easier to bisect the bug and fix it. And then you can add the test case you just created to your test suite.

So if you know how to create test cases and how to bisect, you will be subject to a virtuous circle:

more tests ⇒ easier to create tests ⇒ easier to bisect ⇒ more tests

So test suites and "git bisect" are complementary tools that are very powerful and efficient when used together.

Bisecting build failures

You can very easily automatically bisect broken builds using something like:

$ git bisect start BAD GOOD
$ git bisect run make

Passing sh -c "some commands" to "git bisect run"

例如:

$ git bisect run sh -c "make || exit 125; ./my_app | grep 'good output'"

另一方面,如果你经常这样做,那么可以使用脚本来避免太多的输入。

寻找绩效回归

以下是一个示例脚本,它由 Junio Hamano [4] 使用的真实世界脚本稍微修改而来。

可以将此脚本传递给“git bisect run”以查找引入性能回归的提交:

#!/bin/sh

# Build errors are not what I am interested in.make my_app || exit 255# We are checking if it stops in a reasonable amount of time, so
# let it run in the background..../my_app >log 2>&1 &# ... and grab its process ID.pid=$!# ... and then wait for sufficiently long.sleep $NORMAL_TIME

# ... and then see if the process is still there.if kill -0 $pid
then
        # It is still running -- that is bad.
        kill $pid; sleep 1; kill $pid;
        exit 1else
        # It has already finished (the $pid process was no more),
        # and we are happy.
        exit 0fi

遵循一般最佳实践

很明显,不要在明知故障的情况下提交更改,即使其他一些提交后来修复了损坏。

使用任何 VCS 在每个提交中只有一个小的逻辑更改也是一个好主意。

The smaller the changes in your commit, the most effective "git bisect" will be. And you will probably need "git bisect" less in the first place, as small changes are easier to review even if they are only reviewed by the committer.

Another good idea is to have good commit messages. They can be very helpful to understand why some changes were made.

These general best practices are very helpful if you bisect often.

Avoiding bug prone merges

First merges by themselves can introduce some regressions even when the merge needs no source code conflict resolution. This is because a semantic change can happen in one branch while the other branch is not aware of it.

For example one branch can change the semantic of a function while the other branch add more calls to the same function.

This is made much worse if many files have to be fixed to resolve conflicts. That’s why such merges are called "evil merges". They can make regressions very difficult to track down. It can even be misleading to know the first bad commit if it happens to be such a merge, because people might think that the bug comes from bad conflict resolution when it comes from a semantic change in one branch.

Anyway "git rebase" can be used to linearize history. This can be used either to avoid merging in the first place. Or it can be used to bisect on a linear history instead of the non linear one, as this should give more information in case of a semantic change in one branch.

Merges can be also made simpler by using smaller branches or by using many topic branches instead of only long version related branches.

And testing can be done more often in special integration branches like linux-next for the linux kernel.

Adapting your work-flow

A special work-flow to process regressions can give great results.

Here is an example of a work-flow used by Andreas Ericsson:

  • write, in the test suite, a test script that exposes the regression
  • use "git bisect run" to find the commit that introduced it
  • fix the bug that is often made obvious by the previous step
  • commit both the fix and the test script (and if needed more tests)

And here is what Andreas said about this work-flow [5]:

To give some hard figures, we used to have an average report-to-fix cycle of 142.6 hours (according to our somewhat weird bug-tracker which just measures wall-clock time). Since we moved to Git, we’ve lowered that to 16.2 hours. Primarily because we can stay on top of the bug fixing now, and because everyone’s jockeying to get to fix bugs (we’re quite proud of how lazy we are to let Git find the bugs for us). Each new release results in ~40% fewer bugs (almost certainly due to how we now feel about writing tests).

Clearly this work-flow uses the virtuous circle between test suites and "git bisect". In fact it makes it the standard procedure to deal with regression.

In other messages Andreas says that they also use the "best practices" described above: small logical commits, topic branches, no evil merge,… These practices all improve the bisectability of the commit graph, by making it easier and more useful to bisect.

So a good work-flow should be designed around the above points. That is making bisecting easier, more useful and standard.

Involving QA people and if possible end users

One nice about "git bisect" is that it is not only a developer tool. It can effectively be used by QA people or even end users (if they have access to the source code or if they can get access to all the builds).

There was a discussion at one point on the linux kernel mailing list of whether it was ok to always ask end user to bisect, and very good points were made to support the point of view that it is ok.

For example David Miller wrote [6]:

What people don’t get is that this is a situation where the "end node principle" applies. When you have limited resources (here: developers) you don’t push the bulk of the burden upon them. Instead you push things out to the resource you have a lot of, the end nodes (here: users), so that the situation actually scales.

This means that it is often "cheaper" if QA people or end users can do it.

What is interesting too is that end users that are reporting bugs (or QA people that reproduced a bug) have access to the environment where the bug happens. So they can often more easily reproduce a regression. And if they can bisect, then more information will be extracted from the environment where the bug happens, which means that it will be easier to understand and then fix the bug.

For open source projects it can be a good way to get more useful contributions from end users, and to introduce them to QA and development activities.

Using complex scripts

In some cases like for kernel development it can be worth developing complex scripts to be able to fully automate bisecting.

Here is what Ingo Molnar says about that [7]:

i have a fully automated bootup-hang bisection script. It is based on "git-bisect run". I run the script, it builds and boots kernels fully automatically, and when the bootup fails (the script notices that via the serial log, which it continuously watches - or via a timeout, if the system does not come up within 10 minutes it’s a "bad" kernel), the script raises my attention via a beep and i power cycle the test box. (yeah, i should make use of a managed power outlet to 100% automate it)

Combining test suites, git bisect and other systems together

We have seen that test suites an git bisect are very powerful when used together. It can be even more powerful if you can combine them with other systems.

For example some test suites could be run automatically at night with some unusual (or even random) configurations. And if a regression is found by a test suite, then "git bisect" can be automatically launched, and its result can be emailed to the author of the first bad commit found by "git bisect", and perhaps other people too. And a new entry in the bug tracking system could be automatically created too.

The future of bisecting

"git replace"

We saw earlier that "git bisect skip" is now using a PRNG to try to avoid areas in the commit graph where commits are untestable. The problem is that sometimes the first bad commit will be in an untestable area.

To simplify the discussion we will suppose that the untestable area is a simple string of commits and that it was created by a breakage introduced by one commit (let’s call it BBC for bisect breaking commit) and later fixed by another one (let’s call it BFC for bisect fixing commit).

For example:

...-Y-BBC-X1-X2-X3-X4-X5-X6-BFC-Z-...

where we know that Y is good and BFC is bad, and where BBC and X1 to X6 are untestable.

In this case if you are bisecting manually, what you can do is create a special branch that starts just before the BBC. The first commit in this branch should be the BBC with the BFC squashed into it. And the other commits in the branch should be the commits between BBC and BFC rebased on the first commit of the branch and then the commit after BFC also rebased on.

For example:

      (BBC+BFC)-X1'-X2'-X3'-X4'-X5'-X6'-Z'     /...-Y-BBC-X1-X2-X3-X4-X5-X6-BFC-Z-...

where commits quoted with ' have been rebased.

You can easily create such a branch with Git using interactive rebase.

For example using:

$ git rebase -i Y Z

and then moving BFC after BBC and squashing it.

After that you can start bisecting as usual in the new branch and you should eventually find the first bad commit.

For example:

$ git bisect start Z' Y

If you are using "git bisect run", you can use the same manual fix up as above, and then start another "git bisect run" in the special branch. Or as the "git bisect" man page says, the script passed to "git bisect run" can apply a patch before it compiles and test the software [8]. The patch should turn a current untestable commits into a testable one. So the testing will result in "good" or "bad" and "git bisect" will be able to find the first bad commit. And the script should not forget to remove the patch once the testing is done before exiting from the script.

(Note that instead of a patch you can use "git cherry-pick BFC" to apply the fix, and in this case you should use "git reset --hard HEAD^" to revert the cherry-pick after testing and before returning from the script.)

But the above ways to work around untestable areas are a little bit clunky. Using special branches is nice because these branches can be shared by developers like usual branches, but the risk is that people will get many such branches. And it disrupts the normal "git bisect" work-flow. So, if you want to use "git bisect run" completely automatically, you have to add special code in your script to restart bisection in the special branches.

Anyway one can notice in the above special branch example that the Z' and Z commits should point to the same source code state (the same "tree" in git parlance). That’s because Z' result from applying the same changes as Z just in a slightly different order.

So if we could just "replace" Z by Z' when we bisect, then we would not need to add anything to a script. It would just work for anyone in the project sharing the special branches and the replacements.

With the example above that would give:

      (BBC+BFC)-X1'-X2'-X3'-X4'-X5'-X6'-Z'-...     /...-Y-BBC-X1-X2-X3-X4-X5-X6-BFC-Z

That’s why the "git replace" command was created. Technically it stores replacements "refs" in the "refs/replace/" hierarchy. These "refs" are like branches (that are stored in "refs/heads/") or tags (that are stored in "refs/tags"), and that means that they can automatically be shared like branches or tags among developers.

"git replace" is a very powerful mechanism. It can be used to fix commits in already released history, for example to change the commit message or the author. And it can also be used instead of git "grafts" to link a repository with another old repository.

In fact it’s this last feature that "sold" it to the Git community, so it is now in the "master" branch of Git’s Git repository and it should be released in Git 1.6.5 in October or November 2009.

One problem with "git replace" is that currently it stores all the replacements refs in "refs/replace/", but it would be perhaps better if the replacement refs that are useful only for bisecting would be in "refs/replace/bisect/". This way the replacement refs could be used only for bisecting, while other refs directly in "refs/replace/" would be used nearly all the time.

Bisecting sporadic bugs

Another possible improvement to "git bisect" would be to optionally add some redundancy to the tests performed so that it would be more reliable when tracking sporadic bugs.

This has been requested by some kernel developers because some bugs called sporadic bugs do not appear in all the kernel builds because they are very dependent on the compiler output.

The idea is that every 3 test for example, "git bisect" could ask the user to test a commit that has already been found to be "good" or "bad" (because one of its descendants or one of its ancestors has been found to be "good" or "bad" respectively). If it happens that a commit has been previously incorrectly classified then the bisection can be aborted early, hopefully before too many mistakes have been made. Then the user will have to look at what happened and then restart the bisection using a fixed bisect log.

There is already a project called BBChop created by Ealdwulf Wuffinga on Github that does something like that using Bayesian Search Theory [9]:

BBChop is like git bisect (or equivalent), but works when your bug is intermittent. That is, it works in the presence of false negatives (when a version happens to work this time even though it contains the bug). It assumes that there are no false positives (in principle, the same approach would work, but adding it may be non-trivial).

But BBChop is independent of any VCS and it would be easier for Git users to have something integrated in Git.

Conclusion

We have seen that regressions are an important problem, and that "git bisect" has nice features that complement very well practices and other tools, especially test suites, that are generally used to fight regressions. But it might be needed to change some work-flows and (bad) habits to get the most out of it.

Some improvements to the algorithms inside "git bisect" are possible and some new features could help in some cases, but overall "git bisect" works already very well, is used a lot, and is already very useful. To back up that last claim, let’s give the final word to Ingo Molnar when he was asked by the author how much time does he think "git bisect" saves him when he uses it:

a lot. About ten years ago did i do my first bisection of a Linux patch queue. That was prior the Git (and even prior the BitKeeper) days. I literally days spent sorting out patches, creating what in essence were standalone commits that i guessed to be related to that bug. It was a tool of absolute last resort. I’d rather spend days looking at printk output than do a manual patch bisection. With Git bisect it’s a breeze: in the best case i can get a ~15 step kernel bisection done in 20-30 minutes, in an automated way. Even with manual help or when bisecting multiple, overlapping bugs, it’s rarely more than an hour. In fact it’s invaluable because there are bugs i would never even try to debug if it wasn’t for git bisect. In the past there were bug patterns that were immediately hopeless for me to debug - at best i could send the crash/bug signature to lkml and hope that someone else can think of something. And even if a bisection fails today it tells us something valuable about the bug: that it’s non-deterministic - timing or kernel image layout dependent. So git bisect is unconditional goodness - and feel free to quote that ;-)

Acknowledgments

Many thanks to Junio Hamano for his help in reviewing this paper, for reviewing the patches I sent to the Git mailing list, for discussing some ideas and helping me improve them, for improving "git bisect" a lot and for his awesome work in maintaining and developing Git.

Many thanks to Ingo Molnar for giving me very useful information that appears in this paper, for commenting on this paper, for his suggestions to improve "git bisect" and for evangelizing "git bisect" on the linux kernel mailing lists.

Many thanks to Linus Torvalds for inventing, developing and evangelizing "git bisect", Git and Linux.

Many thanks to the many other great people who helped one way or another when I worked on Git, especially to Andreas Ericsson, Johannes Schindelin, H. Peter Anvin, Daniel Barkalow, Bill Lear, John Hawley, Shawn O. Pierce, Jeff King, Sam Vilain, Jon Seymour.

Many thanks to the Linux-Kongress program committee for choosing the author to given a talk and for publishing this paper.