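1. About diff
diff analyzes two files and prints the lines that are different.
In essence, it outputs a set of instructions for how to change the first file to make it identical to the second file.
It doesn't actually change the files; however, with the -e option it can generate a script for the program ed (or ex) that applies those changes.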
2. How diff works
2.1. Example 1
We have two files, file1.txt and file2.txt.
The contents of file1.txt:
I need to buy apples.
I need to run the laundry.
I need to wash the dog.
I need to get the car detailed.
The contents of file2.txt:
I need to buy apples.
I need to do the laundry.
I need to wash the car.
I need to get the dog detailed.
We can use the diff command to show automatically which lines differ between the two files:
diff file1.txt file2.txt
The output is:
2,4c2,4
< I need to run the laundry.
< I need to wash the dog.
< I need to get the car detailed.
---
> I need to do the laundry.
> I need to wash the car.
> I need to get the dog detailed.
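Let's look at what this output means. The important thing to remember is that diff describes these differences prescriptively: it tells you how to change the first file to make it match the second file.
The first line of the output contains:
* the line numbers from the first file,
* a letter (a for add, c for change, or d for delete), and
* the corresponding line numbers from the second file.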
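In the output above, "2,4c2,4" means: "Lines 2 through 4 in the first file need to be changed to match lines 2 through 4 in the second file." diff then shows those lines from each file:
Lines beginning with < are from the first file.
Lines beginning with > are from the second file.
The three dashes ("---") simply separate the lines of file 1 from the lines of file 2.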
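To try Example 1 yourself, the following sketch recreates the two files in a scratch directory (made with mktemp -d) and checks the parts of the output described above. It assumes a POSIX-compatible diff such as GNU diff is on the PATH; normal.diff is a hypothetical name for the saved output.

```shell
# Work in a throwaway directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'I need to buy apples.' 'I need to run the laundry.' \
  'I need to wash the dog.' 'I need to get the car detailed.' > file1.txt
printf '%s\n' 'I need to buy apples.' 'I need to do the laundry.' \
  'I need to wash the car.' 'I need to get the dog detailed.' > file2.txt

# diff exits with status 1 when the files differ; record it instead of aborting.
status=0
diff file1.txt file2.txt > normal.diff || status=$?

head -n 1 normal.diff        # the change command: 2,4c2,4
grep -c '^< ' normal.diff    # lines taken from file1.txt: 3
grep -c '^> ' normal.diff    # lines taken from file2.txt: 3
echo "exit status: $status"  # exit status: 1
```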
2.2. Example 2
Here's another example:
file1.txt:
I need to go to the store.
I need to buy some apples.
When I get home, I'll wash the dog.
file2.txt:
I need to go to the store.
I need to buy some apples.
Oh yeah, I also need to buy grated cheese.
When I get home, I'll wash the dog.
diff file1.txt file2.txt
Output:
2a3
> Oh yeah, I also need to buy grated cheese.
Here, the output means "After line 2 in the first file, a line needs to be added: line 3 from the second file." It then shows the line to be added.
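A quick way to check the "2a3" reading is to rebuild the files from Example 2 and inspect the saved output. This is a minimal sketch assuming a POSIX-compatible diff; add.diff is a hypothetical file name.

```shell
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'I need to go to the store.' 'I need to buy some apples.' \
  "When I get home, I'll wash the dog." > file1.txt
printf '%s\n' 'I need to go to the store.' 'I need to buy some apples.' \
  'Oh yeah, I also need to buy grated cheese.' \
  "When I get home, I'll wash the dog." > file2.txt

# Status 1 only means "the files differ"; status 2 would mean real trouble.
diff file1.txt file2.txt > add.diff || [ $? -eq 1 ]

cat add.diff
# 2a3
# > Oh yeah, I also need to buy grated cheese.
```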
Now let's see what diff shows when a line needs to be deleted:
file1.txt:
I need to go to the store.
I need to buy some apples.
When I get home, I'll wash the dog.
I promise.
file2.txt:
I need to go to the store.
I need to buy some apples.
When I get home, I'll wash the dog.
Our command:
diff file1.txt file2.txt
Output:
4d3
< I promise.
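Here, the output means "Delete line 4 in the first file so that both files are in sync at line 3." It then shows the line to be deleted (line 4 of the first file).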
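Because diff always describes how to turn its first argument into its second, swapping the arguments turns this deletion into an addition. The sketch below (assuming a POSIX-compatible diff; forward.diff and backward.diff are hypothetical names) runs both directions on the files above.

```shell
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'I need to go to the store.' 'I need to buy some apples.' \
  "When I get home, I'll wash the dog." 'I promise.' > file1.txt
printf '%s\n' 'I need to go to the store.' 'I need to buy some apples.' \
  "When I get home, I'll wash the dog." > file2.txt

diff file1.txt file2.txt > forward.diff || [ $? -eq 1 ]
diff file2.txt file1.txt > backward.diff || [ $? -eq 1 ]

cat forward.diff    # 4d3, then "< I promise."
cat backward.diff   # 3a4, then "> I promise."
```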
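2.3. Viewing diff output in context mode
The examples above show diff's default output, which is meant to be read by a computer rather than a person. For human readers, it often helps to see the context around the changes.
GNU diff, the version most Linux users run, offers two ways to do this: "context mode" and "unified mode".
To view differences in context mode, use the -c option. For example, given these two files: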
file1.txt:
apples
oranges
kiwis
carrots
file2.txt:
apples
kiwis
carrots
grapefruits
Let's look at the context output. The command is:
diff -c file1.txt file2.txt
Output:
*** file1.txt 2014-08-21 17:58:29.764656635 -0400
--- file2.txt 2014-08-21 17:58:50.768989841 -0400
***************
*** 1,4 ****
  apples
- oranges
  kiwis
  carrots
--- 1,4 ----
  apples
  kiwis
  carrots
+ grapefruits
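The first two lines of the output show information about the "from" file (file1) and the "to" file (file2): the file name, the modification date, and the modification time. The "from" file is marked with "***" and the "to" file with "---".
The line "***************" is just a separator.
The next line has three asterisks ("***") followed by a line range from the first file (here, lines 1 through 4, separated by a comma), then four asterisks ("****").
Then the contents of those lines are shown. If a line is unchanged, it is prefixed by two spaces; if the line changed, it is prefixed by an indicator character and a space.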
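The indicator characters mean:
Character  Meaning
!          The line is part of a group of one or more lines that need to change. There is a corresponding set of lines prefixed with "!" in the other file's context.
+          A line in the second file that needs to be added to the first file.
-          A line in the first file that needs to be deleted.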
After the lines from the first file, there are three dashes ("---"), then a line range, then four dashes ("----"). This indicates the line range in the second file that will sync up with our changes in the first file.
If several sections need changes, diff shows them one after another. Lines from the first file are still introduced by "***", and lines from the second file by "---".
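The example above only shows "-" and "+" lines. To see the "!" marker, the following sketch compares two hypothetical three-line files (old.txt and new.txt) in which one line is replaced rather than added or removed. It assumes GNU diff; the timestamps in the two header lines will differ on your system, so they are skipped.

```shell
cd "$(mktemp -d)" || exit 1

printf '%s\n' apples oranges kiwis > old.txt
printf '%s\n' apples pears kiwis > new.txt

diff -c old.txt new.txt > context.diff || [ $? -eq 1 ]

# Skip the two header lines, which contain file names and timestamps.
tail -n +3 context.diff
# ***************
# *** 1,3 ****
#   apples
# ! oranges
#   kiwis
# --- 1,3 ----
#   apples
# ! pears
#   kiwis
```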
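2.4. Unified mode
Unified mode (the -u option) is similar to context mode, but it doesn't display redundant information: each unchanged context line appears once instead of once per file. Here's an example, using the same input files as the previous example: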
file1.txt:
apples
oranges
kiwis
carrots
file2.txt:
apples
kiwis
carrots
grapefruits
The command:
diff -u file1.txt file2.txt
Output:
--- file1.txt 2014-08-21 17:58:29.764656635 -0400
+++ file2.txt 2014-08-21 17:58:50.768989841 -0400
@@ -1,4 +1,4 @@
 apples
-oranges
 kiwis
 carrots
+grapefruits
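The output is similar to context mode, but the differences are "unified" into a single set of lines. The "@@ -1,4 +1,4 @@" line gives the range in each file (lines 1 through 4 of both); removed lines are prefixed with "-", added lines with "+", and unchanged lines with a single space.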
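The number given with -U controls how many unchanged lines surround each change. With -U0 (GNU diff assumed) there is no context, so the two changes in the fruit example no longer share one hunk; the sketch below counts the "@@" hunk headers for both settings. full.diff and bare.diff are hypothetical names.

```shell
cd "$(mktemp -d)" || exit 1

printf '%s\n' apples oranges kiwis carrots > file1.txt
printf '%s\n' apples kiwis carrots grapefruits > file2.txt

diff -u file1.txt file2.txt > full.diff || [ $? -eq 1 ]    # default: 3 lines of context
diff -U0 file1.txt file2.txt > bare.diff || [ $? -eq 1 ]   # no context lines

grep '^@@' full.diff     # @@ -1,4 +1,4 @@  (one hunk)
grep -c '^@@' bare.diff  # 2  (one hunk per change)
```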
2.5. Comparing directories
diff can compare two directories when you give it directory names instead of file names:
diff dir1 dir2
Output:
Only in dir1: tab2.gif
Only in dir1: tab3.gif
Only in dir1: tab4.gif
Only in dir1: tape.htm
Only in dir1: tbernoul.htm
Only in dir1: tconner.htm
Only in dir1: tempbus.psd
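Without -r (see the option list in section 4), diff compares only the top level of the two directories and merely reports the subdirectories they share. The sketch below builds two small hypothetical trees to show the difference; it assumes GNU diff, whose message wording appears in the comments.

```shell
cd "$(mktemp -d)" || exit 1

mkdir -p dir1/sub dir2/sub
printf 'same\n' > dir1/common.txt
printf 'same\n' > dir2/common.txt
printf 'only here\n' > dir1/extra.txt
printf 'old\n' > dir1/sub/notes.txt
printf 'new\n' > dir2/sub/notes.txt

diff dir1 dir2 > top.diff || [ $? -eq 1 ]
cat top.diff
# Only in dir1: extra.txt
# Common subdirectories: dir1/sub and dir2/sub

diff -r dir1 dir2 > recursive.diff || [ $? -eq 1 ]
cat recursive.diff   # also shows the 1c1 change inside sub/notes.txt
```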
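2.6. Using diff to create an editing script
The -e option tells diff to output a script, a sequence of commands that can be used by the editing program ed or ex. The commands are a combination of c (change), a (add), and d (delete); when the editor executes them, they modify the contents of file1 to match the contents of file2.
Suppose we have two files: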
file1.txt:
Once upon a time, there was a girl named Persephone.
She had black hair.
She loved her mother more than anything.
She liked to sit outside in the sunshine with her cat, Daisy.
She dreamed of being a painter when she grew up.
file2.txt:
Once upon a time, there was a girl named Persephone.
She had red hair.
She loved chocolate chip cookies more than anything.
She liked to sit outside in the sunshine with her cat, Daisy.
She would look up into the clouds and dream of being a world-famous baker.
We can use the following command to analyze the two files and produce a script that turns the contents of file1 into the contents of file2:
diff -e file1.txt file2.txt
The output looks like this:
5c
She would look up into the clouds and dream of being a world-famous baker.
.
2,3c
She had red hair.
She loved chocolate chip cookies more than anything.
.
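Notice that the changes are listed in reverse order: the changes closest to the end of the file come first, and the changes at the beginning of the file come last. This order preserves the line numbers. If we made the changes at the beginning of the file first, they could shift the line numbers of everything after them, so the script starts at the end and works backwards.
The script tells the editor: "change line 5 to the following line," and then "change lines 2 through 3 to the following two lines."
Next, we should save the script to a file, using the ">" operator to redirect the output: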
diff -e file1.txt file2.txt > my-ed-script.txt
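This command displays nothing on the screen (unless an error occurs); instead, the output is redirected to a file named my-ed-script.txt, which is created if it doesn't exist and overwritten if it does.
If we then check the contents of this file,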
cat my-ed-script.txt
we see the same output as above.
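But one step is still missing: the script must tell ed to write the file. The missing piece is the w command, which writes the changes. We can add it by echoing the letter "w" and appending it to the file with ">>". (">>" works like ">": it redirects output to a file, but appends instead of overwriting.) The command is: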
echo "w" >> my-ed-script.txt
Now we can check the contents of the script again:
cat my-ed-script.txt
5c
She would look up into the clouds and dream of being a world-famous baker.
.
2,3c
She had red hair.
She loved chocolate chip cookies more than anything.
.
w
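Now it's ed's turn to make the changes and write them to disk. How do we get ed to do that?
We can have ed run the script with the following command, which overwrites the original file. The "<" redirection feeds my-ed-script.txt to ed on standard input, and the dash (-) makes ed work quietly, suppressing its byte counts and diagnostics (it is the older form of ed's -s option):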
ed - file1.txt < my-ed-script.txt
This command displays nothing. Let's look at the contents of the original file:
cat file1.txt
Once upon a time, there was a girl named Persephone.
She had red hair.
She loved chocolate chip cookies more than anything.
She liked to sit outside in the sunshine with her cat, Daisy.
She would look up into the clouds and dream of being a world-famous baker.
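As you can see, file1.txt now matches file2.txt exactly.
Warning: In this example, ed overwrote the original file1.txt. Once the script has run, the original contents of file1.txt are gone, so make sure you understand what you are doing before you run these commands!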
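One way to avoid losing file1.txt is to run the same steps against a copy. The sketch below rebuilds the story files, generates the ed script, appends w, and applies it to a copy named patched.txt (a hypothetical name). It uses ed -s, the POSIX spelling of ed's quiet mode, and skips the ed step if ed is not installed, since many minimal systems omit it.

```shell
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'Once upon a time, there was a girl named Persephone.' \
  'She had black hair.' 'She loved her mother more than anything.' \
  'She liked to sit outside in the sunshine with her cat, Daisy.' \
  'She dreamed of being a painter when she grew up.' > file1.txt
printf '%s\n' 'Once upon a time, there was a girl named Persephone.' \
  'She had red hair.' 'She loved chocolate chip cookies more than anything.' \
  'She liked to sit outside in the sunshine with her cat, Daisy.' \
  'She would look up into the clouds and dream of being a world-famous baker.' > file2.txt

# Generate the script (later changes first), then add the write command.
diff -e file1.txt file2.txt > my-ed-script.txt || [ $? -eq 1 ]
echo w >> my-ed-script.txt

# Edit a copy, leaving file1.txt untouched.
cp file1.txt patched.txt
if command -v ed > /dev/null 2>&1; then
  ed -s patched.txt < my-ed-script.txt
  cmp -s patched.txt file2.txt && echo "patched.txt now matches file2.txt"
fi
```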
2.7. Side-by-side output with diff -y
Here is an example of using diff with the -y option to show the differences between two files side by side:
file1.txt:
apples
oranges
kiwis
carrots
file2.txt:
apples
kiwis
carrots
grapefruits
diff -y file1.txt file2.txt
Output:
apples                             apples
oranges                          <
kiwis                              kiwis
carrots                            carrots
                                 > grapefruits
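For long files it is often more useful to show only the rows that differ. GNU diff's --suppress-common-lines and -W (both in the option list in section 4) do this; the sketch assumes GNU diff and reuses the fruit files, with side.diff as a hypothetical output name.

```shell
cd "$(mktemp -d)" || exit 1

printf '%s\n' apples oranges kiwis carrots > file1.txt
printf '%s\n' apples kiwis carrots grapefruits > file2.txt

# -W 60 narrows the output (default 130 columns); only the two changed rows remain.
diff -y -W 60 --suppress-common-lines file1.txt file2.txt > side.diff || [ $? -eq 1 ]
cat side.diff
```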
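3. Common diff options
Here are some useful diff options to keep in mind:
Option  Meaning
-b      Ignore changes in the amount of white space (spaces or tabs).
-w      Ignore white space completely.
-B      Ignore blank lines when calculating differences.
-y      Display output in two columns.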
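The difference between -b and -w is easiest to see through diff's exit status (0 when no differences are found, 1 when there are some). The helper function st below is hypothetical and only prints that status; the sketch assumes GNU diff or another diff that supports -w.

```shell
cd "$(mktemp -d)" || exit 1

printf 'hello  world\n' > two-spaces.txt
printf 'hello world\n'  > one-space.txt
printf 'helloworld\n'   > no-space.txt

# st: run diff with the given arguments and print only its exit status.
st() {
  if diff "$@" > /dev/null; then echo 0; else echo $?; fi
}

st two-spaces.txt one-space.txt     # 1: the lines differ
st -b two-spaces.txt one-space.txt  # 0: only the amount of white space changed
st -b one-space.txt no-space.txt    # 1: -b still sees space vs. no space
st -w one-space.txt no-space.txt    # 0: -w ignores white space entirely
```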
These are just a few of the most commonly used diff options. Here is a complete list of diff's options and what they do:
4. Full list of diff options
diff [OPTION]... FILES
Option                          Meaning
--normal                        Output a "normal" diff (the default).
-q, --brief                     Report only whether the files differ, without showing the differences; output nothing if they are identical.
-s, --report-identical-files    Report when two files are the same.
-c, -C NUM, --context[=NUM]     Output NUM (default 3) lines of copied context.
-u, -U NUM, --unified[=NUM]     Output NUM (default 3) lines of unified context.
-e, --ed                        Output an ed script.
-n, --rcs                       Output an RCS-format diff.
-y, --side-by-side              Output in two columns.
-W, --width=NUM                 Output at most NUM (default 130) print columns.
--left-column                   Output only the left column of common lines.
--suppress-common-lines         Do not output lines common between the two files.
-p, --show-c-function           For files that contain C code, show which C function each change is in.
-F, --show-function-line=RE     Show the most recent line matching regular expression RE.
--label LABEL                   When displaying output, use LABEL instead of the file name. This option can be given more than once for multiple labels.
-t, --expand-tabs               Expand tabs to spaces in output.
-T, --initial-tab               Make tabs line up by prepending a tab if necessary.
--tabsize=NUM                   Define a tab stop as NUM (default 8) columns.
--suppress-blank-empty          Suppress spaces or tabs before empty output lines.
-l, --paginate                  Pass output through pr to paginate.
-r, --recursive                 Recursively compare any subdirectories found.
-N, --new-file                  If a specified file does not exist, perform the diff as if it were an empty file.
--unidirectional-new-file       Same as -N, but only applies to the first file.
--ignore-file-name-case         Ignore case when comparing file names.
--no-ignore-file-name-case      Consider case when comparing file names.
-x, --exclude=PAT               Exclude files that match file name pattern PAT.
-X, --exclude-from=FILE         Exclude files that match any file name pattern in file FILE.
-S, --starting-file=FILE        Start with file FILE when comparing directories.
--from-file=FILE1               Compare FILE1 to all operands; FILE1 can be a directory.
--to-file=FILE2                 Compare all operands to FILE2; FILE2 can be a directory.
-i, --ignore-case               Ignore case differences in file contents.
-E, --ignore-tab-expansion      Ignore changes due to tab expansion.
-b, --ignore-space-change       Ignore changes in the amount of white space.
-w, --ignore-all-space          Ignore all white space.
-B, --ignore-blank-lines        Ignore changes whose lines are all blank.
-I, --ignore-matching-lines=RE  Ignore changes whose lines all match regular expression RE.
-a, --text                      Treat all files as text.
--strip-trailing-cr             Strip trailing carriage return on input.
-D, --ifdef=NAME                Output merged file with "#ifdef NAME" diffs.
--GTYPE-group-format=GFMT       Format GTYPE input groups with GFMT.
--line-format=LFMT              Format all input lines with LFMT.
--LTYPE-line-format=LFMT        Format LTYPE input lines with LFMT.
These format options provide fine-grained control over the output of diff, generalizing -D/--ifdef.
LTYPE is old, new, or unchanged.
GTYPE can be any of the LTYPE values, or the value changed.
GFMT (but not LFMT) may contain:
  %<                                lines from FILE1
  %>                                lines from FILE2
  %=                                lines common to FILE1 and FILE2
  %[-][WIDTH][.[PREC]]{doxX}LETTER  printf-style spec for LETTER
LETTERs are as follows for the new group; use lower case for the old group:
  Letter                            Meaning
  F                                 First line number.
  L                                 Last line number.
  N                                 Number of lines = L - F + 1.
  E                                 F - 1
  M                                 L + 1
  %(A=B?T:E)                        If A equals B then T else E.
LFMT (only) may contain:
  Symbol                            Meaning
  %L                                Contents of line.
  %l                                Contents of line, excluding any trailing newline.
  %[-][WIDTH][.[PREC]]{doxX}n       printf-style spec for input line number.
Both GFMT and LFMT may contain:
  Symbol                            Meaning
  %%                                A literal %.
  %c'C'                             The single character C.
  %c'\OOO'                          The character with octal code OOO.
  C                                 The character C (other characters represent themselves).
-d, --minimal                   Try hard to find a smaller set of changes.
--horizon-lines=NUM             Keep NUM lines of the common prefix and suffix.
--speed-large-files             Assume large files and many scattered small changes.
--help                          Display a help message and exit.
-v, --version                   Output version information and exit.
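The line-format options above can reshape diff's output completely. As a sketch (GNU diff only; the "removed:" and "added:" labels are arbitrary choices for illustration), the command below prints every line of the merged result with a marker chosen by its LTYPE, where %L stands for the line's contents including its newline.

```shell
cd "$(mktemp -d)" || exit 1

printf '%s\n' apples oranges kiwis carrots > file1.txt
printf '%s\n' apples kiwis carrots grapefruits > file2.txt

# One format per LTYPE: old (only in FILE1), new (only in FILE2), unchanged.
diff --old-line-format='removed: %L' \
     --new-line-format='added:   %L' \
     --unchanged-line-format='         %L' \
     file1.txt file2.txt > formatted.txt || [ $? -eq 1 ]
cat formatted.txt
#          apples
# removed: oranges
#          kiwis
#          carrots
# added:   grapefruits
```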
FILES takes the form "FILE1 FILE2" or "DIR1 DIR2" or "DIR FILE..." or "FILE... DIR".
If the --from-file or --to-file option is given, there are no restrictions on FILES. If a FILE is a dash ("-"), diff reads it from standard input.
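Reading from standard input is handy when one side of the comparison is the output of another command. In this sketch (assuming a POSIX-compatible diff), the output of printf stands in for such a command and is compared with file2.txt through the "-" operand; stdin.diff is a hypothetical name.

```shell
cd "$(mktemp -d)" || exit 1

printf '%s\n' apples kiwis carrots grapefruits > file2.txt

# "-" makes diff read its first input from the pipe.
printf '%s\n' apples oranges kiwis carrots | diff - file2.txt > stdin.diff || [ $? -eq 1 ]
cat stdin.diff
# 2d1
# < oranges
# 4a4
# > grapefruits
```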
The exit status is 0 if the inputs are the same, 1 if they are different, and 2 if diff runs into any trouble.
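These exit codes make diff usable as a test in shell scripts. The function compare below is a hypothetical helper that maps the status to a word; -q keeps diff from printing the differences themselves, and error messages are discarded. The sketch assumes a diff that supports -q, such as GNU diff.

```shell
cd "$(mktemp -d)" || exit 1

printf 'same\n'  > a.txt
printf 'same\n'  > b.txt
printf 'other\n' > c.txt

# compare: print identical, different, or trouble based on diff's exit status.
compare() {
  rc=0
  diff -q "$1" "$2" > /dev/null 2>&1 || rc=$?
  case $rc in
    0) echo identical ;;
    1) echo different ;;
    *) echo trouble ;;
  esac
}

compare a.txt b.txt        # identical
compare a.txt c.txt        # different
compare a.txt missing.txt  # trouble (status 2: the file does not exist)
```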
5. Related commands
bdiff — Identify the differences between two very big files.
cmp — Compare two files byte by byte.
comm — Compare two sorted files line by line.
dircmp — Compare the contents of two directories, listing unique files.
ed — A simple text editor.
pr — Format a text file for printing.
ls — List the contents of a directory or directories.
sdiff — Compare two files, side-by-side.
6. References