
Git LFS Notes (Excerpts from Docs, Blogs, and Issues)

封景曜
2023-12-01

Git LFS

1. What is Git LFS?

The following is from: Git LFS

…that reduces the impact of large files in your repository by downloading the relevant versions of them lazily. Specifically, large files are downloaded during the checkout process rather than during cloning or fetching.

Git LFS does this by replacing large files in your repository with tiny pointer files. During normal usage, you’ll never see these pointer files as they are handled automatically by Git LFS:

  1. When you add a file to your repository, Git LFS replaces its contents with a pointer, and stores the file contents in a local Git LFS cache.
  2. When you push new commits to the server, any Git LFS files referenced by the newly pushed commits are transferred from your local Git LFS cache to the remote Git LFS store tied to your Git repository.
  3. When you checkout a commit that contains Git LFS pointers, they are replaced with files from your local Git LFS cache, or downloaded from the remote Git LFS store.

To use Git LFS, you will need a Git LFS aware host such as Bitbucket Cloud or Bitbucket Server. Repository users will need to have the Git LFS command-line client installed, or a Git LFS aware GUI client such as Sourcetree.

1.1 Creating a new Git LFS repository

To create a new Git LFS aware repository, you’ll need to run git lfs install after you create the repository:

# initialize Git
$ mkdir Atlasteroids
$ cd Atlasteroids
$ git init
Initialized empty Git repository in /Users/tpettersen/Atlasteroids/.git/
# initialize Git LFS
$ git lfs install
Updated pre-push hook.
Git LFS initialized.

This installs a special pre-push Git hook in your repository that will transfer Git LFS files to the server when you git push.

Once Git LFS is initialized for your repository, you can specify which files to track using git lfs track.

1.2 Cloning an existing Git LFS repository

Once Git LFS is installed, you can clone a Git LFS repository as normal using git clone. At the end of the cloning process Git will check out the default branch (usually master), and any Git LFS files needed to complete the checkout process will be automatically downloaded for you. For example:

$ git clone git@bitbucket.org:tpettersen/Atlasteroids.git

When running git clone, Git LFS files are downloaded one at a time as pointer files are checked out of your repository.

1.3 Speeding up clones

If you’re cloning a repository with a large number of LFS files, the explicit git lfs clone command offers far better performance:

$ git lfs clone git@bitbucket.org:tpettersen/Atlasteroids.git

Rather than downloading Git LFS files one at a time, the git lfs clone command waits until the checkout is complete, and then downloads any required Git LFS files as a batch. This takes advantage of parallelized downloads, and dramatically reduces the number of HTTP requests and processes spawned (which is especially important for improving performance on Windows).

1.4 Pulling and checking out

Just like cloning, you can pull from a Git LFS repository using a normal git pull. Any needed Git LFS files will be downloaded as part of the automatic checkout process once the pull completes:

$ git pull

No explicit commands are needed to retrieve Git LFS content. However, if the checkout fails for an unexpected reason, you can download any missing Git LFS content for the current commit with git lfs pull:

$ git lfs pull

1.5 Speeding up pulls

Like git lfs clone, git lfs pull downloads your Git LFS files as a batch. If you know a large number of files have changed since the last time you pulled, you may wish to disable the automatic Git LFS download during checkout, and then batch download your Git LFS content with an explicit git lfs pull. This can be done by overriding your Git config with the -c option when you invoke git pull:

$ git -c filter.lfs.smudge= -c filter.lfs.required=false pull && git lfs pull

Since that’s rather a lot of typing, you may wish to create a simple Git alias to perform a batched Git and Git LFS pull for you:

$ git config --global alias.plfs '!git -c filter.lfs.smudge= -c filter.lfs.required=false pull && git lfs pull'
$ git plfs

1.6 Tracking files with Git LFS

When you add a new type of large file to your repository, you’ll need to tell Git LFS to track it by specifying a pattern using the git lfs track command:

$ git lfs track "*.ogg"
Tracking *.ogg

Note that the quotes around "*.ogg" are important. Omitting them will cause the wildcard to be expanded by your shell, and individual entries will be created for each .ogg file in your current directory.
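To see why the quotes matter, the following sketch reproduces the shell's behavior in a scratch directory. The two file names are hypothetical, and no Git LFS command is run.

```shell
# Scratch directory containing two hypothetical .ogg files
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch explode.ogg music.ogg

# Unquoted: the shell replaces the glob with the matching file names before the command runs
printf '%s\n' *.ogg
# Quoted: the command receives the literal pattern
printf '%s\n' "*.ogg"
```

With the unquoted form, git lfs track would receive explode.ogg and music.ogg as two separate arguments and add one entry per file, instead of a single *.ogg pattern.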

The patterns supported by Git LFS are the same as those supported by .gitignore, for example:

# track all .ogg files in any directory
$ git lfs track "*.ogg"
# track files named music.ogg in any directory
$ git lfs track "music.ogg"
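# track all files in the Assets directory and all subdirectories
# note: in practice this trailing-slash form did not track files in subdirectories;
# the gitattributes documentation recommends "Assets/**" for a directory and everything below it
$ git lfs track "Assets/"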
# track all files in the Assets directory but *not* subdirectories
$ git lfs track "Assets/*"
# track all ogg files in Assets/Audio
$ git lfs track "Assets/Audio/*.ogg"
# track all ogg files in any directory named Music
$ git lfs track "**/Music/*.ogg"
# track png files containing "xxhdpi" in their name, in any directory
$ git lfs track "*xxhdpi*.png"

These patterns are relative to the directory in which you ran the git lfs track command. To keep things simple, it is best to run git lfs track from the root of your repository. Note that Git LFS does not support negative patterns like .gitignore does.

After running git lfs track, you’ll notice a new file named .gitattributes in the directory you ran the command from. .gitattributes is a Git mechanism for binding special behaviors to certain file patterns. Git LFS automatically creates or updates .gitattributes files to bind tracked file patterns to the Git LFS filter. However, you will need to commit any changes to the .gitattributes file to your repository yourself:

$ git lfs track "*.ogg"
Tracking *.ogg
$ git add .gitattributes
$ git diff --cached
diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..b6dd0bb
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1 @@
+*.ogg filter=lfs diff=lfs merge=lfs -text
$ git commit -m "Track ogg files with Git LFS"

For ease of maintenance, it is simplest to keep all Git LFS patterns in a single .gitattributes file by always running git lfs track from the root of your repository. However, you can display a list of all patterns that are currently tracked by Git LFS (and the .gitattributes files they are defined in) by invoking git lfs track with no arguments:

$ git lfs track
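As a rough illustration of where that list comes from, the hypothetical helper below prints the patterns bound to the LFS filter in a single .gitattributes file. It is only a sketch. Unlike git lfs track, it reads just the one file you pass and ignores nested .gitattributes files, macros, and quoted patterns.

```shell
# Hypothetical helper (not part of Git LFS): print each pattern whose attributes
# include filter=lfs in one .gitattributes file (default: ./.gitattributes).
lfs_patterns() {
  awk '
    $1 ~ /^#/ { next }                                                   # skip comment lines
    { for (i = 2; i <= NF; i++) if ($i == "filter=lfs") { print $1; break } }
  ' "${1:-.gitattributes}"
}

# Usage, from the repository root:
#   lfs_patterns
```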

You can stop tracking a particular pattern with Git LFS by simply removing the appropriate line from your .gitattributes file, or by running the git lfs untrack command:

$ git lfs untrack "*.ogg"

1.7 Committing and pushing

You can commit and push as normal to a repository that contains Git LFS content. If you have committed changes to files tracked by Git LFS, you will see some additional output from git push as the Git LFS content is transferred to the server:

$ git push

If transferring the LFS files fails for some reason, the push will be aborted and you can safely try again. Like Git, Git LFS storage is content addressable: content is stored against a key which is a SHA-256 hash of the content itself. This means it is always safe to re-attempt transferring Git LFS files to the server; you can’t accidentally overwrite a Git LFS file’s contents with the wrong version.
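Here is a minimal sketch of the content-addressing idea, assuming GNU coreutils' sha256sum (macOS ships shasum -a 256 instead). The key is the SHA-256 of the bytes, so identical content always yields the same key, whatever the file is called. lfs_oid and the sample file names are hypothetical.

```shell
# Hypothetical helper: the SHA-256 OID that content would be stored under.
# It depends only on the file's bytes, not on its name or upload time.
lfs_oid() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Usage:
#   lfs_oid Assets/Audio/theme.ogg
```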

1.8 Moving a Git LFS repository between hosts

To migrate a Git LFS repository from one hosting provider to another, you can use a combination of git lfs fetch and git lfs push with the --all option specified.

# create a bare clone of the GitHub repository
$ git clone --bare git@github.com:kannonboy/atlasteroids.git
$ cd atlasteroids
# set up named remotes for Bitbucket and GitHub
$ git remote add bitbucket git@bitbucket.org:tpettersen/atlasteroids.git
$ git remote add github git@github.com:kannonboy/atlasteroids.git
# fetch all Git LFS content from GitHub
$ git lfs fetch --all github
# push all Git and Git LFS content to Bitbucket
$ git push --mirror bitbucket
$ git lfs push --all bitbucket

1.9 Fetching extra Git LFS history

Git LFS typically only downloads the files needed for commits that you actually checkout locally. However, you can force Git LFS to download extra content for other recently modified branches using git lfs fetch --recent:

$ git lfs fetch --recent

Git LFS considers any branch or tag containing a commit newer than seven days as recent. You can configure the number of days considered as recent by setting the lfs.fetchrecentrefsdays property:

# download Git LFS content for branches or tags updated in the last 10 days
$ git config lfs.fetchrecentrefsdays 10

By default, git lfs fetch --recent will only download Git LFS content for the commit at the tip of a recent branch or tag.

However, you can configure Git LFS to download content for earlier commits on recent branches and tags by configuring the lfs.fetchrecentcommitsdays property:

# download the latest 3 days of Git LFS content for each recent branch or tag
$ git config lfs.fetchrecentcommitsdays 3

Use this setting with care: if you have fast-moving branches, this can result in a huge amount of data being downloaded. However, it can be useful if you need to review interstitial changes on a branch, cherry-pick commits across branches, or rewrite history.


As discussed in Moving a Git LFS repository between hosts, you can also elect to fetch all Git LFS content for your repository with git lfs fetch --all.

1.10 Deleting local Git LFS files

You can delete files from your local Git LFS cache with the git lfs prune command:

$ git lfs prune

This will delete any local Git LFS files that are considered old. An old file is any file not referenced by:

  • the currently checked out commit
  • a commit that has not yet been pushed (to origin, or whatever lfs.pruneremotetocheck is set to)
  • a recent commit

By default, a recent commit is any commit created in the last ten days. This is calculated by adding:

  • the value of the lfs.fetchrecentrefsdays property discussed in Fetching extra Git LFS history (which defaults to seven); to
  • the value of the lfs.pruneoffsetdays property (which defaults to three)


You can configure the prune offset to retain Git LFS content for a longer period:

# don't prune commits younger than four weeks (7 + 21)
$ git config lfs.pruneoffsetdays 21
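The retention window is simply the sum of the two settings. A minimal sketch of that arithmetic follows, using the defaults stated above (7 and 3). prune_retention_days is a hypothetical name. The sketch ignores the other retention rules (checked-out and unpushed commits).

```shell
# Hypothetical helper: days of history that prune keeps as "recent", per the text above:
# lfs.fetchrecentrefsdays (default 7) + lfs.pruneoffsetdays (default 3).
# Empty arguments fall back to those defaults.
prune_retention_days() {
  refs_days=${1:-7}
  offset_days=${2:-3}
  echo $((refs_days + offset_days))
}

# Inside a repository, pass the configured values (git config prints nothing if a key is unset):
#   prune_retention_days "$(git config --get lfs.fetchrecentrefsdays)" "$(git config --get lfs.pruneoffsetdays)"
```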

Unlike Git’s built-in garbage collection, Git LFS content is not pruned automatically, so running git lfs prune on a regular basis is a good idea to keep your local repository size down.

You can test out what effect a prune operation will have with git lfs prune --dry-run:

$ git lfs prune --dry-run
✔ 4 local objects, 33 retained
4 files would be pruned (2.1 MB)

You can see exactly which Git LFS objects will be pruned with git lfs prune --dry-run --verbose:

$ git lfs prune --dry-run --verbose
✔ 4 local objects, 33 retained
4 files would be pruned (2.1 MB)
* 4a3a36141cdcbe2a17f7bcf1a161d3394cf435ac386d1bff70bd4dad6cd96c48 (2.0 MB)
* 67ad640e562b99219111ed8941cb56a275ef8d43e67a3dac0027b4acd5de4a3e (6.3 KB)
* 6f506528dbf04a97e84d90cc45840f4a8100389f570b67ac206ba802c5cb798f (1.7 MB)
* a1d7f7cdd6dba7307b2bac2bcfa0973244688361a48d2cebe3f3bc30babcf1ab (615.7 KB)

The long hexadecimal strings output by --verbose mode are SHA-256 hashes (also known as Object IDs, or OIDs) of the Git LFS objects to be pruned. You can use the techniques described in Finding paths or commits that reference a Git LFS object to find out more about the objects that will be pruned.

As an additional safety check, you can use the --verify-remote option to check whether the remote Git LFS store has a copy of your Git LFS objects before they are pruned:

$ git lfs prune --verify-remote
✔ 16 local objects, 2 retained, 12 verified with remote
Pruning 14 files, (1.7 MB)
✔ Deleted 14 files

This makes the pruning process significantly slower, but gives you peace of mind knowing that any pruned objects are recoverable from the server. You can enable the --verify-remote option permanently for your system by configuring the lfs.pruneverifyremotealways property globally:

$ git config --global lfs.pruneverifyremotealways true

Or you can enable remote verification for just the context repository by omitting the --global option from the command above.

1.11 Deleting remote Git LFS files from the server

The Git LFS command-line client doesn’t support pruning files from the server, so how you delete them depends on your hosting provider.

1.12 Finding paths or commits that reference a Git LFS object

If you have a Git LFS SHA-256 OID, you can determine which commits reference it with git log --all -p -S <OID>:

$ git log --all -p -S 3b6124b8b01d601fa20b47f5be14e1be3ea7759838c1aac8f36df4859164e4cc

This git log incantation generates a patch (-p) from commits on any branch (--all) that add or remove a line (-S) containing the specified string (a Git LFS SHA-256 OID).

The patch shows you the commit and the path to the LFS object, as well as who added it, and when it was committed. You can simply checkout the commit, and Git LFS will download the file if needed and place it in your working copy.

If you suspect that a particular Git LFS object is in your current HEAD, or on a particular branch, you can use git grep to find the file path that references it:

# find a particular object by OID in HEAD
$ git grep 3b6124b8b01d601fa20b47f5be14e1be3ea7759838c1aac8f36df4859164e4cc HEAD
HEAD:Assets/Sprites/projectiles-spritesheet.png:oid sha256:3b6124b8b01d601fa20b47f5be14e1be3ea7759838c1aac8f36df4859164e4cc
# find a particular object by OID on the "power-ups" branch
$ git grep e88868213a5dc8533fc9031f558f2c0dc34d6936f380ff4ed12c2685040098d4 power-ups
power-ups:Assets/Sprites/shield2.png:oid sha256:e88868213a5dc8533fc9031f558f2c0dc34d6936f380ff4ed12c2685040098d4

1.13 Including/excluding Git LFS files

In some situations you may want to only download a subset of the available Git LFS content for a particular commit.

You can exclude a pattern or subdirectory using git lfs fetch -X (or --exclude):

$ git lfs fetch -X "Assets/**"

Alternatively, you may want to only include a particular pattern or subdirectory. For example, an audio engineer could fetch just ogg and wav files with git lfs fetch -I (or --include):

$ git lfs fetch -I "*.ogg,*.wav"

If you combine includes and excludes, only files that match an include pattern and do not match an exclude pattern will be fetched.

$ git lfs fetch -I "Assets/**" -X "*.gif"
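The include/exclude rule can be modeled with two small shell functions. They are hypothetical helpers, not Git LFS code, and the example paths are made up. The matching uses shell `case` globs, which are only an approximation of the gitignore-style patterns Git LFS actually uses (for instance, `*` also matches `/` here). Comma-separated lists mirror the -I "*.ogg,*.wav" form shown above.

```shell
# matches_any PATH "PAT1,PAT2,...": succeed if PATH matches any comma-separated pattern.
# Caveat: shell `case` globbing, not gitignore semantics.
matches_any() {
  _path=$1
  _saved_ifs=$IFS
  IFS=,
  set -f                      # split on commas without expanding patterns against local files
  for _pat in $2; do
    case $_path in
      $_pat) IFS=$_saved_ifs; set +f; return 0 ;;
    esac
  done
  IFS=$_saved_ifs
  set +f                      # assumes globbing was enabled before the call
  return 1
}

# would_fetch PATH INCLUDE EXCLUDE: fetched only if PATH matches an include pattern
# (or no include list is given) and matches no exclude pattern.
would_fetch() {
  if [ -n "$2" ] && ! matches_any "$1" "$2"; then return 1; fi
  if [ -n "$3" ] && matches_any "$1" "$3"; then return 1; fi
  return 0
}
```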

Excludes and includes support the same patterns as git lfs track and .gitignore. You can make these patterns permanent for a particular repository by setting the lfs.fetchinclude and lfs.fetchexclude config properties:

$ git config lfs.fetchinclude "Assets/**"
$ git config lfs.fetchexclude "*.gif"

These settings can also be applied to every repository on your system by appending the --global option.

1.14 Locking Git LFS files

Unfortunately, there is no easy way of resolving binary merge conflicts. With Git LFS file locking, you can lock files by extension or by file name and prevent binary files from being overwritten during a merge.

In order to take advantage of LFS’ file locking feature, you first need to tell Git which type of files are lockable. In the example below, the --lockable flag is appended to the git lfs track command which both stores PSD files in LFS and marks them as lockable.

$ git lfs track "*.psd" --lockable

Then add the following to your .gitattributes file:

*.psd filter=lfs diff=lfs merge=lfs -text lockable

When preparing to make changes to an LFS file, you’ll use the lock command in order to register the file as locked on your Git server.

$ git lfs lock images/foo.psd
Locked images/foo.psd

Once you no longer need the file lock, you can remove it using Git LFS’ unlock command.

$ git lfs unlock images/foo.psd

Git LFS file locks can be overridden, similar to git push, using a --force flag. Do not use the --force flag unless you’re absolutely sure you know what you’re doing.

$ git lfs unlock images/foo.psd --force

2. Git LFS Operation Guide

The following is from: Git LFS Operation Guide (Git LFS 操作指南)

2.1 Common Git LFS commands

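# List the patterns currently managed by Git LFS
git lfs track

# Manage the specified files with Git LFS
git lfs track "*.psd"

# Stop managing the specified files with Git LFS
git lfs untrack "*.psd"

# Like `git status`: show the status of the current Git LFS objects
git lfs status

# List every file currently managed by Git LFS
git lfs ls-files

# Check which version of Git LFS is in use
git lfs version

# A clone command specially optimized for repositories that use LFS; it noticeably
# speeds up fetching LFS objects and accepts the same arguments as `git clone`. [1] [2]
git lfs clone https://github.com/user/repo.git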

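[1] git lfs clone batches the requests for LFS objects, which reduces LFS API calls, and downloads LFS objects in parallel, giving a significant speedup. git lfs clone also works with repositories that do not use LFS, so you can use it to clone a repository whether or not that repository uses LFS.
[2] Current versions of git clone already perform as well as git lfs clone, so git lfs clone is no longer recommended as of Git LFS 2.3.0.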

2.2 Advanced Git LFS usage

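The core idea of Git LFS is to manage the files that need version control but take up a lot of space separately from the Git repository. This speeds up cloning the repository itself and gives you flexible control over the LFS objects.

By default, only the current version of the LFS objects referenced by the current commit is fetched.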

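2.2.1 Fetching only the repository itself, without any LFS objects

If your work does not involve the files managed by Git LFS, you can fetch only the Git repository's own content and skip fetching LFS objects entirely.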

GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/user/repo.git
# or
git -c filter.lfs.smudge= -c filter.lfs.required=false clone https://github.com/user/repo.git

Note: GIT_LFS_SKIP_SMUDGE=1 and git -c filter.lfs.smudge= -c filter.lfs.required=false also work with other git commands, such as checkout and reset.

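2.3 FAQ

  • Where Git LFS objects are stored in the local repository

    Objects managed by Git LFS are actually stored locally under the .git/lfs/objects directory, which is organized by each object's SHA-256 value.

    By comparison, objects managed by Git itself are stored under .git/objects, organized by the SHA-1 values of commits, trees, blobs, and tags.

  • A file has been tracked with git lfs track somefile, but it is not stored in LFS.

    If a file tracked by LFS has a size of 0, it is not stored in LFS form.

    A file is stored in LFS form only if it contains at least 1 byte.

    Note: normally you would not use LFS to track empty files, and even if you do, it makes no difference in practice. This point is mentioned mainly to clear up confusion you may run into while testing LFS.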

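3. Practical usage

3.1 Tracking all files in a folder

notes:

  1. Mind the order: git lfs track xxx -> git add .gitattributes -> git commit -> git add xxx -> git commit -> git push -u origin master

  2. After git lfs track xxx and git add xxx, git lfs ls-files lists all files currently tracked

  3. GitHub file limits:

    About storage and bandwidth usage

    Every account using Git Large File Storage receives 1 GB of free storage and 1 GB a month of free bandwidth. If the bandwidth and storage quotas are not enough, you can choose to purchase additional quota for Git LFS.

    Git LFS is available for every repository on GitHub, whether or not your account or organization has a paid subscription.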

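4. Main technical principles

The following is from: Git LFS Server Implementation Notes (Git LFS 服务器实现杂谈)

A Git repository contains some special files, such as .gitattributes and .gitignore. As the name suggests, .gitattributes manages the attributes of paths in the repository: how they are filtered, diffed, and merged. .gitignore, on the other hand, excludes the specified paths so that they are not added to the repository.

The author first came across .gitattributes while managing a graduation thesis with Git: once .gitattributes was set up, docx documents could be diffed. Building on this feature, running git lfs track makes git lfs add the path to the .gitattributes file, for example: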

clang.tar.gz filter=lfs diff=lfs merge=lfs -text

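It then modifies the Git hooks post-checkout, post-commit, post-merge, and pre-push; each of them simply launches git-lfs with the current hook's name and arguments as its arguments.

When a commit is created, say one that adds clang.tar.gz, Git launches the lfs command, which writes the hash (currently SHA-256) and the size of clang.tar.gz into a file that is also named clang.tar.gz, and commits that file to the repository as the pointer to the large file. The original clang.tar.gz is stored separately under lfs/objects. When the update is pushed to the remote server, the pre-push hook launches git lfs, which connects to the remote LFS server and uploads the file.

On checkout or pull, git lfs downloads any large files that are not available locally.

Put simply, it uses .gitattributes to swap the real file out for a stand-in.

The git lfs specification is available at https://github.com/git-lfs/git-lfs/blob/master/docs/spec.md.

4.1 Related hooks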

  1. post-checkout:

    #!/bin/sh
    command -v git-lfs >/dev/null 2>&1 || { echo >&2 "\nThis repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting .git/hooks/post-checkout.\n"; exit 2; }
    git lfs post-checkout "$@"
    
  2. post-commit:

    #!/bin/sh
    command -v git-lfs >/dev/null 2>&1 || { echo >&2 "\nThis repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting .git/hooks/post-commit.\n"; exit 2; }
    git lfs post-commit "$@"
    
  3. post-merge:

    #!/bin/sh
    command -v git-lfs >/dev/null 2>&1 || { echo >&2 "\nThis repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting .git/hooks/post-merge.\n"; exit 2; }
    git lfs post-merge "$@"
    
  4. pre-push:

    #!/bin/sh
    command -v git-lfs >/dev/null 2>&1 || { echo >&2 "\nThis repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting .git/hooks/pre-push.\n"; exit 2; }
    git lfs pre-push "$@"
    

5. Spec

The following is from: Git LFS Specification

5.1 The pointer

The core Git LFS idea is that instead of writing large blobs to a Git repository, only a pointer file is written.

  • Each line MUST be of the format {key} {value}\n (trailing unix newline).
  • The first key is always version.
  • Lines of key/value pairs MUST be sorted alphabetically in ascending order (with the exception of version, which is always first).
  • Pointer files MUST be stored in Git with their executable bit matching that of the replaced file.

An empty file is the pointer for an empty file. That is, empty files are passed through LFS without any change.

The required keys are:

  • version is a URL that identifies the pointer file spec. Parsers MUST use simple string comparison on the version, without any URL parsing or normalization. It is case sensitive, and %-encoding is discouraged.
  • oid tracks the unique object id for the file, prefixed by its hashing method: {hash-method}:{hash}. Currently, only sha256 is supported.
  • size is in bytes.
version https://hawser.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
(ending \n)
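These rules are simple enough to check with standard tools. is_lfs_pointer below is a hypothetical checker, not part of Git LFS. It checks for the trailing newline, that version comes first, that the remaining keys are sorted, and that oid and size are well formed. It assumes keys are limited to lowercase letters, digits, `.` and `-`. It does not check the executable-bit rule. For authoritative comparisons, use git lfs pointer as described next.

```shell
# Hypothetical checker for the pointer rules above (not part of Git LFS).
# Assumes keys use only [a-z0-9.-]; the executable-bit rule is not checked.
is_lfs_pointer() {
  _f=$1
  [ -f "$_f" ] || return 1
  [ -s "$_f" ] || return 0                    # an empty file is the pointer for an empty file
  [ -z "$(tail -c 1 "$_f")" ] || return 1     # last byte must be a Unix newline
  awk '
    NR == 1 { if ($0 !~ /^version [^ ]+$/) bad = 1; next }    # version is always first
    {
      key = $1 ""
      val = substr($0, length(key) + 2)
      if ($0 !~ /^[a-z0-9.-]+ [^ ]/) bad = 1                # "{key} {value}"
      if (prev != "" && key <= prev) bad = 1                # remaining keys sorted ascending
      prev = key
      if (key == "oid")  { has_oid = 1;  if (val !~ /^sha256:[0-9a-f]+$/ || length(val) != 71) bad = 1 }
      if (key == "size") { has_size = 1; if (val !~ /^[0-9]+$/) bad = 1 }
    }
    END { if (bad || !has_oid || !has_size) exit 1; exit 0 }
  ' "$_f"
}
```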

For testing compliance of any tool generating its own pointer files, the reference is this official Git LFS tool:

NOTE: exact pointer command behavior TBD!

  • Tools that parse and regenerate pointer files MUST preserve keys that they don’t know or care about.

  • Run the pointer command to generate a pointer file for the given local file:

    $ git lfs pointer --file=path/to/file
    Git LFS pointer for path/to/file:
    
    version https://git-lfs.github.com/spec/v1
    oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
    size 12345
    
  • Run pointer to compare the blob OID of a pointer file built by Git LFS with a pointer built by another tool.

    $ git lfs pointer --file=path/to/file --pointer=other/pointer/file
    $ cat other/pointer/file | git lfs pointer --file=path/to/file --stdin
    

5.2 Intercepting Git

Git LFS uses the clean and smudge filters to decide which files use it. The global filters can be set up with git lfs install

These filters ensure that large files aren’t written into the repository proper, instead being stored locally at .git/lfs/objects/{OID-PATH} (where {OID-PATH} is a sharded filepath of the form OID[0:2]/OID[2:4]/OID), synchronized with the Git LFS server as necessary.
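The sharded layout is easy to compute. The hypothetical helper below maps an OID to its local path, OID[0:2]/OID[2:4]/OID. It takes the LFS storage directory as an optional second argument, defaulting to .git/lfs, since lfs.storage (section 6.2) can move it.

```shell
# Hypothetical helper: local path of an LFS object under the OID[0:2]/OID[2:4]/OID layout.
# $2 = LFS storage directory (default .git/lfs).
lfs_object_path() {
  _oid=$1
  _lfs_dir=${2:-.git/lfs}
  printf '%s/objects/%s/%s/%s\n' "$_lfs_dir" \
    "$(printf '%s' "$_oid" | cut -c 1-2)" \
    "$(printf '%s' "$_oid" | cut -c 3-4)" \
    "$_oid"
}
```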

The clean filter runs as files are added to repositories. Git sends the content of the file being added as STDIN, and expects the content to write to Git as STDOUT.

  • Stream binary content from STDIN to a temp file, while calculating its SHA-256 signature.
  • Atomically move the temp file to .git/lfs/objects/{OID-PATH} if it does not exist, and the sha-256 signature of the contents matches the given OID.
  • Delete the temp file.
  • Write the pointer file to STDOUT.

Note that the clean filter does not push the file to the server. Use the git push command to do that (lfs files are pushed before commits in a pre-push hook).
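The four clean-filter steps can be sketched in shell to make them concrete. This is an illustration under assumptions, not a replacement for git-lfs. lfs_clean_sketch and LFS_DIR are hypothetical names, GNU coreutils' sha256sum is assumed, and the temp-file location is chosen so the move stays on one filesystem. Because the sketch computes the OID itself, the "signature matches the given OID" check is trivially satisfied.

```shell
# Hypothetical, simplified clean filter (no network access, as the spec describes).
# LFS_DIR is a made-up variable defaulting to .git/lfs.
lfs_clean_sketch() {
  lfs_dir=${LFS_DIR:-.git/lfs}
  mkdir -p "$lfs_dir/tmp" || return 1
  tmp=$(mktemp "$lfs_dir/tmp/clean.XXXXXX") || return 1

  # 1. Stream STDIN into a temp file, then compute its SHA-256
  cat > "$tmp"
  oid=$(sha256sum "$tmp" | cut -d ' ' -f 1)
  size=$(wc -c < "$tmp" | tr -d ' ')

  # An empty file is its own (empty) pointer: write nothing
  if [ "$size" -eq 0 ]; then
    rm -f "$tmp"
    return 0
  fi

  # 2. Move the temp file to objects/OID[0:2]/OID[2:4]/OID unless it already exists
  #    (mv within one filesystem is a rename, hence atomic)
  d1=$(printf '%s' "$oid" | cut -c 1-2)
  d2=$(printf '%s' "$oid" | cut -c 3-4)
  dest="$lfs_dir/objects/$d1/$d2/$oid"
  if [ ! -e "$dest" ]; then
    mkdir -p "$lfs_dir/objects/$d1/$d2" && mv "$tmp" "$dest"
  fi

  # 3. Delete the temp file (already gone if it was moved)
  rm -f "$tmp"

  # 4. Write the pointer file to STDOUT
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}
```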

The smudge filter runs as files are being checked out from the Git repository to the working directory. Git sends the content of the Git blob as STDIN, and expects the content to write to the working directory as STDOUT.

  • Read 100 bytes.
  • If the content is ASCII and matches the pointer file format:
    • Look for the file in .git/lfs/objects/{OID-PATH}.
    • If it’s not there, download it from the server.
    • Write its contents to STDOUT
  • Otherwise, simply pass the STDIN out through STDOUT.
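The smudge side can be sketched the same way, with the download step left as a stub. lfs_smudge_sketch and LFS_DIR are hypothetical names. The "is this a pointer?" test only looks for a known version line in the first 100 bytes, which is much looser than a real parser. It accepts both the version URL in the spec example above and the one git lfs pointer prints.

```shell
# Hypothetical, simplified smudge filter: serves objects from the local cache only.
# LFS_DIR is a made-up variable defaulting to .git/lfs.
lfs_smudge_sketch() {
  lfs_dir=${LFS_DIR:-.git/lfs}
  buf=$(mktemp) || return 1
  cat > "$buf"                                          # Git sends the blob on STDIN

  prefix=$(dd if="$buf" bs=100 count=1 2>/dev/null)     # read the first 100 bytes
  case $prefix in
    "version https://git-lfs.github.com/spec/v1"*|"version https://hawser.github.com/spec/v1"*)
      oid=$(sed -n 's/^oid sha256:\([0-9a-f]*\)$/\1/p' "$buf")
      obj="$lfs_dir/objects/$(printf '%s' "$oid" | cut -c 1-2)/$(printf '%s' "$oid" | cut -c 3-4)/$oid"
      rm -f "$buf"
      if [ -n "$oid" ] && [ -f "$obj" ]; then
        cat "$obj"                                      # found locally: write its contents
      else
        # A real smudge filter would download the object from the LFS server here
        echo "lfs_smudge_sketch: object $oid not in local cache" >&2
        return 1
      fi
      ;;
    *)
      cat "$buf"                                        # not a pointer: pass through unchanged
      rm -f "$buf"
      ;;
  esac
}
```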

6. Reference Manual

6.1 git-lfs-prune(1)

Deletes local copies of LFS files which are old, thus freeing up disk space.

Note: you should not run git lfs prune if you have different repositories sharing the same custom storage directory; see git-lfs-config(1) for more details about lfs.storage option.

6.2 git-lfs-config(5)

  1. lfs.storage: changes where LFS files are stored (different repositories can share the same storage directory)

    Allow override LFS storage directory. Non-absolute path is relativized to inside of Git repository directory (usually .git).

    Note: you should not run git lfs prune if you have different repositories sharing the same storage directory.

    Default: lfs in Git repository directory (usually .git/lfs).

7. Collected issues

7.1 Building an image with git-lfs for drone CI

When drone clones a repository that contains submodules:

kind: pipeline
type: docker
name: default

clone:
  disable: true

steps:
- name: clone
  image: alpine/git
  commands:
  - git lfs clone --recursive http://192.168.xxx.xxx:11080/username/repo.git   

** This fails because git-lfs is not installed in the alpine/git image

** If git clone is used instead, the large files in submodules that use LFS are not pulled

Build an image with git-lfs installed:

FROM alpine

RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories
RUN apk update && apk upgrade && \
    apk add --no-cache bash openssh git git-lfs

** The sed line switches the apk package mirror to mirrors.aliyun.com

7.2 git lfs clone succeeds but checkout fails

  1. The files are large, so Git auto-packs the repository on commit

    $ git commit -m "added lfw_160"
    Auto packing the repository in background for optimum performance.
    See "git help gc" for manual housekeeping.
    Enumerating objects: 7299, done.
    Counting objects: 100% (7299/7299), done.
    Delta compression using up to 4 threads
    Compressing objects: 100% (7013/7013), done.
    Writing objects: 100% (7299/7299), done.
    Total 7299 (delta 0), reused 0 (delta 0)
    Removing duplicate objects: 100% (256/256), done.
    [master (root-commit) ef03f8f] added lfw_160
     6029 files changed, 24112 insertions(+)
     create mode 100644 .gitattributes
     create mode 100644 lfw_160/Aaron_Sorkin/Aaron_Sorkin_0001.png
     create mode 100644 lfw_160/Aaron_Sorkin/Aaron_Sorkin_0002.png
    ......
    
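  2. The repository clones successfully but checkout fails

    Of these repositories, pre-trained-models was not packed before it was pushed.

    Inside the repository folder, git checkout -f HEAD sometimes succeeds, and sometimes prompts for a username and password (when cloned over HTTP).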

    / # git --version
    git version 2.22.0
    / # git clone http://192.168.xxx.xxx:10080/usr/lfw-dataset.git
    Cloning into 'pre-trained-models'...
    remote: Enumerating objects: 8, done.
    remote: Counting objects: 100% (8/8), done.
    remote: Compressing objects: 100% (8/8), done.
    remote: Total 8 (delta 0), reused 0 (delta 0)
    Unpacking objects: 100% (8/8), done.
    / # git clone http://192.168.xxx.xxx:10080/usr/lfw-dataset.git
    Cloning into 'lfw-dataset'...
    remote: Enumerating objects: 7299, done.
    remote: Counting objects: 100% (7299/7299), done.
    remote: Compressing objects: 100% (7013/7013), done.
    remote: Total 7299 (delta 0), reused 7299 (delta 0)
    Receiving objects: 100% (7299/7299), 1006.13 KiB | 23.96 MiB/s, done.
    Downloading lfw_160/Jorge_Moreno/Jorge_Moreno_0001.png (33 KB)
    Error downloading object: lfw_160/Jorge_Moreno/Jorge_Moreno_0001.png (b8e0536): Smudge error: Error downloading lfw_160/Jorge_Moreno/Jorge_Moreno_0001.png (b8e05360ab5e1d92a8d2bf7f3eb2793ce6ddb21f1430fac9fb27663ad448b8cb): batch response: Repository or object not found: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    Check that it exists and that you have proper access to it
    
    Errors logged to /lfw-dataset/.git/lfs/logs/20191128T110441.134505027.log
    Use `git lfs logs last` to view the log.
    error: external filter 'git-lfs filter-process' failed
    fatal: lfw_160/Jorge_Moreno/Jorge_Moreno_0001.png: smudge filter lfs failed
    warning: Clone succeeded, but checkout failed.
    You can inspect what was checked out with 'git status'
    and retry the checkout with 'git checkout -f HEAD'
    
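  3. Inspecting the clone with GIT_TRACE=1: internal server error 500

    • On Windows and Ubuntu (outside docker containers), clone and checkout both succeed, with or without an SSH key added to the repository
    • Inside docker containers, both Ubuntu and Alpine clone successfully but fail at checkout, so the cause is presumably some limitation of the docker container
    • The batch request fails with internal server error 500: api error: Fatal error: Server error: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    • Several subsequent requests fail with 404, object not found: api error: Repository or object not found: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    • After a few more retries, it finally returns api error: Repository or object not found: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
      Check that it exists and that you have proper access to it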
    09:46:03.806661 trace git-lfs: HTTP: 200
    09:46:03.806706 trace git-lfs: HTTP: 200
    09:46:03.807167 trace git-lfs: filepathfilter: accepting "lfw_160/Joschka_Fischer/Joschka_Fischer_0004.png"
    09:46:03.807853 trace git-lfs: filepathfilter: accepting "lfw_160/Joseph_Deiss/Joseph_Deiss_0003.png"
    09:46:03.807990 trace git-lfs: tq: sending batch of size 100
    09:46:03.808246 trace git-lfs: api: batch 100 files
    09:46:03.808293 trace git-lfs: HTTP: POST http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    09:46:03.808350 trace git-lfs: filepathfilter: accepting "lfw_160/Jose_Serra/Jose_Serra_0003.png"
    09:46:03.808565 trace git-lfs: filepathfilter: accepting "lfw_160/Judi_Patton/Judi_Patton_0001.png"
    09:46:04.138014 trace git-lfs: HTTP: 500
    09:46:04.138057 trace git-lfs: api error: Fatal error: Server error: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    09:46:04.138366 trace git-lfs: tq: sending batch of size 100
    09:46:04.138465 trace git-lfs: api: batch 100 files
    09:46:04.138511 trace git-lfs: HTTP: POST http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    09:46:04.147331 trace git-lfs: HTTP: 404
    09:46:04.147371 trace git-lfs: api error: Repository or object not found: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    Check that it exists and that you have proper access to it
    09:46:04.147630 trace git-lfs: tq: sending batch of size 100
    09:46:04.147725 trace git-lfs: api: batch 100 files
    09:46:04.147768 trace git-lfs: HTTP: POST http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    09:46:04.156188 trace git-lfs: HTTP: 404
    09:46:04.156219 trace git-lfs: api error: Repository or object not found: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    Check that it exists and that you have proper access to it
    ……
    09:46:04.361248 trace git-lfs: tq: sending batch of size 25
    09:46:04.361318 trace git-lfs: api: batch 25 files
    09:46:04.361373 trace git-lfs: HTTP: POST http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    09:46:04.370427 trace git-lfs: HTTP: 404
    09:46:04.370465 trace git-lfs: api error: Repository or object not found: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    Check that it exists and that you have proper access to it
    09:46:04.384150 trace git-lfs: filepathfilter: accepting "lfw_160/Julia_Tymoshenko/Julia_Tymoshenko_0003.png"
    Downloading lfw_160/Julia_Tymoshenko/Julia_Tymoshenko_0003.png (32 KB)
    09:46:04.384189 trace git-lfs: tq: running as batched queue, batch size of 100
    09:46:04.384210 trace git-lfs: tq: sending batch of size 1
    09:46:04.384255 trace git-lfs: api: batch 1 files
    09:46:04.384301 trace git-lfs: HTTP: POST http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    09:46:04.393577 trace git-lfs: HTTP: 404
    09:46:04.393609 trace git-lfs: api error: Repository or object not found: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    Check that it exists and that you have proper access to it
    Error downloading object: lfw_160/Julia_Tymoshenko/Julia_Tymoshenko_0003.png (ab93cd4): Smudge error: Error downloading lfw_160/Julia_Tymoshenko/Julia_Tymoshenko_0003.png (ab93cd45d9ada70acdaa7e7d0d84b4155fa73cc36d1c070d1817d6631932fb5a): batch response: Repository or object not found: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    Check that it exists and that you have proper access to it
    
    Errors logged to /lfw-dataset/.git/lfs/logs/20191130T094604.393673476.log
    Use `git lfs logs last` to view the log.
    error: external filter 'git-lfs filter-process' failed
    fatal: lfw_160/Julia_Tymoshenko/Julia_Tymoshenko_0003.png: smudge filter lfs failed
    warning: Clone succeeded, but checkout failed.
    You can inspect what was checked out with 'git status'
    and retry the checkout with 'git checkout -f HEAD'
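A trace like the one above quickly grows to thousands of lines. Below is a minimal sketch for tallying the HTTP status codes that git-lfs logged. It assumes the trace was saved to a file, for example with `GIT_TRACE=1 git clone <url> 2> trace.log` (`trace.log` is a placeholder name). The function name `lfs_status_summary` is hypothetical, and the pattern is based only on the git-lfs 2.7.2 trace lines shown here.

```shell
# Hypothetical helper: tally the HTTP status codes recorded in a git-lfs trace.
# Matches trace lines such as "09:46:04.156188 trace git-lfs: HTTP: 404";
# "HTTP: POST <url>" lines are skipped because they carry no 3-digit code.
lfs_status_summary() {
  local log="$1"
  grep -oE 'trace git-lfs: HTTP: [0-9]{3}' "$log" \
    | awk '{ counts[$NF]++ } END { for (code in counts) print code, counts[code] }' \
    | sort
}
```

Run it as `lfs_status_summary trace.log`. Each output line is a status code followed by the number of times it appeared, which shows at a glance whether the failures are consistent 404s or a mix of 404 and 500.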
    
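  4. More detailed output

    Setting GIT_TRACE=1, GIT_TRANSFER_TRACE=1 and GIT_CURL_VERBOSE=1 adds the full HTTP exchange (headers and JSON request bodies) to the trace: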

    GIT_TRACE=1 GIT_TRANSFER_TRACE=1 GIT_CURL_VERBOSE=1 git clone http://192.168.xxx.xxx:10080/usr/lfw-dataset.git
    
    < HTTP/1.1 200 OK
    < Content-Length: 28128
    < Content-Type: application/octet-stream
    < Date: Sun, 01 Dec 2019 02:20:39 GMT
    < Set-Cookie: lang=en-US; Path=/; Max-Age=2147483647
    < Set-Cookie: i_like_gitea=2c0a816af789763f; Path=/; HttpOnly
    < Set-Cookie: _csrf=UXrpPl4bM6x53OXiXSd2EBdtkEc6MTU3NTE2NjgzOTgwMzQzNTk5Mw; Path=/; Expires=Mon, 02 Dec 2019 02:20:39 GMT; HttpOnly
    < X-Frame-Options: SAMEORIGIN
    <
    02:20:39.816151 trace git-lfs: xfer: adapter "basic" worker 3 finished job for "6fbb3cfda70f7f4c147134a1fd159a124a07addde5cd97b3d9e602c8cc05099a"
    02:20:39.816297 trace git-lfs: filepathfilter: accepting "lfw_160/Joschka_Fischer/Joschka_Fischer_0004.png"
    02:20:39.817394 trace git-lfs: tq: sending batch of size 100
    02:20:39.817485 trace git-lfs: api: batch 100 files
    02:20:39.817520 trace git-lfs: HTTP: POST http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    > POST /usr/lfw-dataset.git/info/lfs/objects/batch HTTP/1.1
    > Host: 192.168.xxx.xxx:10080
    > Accept: application/vnd.git-lfs+json; charset=utf-8
    > Content-Length: 8871
    > Content-Type: application/vnd.git-lfs+json; charset=utf-8
    > User-Agent: git-lfs/2.7.2 (GitHub; linux amd64; go 1.12.5; git f3af9f768e)
    >
    {"operation":"download","objects":[{"oid":"fe755d8d795033b8fe4092c0e6baa0d908436069dd4503a1005315c3a7ad959f","size":46076},{"oid":"36c2029b8103ed6e5f8aa1fe94a047cb051ca2e26294c5b980dd91e10fbb84bf","size":43510},……{"oid":"3f7a2397513cc87d38fbf2f31f3be29f8edb9951f0d08a11cc386560d4263c2a","size":28813}],"ref":{"name":"refs/heads/master"}}02:20:40.198751 trace git-lfs: HTTP: 500
    
    
    < HTTP/1.1 500 Internal Server Error
    < Transfer-Encoding: chunked
    < Content-Type: text/html; charset=UTF-8
    < Date: Sun, 01 Dec 2019 02:20:40 GMT
    < Set-Cookie: lang=en-US; Path=/; Max-Age=2147483647
    < Set-Cookie: i_like_gitea=11ed8dfc2d08ef19; Path=/; HttpOnly
    < Set-Cookie: _csrf=u0QaX5rU9M3YXlvQjiyA0B6RdX06MTU3NTE2NjgzOTgxNzc2OTY2NQ; Path=/; Expires=Mon, 02 Dec 2019 02:20:39 GMT; HttpOnly
    < X-Frame-Options: SAMEORIGIN
    <
    02:20:40.198867 trace git-lfs: api error: Fatal error: Server error: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    02:20:40.199269 trace git-lfs: tq: sending batch of size 100
    02:20:40.199389 trace git-lfs: api: batch 100 files
    02:20:40.199442 trace git-lfs: HTTP: POST http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    > POST /usr/lfw-dataset.git/info/lfs/objects/batch HTTP/1.1
    > Host: 192.168.xxx.xxx:10080
    > Accept: application/vnd.git-lfs+json; charset=utf-8
    > Content-Length: 8871
    > Content-Type: application/vnd.git-lfs+json; charset=utf-8
    > User-Agent: git-lfs/2.7.2 (GitHub; linux amd64; go 1.12.5; git f3af9f768e)
    >
    {"operation":"download","objects":[{"oid":"a892c6a142d8b246b27b6531b70dbba6005fa837014b2f1d95d95dc68d64b853","size":43791},{"oid":"21bc4e6cdc605e3b244767d138057537e3f136461bc915b9775e51849e35433c","size":42884},……{"oid":"2e66599d82999056aa8f02b9562369f971b3c4462c971953c9773dd8a1280ecd","size":28470}],"ref":{"name":"refs/heads/master"}}02:20:40.210304 trace git-lfs: HTTP: 404
    
    
    < HTTP/1.1 404 Not Found
    < Content-Length: 23
    < Content-Type: text/plain; charset=utf-8
    < Date: Sun, 01 Dec 2019 02:20:40 GMT
    < Set-Cookie: lang=en-US; Path=/; Max-Age=2147483647
    < Set-Cookie: i_like_gitea=09c73f17462dc4ec; Path=/; HttpOnly
    < Set-Cookie: _csrf=fadTW0WNPRIHuwV3BtmAwAx6tWo6MTU3NTE2Njg0MDE5OTc4MjY1Mw; Path=/; Expires=Mon, 02 Dec 2019 02:20:40 GMT; HttpOnly
    < X-Frame-Options: SAMEORIGIN
    <
    02:20:40.210406 trace git-lfs: api error: Repository or object not found: http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    Check that it exists and that you have proper access to it
    02:20:40.210752 trace git-lfs: tq: sending batch of size 100
    02:20:40.210875 trace git-lfs: api: batch 100 files
    02:20:40.210933 trace git-lfs: HTTP: POST http://192.168.xxx.xxx:10080/usr/lfw-dataset.git/info/lfs/objects/batch
    > POST /usr/lfw-dataset.git/info/lfs/objects/batch HTTP/1.1
    > Host: 192.168.xxx.xxx:10080
    > Accept: application/vnd.git-lfs+json; charset=utf-8
    > Content-Length: 8871
    > Content-Type: application/vnd.git-lfs+json; charset=utf-8
    > User-Agent: git-lfs/2.7.2 (GitHub; linux amd64; go 1.12.5; git f3af9f768e)
    >
    {"operation":"download","objects":[{"oid":"f5a23061bf1b96a1d1f402edd9965cc54051be26f87e855d1089bbc3bcc8c6aa","size":42833},{"oid":"a7d9f41c34b042b535c6eeddd4a3df02a1829d59b31becc7b58cc2314f9beb5b","size":41265},……{"oid":"1c356214ed987a424fb9a713c26a8287469cf35f4d369ac72cd9a5565dc2c815","size":25820}],"ref":{"name":"refs/heads/master"}}02:20:40.221400 trace git-lfs: HTTP: 404
    
    
    
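With `GIT_CURL_VERBOSE=1`, each batch request is buried among headers and JSON bodies. Below is a hypothetical awk sketch (the function name `lfs_batch_results` is made up) that reduces such a trace to one `<batch size> <status>` line per batch request. It assumes the `api: batch N files` and `HTTP: <code>` trace lines look as they do in the git-lfs 2.7.2 output above; other versions may word them differently.

```shell
# Hypothetical helper: pair each git-lfs batch request with its HTTP status.
lfs_batch_results() {
  awk '
    # "api: batch 25 files" -> remember the batch size (the field after "batch")
    /trace git-lfs: api: batch [0-9]+ files/ {
      for (i = 1; i < NF; i++) if ($i == "batch") size = $(i + 1)
    }
    # "HTTP: 404" (possibly glued to a JSON body) -> status of that batch request
    match($0, /trace git-lfs: HTTP: [0-9][0-9][0-9]/) {
      if (size != "") { print size, substr($0, RSTART + RLENGTH - 3, 3); size = "" }
    }
  ' "$1"
}
```

On the excerpt above it would print `100 500` followed by two `100 404` lines. Comparing its output before and after changing the transfer settings is one way to see whether the server's responses actually changed.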
  5. I stumbled onto a fix by following these two issues:

    external filter ‘git-lfs filter-process’ failed … smudge filter lfs failed on Mac #3519

    EOF and exit status 255 connection closed by remote host for 1 user in a specific location #2271

    Suspecting that something in the Docker network was the cause, I throttled git-lfs:

    # lfs.tlstimeout=60 (default 30 seconds)
    # lfs.concurrenttransfers=1 (default 3)
    # other settings
    git config --global lfs.tlstimeout 60
    git config --global lfs.concurrenttransfers 1
    git config --global lfs.activitytimeout 60
    git config --global lfs.transfer.maxretries 10
    git config --global lfs.transfer.maxverifies 8
    

    After that, the clone succeeded.
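The settings above are written with `--global`, so they throttle every LFS transfer on the machine. Below is a minimal sketch that applies the same five values to a chosen scope and removes them again afterwards. The helper names are hypothetical, and all arguments are passed straight to `git config` to select the scope: `--global`, `--local` inside an existing clone (for example before retrying `git checkout -f HEAD` after "Clone succeeded, but checkout failed"), or `--file <path>`.

```shell
# Hypothetical helpers: apply/remove the conservative git-lfs transfer settings
# from this step. All arguments are passed to `git config` to pick the scope.
lfs_conservative_config() {
  git config "$@" lfs.tlstimeout 60               # max seconds for the TLS handshake
  git config "$@" lfs.concurrenttransfers 1       # one upload/download at a time
  git config "$@" lfs.activitytimeout 60          # max seconds to wait for the next read/write
  git config "$@" lfs.transfer.maxretries 10      # retries per object before failing
  git config "$@" lfs.transfer.maxverifies 8      # verify attempts per object before failing
}

lfs_conservative_config_undo() {
  local key
  for key in lfs.tlstimeout lfs.concurrenttransfers lfs.activitytimeout \
             lfs.transfer.maxretries lfs.transfer.maxverifies; do
    # --unset exits non-zero when the key is absent; ignore that case
    git config "$@" --unset "$key" || true
  done
}
```

For example, run `lfs_conservative_config --global` before the clone and `lfs_conservative_config_undo --global` once it has finished, so later transfers go back to the defaults.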

  6. It may also be a rate-limit issue:

    From: API rate limit exceeded

    Closing this out. API rate limit has not changed. Unauthenticated rate limits across all of GitHub’s API have been fairly low historically. Configuring a password or turning lfs.access to basic as I demonstrated in #2134 (comment) will give you the higher, authenticated rate limit.

    From: “API rate limit exceeded” not considered an error

    Bah, I left out one important detail. If you set lfs.<url>.access = basic, you’ll need a login too. You can create a github personal access token with no scopes. This will get you a higher rate limit, without providing access to any private resources.
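Following the comment above, `lfs.<url>.access` can be set to `basic` so that git-lfs asks for credentials before making batch requests instead of first trying an unauthenticated one. Below is a minimal sketch that assumes the LFS endpoint is the repository URL plus `/info/lfs` (the URL in the usage line is a placeholder). The credentials, such as a personal access token with no scopes, are assumed to come from your usual git credential helper. The function name is hypothetical.

```shell
# Hypothetical helper: make git-lfs authenticate up front for one LFS endpoint.
# $1 is the LFS endpoint URL; the remaining arguments select the git config scope.
lfs_force_basic_auth() {
  local endpoint="$1"
  shift
  # git splits "lfs.<url>.access" into section "lfs", subsection "<url>", key "access"
  git config "$@" "lfs.${endpoint}.access" basic
}
```

Usage, inside the repository: `lfs_force_basic_auth https://github.com/OWNER/REPO.git/info/lfs --local`.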

  7. It may also be caused by an internal IP address:

    From: API rate limit exceeded

    @jonasmalacofilho We also saw this exact same problem at the same time you did with the same IP ( 172.16.40.2) address reported in the error messages. It brought all of our work to a halt and then resolved on its own. 172.16.* is an address block reserved for private network routing, so I believe this was probably caused by an internal error at github or one of their providers rather than being due to an actual increase in api requests.

References

  1. Git LFS
  2. Git LFS Operation Guide
  3. About storage and bandwidth usage
  4. Notes on Git LFS Server Implementation
  5. Git LFS Specification
  6. external filter ‘git-lfs filter-process’ failed … smudge filter lfs failed on Mac #3519
  7. EOF and exit status 255 connection closed by remote host for 1 user in a specific location #2271
  8. API rate limit exceeded
  9. “API rate limit exceeded” not considered an error