
Commit 320c85f

filter-repo: improve support for partial history rewrites
Partial history rewrites were possible before with the (previously hidden) --refs flag, but the defaults were wrong. That could be worked around with the --source or --target flags, but that disabled --no-data for fast-export and thus slowed things down, and also would require overriding --replace-refs. And the defaults for --source and --target may diverge further from what is wanted/needed for partial history rewrites in the future.

So, add --partial as a first-class supported option with scary documentation about how it permits mixing new and old history. Make --refs imply that flag. Make the behavioral similarities (with regard to which steps are skipped) between --source, --target, and --partial more clear. Add relevant documentation to round it out.

Signed-off-by: Elijah Newren <[email protected]>
1 parent 509a624 commit 320c85f

5 files changed (+98 -41 lines)

Documentation/git-filter-repo.txt

Lines changed: 55 additions & 17 deletions
@@ -256,9 +256,10 @@ Generic callback code snippets
 Location to filter from/to
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-NOTE: Specifying alternate source or target locations will disable some
-auxiliary steps such as disconnecting the origin remote, and avoiding
-mixing new and old history.
+NOTE: Specifying alternate source or target locations implies --partial
+except that the normal default for --replace-refs is used. However, unlike
+normal uses of --partial, this doesn't risk mixing old and new history
+since the old and new histories are in different repositories.
 
 --source <source>::
 Git repository to read from
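The NOTE above depends on the order of the defaulting lines this commit adds to `FilteringOptions.parse_args` (see the git-filter-repo diff). Below is a minimal standalone sketch of those lines. It assumes an `argparse.Namespace` that carries only the five options they read, and the helper names `resolve_partial_defaults` and `make_args` are hypothetical. The sketch shows the consequence of that order: the `--replace-refs` default is chosen before `--source`/`--target` switch `partial` on, so those two flags leave `replace_refs` unset for the normal default.

```python
from argparse import Namespace


def resolve_partial_defaults(args: Namespace) -> Namespace:
    """Hypothetical standalone copy of the defaulting lines added to
    FilteringOptions.parse_args in this commit."""
    # --partial or --refs: default --replace-refs to update-no-add.  This
    # runs before --source/--target turn on partial, so those flags alone
    # leave replace_refs unset (the normal default applies later).
    if (args.partial or args.refs) and not args.replace_refs:
        args.replace_refs = 'update-no-add'
    # --refs, --source, and --target all imply --partial.
    if args.refs or args.source or args.target:
        args.partial = True
    # Without --refs, rewrite everything.
    if not args.refs:
        args.refs = ['--all']
    return args


def make_args(**given) -> Namespace:
    """Namespace with only the options used above; None/False = not given."""
    fields = dict(partial=False, refs=None, source=None, target=None,
                  replace_refs=None)
    fields.update(given)
    return Namespace(**fields)


for label, ns in [('(no flags)', make_args()),
                  ('--partial', make_args(partial=True)),
                  ('--refs master', make_args(refs=['master'])),
                  ('--source .', make_args(source=b'.'))]:
    r = resolve_partial_defaults(ns)
    print(label, r.partial, r.replace_refs, r.refs)
```

An explicit `--replace-refs` value always wins, because every default is applied only when the option is unset.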
@@ -278,6 +279,25 @@ Miscellaneous options
 Rewrite history even if the current repo does not look like a fresh
 clone.
 
+--partial:
+Do a partial history rewrite, resulting in the mixture of old and
+new history. This implies a default of update-no-add for
+--replace-refs, disables rewriting refs/remotes/origin/* to
+refs/heads/*, disables removing of the 'origin' remote, disables
+removing unexported refs, disables expiring the reflog, and
+disables the automatic post-filter gc. Also, this modifies
+--tag-rename and --refname-callback options such that instead of
+replacing old refs with new refnames, it will instead create new
+refs and keep the old ones around. Use with caution.
+
+--refs <refs+>::
+Limit history rewriting to the specified refs. Implies --partial.
+In addition to the normal caveats of --partial (mixing old and new
+history, no automatic remapping of refs/remotes/origin/* to
+refs/heads/*, etc.), this also may cause problems for pruning of
+degenerate empty merge commits when negative revisions are
+specified.
+
 --dry-run::
 Do not change the repository. Run `git fast-export` and filter its
 output, and save both the original and the filtered version for
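The last sentence of the `--partial` entry says that renames create new refs and keep the old ones around. Below is a simplified model of that behavior. It assumes the behavior comes from the `refs_to_nuke = exported_refs - imported_refs` logic and its new `if self._args.partial: refs_to_nuke = set()` branch in the RepoFilter diff. The model works on plain dictionaries instead of fast-export/fast-import streams, and `update_refs`, `rename_v_tags`, and the commit ids are all hypothetical.

```python
from typing import Callable, Dict

Refs = Dict[str, str]  # refname -> commit id


def update_refs(refs: Refs, rewritten: Dict[str, str],
                rename: Callable[[str], str], partial: bool) -> Refs:
    """Simplified model of how refs end up after a rewrite that renames refs.

    Each exported ref is imported under its (possibly renamed) name and
    pointed at the rewritten commit.  Refs that were exported but not
    imported are nuked, except that with partial the nuke set is emptied,
    so the old names keep their original values.
    """
    result: Refs = dict(refs)
    exported = set(refs)
    imported = set()
    for name, commit in refs.items():
        new_name = rename(name)
        result[new_name] = rewritten.get(commit, commit)
        imported.add(new_name)
    refs_to_nuke = exported - imported
    if partial:
        refs_to_nuke = set()
    for name in refs_to_nuke:
        del result[name]
    return result


def rename_v_tags(name: str) -> str:
    # Stand-in for something like --tag-rename v:release- (hypothetical values).
    prefix = 'refs/tags/v'
    if name.startswith(prefix):
        return 'refs/tags/release-' + name[len(prefix):]
    return name


before = {'refs/tags/v1.0': 'c1', 'refs/heads/master': 'c2'}
new_ids = {'c1': 'n1', 'c2': 'n2'}
print(update_refs(before, new_ids, rename_v_tags, partial=False))
print(update_refs(before, new_ids, rename_v_tags, partial=True))
```

In the partial case `refs/tags/v1.0` still points at the original `c1` while `refs/tags/release-1.0` points at the rewritten `n1`. Both histories stay reachable, which is the mixing the documentation warns about.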
@@ -699,6 +719,23 @@ The reason to specify --force is two-fold: filter-repo will error out
 if no arguments are specified, and the new graft commit would
 otherwise trigger the not-a-fresh-clone check.
 
+Partial history rewrites
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+To rewrite the history on just one branch (which may cause it to no longer
+share any common history with other branches), use `--refs`. For example,
+to remove a file named 'extraneous.txt' from the 'master' branch:
+
+--------------------------------------------------
+git filter-repo --invert-paths --path extraneous.txt --refs master
+--------------------------------------------------
+
+To rewrite just some recent commits:
+
+--------------------------------------------------
+git filter-repo --invert-paths --path extraneous.txt --refs master~3..master
+--------------------------------------------------
+
 [[CALLBACKS]]
 CALLBACKS
 ---------
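The two `--refs` examples differ only in which commits get rewritten. Under git rev-list semantics, `master~3..master` is shorthand for `^master~3 master`: commits reachable from master but not from master~3. The sketch below uses a toy linear history, and the commit names and helper functions are hypothetical. It shows that selection, along with the original commit that the rewritten range stays attached to, which is where old and new history meet.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # commit id -> parent commit ids


def reachable(graph: Graph, tip: str) -> Set[str]:
    """Commits reachable from tip by following parents (tip included)."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def select(graph: Graph, positive: List[str], negative: List[str]) -> Set[str]:
    """rev-list style selection: reachable from a positive tip and not
    from any negative one.  'A..B' means negative=[A], positive=[B]."""
    chosen: Set[str] = set()
    for tip in positive:
        chosen |= reachable(graph, tip)
    for tip in negative:
        chosen -= reachable(graph, tip)
    return chosen


def excluded_parents(graph: Graph, chosen: Set[str]) -> Set[str]:
    """Parents of rewritten commits that are not themselves rewritten;
    the new history keeps pointing at these original commits."""
    return {p for c in chosen for p in graph.get(c, []) if p not in chosen}


# Linear history c1 <- c2 <- ... <- c6, with master at c6 (so master~3 is c3).
history: Graph = {'c1': [], 'c2': ['c1'], 'c3': ['c2'],
                  'c4': ['c3'], 'c5': ['c4'], 'c6': ['c5']}
chosen = select(history, positive=['c6'], negative=['c3'])
print(sorted(chosen))                       # ['c4', 'c5', 'c6']
print(excluded_parents(history, chosen))    # {'c3'}
```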
@@ -946,8 +983,11 @@ Some notes or exceptions on each of the above:
 are that they've only rewritten trees and commits and maybe a few
 blobs, so `--aggressive` isn't needed and would be too slow.)
 
-Information about these steps is printed out when `--debug` is passed to
-filter-repo.
+Information about these steps is printed out when `--debug` is passed
+to filter-repo. When doing a `--partial` history rewrite, steps 2, 3,
+7, and 8 are unconditionally skipped, step 5 is skipped if
+`--replace-refs` is `update-no-add`, and just the nuke-unused-refs
+portion of step 5 is skipped if `--replace-refs` is something else.
 
 Limitations
 ~~~~~~~~~~~
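The new rule about skipped steps can be written as a small lookup. This is only a sketch. The step numbers refer to the manual's numbered list of steps, which sits above this hunk and is not part of the diff, and `partial_skips` is a hypothetical name.

```python
from typing import Dict

# Step numbers follow the manual's numbered list of steps (not shown here).
ALWAYS_SKIPPED_WHEN_PARTIAL = (2, 3, 7, 8)


def partial_skips(partial: bool, replace_refs: str) -> Dict[int, str]:
    """Map step number -> how it is skipped, following the rule the
    documentation states for --partial rewrites."""
    if not partial:
        return {}
    skips = {step: 'skipped' for step in ALWAYS_SKIPPED_WHEN_PARTIAL}
    if replace_refs == 'update-no-add':
        skips[5] = 'skipped'
    else:
        # Only the nuke-unused-refs portion of step 5 is dropped.
        skips[5] = 'nuke-unused-refs portion skipped'
    return skips


print(sorted(partial_skips(True, 'update-no-add')))   # [2, 3, 5, 7, 8]
```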
@@ -1041,18 +1081,16 @@ Issues specific to filter-repo
 such as `-M` or `-C` would break assumptions used in other places of
 filter-repo.
 
-* Partial-repo filtering does not mesh well with filter-repo's "avoid
-mixing old and new history" design. filter-repo has some capability
-in this area but it is intentionally underdocumented and mostly left
-for use by external scripts which import filter-repo as a module
-(some examples in contrib/filter-repo-demos/ do use this). The only
-real usecases I've seen for partial repo filtering, though, are
-sidestepping filter-branch's insanely slow execution on commits that
-would not be changed by the filters in question anyway (which is
-largely irrelevant since filter-repo is multiple orders of magnitude
-faster), or to do operations better suited to linkgit:git-rebase[1]
-and which rebase grew special options for years ago (e.g. the
-`--signoff` option).
+* Partial-repo filtering, while supported, runs counter to filter-repo's
+"avoid mixing old and new history" design. This support has required
+improvements to core git as well (e.g. it depends upon the
+`--reference-excluded-parents` option to fast-export that was added
+specifically for this usage within filter-repo). The `--partial` and
+`--refs` options will continue to be supported since there are people
+with usecases for them; however, I am concerned that this inconsistency
+about mixing old and new history seems likely to lead to user mistakes.
+For now, I just hope that long explanations of caveats in the
+documentation of these options suffice to curtail any such problems.
 
 Comments on reversibility
 ^^^^^^^^^^^^^^^^^^^^^^^^^

contrib/filter-repo-demos/bfg-ish

Lines changed: 3 additions & 4 deletions
@@ -400,7 +400,7 @@ class BFG_ish:
 stdin = subprocess.PIPE,
 stdout = subprocess.PIPE)
 self.args = bfg_args
-# Setting source and target to anything prevents:
+# Setting partial prevents:
 # * remapping origin remote tracking branches to regular branches
 # * deletion of the origin remote
 # * nuking unused refs
@@ -411,9 +411,8 @@ class BFG_ish:
 # The third is irrelevant since BFG has no mechanism for renaming refs,
 # and we'll manually add the fourth and fifth back in below by calling
 # RepoFilter.cleanup().
-fr_args = fr.FilteringOptions.parse_args(['--source', '.',
-'--target', '.',
-'--force'] + extra_args)
+fr_args = fr.FilteringOptions.parse_args(['--partial', '--force'] +
+extra_args)
 self.filter = fr.RepoFilter(fr_args, commit_callback=self.commit_update)
 self.filter.run()
 if new_replace_file:

contrib/filter-repo-demos/filter-lamely

Lines changed: 1 addition & 3 deletions
@@ -585,9 +585,7 @@ class UserInterfaceNightmare:
 self.args.prune_empty = True
 fr_args = fr.FilteringOptions.parse_args(['--preserve-commit-hashes',
 '--preserve-commit-encoding',
-'--replace-refs', 'update-no-add',
-'--source', '.',
-'--target', '.',
+'--partial',
 '--force'] + extra_args)
 fr_args.prune_empty = 'always' if self.args.prune_empty else 'never'
 fr_args.refs = self.get_extended_refs()

contrib/filter-repo-demos/signed-off-by

Lines changed: 1 addition & 4 deletions
@@ -58,10 +58,7 @@ def add_signed_off_by_trailer(commit, metadata):
 # * nuking reflogs
 # * repacking
 # so we cheat and set source and target both to '.'
-args = fr.FilteringOptions.parse_args(['--source', '.',
-'--target', '.',
-'--force',
-'--replace-refs', 'update-no-add',
+args = fr.FilteringOptions.parse_args(['--force',
 '--refs'] + myargs.rev_list_args)
 args.refs = myargs.rev_list_args
 filter = fr.RepoFilter(args, commit_callback=add_signed_off_by_trailer)

git-filter-repo

Lines changed: 38 additions & 13 deletions
@@ -1673,10 +1673,6 @@ EXAMPLES
 "useful in determining what to filter in a subsequent run. "
 "Will not modify your repo."))
 
-refs = parser.add_argument_group(title=_("Git References"))
-refs.add_argument('--refs', nargs='*', default=['--all'],
-help=argparse.SUPPRESS)
-
 path = parser.add_argument_group(title=_("Filtering based on paths "
 "(see also --filename-callback)"))
 path.add_argument('--invert-paths', action='store_false', dest='inclusive',
@@ -1846,9 +1842,10 @@ EXAMPLES
 "CALLBACKS section below."))
 
 desc = _(
-"Specifying alternate source or target locations will disable some \n"
-"auxiliary steps such as disconnecting the origin remote, and avoiding\n"
-"mixing new and old history.")
+"Specifying alternate source or target locations implies --partial,\n"
+"except that the normal default for --replace-refs is used. However,\n"
+"unlike normal uses of --partial, this doesn't risk mixing old and new\n"
+"history since the old and new histories are in different repositories.")
 location = parser.add_argument_group(title=_("Location to filter from/to"),
 description=desc)
 location.add_argument('--source', type=os.fsencode,
@@ -1862,6 +1859,29 @@ EXAMPLES
 misc.add_argument('--force', '-f', action='store_true',
 help=_("Rewrite history even if the current repo does not look "
 "like a fresh clone."))
+misc.add_argument('--partial', action='store_true',
+help=_("Do a partial history rewrite, resulting in the mixture of "
+"old and new history. This implies a default of "
+"update-no-add for --replace-refs, disables rewriting "
+"refs/remotes/origin/* to refs/heads/*, disables removing "
+"of the 'origin' remote, disables removing unexported refs, "
+"disables expiring the reflog, and disables the automatic "
+"post-filter gc. Also, this modifies --tag-rename and "
+"--refname-callback options such that instead of replacing "
+"old refs with new refnames, it will instead create new "
+"refs and keep the old ones around. Use with caution."))
+# WARNING: --refs presents a problem with become-degenerate pruning:
+# * Excluding a commit also excludes its ancestors so when some other
+# commit has an excluded ancestor as a parent we have no way of
+# knowing what it is an ancestor of without doing a special
+# full-graph walk.
+misc.add_argument('--refs', nargs='+',
+help=_("Limit history rewriting to the specified refs. Implies "
+"--partial. In addition to the normal caveats of --partial "
+"(mixing old and new history, no automatic remapping of "
+"refs/remotes/origin/* to refs/heads/*, etc.), this also may "
+"cause problems for pruning of degenerate empty merge "
+"commits when negative revisions are specified."))
 
 misc.add_argument('--dry-run', action='store_true',
 help=_("Do not change the repository. Run `git fast-export` and "
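The WARNING comment is easier to see on a toy graph. The sketch below uses hypothetical commit names and helpers, plus a deliberately simplified notion of "degenerate": a merge where one parent is an ancestor of the other. It shows that once a negative revision excludes part of the history, the ancestry needed to recognize such a merge is no longer visible to the filter without a separate walk of the full graph.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # commit id -> parent commit ids


def reachable(graph: Graph, tip: str) -> Set[str]:
    """Commits reachable from tip through the parents known to graph."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def exported_view(graph: Graph, positive: str, negative: str) -> Graph:
    """Only the commits the filter actually sees: reachable from the
    positive tip but not from the negative one.  Excluded parents still
    appear as ids, but their own ancestry is unknown."""
    chosen = reachable(graph, positive) - reachable(graph, negative)
    return {c: graph[c] for c in chosen}


# c1 <- c2 on the mainline, s1 branches off c1, and m merges c2 and s1.
full: Graph = {'c1': [], 'c2': ['c1'], 's1': ['c1'], 'm': ['c2', 's1']}
# Roughly: negative revision c2, positive tip m.
view = exported_view(full, positive='m', negative='c2')

# Suppose s1 becomes empty and is pruned, so m's parents become c2 and c1.
# m is then degenerate only if c1 is an ancestor of c2.
print('c1' in reachable(full, 'c2'))   # True: the full graph knows
print('c1' in reachable(view, 'c2'))   # False: c2's ancestry was excluded
```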
@@ -2065,6 +2085,12 @@ EXAMPLES
 args.strip_blobs_with_ids = set(f.read().split())
 else:
 args.strip_blobs_with_ids = set()
+if (args.partial or args.refs) and not args.replace_refs:
+args.replace_refs = 'update-no-add'
+if args.refs or args.source or args.target:
+args.partial = True
+if not args.refs:
+args.refs = ['--all']
 return args
 
 class RepoAnalyze(object):
@@ -3475,8 +3501,6 @@ class RepoFilter(object):
 .format(decode(self._fe_filt)))
 
 def _migrate_origin_to_heads(self):
-if self._args.dry_run or self._args.source or self._args.target:
-return
 refs_to_migrate = set(x for x in self._orig_refs
 if x.startswith(b'refs/remotes/origin/'))
 if not refs_to_migrate:
@@ -3532,7 +3556,7 @@ class RepoFilter(object):
 # Remove unused refs
 exported_refs, imported_refs = self.get_exported_and_imported_refs()
 refs_to_nuke = exported_refs - imported_refs
-if self._args.source or self._args.target:
+if self._args.partial:
 refs_to_nuke = set()
 if refs_to_nuke and self._args.debug:
 print("[DEBUG] Deleting the following refs:\n "+
@@ -3690,7 +3714,8 @@ class RepoFilter(object):
 start = time.time()
 if not self._input and not self._output:
 self._run_sanity_checks()
-self._migrate_origin_to_heads()
+if not self._args.dry_run and not self._args.partial:
+self._migrate_origin_to_heads()
 self._setup_input(use_done_feature = True)
 self._setup_output()
 assert self._sanity_checks_handled
@@ -3725,7 +3750,7 @@ class RepoFilter(object):
 self._save_marks_files()
 
 # Notify user how long it took, before doing a gc and such
-repack = (not self._args.source and not self._args.target)
+repack = (not self._args.partial)
 msg = "New history written in {:.2f} seconds..."
 if repack:
 msg = "New history written in {:.2f} seconds; now repacking/cleaning..."
@@ -3749,7 +3774,7 @@ class RepoFilter(object):
 # Write out data about run
 self._record_metadata(self.results_tmp_dir(), self._orig_refs)
 
-# Nuke the reflogs and repack
+# If repack, then nuke the reflogs and repack. If reset, do a reset --hard
 reset = not GitUtils.is_repository_bare(target_working_dir)
 RepoFilter.cleanup(target_working_dir, repack, reset,
 run_quietly=self._args.quiet,
