
[Question] How to delete all history of a file but keep the current version committed? #667

Closed
harogaston opened this issue May 16, 2025 · 3 comments

Comments

@harogaston

Hi! I came to this tool with the aim of cleaning some sensitive data out of my repo's history.

I started by removing the sensitive data from the tip of all existing branches. We can assume all refs point to versions of the files with no sensitive data. Then, wanting to rewrite the commit history to purge the sensitive data, I ran:

$ git filter-repo --sensitive-data-removal --invert-paths --path path/to/file1 --path path/to/file2

I was expecting that after that I would find path/to/file1 and path/to/file2 (and in fact the whole repo) in the same state for every REF (branch specifically), but I found the files are now gone, deleted.

I probably misunderstood this tool. Now, is there a way to achieve what I need? Any help would be much appreciated.
Thank you in advance.

@newren
Owner

newren commented Jun 6, 2025

I was expecting that after that I would find path/to/file1 and path/to/file2 (and in fact the whole repo) in the same state for every REF (branch specifically)

Why would you expect that? I don't understand where such an assumption would come from; --invert-paths is specifically documented to mean that you only select files that don't match any of the --path options for keeping, meaning whatever paths you give will be fully discarded from (all versions of) your repository.
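To make the selection rule concrete, here is a small Python model of how --path and --invert-paths decide which files survive. It is only a simplified sketch, not filter-repo's actual code: `path_matches` and `is_kept` are hypothetical helper names, and the file list is made up for illustration.

```python
def path_matches(filename: bytes, path: bytes) -> bool:
    """True if `filename` is `path` itself or lives under the directory `path`."""
    path = path.rstrip(b"/")
    return filename == path or filename.startswith(path + b"/")


def is_kept(filename: bytes, paths: list[bytes], invert: bool = False) -> bool:
    """Simplified model: --path keeps matches, --invert-paths keeps non-matches."""
    matched = any(path_matches(filename, p) for p in paths)
    return not matched if invert else matched


# Hypothetical set of files that appear somewhere in the history.
history = [b"README.md", b"path/to/file1", b"path/to/file2", b"src/main.py"]
selected = [b"path/to/file1", b"path/to/file2"]

kept_with_invert = [f for f in history if is_kept(f, selected, invert=True)]
kept_without_invert = [f for f in history if is_kept(f, selected)]

print(kept_with_invert)     # the two listed files are dropped everywhere
print(kept_without_invert)  # only the two listed files remain
```

In this model the rule is applied to every commit, including the branch tips, which is why the cleaned versions of the files disappear along with the old ones.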

I started by removing the sensitive data from the tip of all existing branches

Why? Wouldn't it make more sense to remove the bad files first, then re-add and commit versions of the relevant files without sensitive data afterward?

Or, alternatively, instead of adding a separate step of committing clean versions of the files, you could do the file cleanup as part of the rewrite operation, e.g. with --file-info-callback.
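A rough sketch of what such a cleanup could look like, written as a plain Python function so it can be tried without touching a repository. It assumes, as I understand the filter-repo documentation, that the callback receives `filename`, `mode`, `blob_id` and `value`, returns a `(filename, mode, blob_id)` tuple, and that `value` offers `get_contents_by_identifier` and `insert_file_with_contents`; please check the manual for your version. The `SECRET` pattern, the `TARGETS` set and the `FakeValue` stand-in are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical pattern: blank out whatever follows "password =" on a line.
SECRET = re.compile(rb"(password[ \t]*=[ \t]*).*")
TARGETS = {b"path/to/file1", b"path/to/file2"}


def file_info_callback(filename, mode, blob_id, value):
    """Rewrite the contents of the target files; leave every other file as-is."""
    if filename not in TARGETS:
        return (filename, mode, blob_id)
    contents = value.get_contents_by_identifier(blob_id)
    cleaned = SECRET.sub(rb"\1***REMOVED***", contents)
    new_blob_id = value.insert_file_with_contents(cleaned)
    return (filename, mode, new_blob_id)


class FakeValue:
    """Test-only stand-in for the object filter-repo would pass as `value`."""

    def __init__(self):
        self.blobs = {}

    def get_contents_by_identifier(self, blob_id):
        return self.blobs[blob_id]

    def insert_file_with_contents(self, contents):
        blob_id = b"blob-%d" % len(self.blobs)
        self.blobs[blob_id] = contents
        return blob_id


value = FakeValue()
old_id = value.insert_file_with_contents(b"user = me\npassword = hunter2\n")
name, mode, new_id = file_info_callback(b"path/to/file1", b"100644", old_id, value)
cleaned_contents = value.blobs[new_id]
print(cleaned_contents)
```

With this approach every historical version of the files is kept but scrubbed, so there is no need to delete the files and re-commit clean copies afterward.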

Now, is there a way to achieve what I need?

I think the alternatives I outlined above would be simpler and achieve what you need. But if you really wanted to go about it this Rube Goldberg-esque way you outlined, you could probably do it with a --commit-callback. First, find out the commit ids where you modified these files into a good state (i.e. the commits you made on each branch to remove the sensitive data; I'll assume you had two branches affected and that the commit ids were 8df00d8 and 42618ac as examples). Then use the --commit-callback (warning, just typing this off the top of my head and looking a bit at the docs; didn't actually test to make sure I didn't have typos or anything):

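$ git filter-repo --commit-callback '
    # Bytes literals use double quotes so they do not end the shell single-quoted string.
    # Leave the two cleanup commits untouched; strip both files from every other commit.
    if commit.original_id not in [b"8df00d8fa8a7357c2d7ef0f9a843649c7f80784b", b"42618ac23c69c4e39134987310433ae82f00f868"]:
        commit.file_changes = [
            change for change in commit.file_changes
            if change.filename != b"path/to/file1" and change.filename != b"path/to/file2"]
'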
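Since the callback above was not tested, it may be worth checking the filtering logic on its own before rewriting a real repository. The sketch below runs the same condition against stand-in objects; `FakeCommit` and `FakeChange` are hypothetical test doubles that carry only the attributes the callback touches (`original_id`, `file_changes`, `filename`). Note that the comparison is an exact match on bytes, so the abbreviated ids have to be expanded to full hashes first.

```python
from dataclasses import dataclass, field

# Full ids of the commits that put the files into a good state (example values).
KEEP_COMMITS = {
    b"8df00d8fa8a7357c2d7ef0f9a843649c7f80784b",
    b"42618ac23c69c4e39134987310433ae82f00f868",
}
SENSITIVE_PATHS = {b"path/to/file1", b"path/to/file2"}


@dataclass
class FakeChange:
    """Test-only stand-in for one entry of commit.file_changes."""
    filename: bytes


@dataclass
class FakeCommit:
    """Test-only stand-in for the commit object handed to the callback."""
    original_id: bytes
    file_changes: list = field(default_factory=list)


def commit_callback(commit):
    """Drop changes to the sensitive paths, except in the cleanup commits."""
    if commit.original_id not in KEEP_COMMITS:
        commit.file_changes = [
            change for change in commit.file_changes
            if change.filename not in SENSITIVE_PATHS
        ]


old = FakeCommit(b"0" * 40, [FakeChange(b"path/to/file1"), FakeChange(b"README.md")])
cleanup = FakeCommit(
    b"8df00d8fa8a7357c2d7ef0f9a843649c7f80784b", [FakeChange(b"path/to/file1")]
)
commit_callback(old)
commit_callback(cleanup)

print([c.filename for c in old.file_changes])
print([c.filename for c in cleanup.file_changes])
```

The older commit loses its change to the sensitive file while the cleanup commit keeps it, which is what makes the clean version the first one to appear in the rewritten history.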

@harogaston
Author

Well

Why? Wouldn't it make more sense to remove the bad files first, then re-add and commit versions of the relevant files without sensitive data afterward?

was what I ended up doing, at least in concept. Hope to never have to go through this again in the future but thank you for taking the time and your recommendation.

@newren
Owner

newren commented Jun 9, 2025

was what I ended up doing, at least in concept.

Cool, glad you got it working.

Hope to never have to go through this again in the future but thank you for taking the time and your recommendation.

Yeah, dealing with sensitive data leaks is no fun. Glad you got it squared away, though.
