2023-03-01 | Reading time: 5 min
Have you ever faced the panic of trying to push your changes to your GitHub repository, only to be met with an error message? Recently, I found myself in such a situation when I accidentally tried to push a file that exceeded GitHub's 100 MB file size limit. In this blog post, I will show you how to undo a commit that contains a large file and share one way to avoid this situation altogether. Spoiler alert: gitignore.io will be your new best friend❕
Git is my favorite tool for managing and tracking changes in code repositories. However, Git was primarily designed for text files and can struggle when handling large binary files, like outputs from quantum chemical software or databases. This is because Git cannot meaningfully diff binary files and they usually compress poorly, so each change to such a file effectively adds another full copy of it to the repository's history.
The Error Message
In my case, I was trying to push my changes to my GitHub repository and saw an error message telling me that one of my files exceeded GitHub's file size limit of 100 MB.
This error occurs when a commit you are pushing contains a file that is larger than 100 MB.
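A quick way to spot such files before committing is to search the working tree by size. This is only a sketch: the function name find_large_files is made up for illustration, and the default threshold of 100M simply mirrors the limit mentioned above.

```shell
# Hypothetical helper: list files larger than a size threshold in the
# current directory tree, skipping Git's own .git directory.
# The default of 100M mirrors the limit discussed above (find counts in MiB).
find_large_files() {
    threshold="${1:-100M}"
    find . -path ./.git -prune -o -type f -size +"$threshold" -print
}

# Usage, from the root of the repository:
#   find_large_files        # files above 100 MiB
#   find_large_files 50M    # stricter threshold
```

Anything this prints is a candidate for keeping out of the next commit.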
In the following, I show you how to fix this error by amending (i.e. editing) the commit where the large file (m24_i1.db) was added, which rewrites history.
The respective undo-procedure depends on which commit added the large file.
Here I cover both scenarios, i.e., when the large file was a) just added in the most recent commit, or b) committed prior to the most recent commit. 🌟
If the large file was just added in the most recent commit, you can amend the commit to remove the large file. From the root of your repository, first run git rm --cached m24_i1.db in the terminal to stop tracking the file, and then run git commit --amend --no-edit to rewrite the commit without the file.
Replace m24_i1.db with the name of your large file.
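For this most-recent-commit case, untracking the file and amending the commit can be sketched as one small shell function. Treat it as an illustration under assumptions: the name drop_from_last_commit is hypothetical, the commit must not have been pushed yet, and it should contain other changes besides the large file, because Git refuses to amend a commit into an empty one.

```shell
# Hypothetical helper: remove a file from the most recent commit
# while leaving the file itself on disk.
drop_from_last_commit() {
    # 1) --cached removes the file from the index only, not from disk
    # 2) --amend rewrites the last commit; --no-edit reuses its message
    git rm --cached -- "$1" &&
        git commit --amend --no-edit
}

# Usage, from the root of the repository:
#   drop_from_last_commit m24_i1.db
```

Afterwards the file is still in your working directory, but it shows up as untracked.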
If the large file was committed prior to the most recent commit, you will need to perform an interactive rebase. This will allow you to edit the specific commit where the large file was added, while leaving the other commits intact.
First, run git log in the terminal to print out the commit history of the repository.
Identify the last “good” commit before the large file was added. You may need to try this process repeatedly with different commits until you find the right one. Once you have identified the last “good” commit, note the commit hash.
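Instead of testing commits one by one, you can also ask Git which commit first added the file; its parent is then the last "good" commit. A rough sketch, assuming the file was not added in the very first commit of the repository (the function name last_good_commit is hypothetical):

```shell
# Hypothetical helper: print the hash of the commit just before the one
# that first added the given file.
last_good_commit() {
    # --diff-filter=A keeps only commits that added the path;
    # tail -n 1 picks the oldest of them (git log lists newest first).
    added="$(git log --diff-filter=A --format=%H -- "$1" | tail -n 1)"
    [ -n "$added" ] || { echo "no commit adds $1" >&2; return 1; }
    # "HASH^" names the parent commit; this fails for a root commit.
    git rev-parse "$added^"
}

# Usage:
#   last_good_commit m24_i1.db
```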
Run git rebase -i followed by the commit hash noted in step 1.
This will open up a file in your Git editor that lists the commits made after the last “good” commit, up to the current commit.
Find the commit where the large file was added and change the command from pick to edit.
Save the file and close the editor.
Now that you have initiated the interactive rebase, Git will pause at the commit where the large file was added. Run git rm --cached m24_i1.db in the terminal to remove the large file from the commit while keeping it on disk.
Replace m24_i1.db with the name of your large file.
After removing the large file, amend the commit by running git commit --amend --no-edit.
This will amend the commit without changing the commit message.
Resume the rebase by running git rebase --continue.
Git will continue the rebase and replay the remaining commits on top of the amended one.
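The remove, amend, and resume steps above can be collected into one helper. This is only a sketch: fix_paused_commit is a made-up name, and it assumes that an interactive rebase is already in progress and paused at the commit marked edit, and that this commit contains other changes besides the large file.

```shell
# Hypothetical helper: to be run while an interactive rebase is paused
# at the commit that added the large file.
fix_paused_commit() {
    # Untrack the file (it stays on disk), rewrite the paused commit with
    # its original message, then let Git replay the remaining commits.
    git rm --cached -- "$1" &&
        git commit --amend --no-edit &&
        git rebase --continue
}

# Usage, once the interactive rebase has stopped at the edited commit:
#   fix_paused_commit m24_i1.db
```

Chaining the commands with && means the rebase is only resumed if the file was removed and the commit was amended successfully.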
❕CAUTION: When you start rewriting history, running the wrong command may delete the large file. If the contents of the file are important, make a copy of the file outside of the repository so that you can recover it later.
Once the rebase has finished, Git should print a message confirming that it completed successfully.
Following these steps, you should now be able to perform a git push without encountering any error messages related to large files. Enjoy the smooth sailing! ⛵
In summary, if you have large files that are crucial for tasks such as parsing files or utilizing databases in Machine Learning, there are two options.
You can either keep them outside of the repository or create a .gitignore file (for example with gitignore.io) to exclude them.
From my perspective, I highly recommend the latter option and checking out this website! 👩‍🔬
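As a minimal sketch of the .gitignore route, the helper below appends a pattern to .gitignore unless it is already listed. The name ignore_pattern and the *.db pattern are assumptions chosen to match the m24_i1.db example, so adjust them to your own files.

```shell
# Hypothetical helper: add a pattern to .gitignore in the current directory,
# skipping the write if the exact line is already there.
ignore_pattern() {
    touch .gitignore
    # -x matches whole lines, -F treats the pattern literally, -q stays quiet
    grep -qxF -- "$1" .gitignore || printf '%s\n' "$1" >> .gitignore
}

# Usage, from the root of the repository:
#   ignore_pattern '*.db'
#   git check-ignore -v m24_i1.db   # shows which rule ignores the file
```

Keep in mind that .gitignore only affects untracked files. A file that is already part of a commit still has to be removed from the index first.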