Large Files and Git: My 'AHA' Moment of February 2023


2023-03-01 | Reading time: 5 min





Have you ever faced the panic of trying to push your changes to your GitHub repository, only to be met with an error message? Recently, I found myself in such a situation when I accidentally tried to push a file that exceeded the 100 MB storage limit. In this blog post, I will share with you how to undo a commit with large files and provide one way how to avoid this situation at all. Spoiler alert: gitignore.io will be your new best friend❕

Large Files with Git – Removing Large Files from commit history

Git is my favorite tool for managing and tracking changes in code repositories. However, Git was primarily designed for text files and can struggle when handling large binary files, like outputs from quantum chemical software or databases. This is due to the fact, that Git treats each binary file as a single object, and each change to the file as a new object.

The Error Message

In my case, I was trying to push my changes to my GitHub repository and see an error message that reads:

remote: error: File m24_i1.db is 218.83 MB; this exceeds GitHub's file size limit of 100.00 MB.

This error occurs when a file that is larger than 100 MB is added to the repository. In the following, I show you how to fix this error by amending (i.e. editing) the commit where the large file (m24_i1.db) was added to rewrite history. The respective undo-procedure depends on which commit added the large file. Here I cover both scenarios, i.e., when the large file was a) just added in the most recent commit, or b) committed prior to the most recent commit. 🌟

A) The Large File Was Just Added in the Most Recent Commit

If the large file was just added in the most recent commit, you can amend the commit to remove the large file. First, make the change to the commit, and then run the following commands in the terminal at the root of your repository:

git rm --cached m24_i1.db
git commit --amend -C HEAD

Replace m24_i1.db with the name of your large file.

B) The Large File Was Committed Prior to The Most Recent Commit

If the large file was committed prior to the most recent commit, you will need to perform an interactive rebase. This will allow you to edit the specific commit where the large file was added, while leaving the other commits intact.

1) Locating the last “good” commit

First, run the following command in the terminal to print out the commit history of the repository:

git log --oneline

Identify the last “good” commit before the large file was added. You may need to try this process repeatedly with different commits until you find the right one. Once you have identified the last “good” commit, note the commit hash.

2) Initiate a rebase between the last “good” commit and the current commit

Run the following command using the commit hash noted in step 1:

git rebase -i <commit hash>

This will open up a file in your Git editor that lists the commits from the last “good” commit to the current commit. Find the commit where the large file was added and change the command from pick to edit. Save the file and close the editor.

3) Remove the large file

Now that you have initiated the interactive rebase, Git will pause at the commit where the large file was added. Run the following command in the terminal to remove the large file:

git rm --cached m24_i1.db

Replace m24_i1.db with the name of your large file.

4) Amend the commit

After removing the large file, amend the commit by running the following command:

git commit --amend --allow-empty -C HEAD

This will amend the commit without changing the commit message.

5) Continue the rebase

Resume the rebase by running the following command:

git rebase --continue

Git will continue the rebase and apply your changes to the remaining commits.

❕CAUTION: When you start rewriting history, running the wrong command may delete the large file. If the contents of the file are important make a copy of the file outside of the repository so that you can recover it later.

You should see an output like:

Successfully rebased and updated refs/heads/master

. Following these steps, you should now be able to perform a git push without encountering any error messages related to large files. Enjoy the smooth sailing! ⛵

What’s next?

In summary, if you have large files that are crucial for tasks such as parsing files or utilizing databases in Machine Learning, there are two options. You can either keep them outside of the repository or create a gitignore.io file to exclude them. From my perspective, I highly recommend the latter option and checking out this website! 👩‍🔬