01 Feb 2015
Writing history, actually rewriting it.
RubyGems.org was getting out of hand. Not in terms of code, but the git repository was way too big: it was over 500MB, so every time someone wanted to clone the repo, it took a long time. The code itself is not big at all, but we need to vendor all the gems we use.
You might be wondering why we need to vendor RubyGems.org's gem dependencies. Most projects can simply install gems from RubyGems.org when they are deployed. But RubyGems.org itself might have a critical bug that makes it unavailable, and the only way to deploy a fix for such a bug is to make sure the RubyGems.org codebase does not depend on the RubyGems.org service being available.
Vendoring more than 100 gems costs space, and every time a gem is updated, its old versions live forever in the history. Git is a distributed version control system: when you clone a repository, you clone every branch and tag along with all the history attached to them. As a result, the repository just kept growing and became harder and harder to clone.
git clone --depth=1 would be an easier solution. The problem with it is that everyone who clones the repo would have to know about the --depth flag. Another problem is that the history would not be cloned locally, so searching the history or using git blame would not work.
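To see what --depth=1 saves and what it gives up, the sketch below builds a throwaway three-commit repository (hypothetical, created in a temp directory) and clones it both ways. It only assumes git is on the PATH; the file:// URL is used because git ignores --depth when cloning from a plain local path.

```shell
#!/usr/bin/env bash
# Sketch: full clone vs. shallow clone of a throwaway local repository.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the upstream repository: three commits.
git init -q upstream
for i in 1 2 3; do
  echo "version $i" > upstream/file.txt
  git -C upstream add file.txt
  git -C upstream commit -q -m "commit $i"
done

# file:// is needed here: for a plain local path git ignores --depth.
git clone -q "file://$work/upstream" full
git clone -q --depth=1 "file://$work/upstream" shallow

# Count the commits reachable from HEAD in each clone.
full_count=$(git -C full rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "full clone:    $full_count commits"
echo "shallow clone: $shallow_count commit"
```

The shallow clone only contains the tip commit, so history searches and git blame in that clone only ever see that one commit.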
Another option is to create a separate vendor/cache folder in another git repository and add it as a git submodule. If the vendor/cache folder is not part of the main repo, its history is not tracked by the main repo, so the RubyGems.org repository would not grow immensely with every gem update.
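The submodule layout can be sketched locally as follows. vendor-repo and app are hypothetical stand-ins for the vendor repository and RubyGems.org (the post does not name the actual vendor repository), and example-1.0.0.gem is a placeholder file, not a real gem.

```shell
#!/usr/bin/env bash
# Sketch: keep vendor/cache in its own repository, mounted in the main
# repository as a git submodule. Repository and gem names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Stand-in for the separate vendor repository holding packaged .gem files.
git init -q vendor-repo
echo "gem bytes" > vendor-repo/example-1.0.0.gem
git -C vendor-repo add .
git -C vendor-repo commit -q -m "Vendor example 1.0.0"

# Stand-in for the main application repository.
git init -q app
echo "source 'https://rubygems.org'" > app/Gemfile
git -C app add Gemfile
git -C app commit -q -m "Initial commit"

# Mount the vendor repository at vendor/cache. Recent git versions refuse
# local (file) submodule URLs unless protocol.file.allow is set; a real
# setup would point at a hosted URL instead.
cd app
git -c protocol.file.allow=always submodule add "$work/vendor-repo" vendor/cache
git commit -q -m "Add vendor/cache as a submodule"

# The main repository stores only a commit pointer (mode 160000) for
# vendor/cache; the .gem files and their history stay in vendor-repo.
git ls-tree HEAD vendor/cache
```

Each gem update then adds history to the vendor repository, while the main repository only records a new submodule commit pointer.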
However, that alone would not solve the problem of already having a 600MB repository. To fix that, we had to rewrite the repository's history to remove all the vendored files from it. And that's exactly what we did. While rewriting history, we also decided to remove a few other big folders and files from it:
And lastly, we moved vendor/cache out of the history and into another repository.
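The post does not say which tool was used to rewrite history. As one possibility, this sketch uses git filter-branch (BFG Repo-Cleaner and git filter-repo are common alternatives) on a throwaway repository in which a hypothetical example gem is vendored and then bumped, so the old version survives only in history.

```shell
#!/usr/bin/env bash
# Sketch: purge vendor/cache from every commit of a throwaway repository.
# The tool (git filter-branch) and all file names are assumptions.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Old-style history: a gem committed next to the code, then bumped.
mkdir -p vendor/cache
head -c 100000 /dev/urandom > vendor/cache/example-1.0.0.gem
echo "puts 'hi'" > app.rb
git add . && git commit -q -m "Vendor example 1.0.0"
old_blob=$(git rev-parse HEAD:vendor/cache/example-1.0.0.gem)
head -c 100000 /dev/urandom > vendor/cache/example-1.0.1.gem
git rm -q vendor/cache/example-1.0.0.gem
git add . && git commit -q -m "Bump example to 1.0.1"

# Rewrite all refs: remove vendor/cache from each commit's index and drop
# commits that become empty. The env var silences the warning and pause
# that newer git versions print for filter-branch.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --index-filter 'git rm -r -q --cached --ignore-unmatch vendor/cache' \
  --prune-empty -- --all

# Old objects stay reachable through filter-branch's backup refs and the
# reflogs; remove both, then let gc prune the unreachable blobs.
git for-each-ref --format='delete %(refname)' refs/original/ | git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --prune=now

git log --oneline --name-only
if git cat-file -e "$old_blob" 2>/dev/null; then
  echo "old gem blob still present"
else
  echo "old gem blob purged"
fi
```

The rewritten history has new commit IDs, so it has to be force-pushed, and existing clones and PR branches still point at the old commits.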
RubyGems.org is an open source project, and contributions are always welcome, so a smaller repository that is faster to clone is key to making the project more approachable for the community.
$ git clone git@github.com:rubygems/rubygems.org-backup.git
$ du -skh .
536M .
$ git clone git@github.com:rubygems/rubygems.org.git
$ du -skh .
11M .
Impact on development
Everyone must rebase
Everyone who has an open PR against rubygems/rubygems.org must rebase it onto the new history. For local clones of rubygems/rubygems.org, this means either deleting the clone and cloning it again, or just running git fetch --all; git pull --rebase.
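git pull --rebase can usually tell which commits are yours by looking at the reflog of the remote-tracking branch. For a PR branch, the same step can be done explicitly with git rebase --onto, replaying only the commits made after the point where the branch left the old history. The sketch below simulates this locally; upstream, contributor and feature are hypothetical names, and filter-branch merely stands in for whatever tool rewrote the upstream history.

```shell
#!/usr/bin/env bash
# Sketch: move a PR branch from the old history onto the rewritten one.
# "upstream", "contributor" and "feature" are hypothetical names.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Upstream history that still contains a vendored gem.
git init -q upstream
mkdir -p upstream/vendor/cache
echo "gem bytes" > upstream/vendor/cache/example-1.0.0.gem
echo "puts 'hi'" > upstream/app.rb
git -C upstream add .
git -C upstream commit -q -m "App with vendored gem"
branch=$(git -C upstream symbolic-ref --short HEAD)

# A contributor clones it and commits on a PR branch.
git clone -q upstream contributor
git -C contributor checkout -q -b feature
echo "puts 'bye'" >> contributor/app.rb
git -C contributor commit -q -am "Feature work"
# Remember where the PR branch left the old history.
old_base=$(git -C contributor merge-base feature "origin/$branch")

# Upstream rewrites its history without vendor/cache (on a hosted repo
# this would be followed by a force push).
FILTER_BRANCH_SQUELCH_WARNING=1 git -C upstream filter-branch \
  --index-filter 'git rm -r -q --cached --ignore-unmatch vendor/cache' -- --all

# Fetch the rewritten history, then replay only the commits after old_base.
git -C contributor fetch -q origin
git -C contributor rebase -q --onto "origin/$branch" "$old_base" feature
git -C contributor log --oneline --name-only feature
```

Passing old_base explicitly keeps the old vendored commits from being replayed on top of the new history.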
Nothing has changed here: bundle install still does its job.
Updating or adding a new gem
Just add the gem to Gemfile or run bundle update gem_name, and send a PR containing only the resulting Gemfile and Gemfile.lock changes. There is no need to update the vendor/cache folder anymore, or to send a PR to the vendor repo; the RubyGems team will take care of updating the vendor folder.