Migrating from Subversion to Git: Lessons Learned

  December 16, 2010

For a few years, we on the SmartBear Collaborator team used Subversion as our version control system of choice. Recently, we decided to switch to Git for all the usual reasons and, as with any technology migration, we wanted to do it with as little impact on our development as possible.

We've been using Git as our canonical repository for a month now without any developer downtime and minimal interruption to our usual processes, so it seems safe to say that our migration has been a success.  Given that, I wanted to share some suggestions for making your conversion go smoothly as well.

Choose the Right Tools

Lots of documentation about migrating from Subversion to Git will tell you to use git-svn to do your conversion. While git-svn has the advantage of being built in to recent versions of Git, it has a few distinct disadvantages which we ran up against during our first attempts at migration. First, it doesn't deal well with non-standard SVN repository layouts. You can specify where your trunk is, and where your "branches" and "tags" directories are, but you'll run into problems if you don't have a "trunk" directory, or if you have subdirectories in your "branches" or "tags" directories.

More importantly, git-svn is SLOW.   At the time of our conversion, our SVN repository had roughly 15,000 revisions -- not very large as SVN repos go.  But it took around FOUR DAYS for git-svn to convert our repository -- four days of babysitting a process that would leak memory until it coredumped and had to be restarted.  This made the process of debugging our migration configuration excruciatingly long.

Thankfully, the helpful people in the #git IRC channel pointed me in the right direction. There's a handy tool called svn2git that was written by the KDE team to help migrate their (much larger) SVN repository. Unfortunately, there's at least one other tool called "svn2git" which seems to have a higher Google page ranking, so make sure you get the KDE one. There's some documentation for svn2git on the KDE TechBase wiki, but it's a bit KDE-centric. If you're looking to give svn2git a try, the best way to get started is by reading some of the sample configuration files included with the source. Svn2git uses pattern-matching rules, configured in a plain text file, that work with arbitrarily complex and non-standard SVN layouts. And it's blazingly fast -- at least in comparison to git-svn. It converted our entire repository in under 4 minutes.

I don't want to go into too much detail about svn2git configuration. It runs so quickly that it'll be easy for you to do a few test runs and tweak your configuration as necessary. Here are a couple of tips, though:

Import SVN tags as branches, then convert to Git tags

In Subversion, tags are directories, just like branches; they differ only by convention. Svn2git will import your tags as branches, which you can convert to real Git tags later. That task is simpler if you prefix your SVN tags as they are converted to Git branches, like so:

# (Svn2Git configuration snippet):
# Convert SVN tags to Git branches prefixed with tag--:
# (We'll convert to Git tags later.)
match /smartbear/tags/([^/]+)/
    repository collab
    # \1 is the tag name captured by the parentheses in the match pattern.
    branch tag--\1
end match

Once svn2git has finished, you can create Git tags like this:

git branch |
# Remove the two-character prefix ("  " or "* ") at the beginning of each line:
sed 's/^..//' |
# Only get 'tag' branches:
grep '^tag--' |
# Strip down to just the tag name:
sed 's/^tag--//' |
while read -r tagname; do
    # Create an annotated tag pointing at the tip of the matching tag-- branch.
    git tag -a "$tagname" -m "Tag imported from SVN." "tag--$tagname" >/dev/null 2>&1 &&
        echo "tagged: $tagname"
done
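A variation on the same idea, as a rough sketch: it reads the branch names with git for-each-ref instead of parsing the output of git branch, and it deletes each tag-- branch once the tag exists. The function name convert_tag_branches is made up for this example, and deleting the branches is an assumption about what you want; drop that line if you'd rather keep them around.

```shell
#!/bin/bash
# Sketch: turn every local branch named tag--NAME into an annotated tag NAME.
# convert_tag_branches is a hypothetical helper, not part of svn2git or Git.
convert_tag_branches() {
    git for-each-ref --format='%(refname)' 'refs/heads/tag--*' |
    while read -r ref; do
        tagname="${ref#refs/heads/tag--}"
        # Create the annotated tag at the branch tip, then drop the branch.
        if git tag -a "$tagname" -m "Tag imported from SVN." "$ref"; then
            # Assumption: the tag-- branch is no longer wanted once the tag exists.
            git branch -D "tag--$tagname" >/dev/null
            echo "tagged: $tagname"
        fi
    done
}
```

Run convert_tag_branches from inside the converted repository. Try it on a copy first, since it deletes branches.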

Convert With Metadata

Svn2git's command (svn-all-fast-export) takes an --add-metadata option which will add a line like the following to each of the Git commit messages it creates:

svn path=/smartbear/branches/version-6.0/; revision=15522

I'm always a fan of having more information, and this info comes in particularly handy if you ever referenced SVN revision IDs in your internal bug tracking system or in commit messages. With this metadata, you can search your new Git repository for a commit by its old SVN revision ID instead of having to open up an old archive of your SVN repository.
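As a rough sketch of that kind of lookup, the two helpers below search for, and parse, the metadata line. Both function names are made up for this example, and the patterns assume the line has exactly the format shown above, with nothing after the revision number.

```shell
#!/bin/bash
# Sketch: map between Git commits and the SVN revisions they were imported from,
# relying on the "svn path=...; revision=N" line that --add-metadata appends.
# Both function names are hypothetical helpers.

# Print the SVN revision recorded in a commit message read from stdin.
svn_rev_from_message() {
    sed -n 's/^svn path=.*; revision=\([0-9][0-9]*\)$/\1/p'
}

# List commits on any branch whose message has a line ending in "revision=N".
find_svn_rev() {
    git log --all --grep="; revision=$1\$" --format='%h %s'
}
```

For example, find_svn_rev 15522 lists the commit imported from r15522, and git log -1 --format=%B HEAD | svn_rev_from_message goes the other way.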

Create a Mirror

The most difficult part of migrating away from a source control system is usually not the switch to a new program for checking in code -- it's switching all of your processes that rely on your old SCM system to the new system.  Think of all of the processes that have knowledge of your SCM system which will be affected if it goes away:

  • Continuous integration system(s)
  • Release Process(es)
  • Code Review
  • SCM Web Interface / History Browser

Each of these will need to be updated to use your new Git repository.  In some cases you may even need to replace one of the above with software that can speak to Git.  (We ditched Trac's SVN browser and switched to GitWeb.)

You don't want to have to figure this all out on the day that you switch to Git, but you need a Git repository which you can test against to make sure your systems will work.  This is where a mirror comes in handy.

1. Create a script that periodically fetches new revisions from your SVN repository and puts them into your new Git repository.  (The svn2git option --resume-from SVN_REV is handy here.)

2. Make sure your Git repository is in its future "production" location.  Sure, it's not in production yet, but you don't want to have to configure all of your systems twice when you flip the switch and make Git your authoritative repository.

3. Reconfigure each of your systems to pull code from your new Git repository.
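The mirror script from step 1 could look something like the sketch below, run periodically (from cron, say). Treat it as an outline under assumptions, not the script we used: every path, the state file that remembers the last imported revision, and both function names are invented for illustration. It also assumes the SVN repository is readable on the local filesystem, so that svnlook youngest can report the latest revision. Of the svn-all-fast-export options it uses, only --add-metadata and --resume-from are mentioned in this post; check --rules and --max-rev against svn-all-fast-export --help for your version.

```shell
#!/bin/bash
# Sketch of a periodic mirror job (for example, run from cron).
# All paths and the state-file scheme are assumptions for illustration.
SVN_REPO="${SVN_REPO:-/srv/svn/repo}"          # hypothetical local SVN repository
RULES="${RULES:-/srv/migration/collab.rules}"  # hypothetical svn2git rules file
WORKDIR="${WORKDIR:-/srv/git}"                 # hypothetical directory holding the Git repo
STATE="${STATE:-$WORKDIR/last-svn-rev}"        # hypothetical record of the last imported revision

# Print the first revision that still needs importing (1 if nothing is recorded yet).
next_revision() {
    local last=0
    [ -r "$1" ] && last=$(cat "$1")
    echo $((last + 1))
}

# Import any revisions added since the last run, then record how far we got.
mirror_once() {
    local from youngest
    from=$(next_revision "$STATE")
    youngest=$(svnlook youngest "$SVN_REPO") || return 1
    [ "$from" -gt "$youngest" ] && return 0   # nothing new since the last run
    # --max-rev is, as far as I know, also an svn-all-fast-export option;
    # confirm with --help before relying on it.
    (cd "$WORKDIR" &&
        svn-all-fast-export --rules "$RULES" --add-metadata \
            --resume-from "$from" --max-rev "$youngest" "$SVN_REPO") || return 1
    echo "$youngest" > "$STATE"
}
```

Calling mirror_once from a cron job every few minutes keeps the Git repository close behind Subversion. Capping the import at the revision read at the start of the run keeps the state file accurate even if someone commits mid-run.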

At this point, your developers are still committing code to Subversion, but your other systems are integrated with Git.  It's time to let the other developers know what's up.

Keep it Simple

Git is great as a source control system, but it can also be a bit overwhelming.  It has the usual high-level commands for dealing with a source code repository: commit files, check out particular versions of files, update your working copy to the latest version, etc.   But it also has quite a few low-level commands, commonly called "plumbing."  It also has several commands that help with rewriting history, which can be a powerful tool for good or evil... or "Oops!"  

When discussing the switchover to Git with other developers, I chose to keep things simple by doing the following:

  • As examples, choose the most common Subversion workflows already in use by developers.
  • Document  how each step of that workflow would happen in Git.   (For examples, see Git for Subversion Users.) We found it useful to tailor the documentation for our internal workflows, and to place the documentation in our own wiki.
  • Schedule time to step through examples with your developers.  It's one thing to read instructions on a wiki, but when you get everyone in a room watching everything on screen, you'll get a lot more involvement.  You'll also be able to answer any questions that come up as you step through the process.

The most important part of keeping things simple is to avoid extraneous information. In particular, I made a point not to discuss commands that rewrite or replay history (rebase, cherry-pick), and instead focused on best practices for creating feature branches and merging.
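A feature-branch walkthrough of the sort you might step through on screen could look like the sketch below. It is a hypothetical example, not a copy of our internal documentation: the function name, the notes.txt file, and the "master" default are all assumptions. It's wrapped in a function so it can be rehearsed in a scratch clone.

```shell
#!/bin/bash
# Sketch of a basic feature-branch workflow for rehearsal in a scratch clone.
# feature_branch_demo, notes.txt, and the "master" default are assumptions.
feature_branch_demo() {
    local feature="$1" mainline="${2:-master}"

    # Start a branch from the mainline (roughly "svn copy" plus "svn switch").
    git checkout -q -b "$feature" "$mainline" || return 1

    # Edit files, then record the work locally. Nothing reaches the server yet.
    echo "work on $feature" >> notes.txt
    git add notes.txt
    git commit -q -m "Work on $feature" || return 1

    # Switch back to the mainline and merge the finished branch.
    git checkout -q "$mainline" || return 1
    git merge "$feature" || return 1

    # Publishing is the step closest to "svn commit"; left commented out here.
    # git push origin "$mainline"
}
```

For example, feature_branch_demo fix-login-bug creates the branch, commits one change on it, and merges it back, all without touching the central repository.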

Some developers may want to jump into using Git by using git-svn to make commits to your SVN repository before you've made the switch to Git.  I'd recommend against this approach for a few reasons:

  • Git-svn has a very steep learning curve.  You have to learn Git, plus git-svn, and you have to become familiar with the limitations of trying to push Git changes which can have non-linear history into a Subversion repository, which only supports linear history.
  • Because of the above limitations, git-svn has a unique workflow, with different commands than those developers will use with a Git repository, so learning it is only marginally useful toward getting up to speed with your new Git repo.
  • Any local repository created with git-svn will not be compatible with the repository created with svn2git, so the user will have to discard their git-svn repository and clone the svn2git one.

Since you've already got a mirror, you can point any eager users at the mirror.  They can clone the mirror and browse all of the imported history.  They can even try out branching and merging in their local repository without affecting the central repo.

Make the Switch

Now that you've got a Git repository and you've given your developers enough information to interact with it, it's time to dive in and start using it.  You've already done the heavy lifting, so the last step is to flip the switch and have your developers start committing.

First, make your SVN repository read-only. Depending on the way that your developers access your Subversion repository, there are several ways you could do that. One of the simplest methods is a pre-commit hook, which gives you the option to specify a meaningful error message to your developers:

#!/bin/bash
REPOS="$1"
TXN="$2"

# Disable committing to the project we've moved to Git:
if svnlook changed "$REPOS" -t "$TXN" | grep -q "smartbear/" ; then
    echo "Don't commit to SVN, use Git!" 1>&2
    exit 1
fi

exit 0
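One caveat with the hook above: the grep matches "smartbear/" anywhere in the line, so a commit to a path such as other/smartbear/ would be rejected too. If that matters in your repository, a stricter check might look like this sketch. It assumes each line of svnlook changed output is a four-character status column followed by the path (for example "U   smartbear/trunk/build.xml"), and touches_project is a name invented for this example.

```shell
#!/bin/bash
# Sketch: decide whether "svnlook changed" output touches the migrated project.
# Assumes a 4-character status column followed by the path on each line.
# touches_project is a hypothetical helper.
touches_project() {
    # $1 = top-level project directory (plain name, no regex metacharacters);
    # the svnlook output is read from stdin.
    grep -Eq "^.{4}$1/"
}
```

In the hook, the condition would then read: if svnlook changed "$REPOS" -t "$TXN" | touches_project smartbear; then ...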

Now that you know there will be no new revisions in Subversion, update your Git mirror one last time, and disable the mirror script.   Your developers should now be able to clone the Git repo and push their changes back to the server.  (This assumes you have properly set up your server and permissions, which I haven't covered here.)

Since all of your other processes have already been querying Git, this last step might seem a bit anticlimactic, but that was the goal!   You can still expect a few questions about using Git from your developers for some more complicated operations, but for everything else, you should be up and running.

If you found this information useful, or if you have other suggestions to add, we'd love to hear it!
