[ANNOUNCE] git-svnconvert: YASI (Yet Another SVN importer)

Rutger Nijlunsing rutger at nospam.com
Wed Aug 31 21:59:46 BST 2011


Hi all,

Since I didn't succeed in importing a (private) SVN repo into git, I
wrote a new converter to handle more cases like:

  - use command line svn instead of some perl library to have fewer
    dependencies and to support proxy + repo authentication.
    Might even work on MacOSX ;)
  - automatic merge detection by looking at which revision a
    revision gets its files from
  - visualisation of the branch tree with dotty to check what
    git-svnconvert would produce _before_ importing it.
  - /trunk is moved to /branches/old, /branches/new_branch becomes /trunk
    in next revision (ARGH!)
  - To be able to interrupt with ^C and later continue where
    it stopped.
  - have configurable repo layout to handle things like:
    - files next to branches in /branches
    - /branches/

Regards,
Rutger.

-- 
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------
-------------- next part --------------
#!/usr/bin/env ruby

# Convert a Subversion repository with all its branches
# into git incrementally.
#
# The main difference from git-svnimport is that this script
# handles a badly broken archive which I wanted to convert
# and which git-svnimport did not handle ;)
#
# But, of course, your mileage may (will!) vary. svn only defines
# snapshot-trees and interpretation of each leaf in the tree is left
# as an exercise for the reader.
#
# Features:
#  - import _all_ branches
#  - uses svn command line, so:
#    - supports HTTP proxy with authentication
#    - supports repository authentication
#    - supports incorrect (self-signed) SSL certificates ;)
#  - can handle multiple branch changes in one revision
#  - supports extra files in /branches next to real branches
#  - gets merge information by looking at which revisions we're based on
#    (so no need to parse commit messages)
#    This (only) works when the merge of two branches results
#    in files from both branches.
#  - handles missing revisions
#  - no extra packages needed, just ruby and subversion.
#  - outputs GraphViz .dot files for visualisation of branches
#  - fairly interrupt-safe. Once in a while it saves its state
#    to be able to continue next time where it left off.
#  - works in parallel: one process fetches, while another copies
#    it into git.
# Con:
#  - needs a complete, checked-out SVN repository locally
#    _for_ _each_ _branch_.
#    This takes disk space. However, disk space is cheap ;)
# TODO:
#  - parse XML as XML instead of text. In case the exact XML formatting
#    changes in the future...
#  - add tagging support
#  - handle file properties like execute and ignore
#  - use 'svn switch' to checkout other branches fast
#
# Workings
#  - Get a log of all commits
#  - Create a branches graph from this log
#  - Check out each revision separately and write a commit
#    for each revision.
#
# (c)2006 R. Nijlunsing <git-svnconvert at tux.tmfweb.nl>
# Released under the GNU Public License, version 2.


$VERBOSE = true			# Let Ruby warn more

require 'set'
require 'time'
require 'find'
require 'fileutils'
require 'optparse'
require 'cgi'			# For unescapeHTML()

#################### Configuration

# Root directories of the branches and trunk(s).
#   - one line per directory
#   - '*' matches one filename component and is matched last
#   - names are case sensitive
#   - whitespace at start and end is ignored
$branch_dirs = %q{
  /branches/*
  /branches/cc_test/*
  /branches/pre-dev/trunk
  /branches/pre-dev/trunk_old
  /branches/tasks/*
  /trunk
}

# List of paths which might get matched by $branch_dirs,
# but which are not the root of branches.
#
# Argh. People put files into the root of branches,
# which then end up as branches (so for example /branches/README
# ends up as a branch). List here all files in the roots
# to ignore them.
$not_branch_dirs = %q{
  /branches/README
  /branches/pre-dev
  /branches/cc_test
  /branches/tasks
}

#################### End of configuration

def read_svn_authors(authors_filename)
  users = {}
  begin
    IO.foreach(authors_filename) { |line|
      if line =~ %r{^(\S+?)\s*=\s*(.+?)\s*<(.+)>\s*$}
	user, name, email = $1, $2, $3
	users[user] = [name, email]
      end
    }
  rescue Errno::ENOENT, Errno::EACCES
    die("Could not read #{authors_filename}: #{$!}")
  end
  if $verbose
    puts "Read #{users.size} authors from #{authors_filename}"
  end
  return users
end
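
=begin
The authors file read above has one "user = Full Name <email>" entry per
line; lines that do not fit that shape are silently skipped. Below is a
minimal sketch of how the regular expression splits such lines. The helper
name parse_author_line, the constant AUTHOR_LINE and the sample entries are
made up for illustration and are not part of the script.

```ruby
# Same pattern as in read_svn_authors, kept in a (hypothetical) constant.
AUTHOR_LINE = %r{^(\S+?)\s*=\s*(.+?)\s*<(.+)>\s*$}

# Returns [user, [name, email]] or nil when the line has another shape.
def parse_author_line(line)
  return nil unless line =~ AUTHOR_LINE
  [$1, [$2, $3]]
end

# Made-up sample entries.
sample = [
  "jdoe = John Doe <jdoe@example.com>",
  "# a line without the expected shape is ignored",
  "build=Build Bot<build@example.com>",
]

users = {}
sample.each do |line|
  user, info = parse_author_line(line)
  users[user] = info if user
end
p users
```

Whitespace around '=' and before '<' is optional, so both sample entries
with an address are accepted.
=end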

def write_svn_authors(users, authors_filename)
  begin
    File.open(authors_filename, "wb") { |io|
      users.keys.sort.each { |user|
	io.puts "#{user} = #{users[user][0]} <#{users[user][1]}>"
      }
    }
    if $verbose
      puts "Wrote #{users.size} authors to #{authors_filename}"
    end
  rescue Errno::EACCES
    die("Could not write #{authors_filename}: #{$!}")
  end
end

# Given a string with on each line a root directory, generate a regular
# expression matching one of those root directories.
def root_dirs_to_regexp(rootdirs, is_prefix)
  Regexp.new(
    "^(?:" +			# Match at start without capturing
    rootdirs.
      split("\n").
      find_all { |dir| dir.strip != "" }.	# Delete empty lines
      collect { |dir|
        # make path absolute; use '/' as path separator;
        ("/" + dir.strip).gsub(%r{[\\/]+}, "/")
      }.sort { |dir1, dir2|
        # Sort on size so '/branches/development/trunk' comes before
        # '/branches/*' and will therefore be matched.
        dir2.size <=> dir1.size
      }.collect { |dir|
        Regexp.escape(dir)	# Escape everything
      }.join("|").gsub("\\*", "[^/]+") + # Unescape '*' back
    ")" +
    # If not a prefix, must match at end
    # If prefix, must match at path delimiter (or at end)
    (is_prefix ? "(?=/|$)" : "$")  
  )
end
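
=begin
To see what the generated expression does, here is a simplified, standalone
restatement of the same idea (normalise, sort longest first, escape, turn
'*' back into "one path component"). The name dirs_to_regexp and the sample
directories are assumptions for this sketch; it takes an array instead of
the multi-line string used by the script.

```ruby
# Build one anchored regexp out of a list of branch root patterns.
def dirs_to_regexp(dirs, is_prefix)
  alternatives = dirs.
    map { |dir| ("/" + dir.strip).gsub(%r{[\\/]+}, "/") }.   # absolute, '/'-separated
    sort_by { |dir| -dir.size }.                             # longest pattern first
    map { |dir| Regexp.escape(dir).gsub("\\*", "[^/]+") }    # '*' = one component
  Regexp.new(
    "^(?:" + alternatives.join("|") + ")" +
    (is_prefix ? "(?=/|$)" : "$")
  )
end

branch_re = dirs_to_regexp(["/trunk", "/branches/*", "/branches/tasks/*"], true)

p "/branches/tasks/fix1/src/a.c"[branch_re]   # longest pattern wins
p "/branches/feature/src/a.c"[branch_re]
p "/trunkish/a.c"[branch_re]                  # no match: not at a path delimiter
```

Because the alternatives are tried in order, sorting on length makes
"/branches/tasks/*" take precedence over the more general "/branches/*".
=end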

# Execute shell command; bail out at error
def safe_system(cmd)
  puts cmd if $verbose
  system(cmd)
  if $? != 0
    puts cmd if !$verbose
    puts "!!! Command returned non-zero exit code: #{$?}"
    puts "!!! Working dir: #{Dir.pwd}"
    exit($?.exitstatus || 1)	# exitstatus is nil when killed by a signal
  end
end

def safe_popen(cmd, mode = "w+", &callback)
  puts "|" + cmd if $verbose
  res = IO.popen(cmd, mode, &callback)
  if $? != 0
    puts cmd if !$verbose
    puts "!!! Command returned non-zero exit code: #{$?}"
    puts "!!! Working dir: #{Dir.pwd}"
    exit($?.exitstatus || 1)	# exitstatus is nil when killed by a signal
  end
  return res
end

module Shell
  # Escape string so that the shell parses it back to the string itself.
  # Compare to Regexp.escape.
  def self.escape(string)
    string !~ %r{[ "\\]}i ? 
      string : '"' + string.gsub(%r{(["\\])}i, '\\\\\1') + '"'
  end
end
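
=begin
Shell.escape above only adds quotes when the string contains a space, a
double quote or a backslash, so characters such as '$' or a backtick would
still be interpreted by the shell. As a possible alternative (not what this
script uses), Ruby's standard library ships Shellwords.escape. A minimal
sketch, with a made-up password value:

```ruby
require 'shellwords'

# Hypothetical value with characters a POSIX shell treats specially.
password = %q{p@ss word's "quoted" $HOME}

escaped = Shellwords.escape(password)
puts escaped

# Splitting the escaped form the way a POSIX shell would gives back
# exactly one word: the original string.
p Shellwords.split(escaped) == [password]
```

Shellwords follows Bourne shell rules, so this sketch only applies when the
commands are run through a POSIX-style shell.
=end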

def svn_common_args(branch = "")
  repo_url = $repo_url
  repo_url += "/" + branch if branch != ""
  res = "--non-interactive"
  res += " --username #{Shell.escape($username)}" if $username
  res += " --password #{Shell.escape($password)}" if $password
  res += " #{Shell.escape(repo_url)}"
  res
end

def svn_get_current_revision
  $stderr.print "Retrieving current HEAD revision... " if $verbose
  # 'svn info' doesn't always work to get most recent revision.
  # So parse output of 'svn log -r HEAD:HEAD'.
  svn_info = `svn log --xml -r HEAD:HEAD #{svn_common_args}`
  svn_info =~ %r{\srevision=\"(\d+)\"}m
  $stderr.puts "r#{$1}" if $verbose
  $1.to_i
end

# Prefix each line with "!!!" and only warn:
#  - once for each unique warning
#  - at most a fixed number of times per caller
$warning_txt = Set.new		# All dumped warnings texts
$warning_callers = {}		# Per caller: number of warnings
def warning(*txt)
  txt = txt.flatten.collect { |t| t.split("\n") }.flatten
  if !$warning_txt.include?(txt)
    $warning_txt << txt
    backtrace = caller[0]
    $warning_callers[backtrace] ||= 0
    times_warned = ($warning_callers[backtrace] += 1)
    if times_warned <= 5
      first = true
      txt.each { |line|
	$stderr.puts((first ? "!!!" : "   ") + " #{line}")
	first = false
      }
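      if times_warned == 5
        # Parentheses are needed: without them the message on the next
        # line is a separate (discarded) expression and never printed.
        $stderr.puts(
          "    (more of this type of warnings will be suppressed)"
        )
      end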
    end
  end
end

def die(*txt); warning(*txt); exit 1; end

class BranchColor
  @@available_colors = [
    "green", "yellow", "orange", "cyan", "steelblue3",
    "lightblue", "thistle", "red", '".7 .3 1.0"',
    "navy", "violet", "crimson", "azure", "linen", "peru",
    "tan", "darkgreen", "coral"
  ]
  @@branch_color = {}		# String branch -> String color

  def self.color(branch)
    # For a new branch, take a fixed color or a random one if out-of-colors.
    @@branch_color[branch] ||= (
      @@available_colors.shift ||
      ('"%.1f %.1f %.1f"' % [rand + 0.1, rand + 0.1, rand + 0.1])
    )
  end
end

# Map a filename into a branch name part and element name part.
# Branchname may contain '/''s.
# Element name starts with a '/' or is empty.
#
# SVN does not contain branches, but only contains one large tree with
# objects of (potentially different) versions. By only looking at a
# subtree, a branch is emulated. However, we have to know where those
# subtrees are rooted to be able to convert them to branches.
def path_to_branch(filename)
  if filename =~ $branch_dirs
    # We matched a branch-name.
    branch, elem = $&, filename[$&.size..-1]
    return nil if branch =~ $not_branch_dirs
    return [branch[1..-1], elem] # Remove leading '/' from branch name
  else
    # Outside the branches (e.g. a tag). Skip this.
    return nil
  end
end
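
=begin
A small standalone sketch of the path splitting described above. The two
regexps are hand-written stand-ins for the compiled $branch_dirs and
$not_branch_dirs, and split_path is a hypothetical name; the layout
(/trunk, /branches/*, a stray /branches/README) is only an example.

```ruby
BRANCH_RE     = %r{^(?:/branches/[^/]+|/trunk)(?=/|$)}   # roots of branches
NOT_BRANCH_RE = %r{^(?:/branches/README)$}               # matches, but no branch

# Returns [branch, element] or nil when the path is outside all branches.
def split_path(path)
  match = BRANCH_RE.match(path)
  return nil unless match
  branch = match[0]
  return nil if branch =~ NOT_BRANCH_RE
  [branch[1..-1], match.post_match]   # drop leading '/' from the branch name
end

p split_path("/trunk/src/main.c")        # element inside trunk
p split_path("/branches/feature-x")      # the branch root itself: empty element
p split_path("/branches/README")         # a file next to the branches
p split_path("/tags/v1.0/src/main.c")    # outside the configured roots
```

An empty element means the action touched the root of the branch, which is
what the dependency logic further down checks for.
=end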

# Revision of a branch; part of a revision.
# Forms a directed acyclic graph with other BranchRevisions
class BranchRevision
  attr_accessor :must_add_implicit_dep	# Boolean
  attr_accessor :depends_on	# Set of BranchRevision: parents
  attr_accessor :dependers      # Set of BranchRevision: children
  attr_reader :branch		# String: branchname
  attr_accessor :empty		# Bool: true if no reason to keep this rev
  attr_accessor :deleted	# Bool: true if deleted at end
  attr_accessor :commit_sha1	# String. git's SHA1 commit hash.

  # If true, implicitly adds a dependency on the previous
  # version of this branch. This is the default.
  # However, when we detect that this branch consists of totally new files
  # (for example, when copying /trunk to /branches/branch_name) we set
  # this to false.
  # @must_add_implicit_dep  # Boolean

  def initialize(revision, branch)
    @revision = revision	# Revision
    @branch = branch
    @must_add_implicit_dep = true
    @depends_on = Set.new
    @dependers = Set.new
    @empty = true
    @deleted = false
    @commit_sha1 = nil
  end

  # Compare on revision number
  def <=>(other); self.nr <=> other.nr; end

  def to_s; "#{@branch}:#{nr}"; end

  def nr; @revision.nr; end	# Fixnum: revision nr

  # Returns true if this BranchRevision is a root revision since:
  #   - it does not depend on another BranchRevision
  #   - it contains changes _within_ the branch (== not empty)
  def root?; @depends_on.empty? && !@empty; end

  def add_depends_on(branch_rev)
    @depends_on << branch_rev
    branch_rev.dependers << self
  end

  def remove_depends_on(branch_rev)
    @depends_on.delete(branch_rev)
    branch_rev.dependers.delete(self)
  end

  def rev; @revision; end

  def other_branch_depends_on
    @depends_on.find_all { |b| b.branch != @branch }
  end

  # Returns all BranchRevisions we depend on in the same branch
  def same_branch_depends_on
    @depends_on.find_all { |b| b.branch == @branch }
  end

  # Returns branch revision on which we depend which is least number
  # of revisions back on the same branch. Since we go back, this is
  # the max. revision of the dependencies.
  def closest_same_branch_depends_on
    same_branch_depends_on.max
  end

  # Returns branch revision which depends on us which is least number
  # of revisions forward on the same branch.
  def closest_same_branch_depender
    @dependers.find_all { |b| b.branch == @branch }.min
  end

  # A dependency is unneeded when:
  #   - the dependency belongs to the same branch and the same revs
  #     can still be reached after removing this dependency, or
  #   - the dependency belongs to another branch on which a rev in
  #     the same branch above us is already depending
  def remove_unneeded_depends_on
    return if @depends_on.size <= 1 # Optimisation

    # TODO: look farther back.

    # Same branch dependency remover:
    # Follow the dependency chain along the closest parent
    # till we reach the farthest dep.
    same_branch_deps = same_branch_depends_on()
    closest_same_branch_dep = same_branch_deps.max
    farthest_same_branch_nr = !same_branch_deps.empty? && same_branch_deps.min.nr
    current_dep = closest_same_branch_dep
    same_branch_deps_rec = Set.new
    while current_dep && current_dep.nr >= farthest_same_branch_nr
      same_branch_deps_rec.add(current_dep)
      current_dep = current_dep.closest_same_branch_depends_on
    end
    same_branch_deps.each { |red_rev|
      if red_rev != closest_same_branch_dep &&
	  same_branch_deps_rec.include?(red_rev)
#	puts "r#{nr}: Dep. on same branch r#{red_rev.nr} redundant" if $verbose
	remove_depends_on(red_rev)
      end
    }

    # Via-other-branch redundant dependency remover:
    other_branch_depends_on.each { |other_dep|
      if !((other_dep.dependers & same_branch_deps_rec).empty?)
#	puts "r#{nr}: Dep. on other branch r#{other_dep.nr} redundant" if $verbose
	remove_depends_on(other_dep)
      end
    }
  end

  # Signal the fact that the last action has been added to this revision.
  # Now the dependencies can be calculated.
  def last_action_added
    if @must_add_implicit_dep
      prev_branch_rev = Revision.find_revision_with_branch(nr - 1, @branch)
      add_depends_on(prev_branch_rev) if prev_branch_rev
    end
    @depends_on.dup.each { |br|
      # We might depend on empty branch revisions. Since empty branches will
      # not be checked out, copy dependencies from those branches.
      if br.empty
	br.depends_on.each { |parent_depends_on|
	  add_depends_on(parent_depends_on)
	}
	remove_depends_on(br)
      end
    }
    remove_unneeded_depends_on
  end

  # Internal dotty label.
  def label; "_#{@branch}_#{nr}".delete("^0-9a-zA-Z_"); end

  def to_dotty
    return "\t/* r#{nr} is empty */\n" if @empty
    res = "\t#{label}[label=\"#{File.basename(@branch)} " +
      "#{nr}\\n#{rev.author}\",color=#{BranchColor.color(@branch)}];"
    @depends_on.each { |d|
      res += " #{d.label} -> #{label}"
      res += "[style=dashed]" if @branch != d.branch # A merge or branch
      res += ";"
    }
    res + "\n"
  end
end

# One revision in SVN. Contains zero or more BranchRevisions.
class Revision
  attr_reader :nr		# Fixnum
  attr_accessor :author		# String
  attr_accessor :msg		# String
  attr_accessor :time		# String
  attr_accessor :branches	# Hash: String rootdir to BranchRevision

  @@all_revs = []		# Array: all revisions, indexed by nr

  def self.get_all_revs; @@all_revs; end

  def self.set_all_revs(new_all_revs); @@all_revs = new_all_revs; end

  # Returns first revision number we're interested in.
  def self.get_next_log_nr
    @@all_revs[-1] ? @@all_revs[-1].nr + 1 : $start_revision
  end

  def self.[](nr); @@all_revs[nr]; end
  # Iterate sorted over each revision
  def self.each; @@all_revs.each { |rev| yield(rev) if rev }; end

  # Search for most recent BranchRevision which changed given branch.
  def self.find_revision_with_branch(start_nr, branch)
    start_nr.downto($start_revision) { |nr|
      rev = @@all_revs[nr]
      return rev.branches[branch] if rev && rev.branches.has_key?(branch)
    }
    warning(
      "Could not find branch #{branch.inspect} from revision #{start_nr} back"
    ) if start_nr >= $start_revision
    return nil
  end

  def initialize(nr)
    @nr = nr
    @branches = {}
    # Store new revision in global revision array
    @@all_revs[nr] = self
  end

  # Add an action to this revision
  #   'copyto_path' is the destination being changed. In all cases,
  #   this can be a file or directory.
  #   action == :R  : Replace. copyfrom_* always filled in.
  #   action == :M  : Modify
  #   action == :D  : Delete
  #   action == :A  : Create as new or copy from other revision
  #                   If from other revision, copyfrom_* are filled in.
  $branch_paths = Set.new
  def add_action(action, copyto_path, copyfrom_path, copyfrom_rev)
    branch, elem = *path_to_branch(copyto_path)
    return if !branch		# Action is outside a branch

    if $verbose && !$branch_paths.include?(branch)
      puts "r#{nr}: New branch: #{branch}"
      $branch_paths.add(branch)
    end

    branch_rev = (@branches[branch] ||= BranchRevision.new(self, branch))

    from_branch = from_elem = nil
    if copyfrom_path
      from_branch, from_elem = *path_to_branch(copyfrom_path)
      if !from_branch
	# Not a branch. Must be a tag.
	warning(
          "Ignoring dependency r#{@nr} on a non-branch: " +
          "#{copyfrom_path.inspect}:#{copyfrom_rev}"
        )
      else
	from_branch_rev = 
	  Revision.find_revision_with_branch(copyfrom_rev, from_branch)
	branch_rev.add_depends_on(from_branch_rev) if from_branch_rev
      end
    end

    # We cannot checkout deleted branches, so record the fact that
    # it is deleted at the end of the revision.
    branch_rev.deleted = true if elem == "" && action == :D

    # If something changes the root of the branch (deleted, copied
    # from other branch, ...) we break the implicit dependency chain.
    # However, _modifying_ the root directory (adding files, removing files)
    # does not remove the dependency.
    branch_rev.must_add_implicit_dep = false if elem == "" && action != :M

    # Check whether this action results in a revision. For example,
    # deleting the root of the branch or creating the root dir does not
    # change anything _within_ the branch.
    branch_rev.empty = false if elem != "" # Real files. Not empty.
  end

  # Signal the fact that the last action has been added to this revision.
  # Now the dependencies can be calculated.
  def last_action_added
    @branches.each_value { |b| b.last_action_added }
  end

  def to_dotty
    @branches.values.collect { |b| b.to_dotty }.join("")
  end
end

# Parse output of 'svn log --xml --verbose'
def parse_svn_log_xml(svnlog_filename)
  $stderr.puts "Branch analysis on svn log files..." if $verbose
  last_rev_nr = nil
  rev = nil			# Current Revision
  File.open(svnlog_filename, "rb") { |io|
    while line = io.gets
      case line
      when %r{^   revision="(\d+)"}
	rev_nr = $1.to_i
	$stderr.print "r#{rev_nr}" + ("\010" * 10)
	# Only create a new revision when not already loaded from
	# state file.
	if rev_nr > $start_revision && Revision[rev_nr] == nil
	  rev = Revision.new(rev_nr)
	  if last_rev_nr && (rev_nr != last_rev_nr + 1)
	    # Missing revisions are a sign of not getting the log of the
	    # whole archive, but only one of its branches (or subprojects).
	    # Which could be valid if we're after a subproject, but not
	    # if we want to import the whole repository.
	    warning("Missing revision(s) #{last_rev_nr + 1}-#{rev_nr - 1}")
	  end
	  last_rev_nr = rev_nr
	end
      # 'then' instead of ':' so this also parses on Ruby 1.9 and later
      when %r{^\<path} then copyfrom_path = nil; copyfrom_rev = nil
      when %r{^   copyfrom-path="(.*)"} then copyfrom_path = $1
      when %r{^   copyfrom-rev="(\d+)"} then copyfrom_rev = $1.to_i
      when %r{^   action="([A-Z])"\>(.*)\</path\>}
	rev.add_action($1.to_sym, $2, copyfrom_path, copyfrom_rev) if rev
      when %r{^\</paths\>}
	# End of this revision.
	next if !rev
	rev.last_action_added
	root_branches = rev.branches.values.find_all { |branch| branch.root? }
	root_branches.each { |bi|
	  puts "r#{rev.nr} is root of #{bi.branch}"
	}
	changed_branches = rev.branches.values.find_all { |b| !b.empty }
	if changed_branches.size > 1
	  changed_branches_names = changed_branches.collect { |b| b.branch }
	  warning(
            "More than one branch changed in r#{rev.nr}..?",
            changed_branches_names
          )
	end
      when %r{^\<author\>(.*)\</author\>}
	rev.author = $1 if rev
      when %r{^\<msg\>(.*)}
	msg = $1.rstrip
	while !msg.gsub!("\</msg\>", "")
	  msg += "\n" + io.gets.rstrip
	end
	msg = CGI::unescapeHTML(msg.strip + "\n") # &amp; -> & etc.
	rev.msg = msg if rev
      when %r{^\<date\>(.*)\</date\>}
	rev.time = Time.parse($1) if rev
      end
    end
  }
end

$state_version = "20060409"

# Load state of git-svnconvert. Returns true in case of success.
def load_state
  begin
    state = File.open($state, "rb") { |io| Marshal.load(io) }
    version = state.shift
    if version != $state_version
      warning(
        "Ignoring previous state: has different version (#{version}) " +
        "than this git-svnconvert version (#{$state_version})"
      )
    else
      $repo_url, all_new_revs = *state
      Revision.set_all_revs(all_new_revs)
      return true
    end
  rescue Errno::ENOENT
  end
  return false
end

def save_state
  start_time = Time.now
  File.open($state + ".new", "wb") { |io|
    io.write(Marshal.dump([$state_version, $repo_url, Revision.get_all_revs]))
  }
  # Rename atomically new state into _the_ state
  File.rename($state + ".new", $state)
  $stderr.puts "Saved new state in %.1fs" % (Time.now - start_time) if $verbose
end
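
=begin
The save above uses the common "write to a new file, then rename" pattern:
on POSIX systems a rename within one filesystem replaces the target in one
step, so an interrupted run leaves either the old or the new state file,
not a half-written one. A minimal standalone sketch; save_atomically,
load_or_default and the sample state contents are hypothetical:

```ruby
require 'tmpdir'

# Dump object next to path first, then move it into place.
def save_atomically(path, object)
  tmp = path + ".new"
  File.open(tmp, "wb") { |io| io.write(Marshal.dump(object)) }
  File.rename(tmp, path)
end

# Read the state back, or return default when no state was saved yet.
def load_or_default(path, default)
  File.open(path, "rb") { |io| Marshal.load(io) }
rescue Errno::ENOENT
  default
end

before = after = nil
Dir.mktmpdir do |dir|
  state_file = File.join(dir, "state")
  before = load_or_default(state_file, [])
  save_atomically(state_file, ["20060409", "svn://example.invalid/repo", [nil, "r1"]])
  after = load_or_default(state_file, [])
end
p before
p after
```

As in the script, a leading version string lets a later run reject a state
file written by an incompatible version.
=end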

# Export current graph of revisions to a GraphViz dotty file
def export_to_dot(dot_file)
  $stderr.puts "Writing branch graph to \"#{dot_file}\"..."
  File.open(dot_file, "wb") { |io|
    io.puts "strict digraph svn {"
    io.puts "\tnode[shape=box,style=filled];\n\n"
    Revision.each { |rev|
      next if !rev || (rev.nr < $start_revision)
      io.print rev.to_dotty
    }
    io.puts "}"
  }
end

# Runs an svn command, and run git-update-index on each file
# echoed to stdout by svn.
def svn_cmd_with_update_index(svn_cmd)
  safe_popen(svn_cmd, "r") { |svn_io|
    safe_popen("git-update-index --add --remove --stdin", "w") { |git_io|
      git_io.sync = true
      # Have a buffer of one line: svn echoes the file it is
      # working on. When svn echoes the next line, we then know
      # it finished the previous file, and can therefore be added
      # to git now.
      last_file = nil
      while line = svn_io.gets
	$stderr.puts line if $verbose
	if line =~ %r{^[A-Z][A-Z ]{3} (.*)$}
	  git_io.puts last_file if last_file 
	  last_file = $1
	end
      end
      git_io.puts last_file if last_file 
    }
  }
end
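
=begin
The one-line buffer used above can be tried without svn or git. In this
sketch a StringIO with made-up svn output stands in for the svn pipe, and a
block stands in for the pipe to git-update-index; each_finished_file and
SVN_STATUS_LINE are hypothetical names.

```ruby
require 'stringio'

# Same pattern as above: a status column block followed by the path.
SVN_STATUS_LINE = %r{^[A-Z][A-Z ]{3} (.*)$}

# Yield a file only once svn has moved on to the next line (or finished),
# since only then do we know the previous file is complete.
def each_finished_file(svn_io)
  last_file = nil
  while (line = svn_io.gets)
    if line =~ SVN_STATUS_LINE
      yield last_file if last_file
      last_file = $1
    end
  end
  yield last_file if last_file   # flush the final buffered file
end

fake_svn_output = StringIO.new(<<EOS)
A    src/main.c
U    README
Updated to revision 42.
EOS

finished = []
each_finished_file(fake_svn_output) { |file| finished << file }
p finished
```

The summary line at the end does not match the pattern and is therefore not
handed on as a file name.
=end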

# Unknown contents in directory; recreate index
def git_update_index
  # Remove incomplete index
  git_index_file = ENV['GIT_INDEX_FILE']
  [git_index_file, git_index_file + ".lock"].each { |f|
    File.delete(f) if File.exist?(f)
  }
  # Add all files
  to_prune = Set.new([".git", ".svn"]) # Directories not to add
  safe_popen("git-update-index --add --remove --stdin") { |io|
    elems = 0
    Find.find(".") { |elem|
      Find.prune if to_prune.include?(File.basename(elem))
      if elem != "."
	elem = elem[2..-1]  # Remove ./
	io.puts elem
	elems += 1
      end
    }
    puts "Dir contained #{elems} elements" if $verbose
  }
end

######################################## Main

# Default values for options
$verbose = true
$username = nil			# Repository username
$password = nil			# Repository password
$start_revision = 1
dot_filename = nil
svn_authors_file = nil		# File mapping SVN author name to full name and email
$run_dotty = false

# Parse options
$opts = OptionParser.new
$opts.banner = %q{Converter of a subversion archive into a git archive.

git-svnconvert [options] URL[@REV] [DIR]
  ..to convert complete SVN archive at URL into directory DIR.
  The URL must point to the root directory of the repo, so not e.g. to /trunk!
  By default, DIR will be the basename of the URL.
  Directory DIR will be created if it does not exist.
  Otherwise, DIR must already be a git repository.
git-svnconvert [options]
  ..to incrementally convert newly added revisions and add them
    to the git repo in the current directory.

Examples:

git-svnconvert svn://svn.0pointer.net/fusedav .
  ..checks out svn://svn.0pointer.net/fusedav into current dir.
git-svnconvert svn://svn.0pointer.net/fusedav@10
  ..checks out starting at revision 10 into directory fusedav.
git-svnconvert
  ..updates already imported svn archive in current dir.

}
$opts.on("Options (defaults between []'s):")
$opts.on("--verbose", "-v", "Toggle verbose mode [#{$verbose}]") {
  $verbose = !$verbose
}
$opts.on("--help", "-h", "This usage") { puts $opts; exit 1 }
# Block parameters are plain locals assigned explicitly: globals or outer
# variables as block parameters only work on Ruby 1.8.
$opts.on("--revision REV", "-s",
         "Start revision [#{$start_revision}]") { |rev|
  $start_revision = rev.to_i	# Option values arrive as strings
}
$opts.on("--dot FILENAME", "-d",
         "Export branch tree as .dot file.") { |filename|
  dot_filename = filename
}
$opts.on("--dotty", "-D", "Export branch tree and run dotty on it") {
  $run_dotty = true
}
$opts.on("--authors FILENAME", "-A",
         "Filename of svnauthors file to add") { |filename|
  svn_authors_file = filename
}
$opts.on("\nOptions passed to 'svn':");
$opts.on("--username NAME", "-u",
         "Specify a username for SVN repo") { |name|
  $username = name
}
$opts.on("--password PWD", "-p",
         "Specify a password for SVN repo") { |pwd|
  $password = pwd
}

begin
  $opts.parse!(ARGV)
rescue OptionParser::InvalidOption
  die($!.to_s, $opts.to_s)
end

$repo_url = (ARGV.size > 0) ? ARGV.shift.dup : nil
# Parse @REV part of URL@REV if given...
$start_revision = $1.to_i if $repo_url && $repo_url.gsub!(%r{@(\d+)$}, "")
dest_dir = (ARGV.size > 0) ? ARGV.shift : nil
dest_dir = File.basename($repo_url) if $repo_url && !dest_dir
dest_dir ||= Dir.pwd	# No URL and no destdir given: use current dir
dest_dir = File.expand_path(dest_dir)
git_dir = ENV["GIT_DIR"] || "#{dest_dir}/.git"
svnconvert_dir = "#{git_dir}/svnconvert"
# Name of state file with all parsed log files and connection to git archive.
$state = "#{svnconvert_dir}/state"

if !File.exist?($state) && !$repo_url
  puts $opts.to_s
  puts "\n!!! Need a URL to checkout or a previously git-svnconvert'ed dir"
  exit 1
end

# Mapping from SVN author name to full name and email
default_svn_authors_file = "#{git_dir}/svn-authors"
users = File.readable?(default_svn_authors_file) ?
  read_svn_authors(default_svn_authors_file) : {}
# If authors file explicitly given, add
if svn_authors_file
  users_to_add = read_svn_authors(svn_authors_file)
  if users_to_add.size > 0
    users.merge!(users_to_add) 
    write_svn_authors(users, default_svn_authors_file)
  end
end

FileUtils.mkdir_p(svnconvert_dir)	# Create destination if not existing
Dir.chdir(dest_dir)

$branch_dirs = root_dirs_to_regexp($branch_dirs, true)
$not_branch_dirs = root_dirs_to_regexp($not_branch_dirs, false)

# Load complete state, or failing that, save empty state with new repo URL
load_state || save_state
next_revision_nr = Revision.get_next_log_nr
if next_revision_nr < svn_get_current_revision
  svnlog = "#{svnconvert_dir}/svnlog.xml"
  puts "Retrieving revision log from #{next_revision_nr} and upwards..."
  cmd = "svn log #{svn_common_args} -r #{next_revision_nr}:HEAD --xml --verbose >#{svnlog}"
  safe_system(cmd)
  parse_svn_log_xml(svnlog)
  save_state
else
  puts "No new revision to get log of."
end

# The graph eases debugging, so export it always...
default_dot_filename = "#{svnconvert_dir}/branches.dot"
export_to_dot(dot_filename || default_dot_filename)
if $run_dotty
  safe_system("dotty #{Shell.escape(dot_filename || default_dot_filename)}")
end
exit 0 if dot_filename || $run_dotty

# Directory of all branches checked out:
svn_co_dir = "#{svnconvert_dir}/checkedout" 
ENV['GIT_DIR'] = git_dir

head_existed = File.exist?("#{git_dir}/HEAD")
safe_system("git-init-db")
if !head_existed
  $stderr.puts "Creating HEAD pointing to trunk"
  File.open("#{git_dir}/HEAD", "wb") { |io|
    io.puts "ref: refs/heads/trunk"
  }
end
Revision.each { |rev|
  next if rev.nr < $start_revision

  rev_commit_sha1s = []
  branches = rev.branches.values
  branches.each { |br|
    branch_dir = "#{svn_co_dir}/#{br.branch}"
    if br.commit_sha1
      # Skip if already imported
    elsif br.deleted
      # Branch contains only a 'delete this branch' command.
      $stderr.puts "r#{rev.nr}: #{br.branch} deleting #{branch_dir}"
#      FileUtils.rm_rf(branch_dir)
    elsif br.empty
      $stderr.puts "r#{rev.nr}: #{br.branch} is empty; skipping..."
    else
      # Hide the git index inside .svn dir.
      git_index_file = ENV['GIT_INDEX_FILE'] = "#{branch_dir}/.svn/index.git"
      if !File.directory?(branch_dir)
        # Start of new branch. Make (empty) directory to start new branch in
        FileUtils.mkdir_p(branch_dir)
	FileUtils.mkdir_p(File.dirname(ENV['GIT_INDEX_FILE']))
	Dir.chdir(branch_dir)
        puts "Checking out new branch #{br.branch}:#{rev.nr} in #{Dir.pwd}"
        cmd = "svn checkout -r #{rev.nr} #{svn_common_args(br.branch)} ."
        svn_cmd_with_update_index(cmd)
      else
	# Branch already checked out, update to new revision.
	Dir.chdir(branch_dir)
	svn_cmd = "svn update -r #{rev.nr} ."
	if File.exist?(git_index_file + ".lock")
	  $stderr.puts "Git index file already locked. Removing lock and recreating index."
	  safe_system(svn_cmd)
	  git_update_index
	else
	  puts "Updating branch to new revision: #{br.branch}:#{rev.nr}"
	  svn_cmd_with_update_index(svn_cmd)
	end
      end

      # git index is now up-to-date. Write the tree.
      tree_sha1 = `git-write-tree`.chomp

      # Now write this tree into a commit
      author = rev.author
      email = "unknown"
      author, email = users[author] if users[author]
      $stderr.puts "r#{rev.nr}: Author: #{author} <#{email}>" if $verbose
      ENV['GIT_AUTHOR_NAME'] = ENV['GIT_COMMITTER_NAME'] = author
      ENV['GIT_AUTHOR_EMAIL'] = ENV['GIT_COMMITTER_EMAIL'] = email
      ENV['GIT_AUTHOR_DATE'] = ENV['GIT_COMMITTER_DATE'] =
	rev.time.strftime("+0000 %Y-%m-%d %H:%M:%S") 
      parents = br.depends_on.find_all { |p|
	# Filter out all non-converted parents
	raise if !p.commit_sha1 && p.nr >= $start_revision
	p.commit_sha1 != nil
      }.collect { |p| "-p #{p.commit_sha1}" }.join(" ")
      puts "Committing #{tree_sha1}..."
      cmd = "git-commit-tree #{tree_sha1} #{parents}"
      puts cmd if $verbose
      safe_popen(cmd) { |io|
        io.puts rev.msg.strip
        io.close_write
        br.commit_sha1 = io.read.chomp
        rev_commit_sha1s << br.commit_sha1    
      }
      puts "#{br.branch}:#{rev.nr} has id #{br.commit_sha1}" if $verbose

      Dir.chdir(dest_dir)
      # Tag current branch + revision.
#    safe_system("git-tag -f #{br.branch}/r#{rev.nr} #{br.commit_sha1}")
      # Update HEAD of this branch
      safe_system("git-update-ref refs/heads/#{br.branch} #{br.commit_sha1}")
      puts
    end
  }
  if rev_commit_sha1s.size == 1
    # '-f' added to make this script restartable (state might not get saved)
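    # Use the single element: interpolating the whole array only yields
    # the bare SHA1 on Ruby 1.8.
    safe_system("git-tag -f r#{rev.nr} #{rev_commit_sha1s.first}")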
  end
  save_state if !rev_commit_sha1s.empty?
}
puts "Done."

