[ANNOUNCE] ClearCase UCM -> git converter
Rutger Nijlunsing
rutger at nospam.com
Wed Aug 31 21:59:48 BST 2011
Hi,
Find attached a ClearCase UCM -> git converter.
While UCM does have branches and so on, this importer imports one
stream. Since UCM is somewhat like an expensive CVS, it is quite
difficult to retrieve more information from UCM in a useful way: the
concept are just too poluted.
Features:
- Author files support to map ClearCase usernames to git user names
- Incrementally import one ClearCase UCM stream into git.
Just rerun the script when new baselines were created.
- Handles composite baselines
- Auto-detection of VOB, stream, rules to load, .. as much
as possible.
- Auto repack after a lot of objects have been generated
- Should be able to pick up where it left after an interruption.
HOWEVER, it is Windows only currently since I don't have a cleartool
unix client. It should not be too hard to convert though, since it
uses the normal 'cleartool' command line client to retrieve all
information.
Usage:
Just have your stream-to-be-converted running at, say, m:\mystream_int
(most likely an integration stream), and say something like:
git-ucmimport m:\mystream_int c:\mystream_git
...and hopefully it should detect all configuration items and start
converting.
Can also be found at http://www.wingding.demon.nl .
Rutger.
--
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------
-------------- next part --------------
#!/usr/bin/env ruby
# ClearCase UCM -> git importer.
#
# Imports the baselines of _one_ stream into a Git repository.
#
# Features:
# - Author files support to map ClearCase usernames to git user names
# - Incrementally import one ClearCase UCM stream into git.
# Just rerun the script when new baselines were created.
# - Handles composite baselines
# - Auto-detection of VOB, stream, rules to load, .. as much
# as possible.
# - Auto repack after a lot of objects have been generated
# - Should be able to pick up where it left after an interrruption.
#
# Assumptions:
# - Works on Windows for me(TM). Not tested on Unix.
#
# Discussion:
# Importing the whole tree of streams (branches) of UCM into git is
# next to impossible. UCM has the concepts of 'activities' which
# contain versions of changed files, and a set of activities is
# delivered as a changeset. So far so good. However, activities may be
# reused (even over changeset boundary) and may be interleaved with
# each other. This makes it infeasible to map activities or changesets
# on git commits, which is a much nicer concept.
#
# 2006 Rutger Nijlunsing <git-ucmimport at tux.tmfweb.nl>
############################## Config documentation.
# Don't change those; just use the command line switches to set them.
# Section below is mainly for documentation purposes. ;)
# ClearCase: Directory with existing Dynamic view on integration
# stream to import. Probably a subdirectory on M:\, or under /view
$int_view_dir = ""
# Location of snapshot view on substream of integration stream. We are
# going to rebase this stream to a newer version in int_view, and then
# check in the result.
$sub_view_dir = ""
# Baseline in parent stream to start importing at if the destination
# snapshot view does not exist.
# If empty, takes first baseline.
# Use 'cleartool lsbl' in stream directory to get a list.
first_baseline = ""
# Subdirectory of $sub_view_dir to generate git repository at. Only
# files in this directory or lower are added. This can be used when a
# common subdirectory is always added to the paths (which is common in
# ClearCase). If unsure, leave empty.
$sub_root = ""
# Filename with author mapping from ClearCase to Git. Leave empty ("")
# to have no mapping and use the ClearCase names.
# Format of file is lines containing:
# 'cc_author_name=Real Name <email at domain.com>'
$ucm_authors_file = ""
# Server Storage Location for the view storage directory.
# This is the same as can be found in the GUI:
# ClearCase Explorer -> Join Project -> Next -> Next -> Next ->
# Advanced Options -> [x] Use server storage location,
# and then the 'name' column of that table.
# Use empty string ("") for default.
stgloc = "ADVIEWS"
##############################
# Include required standard Ruby libraries
require 'find'
require 'fileutils'
require 'time'
require 'optparse'
require 'ostruct'
$VERBOSE = true
$verbose = true
def die(*txt); puts txt.flatten.join("\n"); exit 1; end
def read_ucm_authors(authors_filename)
users = {}
begin
IO.foreach(authors_filename) { |line|
if line =~ %r{^(\S+?)\s*=\s*(.+?)\s*<(.+)>\s*$}
user, name, email = $1, $2, $3
users[user] = [name, email]
end
}
rescue Errno::ENOENT, Errno::EACCES
die("Could not read #{authors_filename}: #{$!}")
end
if $verbose
puts "Read #{users.size} authors from #{authors_filename}"
end
return users
end
def write_ucm_authors(users, authors_filename)
begin
File.open(authors_filename, "wb") { |io|
users.keys.sort.each { |user|
io.puts "#{user} = #{users[user][0]} <#{users[user][1]}>"
}
}
if $verbose
puts "Wrote #{users.size} authors to #{authors_filename}"
end
rescue Errno::EACCES
die("Could not write #{authors_filename}: #{$!}")
end
end
def read_write_ucm_authors
# Maping from ClearCase author name to full name and email
default_ucm_authors_file = "#{$git_dir}/ucm-authors"
$users = File.readable?(default_ucm_authors_file) ?
read_ucm_authors(default_ucm_authors_file) : {}
# If authors file explicitly given, add
if $ucm_authors_file && $ucm_authors_file != ""
users_to_add = read_ucm_authors($ucm_authors_file)
if users_to_add.size > 0
$users.merge!(users_to_add)
write_ucm_authors($users, default_ucm_authors_file)
end
end
end
module Shell
# Escape string string so that it is parsed to the string itself
# Compare to Regexp.escape .
def self.escape(string)
(string !~ %r{[ "]}i && string != "") ?
string : '"' + string.gsub(%r{(["])}i, '\\\\\1') + '"'
end if not defined? self.escape
end
# Run 'cleartool' and filter the output.
def system_filter_output(cmd, may_fail)
no_merge_required = false
hijacked = []
p cmd
File.popen("#{cmd} 2>&1", "rb") { |io|
while line = io.gets
line = line.rstrip
next if line =~ %r{^[.]*$}
next if line =~ %r{^End dir }
next if line =~ %r{^Processing dir }
next if line =~ %r{^Done loading }
next if line =~ %r{^Log has been written to }
next if line =~ %r{^Making dir } # Only interested in files...
line = line.
sub(%r{^Loading }, "+ ").
sub(%r{^Unloaded }, "- ")
no_merge_required = true if line =~ %r{^No versions require merging}
if line =~ %r{^Keeping hijacked object "(.*)"}
hijacked << $1
else
raise if line =~ %r{hijacked}
end
puts line
end
}
# Some hijacked files might be left. Update them manually
hijacked.each { |f|
puts "Unhijacking #{f}"
safe_system("cleartool update -overwrite #{Shell.escape(f)}")
}
ok = no_merge_required
if !ok && !may_fail
die(
"!!! View at #{Dir.pwd} is not clean.",
"!!! No checkout out files, hijacked files etc. may be present.",
"!!! Restore the view (undo hijacking, undo checkouts etc.) and retry."
)
end
return ok
end
def skip(*cmd); puts "Skipping #{cmd.inspect}"; end
# Run a command with its arguments.
# When cmd is an array, a boolean indicates whether the next argument should
# be emitted or not.
def safe_system(*cmd)
emit = true
cmdline = [cmd].flatten.collect { |arg|
if (arg == true) || (arg == false)
emit = arg
arg = nil
else
arg = nil if !emit
emit = true
end
arg.to_s
}.join(" ")
puts cmdline if $verbose
system(cmdline)
if $? != 0
puts cmdline if !$verbose
puts "!!! Command returned non-zero exit code: #{$?}"
puts "!!! Working dir: #{Dir.pwd}"
exit $?
end
end
def safe_popen(cmd, mode = "w+", &callback)
puts "|" + cmd if $verbose
res = IO.popen(cmd, mode, &callback)
if $? != 0
puts cmd if !$verbose
puts "!!! Command returned non-zero exit code: #{$?}"
puts "!!! Working dir: #{Dir.pwd}"
exit $?
end
return res
end
# Returns all rules loaded in the current directory
def read_loaded_rules
rules = {}
File.popen("cleartool catcs", "rb") { |io|
while line = io.gets
rules[$1] = true if line =~ %r{^element "\[[0-9a-f]{32}=(.*)\].*\" }
end
}
rules = rules.keys
return rules
end
class Activity
attr_accessor :id, :author, :comment
def is_activity; @id !~ %r{^(deliver|rebase)\.}; end
def initialize(id, author, comment)
@id = id
@author = author
@comment = comment
end
def hash; @id.hash; end
def eql?(other); @id == other.id; end
end
# A baseline of a stream
class Baseline
attr_reader :name # Name of baseline
attr_reader :date # Date of creation of baseline
attr_reader :author # User who created the baseline
attr_reader :comment
attr_reader :children # For composite baselines
attr_reader :parent # For non-composite baselines
attr_accessor :component # String
def initialize(name, date, author, comment)
@name = name
@date = date
@author = author
@comment = comment
@children = []
@parent = nil
end
def set_parent(b); @parent = b; b.add_child(self); end
def add_child(baseline); @children << baseline; end
# Returns self and all children (recursively)
def children_recursive
res = [self]
children.each { |c| res += c.children_recursive }
return res
end
end
# Make sure .git exists and save the state
def git_init_db
Dir.chdir($git_root)
if !File.directory?($git_dir)
puts "\n=== Creating initial git repository at #{$git_root}"
safe_system("git init-db")
if !File.directory?($git_dir)
die("!!! Could not create git repo at #{$git_root}")
end
end
save_state
end
def git_commit(baseline, msg)
Dir.chdir($sub_view_dir)
puts "\n=== Removing ClearCase left-over files..."
Find.find(".") { |elem|
if elem =~ %r{\.(contrib|keep|renamed|unloaded|loading|mkelem)(\.\d+)?$} ||
elem =~ %r{\.(updt|stackdump)$}
puts "Removing #{elem}"
FileUtils.rm_rf(elem)
end
}
Dir.chdir($git_root)
puts "\n=== Adding files to git"
safe_system("git add .")
author = baseline.author
email = nil
author, email = $users[author] if $users[author]
author ||= "unknown"
email ||= "unknown"
ENV['GIT_AUTHOR_NAME'] = ENV['GIT_COMMITTER_NAME'] = author
ENV['GIT_AUTHOR_EMAIL'] = ENV['GIT_COMMITTER_EMAIL'] = email
date = baseline.date
if date
puts "Date: #{date}"
ENV['GIT_AUTHOR_DATE'] = ENV['GIT_COMMITTER_DATE'] =
date.strftime("+0000 %Y-%m-%d %H:%M:%S")
else
ENV.delete('GIT_AUTHOR_DATE')
ENV.delete('GIT_COMMITTER_DATE')
end
puts "\n=== Committing new version to git"
puts "Commit message:"
puts msg
# 'git commit' might fail if no data is added, so no safe_popen()
cmd = "git commit --no-verify -F - --all"
File.popen(cmd, "wb") { |io|
io.print msg
}
puts "\n=== Tagging baseline in git"
system("git tag -f #{baseline.name}")
end
$state_version = "20060706"
# Load state. Returns true in case of succes.
def load_state
begin
state = File.open($state, "rb") { |io| Marshal.load(io) }
if state.version != $state_version
die(
"!!! Previous state has different version (#{state.version}) ",
"!!! than this git-ucmimport version (#{$state_version})"
)
else
$int_view_dir = state.int_view_dir
$sub_view_dir = state.sub_view_dir
$sub_root = state.sub_root
return true
end
rescue Errno::ENOENT
end
return false
end
def save_state
File.open($state + ".new", "wb") { |io|
state = OpenStruct.new
state.version = $state_version
state.int_view_dir = $int_view_dir
state.sub_view_dir = $sub_view_dir
state.sub_root = $sub_root
io.write(Marshal.dump(state))
}
# Rename atomically new state into _the_ state
File.rename($state + ".new", $state)
end
######################################## Start of main
# Make sure Cygwin is in front of path. Otherwise we might fail
# if other Unix toolkits (like NutCracker) are simultanously installed.
ENV['PATH'] = "c:\\cygwin\\bin;" + ENV['PATH']
$state = ".git/git-ucmimport.state"
state_loaded = false
state_loaded ||= load_state if File.exists?($state)
puts "ClearCase UCM -> git importer"
puts "(c)2006 R. Nijlunsing <git-ucmimport at tux.tmfweb.nl>"
puts "License: LGPL."
puts
$opts = OptionParser.new
$opts.banner = %q{Usage:
git-ucmimport [dynamic view to import] [dir to export to]
For the initial conversion.
git-ucmimport
From the git repository to incrementally convert.
Example:
git-ucmimport --authors-file c:\\temp\\authors m:\\jparser_int c:\\jparser_git
...to import the stream of view 'jparser_int' which is started at
m:\\jparser_int into newly-generated stream, view and directory at
c:\\jparser_git . c:\\temp\\authors contains the usernames mapping.
c:\\jparser_git will both be a UCM view as well as an git repo, but will
be read-only! So to develop against the converted archive, use 'git clone'.
To convert the view incrementally, go to the root of the git repository
(which contains the '.git' directory), and call 'git-ucmimport' without
any parameters. In this example case that would be from c:\\jparser_git .
For more information regarding the options, read the top of this script.
}
$opts.on("--help", "-h", "This usage") { puts $opts; exit 1 }
$opts.on(
"--authors-file FILE", "-A",
"Add file with mapping from ClearCase user", "to git author"
) { |f| $ucm_authors_file = File.expand_path(f) }
$opts.on("\nOptions for first-time import:")
$opts.on(
"--stgloc LOCATION",
"Server Storage Location for the view",
"storage directory [chosen by UCM]"
) { |s| stgloc = s }
$opts.on(
"--first-baseline BASELINE",
"Start importing at given baseline", "[oldest baseline]"
) { |bl| first_baseline = bl }
$opts.on(
"--root DIRINREPO",
"Make given directory the root of the git", "repository"
) { |sr| $sub_root = sr }
$opts.on("")
begin
$opts.parse!(ARGV)
rescue OptionParser::InvalidOption
die("!!! Invalid option", $opts)
end
if !state_loaded
if ARGV.size != 2
puts $opts
die("!!! Need at least the import and export directory")
end
$int_view_dir, $sub_view_dir = ARGV
end
# Git: Location of git archive
$git_root = $sub_view_dir
$git_root += "/" + $sub_root if $sub_root && $sub_root != ""
$git_dir = "#{$git_root}/.git"
$state = "#{$git_dir}/git-ucmimport.state"
puts "UCM Source: Dynamic view at \"#{$int_view_dir}\""
puts "UCM Mirror repo: Snapshot view at \"#{$sub_view_dir}\""
puts "git repo: At \"#{$git_root}\""
puts "Start baseline: At #{first_baseline}" if first_baseline && first_baseline != ""
puts
# Check for existance of git in $PATH
safe_system("git version")
if !File.directory?($int_view_dir)
die(
"!!! Cannot find \"#{$int_view_dir}\",",
"!!! which should contain integration view directory.",
"!!! Did you start your view?"
)
end
puts "\n=== Retrieving all current baselines to import"
Dir.chdir($int_view_dir)
vob = nil # String: VOB
int_stream = nil # String: name of stream
baselines = {} # From name to Baseline
baseline = nil
#File.open("c:\\temp\\lsbl.memberof", "rb") { |io|
File.popen("cleartool lsbl -member_of", "rb") { |io|
while line = io.gets
line = line.rstrip
if line =~ %r{^([^\s]+)\s+([^\s]+)\s+([^\s]+)\s+\"(.*)\"$}
# New baseline found
date, name, author, comment = $1, $2, $3, $4
$VERBOSE = false # Time.parse warns about year...
baseline = Baseline.new(name, Time.parse(date), author, comment)
$VERBOSE = true
$stderr.print "." if baselines.size % 10 == 0
baselines[name] = baseline
elsif line =~ %r{^ component: ([^ @]+)@([^ ]+)$}
baseline.component = $1
vob = $2
elsif line =~ %r{^ stream: ([^ @]+)@([^ ]+)$}
int_stream = $1
elsif line =~ %r{^ ([^ @]+)@([^ ]+) \(.*\)}
# 'member of:' line
baseline.set_parent(baselines[$1])
end
end
}
$stderr.puts
# We only need to keep the root of all composite baselines sorted on time.
baselines =
baselines.keys.
find_all { |n| baselines[n].parent.nil? }. # Only keep root baselines
collect { |n| baselines[n] }.
sort_by { |b| b.date } # Sort them on date
puts "Found #{baselines.size} (composite) baselines."
components = baselines.collect { |b| b.component }.uniq
if components.size != 1
die(
"!!! Not all baselines belong to the same component!",
"!!! Too afraid to continue.",
"!!! Components found: #{components.inspect}"
)
end
if !first_baseline || first_baseline == ""
first_baseline = baselines[0].name
end
first_baseline_idx = nil
baselines.each_with_index { |b, idx|
first_baseline_idx = idx if b.name == first_baseline
}
if !first_baseline_idx
die("!!! Could not find first_baseline with name \"#{first_baseline}\"")
end
# Create destination stream if needed
if !File.exist?($sub_view_dir)
puts "\n=== Reading all rules loaded in stream to import"
Dir.chdir($int_view_dir)
loaded_rules = read_loaded_rules
puts "Currently loaded rules: #{loaded_rules.inspect}"
sub_stream = File.basename($sub_view_dir)
puts "\n=== Creating read-only stream '#{sub_stream}'"
safe_system(
"cleartool mkstream",
"-in stream:#{int_stream}@#{vob}",
"-readonly",
"-baseline baseline:#{first_baseline}",
"#{sub_stream}@#{vob}"
)
Dir.chdir(File.dirname($sub_view_dir))
puts "\n=== Creating snapshot view at #{$sub_view_dir}"
safe_system(
"cleartool mkview",
"-snapshot -ptime",
"-tag #{sub_stream}",
"-stream #{sub_stream}@#{vob}",
stgloc && stgloc != "", "-stgloc #{stgloc}",
sub_stream
)
$sub_view_dir = Dir.pwd + "/" + sub_stream
puts "\n=== Load rules and update view"
Dir.chdir($sub_view_dir)
loaded_rules.each { |rule|
system_filter_output("cleartool update -overwrite -add_loadrules #{rule[1..-1]}", true)
}
git_init_db
read_write_ucm_authors
git_commit(baselines[first_baseline_idx], "Initial commit")
else
read_write_ucm_authors
puts "\n=== Updating view to be sure snapshot view matches UCM contents"
Dir.chdir($sub_view_dir)
system_filter_output("cleartool update -overwrite", true)
end
git_init_db
puts "\n=== Finding current baseline"
Dir.chdir($sub_view_dir)
cmd = "cleartool lsstream -long -obsolete"
p cmd
lsstream = `#{cmd}`
if lsstream !~ %r{^\s+foundation baselines:\s+([^@]+)@}m
puts lsstream
die("!!! Could not retrieve current baseline")
end
current_baseline = $1
puts "Current baseline: #{current_baseline}"
if current_baseline =~ %r{INITIAL$}
die(
"!!! Current baseline should be newer than INITIAL",
"!!! Please rebase #{$sub_view_dir} to a newer baseline."
)
end
current_baseline_idx = nil
baselines.each_with_index { |b, idx|
current_baseline_idx = idx if b.name == current_baseline
}
if !current_baseline_idx
die(
"!!! Did not find our current baseline in Integration Stream.",
"!!! Make sure that stream of view #{$sub_view_dir} is really",
"!!! a child of stream of view #{$int_view_dir}"
)
end
baselines_to_commit = baselines[current_baseline_idx + 1 .. -1]
if baselines_to_commit.empty?
puts "\nNothing to do, already up-to-date."
exit 0
end
puts "Got #{baselines_to_commit.size} baselines to convert to git commits"
puts "\n=== Abort previous rebase attempt (if any)."
Dir.chdir($sub_view_dir)
system_filter_output("cleartool rebase -cancel -force", true)
baselines_to_commit.each_with_index { |baseline, idx|
# If a lot of loose objects in Git, repack.
# Counts numbers of 'objects' directories, 256 max.
Dir.chdir($git_root)
if Dir[".git/objects/??"].size >= 253
puts "\n=== Repacking."
# To err is non-fatal, so no safe_system is needed.
system("git repack -d")
end
puts
puts "\n=== [#{idx + 1}/#{baselines_to_commit.size}]"
puts "=== Reading effect of rebasing of #{$sub_view_dir} to baseline #{baseline.name}"
Dir.chdir($sub_view_dir)
activities = []
# In the composite baseline, we get little information about the
# activities we bring in. Therefore, ask all contained baselines
# about the activities it contains.
baseline_names = baseline.
children_recursive.collect { |cbl| cbl.name }.join(",")
cmd = "cleartool rebase -preview -baseline #{baseline_names}"
safe_popen(cmd, "rb") { |io|
while line = io.gets
line = line.rstrip
if line =~ %r{^\tactivity:([^\t]*)\t([^\t]*)\t\"(.*)\"$}
id, author, comment = $1, $2, $3
activities << Activity.new(id, author, comment)
end
end
}
activities = activities.uniq
if activities.empty?
puts "? New baseline contains no new activities, skipping"
next
end
puts "Will retrieve activities:"
activities.each { |a|
puts " #{a.author} #{a.id}"
}
system_filter_output("cleartool rebase -qall -baseline #{baseline.name}", false)
subject =
activities.find_all { |a| a.is_activity }.
collect { |a| a.comment }.join(" / ") + "\n\n"
msg = subject +
activities.collect { |a| "#{a.comment}\nActivity: #{a.id}\n\n" }.join("")
git_commit(baseline, msg)
puts "\n=== Completing ClearCase UCM rebase."
system_filter_output("cleartool rebase -complete", true)
}
More information about the git-announce
mailing list