[ANNOUNCE] GitStats development finished (WRT GSoC)
Sverre Rabbelier
alturin at gmail.com
Wed Aug 31 22:00:18 BST 2011
Heya,
I leave for the USA this Friday, and because of US tax laws I would
have to pay an insane amount of tax for just 2 weeks of development. I
agreed with my mentor a while ago that I would finish up GSoC
before I left, which avoids having to pay US taxes. (Over here in
the Netherlands I don't have to pay tax on the $4500 because I
didn't make >6000 euro this year, the joys of being a student.) As a
result it is quite clear what GitStats will look like at the end of my
GSoC. I am going to continue working on it though; I am especially
interested in getting the '--follow' part of 'git log' working in such
a way that it can be incorporated into GitStats. As such, here is a
summary of what GitStats is at the moment. From the documentation:
$ cat gitstats-*
syntax: stats.py author <options>
The purpose of the author module is to gather statistics
about authors, and to aggregate this information to provide
information about all the authors of a repo.
Currently the available metrics in the author module are
the following:
* Determine how many changes an author made (making a
distinction between lines added and lines deleted), and
record this per author, for all files in the repository.
It is possible to get this data for one specific author
(although data for all authors will still be gathered),
because the result is stored per author.
* Determine how many commits an author made that affected
a specific file. This metric is less granular, but a lot
faster. To retrieve the more granular result, one could
simply iterate over the result of the above metric for
each author, and take only the data for the file one is
interested in.
* Aggregate the information from the first metric, adding
up the statistics from each author. This provides a more
general 'file activity' metric and is a nice example of
how an existing metric can be modified to do something
seemingly unrelated.
This module does not define any auxiliary functions.
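As a rough illustration of the first and third metrics (not the actual GitStats code), the sketch below tallies lines added and deleted per author and per file, then sums those numbers into a per-file activity count. It assumes input shaped like the output of 'git log --numstat --pretty=format:author:%an'; the function names 'parse_numstat' and 'file_activity' are made up for this example.

```python
from collections import defaultdict


def parse_numstat(log_text):
    """Collect {author: {path: [added, deleted]}} from numstat-style log text.

    Assumed input: an "author:<name>" line per commit, followed by
    "<added>\t<deleted>\t<path>" lines for the files it changed.
    """
    stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    author = None
    for line in log_text.splitlines():
        if line.startswith("author:"):
            author = line[len("author:"):].strip()
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank separator line or something unexpected
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        stats[author][path][0] += int(added)
        stats[author][path][1] += int(deleted)
    return stats


def file_activity(stats):
    """Sum the per-author numbers into one [added, deleted] pair per file."""
    activity = defaultdict(lambda: [0, 0])
    for per_file in stats.values():
        for path, (added, deleted) in per_file.items():
            activity[path][0] += added
            activity[path][1] += deleted
    return activity
```

Because the result is keyed by author first, looking up a single author is just a dictionary access on the full result, which matches the note above that data for all authors is gathered either way.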
syntax: stats.py branch <options>
The purpose of the branch module is to gather statistics
about branches, or related to branches.
Currently the available metrics in the branch module are
the following:
* Which of the branch heads a commit belongs to.
What this metric does is walk down the ancestry of the
target commit and increase the 'dilution' by one for each
merge it finds. The exception is that when following
the 'primary' parent of the merge (i.e., the branch
recorded as the one the merge commit was made on), the
dilution is not increased. As such, if the target commit
is made on a branch, and its dilution is calculated
later on, it will have 0 dilution for that branch.
The branch with the lowest dilution is deemed to be the
branch that the commit belongs to most.
Note: In git there is no 'main' branch; as such, any and
all branches that 'branched off' after the
target commit will also have dilution 0.
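One way to read the dilution metric is as a shortest-path search from each branch head back to the target commit, where following the first parent of a merge costs nothing and following any other parent costs one. The sketch below is that interpretation only, not the GitStats implementation; the commit graph is a plain dictionary mapping each commit to its list of parents (first parent first), and the names 'dilution' and 'closest_branch' are hypothetical.

```python
from collections import deque


def dilution(parents, head, target):
    """Fewest non-primary merge edges on any path from head back to target.

    Returns None when target is not an ancestor of head.
    """
    best = {head: 0}
    queue = deque([head])
    while queue:
        commit = queue.popleft()
        cost = best[commit]
        if commit == target:
            return cost
        for index, parent in enumerate(parents.get(commit, [])):
            step = 0 if index == 0 else 1  # only non-primary parents dilute
            new_cost = cost + step
            if parent not in best or new_cost < best[parent]:
                best[parent] = new_cost
                # 0-1 BFS: free edges go to the front, costly ones to the back
                if step == 0:
                    queue.appendleft(parent)
                else:
                    queue.append(parent)
    return None


def closest_branch(parents, heads, target):
    """Name of the branch head with the lowest dilution for target."""
    scores = {}
    for name, head in heads.items():
        score = dilution(parents, head, target)
        if score is not None:
            scores[name] = score
    if not scores:
        return None
    return min(scores, key=scores.get)
```

With a commit D made on a topic branch and later merged into master as the second parent of merge M, D gets dilution 0 for the topic branch and 1 for master, so the topic branch wins.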
It also defines the following auxiliary functions:
* Retrieve the name of a commit if it is the head of a
branch. This can be seen as the reverse of 'rev-parse'
for branch heads. This is used internally to provide the
user with a sensible name when telling them which branch
a commit belongs to.
* List all the branches that contain a specific commit,
optionally searching through remote branches as well as
optionally not filtering at all. This is used internally
to not search through branches that do not contain the
target commit.
syntax: stats.py bug <options>
The purpose of the bug module is to gather statistics on
bugfixes within the content, and to aggregate this
information to provide a report of the last N commits.
Currently the available metrics in the bug module are the
following:
* Determine whether a specific commit is a bugfix based on
other metrics. When one of the metrics is 'positive',
that is, its return value indicates that the examined
commit is a bugfix, the 'bugfix rating' is increased by
a pre-configured amount. This amount can be specified per
metric, and can be set to '0' to ignore it.
* Aggregate the above metric over the past N commits. Also,
when running the above metric on more than one commit,
cache the result of calls to the git binary so that the
execution time is reduced. This means that the execution
time is not directly proportional to the size of the
repository. (Instead, there is a fixed 'start up' cost,
after which there is a 'per commit' cost, which is
relatively low.)
This module does not define any auxiliary functions.
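The weighted 'bugfix rating' can be pictured roughly as follows. This is a sketch under assumptions, not the GitStats code: the two metrics ('mentions_fix' and 'is_small'), the commit layout (a dictionary with "message" and "lines_changed" keys) and the weights are all invented for illustration.

```python
import re


def mentions_fix(commit):
    """Hypothetical metric: the commit message talks about a fix or a bug."""
    return re.search(r"\b(fix|bug)", commit["message"], re.IGNORECASE) is not None


def is_small(commit):
    """Hypothetical metric: the commit changes only a few lines."""
    return commit["lines_changed"] <= 10


def bugfix_rating(commit, metrics, weights):
    """Add up the weight of every metric that flags the commit as a bugfix."""
    rating = 0
    for name, is_positive in metrics.items():
        weight = weights.get(name, 0)
        if weight and is_positive(commit):  # a weight of 0 ignores the metric
            rating += weight
    return rating


def rate_commits(commits, metrics, weights):
    """Apply the rating to a list of commits, e.g. the last N."""
    return [bugfix_rating(commit, metrics, weights) for commit in commits]
```

Setting a weight to 0 skips the metric entirely, so an expensive metric that is switched off costs nothing.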
syntax: stats.py commit <options>
The purpose of the commit module is to gather statistics
about commits, or related to commits, with the exception
of things related to diffs (only retrieving the raw diff is
in this module).
Currently the available metrics in the commit module are
the following:
* Find all the commits that touched the same paths as the
specified commit. This is implemented by passing the
result of the 'paths touched' auxiliary function to the
'commits that touched' auxiliary function. See below.
* Retrieve the diff of a commit, either with or without
context, and optionally ignoring whitespace changes. This
method also works for the root commit (by making use of
the '--root' option to 'git diff-tree').
* Show only commits of which the commit message, and/or the
commit diff, match a specific regexp. This is a simple
reimplementation of 'git log -S' and 'git log --grep'. It
is preferable to use the options native to 'git log'
rather than this slower version.
Note: the regexps for the 'commit message' and the
'commit diff' may be different.
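The regexp filter can be sketched as below, with a separate pattern for the message and for the diff. The commit layout (a dictionary with "sha", "message" and "diff" keys) and the function name are assumptions for this example, as is the choice that both patterns must match when both are given.

```python
import re


def matching_commits(commits, message_re=None, diff_re=None):
    """Yield commits whose message and diff match the given patterns.

    A pattern left as None is not checked at all.
    """
    message_pat = re.compile(message_re) if message_re else None
    diff_pat = re.compile(diff_re) if diff_re else None
    for commit in commits:
        if message_pat and not message_pat.search(commit["message"]):
            continue
        if diff_pat and not diff_pat.search(commit["diff"]):
            continue
        yield commit
```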
It also defines the following auxiliary functions:
* Print a commit in a 'readable' way; by default this is
'git log -1 --name-only'. By way of an environment
variable ('GIT_STATS_PRETTY_PRINT'), it can be made to
instead use 'git log -1 --pretty=oneline' by setting the
variable to 'oneline'.
* Retrieve all the paths that a specific commit touches,
this is used internally to limit the commits that have
to be searched when looking for a merge. That is done by
passing the output of this method to the one described
below.
* Find all commits that touched a list of files, optionally
treating the paths as relative to the current working
directory.
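The 'commits that touched the same paths' metric is a composition of the last two auxiliary functions. The sketch below shows that composition on an in-memory mapping of commit to touched paths, which stands in for what would really come from git; the function names are hypothetical. Note that the specified commit itself is part of the result here, since it trivially touches its own paths.

```python
def paths_touched(touched, commit):
    """Set of paths that the given commit touches."""
    return set(touched.get(commit, ()))


def commits_that_touched(touched, paths):
    """All commits that touch at least one of the given paths."""
    wanted = set(paths)
    return [commit for commit, files in touched.items() if wanted & set(files)]


def commits_touching_same_paths(touched, commit):
    """Feed the output of the first helper into the second."""
    return commits_that_touched(touched, paths_touched(touched, commit))
```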
syntax: stats.py diff <options>
The purpose of the diff module is to gather statistics
about diffs, or related to diffs.
Currently the available metrics in the diff module are the
following:
* Determine whether two commit diffs are equal, optionally
checking whether they are reverts instead. It is also
possible to just look at what lines were changed (and
ignore the actual changes).
* Find all commits that are reverted by the specified
commit by first retrieving the touched files, and then
examining all the commits that touch the same files.
It also defines the following auxiliary functions:
* Parse a raw commit diff and store it on a hunk-by-hunk
basis so that later on it can be examined more carefully
by other tools. Line numbers are optionally included so
that one can use those. For example, by comparing all the
added hunks with the deleted hunks of a second commit,
and vice versa, one can check for (partial) reverts.
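To make the hunk-by-hunk idea concrete, here is a minimal sketch that splits a unified diff into hunks of added and deleted lines and then checks whether a second diff undoes the first. It only handles full reverts with hunks in the same order and ignores line numbers; the function names are made up and this is not how GitStats itself is written.

```python
def parse_hunks(diff_text):
    """Split a unified diff into hunks of added and deleted lines."""
    hunks = []
    current = None
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            current = None  # file header: wait for the next "@@" line
        elif line.startswith("@@"):
            current = {"added": [], "deleted": []}
            hunks.append(current)
        elif current is not None:
            if line.startswith("+"):
                current["added"].append(line[1:])
            elif line.startswith("-"):
                current["deleted"].append(line[1:])
    return hunks


def is_revert(diff_a, diff_b):
    """True when every hunk of diff_b undoes the matching hunk of diff_a."""
    hunks_a, hunks_b = parse_hunks(diff_a), parse_hunks(diff_b)
    if not hunks_a or len(hunks_a) != len(hunks_b):
        return False
    return all(
        a["added"] == b["deleted"] and a["deleted"] == b["added"]
        for a, b in zip(hunks_a, hunks_b)
    )
```

Checking for partial reverts would mean comparing individual hunks across the two diffs rather than requiring all of them to line up.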
syntax: stats.py index <options>
The purpose of the index module is to gather statistics
about the index, or related to the index.
Currently the available metrics in the index module are the
following:
* List all the commits that touch the same files as the
staged files. This can be useful to find out which commit
introduced the bug fixed in this commit (by piping the
output of this method to one that looks at which lines
were touched for example).
It also defines the following auxiliary functions:
* Get all the staged changes (optionally ignoring newly
added files). This is used internally to find all the
commits that touched the same files as those that are
currently staged.
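For illustration, the staged paths could be pulled out of the output of 'git diff --cached --name-status', which prints a status letter and a path per line. The sketch below only parses such text and is an assumption about one possible approach, not the GitStats code; 'staged_paths' is a hypothetical name.

```python
def staged_paths(name_status, ignore_added=False):
    """Paths from 'git diff --cached --name-status' style output.

    Each line is "<status>\t<path>"; renames carry the old and the new
    path, in which case the new path is used.
    """
    paths = []
    for line in name_status.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        status, path = fields[0], fields[-1]
        if ignore_added and status.startswith("A"):
            continue  # newly added files have no earlier commits to find
        paths.append(path)
    return paths
```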
syntax: stats.py matcher <options>
The purpose of the matcher module is to compare hunks
within one diff to one another, and determine whether there
is any code being moved around.
Currently the available metrics in the matcher module are the
following:
* Try to find a match between the hunks in one diff, so
that code moves can be detected. This makes use of the
'diff size calculation' described below.
It also defines the following auxiliary functions:
* Calculate the size of a diff, only counting the number
of lines added and the number of lines deleted. This can
be used to determine a best 'interdiff' (the shortest one
is the best one), when searching for two hunks that are
moved around.
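The 'shortest interdiff wins' idea can be sketched with Python's difflib: the size of the diff between a deleted hunk and each added hunk is computed, and the added hunk with the smallest size is taken as the most likely destination of the move. This is a stand-in for whatever GitStats does internally, and both function names are hypothetical.

```python
from difflib import SequenceMatcher


def diff_size(old_lines, new_lines):
    """Number of lines deleted plus number of lines added between two hunks."""
    size = 0
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            size += (i2 - i1) + (j2 - j1)
    return size


def best_match(deleted_hunk, added_hunks):
    """Return (index, size) of the added hunk closest to deleted_hunk, or None."""
    best = None
    for index, candidate in enumerate(added_hunks):
        size = diff_size(deleted_hunk, candidate)
        if best is None or size < best[1]:
            best = (index, size)
    return best
```

A size of 0 means the hunk was moved verbatim; a small non-zero size suggests it was moved and edited slightly along the way.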
syntax: stats.py <subcommand> <arguments>
Available commands:
  author   Activity for one author, file, or project
  branch   To what extent a commit belongs to a branch
  bug      Determine whether a commit is a bugfix
  commit   Basic functionality already present in git
  diff     Compare two diffs and find reverts
  index    Find which commits touched the staged files
  matcher  Try to match hunks in a diff to find moves
  test     Run the unittests for GitStats
The stats.py module is the main entry point of GitStats,
it dispatches to the commands listed above. When no
arguments are passed, it automatically runs the command
with '--help' so that a usage message is shown for that
command.
Each of the modules it uses as subcommands defines a
'dispatch' function that is called with the user's arguments
(with the exception of the first, which is the name of the
command executed). If anything should be returned to the
system, the dispatch method should return this value.
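The dispatching described above might look something like the following sketch. It is a guess at the general shape rather than the contents of stats.py: the 'run' function and the 'commands' mapping (subcommand name to that module's dispatch function) are assumptions, and '--help' is added when a subcommand is given without further arguments.

```python
def run(argv, commands):
    """Dispatch argv (without the program name) to a subcommand.

    `commands` maps each subcommand name to its module's dispatch function.
    """
    if not argv or argv[0] not in commands:
        print("syntax: stats.py <subcommand> <arguments>")
        print("Available commands: " + ", ".join(sorted(commands)))
        return 1
    name, args = argv[0], argv[1:]
    if not args:
        args = ["--help"]  # no arguments: show the usage message instead
    # Whatever the module's dispatch function returns goes back to the caller.
    return commands[name](args)
```

An entry point could then end with 'sys.exit(run(sys.argv[1:], commands))' so that the value returned by the module reaches the system.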
To run properly it requires the git_stats package to be
a subdirectory of the directory it resides in. That is,
your directory tree should be something like this:
.
|-- git_stats
|   `-- <listing of all installed modules>
|-- scripts
|   `-- <listing of all installed scripts>
|-- t
|   `-- <listing of all installed regression tests>
`-- stats.py
syntax: stats.py tests <options>
The purpose of the tests module is to make available the
unittests for GitStats to external programs. It may be
used to run the unittests for GitStats from, for example,
a shell script. Its output is made to match the git
regression test suite.
--
Cheers,
Sverre Rabbelier