
Bracket internals

This document is intended for those who plan to work on the bracket
code base itself rather than just running it, or are otherwise curious
about how it works internally.


Overview

Here's an outline of what happens when bracket's "notify" command is
run from cron:

 - The trampoline script "cronjob.sh" sets up logging and runs
   the Python script "bracket" with the argument "notify".

 - The "bracket" script recognizes the "notify" argument, loads
   the Python module "notify.py", and invokes its main function.

 - The local copy of the CVS repository is updated using rsync.  This
   is done in the shell script vc/xcvs/update-repo.sh, called via
   update_repository() in bracket.py and update_repo_module() in
   vc/xcvs/__init__.py.  A log of what was updated by rsync is created
   in $db_dir/rsync.log; this will be used for incremental indexing
   (see below).

 - The repository is indexed.  This has two steps:

     - The repository is scanned using the C++ program "rcsdates" to
       generate a complete list of every change ever made to the branch
       of interest, and the list is sorted in chronological order.  This
       is all orchestrated by index-repo.sh, and the sorted list of
       commits is stored in $db_dir/commits.

       To speed this up, index-repo.sh may do it incrementally, by scanning
       only those files in the repository that were touched by the latest
       rsync run.  This makes use of the new_lines.py script, rsync.log,
       and a file $repo_root/rsync.log.offset.$branch to keep track of
       which parts of rsync.log have already been processed for the
       current branch.

     - The changes are grouped into "commit clusters" by cluster.py.
       Two adjacent changes are considered part of the same cluster if
       they were made less than 15 seconds apart and by the same
       committer.  The full list of clusters is stored in $db_dir/dates.hex,
       as hexadecimal timestamps corresponding to the time of the last
       commit in each cluster.

 - A NetBSD source tree is checked out from the local repository
   mirror.

 - The patches/ subdirectory is searched for patches that should be
   applied to this particular source date, and any such patches
   are applied.  See the "Patches" section in README for details.

 - A release is built using build.sh.

 - If the build is successful, the release is installed, booted,
   and tested using anita.

 - notify.py runs report.py to generate HTML reports and uses rsync
   to copy them to the web server.

 - If notify.py detects a new build break, it sends out an email
   notification.
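cluster.py itself is not reproduced here.  The following is a minimal
sketch of the clustering rule described above, assuming the input is
the sorted change list reduced to (timestamp, committer) pairs, and
assuming dates.hex holds one lowercase hex timestamp per line.  The
function names are hypothetical.

```python
from typing import List, Tuple

# Rule from the description: adjacent changes made less than 15 seconds
# apart by the same committer belong to the same cluster.
CLUSTER_WINDOW = 15

def cluster_changes(changes: List[Tuple[int, str]]) -> List[int]:
    """Group chronologically sorted (timestamp, committer) pairs into
    clusters and return the timestamp of the last change in each one."""
    cluster_ends: List[int] = []
    prev_ts = None
    prev_committer = None
    for ts, committer in changes:
        same_cluster = (prev_ts is not None
                        and ts - prev_ts < CLUSTER_WINDOW
                        and committer == prev_committer)
        if same_cluster:
            # Extend the current cluster: its end moves forward.
            cluster_ends[-1] = ts
        else:
            cluster_ends.append(ts)
        prev_ts, prev_committer = ts, committer
    return cluster_ends

def to_hex_lines(cluster_ends: List[int]) -> str:
    # One hex timestamp per line (assumed layout of dates.hex).
    return "".join("%x\n" % ts for ts in cluster_ends)
```

Because each change is compared with the one immediately before it, a
long series of quick commits by one committer chains into a single
cluster even if its first and last changes are more than 15 seconds
apart.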


Variable naming conventions

To understand the Python code, it is helpful to be aware of the
following naming conventions for variables:

  "t" or "ts"
      A timestamp, as an integer number of seconds since the epoch.

  "tsp"
      A timestamp pair, used to represent the time period between
      two commits.

  "c" or "cno"
      A commit cluster number.  This is an integer identifying a
      commit, or a cluster of commits that occurred within a short
      period of time and are treated as a single commit by bracket.
      It is 0 for the first commit (cluster) ever made, and increases
      by one for each subsequent commit (cluster).

      Note that commit cluster numbers are only meaningful within
      bracket itself and in the context of a particular version
      of the repository and a particular clustering policy.  That is,
      any given commit may have different cluster numbers at different
      points in time or in different installations of bracket.

  "rcsdate"
      An RCS-format date string, like "2010.09.09.10.00.00".
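To make the conventions concrete, here is a sketch of conversions
between them.  It assumes RCS dates are in UTC and that the cluster
timestamps are available as a sorted list of integers (like
bracket.dates); the helper names are hypothetical, not bracket's own.

```python
import bisect
import calendar
import time
from typing import List

RCS_FORMAT = "%Y.%m.%d.%H.%M.%S"

def ts_to_rcsdate(ts: int) -> str:
    # Timestamp -> RCS date string, interpreting the time as UTC.
    return time.strftime(RCS_FORMAT, time.gmtime(ts))

def rcsdate_to_ts(rcsdate: str) -> int:
    # RCS date string -> timestamp; timegm() treats the fields as UTC.
    return calendar.timegm(time.strptime(rcsdate, RCS_FORMAT))

def ts_to_cno(dates: List[int], ts: int) -> int:
    """Return the cluster number whose timestamp is exactly ts, given
    the chronologically sorted list of cluster timestamps."""
    cno = bisect.bisect_left(dates, ts)
    if cno == len(dates) or dates[cno] != ts:
        raise ValueError("no commit cluster at timestamp %d" % ts)
    return cno
```

Since a cluster number is just an index into the list of cluster
timestamps, it changes whenever the list does, which is why cluster
numbers are only meaningful for one version of the repository and one
clustering policy.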


Locking

Multiple NetBSD versions can be tested simultaneously, and/or tested
in parallel with a build, to take advantage of multiprocessor hosts in
spite of the single-threaded nature of qemu.  The build phase, in
contrast, is deliberately limited to one build at a time, because even
a single build can use multiple CPUs, and running multiple builds in
parallel could easily cause excessive load.

To keep the simultaneously running bracket processes from stepping on
one another's toes, they coordinate using a number of lock files.  The
lock files contain no data, but are locked using fcntl() by any
process that needs to access shared data.
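A minimal sketch of this pattern, assuming one empty lock file per
shared resource.  The helper name locked() is hypothetical; Python's
fcntl.lockf() is a wrapper around the fcntl() record-locking calls.

```python
import contextlib
import fcntl
import os
from typing import Iterator

@contextlib.contextmanager
def locked(lock_path: str) -> Iterator[None]:
    """Hold an exclusive fcntl() lock on lock_path for the duration of
    the with-block.  The lock file is created if needed and stays empty."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until no other process holds a conflicting lock.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        # Closing the descriptor would also release the lock.
        os.close(fd)
```

A caller would wrap each access to shared data, for example
`with locked(os.path.join(db_dir, "build.lock")): ...`, where the lock
file name is illustrative only.  fcntl() locks are released
automatically if the process dies, so a crashed process cannot leave a
stale lock behind.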


The version control API

To add support for a new version control system "foo", create a Python
module vc/foo/__init__.py defining the following functions:
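As a starting point, a stub module with the signatures listed here
might look like the following.  Only the names and parameters come
from this API description; every body is a placeholder, not code from
an existing backend.

```python
# vc/foo/__init__.py: hypothetical skeleton for a new backend "foo".

def default_branch():
    return "HEAD"  # e.g. "master" for a git-like system

def setup():
    pass  # one-time initialization; may stay empty

def update_repo_module(module):
    raise NotImplementedError("fetch %s from upstream" % module)

def index_repo():
    pass  # optional postprocessing after an update

def read_dates():
    # Should assign the chronologically sorted cluster timestamps
    # to bracket.dates.
    raise NotImplementedError

def checkout(branch, module, ts, builddir, logfd):
    raise NotImplementedError

def last_safe_commit_ts():
    raise NotImplementedError

def get_commits(ts0, ts1):
    raise NotImplementedError

def format_commit_html(c):
    raise NotImplementedError

def format_commit_email(c):
    raise NotImplementedError
```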

default_branch()

Return the name of the default branch, such as "HEAD" or "master".

setup()

Perform any one-time initialization needed before running bracket for
the first time, such as creating a directory for the repository.
Can be empty.

update_repo_module(module)

Update the local repository of the given module from the upstream
repository.

index_repo()

If additional indexing or other postprocessing of the repository is
needed after it has been updated from upstream, do it here.  Can be
empty.

read_dates()

Extract timestamps for all the commit clusters on the branch of
interest from the repository, in chronological order, and assign the
resulting list to the global variable bracket.dates.
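For a backend that keeps its cluster list in a file like
$db_dir/dates.hex, read_dates() mostly has to parse that file.  The
sketch below assumes one hexadecimal timestamp per line; the helper
name parse_dates_hex() is hypothetical.

```python
from typing import Iterable, List

def parse_dates_hex(lines: Iterable[str]) -> List[int]:
    """Turn the lines of a dates.hex-style file into a list of integer
    timestamps, checking that they are in chronological order."""
    # int() ignores surrounding whitespace, including the newline.
    dates = [int(line, 16) for line in lines if line.strip()]
    for earlier, later in zip(dates, dates[1:]):
        if later < earlier:
            raise ValueError("timestamps out of chronological order")
    return dates

# read_dates() could then do something like:
#   with open(os.path.join(db_dir, "dates.hex")) as f:
#       bracket.dates = parse_dates_hex(f)
```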

checkout(branch, module, ts, builddir, logfd)

Check out "branch" of "module" for timestamp "ts" into "builddir",
logging to "logfd".

last_safe_commit_ts()

Return the timestamp of the newest commit cluster that is safe to use.
For any reasonable version control system, this will be the newest
commit on the current branch, but in a version control system like CVS
where it is possible to accidentally check out only part of a commit
that happens concurrently with the checkout, it may be an older
commit.
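One possible policy for a CVS-like system is to ignore clusters that
are too close to the moment the local mirror was last updated, since a
commit in progress at that moment may be only partly mirrored.  The
sketch below is purely illustrative: the helper newest_safe_ts(), its
snapshot_ts parameter, and the 60-second margin are all assumptions,
not bracket's actual policy.

```python
import bisect
from typing import List, Optional

def newest_safe_ts(dates: List[int], snapshot_ts: int,
                   margin: int = 60) -> Optional[int]:
    """Return the newest cluster timestamp that is at least `margin`
    seconds older than snapshot_ts, or None if there is none.
    `dates` must be sorted chronologically."""
    cutoff = snapshot_ts - margin
    # Number of clusters with timestamp <= cutoff.
    i = bisect.bisect_right(dates, cutoff)
    return dates[i - 1] if i > 0 else None
```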

get_commits(ts0, ts1)

Get a list of commits (in the CVS sense of individual file revisions,
not commit clusters) between the timestamps ts0 and ts1, which are
timestamps of commit clusters.

The returned set of commits excludes ts0 but includes ts1.  This is the
opposite of the half-open [start, end) ranges usual in C, but it makes
sense, in a way, when the commit cluster timestamps are regarded as
specifying points in time *between* commits rather than *at* commits.

Each commit is represented as an object with the following attributes:

  timestamp
  committer
  file
  revision
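The interval rule can be sketched as a filter over individual commits.
Because a cluster's timestamp is that of its last commit, selecting
ts0 < timestamp <= ts1 yields exactly the commits of the clusters after
ts0 up to and including ts1.  The NamedTuple and the helper
commits_between() are assumptions for illustration; a backend may
define its Commit objects differently.

```python
from typing import Iterable, List, NamedTuple

class Commit(NamedTuple):
    timestamp: int  # seconds since the epoch
    committer: str
    file: str
    revision: str

def commits_between(commits: Iterable[Commit],
                    ts0: int, ts1: int) -> List[Commit]:
    """Select the commits with ts0 < timestamp <= ts1."""
    return [c for c in commits if ts0 < c.timestamp <= ts1]
```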

format_commit_html(c)

Return HTML describing the commit c (a Commit object).  This typically
includes a link to the commit in a public web site such as
cvsweb.netbsd.org.
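A sketch of such a formatter follows.  The base URL and the "?rev="
query are placeholders, since the real link format depends on the web
site; the point is that the file name is percent-encoded for the URL
and every field is HTML-escaped.

```python
import html
import urllib.parse
from typing import NamedTuple

class Commit(NamedTuple):
    timestamp: int
    committer: str
    file: str
    revision: str

# Hypothetical base URL; substitute the real repository browser.
CVSWEB_BASE = "https://cvsweb.example.org/"

def format_commit_html(c: Commit) -> str:
    # quote() keeps "/" but encodes characters such as "&" or spaces.
    url = CVSWEB_BASE + urllib.parse.quote(c.file) + "?rev=" + c.revision
    return '%s <a href="%s">%s %s</a>' % (
        html.escape(c.committer),
        html.escape(url, quote=True),
        html.escape(c.file),
        html.escape(c.revision))
```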

format_commit_email(c)

Return plain ASCII text describing the commit c (a Commit object),
for use in notification emails.
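A minimal sketch, assuming one line per commit showing the UTC time,
committer, file, and revision; the layout is an illustrative choice.
Anything outside ASCII is replaced with "?" so the result is safe for
plain-text mail.

```python
import time
from typing import NamedTuple

class Commit(NamedTuple):
    timestamp: int
    committer: str
    file: str
    revision: str

def format_commit_email(c: Commit) -> str:
    """One plain-text line per commit: UTC time, committer, file, revision."""
    when = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(c.timestamp))
    line = "%s UTC  %s  %s %s" % (when, c.committer, c.file, c.revision)
    # Guarantee pure ASCII output.
    return line.encode("ascii", "replace").decode("ascii")
```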
