Graph-based Software Production Automation

Introduction

A basic high-level goal of Software Production Automation is to have a single point of control for all the tasks that production requires. Typically, these include the steps to:

  • sync the source code from the repository to the build tree
  • build the product and related subproducts
  • run unit tests in the build environment
  • release the build results
  • install the product
  • create image
  • deliver image online
  • run regression tests in the installed environment
  • rerun failed tests

Other support tasks include:

  • stage the build artifacts so that developers can access them without having to build themselves (especially if the build takes a lot of resources)
  • build debug versions, ready for developers' use when needed
  • build the product for specific metrics such as PureCoverage and Coverity
  • run regression tests for metric builds

Each of these steps may be in a different state of automation. Some are purely manual; others are automated with scripts but assume that their dependencies have been checked manually. Many scripts do not verify their own output, even when they are chained with other tools. The steps may have been developed by different engineers and may be in different states of readiness. Some have been performed by one engineer for so long that they fail when run from another account.

The build steps have to be repeated across the required range of build platforms. Since one build platform, say Windows XP, has to be tested against several test platforms, e.g. Windows 2000, Windows XP and Windows Vista, the test steps scale similarly.
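
The scaling can be made concrete with a small sketch. The platform names come from the example in the text; the dictionary layout and the function name expand_test_runs are assumptions made for illustration, not part of the described system.

```python
# Hypothetical mapping: each build platform lists the platforms it must be tested on.
TEST_PLATFORMS = {
    "Windows XP": ["Windows 2000", "Windows XP", "Windows Vista"],
}


def expand_test_runs(test_platforms):
    """Return one (build platform, test platform) pair per required test run."""
    return [
        (build, test)
        for build, tests in test_platforms.items()
        for test in tests
    ]


runs = expand_test_runs(TEST_PLATFORMS)
for build, test in runs:
    print(f"build on {build} -> test on {test}")
```

Each additional build platform adds one entry to the mapping, and the number of test runs grows with the total number of listed test platforms.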

In theory, the builds and tests on all platforms can run in parallel. In practice, adjustments may have to be made depending on circumstances. Examples include:

  • Some tests require vendor licenses that may be limited.
  • There may be only a limited number of machines (or just one) with an older platform version. Any testing that requires the older platform has to be serialized across those machines.
  • Debug and metrics builds may have to be sequenced after the product builds.
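
One possible way to model the first two constraints is a counted resource pool: tasks run in parallel, but each must hold a unit of a scarce resource while it works. The sketch below is an assumption about how this could look, not the system's actual mechanism; the resource names and pool sizes are hypothetical. Sequencing constraints such as the third example are dependency edges rather than resource limits and are not covered here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical resource pools: two vendor licenses, one legacy test machine.
RESOURCES = {
    "vendor_license": threading.BoundedSemaphore(2),
    "legacy_machine": threading.BoundedSemaphore(1),
}


def run_with_resource(name, resource, work):
    """Run work() only while holding one unit of the named resource."""
    with RESOURCES[resource]:
        return name, work()


def run_all(tasks, max_workers=8):
    """tasks is a list of (name, resource, callable); returns {name: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_with_resource, *task) for task in tasks]
        return dict(future.result() for future in futures)
```

With a pool size of one, tasks that need the legacy machine are effectively serialized even though they are submitted in parallel.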

In summary, the complexity of such a system comes from the need to:

  • integrate existing scripts and tools that implement parts of the process, i.e. reuse what works
  • implement new steps for other parts of the process
  • make the dependencies of each step explicit, i.e. eliminate unnecessary waits and delays
  • maximize the use of parallelism
  • support incremental and parallel development, since each subgraph may be at a different level of readiness
  • reuse subgraphs for different purposes
  • collect runtime information consistently for later process analysis and trending

Features

This showcase paper describes a graph-based software automation system that encapsulates disparate process steps using a consistent "step" methodology. Each "step" allows for pre- and post-work and provides hooks for executing scripts or Perl functions. The step methodology is intended as a guideline: existing scripts that lack post operations, for example, can be integrated first and the post operations added later.
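
A minimal sketch of what such a "step" could look like follows. It is written in Python purely for illustration (the system itself uses Perl), and the class name Step, its attributes, and the True/False return convention are assumptions rather than the system's actual interface.

```python
import subprocess


class Step:
    """One process step: optional pre-check, main action, optional post-check."""

    def __init__(self, name, action, pre=None, post=None):
        self.name = name
        self.action = action  # a callable, or a command given as a list of strings
        self.pre = pre        # callable returning True when the inputs are ready
        self.post = post      # callable returning True when the output is valid

    def run(self):
        """Return True if the step succeeded, False otherwise."""
        if self.pre is not None and not self.pre():
            return False  # prerequisites not met, so the action never runs
        if callable(self.action):
            ok = bool(self.action())
        else:
            ok = subprocess.run(self.action).returncode == 0
        if ok and self.post is not None:
            ok = bool(self.post())  # verify the output instead of trusting the exit code
        return ok
```

Because pre and post are optional, an existing script can be wrapped as a Step with only an action, and a post-check can be attached later without changing how the step is invoked.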

These process steps can be flexibly assembled into subgraphs and graphs, which makes their run dependencies explicit. Process activity can be developed piecemeal as subgraphs and then assembled into a large graph. The final graph then provides the single point of view and control. Other graphs can be tailored for other purposes, e.g. a debug-only build and test.
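
The assembly of subgraphs can be sketched as merging dependency maps and then deriving a valid run order. The subgraph contents and the helper name assemble are hypothetical; the ordering uses Python's standard graphlib module (Python 3.9 or later), which is a stand-in and not the tooling the system uses.

```python
from graphlib import TopologicalSorter

# Hypothetical subgraphs, each mapping a step to the steps it depends on.
SYNC_AND_BUILD = {"sync": set(), "build": {"sync"}, "unit_test": {"build"}}
RELEASE = {"release": {"unit_test"}, "install": {"release"}}
REGRESSION = {"regression_test": {"install"}}


def assemble(*subgraphs):
    """Merge subgraphs into one dependency graph, unioning shared steps."""
    graph = {}
    for sub in subgraphs:
        for step, deps in sub.items():
            graph.setdefault(step, set()).update(deps)
            for dep in deps:
                graph.setdefault(dep, set())  # make sure every prerequisite is a node
    return graph


production = assemble(SYNC_AND_BUILD, RELEASE, REGRESSION)
order = list(TopologicalSorter(production).static_order())
print(order)
```

A tailored graph for another purpose is then just a different call to assemble with a different selection of subgraphs.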

The automation system provides capabilities to run, stop or bypass arbitrary points in any graph or subgraph, and to debug and restart failed steps.
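
The following sketch shows one way these controls could behave. The function name run_graph and its parameters are hypothetical, and marking a bypassed step as Done is an assumption; the actual system may track bypassed steps differently.

```python
def run_graph(order, deps, actions, state=None, bypass=(), stop_after=None):
    """Run steps in dependency order; return the updated {step: state} map.

    order      -- steps listed in a valid dependency order
    deps       -- {step: set of prerequisite steps}
    actions    -- {step: callable returning True on success}
    state      -- states from an earlier run; steps already "Done" are skipped
    bypass     -- steps to mark "Done" without executing them
    stop_after -- last step to consider before halting
    """
    state = dict(state or {})
    for step in order:
        if state.get(step) != "Done":
            if any(state.get(d) != "Done" for d in deps.get(step, ())):
                state[step] = "notStarted"  # blocked by an unfinished prerequisite
            elif step in bypass:
                state[step] = "Done"
            else:
                state[step] = "Done" if actions[step]() else "Failed"
        if step == stop_after:
            break
    return state
```

Restarting after a failure amounts to calling run_graph again with the state returned by the previous run: completed steps are skipped, and only the failed step and the steps blocked behind it are executed.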

Step execution is implemented for rsh and the Sun Grid Engine (SGE). On Unix platforms, both rsh and SGE modes are available; on Windows, only rsh mode is available.
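
As a rough illustration of the two modes, the sketch below only builds the command line that would launch a step; it does not execute anything. The function names, the host name in the usage note, and the choice of qsub options (-b y to submit a command rather than a script file, -sync y to wait for completion, -N to name the job) are assumptions about how a step might be dispatched, not the system's actual invocation.

```python
def available_modes(platform):
    """Return the execution modes the text lists for a platform family."""
    return ["rsh"] if platform == "Windows" else ["rsh", "sge"]


def remote_command(mode, host, argv, job_name="step"):
    """Build the argument list that would launch argv in the given mode."""
    if mode == "rsh":
        return ["rsh", host, *argv]
    if mode == "sge":
        # The host is not passed here: in this sketch the scheduler picks the machine.
        return ["qsub", "-b", "y", "-sync", "y", "-N", job_name, *argv]
    raise ValueError(f"unknown execution mode: {mode}")
```

For example, remote_command("rsh", "buildhost1", ["make", "all"]) yields the list for an rsh invocation on the hypothetical host buildhost1, which could then be handed to a process launcher.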

During execution, the steps are colored to show their state: notStarted, Done, Failed, etc. For more details, see the State Transition Diagram for Graph Nodes. As each step is executed, process metrics such as runtime and disk used are collected in a consistent manner. This allows the process itself to be analyzed and trends to be identified.
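
Consistent metric collection can be sketched as a single wrapper that every step passes through. The function name run_with_metrics and the record fields are hypothetical, and measuring the change in used space on the whole filesystem is a simplifying assumption; other activity on the same filesystem will affect that number.

```python
import shutil
import time


def run_with_metrics(name, action, workdir="."):
    """Run action() and record its runtime and the change in used disk space."""
    used_before = shutil.disk_usage(workdir).used
    started = time.monotonic()
    ok = bool(action())
    return {
        "step": name,
        "state": "Done" if ok else "Failed",
        "runtime_s": time.monotonic() - started,
        "disk_used_bytes": shutil.disk_usage(workdir).used - used_before,
    }
```

Because every step produces a record with the same fields, records from many runs can be appended to one log and compared over time.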

Thus, the graph not only visualizes the process, its dependencies and its run status, but also serves as the input to the automation system itself.

Illustrative Graph

Below is a sanitized, yet illustrative sample Production graph. Each of the subgraphs such as "Sync and Build" was developed and tested individually and then assembled.


State Transition Diagram for Graph Nodes

This diagram shows the state transitions for a "step" in the graph. Overlaid on the diagram are the colors used to show each state, for easy visual feedback.
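
A state transition diagram of this kind can be encoded as a lookup table. In the sketch below, only the state names notStarted, Done and Failed come from the text; the Running state, the event names, the specific transitions, and the colors are all placeholders assumed for illustration and may differ from the actual diagram.

```python
# Assumed transitions: (current state, event) -> next state.
TRANSITIONS = {
    ("notStarted", "start"): "Running",
    ("Running", "succeed"): "Done",
    ("Running", "fail"): "Failed",
    ("Failed", "restart"): "Running",
}

# Placeholder colors only; the real palette is defined by the diagram.
COLORS = {
    "notStarted": "white",
    "Running": "yellow",
    "Done": "green",
    "Failed": "red",
}


def advance(state, event):
    """Return the next state, or raise ValueError for a disallowed transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None
```

Keeping the transitions in one table means the same data can drive both the execution logic and the coloring of nodes in the displayed graph.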

Supporting Practices

A fully working Software Production system cannot rely on technology alone. Supporting practices are needed to keep the inputs and environments of the automated system consistent and reliable. These practices include:

  • single point of configuration
  • reliable way to manage licenses needed for testing and metrics
  • consistent build methodology, regardless of whether operating from within the automated system or outside it
  • consistent way to document and use environment variables

Implementation

The open-source Graphviz project provides a standard format for describing graphs, and some basic tools to view and manipulate them. For very large graphs, more advanced viewers such as ZGRViewer are also available.

The process graphs are described in the Graphviz .dot format.
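
To show what a process graph looks like in that format, the sketch below renders a dependency map as DOT text. The function name to_dot, the step names, and the convention that an edge points from a prerequisite to the step that depends on it are assumptions; the actual graphs may carry additional attributes.

```python
def to_dot(name, deps):
    """Render {step: prerequisites} as Graphviz DOT text, one edge per dependency."""
    lines = [f"digraph {name} {{"]
    for step in sorted(deps):
        lines.append(f'    "{step}";')
        for dep in sorted(deps[step]):
            lines.append(f'    "{dep}" -> "{step}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot("production", {"sync": set(), "build": {"sync"}, "unit_test": {"build"}}))
```

The resulting text can be saved to a .dot file and opened with the Graphviz tools or a viewer such as ZGRViewer.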

The software automation system is implemented in Perl. The .dot graphs are manipulated with the Perl Graph package.

The single point of configuration is enabled using the Perl Config::Scoped package. The format of the configuration files is simple enough for non-Perl tools, yet flexible enough for internal Perl usage.