Testing systems with large and complex test spaces

January 14th, 2008 | Published in Google Testing

Posted by Adam Porter, Professor, Department of Computer Science, University of Maryland, and Associate Director, University of Maryland Institute for Advanced Computing Studies

[From time to time, we invite software testing experts to write about their ideas and research. - Ed.]

Motivation:

Software systems are getting larger, more complex and more configurable all the time. While beneficial in many ways, these changes dramatically increase testing obligations.

For example, today's systems are increasingly built to run on multiple OS, compiler and library platforms. They are composed from multiple components, each with multiple versions. They are configured by manipulating numerous compile- and run-time options. Additionally, distributed applications add an allocation dimension in which runtime topologies can vary widely.

This situation is further complicated by agile and flexible development practices, in which systems evolve incrementally over short but varying update cycles and development teams may be geographically distributed.

Basically, each new configuration dimension increases the number of potential runtime configurations combinatorially. Since each of these configurations might behave differently or host different bugs, each one must, at least in theory, be tested. Furthermore, this increased amount of testing must be done in shorter and shorter time frames because the systems themselves are changing faster than ever.

The Skoll project:

Our research aims to create better, faster and more powerful techniques for testing these kinds of systems. Our vision is to redesign traditional testing processes so that they can be executed around-the-world, around-the-clock. These processes are logically divided into multiple tasks that are distributed intelligently to client machines around the world and then executed by them. The results from these distributed tasks are returned to central collection sites where they are merged and analyzed to complete the overall QA process.

At the second Google Test Automation Conference, my colleague Atif Memon and I presented an infrastructure and approach called Skoll that was created to support this vision. The video is embedded below.

Skoll in action:

To give you a better sense of how this works, we're going to walk you through how we set up a Skoll server to run a simplified version of the continuous build, test and integration process we've developed to run the MySQL Build Farm Initiative. This process runs on computing resources that are temporarily volunteered by members of the MySQL developer and user communities. After reading the rest of this post, we hope you will be ready and willing to volunteer as well!

MySQL, as you probably know, is a very popular open source database comprising over 2 million lines of code. It is developed by a worldwide community, runs on many different platforms, has over 150 configuration options, and allows substantial static and runtime end-user customization. For instance, you can select different database front- and back-ends, and you can run the system with different runtime topologies to support database replication. In short, it's exactly the kind of system we had in mind when we created Skoll.

We first started speaking with MySQL developers after the first Google Test Automation Conference. They were interested in running a continuous build, integration and test (CBIT) process. They had several goals, such as: testing a broader variety of configurations, not just a handful of popular ones; giving developers greater visibility into the quality and stability of the system; improving bug fix turnaround time; and managing test data to enable long term statistical analysis and improved decision making.

Since they couldn't seem to find anything off-the-shelf that was sufficiently targeted to their needs, we worked with them to develop a Skoll-based CBIT process. This process has several parts: defining a configuration model, implementing a sampling strategy, executing the tests, and analyzing and visualizing the results.

We will discuss each of these below. Readers who just want to run a client can jump straight to the section marked Test execution.

Some more details:

Configuration Model: We are starting out by looking at 23 options. There are some inter-option constraints as well. For example, a configuration can compile in support for either the libedit library (--with-libedit) or the readline library (--with-readline), but not both. Here's a link to the current configuration model. We will expand this model as we gain more experience with this process and more insight into the key issues concerning MySQL.
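
As a rough illustration of what a model like this contains, the Python sketch below represents a few options, their allowed settings, and one inter-option constraint. Only --with-libedit and --with-readline come from the description above; the third option, the boolean settings, and the helper names are hypothetical and are not taken from the actual Skoll configuration model.

```python
from itertools import product

# Hypothetical miniature model: option name -> allowed settings.
# Only --with-libedit and --with-readline appear in the post; the rest is assumed.
OPTIONS = {
    "--with-libedit": [False, True],
    "--with-readline": [False, True],
    "--with-example-option": [False, True],  # placeholder, not a real MySQL flag
}


def satisfies_constraints(config):
    """Return True if the configuration violates no inter-option constraint."""
    # libedit and readline support are mutually exclusive.
    return not (config["--with-libedit"] and config["--with-readline"])


def valid_configurations(options):
    """Yield every combination of settings that passes the constraints."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        config = dict(zip(names, values))
        if satisfies_constraints(config):
            yield config


if __name__ == "__main__":
    configs = list(valid_configurations(OPTIONS))
    print(len(configs), "valid configurations in this toy model")
```

Enumerating every valid configuration is only practical for a toy model like this one; it is shown here just to make the role of the constraint concrete.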

Sampling strategy: There are over 48 million unique configurations in this test space. Since testing 1 configuration can take up to 2 hours and because new releases come out more or less daily, exhaustive testing of each check-in is clearly impossible. Therefore, we only test specially-chosen subsets of configurations in which all t-way (2
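
To make the idea of covering all t-way combinations concrete, here is a minimal Python sketch for the t = 2 (pairwise) case. It uses a simple greedy selection, which is a stand-in chosen for illustration and not necessarily the algorithm Skoll uses; the option names in the usage example are hypothetical.

```python
from itertools import combinations, product


def pairs_of(config):
    """Return the set of 2-way (option, setting) combinations in one configuration."""
    items = sorted(config.items())
    return set(combinations(items, 2))


def greedy_pairwise(options, is_valid=lambda config: True):
    """Pick configurations until every achievable 2-way combination is covered.

    This enumerates all candidates, so it only suits small illustrative models.
    """
    names = sorted(options)
    candidates = [
        dict(zip(names, values))
        for values in product(*(options[name] for name in names))
    ]
    candidates = [c for c in candidates if is_valid(c)]
    # Only combinations that occur in some valid configuration need covering.
    uncovered = set().union(*(pairs_of(c) for c in candidates))
    chosen = []
    while uncovered:
        # Greedy step: take the configuration covering the most uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical options, each with two settings.
    toy = {name: [False, True] for name in ("opt-a", "opt-b", "opt-c", "opt-d")}
    sample = greedy_pairwise(toy)
    print(len(sample), "configurations sampled out of", 2 ** len(toy))
```

The point of the sketch is the size difference: the sample covers every pair of settings while containing far fewer configurations than the full test space.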

Test execution: To participate in this process, users can go to our MySQL 5.1 project page. On that page you can find links to download a client along with instructions on how to install and run it. This client is a simple Perl script that connects to a Skoll server and asks for a test job. The server examines its internal databases, selects an outstanding job and returns it to the client, which then executes it. Currently, testing a configuration involves compiling MySQL in that configuration and then running about 750 MySQL-supplied tests on it. The test scripts determine which test cases can be run in the current configuration and run them. After completing the tests, the client uploads the results to the server.
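
The request, execute and upload cycle described above can be sketched as follows. This is an in-memory Python illustration only: the real client is a Perl script talking to a Skoll server over the network, and every class, method and function name here is hypothetical. Whether the real client loops over several jobs in one run is also an assumption.

```python
from collections import deque


class SketchServer:
    """In-memory stand-in for a Skoll server (hypothetical interface)."""

    def __init__(self, configurations):
        self.outstanding = deque(configurations)  # jobs not yet handed out
        self.results = []                         # (configuration, outcome) pairs

    def request_job(self):
        """Hand out the next outstanding job, or None when none remain."""
        return self.outstanding.popleft() if self.outstanding else None

    def upload(self, config, outcome):
        """Record the results a client sends back."""
        self.results.append((config, outcome))


def fake_build_and_test(config):
    """Placeholder for compiling MySQL in this configuration and running its tests."""
    return {"failed_tests": []}


def run_client(server, build_and_test):
    """Repeatedly ask for a job, execute it and upload the outcome."""
    completed = 0
    while True:
        job = server.request_job()
        if job is None:
            break
        server.upload(job, build_and_test(job))
        completed += 1
    return completed


if __name__ == "__main__":
    server = SketchServer([{"--with-libedit": True}, {"--with-readline": True}])
    print(run_client(server, fake_build_and_test), "jobs completed")
```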

Feedback, analysis and visualization: For this process, we are interested in understanding where configuration-related bugs might be hiding. To figure this out, we periodically analyze test results by building a classification tree for each test that has failed in a minimum number of configurations. These trees model test failures in terms of the configuration options and settings that were in effect when the tests failed. Users can look at a web page to see the current test results. This page shows summary statistics for each build ID. Clicking on a build ID takes you to a detail page listing each test that failed a minimum number of times. Each test presents classification information, both in a raw form and as an interactive treemap.
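
To show how a classification tree can tie failures to option settings, the Python sketch below computes only the first split of such a tree: it picks the option whose settings best separate failing runs from passing ones, using information gain. A full tree would repeat this step on each resulting group. The run data is invented for illustration, and the function names are hypothetical; this is not the analysis code used in Skoll.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of pass/fail labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_split(runs):
    """Return (option, gain) for the option with the highest information gain.

    runs is a list of (configuration dict, failed flag) pairs for one test.
    """
    base = entropy([failed for _, failed in runs])
    best = (None, 0.0)
    for option in runs[0][0]:
        # Group the pass/fail labels by the setting of this option.
        groups = {}
        for config, failed in runs:
            groups.setdefault(config[option], []).append(failed)
        remainder = sum(len(g) / len(runs) * entropy(g) for g in groups.values())
        gain = base - remainder
        if gain > best[1]:
            best = (option, gain)
    return best


# Invented results: the test fails exactly when readline support is compiled in.
RUNS = [
    ({"--with-libedit": True, "--with-readline": False}, False),
    ({"--with-libedit": False, "--with-readline": True}, True),
    ({"--with-libedit": False, "--with-readline": False}, False),
    ({"--with-libedit": False, "--with-readline": True}, True),
    ({"--with-libedit": True, "--with-readline": False}, False),
    ({"--with-libedit": False, "--with-readline": False}, False),
]

if __name__ == "__main__":
    option, gain = best_split(RUNS)
    print("most informative option:", option)
```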

Current status:

We have just started running this process on a continuous basis using a small number of MySQL developer community machines. We hope to bring many more test machines online in the coming weeks. Please check out the results web page to watch our progress. And don't let everyone else have all the fun -- download your client today! Thanks!
