About: Google Testing Bloggers

Profile:

Website

http://www.blogger.com/profile/03153388556673050910

Contact:

Posts by Google Testing Bloggers

Discomfort as a Tool for Change

Feb 13, 2017

by Dave Gladfelter (SETI, Google Drive)IntroductionThe SETI (Software Engineer, Tools and Infrastructure) role at Google is a strange one in that there’s no obvious reason why it should exist. The SWEs (Software Engineers) on a project understand its p…

Read | No Comments | Tags: Google Testing

Testing on the Toilet: Keep Cause and Effect Clear

Jan 31, 2017

by Ben Yu

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

Can you tell if this test is correct?

208: @Test public void testIncrement_existingKey() {
209:   assertEquals(9, tally.get("key1"));
210: }

It’s impossible to know without seeing how the tally object is set up:

1:   private final Tally tally = new Tally();
2:   @Before public void setUp() {
3:      tally.increment("key1", 8);
4:      tally.increment("key2", 100);
5:      tally.increment("key1", 0);
6:      tally.increment("key1", 1);
7:   }
// 200 lines away
208: @Test public void testIncrement_existingKey() {
209:   assertEquals(9, tally.get("key1"));
210: }

The problem is that the modification of key1‘s values occurs 200+ lines away from the assertion. Otherwise put, the cause is hidden far away from the effect.

Instead, write tests where the effects immediately follow the causes. It’s how we speak in natural language: “If you drive over the speed limit (cause), you’ll get a traffic ticket (effect).” Once we group the two chunks of code, we easily see what’s going on:

1:   private final Tally tally = new Tally();
2:   @Test public void testIncrement_newKey() {
3:     tally.increment("key", 100);
5:     assertEquals(100, tally.get("key"));
6:   }
7:   @Test public void testIncrement_existingKey() {
8:     tally.increment("key", 8);
9:     tally.increment("key", 1);
10:    assertEquals(9, tally.get("key"));
11:  }
12:  @Test public void testIncrement_incrementByZeroDoesNothing() {
13:    tally.increment("key", 8);
14:    tally.increment("key", 0);
15:    assertEquals(8, tally.get("key"));
16:  }

This style may require a bit more code. Each test sets its own input and verifies its own expected output. The payback is in more readable code and lower maintenance costs.

Read | No Comments | Tags: Google Testing

Happy 10th Birthday Google Testing Blog!

Jan 21, 2017

by Anthony ValloneTen years ago today, the first Google Testing Blog article was posted (official announcement 2 days later). Over the years, Google engineers have used this blog to help advance the test engineering discipline. We have shared informati…

Read | No Comments | Tags: Google Testing

Announcing OSS-Fuzz: Continuous Fuzzing for Open Source Software

Dec 1, 2016

By Mike Aizatsky, Kostya Serebryany (Software Engineers, Dynamic Tools); Oliver Chang, Abhishek Arya (Security Engineers, Google Chrome); and Meredith Whittaker (Open Research Lead). We are happy to announce OSS-Fuzz, a new Beta program developed …

Read | No Comments | Tags: Google Testing

What Test Engineers do at Google: Building Test Infrastructure

Nov 18, 2016

Author: Jochen WuttkeIn a recent post, we broadly talked about What Test Engineers do at Google. In this post, I talk about one aspect of the work TEs may do: building and improving test infrastructure to make engineers more productive. Refurbishing le…

Read | No Comments | Tags: Google Testing

Hackable Projects – Pillar 3: Infrastructure

Nov 10, 2016

By: Patrik Höglund

This is the third article in our series on Hackability; also see the first and second article.

We have seen in our previous articles how Code Health and Debuggability can make a project much easier to work on. The third pillar is a solid infrastructure that gets accurate feedback to your developers as fast as possible. Speed is going to be major theme in this article, and we’ll look at a number of things you can do to make your project easier to hack on.

Build Systems Speed

Question: What’s a change you’d really like to see in our development tools?

“I feel like this question gets asked almost every time, and I always give the same answer:
I would like them to be faster.”
— Ian Lance Taylor

Replace make with ninja. Use the gold linker instead of ld. Detect and delete dead code in your project (perhaps using coverage tools). Reduce the number of dependencies, and enforce dependency rules so new ones are not added lightly. Give the developers faster machines. Use distributed build, which is available with many open-source continuous integration systems (or use Google’s system, Bazel!). You should do everything you can to make the build faster.

Figure 1: “Cheetah chasing its prey” by Marlene Thyssen.

Why is that? There’s a tremendous difference in hackability if it takes 5 seconds to build and test versus one minute, or even 20 minutes, to build and test. Slow feedback cycles kill hackability, for many reasons:

Build and test times longer than a handful of seconds cause many developers’ minds to wander, taking them out of the zone.
Excessive build or release times* makes tinkering and refactoring much harder. All developers have a threshold when they start hedging (e.g. “I’d like to remove this branch, but I don’t know if I’ll break the iOS build”) which causes refactoring to not happen.

* The worst I ever heard of was an OS that took 24 hours to build!

How do you actually make fast build systems? There are some suggestions in the first paragraph above, but the best general suggestion I can make is to have a few engineers on the project who deeply understand the build systems and have the time to continuously improve them. The main axes of improvement are:

Reduce the amount of code being compiled.
Replace tools with faster counterparts.
Increase processing power, maybe through parallelization or distributed systems.

Note that there is a big difference between full builds and incremental builds. Both should be as fast as possible, but incremental builds are by far the most important to optimize. The way you tackle the two is different. For instance, reducing the total number of source files will make a full build faster, but it may not make an incremental build faster.

To get faster incremental builds, in general, you need to make each source file as decoupled as possible from the rest of the code base. The less a change ripples through the codebase, the less work to do, right? See “Loose Coupling and Testability” in Pillar 1 for more on this subject. The exact mechanics of dependencies and interfaces depends on programming language – one of the hardest to get right is unsurprisingly C++, where you need to be disciplined with includes and forward declarations to get any kind of incremental build performance.

Build scripts and makefiles should be held to standards as high as the code itself. Technical debt and unnecessary dependencies have a tendency to accumulate in build scripts, because no one has the time to understand and fix them. Avoid this by addressing the technical debt as you go.

Continuous Integration and Presubmit Queues

You should build and run tests on all platforms you release on. For instance, if you release on all the major desktop platforms, but all your developers are on Linux, this becomes extra important. It’s bad for hackability to update the repo, build on Windows, and find that lots of stuff is broken. It’s even worse if broken changes start to stack on top of each other. I think we all know that terrible feeling: when you’re not sure your change is the one that broke things.

At a minimum, you should build and test on all platforms, but it’s even better if you do it in presubmit. The Chromium submit queue does this. It has developed over the years so that a normal patch builds and tests on about 30 different build configurations before commit. This is necessary for the 400-patches-per-day velocity of the Chrome project. Most projects obviously don’t have to go that far. Chromium’s infrastructure is based off BuildBot, but there are many other job scheduling systems depending on your needs.

Figure 2: How a Chromium patch is tested.

As we discussed in Build Systems, speed and correctness are critical here. It takes a lot of ongoing work to keep build, tests, and presubmits fast and free of flakes. You should never accept flakes, since developers very quickly lose trust in flaky tests and systems. Tooling can help a bit with this; for instance, see the Chromium flakiness dashboard.

Test Speed

Speed is a feature, and this is particularly true for developer infrastructure. In general, the longer a test takes to execute, the less valuable it is. My rule of thumb is: if it takes more than a minute to execute, its value is greatly diminished. There are of course some exceptions, such as soak tests or certain performance tests.

Figure 3. Test value.

If you have tests that are slower than 60 seconds, they better be incredibly reliable and easily debuggable. A flaky test that takes several minutes to execute often has negative value because it slows down all work in the code it covers. You probably want to build better integration tests on lower levels instead, so you can make them faster and more reliable.

If you have many engineers on a project, reducing the time to run the tests can have a big impact. This is one reason why it’s great to have SETIs or the equivalent. There are many things you can do to improve test speed:

Sharding and parallelization. Add more machines to your continuous build as your test set or number of developers grows.
Continuously measure how long it takes to run one build+test cycle in your continuous build, and have someone take action when it gets slower.
Remove tests that don’t pull their weight. If a test is really slow, it’s often because of poorly written wait conditions or because the test bites off more than it can chew (maybe that unit test doesn’t have to process 15000 audio frames, maybe 50 is enough).
If you have tests that bring up a local server stack, for instance inter-server integration tests, making your servers boot faster is going to make the tests faster as well. Faster production code is faster to test! See Running on Localhost, in Pillar 2 for more on local server stacks.

Workflow Speed

We’ve talked about fast builds and tests, but the core developer workflows are also important, of course. Chromium undertook multi-year project to switch from Subversion to Git, partly because Subversion was becoming too slow. You need to keep track of your core workflows as your project grows. Your version control system may work fine for years, but become too slow once the project becomes big enough. Bug search and management must also be robust and fast, since this is generally systems developers touch every day.

Release Often

It aids hackability to deploy to real users as fast as possible. No matter how good your product’s tests are, there’s always a risk that there’s something you haven’t thought of. If you’re building a service or web site, you should aim to deploy multiple times per week. For client projects, Chrome’s six-weeks cycle is a good goal to aim for.

You should invest in infrastructure and tests that give you the confidence to do this – you don’t want to push something that’s broken. Your developers will thank you for it, since it makes their jobs so much easier. By releasing often, you mitigate risk, and developers will rush less to hack late changes in the release (since they know the next release isn’t far off).

Easy Reverts

If you look in the commit log for the Chromium project, you will see that a significant percentage of the commits are reverts of a previous commits. In Chromium, bad commits quickly become costly because they impede other engineers, and the high velocity can cause good changes to stack onto bad changes.

Figure 4: Chromium’s revert button.

This is why the policy is “revert first, ask questions later”. I believe a revert-first policy is good for small projects as well, since it creates a clear expectation that the product/tools/dev environment should be working at all times (and if it doesn’t, a recent change should probably be reverted).

It has a wonderful effect when a revert is simple to make. You can suddenly make speculative reverts if a test went flaky or a performance test regressed. It follows that if a patch is easy to revert, so is the inverse (i.e. reverting the revert or relanding the patch). So if you were wrong and that patch wasn’t guilty after all, it’s simple to re-land it again and try reverting another patch. Developers might initially balk at this (because it can’t possibly be their patch, right?), but they usually come around when they realize the benefits.

For many projects, a revert can simply be

git revert 9fbadbeef

git push origin master

If your project (wisely) involves code review, it will behoove you to add something like Chromium’s revert button that I mentioned above. The revert button will create a special patch that bypasses review and tests (since we can assume a clean revert takes us back to a more stable state rather than the opposite). See Pillar 1 for more on code review and its benefits.

For some projects, reverts are going to be harder, especially if you have a slow or laborious release process. Even if you release often, you could still have problems if a revert involves state migrations in your live services (for instance when rolling back a database schema change). You need to have a strategy to deal with such state changes.

Reverts must always put you back to safer ground, and everyone must be confident they can safely revert. If not, you run the risk of massive fire drills and lost user trust if a bad patch makes it through the tests and you can’t revert it.

Performance Tests: Measure Everything

Is it critical that your app starts up within a second? Should your app always render at 60 fps when it’s scrolled up or down? Should your web server always serve a response within 100 ms? Should your mobile app be smaller than 8 MB? If so, you should make a performance test for that. Performance tests aid hackability since developers can quickly see how their change affects performance and thus prevent performance regressions from making it into the wild.

You should run your automated performance tests on the same device; all devices are different and this will reflect in the numbers. This is fairly straightforward if you have a decent continuous integration system that runs tests sequentially on a known set of worker machines. It’s harder if you need to run on physical phones or tablets, but it can be done.

A test can be as simple as invoking a particular algorithm and measuring the time it takes to execute it (median and 90th percentile, say, over N runs).

Figure 5: A VP8-in-WebRTC regression (bug) and its fix displayed in a Catapult dashboard.

Write your test so it outputs performance numbers you care about. But what to do with those numbers? Fortunately, Chrome’s performance test framework has been open-sourced, which means you can set up a dashboard, with automatic regression monitoring, with minimal effort. The test framework also includes the powerful Telemetry framework which can run actions on web pages and Android apps and report performance results. Telemetry and Catapult are battle-tested by use in the Chromium project and are capable of running on a wide set of devices, while getting the maximum of usable performance data out of the devices.

Sources

Figure 1: By Malene Thyssen (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Read | No Comments | Tags: Google Testing

Hackable Projects – Pillar 2: Debuggability

Oct 11, 2016

By: Patrik Höglund

This is the second article in our series on Hackability; also see the first article.

“Deep into that darkness peering, long I stood there, wondering, fearing, doubting, dreaming dreams no mortal ever dared to dream before.” — Edgar Allan Poe

Debuggability can mean being able to use a debugger, but here we’re interested in a broader meaning. Debuggability means being able to easily find what’s wrong with a piece of software, whether it’s through logs, statistics or debugger tools. Debuggability doesn’t happen by accident: you need to design it into your product. The amount of work it takes will vary depending on your product, programming language(s) and development environment.

In this article, I am going to walk through a few examples of how we have aided debuggability for our developers. If you do the same analysis and implementation for your project, perhaps you can help your developers illuminate the dark corners of the codebase and learn what truly goes on there.

Figure 1: computer log entry from the Mark II, with a moth taped to the page.

Running on Localhost

Read more on the Testing Blog: Hermetic Servers by Chaitali Narla and Diego Salas

Suppose you’re developing a service with a mobile app that connects to that service. You’re working on a new feature in the app that requires changes in the backend. Do you develop in production? That’s a really bad idea, as you must push unfinished code to production to work on your change. Don’t do that: it could break your service for your existing users. Instead, you need some kind of script that brings up your server stack on localhost.

You can probably run your servers by hand, but that quickly gets tedious. In Google, we usually use fancy python scripts that invoke the server binaries with flags. Why do we need those flags? Suppose, for instance, that you have a server A that depends on a server B and C. The default behavior when the server boots should be to connect to B and C in production. When booting on localhost, we want to connect to our local B and C though. For instance:

b_serv --port=1234 --db=/tmp/fakedb
c_serv --port=1235
a_serv --b_spec=localhost:1234 --c_spec=localhost:1235

That makes it a whole lot easier to develop and debug your server. Make sure the logs and stdout/stderr end up in some well-defined directory on localhost so you don’t waste time looking for them. You may want to write a basic debug client that sends HTTP requests or RPCs or whatever your server handles. It’s painful to have to boot the real app on a mobile phone just to test something.

A localhost setup is also a prerequisite for making hermetic tests,where the test invokes the above script to bring up the server stack. The test can then run, say, integration tests among the servers or even client-server integration tests. Such integration tests can catch protocol drift bugs between client and server, while being super stable by not talking to external or shared services.

Debugging Mobile Apps

First, mobile is hard. The tooling is generally less mature than for desktop, although things are steadily improving. Again, unit tests are great for hackability here. It’s really painful to always load your app on a phone connected to your workstation to see if a change worked. Robolectric unit tests and Espresso functional tests, for instance, run on your workstation and do not require a real phone. xcTestsand Earl Grey give you the same on iOS.

Debuggers ship with Xcode and Android Studio. If your Android app ships JNI code, it’s a bit trickier, but you can attach GDB to running processes on your phone. It’s worth spending the time figuring this out early in the project, so you don’t have to guess what your code is doing. Debugging unit tests is even better and can be done straightforwardly on your workstation.

When Debugging gets Tricky

Some products are harder to debug than others. One example is hard real-time systems, since their behavior is so dependent on timing (and you better not be hooked up to a real industrial controller or rocket engine when you hit a breakpoint!). One possible solution is to run the software on a fake clock instead of a hardware clock, so the clock stops when the program stops.

Another example is multi-process sandboxed programs such as Chromium. Since the browser spawns one renderer process per tab, how do you even attach a debugger to it? The developers have made it quite a lot easier with debugging flags and instructions. For instance, this wraps gdb around each renderer process as it starts up:

chrome --renderer-cmd-prefix='xterm -title renderer -e gdb --args'

The point is, you need to build these kinds of things into your product; this greatly aids hackability.

Proper Logging

Read more on the Testing Blog: Optimal Logging by Anthony Vallone

It’s hackability to get the right logs when you need them. It’s easy to fix a crash if you get a stack trace from the error location. It’s far from guaranteed you’ll get such a stack trace, for instance in C++ programs, but this is something you should not stand for. For instance, Chromium had a problem where renderer process crashes didn’t print in test logs, because the test was running in a separate process. This was later fixed, and this kind of investment is worthwhile to make. A clean stack trace is worth a lot more than a “renderer crashed” message.

Logs are also useful for development. It’s an art to determine how much logging is appropriate for a given piece of code, but it is a good idea to keep the default level of logging conservative and give developers the option to turn on more logging for the parts they’re working on (example: Chromium). Too much logging isn’t hackability. This article elaborates further on this topic.

Logs should also be properly symbolized for C/C++ projects; a naked list of addresses in a stack trace isn’t very helpful. This is easy if you build for development (e.g. with -g), but if the crash happens in a release build it’s a bit trickier. You then need to build the same binary with the same flags and use addr2line / ndk-stack / etc to symbolize the stack trace. It’s a good idea to build tools and scripts for this so it’s as easy as possible.

Monitoring and Statistics

It aids hackability if developers can quickly understand what effect their changes have in the real world. For this, monitoring tools such as Stackdriver for Google Cloudare excellent. If you’re running a service, such tools can help you keep track of request volumes and error rates. This way you can quickly detect that 30% increase in request errors, and roll back that bad code change, before it does too much damage. It also makes it possible to debug your service in production without disrupting it.

System Under Test (SUT) Size

Tests and debugging go hand in hand: it’s a lot easier to target a piece of code in a test than in the whole application. Small and focused tests aid debuggability, because when a test breaks there isn’t an enormous SUT to look for errors in. These tests will also be less flaky. This article discusses this fact at length.

Figure 2. The smaller the SUT, the more valuable the test.

You should try to keep the above in mind, particularly when writing integration tests. If you’re testing a mobile app with a server, what bugs are you actually trying to catch? If you’re trying to ensure the app can still talk to the server (i.e. catching protocol drift bugs), you should not involve the UI of the app. That’s not what you’re testing here. Instead, break out the signaling part of the app into a library, test that directly against your local server stack, and write separate tests for the UI that only test the UI.

Smaller SUTs also greatly aids test speed, since there’s less to build, less to bring up and less to keep running. In general, strive to keep the SUT as small as possible through whatever means necessary. It will keep the tests smaller, faster and more focused.

Sources

Figure 1: By Courtesy of the Naval Surface Warfare Center, Dahlgren, VA., 1988. – U.S. Naval Historical Center Online Library Photograph NH 96566-KN, Public Domain, https://commons.wikimedia.org/w/index.php?curid=165211

(Continue to Pillar 3: Infrastructure)

Read | No Comments | Tags: Google Testing

Testing on the Toilet: What Makes a Good End-to-End Test?

Sep 21, 2016

by Adam BenderThis article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.An end-to-end test tests your entire system from one end to the other…

Read | No Comments | Tags: Google Testing

What Test Engineers do at Google

Sep 12, 2016

by Matt Lowrie, Manjusha Parvathaneni, Benjamin Pick, and Jochen Wuttke

Test engineers (TEs) at Google are a dedicated group of engineers who use proven testing practices to foster excellence in our products. We orchestrate the rapid testing and releasing of products and features our users rely on. Achieving this velocity requires creative and diverse engineering skills that allow us to advocate for our users. By building testable user journeys into the process, we ensure reliable products. TEs are also the glue that bring together feature stakeholders (product managers, development teams, UX designers, release engineers, beta testers, end users, etc.) to confirm successful product launches. Essentially, every day we ask ourselves, “How can we make our software development process more efficient to deliver products that make our users happy?”.

The TE role grew out of the desire to make Google’s early free products, like Search, Gmail and Docs, better than similar paid products on the market at the time. Early on in Google’s history, a small group of engineers believed that the company’s “launch and iterate” approach to software deployment could be improved with continuous automated testing. They took it upon themselves to promote good testing practices to every team throughout the company, via some programs you may have heard about: Testing on the Toilet, the Test Certified Program, and the Google Test Automation Conference (GTAC). These efforts resulted in every project taking ownership of all aspects of testing, such as code coverage and performance testing. Testing practices quickly became commonplace throughout the company and engineers writing tests for their own code became the standard. Today, TEs carry on this tradition of setting the standard of quality which all products should achieve.

Historically, Google has sustained two separate job titles related to product testing and test infrastructure, which has caused confusion. We often get asked what the difference is between the two. The rebranding of the Software engineer, tools and infrastructure (SETI) role, which now concentrates on engineering productivity, has been addressed in a previous blog post. What this means for test engineers at Google, is an enhanced responsibility of being the authority on product excellence. We are expected to uphold testing standards company-wide, both programmatically and persuasively.

Test engineer is a unique role at Google. As TEs, we define and organize our own engineering projects, bridging gaps between engineering output and end-user satisfaction. To give you an idea of what TEs do, here are some examples of challenges we need to solve on any particular day:

Automate a manual verification process for product release candidates so developers have more time to respond to potential release-blocking issues.
Design and implement an automated way to track and surface Android battery usage to developers, so that they know immediately when a new feature will cause users drained batteries.
Quantify if a regenerated data set used by a product, which contains a billion entities, is better quality than the data set currently live in production.
Write an automated test suite that validates if content presented to a user is of an acceptable quality level based on their interests.
Read an engineering design proposal for a new feature and provide suggestions about how and where to build in testability.
Investigate correlated stack traces submitted by users through our feedback tracking system, and search the code base to find the correct owner for escalation.
Collaborate on determining the root cause of a production outage, then pinpoint tests that need to be added to prevent similar outages in the future.
Organize a task force to advise teams across the company about best practices when testing for accessibility.

Over the next few weeks leading up to GTAC, we will also post vignettes of actual TEs working on different projects at Google, to showcase the diversity of the Google Test Engineer role. Stay tuned!

Read | No Comments | Tags: Google Testing

Hackable Projects – Pillar 1: Code Health

Aug 18, 2016

By: Patrik Höglund

Introduction

Software development is difficult. Projects often evolve over several years, under changing requirements and shifting market conditions, impacting developer tools and infrastructure. Technical debt, slow build systems, poor debuggability, and increasing numbers of dependencies can weigh down a project The developers get weary, and cobwebs accumulate in dusty corners of the code base.

Fighting these issues can be taxing and feel like a quixotic undertaking, but don’t worry — the Google Testing Blog is riding to the rescue! This is the first article of a series on “hackability” that identifies some of the issues that hinder software projects and outlines what Google SETIs usually do about them.

According to Wiktionary, hackable is defined as:

Adjective
hackable ‎(comparative more hackable, superlative most hackable)

(computing) That can be hacked or broken into; insecure, vulnerable.
That lends itself to hacking (technical tinkering and modification); moddable.

Obviously, we’re not going to talk about making your product more vulnerable (by, say, rolling your own crypto or something equally unwise); instead, we will focus on the second definition, which essentially means “something that is easy to work on.” This has become the mainfocus for SETIs at Google as the role has evolved over the years.

In Practice

In a hackable project, it’s easy to try things and hard to break things. Hackability means fast feedback cycles that offer useful information to the developer.

This is hackability:

Developing is easy
Fast build
Good, fast tests
Clean code
Easy running + debugging
One-click rollbacks

In contrast, what is not hackability?

Broken HEAD (tip-of-tree)
Slow presubmit (i.e. checks running before submit)
Builds take hours
Incremental build/link > 30s
Flakytests
Can’t attach debugger
Logs full of uninteresting information

The Three Pillars of Hackability

There are a number of tools and practices that foster hackability. When everything is in place, it feels great to work on the product. Basically no time is spent on figuring out why things are broken, and all time is spent on what matters, which is understanding and working with the code. I believe there are three main pillars that support hackability. If one of them is absent, hackability will suffer. They are:

Pillar 1: Code Health

“I found Rome a city of bricks, and left it a city of marble.”
— Augustus

Keeping the code in good shape is critical for hackability. It’s a lot harder to tinker and modify something if you don’t understand what it does (or if it’s full of hidden traps, for that matter).

Tests

Unit and small integration tests are probably the best things you can do for hackability. They’re a support you can lean on while making your changes, and they contain lots of good information on what the code does. It isn’t hackability to boot a slow UI and click buttons on every iteration to verify your change worked – it is hackability to run a sub-second set of unit tests! In contrast, end-to-end (E2E) tests generally help hackability much less (and can evenbe a hindrance if they, or the product, are in sufficiently bad shape).

Figure 1: the Testing Pyramid.

I’ve always been interested in how you actually make unit tests happen in a team. It’s about education. Writing a product such that it has good unit tests is actually a hard problem. It requires knowledge of dependency injection, testing/mocking frameworks, language idioms and refactoring. The difficulty varies by language as well. Writing unit tests in Go or Java is quite easy and natural, whereas in C++ it can be very difficult (and it isn’t exactly ingrained in C++ culture to write unit tests).

It’s important to educate your developers about unit tests. Sometimes, it is appropriate to lead by example and help review unit tests as well. You can have a large impact on a project by establishing a pattern of unit testing early. If tons of code gets written without unit tests, it will be much harder to add unit tests later.

What if you already have tons of poorly tested legacy code? The answer is refactoring and adding tests as you go. It’s hard work, but each line you add a test for is one more line that is easier to hack on.

Readable Code and Code Review

At Google, “readability” is a special committer status that is granted per language (C++, Go, Java and so on). It means that a person not only knows the language and its culture and idioms well, but also can write clean, well tested and well structured code. Readability literally means that you’re a guardian of Google’s code base and should push back on hacky and ugly code. The use of a style guide enforces consistency, and code review (where at least one person with readability must approve) ensures the code upholds high quality. Engineers must take care to not depend too much on “review buddies” here but really make sure to pull in the person that can give the best feedback.

Requiring code reviews naturally results in small changes, as reviewers often get grumpy if you dump huge changelists in their lap (at least if reviewers are somewhat fast to respond, which they should be). This is a good thing, since small changes are less risky and are easy to roll back. Furthermore, code review is good for knowledge sharing. You can also do pair programming if your team prefers that (a pair-programmed change is considered reviewed and can be submitted when both engineers are happy). There are multiple open-source review tools out there, such as Gerrit.

Nice, clean code is great for hackability, since you don’t need to spend time to unwind that nasty pointer hack in your head before making your changes. How do you make all this happen in practice? Put together workshops on, say, the SOLID principles, unit testing, or concurrency to encourage developers to learn. Spread knowledge through code review, pair programming and mentoring (such as with the Readability concept). You can’t just mandate higher code quality; it takes a lot of work, effort and consistency.

Presubmit Testing and Lint

Consistently formatted source code aids hackability. You can scan code faster if its formatting is consistent. Automated tooling also aids hackability. It really doesn’t make sense to waste any time on formatting source code by hand. You should be using tools like gofmt, clang-format, etc. If the patch isn’t formatted properly, you should see something like this (example from Chrome):

$ git cl upload
Error: the media/audio directory requires formatting. Please run 
    git cl format media/audio.

Source formatting isn’t the only thing to check. In fact, you should check pretty much anything you have as a rule in your project. Should other modules not depend on the internals of your modules? Enforce it with a check. Are there already inappropriate dependencies in your project? Whitelist the existing ones for now, but at least block new bad dependencies from forming. Should our app work on Android 16 phones and newer? Add linting, so we don’t use level 17+ APIs without gating at runtime. Should your project’s VHDL code always place-and-route cleanly on a particular brand of FPGA? Invoke the layout tool in your presubmit and and stop submit if the layout process fails.

Presubmit is the most valuable real estate for aiding hackability. You have limited space in your presubmit, but you can get tremendous value out of it if you put the right things there. You should stop all obvious errors here.

It aids hackability to have all this tooling so you don’t have to waste time going back and breaking things for other developers. Remember you need to maintain the presubmit well; it’s not hackability to have a slow, overbearing or buggy presubmit. Having a good presubmit can make it tremendously more pleasant to work on a project. We’re going to talk more in later articles on how to build infrastructure for submit queues and presubmit.

Single Branch And Reducing Risk

Having a single branch for everything, and putting risky new changes behind feature flags, aids hackability since branches and forks often amass tremendous risk when it’s time to merge them. Single branches smooth out the risk. Furthermore, running all your tests on many branches is expensive. However, a single branch can have negative effects on hackability if Team A depends on a library from Team B and gets broken by Team B a lot. Having some kind of stabilization on Team B’s software might be a good idea there. Thisarticle covers such situations, and how to integrate often with your dependencies to reduce the risk that one of them will break you.

Loose Coupling and Testability

Tightly coupled code is terrible for hackability. To take the most ridiculous example I know: I once heard of a computer game where a developer changed a ballistics algorithm and broke the game’s chat. That’s hilarious, but hardly intuitive for the poor developer that made the change. A hallmark of loosely coupled code is that it’s upfront about its dependencies and behavior and is easy to modify and move around.

Loose coupling, coherence and so on is really about design and architecture and is notoriously hard to measure. It really takes experience. One of the best ways to convey such experience is through code review, which we’ve already mentioned. Education on the SOLID principles, rules of thumb such as tell-don’t-ask, discussions about anti-patterns and code smells are all good here. Again, it’s hard to build tooling for this. You could write a presubmit check that forbids methods longer than 20 lines or cyclomatic complexity over 30, but that’s probably shooting yourself in the foot. Developers would consider that overbearing rather than a helpful assist.

SETIs at Google are expected to give input on a product’s testability. A few well-placed test hooks in your product can enable tremendously powerful testing, such as serving mock content for apps (this enables you to meaningfully test app UI without contacting your real servers, for instance). Testability can also have an influence on architecture. For instance, it’s a testability problem if your servers are built like a huge monolith that is slow to build and start, or if it can’t boot on localhost without calling external services. We’ll cover this in the next article.

Aggressively Reduce Technical Debt

It’s quite easy to add a lot of code and dependencies and call it a day when the software works. New projects can do this without many problems, but as the project becomes older it becomes a “legacy” project, weighed down by dependencies and excess code. Don’t end up there. It’s bad for hackability to have a slew of bug fixes stacked on top of unwise and obsolete decisions, and understanding and untangling the software becomes more difficult.

What constitutes technical debt varies by project and is something you need to learn from experience. It simply means the software isn’t in optimal form. Some types of technical debt are easy to classify, such as dead code and barely-used dependencies. Some types are harder to identify, such as when the architecture of the project has grown unfit to the task from changing requirements. We can’t use tooling to help with the latter, but we can with the former.

I already mentioned that dependency enforcement can go a long way toward keeping people honest. It helps make sure people are making the appropriate trade-offs instead of just slapping on a new dependency, and it requires them to explain to a fellow engineer when they want to override a dependency rule. This can prevent unhealthy dependencies like circular dependencies, abstract modules depending on concrete modules, or modules depending on the internals of other modules.

There are various tools available for visualizing dependency graphs as well. You can use these to get a grip on your current situation and start cleaning up dependencies. If you have a huge dependency you only use a small part of, maybe you can replace it with something simpler. If an old part of your app has inappropriate dependencies and other problems, maybe it’s time to rewrite that part.

(Continue to Pillar 2: Debuggability)

Read | No Comments | Tags: Google Testing

The Inquiry Method for Test Planning

Jun 6, 2016

by Anthony Vallone

updated: July 2016

Creating a test plan is often a complex undertaking. An ideal test plan is accomplished by applying basic principles of cost-benefit analysis and risk analysis, optimally balancing these software development factors:

Implementation cost: The time and complexity of implementing testable features and automated tests for specific scenarios will vary, and this affects short-term development cost.
Maintenance cost: Some tests or test plans may vary from easy to difficult to maintain, and this affects long-term development cost. When manual testing is chosen, this also adds to long-term cost.
Monetary cost: Some test approaches may require billed resources.
Benefit: Tests are capable of preventing issues and aiding productivity by varying degrees. Also, the earlier they can catch problems in the development life-cycle, the greater the benefit.
Risk: The probability of failure scenarios may vary from rare to likely, and their consequences may vary from minor nuisance to catastrophic.

Effectively balancing these factors in a plan depends heavily on project criticality, implementation details, resources available, and team opinions. Many projects can achieve outstanding coverage with high-benefit, low-cost unit tests, but they may need to weigh options for larger tests and complex corner cases. Mission critical projects must minimize risk as much as possible, so they will accept higher costs and invest heavily in rigorous testing at all levels.

This guide puts the onus on the reader to find the right balance for their project. Also, it does not provide a test plan template, because templates are often too generic or too specific and quickly become outdated. Instead, it focuses on selecting the best content when writing a test plan.

Test plan vs. strategy

Before proceeding, two common methods for defining test plans need to be clarified:

Single test plan: Some projects have a single “test plan” that describes all implemented and planned testing for the project.
Single test strategy and many plans: Some projects have a “test strategy” document as well as many smaller “test plan” documents. Strategies typically cover the overall test approach and goals, while plans cover specific features or project updates.

Either of these may be embedded in and integrated with project design documents. Both of these methods work well, so choose whichever makes sense for your project. Generally speaking, stable projects benefit from a single plan, whereas rapidly changing projects are best served by infrequently changed strategies and frequently added plans.

For the purpose of this guide, I will refer to both test document types simply as “test plans”. If you have multiple documents, just apply the advice below to your document aggregation.

Content selection

A good approach to creating content for your test plan is to start by listing all questions that need answers. The lists below provide a comprehensive collection of important questions that may or may not apply to your project. Go through the lists and select all that apply. By answering these questions, you will form the contents for your test plan, and you should structure your plan around the chosen content in any format your team prefers. Be sure to balance the factors as mentioned above when making decisions.

Prerequisites

Do you need a test plan? If there is no project design document or a clear vision for the product, it may be too early to write a test plan.
Has testability been considered in the project design? Before a project gets too far into implementation, all scenarios must be designed as testable, preferably via automation. Both project design documents and test plans should comment on testability as needed.
Will you keep the plan up-to-date? If so, be careful about adding too much detail, otherwise it may be difficult to maintain the plan.
Does this quality effort overlap with other teams? If so, how have you deduplicated the work?

Risk

Are there any significant project risks, and how will you mitigate them? Consider:
- Injury to people or animals
- Security and integrity of user data
- User privacy
- Security of company systems
- Hardware or property damage
- Legal and compliance issues
- Exposure of confidential or sensitive data
- Data loss or corruption
- Revenue loss
- Unrecoverable scenarios
- SLAs
- Performance requirements
- Misinforming users
- Impact to other projects
- Impact from other projects
- Impact to company’s public image
- Loss of productivity
What are the project’s technical vulnerabilities? Consider:
- Features or components known to be hacky, fragile, or in great need of refactoring
- Dependencies or platforms that frequently cause issues
- Possibility for users to cause harm to the system
- Trends seen in past issues

Coverage

What does the test surface look like? Is it a simple library with one method, or a multi-platform client-server stateful system with a combinatorial explosion of use cases? Describe the design and architecture of the system in a way that highlights possible points of failure.
What platforms are supported? Consider listing supported operating systems, hardware, devices, etc. Also describe how testing will be performed and reported for each platform.
What are the features? Consider making a summary list of all features and describe how certain categories of features will be tested.
What will not be tested? No test suite covers every possibility. It’s best to be up-front about this and provide rationale for not testing certain cases. Examples: low risk areas that are a low priority, complex cases that are a low priority, areas covered by other teams, features not ready for testing, etc.
What is covered by unit (small), integration (medium), and system (large) tests? Always test as much as possible in smaller tests, leaving fewer cases for larger tests. Describe how certain categories of test cases are best tested by each test size and provide rationale.
What will be tested manually vs. automated? When feasible and cost-effective, automation is usually best. Many projects can automate all testing. However, there may be good reasons to choose manual testing. Describe the types of cases that will be tested manually and provide rationale.
How are you covering each test category? Consider:
- accessibility
- functional
- fuzz
- internationalization and localization
- performance, load, stress, and endurance (aka soak)
- privacy
- security
- smoke
- stability
- usability
Will you use static and/or dynamic analysis tools? Both static analysis tools and dynamic analysis tools can find problems that are hard to catch in reviews and testing, so consider using them.
How will system components and dependencies be stubbed, mocked, faked, staged, or used normally during testing? There are good reasons to do each of these, and they each have a unique impact on coverage.
What builds are your tests running against? Are tests running against a build from HEAD (aka tip), a staged build, and/or a release candidate? If only from HEAD, how will you test release build cherry picks (selection of individual changelists for a release) and system configuration changes not normally seen by builds from HEAD?
What kind of testing will be done outside of your team? Examples:
- Dogfooding
- External crowdsource testing
- Public alpha/beta versions (how will they be tested before releasing?)
- External trusted testers
How are data migrations tested? You may need special testing to compare before and after migration results.
Do you need to be concerned with backward compatibility? You may own previously distributed clients or there may be other systems that depend on your system’s protocol, configuration, features, and behavior.
Do you need to test upgrade scenarios for server/client/device software or dependencies/platforms/APIs that the software utilizes?
Do you have line coverage goals?

Tooling and Infrastructure

Do you need new test frameworks? If so, describe these or add design links in the plan.
Do you need a new test lab setup? If so, describe these or add design links in the plan.
If your project offers a service to other projects, are you providing test tools to those users? Consider providing mocks, fakes, and/or reliable staged servers for users trying to test their integration with your system.
For end-to-end testing, how will test infrastructure, systems under test, and other dependencies be managed? How will they be deployed? How will persistence be set-up/torn-down? How will you handle required migrations from one datacenter to another?
Do you need tools to help debug system or test failures? You may be able to use existing tools, or you may need to develop new ones.

Process

Are there test schedule requirements? What time commitments have been made, which tests will be in place (or test feedback provided) by what dates? Are some tests important to deliver before others?
How are builds and tests run continuously? Most small tests will be run by continuous integration tools, but large tests may need a different approach. Alternatively, you may opt for running large tests as-needed.
How will build and test results be reported and monitored?
- Do you have a team rotation to monitor continuous integration?
- Large tests might require monitoring by someone with expertise.
- Do you need a dashboard for test results and other project health indicators?
- Who will get email alerts and how?
- Will the person monitoring tests simply use verbal communication to the team?
How are tests used when releasing?
- Are they run explicitly against the release candidate, or does the release process depend only on continuous test results?
- If system components and dependencies are released independently, are tests run for each type of release?
- Will a “release blocker” bug stop the release manager(s) from actually releasing? Is there an agreement on what are the release blocking criteria?
- When performing canary releases (aka % rollouts), how will progress be monitored and tested?
How will external users report bugs? Consider feedback links or other similar tools to collect and cluster reports.
How does bug triage work? Consider labels or categories for bugs in order for them to land in a triage bucket. Also make sure the teams responsible for filing and or creating the bug report template are aware of this. Are you using one bug tracker or do you need to setup some automatic or manual import routine?
Do you have a policy for submitting new tests before closing bugs that could have been caught?
How are tests used for unsubmitted changes? If anyone can run all tests against any experimental build (a good thing), consider providing a howto.
How can team members create and/or debug tests? Consider providing a howto.

Utility

Who are the test plan readers? Some test plans are only read by a few people, while others are read by many. At a minimum, you should consider getting a review from all stakeholders (project managers, tech leads, feature owners). When writing the plan, be sure to understand the expected readers, provide them with enough background to understand the plan, and answer all questions you think they will have – even if your answer is that you don’t have an answer yet. Also consider adding contacts for the test plan, so any reader can get more information.
How can readers review the actual test cases? Manual cases might be in a test case management tool, in a separate document, or included in the test plan. Consider providing links to directories containing automated test cases.
Do you need traceability between requirements, features, and tests?
Do you have any general product health or quality goals and how will you measure success? Consider:
- Release cadence
- Number of bugs caught by users in production
- Number of bugs caught in release testing
- Number of open bugs over time
- Code coverage
- Cost of manual testing
- Difficulty of creating new tests

Read | No Comments | Tags: Google Testing

GTAC 2016 Registration Deadline Extended

Jun 2, 2016

by Sonal Shah on behalf of the GTAC CommitteeOur goal in organizing GTAC each year is to make it a first-class conference, dedicated to presenting leading edge industry practices. The quality of submissions we’ve received for GTAC 2016 so far has been …

Read | No Comments | Tags: Google Testing

Flaky Tests at Google and How We Mitigate Them

May 27, 2016

by John MiccoAt Google, we run a very large corpus of tests continuously to validate our code submissions. Everyone from developers to project managers rely on the results of these tests to make decisions about whether the system is ready for deploymen…

Read | No Comments | Tags: Google Testing

GTAC Diversity Scholarship

May 4, 2016

by Lesley Katzen on behalf of the GTAC Diversity CommitteeWe are committed to increasing diversity at GTAC, and we believe the best way to do that is by making sure we have a diverse set of applicants to speak and attend. As part of that commitment, we…

Read | No Comments | Tags: Google Testing

GTAC 2016 Registration is now open!

May 4, 2016

by Sonal Shah on behalf of the GTAC CommitteeThe GTAC (Google Test Automation Conference) 2016 application process is now open for presentation proposals and attendance. GTAC will be held at the Google Sunnyvale office on November 15th – 16th, 2016.GTA…

Read | No Comments | Tags: Google Testing

About: Google Testing Bloggers

Profile:

Website

Contact:

Posts by Google Testing Bloggers

Discomfort as a Tool for Change

Testing on the Toilet: Keep Cause and Effect Clear

Happy 10th Birthday Google Testing Blog!

Announcing OSS-Fuzz: Continuous Fuzzing for Open Source Software

What Test Engineers do at Google: Building Test Infrastructure

Hackable Projects – Pillar 3: Infrastructure

Build Systems Speed

Continuous Integration and Presubmit Queues

Test Speed

Workflow Speed

Release Often

Easy Reverts

Performance Tests: Measure Everything

Sources

Running on Localhost

Debugging Mobile Apps

When Debugging gets Tricky

Proper Logging

Monitoring and Statistics

System Under Test (SUT) Size

Sources

Introduction

In Practice

The Three Pillars of Hackability

Pillar 1: Code Health

Tests

Readable Code and Code Review

Presubmit Testing and Lint

Single Branch And Reducing Risk

Loose Coupling and Testability

Aggressively Reduce Technical Debt

Test plan vs. strategy

Content selection

Prerequisites

Risk

Coverage

Tooling and Infrastructure

Process

Utility

Categories

Tags