October 28th, 2008 | Published in Google Testing
So you're working on TheFinalApp - the ultimate end-user application, with lots of good features and a really neat GUI. You have a team that's keen on testing and a level of unit test coverage that others only dream of. The star of the show is your suite of automatic GUI end-to-end tests — your team doesn't have to manually test every release candidate.
Life would be good if only the GUI tests weren't so flaky. Every once and again, your test case clicks a menu item too early, while the menu is still opening. Or it double-clicks to open a tree node, tries to verify the open too early, then retries, which closes the node (oops). You have tried adding sleep statements, which has helped somewhat, but has also slowed down your tests.
Why all this pain? Because GUIs are not designed to synchronize with other computer programs. They are designed to synchronize with human beings, which are not like computers:
- Humans act much more slowly. Well-honed GUI test robots drive GUIs at near theoretical maximum speed.
- Humans are much better at observing the GUI, and they react intelligently to what they see.
- Humans extract more meaningful information from a GUI.
In contrast to testing a server, where you usually find enough methods or messages in the server API to synchronize the testing with the server, a GUI application usually lacks these means of synchronization. As a result, a running automated GUI test often consists of one long sequence of race conditions between the automated test and the application under test.
GUI test synchronization boils down to the question: Is the app under test finished with what it's doing? "What it's doing" may be small, like displaying a combo box, or big, like a business transaction. Whatever "it" is, the test must be able to tell whether "it" is finished. Maybe you want to test something while "it" is underway, like verify that the browser icon is rotating while a page is loading. Maybe you want to deliberately click the "Submit" button again in the middle of a transaction to verify that nothing bad happens. But usually, you want to wait until "it" is done.
How to find out whether "it" is done? Ask! Let your test case ask your GUI app. In other words: provide one or several test hooks suitable for your synchronization needs.
The questions to ask depend on the type, platform, and architecture of your application. Here are three questions that worked for me when dealing with a single-threaded Win32 MFC database app:
The first is a question for the OS. The Win32 API provides a function to wait while a process has pending input events:
DWORD WaitForInputIdle(HANDLE hProcess, DWORD dwMilliseconds). Choosing the shortest possible timeout (dwMilliseconds = 1) effectively turns this from a wait-for to a check-if function, so you can explicitly control the waiting loop; for example, to combine several different check functions. Reasoning: If the GUI app has pending input, it's surely not ready for new input.
The second question is: Is the GUI app's message queue empty? I did this with a test hook, in this case a WM_USER message; it could perhaps also be done by calling PeekMessage() in the GUI app's process context via CreateRemoteThread(). Reasoning: If the GUI app still has messages in its queue, it's not yet ready for new input.
The third is more like sending a probe than a question, but again using a test hook. The test framework resets a certain flag in the GUI app (synchronously) and then (asynchronously) posts a WM_USER message into the app's message queue that, upon being processed, sets this flag. Now the test framework checks periodically (and synchronously again) to see whether the flag has been set. Once it has, you know the posted message has been processed. Reasoning: When the posted message (the probe) has been processed, then surely messages and events sent earlier to the GUI app have been processed. Of course, for multi-threaded applications this might be more complex.
These three synchronization techniques resulted in fast and stable test execution, without any test flakiness due to timing issues. All without sleeps, except in the synchronization loop.
Applying this idea to different platforms requires finding the right questions to ask and the right way to ask them. I'd be interested to hear if someone has done something similar, e.g. for an Ajax application. A query into the server to check if any XML responses are pending, perhaps?