Why I Failed to Use Calabash for Mobile Testing


“When I give up, I write it down. Someday I will pick myself up from the fall.”


I admit that I failed to use Calabash for mobile testing. It was a big failure, but a beautiful one; I call it reckless passion. Although I learned a lot from it, I never shared those lessons in public. Afterwards, I changed my focus and gave up on traditional UI automation.

This week, I had a few short conversations with our awesome developers about test automation. The idea was to implement dedicated automated UI tests to check each new product release automatically, instead of the whole team sanity-checking it manually. I suddenly realized that I had never shared my failure with them. In this post I'd like to write down my recklessness and what I learned, so that if the team decides to move further with test automation, we can proceed with caution.


My story begins back in September 2015, on a sunny day. At that moment, I was confident that our mobile apps were mature enough for automated testing, and that day I started drafting a test automation plan. Over the following week I planned everything: I finished my research and found a suitable framework, started my experiment, and even announced my plan to the team on a non-meeting Tuesday…

Chapter I: huge ambitions but no skills

I successfully sold my big picture of a test automation future, without any opposition. With UI test automation, I had my dreams:

  • A dream of a brilliant and efficient release process

  • A dream of full user behavior coverage

  • A dream of no painful manual regression testing

My initial plan was to follow the steps below:

  • Step 1: Build the main structure of test automation framework and create general functions

  • Step 2: Create test scenarios to replace all manual sanity checks of the app's core features

  • Step 3: Add new test scenarios to check the acceptance criteria of user stories and to verify bug fixes

To start the first step, I chose Calabash[1] as the framework, because it is:

  • Cross-platform: Android and iOS

  • BDD friendly

  • Pure UI level, no need to dive deep into the code

To me it looked really promising, even though I had little knowledge of the tool or of scripting in Ruby. My strong passion carried me into implementing Calabash on the Android side. It took me at least two sprints of getting my hands very dirty: I did my best to script, debug, hack… As the final output, I had created tons of functions binding general actions to Gherkin syntax.

Now it was time to move to the 2nd step, but I soon found that in reality it is hard to simulate a user's general actions. Take a simple action like scrolling down a list: how many times should it scroll? How far? Should it stop at the end, or somewhere at random?
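To make the scrolling problem concrete, here is a minimal Ruby sketch of how such a step might be parameterized (names and the step pattern are illustrative, not from the original project). The real Calabash swipe call would normally sit where the injected lambda is, so the parameterization logic can be shown without a device attached:

```ruby
# Sketch of a parameterized scroll step. The real device interaction
# (e.g. a Calabash scroll call) is injected as a lambda so the logic
# runs standalone; this is an illustration, not production code.

STEP_PATTERN = /^I scroll down (\d+) times?$/

def run_scroll_step(step_text, scroller)
  match = STEP_PATTERN.match(step_text)
  raise ArgumentError, "no step matches: #{step_text}" unless match
  count = match[1].to_i
  count.times { scroller.call(:down) }  # one bounded swipe per iteration
  count
end

# Usage: record the swipes instead of driving a real emulator.
swipes = []
performed = run_scroll_step("I scroll down 3 times", ->(dir) { swipes << dir })
puts performed        # 3
puts swipes.inspect   # [:down, :down, :down]
```

Even this tiny example forces a design decision the prose describes: the step must fix the swipe count up front, because "scroll until the end" is not directly observable from a black-box UI test.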

Then I had to go back and forth between step 1 and step 2: while creating new test scenarios, I was also updating the general action functions… It felt troublesome.

Next time I will ask myself first…

  • During planning, “What’s the purpose of automation?”, “What’s the limitation of automation?” and “Where to start to automate checks?”[2]

  • When choosing a tool or framework, “What are the pros and the cons of the tool?”, “Does the tool fit all my needs?” and “Do I have the knowledge to master the tool?”

  • Before starting implementation, “What’s the testability of the product?” and “Do I have a clear architecture and design of test automation?”

Chapter II: unstable execution and unreliable result

Finally, I replaced almost all sanity checks with automated checks. I planned to announce the results to the team proudly, but I didn't, because after running the tests several times I found that test execution was not stable and test results were not reliable.

Unstable test execution

I created some Android emulators to launch the tests on, but the emulators were usually laggy. Frequent timeouts in some test steps often destroyed the whole test execution, so I had to launch the tests on real devices instead. Timeout issues became fewer, but after one snapshot version, executions were blocked by some small UI changes. After adapting to all the UI changes, I thought execution would be smooth, but I got the opposite. Due to differences between test environments, some test cases were only executable in the production environment, so I had to add a mechanism to detect and switch environments.
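The environment-switching mechanism can be sketched roughly like this (a Ruby illustration under my own assumptions; the environment names and case tags are hypothetical). Each test case declares which environments it can run in, and the runner selects only the cases the current environment supports:

```ruby
# Sketch of environment-aware test selection (illustrative names).
# Cases that require production-only data are skipped elsewhere.

ENVIRONMENTS = %i[staging production].freeze

TestCase = Struct.new(:name, :runs_in) do
  def runnable_in?(env)
    runs_in.include?(env)
  end
end

def select_cases(cases, current_env)
  raise ArgumentError, "unknown env: #{current_env}" unless ENVIRONMENTS.include?(current_env)
  cases.select { |c| c.runnable_in?(current_env) }
end

cases = [
  TestCase.new("login",   %i[staging production]),
  TestCase.new("payment", %i[production]),  # needs the real payment backend
]

puts select_cases(cases, :staging).map(&:name).inspect     # ["login"]
puts select_cases(cases, :production).map(&:name).inspect  # ["login", "payment"]
```

Declaring the constraint on the case itself, rather than branching inside test steps, keeps the detection logic in one place.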

Unreliable test result

The worst thing in test automation is the Schrödinger test, a.k.a. the flaky test. Like a haunting ghost, you can never be sure whether a run will pass or fail. I got flaky tests from the timing of actions, from too many preconditions to fulfill, from changeable input data in the production environment, from bugs in the framework…
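One common way to cope (which I did not have at the time) is a retry-then-report policy: rerun a failing check a bounded number of times and only report a failure if every attempt fails. A minimal Ruby sketch, assuming a check is just a block returning true or false (real runners such as rspec-retry do this more robustly):

```ruby
# Minimal retry policy for flaky checks (illustrative sketch).
# A check passes if any of the first `attempts` runs succeeds;
# otherwise it is reported as a genuine failure.

def run_with_retries(attempts: 3)
  attempts.times do |i|
    return { status: :passed, runs: i + 1 } if yield
  end
  { status: :failed, runs: attempts }
end

# Simulate a flaky check that fails twice, then passes.
outcomes = [false, false, true].each
result = run_with_retries(attempts: 3) { outcomes.next }
puts result[:status]  # passed
puts result[:runs]    # 3
```

Retrying only masks flakiness, of course; recording the run count (as above) at least makes flaky cases visible so they can be quarantined or fixed.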

A huge test case also causes unreliable results. Imagine a 20-step test case: if it fails at the 5th step, hidden failures in the remaining steps cannot be detected without continuing the execution. On the surface there is only one failure, but failures in the next 15 steps may be buried underneath. I split some huge test cases into smaller ones, but in some cases one step depended on the app state created by another step, and it took effort to remove those dependencies.
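The dependency-removal idea can be sketched as follows (Ruby, with hypothetical case names and a stand-in for real app setup): each small case builds its own starting state through a shared setup helper, instead of relying on whatever state a previous step left behind, so one failure no longer hides the results of later cases:

```ruby
# Sketch: independent small cases with per-case setup (illustrative).
# `with_fresh_state` stands in for whatever resets the app before a case.

def with_fresh_state(label)
  state = { logged_in: true, screen: :home }  # stand-in for real app setup
  yield state
  "#{label}: passed"
rescue => e
  "#{label}: failed (#{e.message})"
end

results = []
results << with_fresh_state("open settings") { |s| s[:screen] = :settings }
results << with_fresh_state("log out")       { |s| raise "button missing" }
results << with_fresh_state("open profile")  { |s| s[:screen] = :profile }

# The failure in "log out" no longer hides the results of later cases.
puts results
```

The cost, as the text notes, is that every extracted case pays for its own setup, which is exactly the effort it took to remove the dependencies.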

Next time I will ask myself first…

  • Do I have a controllable and stable test environment?

  • Which test environment for which test case?

  • Do I have a process to handle flaky tests? Retry, quarantine or remove?[3]

  • Is a failed test case clear enough to identify bugs?

Chapter III: one-man show and demotivated maintenance

Three months later, I still hadn't moved on to the 3rd step, because my confidence had been torn apart by unreliable results and my passion extinguished by daily maintenance tasks. I told myself it was time to give up. Silently, I stopped putting effort into writing new tests and maintaining the test suite.

Two sprints later, I came back from vacation. I doubted whether the tests were still executable, so I carefully opened the old Calabash repository and curiously executed the scripts. As expected, not a single test would even start.

Since then, no one, including me, has ever mentioned Calabash test automation. I realized my biggest mistake: I should have involved the whole team in the automation project, so it wouldn't have ended up as a ridiculous one-man show.

Next time I will ask myself first…

  • Am I the only one who maintains the whole automation project?

  • Can I manage to update the test suite immediately after code changes?

  • How do I measure whether the test automation is helpful or not?


The sad story is over. It triggered a change in my definition of test automation. Now I see test automation as a tool[4] and as tester augmentation[5]. Following this definition, I have made a lot of scripts that help me explore products efficiently.

But somewhere in my mind, I still assent to the traditional definition of test automation; somewhere in my heart, I still have a passion for creating perfect UI automation for mobile testing. Next time, when I restart, I will ask myself the questions above, and explore another path toward successful “UI automation”.

[1] Calabash homepage
[2] Oleksii Burdin: Where to start to automate your checks?
[3] Walmyr Filho: The importance of dealing with flaky tests
[4] Michael Bolton: A Context-Driven Approach to Automation in Testing
[5] Josh Meieter: Tester Augmentation, Not Test Automation