
Behaviour Driven Incident Analysis

Blog post by Andrei Petraru
5 min read

Efficient incident analysis is a major objective for any SaaS provider, and behaviour driven techniques can significantly shorten incident response times.

Why Behaviour Driven Development?

Behaviour Driven Development (BDD) is an offshoot of Test Driven Development in which domain experts and engineers describe requirements that then serve as the basis for software tests. BDD tests sit at the top of the testing pyramid: they are end-to-end tests that check the behaviour of the entire system. For example, a test might specify that a loan account can be paid off whenever a client wants, or that a revolving loan account can be disbursed to a client if it was previously approved.

In software development, engineers use BDD scenarios to validate system requirements with stakeholders. Scenarios are written at the highest level of abstraction available: all parties involved can understand them, and they can be automated as User Acceptance Tests. BDD scenarios should be easy to add and understand, and they should closely mimic the application flows. Software engineers can take these advantages into account when deciding how best to reproduce incidents reported by end users.
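To make the revolving loan example concrete, here is a minimal sketch of that requirement written as plain-Python tests in Given/When/Then form, without any BDD framework. The `RevolvingLoanAccount` class, its fields and the amounts are hypothetical, chosen only to show how a business rule reads as a scenario.

```python
from dataclasses import dataclass


@dataclass
class RevolvingLoanAccount:
    """Hypothetical, minimal account model used only for this illustration."""
    account_id: int
    approved: bool = False
    disbursed_cents: int = 0

    def approve(self) -> None:
        self.approved = True

    def disburse(self, amount_cents: int) -> None:
        # Rule from the example: only a previously approved account can be disbursed.
        if not self.approved:
            raise ValueError(f"account {self.account_id} is not approved")
        self.disbursed_cents += amount_cents


def scenario_approved_account_can_be_disbursed() -> RevolvingLoanAccount:
    # Given a revolving loan account that was previously approved
    account = RevolvingLoanAccount(account_id=123)
    account.approve()
    # When it is disbursed to the client
    account.disburse(100_000)
    # Then the disbursed amount reflects the payout
    assert account.disbursed_cents == 100_000
    return account


def scenario_unapproved_account_cannot_be_disbursed() -> None:
    # Given a revolving loan account that was never approved
    account = RevolvingLoanAccount(account_id=456)
    # When a disbursement is attempted, then it is rejected
    try:
        account.disburse(100_000)
    except ValueError:
        return
    raise AssertionError("an unapproved account must not be disbursed")


if __name__ == "__main__":
    scenario_approved_account_can_be_disbursed()
    scenario_unapproved_account_cannot_be_disbursed()
    print("both scenarios passed")
```

Each scenario reads as a business statement, which is what lets stakeholders validate it; a BDD framework would additionally let the same text be written in natural language.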

Why are our engineers using BDD for incident analysis?

When someone raises an incident, they describe the action(s) performed and the expected result, for example: the revolving loan account with ID 123 cannot be paid out because an error pops up. An on-duty software engineer then handles the incident directly. One common approach to determining its cause is to try to reproduce it on a testing system, in order to have a clear setup and the steps needed to isolate the bug. This process has some drawbacks:

  • Reproduction steps are difficult to discover because the user who reported the issue might not be a technical person, and the details provided are often minimal.
  • It’s difficult to configure the testing system in the same way as the production system.
  • In some cases, reproducing the incident requires adjusting the system time (including time zone configuration), which is prone to errors or data corruption.
  • Many manual steps must be performed in a UI application (web, mobile or desktop) every time the issue is reproduced.

Instead of chasing the perfect reproduction steps for an incident, it is easier to use the data available on the production system and execute the final user action to reproduce the reported issue. A prerequisite for this technique is the ability to generate, as quickly as possible, a data dump from the production system that contains only the minimal data required to reproduce the issue. For example, a relational database dump containing only the data for account ID 123 is required, instead of a dump of the entire database with its millions of accounts.
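As a sketch of what such a minimal dump can look like, the snippet below copies only the rows of one account into a fresh database and serialises it with the standard `sqlite3` module's `Connection.iterdump()`. SQLite, the two-table schema and the column names are assumptions made for illustration; the post does not describe the production database or its export tooling, and a real schema would require following every related table through its foreign keys.

```python
import sqlite3

# Hypothetical two-table schema; the real production schema is not described.
SCHEMA = """
CREATE TABLE accounts (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES accounts(id),
    amount_cents INTEGER NOT NULL
);
"""


def dump_account(source: sqlite3.Connection, account_id: int) -> str:
    """Return an SQL script that recreates only the data of one account."""
    target = sqlite3.connect(":memory:")
    target.executescript(SCHEMA)
    # Copy the account row itself...
    accounts = source.execute(
        "SELECT id, state FROM accounts WHERE id = ?", (account_id,)
    ).fetchall()
    target.executemany("INSERT INTO accounts VALUES (?, ?)", accounts)
    # ...and every transaction that references it, skipping all other accounts.
    transactions = source.execute(
        "SELECT id, account_id, amount_cents FROM transactions WHERE account_id = ?",
        (account_id,),
    ).fetchall()
    target.executemany("INSERT INTO transactions VALUES (?, ?, ?)", transactions)
    target.commit()
    # iterdump() yields the SQL statements needed to rebuild the database.
    script = "\n".join(target.iterdump())
    target.close()
    return script


if __name__ == "__main__":
    prod = sqlite3.connect(":memory:")
    prod.executescript(SCHEMA)
    prod.executemany("INSERT INTO accounts VALUES (?, ?)", [(123, "APPROVED"), (124, "ACTIVE")])
    prod.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [(1, 123, 500_000), (2, 124, 70_000), (3, 123, -250_000)],
    )
    print(dump_account(prod, 123))
```

The resulting script can be saved as a file such as `db123.sql` and replayed on a local database engine; its size depends on one account rather than on the whole tenant.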

Software engineers can load the provided database dump on their local systems and combine it with BDD testing to reproduce and investigate the reported incident with little effort. This strategy has multiple advantages:

  • The issue is reproduced quickly on a local system.
  • A software engineer can start debugging immediately, without losing time figuring out reproduction steps.
  • Thanks to BDD automation, the issue can be reproduced automatically as many times as needed until a fix is found.
  • Environment-specific settings are isolated: they are configured in the virtual machine that runs the BDD scenarios, so local machine settings remain untouched.
  • Running a BDD scenario is a faster and less system-intensive process than running the full-fledged application.

The main focus is on fast debugging of reported incidents. If the cause of the incident turns out to be a corner case, adding a test from the top of the testing pyramid is not recommended; it is better to add unit tests as close as possible to the code that must change to fix the reported incident.
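To illustrate the difference in scope, here is a minimal sketch of such a unit test using Python's standard `unittest` module. The `remaining_limit` helper and its corner cases are hypothetical; they stand in for whatever code the fix actually touches.

```python
import unittest


def remaining_limit(credit_limit_cents: int, disbursed_cents: int) -> int:
    """Hypothetical helper: how much of a revolving limit is still available."""
    # Never report a negative amount, even if the account is over-disbursed.
    return max(credit_limit_cents - disbursed_cents, 0)


class RemainingLimitTest(unittest.TestCase):
    # Corner cases are pinned down next to the code, not in an end-to-end scenario.
    def test_fully_disbursed_limit_leaves_nothing(self):
        self.assertEqual(remaining_limit(100_000, 100_000), 0)

    def test_over_disbursed_account_is_not_negative(self):
        self.assertEqual(remaining_limit(100_000, 120_000), 0)
```

Run it with `python -m unittest`; unlike a BDD scenario, it needs no database dump, no simulated system time and no logged-in user, so it stays fast and cheap to keep in the suite.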

Mambu engineers saw the effort needed to start the actual debugging process drop from around ten minutes to under two minutes. Moreover, a debugging session can be run immediately after a fix is implemented, reducing the time needed to validate it. For complex incidents that require multiple debugging sessions to pinpoint the source of the problem, reducing the effort per session adds up to a large improvement over the whole investigation.

Crafting steps together

Suppose that the database dump is already available and is named “db123.sql”. The setup required to debug the reported incident is:

  1. Define a BDD scenario step that imports the database dump into a local SQL instance. It is recommended to host the database engine in a Docker container.
    Given tenant loaded from db123.sql into container kdev
  2. Define BDD scenario steps that mimic the state of the production system, for example the system time and the user who is logged in and performs the actions.
    Given system time is 2020-01-24
    And user demo is logged in
  3. Define a BDD scenario step that performs the user's action, in this case disbursing (paying out) the loan account with the specified ID:
    When pay out loan account with id 123
  4. Run the BDD scenario in debug mode and investigate the source of the problem.
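The four steps above could be wired to code roughly as follows. In practice a BDD framework such as Cucumber or behave provides the step-matching machinery; this self-contained Python sketch reimplements a tiny version of it with regular expressions so the idea is visible. The step functions, the `Context` fields, the in-memory SQLite database standing in for the Docker-hosted engine, and the rule that only `APPROVED` accounts can be paid out are all hypothetical.

```python
import re
import sqlite3
import tempfile
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import Optional

STEPS = []  # (compiled regex, step function) pairs registered by @step


def step(pattern):
    """Register a step function whose text must fully match `pattern`."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@dataclass
class Context:
    """State shared by the steps of one scenario run."""
    dump_dir: Path
    db: Optional[sqlite3.Connection] = None
    container: str = ""
    today: Optional[date] = None
    user: str = ""
    error: Optional[str] = None


@step(r"tenant loaded from (\S+) into container (\S+)")
def load_tenant(ctx, dump_file, container):
    # Assumption: an in-memory SQLite database stands in for the database
    # engine that would run in the Docker container named `container`.
    ctx.container = container
    ctx.db = sqlite3.connect(":memory:")
    ctx.db.executescript((ctx.dump_dir / dump_file).read_text())


@step(r"system time is (\d{4}-\d{2}-\d{2})")
def set_system_time(ctx, iso_day):
    # Pin a clock in the scenario context instead of changing the OS clock.
    ctx.today = date.fromisoformat(iso_day)


@step(r"user (\w+) is logged in")
def log_in(ctx, user):
    ctx.user = user


@step(r"pay out loan account with id (\d+)")
def pay_out(ctx, account_id):
    # Hypothetical stand-in for the real application call; set a breakpoint
    # here (or in the code it calls) when running the scenario in debug mode.
    row = ctx.db.execute(
        "SELECT state FROM accounts WHERE id = ?", (int(account_id),)
    ).fetchone()
    if row is None or row[0] != "APPROVED":
        ctx.error = f"account {account_id} cannot be paid out"


def run_scenario(text, ctx):
    """Match each non-blank line, minus its Gherkin keyword, to a step."""
    for line in text.splitlines():
        body = re.sub(r"^(Given|When|Then|And|But)\s+", "", line.strip())
        if not body:
            continue
        for pattern, func in STEPS:
            match = pattern.fullmatch(body)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step definition for: {body!r}")
    return ctx


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A tiny stand-in for db123.sql; the real dump comes from production.
        Path(tmp, "db123.sql").write_text(
            "CREATE TABLE accounts (id INTEGER PRIMARY KEY, state TEXT);\n"
            "INSERT INTO accounts VALUES (123, 'CLOSED');\n"
        )
        scenario = """
            Given tenant loaded from db123.sql into container kdev
            Given system time is 2020-01-24
            And user demo is logged in
            When pay out loan account with id 123
        """
        ctx = run_scenario(scenario, Context(dump_dir=Path(tmp)))
        print(ctx.today, ctx.user, ctx.error)
```

Keeping the system time in the scenario context, rather than changing the machine clock, is what keeps local settings untouched between runs.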

What does the future hold?

By building the steps presented above into an incident analysis process, our engineers made the painful process of debugging an issue smoother, with fewer manual interactions with the system. Feedback from developers adopting behaviour driven debugging was encouraging, and some difficult issues were resolved faster.

The behaviour driven approach to incident analysis was adopted because of the numerous advantages that automation brings to software development. Organisations should adopt any process that helps developers do their job faster, and the return on investment for such automation is high in the long term.

Andrei Petraru

Andrei, Mambu's Lending Revolving & Credit Cards System Owner, has over 9 years of hands-on programming experience with desktop, server-side and mobile applications. Andrei is passionate about developing applications that meet customers' needs at the highest level of quality. As a System Owner at Mambu, Andrei delivers high-quality implementations of lending features by using innovative testing strategies and by closely managing the development process within teams.
