Vertical Slicing and Product Backlog Management with the Gherkin Syntax

TL;DR: To create a product backlog, “vertically slice” your user stories by grouping similar scenarios. Estimate the work and value of these slices instead of the user story.   

I’ve seen a rise in the demand for “full-stack” developers in the last couple years.

The agile concept of “vertical slicing” made these types of positions very popular.

In a traditional team structure, each person on a team will have knowledge of one layer of an application. When the team attempts to complete some feature, they will have to split the feature into tasks corresponding to layers and then distribute the task to the proper people.

We call this “horizontal slicing”.

If you had a team of “full-stack” developers then you could simply assign a feature to a developer and you could expect them to complete the feature end-to-end with little to no help or coordination.

We call this “vertical slicing”.

This works great in theory, but it has a lot of challenges in practice.

One of these challenges is the exact means of creating high quality product backlog with “vertical slices”.

Enter User Stories

I typically see teams create vertical slices based on “User Stories” and the Gherkin syntax.

The following two code snippets provide examples for two fictitious features: (a) Create Account and (b) Login.

Unfortunately, this sometimes creates very large product backlog items that the team can not deliver quickly.

In particular, I’ve seen this happen when the business dictates that teams can only release  polished and bug-free features.

For example,  it could take a very long time for a team to complete the scenarios “Internal Server Error” and “Wait Too Long” for the feature “Create Account’. Further, those scenarios may not deliver much business value compared to the necessary work.  

In comparison, it could take a very short time for a team to complete the scenario “Valid Account Creation”, and that scenario might have very high business value.

This illustrates that coupling all scenarios together can impede the early and frequent releases we need to create a tight feedback loop between developers and testers or users.

Slice Your Slices

User Stories are not bad, though. We just need a better way to generate vertical slices for our product backlog.

Notice that each user story has multiple scenarios, and that we can conceptually break up each user story into individual scenarios.


Based on this principle, we can create vertical slices by grouping scenarios based on business value.

For example, we could slice our features in the following way.

Feature Vertical Slice Scenario
Create Account Basic Valid Account Creation
Business Rule Violations Duplicate Username
Duplicate Email
User Input Errors Not a Strong Password
Passwords Do Not Match
System Problems Long Wait Time
Internal Server Error
Login Basic Valid Username/Password
Business Rule Violations 1 Invalid Username/Password
Business Rule Violations 2 Too Many Incorrect Attempts
System Problems Long Wait Time
Long Wait Time

Each of these “vertical slices” become product backlog items that we can individually estimate and prioritize.

For example, our fictitious product team could prioritize the “vertical slices” in the following way.

  1. Create Account – Basic
  2. Login – Basic
  3. Login – Business Rule Violations 1
  4. Create Account – User Input Errors
  5. Create Account – Business Rule Violations
  6. Login – Business Rule Violations 2
  7. Create Account – System Problems
  8. Login – System Problems

This allows a more granular approach to creating product backlog items.

As an added benefit, you can leave a “user story” largely undefined so long as you already have its highest priority slices within your product backlog.

This allows you to “groom” your user stories in a “just in time” way.

For example, we created 4 “vertical slices” of the feature “Create Account” in the example above. However, as an alternative, we could simply create the first slice “Create Account – Basic” and not bother with further analysis until someone completes that slice. This could have saved everyone from spending unnecessary time in a grooming session.

I am only providing an illustration, though. Ultimately, the end result depends on the situation and the interaction between team members.


Measuring User Experience with ScalaCheck, Selenium WebDriver, and Six Sigma

I recently stumbled upon an idea that I think can measure defects in user experience, and I want to put it down in writing so I have a starting point for further research.  

The germ of this idea took root in my mind after my last blog post.

In my last blog post, I applied traditional software engineering principles to developing javascript SPAs, and I used automated testing of user stories as an example.

I also happen to have a project where I use scalacheck to automate generative tests for machine learning algorithms and data pipelining architectures.

Further, I happen to have some experience with six sigma from my days working as a defense contractor.

By combining the different disciplines of (a) user story mapping, (b) generative testing, and (c) six sigma, I believe that we can measure the “defects of user experience” inherent in any “system under test”.

Let’s discuss each discipline in turn.

User Story Mapping

User story mapping is an approach to requirements gathering that uses concrete examples of “real world” scenarios to avoid ambiguity.

Each scenario clearly defines the context of the system and how the system should work in a given case, and ideally, describe something that we can easily test with an automated testing framework.

For example, here is a sample “create account” user story

One of the limitations of testing user stories is that they cannot give you a measure of the correctness of your application. This is because to “prove” program correctness with programatic tests we would need to check every single path through our program. 

However, to be fair, the goal of user stories is to gather requirements and provide an “objective” measurement system by which developers, product, and qa can agree to in advance. 

Nevertheless, we still need a means of providing some measure of “program correctness”.

Enter Generative Testing.

Generative Tests

Generative testing tests programs using randomly generated data. This enables you to provide a probabilistic measurements of program correctness. However, this assumes that you know how to setup an experimental design that you can use to measure the accuracy of your program.

For example, the scalacheck documentation provides the following snippet of code that tests the java string class.

If you run scalacheck with StringSpecification as input then scalacheck would randomly generate strings and check whether the properties that you defined in StringSpecification are true.

Here is the result that scalacheck would provide if you ran it with StringSpecification as input.

We can see that scalacheck successfully ran 400 tests against StringSpecification.

Let’s do a little statistical analysis to figure out what this means about our the string class.

According to one view of statistics, every phenomenon has a “true” probability distribution which is really just an abstract mathematical formula, and we use data collection methods to estimate the parameters of the “true” distribution.

We will assume this is the case for this discussion.

Suppose that we do not know anything the String class. Under this assumption, the maximum entropy principle dictates that we assign a 1 to 1 odds to every test that scalacheck runs.

That basically means that we should treat every individual test like a coin flip. This is known as a Bernoulli trial.

Now, some really smart guy named Wassily Hoeffding figured out a formula that we could use to bound the accuracy and precision of an experiment based exclusively on the number of trials. We, unsurprisingly, call it Hoeffding’s inequality.

I will not bother explaining the math. I’m sure that I’d do a horrible job at it.

It would also be boring.

I will instead give a basic breakdown of how the number of trials relate to the accuracy and precision of your experiment.

number of trials margin of error confidence interval
80 10% 95%
115 10% 99%
320 5% 95%
460 5% 99%
2560 2.5% 95%
3680 2.5% 99%
8000 1% 95%
11500 1% 99%

The margin of error measures the accuracy of our experiment and the confidence interval measures the precision of our experiment.

Consider the margin of error as a measurement of the experimental results reliability, and the confidence interval as a measurement of the experimental method’s reliability.

For example, if I had an experiment that used 80 trials and I obtained a point estimate of 50% then this would mean that the “real” value is somewhere between 40% and 60% and that the experiment itself would be correct 95 times out of 100.

In other words, 5% of the time an experiment like this one would generate completely bogus numbers.

Now that I have explained that, let us apply this concept to our StringSpecification object. Based on the fact that we had 400 successful runs we can objectively say that the String class’s “true” accuracy is roughly between 95% – 100%, and that there is only a 1% chance that I am completely wrong.

Easy. Right?

I totally understand if you didn’t understand a single thing of what I just said. Are you still reading?

You might be able to set-up an experimental design and measure the results if you are a statistician. However, it is probably beyond the ability of most people.

It would be nice if there was some general set of methods that we could apply in a cookie cutter way, but still have robust results.

Enter Six Sigma.

Six Sigma

Officially, Six Sigma is a set of techniques and tools for process improvement; so, I do not believe that it is generally applicable to software engineering. However, there are a few six sigma techniques that I think are useful.

For example, we could probably use DPMO to estimate how often out system would create a bad user experience (this is analogous to creating a bad part in a manufacturing process).

DPMO stands for Defects per million opportunities, and it is defined by the formula

Let’s suppose that we decided to use scalacheck to test user stories with randomly generated values.

This would immediately open up the prospect of measuring “user experience” using DPMO.

For example, let’s consider the scenario “Valid Account Information” for the feature “Create Account”.

According to the scenario, there are two things that would make this test fail:

  • not seeing the message “Account Created”
  • not seeing the link to the login screen

Suppose that we ran 200 randomized tests based on this user story, and had 5 instances where we did not see the message “Account Created” and 2 instances where we did not see the link to the login screen.

This means we have 7 defects out of 2 opportunities from 200 samples. Therefore, DMPO = (7 / (200*2)) = 0.0175 * 1,000,000 = 17,500, which implies that if we left our system in its current state then we can expect to see 17,500 defects for every 1,000,000 created accounts.

Notice how much easier the math is compared to the math for pure generative testing.

These are formulas and techniques that an average person could understand and make decisions from.

That is huge.


This is just me thinking out loud and exploring a crazy idea. However, my preliminary research suggests that this is very applicable. I’ve gone through a few other six sigma tools and techniques and the math seems very easy to apply toward the generative testing of user stories.

I did not provide a concrete example of how you would use scalacheck to generatively test user stories because I didn’t want it to distract from the general concept. In a future blog post, I will use an example scenario to walk through a step-by-step process of how it could work in practice.

Stay tuned.

How to Use ES6 Generators to Simplify Your Node.js Selenium Webdriver Scripts

For a couple weeks, I have been using cucumber.js and selenium webdriver for acceptance testing.

For the most part, I’ve been very happy with tools. However, the pervasent use of promises in selenium webdriver annoys me.

For example, consider the following script that opens up a login page, and tests that you can successfully login.

I personally don’t like the 2 levels of indirection from using promises. 

Further, the number of promises will scale with each page you visit.

This leads to very painful reading.

Just recently, I found a way to increase the understandability of selenium webdriver based scripts using es6 generators.

Since node.js does not support es6 yet, you will have to use the babel-node executable to make it work. You can installing babel-node with the following script from the command line:

The following code does the exact same thing as the previous script, but instead uses es6.

The code uses lazy evaluation to stop and continue execution of the script. If you are familiar with functional languages like scala, clojure, or haskell then this type of approach should be second nature to you.

I will give you a high level overview of what is happening:

Step 1: We define a generator function called main.

Step 2: In main, we explicitly tell it to stop execution at two locations. We signal to the javascript engine to stop execution using the keyword yield.

Step 3: We instantiate main and assign the instance to the variable m. The javascript engine will execute the function and stop execution at the first yield statement.

However, after it stops execution, it will call the function visitLoginPage.

Step 4: visitLoginPage will use the variable m to get a reference to the main function, and either tell it to continue or to throw and exception.

Step 5: The main function will continue executing until it reaches the second yield statement.

The javascript engine will stop execution of the function, and it will call the function getUrl.

Step 6: getUrl will continue execution and return a value that we store in currentUrl in the main function.

Now we can print the url to the console.


I’m sure that there are ways alternative ways to simplify selenium webdriver scripts.

I’m not saying that this is the best way. It is just something that I personally like, and I hope that it helps you.

How Frameworks Shackle You, and How to Break Free

I sometimes hate software frameworks. More precisely, I hate their rigidity, and cookie cutter systems of mass production.

Personally, when someone tells me about a cool new feature of a framework, what I really hear is “look at the shine on those handcuffs.”

I’ve defended this position on many occasions. However, I really want to put it down on record so that I can simply point people at this blog post the next time they asks me to explain myself.

Why Use a Framework?

Frameworks are awesome things.

I do not dispute that.

Just from the top of my head, I can list the following reasons why you should use a framework

  1. A framework abstracts low level details.
  2. The community support is invaluable.
  3. The out of the box features enable you not to reinvent the wheel.
  4. A framework usually employs design patterns and “best practices” that enables team development  

I’m sure that you can add even better reasons to this list.

So why would I not be such a big fan?

What Goes Wrong with Frameworks?


Just to reinforce the point, let me say that again: VENDOR LOCK-IN.

There is a natural power asymmetry between the designer/creators of a framework and the users of that framework.

Typically, the users of the framework are the slaves to the framework, and the framework designers are the masters. It doesn’t matter if it is designed by some open source community or large corporation.

In fact, this power asymmetry is why vendors and consultants can make tons of money on “free” software: you are really paying them for their specialized knowledge.

Once you enter into a vendor lock-in, the vendor, consulting company, or consultant can charge you any amount of money they want. Further, they can continue to increase their prices at astronomical rates, and you will have no choice but to pay it.

Further, the features of your application become more dependent on the framework as the system grows larger. This has the added effect of increasing the switching cost should you ever want to move to another framework.

I’ll use a simple thought experiment to demonstrate how and why this happens.

Define a module as any kind of organizational grouping of code. It could be a function, class, package, or namespace.

Suppose that you have two modules: module A and module B. Suppose that module B has features that module A needs; so, we make module A use module B.

In this case we can say that module A depends on module B, and we can visualize that dependency with the following picture.


Suppose that we introduce another module C, and we make module B use module C. This makes module B depend on module C, and consequently makes module A depend on module C.


Suppose that I created some module D that uses module A. That would make module D depend on module A, module B, and module C.


This demonstrates that dependencies are transitive, and every new module we add to the system will progressively make the transitive dependencies worse.

“What’s so bad about that?” you might ask.

Suppose that we make a modification to module C. We could inadvertently break module D, module A, and module B since they depend on module C.

Now, you might argue that a good test suite would prevent that … and you would be right.

However, consider the amount of work to maintain your test suite.

If you changed module C then you would likely need to add test cases for it, and since dependencies are transitive you would likely need to alter/add test cases for all the dependent modules.

That is an exponential growth of complexity; so, good luck with that. (See my post “Testing software is (computationally) hard” for an explanation). 

It gets worse, though.

The transitive dependencies make it nearly impossible to reuse your modules.

Suppose you wanted to reuse module A. Well, you would also need to reuse module B and module C since module A depends on them. However, what if module B and module C don’t do anything that you really need or even worse do something that impedes you.

You have forced a situation where you can’t use the valuable parts of the code without also using the worthless parts of the code.

When faced with this situation, it is often easier to create an entirely new module with the same features of module A. 

In my experience, this happens more often than not.

There is a module in this chain of dependencies that does not suffer this problem: module C.

Module C does not depend on anything; so, it is very easy to reuse. Since it is so easy to reuse, everyone has the incentive to reuse it. Eventually you will have a system that looks like this.


Guess what that center module is: THE FRAMEWORK.


This is a classic example of code immobility.

Code mobility is a software metric that measures the difficulty in moving code. 

Immobile code signals that the architecture of the system doesn’t support decoupling.

You can argue that the framework has so much reusable components that we ought to couple directly to it.

Ultimately, we value code reuse because we want to save time and resource, and reduce redundancy by taking advantage of already existing components.

Isn’t that exactly what the framework gave us?

Well, yes and no.

It depends on what you want to reuse.

Are you trying to reuse infrastructure code? By all means, please use a framework for that.

Are you trying to reuse domain specific business logic? You probably don’t want to couple your business logic directly to a framework.

An example of coupling your business logic directly to a framework is your typical MVC-ORM design. I’ve already explained this in my blog post “MongoDB, MySQL, and ORMS: when and where to use them”; so, I will not elaborate on it, here.

The Best of Both Worlds  

So it seems that we are at an impasse: we want to reuse infrastructure code from a framework, but we also want our business logic to be independent of it.

Can’t we have it both?

Actually, we can.

The direction of the arrow in our dependency graph makes our code immobile.

For example, if module A depends on module B then we have to use module B every time we use module A.

However, if we inverted the dependencies such that module A and module B both depended on a common module — call it module C — then we could reuse module C since it has no dependencies. This makes module C a reusable component; so, this is where you would place your reusable code.

coupling The Dependency Inversion Principle exists to enable this pattern.

The typical repository pattern is the perfect example of how inverting dependencies can decouple your business logic from a framework. I have already talked about it in my post “Contract Tests: or, How I Learned To Stop Worrying and Love the Liskov Substitution Principle”; so, I won’t elaborate on it, here.

So how would you structure your application to support the inverted dependencies?

Well, there are multiple ways you can legitimately do it, but I use the strategy of placing the bulk of the applications code in external libraries.

Under this model, the framework’s views and controllers  only serve as a delivery mechanism for the application. 

Consequently, we can easily move our code to another framework because the components do not depend on the framework in any way.

As an added benefit, we can test the bulk of the application independent of the framework. 

In fact, this is your typical Component Oriented Architecture.

For example, standard Component Oriented Architecture will break large C++ applications into multiple dll files, large JAVA applications into multiple jar files, or large .NET applications into multiple assemblies.

There exists rules about how you structure these packages, however.

You could argue that we would spend too much time with up front design when designing this type of system.

I agree that there is some up-front cost, but that cost is easily off-set by the time savings on maintenance, running tests, adding features, and anything else that we want to do to the system in the future.

We should view writing mobile code as an investment that will pay big dividends over time.

I plan to create a very simple demo application that showcases this design in the near future. Stay tuned.

How to Unit Test Your D3 Applications

TL;DR: you can get the entire code at my repository.

I recently found a way to unit test d3 code, and it has transformed my approach to writing d3 applications.

I want to share this with the d3 community because they typically don’t emphasis unit testing.

Consider the following d3 code.

This code does the following:

  • append a red circle to a canvas
  • change the color of the circle to green when the user clicks on it.

Most of the time, we rely on our eyeballs to verify the correctness of d3 applications.

For example, this is what the canvas looks like when we D3 renders the canvas.


and D3 will render the following when the user clicks on the circle


In this case, we can manually verify that the code works.

Unfortunately, we can’t always rely on human eyeball testing.

However, with the jsdom module, we can programmatically verify whether the browser updated the DOM properly: we can (a) set expectations for our DOM, (b) write d3 code to fulfill our expectations, and (c) automatically run the tests with gulp.

First let’s setup our project.

Setup Project

Create a directory for our project, and initialize it as an npm package

Create a directory for our source code and our tests

Install the necessary npm modules

Create gulpfile.js

Create the d3 Application

Put the following code under src/Circle.js

The Circle module takes two arguments. The first argument is the dom element that we will append the svg canvas to. The second argument is the id we will give circle. We will need this id for testing.

Testing the d3 Application

Create the file test/Circle.spec.js with the following contents

So far we have not added any tests to the test suite. This is only boilerplate code that we will need to write our tests.

The function d3UnitTest encapsulates the jsdom environment for us. We will use this function in each test to alter the dom and test the dom afterwords.

Let’s create our first test.

Append the first test to our test suite.

In this test, we simply create check if the id “circleId” exists within the DOM. We assume that if it exists then the circle exists.

Notice the done parameter in our function. We do this to trigger an asynchronous test. jsdom will run asynchronously. We have to provide a callback to it; so, we manually have to call done() to inform jasmine when we are ready for it to test.

Append the second test to our test suite.

Append the third test to our test suite

This code simulates a click on the circle. You have to know a little bit about the internals of d3 to make it work. In this case, we have to pass the circle object itself, and the datum associated with the circle to the click function.

FYI, I couldn’t find a way to get jsdom to simulate user behavior; so, you have to understand the d3 internals if you want to test user behavior. If you know how to use jsdom to simulate user behavior then please let me know how you do it because this method is very error prone. 

Run the Tests

When we run gulp from the command line we should see the following


I understand that I gave a pretty simple and contrived example. However, the principles that I demonstrated apply to more sophisticated situations. I can vouch for this because I have already written tests for very complicated behaviors.

There is a downside to this approach. You have to use node.js and write your d3 code as node modules. Some people might not like to have this dependency. Personally, I don’t find it very inconvenient. In fact, it has forced me to write very module d3 code.

It is also extremely easy to deploy my d3 code using browserify with this method.
Your milage may vary with this approach. I’m sure that there are ways you can achieve the same result with browser based tools. For example, I’m pretty sure that you could do something similar with protractor.

Contract Tests or: How I Learned to Stop Worrying and Love The Liskov Substitution Principle

We software developers often regret past design decisions because we get stuck with their consequences. As an industry, we face this challenge so much that we have a name for it: accidental complexity.

Developers introduce “accidental complexity” when they design interfaces or system routines that unnecessarily impedes future development.

For example, I might decide to use a database to persist application state, but later I might realize that using a database introduces scalability problems. However, by the time I realize this, all my business logic depends on the database; so, I can’t easily change the persistence mechanism because I coupled it with the business logic.

In this hypothetical example, I only care about persisting application state, but I don’t necessarily care how I persist it or where I persist it. I could theoretically use any persistence mechanism. However, I “accidentally” coupled myself to a database, and that cause the “accidental complexity”.

I see this frequently happen with applications that use Object Relational Mappers (ORMs). Consider the following code snippet:

The Person class has special annotations from the Doctrine ORM framework. It allows me to “automatically” persist information to a database based on the annotation values. This significantly simplifies the persistence logic.

For example, if I wanted to save a new Person to a database, I could do it quite easily with the following code snippet.

However, as a consequence of our “simplification”, I have also coupled two separate responsibilities: (a) the business logic, and (b) the persistence logic.

Unfortunately, any complex situation will force us to make difficult trade-offs, and sometimes we don’t always have the information we need to make proper decisions. This situation can force us to make early decisions that unnecessarily introduce “accidental complexity”.

Fortunately, we have a tool that can let us defer implementation details: the Abstract Data Type (ADT).

Abstraction to the Rescue

An ADT provides me the means to separate “what” a module does from “how” a module does it — we define a module as some “useful” organization of code.

Object oriented programming languages typically use interfaces and classes to implement the concept of an ADT.

For example, suppose that I defined a Person class in PHP

I could use an interface to define a PersonRepository to signal to the developer that this module will (a) returns a Person object from persistent storage, and (b) save a Person object to persistant storage. However, this interface only signals the “what”; not the “how”.

Suppose that I wanted to use a MySql database to persist information. I could do this with a MySqlPersonRepository class that implements the PersonRepository interface.

This class defines (a) “how” to find a Person from a database, and (b) “how” to save a Person to a database.

If I wanted to change the implementation to MongoDb then I could potentially use the following class.

This would enable us to write code like the following.

Notice how I don’t make any reference to a particular style of persistence in the code above. This “separation of concerns” allows me to switch implementations at runtime by changing the definition of AppFactory::getRepositoryFactory.

To use a MySql database, I could use the following class definition.

and if we wanted to use a MongoDb datastore then we could use the following class definition

By simply changing one line of code, I can change how the entire application persists information.

I HAVE THE POWER … of the Liskov Substitution Principle

Recall, the original thought experiment: I originally used a MySql database to persist application state, but later needed to use MongoDb. However, I could not easily move the persistence algorithms because I coupled the business logic to the persistence mechanism (i.e. ORM).

Mixing the two concerns made it hard to change persistence mechanism because it also required changing the business logic. However, when I separated the business logic from the persistence mechanism, I could make independent design decisions based on my needs.

The power to switch implementations comes from the “Liskov Substitution Principle”.

Informally, the Liskov Substitution Principle states that if I bind a variable X to type T then I can substitute any subtype S of T to the variable X.

In the example above, I had a type PersonRepository, and two subtypes (a) MySqlPersonRepository, and (b) MongoDbPersonRepository. The Liskov Substitution Principle states that I should be able to substitute either subtype for a variable PersonRepository.

We call this “behavioral subtyping”. This type of subtyping differs from traditional subtyping because behavioral subtyping demands that the runtime behaviors of subtypes behave in a consistent way to the parent type.

Everybody (and Everything) Lies

Just because a piece of code claims to do something, does not imply that it actually does do it. When dealing with real implementations of an ADT, we need to consider that our implementations could lie.

For example, I could accidentally forgot to save the id of the Person object properly in the MongoDB implementation; so, while I intended to follow the “Liskov Substitution Principle”, my execution failed to implement it properly.

Unfortunately, we cannot rely on the compiler to catch these errors.

We need a way to test the runtime behaviors of classes that implement interfaces. This will verify that we at least have some partial correctness to our application.

We call these “contract tests”.

Trust But Verify

Assume that we wanted to place some behavioral restrictions on the interface PersonRepository. We could design a special class with the responsibility of testing those rules.

Consider the following class

Notice how we use the abstract function “getPersonRepository” in each test. We can defer the implementation of our PersonRepository to some subclass of PersonRepositoryContractTest, and execute our tests on the subclass that implements PersonRepositoryContractTest.

For example, we could test the functionality of a MySql implementation using the following code:

and if we wanted to test a MongoDB implementation then we could use the following code:

This shows that we can reuse all the tests we wrote. Now we can easily test an arbitrary number of implementations.


Of course, in practice, there are many different ways of implementing contract tests; so, you may not want to use this particular method. I only want you to take away the fact that not only can you implement contract tests, but that you can do it in a simple and natural way.

A Tale of Two Approaches: Inside-Out vs Outside-In Testing

I recently had a conversation with a test automation engineer about the virtues of outside-in design vs. inside-out design. I prefer outside-in design while he prefers inside-out design.

I want to tell this story because it clearly demonstrates the current culture war for the heart and minds of developers with respect for testing/design methodologies.

Our Objective

Our disagreement started when the test engineer brought up automated testing. While we both agreed on the virtues of automated testing, we disagreed on the specifics of how we should approach automated testing.

He uses the traditional TDD approach of red-green-refactor for unit tests before writing integration tests (i.e. bottom-up), while I use the approach of writing behavioral tests first, integration tests with stubs second, and unit tests last (i.e. top-down).

Neither of us backed down on this; so, we spared with different types of argument and counter-arguments.

Our Conflict

Our disagreement seemed to center around different value systems: the test engineer valued “program correctness”, while I valued “interface design” feedback.

The test engineer argued that you can’t prove program correctness when you start with an integration test. According to him, you have to “prove” the correctness of isolated modules before you can “prove” the correctness of their integration.

His statement of “proving” program correctness shocked me (especially since it came from an automation test engineer). I mentioned that you can’t actually “prove” program correctness with integration tests, and that you can only use them to falsify program correctness. Further, even if you could “prove” program correctness with integration tests then it would be an NP-Hard problem.

He tried to counter me with several arguments, but I felt that they equated to “grasping at straws”.

Eventually, I told him that I don’t use outside-in testing to “prove” program correctness, anyway: I use it as a means to design interfaces.

By definition, an interface constrains the design of a class.

I said that you can not effectively determine the constraints of an interface working from the inside-out because subclasses depend on the interface (not the other way around). Consequently, most people “discover” the constraints of an interface by constantly refactoring.

I argued that you can more effectively “discover” the constraints of an interface by working from the outside-in. How I want to call the methods of an interface determines the types of constraints I place on the interface. The design feedback I get for interface design working from the outside-in has incredible valuable for me.

I drew the following diagram to help illustrate this point.


The solid lines represent the source code dependencies, while the dashed line represents the runtime dependencies. This illustrates that at compile time both the caller and the callee have a dependency on the interface, but when the program runs the caller has a dependency on the callee.

Object Oriented polymorphism gets its power from this asymmetry between the source code dependency and the runtime dependency. For example, when the compiler runs, it will check that the callee obeys the interface properly. However, when the application actually runs, the runtime engine will swap out the interface for the actual concrete class.

A draw decision

In the end, neither of us could convince the other of the justness of our respective causes. We simply agreed to disagree, and we each went our separate ways.

Ultimately, everyone has to make trade-offs between competing concerns. However, given the nature of object orientation, it seems to me that we should value the design of interfaces over implementation details. Consequently, we should choose a methodology that streamlines that process. Otherwise, why even use an OO language.