Posted by & filed under XP.

At Unruly we have a quarterly whole-company hack day that we call Oneruly day. Hackdays allow the whole company to focus on one thing for a day.

Unlike our 20% time, which is time for individuals to work on what is most important to them, Hackdays are time for everyone to rally around a common goal.

In product development we run this in true Unruly style: avoiding rules or control. We do have a lightweight process that seems to work well for self-organisation in a group this size (~50 people).

Self Organisation

During the week in the run-up to Oneruly day we set up a whiteboard in the middle of the office with the topic written on it. Anyone with an idea for something related to that topic that we could work on writes it on an oversized postit note and pops it up on the board.

On the day itself there’s usually a last minute flurry of ideas added to the board, and the whole product development team (some 50-60 people) all gather around. We go through the ideas one by one. The proposer pitches their idea for around 60 seconds, explaining why it’s important/interesting, and why others might want to work on it.

Once we’ve heard the pitches, the proposers take their oversized postit and spread out, everyone else goes and joins one of the people with postits—forming small teams aligned around an interest in a topic.

Each group then finds a desk/workstation to use and starts discussing what they want to achieve & the best way of going about it.

This is facilitated by our pair-programming friendly office space—having workstations that are all set up the same, with large desks and plenty of space for groups to gather round, in any part of the office.

Usually each group ends up self-organising into either a large mob (multiple developers all working on the same thing, at the same time, on the same workstation), or a couple of smaller pairs or mobs (depending on the tasks at hand). Sometimes people will decide that their group has too many people, or that they’re not adding value, and go and help another group instead.

Teams will often split up to investigate different things, explore different options or tackle sub-problems, and then come back together later.

Inspiring Results

We usually wrap up with a show and tell for the last hour of the day. It’s pretty inspiring to see the results from just a few hours’ work…

  • There are great ideas that are commercially strong and genuinely support the goal.
  • We get improvements all the way from idea to production.
  • People step up and take on leadership roles, regardless of their seniority.
  • We learn things in areas that we wouldn’t normally explore.
  • People work effectively in teams that didn’t exist until that morning.
  • There’s a variety of activities from lightweight lean-startup style experiments to improving sustainability of existing systems.

All this despite the lack of any top down direction other than choosing the high level theme for the day.

What can we learn?

Seeing such a large group consistently self-organise to achieve valuable outcomes in a short space of time raises the question: how much are our normally more heavyweight processes and decision-making stifling excellence rather than improving outcomes?

What rules (real or imaginary) could we get rid of? What would happen if we did?

Hackdays are a great opportunity to run a timeboxed experiment of a completely different way of working.

Posted by & filed under Java, Testing, XP.

End to end automated tests written with Webdriver have a reputation for being slow, unreliable (failing for spurious reasons), and brittle (breaking with any change).

So much so that many recommend not using them. They can become a maintenance burden, making it harder, rather than easier, to make changes to the user interface.

However, these tests can be invaluable. They can catch critical bugs before they hit production. They can identify browser-specific bugs, are implementation-agnostic, can check invariants, be used for visual approval tests, can even be used for production monitoring, not to mention retrofitting safety to poorly tested systems.

Despite their reputation, these tests can be kept reliable, fast, and maintainable. There’s no “one weird trick”—it’s mostly a matter of applying the same good practices and discipline that we ought to be applying to any automated tests; end to end tests really turn up the pain from doing it wrong.

Avoid

When I polled a few people for their top tip for writing reliable, fast, and maintainable WebDriver tests, the most common suggestion was, simply…

“Don’t”

They are indeed hard to write well, and they are indeed expensive to maintain; there are easier, better testing tools for checking behaviour.

So don’t use them if you don’t need them. Compared with most other forms of automated testing, they are easier to retrofit later if you change your mind.

Certainly they don’t replace other types of automated tests. Nor can they be a replacement for manual exploratory testing.

Often subcutaneous testing (testing just under the UI layer) can be sufficient to cover important behaviours—if you are disciplined about keeping logic out of your UI.

Unfortunately, that’s particularly hard with web tech, where the presentation itself is often complex enough to need testing; behaviour can work perfectly in most browsers or in a simulated environment, and still fail spectacularly in just one browser.

We often see the pain of maintaining end to end tests, but there’s also lots of value…

Tackling Risk

I work in adtech, where the real user experience in real browsers is really, really important.

This might sound like an odd statement: who likes ads? Who would mind if they didn’t work?

I’m sure you can remember a poor user experience with an ad. Perhaps it popped up in front of the content you were trying to read, perhaps it blasted sound in your ears and you had to go hunting through your tabs to find the culprit.

I’m guessing these experiences didn’t endear you to the brand that was advertising? User experience is important, politeness is important. Impolite ads come not only from intentionally obnoxious advertisers, but from bugs, and even browser-specific bugs.

We also have an elevated risk: we’re running code out in the wild, on publisher pages, where it interacts with lots of other people’s code. There’s lots that could go wrong. We have a heavy responsibility to avoid any possibility of breaking publisher pages.

However simple our UI, we couldn’t take the risk of not testing it.

Extra Value

If you have invested in end to end tests, there’s lots of opportunities for extracting extra value from them, beyond the obvious.

Multi-device

Once a test has been written, that same test case can be run across multiple browsers & devices. Checking that behaviour has at least some semblance of working on different devices can be incredibly valuable to increase confidence in changes.

Who has time and money to manually test every tiny change on a plethora of devices? Even if you did, how slow would it make your team? Do you want to measure your release lead time in minutes or months?

Approval Tests

Webdriver tests don’t actually check that a user is able to complete an action—they check whether a robot can; they won’t always pick up on visual defects that make a feature unusable.

Approval Tests can help here. Approval tests flag a change in a visual way that a person can quickly evaluate to either approve or reject the change.

We can store a known-good screenshot of how a feature should look, and then automatically compare it to a screenshot generated by a testcase. If they differ (beyond agreed tolerances), the change is flagged to somebody to review.

Webdriver can take screenshots, and can be easily integrated with various approval testing tools & services. If you have an existing suite of webdriver tests, using a selected few for visual approval tests can significantly reduce risk.

Approval tests are deliberately brittle, so you don’t want many of them. They require someone to manually intervene every time there’s a change. However, they can really help spot unexpected changes.

Legacy

Not everyone is fortunate enough to get to work with systems with high levels of automated test coverage. For those who aren’t, tests that drive the UI provide a mechanism for adding some automated test coverage without invasive changes to the application to introduce seams for testing.

Even a few smoke end to end tests for key workflows can significantly increase a team’s confidence to make changes. Lots of diagnosis time can be saved if breakages are identified close to the point in time at which they were introduced.

Invariants

With a suite of end to end tests, one can check invariants—things that should be true in every single testcase, including things that would be hard to test in other ways. These can be asserted in the test suite or with hooks like JUnit rules, without modifying each testcase.
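
As a rough sketch of the idea, here is a JUnit 4 rule that checks one such invariant after every test: no severe browser console errors. The rule and the choice of invariant are illustrative, and the WebDriver logging API used here may need to be enabled via logging preferences in your driver setup.

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.logging.LogEntry;
import org.openqa.selenium.logging.LogType;

import java.util.List;
import java.util.logging.Level;
import java.util.stream.Collectors;

import static org.junit.Assert.assertTrue;

public class NoSevereConsoleErrorsRule implements TestRule {

    private final WebDriver driver;

    public NoSevereConsoleErrorsRule(WebDriver driver) {
        this.driver = driver;
    }

    @Override
    public Statement apply(Statement base, Description description) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                base.evaluate(); // run the test itself first

                // the invariant: no severe console errors logged during any testcase
                List<LogEntry> severe = driver.manage().logs().get(LogType.BROWSER).getAll()
                        .stream()
                        .filter(entry -> entry.getLevel().intValue() >= Level.SEVERE.intValue())
                        .collect(Collectors.toList());

                assertTrue("Console errors during " + description + ": " + severe,
                        severe.isEmpty());
            }
        };
    }
}

Registering it once with @Rule in a shared base test class applies the check to every testcase without touching any of them.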

Sound

People understandably really don’t like it when they get unsolicited sound while they’re browsing.

By capturing audio to the sound device during every webdriver test execution we are able to assert that we don’t have any features that unintentionally trigger sound.

Security

Preexisting test suites can be run with a proxy, such as OWASP ZAP, attached to the browser, and the recordings from the proxy can be used to check for common security vulnerabilities.
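
A minimal sketch of pointing the browser at a locally running intercepting proxy; localhost:8080 is just ZAP’s common default, and the Chrome-specific setup is an assumption for illustration:

import org.openqa.selenium.Proxy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.CapabilityType;

public class ProxiedBrowser {

    public static WebDriver createDriverBehindProxy() {
        // route all browser traffic through the intercepting proxy (e.g. OWASP ZAP)
        Proxy proxy = new Proxy();
        proxy.setHttpProxy("localhost:8080");
        proxy.setSslProxy("localhost:8080");

        ChromeOptions options = new ChromeOptions();
        options.setCapability(CapabilityType.PROXY, proxy);
        return new ChromeDriver(options);
    }
}

The proxy then sees every request the suite makes, and its scan results can be reviewed after the run.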

Download Size

Rules such as “no page may be over 1MB in total size” can be added as assertions across every test.
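
For example, if a BrowserMob-style proxy is capturing a HAR for each test, the assertion can be a simple sum over response sizes. This is a sketch from memory of the BrowserMob HAR accessors, so treat the exact names as assumptions to verify:

import net.lightbody.bmp.BrowserMobProxy;

import static org.junit.Assert.assertTrue;

public class PageWeight {

    private static final long MAX_PAGE_BYTES = 1_000_000; // "no page may be over 1MB"

    public static void assertPageWeightWithinBudget(BrowserMobProxy proxy) {
        long totalBytes = proxy.getHar().getLog().getEntries().stream()
                .mapToLong(entry -> Math.max(0, entry.getResponse().getBodySize()))
                .sum();

        assertTrue("Page weight of " + totalBytes + " bytes exceeds the 1MB budget",
                totalBytes <= MAX_PAGE_BYTES);
    }
}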

Implementation Independent

We have webdriver tests that have survived across multiple implementations & technology stacks.

Desired behaviours often remain the same even when the underlying technology changes.

Webdriver tests are agnostic to the technology used for implementation, and can live longer as a result.

They can also provide confidence that behaviour is unchanged during a migration to a new technology stack. They support incremental migration with the strangler pattern or similar techniques.

Production Monitoring

End to end tests usually check behaviour that should exist and work in production. We usually run these tests in an isolated environment for feedback pre-production.

However, it’s possible to run the same test suites against the production instances of applications and check that the behaviour works there. Often just by changing the URL your tests point to.
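
A lightweight way to do that is to make the base URL configurable, defaulting to the isolated test environment; the property name here is purely illustrative:

public class TestEnvironment {

    // -Dtest.baseUrl=https://www.example.com points the same suite at production
    public static String baseUrl() {
        return System.getProperty("test.baseUrl", "http://localhost:8080");
    }
}

Tests then navigate with driver.get(TestEnvironment.baseUrl() + "/login"), and the same suite runs pre-production or against production depending on the property.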

This unlocks extra value—there are so many reasons that features may not work as expected in production, regardless of whether your application is “up”.

It does require you to find a way to isolate your test data in production, to avoid your tests polluting your production environment.

Inventory Cost

Browser based tests can be made reasonably reliable and kept reasonably fast, but they do have a significant inventory cost. The more tests we have, the more time we need to invest in keeping them reliable and fast.

A 0.01% failure rate might be tolerable with 10 tests but probably isn’t with 1,000 tests.

Testcases that take 5 seconds each to run might be tolerable with 10 tests, but probably aren’t with 1,000 tests (unless they parallelise really well).

There’s also a maintenance cost to keeping the tests working as you change your application. It takes effort to write your tests such that they don’t break with minor UI changes.

The cost of tests can spiral out of control to the point that they’re no longer a net benefit. To stay on top of it requires prioritising test maintenance as seriously as keeping production monitoring checks working; it means deleting tests that aren’t worth fixing “right now” lest they undermine our confidence in the whole suite.

Reliability

End to end tests have a reputation for being unreliable, for good reason.

They’re difficult to get right due to asynchronicity, and have to be tolerant of failure due to the many moving parts and unreliable infrastructure they tend to depend upon.

Test or Implementation?

One of the most common causes for flakey tests is a non-deterministic implementation. It’s easy to blame the test for being unreliable when it fails one in a hundred times.

However, it’s just as likely, if not more likely, to be your implementation that is unreliable.

Could your flakey test be caused by a race condition in your code? Does your code still work when network operations are slow? Does your code behave correctly in the face of errors?

Good diagnostics are essential to answer this question; see below.

Wait for interactivity

A common cause of the tests themselves being unreliable seems to be failing to wait for elements to become interactive.

It’s not always possible to simply click on an element on the page: the element might not have been rendered yet, or it might not be visible yet. Instead, one should wait for the element to become visible and interactive, and then click on it.

These waits should be implicit, not explicit. If you instruct your test to sleep for a second before attempting to click a button, that might work most of the time, but will still fail when there’s a slow network connection. Moreover, your test will be unnecessarily slow most of the time when the button becomes clickable in milliseconds.

WebDriver provides wait APIs (such as WebDriverWait) that allow you to wait for a condition to become true before proceeding. Under the hood they poll the condition.

I prefer defining a wrapper around these waits that allows using a lambda to check a condition – it means we can say something like

waitUntil(confirmationMessage::isDisplayed);

Under the hood this polls a page object to check whether the message is displayed or not, and blocks until it is (or a timeout is reached).
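
A minimal sketch of such a wrapper, built on WebDriverWait (the two-argument, seconds-based constructor shown here is the older Selenium 3 style; newer versions take a Duration, and the timeout value is arbitrary):

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.util.function.BooleanSupplier;

public class Waits {

    private static final long TIMEOUT_SECONDS = 10;

    private final WebDriver driver;

    public Waits(WebDriver driver) {
        this.driver = driver;
    }

    // polls the condition until it returns true, or throws a TimeoutException
    public void waitUntil(BooleanSupplier condition) {
        new WebDriverWait(driver, TIMEOUT_SECONDS)
                .until(ignored -> condition.getAsBoolean());
    }
}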

Wait, don’t Assert

We’re used to writing assertions in automated tests like

assertEquals("Hello World", confirmationMessage.text());

or

assertThat(confirmationMessage.text(), is("Hello World"));

This kind of assertion tends to suffer from the same problem as failing to wait for interactivity. It may take some amount of elapsed time before the condition you wish to assert becomes true.

It’s generally more reliable to wait until a condition becomes true in the future, and fail with an assertion error if a timeout is hit.

It helps to make this the general pattern by combining the waiting and the assertion into a single step.

waitUntilEquals("Hello World", confirmationMessage::text);

This polls confirmationMessage.text() until it becomes equal to “Hello World”, or a timeout is reached.

This means your tests will continue to pass, even if it takes some time to reach the state you wish to assert.
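
Continuing the sketch above, the combined wait-and-assert can be a thin layer over the same polling, falling back to a conventional assertion (and its helpful failure message) when the timeout is hit:

import org.openqa.selenium.TimeoutException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.util.Objects;
import java.util.function.Supplier;

import static org.junit.Assert.assertEquals;

public class WaitingAssertions {

    private static final long TIMEOUT_SECONDS = 10;

    private final WebDriver driver;

    public WaitingAssertions(WebDriver driver) {
        this.driver = driver;
    }

    // polls until the supplied value equals the expected one, failing with an assertion error on timeout
    public <T> void waitUntilEquals(T expected, Supplier<T> actual) {
        try {
            new WebDriverWait(driver, TIMEOUT_SECONDS)
                    .until(ignored -> Objects.equals(expected, actual.get()));
        } catch (TimeoutException timedOut) {
            assertEquals(expected, actual.get()); // reports the last value actually seen
        }
    }
}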

Stub Dependencies

Browser-controlling tests can be unreliable because they rely on unreliable infrastructure and third parties.

We once discovered that the biggest contributor to test flakiness was our office DNS server, which was sometimes not resolving DNS requests correctly.

If your tests load resources (images, javascript, html, etc) over the internet, you rely on infrastructure outside your control. What happens if there is packet loss? What happens if the server you’re loading assets from has a brief outage? Do your tests all fail?

The most reliable option seems to be to host the assets your browser tests load on the same machine that the tests are running on, so there is no network involved.

Sometimes your application makes requests to hardcoded URIs that can’t easily be changed to resolve to localhost for testing purposes. An HTTP proxy server like BrowserMob can be used to stub out those HTTP requests so that they resolve to a local resource instead. Think of it like mocking dependencies in unit tests.
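
A sketch of how that might look with BrowserMob Proxy, rewriting requests for a hardcoded host to a locally served copy of the assets. The rewriteUrl and ClientUtil calls are BrowserMob’s Java API as I remember it, so verify the exact names; for HTTPS traffic the browser also needs to trust the proxy’s certificate:

import net.lightbody.bmp.BrowserMobProxy;
import net.lightbody.bmp.BrowserMobProxyServer;
import net.lightbody.bmp.client.ClientUtil;
import org.openqa.selenium.Proxy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.CapabilityType;

public class StubbedDependencies {

    public static WebDriver createDriverWithStubbedCdn() {
        BrowserMobProxy proxy = new BrowserMobProxyServer();
        proxy.start(0); // pick any free port

        // requests for the hardcoded host get served from a local copy of the assets instead
        proxy.rewriteUrl("https://cdn\\.example\\.com/(.*)", "http://localhost:8080/assets/$1");

        Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);
        ChromeOptions options = new ChromeOptions();
        options.setCapability(CapabilityType.PROXY, seleniumProxy);
        return new ChromeDriver(options);
    }
}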

Quarantine and Delete

Tests that are unreliable are arguably worse than missing tests. They undermine your confidence in the test suite. It doesn’t take many flakey tests to change your default reaction to a failing test from "Something must be broken" to "Oh, the tests are being unreliable".

To avoid this erosion of confidence, it’s important to prioritise fixing problematic tests. This may mean deleting the test if it’s not possible to make it reliable within the amount of time it’s worth spending on it. It’s better to delete tests than live with non-determinism.

A downside to “just” deleting non-deterministic tests is that you lose the opportunity to learn what made them non-deterministic, which may apply to other tests that you have not yet observed being flakey.

An alternative is quarantining the failing tests, so they no longer fail your build when non-deterministic, but still run on a regular basis to help gather more diagnostics as to why they might be failing.

This can be done in JUnit with rules, where you annotate the test method as @NonDeterministic and the framework retries it.

It’s possible to have the tests fail the build if they fail deterministically (i.e. if the feature is genuinely broken), but collect diagnostics if they fail and subsequently pass (non-deterministically).

@Test
@NonDeterministic
public void my_unreliable_test() {
 
}
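
A rough sketch of how such a rule might be implemented in JUnit 4. The retry count and the diagnostics hook are illustrative, not the actual framework described above:

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
@interface NonDeterministic {
    int attempts() default 3;
}

public class QuarantineRule implements TestRule {

    @Override
    public Statement apply(Statement base, Description description) {
        NonDeterministic quarantined = description.getAnnotation(NonDeterministic.class);
        if (quarantined == null) {
            return base; // tests without the annotation behave as normal
        }
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                Throwable lastFailure = null;
                for (int attempt = 0; attempt < quarantined.attempts(); attempt++) {
                    try {
                        base.evaluate();
                        if (lastFailure != null) {
                            // failed then passed: non-deterministic, so don't fail the build,
                            // but record diagnostics for later investigation
                            recordFlakiness(description, lastFailure);
                        }
                        return;
                    } catch (Throwable failure) {
                        lastFailure = failure;
                    }
                }
                throw lastFailure; // failed every attempt: treat as a genuine, deterministic failure
            }
        };
    }

    private void recordFlakiness(Description description, Throwable failure) {
        // placeholder: raise a ticket, log to a dashboard, attach the collected diagnostics, etc.
        System.err.println("Flaky test " + description + ": " + failure);
    }
}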

This approach needs to be combined with discipline: for example, collecting the test failures in tickets that the team treats as seriously as a broken build. If these failures are ignored, the non-determinism will just increase until the test suite doesn’t work at all.

Diagnosis gets harder the longer you leave it between introducing a problem and fixing it, and a buggy approach may end up proliferating into other tests if you leave it in place.

Diagnostics

It’s hard to work out why our tests are unreliable if all we get out as diagnostics is the occasional assertion error or timeout.

This is a particular problem when tests only fail one time in a thousand runs; we don’t get to see them fail, we have only the diagnostics we were prescient enough to collect.

This means it’s particularly important to gather as much diagnostic information as possible each time a test fails. In particular, I’ve found it useful to collect:

  • Browser JS console output
  • HTTP requests made by the test (HAR file)
  • Screenshots captured between steps in the test

This information could simply be logged as part of your test run. I’ve used JUnit rules to tag this information onto test failure messages by wrapping the AssertionErrors thrown by JUnit.

public class AdditionalDiagnostics extends RuntimeException {
 
    public AdditionalDiagnostics(Browser browser, Throwable e) {
        super(
		e.getMessage() +
		consoleLog(browser) +
		httpRequests(browser) +
		collectedScreenshots(browser),
		e
	);
    }
 
}

This gives us a lot of information to diagnose what’s gone on. It’s not as good as having a browser open with devtools to investigate what’s going on, but it’s pretty good.

You could even record the entire test run as a video that can be reviewed later; there are services that can do this for you.

Stress testing new tests

Given that it’s very easy to write unreliable webdriver tests, it’s a good idea to run a new test many times before pushing your changes.

I’ve found a JUnit rule handy for this too: it re-runs the test many times and fails the run if the test fails even once.

@ReliabilityCheck(runs=1000)

Another approach is to use JUnit’s Parameterized test feature to generate many repetitions.
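
A sketch of a rule matching that annotation (JUnit 4 again; everything beyond the annotation name and the runs parameter shown in the usage above is illustrative):

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
@interface ReliabilityCheck {
    int runs() default 100;
}

public class ReliabilityCheckRule implements TestRule {

    @Override
    public Statement apply(Statement base, Description description) {
        ReliabilityCheck check = description.getAnnotation(ReliabilityCheck.class);
        if (check == null) {
            return base;
        }
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                // repeat the new test; a single failure fails the whole run
                for (int run = 0; run < check.runs(); run++) {
                    base.evaluate();
                }
            }
        };
    }
}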

Harder problems

Alas, not all causes of non-determinism in webdriver tests are straightforward to fix. Once you’ve resolved the most common issues you may still experience occasional failures that are outside your control.

Browser Bugs

Browser bugs sometimes cause the browsers to spontaneously crash during test runs.

This can sometimes be mitigated by building support into your tests for restarting browsers when they crash—if you can detect it.

Headless browsers seem less prone to crashing, but also may not yet support everything you might want to test. Headless Chrome still has issues with proxies, extensions, and video playback at the time of writing.

Treat like Monitoring

Everything from buggy graphics drivers, to lying DNS servers, to slow clocks, to congested networks can cause unexpected test failures.

A production system is never “up”. It is in a constant state of degradation. The same applies to end to end tests to some extent, as they also tend to rely on infrastructure and many moving parts.

When we build production monitoring we take this into account. It’s unrealistic to say things must be up. Instead we look for our system to be healthy. We tolerate a certain amount of failure.

A 0.01% failure rate may be tolerable to the business; what’s the cost? If it’s someone viewing a tweet the cost of failure is probably acceptable. If it’s a transfer of a million dollars it’s probably not. We determine the failure rate that’s acceptable given the context.

We can apply that to our tests as well. If a 1% failure rate is acceptable for a test and it happens to fail once, perhaps that’s acceptable provided it passes the next 100 times in a row – this can happen; all it takes is a small infrastructure blip.

You can achieve this kind of measurement/control with JUnit rules as well: run tests multiple times, measure their failure rate, and check that it’s within a tolerable level.

A benefit of treating your tests like production monitoring checks is that you can also re-use them as production monitoring checks. Don’t you want to know whether users can successfully log in in production as well as in your test environment? (See above)

Speed

Writing a lot of automated tests brings a lot of nice-to-have problems. End to end tests are relatively slow as tests go. It doesn’t take many tests before running them starts to get tediously slow.

One of the main benefits of automated tests is that they enable agility, by letting you build, deploy, release, experiment—try things out quickly with some confidence that you’re not breaking important things.

If it takes hours, or even several minutes, to run your test suite, then you’re not learning as fast as you could, and not getting the full benefits of test automation. You’ll probably end up doing something else while you wait for feedback rather than getting it straight away.

It is possible to keep test suites fast over time, but like with reliability, it requires discipline.

Synchronicity

A sometimes unpopular, but effective way to incentivise keeping test suites fast is to make them (and keep them) a synchronous part of your development process.

As developers we love making slow things asynchronous so that we can ignore the pain. We’ll push our changes to a build server to run the tests in the background while we do something else for an hour.

We check back in later to find that our change has broken the test suite, and now we’ve forgotten the context of our change.

When tests are always run asynchronously like this, there’s little incentive to keep them fast. There’s little difference between a 5-minute and a 15-minute test run, or even an hour.

On the other hand, if you’re sitting around waiting for the tests to run to inform the next change you want to make, then you feel the pain when they slow down and have a strong incentive to keep them fast—and fast tests enable agility.

If your tests are fast enough to run synchronously after each change then they can give you useful feedback that truly informs the next thing you do: Do you do that refactoring because they’re green, or fix the regression you just introduced?

Of course this only works if you actually listen to the pain and prioritise accordingly. If you’re quite happy sitting around bored and twiddling your thumbs then you’ll get no benefit.

Delete

Tests have an inventory cost. Keeping them around means we have to keep them up to date as things change, keep them reliable, and do performance work to keep our entire test suite fast.

Maybe the cost of breaking certain things just isn’t that high, or you’re unsure why the test exists in the first place. Deleting tests is an OK thing to do. If a test isn’t giving more value than it costs, delete it.

There’s no reason our test suites have to only get bigger over time; perhaps we can trim them. After all, our tests only cover the cases we’ve thought to test, and we’re always missing things. Which of the things we are testing are really important not to break?

Monitoring / Async Tests

I argued above that keeping tests fast enough that they can be part of a synchronous development feedback loop is valuable. However, maybe there are some tests that are less important, and could be asynchronous—either as production monitoring or async test suites.

Is it essential that you avoid breaking anything? Is there anything that isn’t that bad to break? Perhaps some features are more important than others? It might be really crucial that you never release a change that calculates financial transactions incorrectly, but is it as crucial that people can upload photos?

How long could you live with any given feature being broken for? What’s the cost? If half of your features could be broken for an hour with minimal business impact, and you can deploy a change in a few minutes, then you could consider monitoring the health of those features in production instead of prior to production.

If you can be notified, respond, and fix a production problem and still maintain your service level objective, then sometimes you’re better off not checking certain things pre-production if it helps you move faster.

On the other hand if you find yourself regularly breaking certain things in production and having to roll back then you probably need to move checks the other way, into pre-production gates.

Stubbing Dependencies

Stubbing dependencies helps with test reliability—eliminating network round trips eliminates the network as a cause of failure.

Stubbing dependencies also helps with test performance. Network round trips are slow, and eliminating them speeds up the tests. Services we depend on may be slow; if a service is not under test in this particular case, then why not stub it out?

When we write unit tests we stub out slow dependencies to keep them fast, we can apply the same principles to end to end tests. Stub out the dependencies that are not relevant to the test.

Move test assets onto the same machine that’s executing the tests (or as close as possible) to reduce round trip times. Stub out calls to third party services that are not applicable to the behaviour under test with default responses to reduce execution time.

Split Deployables

A slow test suite for a system is a design smell. It may be telling us that the system has too many responsibilities and could be split up into separate, independently deployable components.

The web is a great platform for integration. Even the humble hyperlink is a fantastic integration tool.

Does all of your webapp have to be a single deployable? Perhaps the login system could be deployed separately to the photo browser? Perhaps the financial reporting pages could be deployed separately to the user administration pages?

Defining smaller, independent components that can be independently tested and deployed, helps keep the test suites for each fast. It helps us keep iterating quickly as the overall system complexity grows.

It’s often valuable to invest in a few cross-system integration smoke tests when breaking systems apart like this.

Parallelise

The closest thing to a silver bullet for end to end test performance is parallelisation. If you have 1,000 tests that take 5 seconds each, but you can run all 1,000 in parallel, then your test suite still only takes a few seconds.

This can sometimes be quite straightforward, if you avoid adding state to your tests then what’s stopping you running all of them in parallel?

There are, however, some roadblocks that appear in practice.

Infrastructure

On a single machine there’s often a fairly low limit to how many tests you can execute in parallel, particularly if you need real browsers as opposed to headless. Running thousands of tests concurrently in a server farm also requires quite a bit of infrastructure setup.

All that test infrastructure also introduces more non-deterministic failure scenarios that we need to be able to deal with. It may of course be worth it if your tests are providing enough value.

AWS lambda is very promising for executing tests in parallel, though currently limited to headless browsers.

State

Application state is a challenge for test parallelisation. It’s relatively easy to parallelise end to end tests of stateless webapp features, where our tests have no side-effect on the running application. It’s more of a challenge when our tests have side effects such as purchasing a product, or signing-up as a new user.

The result of one test can easily affect another by changing the state in the application. There are a few techniques that can help:

Multiple Instances

Perhaps the conceptually simplest solution is to run one instance of the application you’re testing for each test runner, and keep the state completely isolated.

This may of course be impractical. Spinning up multiple instances of the app and all its associated infrastructure might be easier said than done—perhaps you’re testing a legacy application that can’t easily be provisioned.

Side-Effect Toggles

This is a technique that can also be used for production monitoring. Have a URL parameter (or other way of passing a flag to your application under test) that instructs the application to avoid triggering certain side effects. e.g. ?record_analytics=false

This technique is only useful if the side effects are not necessary to the feature that you’re trying to test. It’s also only applicable if you have the ability to change the implementation to help testing.

Application Level Isolation

Another approach is to have some way of isolating the state for each test within the application. For example, each test could create a new user account for itself, with all data created by that user isolated from access by other users.

This also enables cleanup after the test run by deleting all data associated with the temporary user.

This can also be used for production monitoring if you build in a “right to be forgotten” feature for production users. However, again it assumes you have the ability to change the implementation to make it easier to test.

Maintainability

Performance is one of the nice-to-have problems that comes from having a decently sized suite of end to end tests. Another is maintainability over the long term.

We write end to end tests to make it easier to change the system rapidly and with confidence. Without care, the opposite can be true. Tests that are coupled to implementations create resistance to change rather than enabling it.

If you re-organise your HTML and need to trawl through hundreds of tests fixing them all to match the new page structure, you’re not getting the touted benefits, you might even be better off without such tests.

If you change a key user journey such as logging into the system and as a result need to update every test then you’re not seeing the benefits.

There are two patterns that help avoid these problems: the Page Object Pattern and the Screenplay Pattern.

Really, both of these patterns describe what emerges if you ruthlessly refactor your tests—factoring out unnecessary repetition and creating abstractions that add clarity.

Page Objects

Page Objects abstract your testcases themselves away from the mechanics of locating and interacting with elements on the page. If you’ve got strings and selectors in your test cases, you may be coupling your tests to the current implementation.

If you’re using page objects well, then when you redesign your site, or re-organise your markup you shouldn’t have to update multiple testcases. You should just need to update your page objects to map to the new page structure.

// directly interacting with page
driver.findElement(By.id("username")).sendKeys(username);
 
// using a page object
page.loginAs(username);
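
For illustration, a minimal page object behind that second call might look something like this (the selectors and the test credential are placeholders):

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class LoginPage {

    private final WebDriver driver;

    public LoginPage(WebDriver driver) {
        this.driver = driver;
    }

    public void loginAs(String username) {
        // the selectors live here, in one place, rather than scattered across testcases
        driver.findElement(By.id("username")).sendKeys(username);
        driver.findElement(By.id("password")).sendKeys("a-test-password");
        driver.findElement(By.cssSelector("button[type=submit]")).click();
    }
}

When the markup changes, only this class needs updating; every test that calls loginAs keeps working.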

I’ve seen this pay off: tests written for one ad format being entirely re-usable with a built-from-scratch ad format that shared behaviours. All that was needed was re-mapping the page objects.

Page objects can be a win for reliability too. There are fewer places to update when you realise you’re not waiting for a component to become interactive. A small improvement to your page objects can improve many tests at once.

Screenplay Pattern

For a long time our end to end testing efforts were focused on Ads—with small, simple user journeys. Standard page objects coped well with the complexity.

When we started end to end testing more complex applications we took what we’d learnt the hard way from our ad tests and introduced page objects early.

However, this time we started noticing code smells—the page objects themselves started getting big and unwieldy, and we were seeing repetition of interactions with the page objects in different tests.

You could understand what the tests were doing by comparing them to what you see on the screen—you’d log in, then browse to a section. However, they were mechanical: they were written in the domain of interacting with the page, not in the language users would use to describe the tasks they were trying to accomplish.

That’s when we were introduced to the screenplay pattern by Antony Marcano (tests written in this style tend to read a little like a screenplay).

There are other articles that explain the screenplay pattern far more eloquently than I could. Suffice to say that it resolved many of the code smells we were noticing applying page objects to more complex applications.

Interactions & Tasks become small re-usable functions, and these functions can be composed into higher level conceptual tasks.

You might have a test where a user performs a login task, while another test might perform a “view report” task that composes the login and navigation tasks.

.attemptsTo(loginAs(publisher))
.attemptsTo(navigateToEarnings())
.attemptsTo(viewSavedReport())
 
/* extract, refactor, reuse */
 
.attemptsTo(viewEarnings())
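
To make the composition concrete, here is a minimal, library-agnostic sketch. The Actor and Task types below are purely illustrative; they are not the API of Unruly’s library or of Serenity BDD:

import java.util.Arrays;

public class ScreenplaySketch {

    // a task is a small, reusable function performed by an actor
    interface Task {
        void performAs(Actor actor);
    }

    static class Actor {
        private final String name;

        Actor(String name) {
            this.name = name;
        }

        Actor attemptsTo(Task... tasks) {
            Arrays.stream(tasks).forEach(task -> task.performAs(this));
            return this; // allows chaining, as in the example above
        }
    }

    // low-level tasks, typically implemented with page objects underneath
    static Task loginAs(String user) {
        return actor -> System.out.println(actor.name + " logs in as " + user);
    }

    static Task navigateToEarnings() {
        return actor -> System.out.println("navigate to earnings");
    }

    static Task viewSavedReport() {
        return actor -> System.out.println("view the saved report");
    }

    // a higher-level conceptual task composed from the smaller ones
    static Task viewEarnings() {
        return actor -> actor.attemptsTo(loginAs("publisher"), navigateToEarnings(), viewSavedReport());
    }

    public static void main(String... args) {
        new Actor("Penny the publisher").attemptsTo(viewEarnings());
    }
}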

Unruly has released a little library that emerged when we started writing tests in the screenplay pattern style, and there’s also the gold standard, Serenity BDD.

Summary

End to end tests with webdriver present lots of opportunities—reducing risks, checking across browsers & devices, testing invariants, and reuse for monitoring.

Like any automated tests, there are performance, maintainability, and reliability challenges that can be overcome.

Most of these principles are applicable to any automated tests; with end to end tests we tend to run into the pain earlier, and the costs of test inventory are higher.

Posted by & filed under Java.

Having benefited from “var” for many years when writing C#, I’m delighted that Java is at last getting support for local variable type inference in JDK 10.

From JDK 10 instead of saying

ArrayList<String> foo = new ArrayList<String>();

we can say

var foo = new ArrayList<String>();

and the type of “foo” is inferred as ArrayList<String>

While this is nice in that it removes repetition and reduces boilerplate slightly, the real benefits come from the ability to have variables with types that are impractical or impossible to represent.

Impractical Types

When transforming data it’s easy to be left with intermediary representations of the data that have deeply nested generic types.

Let’s steal an example from a C# LINQ query that groups a customer’s orders by year and then by month.

While Java doesn’t have LINQ, we can get fairly close thanks to lambdas.

from(customerList)
    .select(c -> tuple(
        c.companyName(),
        from(c.orders())
            .groupBy(o -> o.orderDate().year())
            .select(into((year, orders) -> tuple(
                year,
                from(orders)
                    .groupBy(o -> o.orderDate().month())
            )))
       ));

While not quite as clean as the C# version, it’s relatively similar. But what happens when we try to assign our customer order groupings to a local variable?

CollectionLinq<Tuple<String, CollectionLinq<Tuple<Integer, Group<Integer, Order>>>>> customerOrderGroups =
   from(customerList)
   .select(c -> tuple(
       c.companyName(),
       from(c.orders())
           .groupBy(o -> o.orderDate().year())
           .select(into((year, orders) -> tuple(
               year,
               from(orders)
                   .groupBy(o -> o.orderDate().month())
           )))
   ));

Oh dear, that type description is rather awkward. The Java solutions to this have tended to be one of

  • Define custom types for each intermediary stage—perhaps here we’d define a CustomerOrderGroup type.
  • Chaining many operations together—adding more transformations onto the end of this chain
  • Lose the type information

Now we don’t have to work around the problem, and can concisely represent our intermediary steps

var customerOrderGroups =
   from(customerList)
   .select(c -> tuple(
       c.companyName(),
       from(c.orders())
           .groupBy(o -> o.orderDate().year())
           .select(into((year, orders) -> tuple(
               year,
               from(orders)
                   .groupBy(o -> o.orderDate().month())
           )))
   ));

Impossible Types

The above example was impractical to represent due to being excessively long and obscure. Some types are just not possible to represent without type inference as they are anonymous.

The simplest example is an anonymous inner class

var person = new Object() {
   String name = "bob";
   int age = 5;
};
 
System.out.println(person.name + " aged " + person.age);

There’s no type that you could replace “var” with in this example that would enable this code to continue working.

Combining with the previous linq-style query example, this gives us the ability to have named tuple types, with meaningful property names.

var lengthOfNames  =
    from(customerList)
        .select(c -> new Object() {
            String companyName = c.companyName();
            int length = c.companyName().length();
        });
 
lengthOfNames.forEach(
    o -> System.out.println(o.companyName + " length " + o.length)
);

This also means it becomes more practical to create and use intersection types by mixing together interfaces and assigning to local variables

Here’s an example mixing together a Quacks and Waddles interface to create an anonymous Duck type.

public static void main(String... args) {
   var duck = (Quacks & Waddles) Mixin::create;
   duck.quack();
   duck.waddle();
}
 
interface Quacks extends Mixin {
   default void quack() {
       System.out.println("Quack");
   }
}
 
interface Waddles extends Mixin {
   default void waddle() {
       System.out.println("Waddle");
   }
}
 
interface Mixin {
   void __noop__();
   static void create() {}
}

This has more practical applications, such as adding behaviours onto existing types, à la extension methods.
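
As a rough illustration of that idea (the interfaces below are invented for the example), default-method interfaces over a single accessor can add extra operations to an existing list, with var capturing the intersection type:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ExtensionStyleMixins {

    interface HasList<T> {
        List<T> list();
    }

    interface Shuffles<T> extends HasList<T> {
        default List<T> shuffled() {
            var copy = new ArrayList<>(list());
            Collections.shuffle(copy);
            return copy;
        }
    }

    interface Reverses<T> extends HasList<T> {
        default List<T> reversed() {
            var copy = new ArrayList<>(list());
            Collections.reverse(copy);
            return copy;
        }
    }

    public static void main(String... args) {
        var names = List.of("alice", "bob", "carol");

        // mix both behaviours over the existing list; var captures the intersection type
        var extended = (Shuffles<String> & Reverses<String>) () -> names;

        System.out.println(extended.reversed());
        System.out.println(extended.shuffled());
    }
}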

Encouraging Intermediary Variables

It’s now possible to declare variables with types that were previously impractical or impossible to represent.

I hope that this leads to clearer code as it’s practical to add variables that explain the intermediate steps of transformations, as well as enabling previously impractical techniques such as the above.

Posted by & filed under XP.

How does your team prioritise work? Who gets to decide what is most important? What would happen if each team member just worked on what they felt like?

I’ve had the opportunity to observe an experiment: over the past 8 years at Unruly, developers have had 20% of their time to work on whatever they want.

This is not exactly like Google’s famed 20% time for what “will most benefit Google” or “120% time”.

Instead, developers genuinely have 20% of their time (typically a day a week) to work on whatever they choose—whatever they deem most important to themselves. There are no rules, other than the company retains ownership of anything produced (which does not preclude open sourcing).

We call 20% time “Gold Cards” after the Connextra practice it’s based upon. Initially we represented the time using yellow coloured cards on our team board.

It’s important to us—if the team fails to take close to 20% of their time on gold cards it will be raised in retrospectives and considered a problem to address.

While it may seem like an expensive practice, it’s an investment in individuals that I’ve seen really pay off, time after time.

Antidote to Prioritisation Systems

If you’re working in a team, you’ll probably have some mechanism for making prioritisation decisions about what is most important to work on next; whether that be a benevolent dictatorship, team consensus, voting, cost of delay, or something else.

However much you like and trust the decision making process in your team, does it always result in the best decisions? Are there times when you thought the team was making the wrong decision and you turned out to be right?

Gold cards allow each individual in the team time to work on things explicitly not prioritised by the team, guilt free.

This can go some way to mitigating flaws in the team’s prioritisation. If you feel strongly enough that a decision is wrong, then you can explore it further on your gold card time. You can build that feature that you think is more important, or you can create a proof-of-concept to demonstrate an approach is viable.

This can reduce the stakes in team prioritisation discussions, taking some of the stress away; you at least have your gold card time to allocate how you see fit.

Here’s some of the ways it’s played out.

Saving Months of Work

I can recall multiple occasions when gold card activities have saved literally team-months of development work.

Avoiding Yak Shaving

One was a classic yak-shaving scenario. Our team discovered that a critical service could not be easily reprovisioned, and to make matters worse, was over capacity.

Fast forward a few weeks and we were no longer just reprovisioning a service, but creating a new base operating system image for all our infrastructure, a new build pipeline for creating it, and attempting to find/build alternatives for components that turned out to be incompatible with this new software stack.

We were a couple of months in, and estimated another couple of months work to complete the migration.

We’d retrospected a few times, we thought we’d fully considered our other options and we were just best off ploughing on through the long, but now well-understood path to completion.

Someone disagreed, and decided to use their gold card to go back and re-visit one of the early options the team thought they’d ruled out.

Within a day they’d demonstrated a solution to the original problem using our original tech stack, without needing most of the yak shaving activities.

Innovative Solutions

I’ve also seen people spotting opportunities in their gold cards that the team had not considered, saving months of work.

We had a need to store a large amount of additional data. We’d estimated it would take the team some months to build out a new database cluster for the anticipated storage needs.

A gold card spent looking for a mechanism to compress the data ended up yielding a solution that enabled us to store the data indefinitely, using our existing infrastructure.

Spawning new Products

Gold cards give people space to innovate, time to try new things, wild ideas that might be too early.

Our first mobile-web compatible ad formats came out of a gold card. We had mobile compatible ads considerably before we had enough mobile traffic to make it seem worthwhile.

Someone wanted to spend time working on mobile on their gold card, which resulted in having a product ready to go when mobile use increased; we weren’t playing catch-up.

On another occasion a feature we were exploring had a prohibitively large download size for the bandwidth available at the time. A gold card yielded a far more bandwidth-efficient mechanism, contributing to the success of the product.

“How hard can it be?”

It’s easy to underestimate the complexity involved in building new features. “How hard can it be?” is often a dangerous phrase, uttered before discovering just how hard it really is, or embroiling oneself in large amounts of work.

Gold cards make this safe. If it’s hard enough that you can’t achieve it in your gold card, then you’ve only spent a small amount of time, and only your own discretionary time.

Gold cards also make it easy to experiment—you don’t need to convince anyone else that it will work. Sometimes, just sometimes, things actually turn out to be as easy, or even easier, than our hopes.

For a long time we had woeful reporting capabilities on our financial data. The team believed that importing this data to our data warehouse would be a large endeavour, involving redesigning our entire data pipeline.

A couple of developers disagreed, and decided to spend their gold card time working together, attempting to make this data reportable. They ended up coming up with a simple solution that was compatible with the existing technology and has withstood the test of time. Huge value unlocked from just one day spent speculatively.

That thing that bothers you

Whether it’s a code smell you want to get rid of, some UX debt that irritates you every time you see it, or the lack of automation in a task you perform regularly; there are always things that irritate us.

We ought to be paying attention to these irritations and addressing them as we notice them, but sometimes the team has deemed something else is more important or urgent.

Gold cards give you an opportunity to fix the things that matter to you. Not only does this help avoid frustration, but sometimes individuals fixing things they find annoying actually produces better outcomes than the wisdom of the crowd.

On one occasion a couple of developers spent their gold card just deleting code. They ended up deleting thousands of unneeded lines of code. Has this cleanup paid off yet? I honestly don’t know, but it may well have done; we have less inventory cost as a result.

Exploring New Tech

When tasked with solving a problem, we have a bias towards tools & technology that we know and understand. This is generally a good thing: exploring every option is often costly, and if we pick something new the team has to learn it before we become productive.

Sometimes this means we miss out on tech that makes our lives much easier.

People often spend their gold card time playing around with speculative new technologies that they’re unfamiliar with.

Much of the tech our teams now rely upon was first investigated and evangelised by someone who tried it out in gold card time; from build systems to monitoring tools, from databases to test frameworks.

Learning

Tech changes fast; as developers we need to be constantly learning to stay competitive. Sometimes this can present a conflict of interest between the needs of the team to achieve a goal (safer to use known and reliable technology), and your desires to work with new cutting-edge tech.

Gold cards allow you to prioritise your own learning for at least a day a week. It’s great for you, and it’s great for the team too as it brings in new ideas, techniques, and skills. It’s an investment that increases the skill level of the team over time.

Do you feel like you’d be able to be a better member of the team if you understood the business domain better? What if you knew the programming language you’re working in to a deeper level? If these feel important to you, then gold cards give you dedicated time that you can choose to spend in that way, without needing anyone else’s approval.

Sharing Knowledge

Some people use gold card time to prepare talks they want to give at conferences, or internally at our fortnightly tech-talks. Others write blog posts.

Sharing in this way not only helps others internally, but also gives back to the wider community. It raises people’s individual profiles as excellent developers, and raises the company’s profile as a potential employer.

Furthermore, many find that preparing content in this way improves their own understanding of a topic.

We’re so keen on this that we now give people extra days for writing blog posts.

Remote Working

Many of our XP practices work really well in co-located teams, but we’ve struggled to apply them to remote working. It’s definitely possible to do things like pair and mob-programming remotely, but it can be challenging for teams used to working together in the same space.

We’ve found that gold card time presented an easy opportunity to experiment with remote working—an opportunity to address some of the pain points as we look for ways to introduce more flexibility.

Remote working makes it easier to hire, and helps avoid excluding people who would be unable to join us without this flexibility.

Side Projects

Sometimes people choose to work on something completely not work related, like a side project, a game, or a new app. This might not seem immediately valuable to the team, but it’s an opportunity for people to learn in a different context—gaining experience in greenfield development, starting a project from scratch and choosing technologies.

The more diverse our team’s experience & knowledge, the more likely we are to make good decisions in the future. Change is a constant in the industry—we won’t be working with the tech we’re currently using indefinitely.

Side projects bring some of this learning forward and in-house; we get new perspectives without having to hire new people.

Gold cards allow people to grow without expecting them to spend all their evenings and weekends writing code, encouraging a healthy work/life balance.

Sometimes a change is just what one needs. We spend a lot of our time pair programming; pairing can be intense and tiring. Gold cards give us an opportunity to work on something completely different at least once a week.

Open Source

Most of what we’re working on day to day is not suitable for open sourcing, or would require considerable work to open up.

Gold cards mean we can choose to spend some of our time working on open source software—giving back to the community by working on existing open source code, or working on opening up internal tools.

Hiring & Retention

Having the freedom to spend a day a week working on whatever you want is a nice perk. Offering it helps us hire, and makes Unruly a hard place to leave. The flexibility introduced by gold cards to do the kinds of things outlined above also contribute towards happiness and retention.

Given the costs of recruitment, hiring, onboarding & training, gold cards are worth considering as a benefit even if you didn’t get any of the extra benefits from these anecdotes.

Pitfalls

One trap to avoid is only doing the activities outlined above on gold card days. Many of the activities above should be things the team is doing anyway.

I’ve seen teams start to rely on others—not cleaning up things as a matter of course during their day to day work, because they expect someone will want to do it on their gold card.

I’ve seen teams not set time aside for learning & exploring because they rely on people spending their gold cards on it.

I’ve seen teams ineffectually ploughing ahead with their planned work without stepping back to try to spike some alternative solutions.

These activities should not be restricted to gold cards. Gold cards just give each person the freedom to work on what is most important to them, rather than what’s most important to the team.

There’s also the opposite challenge: new team members may not realise the full range of possible uses for gold cards. Gold card use can drift over time to focus more and more on one particular activity, becoming seen as “Learning days” or “Spike days”.

Gold cards seem to be most beneficial when they are used for a wide variety of activities, helping the team notice the benefits of things they hadn’t seen as important.

Gold card time doesn’t always pay off, but it only has to pay off occasionally to be worthwhile.

Can we turn it up?

We learn from extreme programming to look for things that are good and turn them up to the max, to get the most value out of them.

If gold cards can bring all these benefits, what would happen if we made them more than 20% time?

Can we give individuals more autonomy without losing the benefits of other things we’ve seen to work well?

What’s the best balance between individual autonomy and the benefits of teams working collaboratively, pair programming, team goals, and stakeholder prioritisation?

We’ve turned things up a little: giving people extra days for conference speaking and blogging, carving out extra time for code dojos, talk preparation, and learning.

I’m sure there’s more we can do to balance the best of individuals working independently, with the benefits of teams.

What have you tried that works well?

Posted by & filed under XP.

There has been more discussion recently on the concept of a “10x engineer”. 10x engineers are (according to Quora) "the top tier of engineers that are 10x more productive than the average".

Productivity

I have observed that some people are able to get 10 times more done than me. However, I’d argue that individual productivity is as irrelevant as team efficiency.

Productivity is often defined and thought about in terms of the amount of stuff produced.

“The effectiveness of productive effort, especially in industry, as measured in terms of the rate of output per unit of input”

Diseconomies of Scale

The trouble is, software has diseconomies of scale. The more we build, the more expensive it becomes to build and maintain. As software grows, we’ll spend more time and money on:

  • Operational support – keeping it running
  • User support – helping people use the features
  • Developer support – training new people to understand our software
  • Developing new features – As the system grows so will the complexity and the time to build new features on top of it (Even with well-factored code)
  • Understanding dependencies – The complex software and systems upon which we build
  • Building Tools – to scale testing/deployment/software changes
  • Communication – as we try to enable more people to work on it

The more each individual produces, the slower the team around them will operate.

Are we Effective?

Only a small percentage of things I build end up generating enough value to justify their existence – and that’s with a development process that is intended to constantly focus us on the highest value work.

If we build a feature that users are happy with it’s easy to count that as a win. It’s even easier to count it as a win if it makes more money than it cost to build.

Does it look as good when you compare its cost/benefit to some of the other things that the team could have been working on over the same time period? Everything we choose to work on has an opportunity cost, since by choosing to work on it we are not able to work on something potentially more valuable.

Applying the 0.1x

The times I feel I’ve made most difference to our team’s effectiveness is when I find ways to not build things.

  • Let’s not build that feature.
    Is there existing software that could be used instead?
  • Let’s not add this functionality.
    Does the complexity it will introduce really justify its existence?
  • Let’s not build that product yet.
    Can we first do some small things to test the assumption that it will be valuable?
  • Let’s not build/deploy that development tool.
    Can we adjust our process or practices instead to make it unnecessary?
  • Let’s not adopt this new technology.
    Can we achieve the same thing with a technology that the team is already using and familiar with? “The best tool for the job” is a very dangerous phrase.
  • Let’s not keep maintaining this feature.
    What is blocking us from deleting this code?
  • Let’s not automate this.
    Can we find a way to not need to do it at all?

Identifying the Value is Hard

Given the cost of maintaining everything we build, it would literally be better for us to do 10% of the work and sit around doing nothing for the rest of our time, if we could figure out the right 10% to work on.

We could even spend 10x as long on minimising the ongoing cost of maintaining that 10%. Figuring out which things are the most valuable to work on, and which are a waste of time, is the hard part.

Posted by & filed under XP.

The most common reaction I hear when I tell people about mob programming (or even pair programming) is "How can that possibly be efficient?", sometimes phrased as "How can you justify that to management?" or "How productive are you?"

I think that efficiency in terms of “How much stuff can we get done in a week” is the wrong thing to be focussing on in teams. It can often be helpful to be less efficient.

“All the brilliant people working at the same time, in the same space, on the same thing, at the same computer.” — Woody Zuill


At Unruly we’ve been Mob Programming regularly over the last year.

At first glance it’s hard to see why it could be worth working this way. Five or more people working on a single task seems inefficient compared to working on five tasks simultaneously. As developers we’re used to thinking about parallelising work so that we can scale out.

Build Less!

If your team builds twice as much stuff as another team, are you more effective?

What if 80% of the software your team builds is never used, and everything another team builds is heavily used?

What if all the features you build are worth less than a single feature the other team has built?

We’re better off slowing down if it means that what we do build is more valuable.

Value Disparity

There’s often a huge disparity between the relative value of different things we can be working on. We can easily get distracted building Feature A that might make us $10,000 this year, when we could be building Feature B which will make us $10,000,000 this year.

It’s often not evident up front which of these will be more valuable. However, if we can order our development to start with testing hypotheses about features A and B, we often learn that one is much less valuable than we thought, or that for some reason it won’t work for us. Meanwhile, new opportunities often open up that make the other option much more interesting.

Focus on Goal

When working alone it’s very easy to get sidetracked into working on things you notice along the way that are important but unrelated to the current goal of the team. When working together there are more people to hold one another accountable and bring the focus of the team back to the primary goal, avoiding time consuming diversions.

When working together we also help hold each other accountable for following working agreements like fixing non-deterministic tests immediately, or refactoring a piece of code the next time we’re in the area.

If you’re going to build it, build it right

It’s easy to plan a feature, implement what you planned to do, and have it technically working, but generating no value. Here is a case where “technically correct” is not the best kind of correct.

If we release a feature and it’s not being used, or not making any money, we need to learn, iterate and improve. This may involve ordering the development to prioritise trying things out early, even if we’re not entirely happy with the finished product.

Unstoppable Team

It’s often more interesting to ask how quickly we can achieve a team goal than how much our team can get done in a set time period. In programmer parlance, low latency is more valuable than high throughput.

Therefore it can be worth trading off “efficiency” if it means you get to your goal slightly quicker.

In Extreme Programming circles there’s a concept of ideal time: if everything went exactly according to plan, and you had no interruptions, how long would a task take?

Ideal Days

Working together as a team in a mob is the closest I’ve experienced to real “Ideal Days”.

When working alone, or even when pairing, there are often interruptions. You have to go off to a meeting, so work stops. Somebody asks you a question, and work stops. You get stuck on a distracting problem, so work stops. You take a bathroom break, and work stops.

This tends to lead to individual or pair developer days being less than ideal. Rather, you get a few periods of productivity interspersed with interruptions where you lose your “flow” and train of thought.

This is quite different with a mob of a few people.

Can’t stop the mob

If you need to go off to a meeting, you go off to your meeting. The mob keeps on rolling.

If someone comes over with a question, someone peels off the mob to help them. The mob keeps on rolling.

You encounter a puzzling problem, no-one has any idea how to approach it, someone peels off to go and spike a couple of approaches. The mob keeps on rolling.

If you’re feeling like a break, you can just take one whenever you like. The mob keeps on rolling. In this regard mob programming is actually less tiring than pair programming. There’s no guilt about losing concentration or taking a break. You know the team will continue.

So while a mob requires more people, it lets us achieve a specific goal more quickly than if we were working on individual tasks.

Team Investment

It’s also worth bearing in mind that the value of your team practices can’t be measured purely by the amount of stuff you deliver, or even in the amount of money generated by the features you build.

If your work is investing in the team’s ability to support the software in production in the future, or in their ability to move and learn faster in the future, then that’s adding value, albeit sometimes hard to measure.

So…

Don’t aim to be an efficient team, aim to be an effective team.

Instead of optimising the amount of stuff you deliver, optimise the amount of value you add to your organisation.

Mob-programming and pair-programming are techniques that can help teams be more effective. They may or may not affect productivity, but it doesn’t matter.

Posted by & filed under Java.

Many things can be modelled as finite state machines. Particularly things where you’d naturally use “state” in the name e.g. the current state of an order, or delivery status. We often model these as enums.

enum OrderStatus {
    Pending,
    CheckingOut,
    Purchased,
    Shipped,
    Cancelled,
    Delivered,
    Failed,
    Refunded
}

Enums are great for restricting our order status to only valid states. However, usually there are only certain transitions that are valid. We can’t go from Delivered to Failed. Nor would we go straight from Pending to Delivered. Maybe we can transition from Purchased to either Shipped or Cancelled.

Using enum values we cannot restrict the transitions to only those that we desire. It would be nice to also let the compiler help us out by not letting us choose invalid transitions in our code.
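
For example, with the enum alone nothing stops code like this from compiling, even though it makes a jump we never want to allow:

OrderStatus status = OrderStatus.Delivered;
status = OrderStatus.Failed; // compiles fine, but Delivered to Failed is not a valid transition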

We can, however, achieve this if we use a class hierarchy to represent our states instead, and it can still be fairly concise. There are other reasons for using regular classes too: they allow us to store state, and even capture state from the surrounding context.

Here’s a way we could model the above enum as a class hierarchy with the valid transitions.

interface OrderStatus extends State<OrderStatus> {}
static class Pending     implements OrderStatus, BiTransitionTo<CheckingOut, Cancelled> {}
static class CheckingOut implements OrderStatus, BiTransitionTo<Purchased, Cancelled> {}
static class Purchased   implements OrderStatus, BiTransitionTo<Shipped, Failed> {}
static class Shipped     implements OrderStatus, BiTransitionTo<Refunded, Delivered> {}
static class Delivered   implements OrderStatus, TransitionTo<Refunded> {}
static class Cancelled   implements OrderStatus {}
static class Failed      implements OrderStatus {}
static class Refunded    implements OrderStatus {}

We’ve declared an OrderStatus interface and then created implementations of OrderStatus for each valid state. We’ve then encoded the valid transitions as other interface implementations. There’s a TransitionTo<State>, a BiTransitionTo<State1,State2>, or a TriTransitionTo<State1,State2,State3>, depending on the number of valid transitions from that state. We need differently named interfaces for different numbers of transitions because Java doesn’t support varying the number of generic type parameters on a single interface.

Compile-time checking valid transitions

Now we can create the TransitionTo/BiTransitionTo interfaces, which can give us the functionality to transition to a new state (but only if it is valid).

We might imagine an API like this, where we can choose which state to transition to

new Pending()
    .transitionTo(CheckingOut.class)
    .transitionTo(Purchased.class)
    .transitionTo(Refunded.class) // <-- can we make this line fail to compile?

This turns out to be a little tricky, but not impossible, due to type erasure.

Let’s try to implement the BiTransitionTo interface with a transitionTo method for each of the two valid transitions.

public interface BiTransitionTo<T, U> {
    default T transitionTo(Class<T> type) { ... }
    default U transitionTo(Class<U> type) { ... }
}

Both of these transitionTo methods have the same erasure. So we can’t do it quite like this. However, if we can encourage the consumer of our API to pass a lambda, there is a way to work around this same erasure problem.

So how about this API, where instead of passing class literals we pass constructor references. It looks similarly clean, but constructor references are basically lambdas so we can avoid type erasure.

new Pending()
    .transition(CheckingOut::new)
    .transition(Purchased::new)
    .transition(Refunded::new) // <-- Now we can make this fail to compile

In order to make this work, the trick is to create a new interface type for each valid transition within our BiTransitionTo interface

public interface BiTransitionTo<T, U> {
    interface OneTransition<T> extends Supplier<T> { }
    default T transition(OneTransition<T> constructor) { ... }
    interface TwoTransition<T> extends Supplier<T> { }
    default U transition(TwoTransition<U> constructor) { ... }
}

Supplier<T> is a functional interface in the java.util.function package that is equivalent to a no-args constructor reference. By creating two interfaces that extend it we can declare two overloads of transition(), allowing both methods to be passed a constructor reference without having the same erasure.
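
A minimal sketch of what those method bodies could look like, assuming transition() does nothing more than invoke the constructor reference it was given (later on we update this to also invoke guard methods):

import java.util.function.Supplier;

public interface BiTransitionTo<T, U> {
    interface OneTransition<T> extends Supplier<T> { }
    interface TwoTransition<U> extends Supplier<U> { }

    // Each overload just invokes the no-args constructor reference,
    // returning an instance of the new state.
    default T transition(OneTransition<T> constructor) {
        return constructor.get();
    }
    default U transition(TwoTransition<U> constructor) {
        return constructor.get();
    }
}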

Runtime checking

Sometimes we might not be able to know at compile-time what state our statemachine is in. Perhaps a Customer has a field of type OrderStatus that we serialize to a database. We would need to be able to try a transition at runtime, and fail in some manner if the transition is not valid.

This is also possible using the TransitionTo<NewState> approach outlined above. Since the generic type arguments of the interfaces a class implements are available at runtime, we can implement a tryTransition() method that uses reflection to check which transitions are permitted.

First we’ll need a way of finding the valid transition types. We’ll add it to our State base interface.

default List<Class<?>> validTransitionTypes() {
    return asList(getClass().getGenericInterfaces())
        .stream()
        .filter(type -> type instanceof ParameterizedType)
        .map(type -> (ParameterizedType) type)
        .filter(TransitionTo::isTransition) 
        .flatMap(type -> asList(type.getActualTypeArguments()).stream())
        .map(type -> (Class<?>) type)
        .collect(toList());
}

Note the isTransition filter. Since we have multiple transition interfaces (TransitionTo<T>, BiTransitionTo<T,U>, TriTransitionTo<T,U,V> etc.) we need a way of marking them all as specifying transitions. I’ve used an annotation.

@Retention(RUNTIME)
@Target(ElementType.TYPE)
public @interface Transition {
 
}
static boolean isTransition(ParameterizedType type) {
     Class<?> cls = (Class<?>)type.getRawType();
     return cls.getAnnotationsByType(Transition.class).length > 0;
}
 
@Transition
public interface TriTransitionTo...

Once we have validTransitionTypes() we can find which transitions are valid at runtime.

static class Pending implements OrderStatus, BiTransitionTo<CheckingOut, Cancelled> {}
@Test
public void finding_valid_transitions_at_runtime() {
    Pending pending = new Pending();
    assertEquals(
        asList(CheckingOut.class, Cancelled.class),
        pending.validTransitionTypes()
    );
}

Now that we have the valid types, tryTransition() needs to check whether the requested transition is to one of those types.

This is a little tricky, but since we’re passing a lambda we can make it a lambda type token and use reflection to find the type parameter of the lambda.

Our implementation then looks something like

 
interface NextState<T> extends Supplier<T>, MethodFinder {
    default Class<T> type() {
        return (Class<T>) getContainingClass();
    }
}
default <T> T tryTransition(NextState<T> desired) {
    if (validTransitionTypes().contains(desired.type())) {
        return desired.get();
    }
 
    throw new IllegalStateTransitionException();
}

We can make it a bit nicer by allowing the caller to specify the exception to throw on error, like an Optional’s orElseThrow. We can also allow the caller to ignore failed transitions.
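
Here’s a sketch of how that richer API might be shaped. Rather than returning the new state directly, tryTransition() could return a small result object; unchecked() and ignoreIfInvalid() match the tests below, while the TransitionResult name, the orElseThrow variant, and the generics are assumptions rather than the library’s actual code. On our State<S> base interface:

default <T extends S> TransitionResult<S, T> tryTransition(NextState<T> desired) {
    @SuppressWarnings("unchecked")
    S current = (S) this; // relies on the self-type convention, e.g. OrderStatus extends State<OrderStatus>
    boolean valid = validTransitionTypes().contains(desired.type());
    return new TransitionResult<>(current, desired, valid);
}

class TransitionResult<S, T extends S> {
    private final S from;
    private final NextState<T> next;
    private final boolean valid;

    TransitionResult(S from, NextState<T> next, boolean valid) {
        this.from = from;
        this.next = next;
        this.valid = valid;
    }

    // Construct the new state, or fail with an unchecked exception.
    public T unchecked() {
        if (!valid) throw new IllegalStateTransitionException();
        return next.get();
    }

    // Let the caller choose the exception, like Optional's orElseThrow.
    public <E extends Exception> T orElseThrow(Supplier<E> exception) throws E {
        if (!valid) throw exception.get();
        return next.get();
    }

    // Catch the failure and stay in the current state instead.
    public S ignoreIfInvalid() {
        try {
            return unchecked();
        } catch (IllegalStateTransitionException e) {
            return from;
        }
    }
}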

@Test
public void runtime_checked_transition() {
    OrderStatus state = new Pending();
    assertTrue(state instanceof Pending);
    state = state
        .tryTransition(CheckingOut::new)
        .unchecked();
    assertTrue(state instanceof CheckingOut);
}

Since we’ve transitioned into a known state (or thrown an exception) with tryTransition we could then chain compile-time checked transitions on the end.

@Test
public void runtime_checked_transition_with_chaining() {
    OrderStatus state = new Pending();
    assertTrue(state instanceof Pending);
    state = state
        .tryTransition(CheckingOut::new)
        .unchecked()
        .transition(Purchased::new); // This will be permitted if the tryTransition succeeds.
    assertTrue(state instanceof Purchased);
}

We can even let people ignore transition failures if they wish, just by catching the exception and returning the original value.

@Test
public void runtime_checked_transition_ignoring_failure() {
    OrderStatus state = new Pending();
    assertTrue(state instanceof Pending);
    state = state
        .tryTransition(Refunded::new)
        .ignoreIfInvalid();
    assertFalse(state instanceof Refunded);
    assertTrue(state instanceof Pending);
}

Adding Behaviour

Since our states are classes, we can add behaviour to them.

For instance we could add a notifyProgress() method to our OrderStatus, with different implementations in each state.

interface OrderStatus extends State<OrderStatus> {
    default void notifyProgress(Customer customer, EmailSender sender) {}
}
static class Purchased implements OrderStatus, BiTransitionTo<Shipped, Failed> {
    public void notifyProgress(Customer customer, EmailSender emailSender) {
        emailSender.sendEmail("fulfillment@mycompany.com", "Customer order pending");
        emailSender.sendEmail(customer.email(), "Your order is on its way");
    }
}
...
OrderStatus status = new Pending();
status.notifyProgress(customer, sender); // Does nothing
status = status
    .tryTransition(CheckingOut::new)
    .unchecked()
    .transition(Purchased::new);
status.notifyProgress(customer, sender); // sends emails

Then we can call notifyProgress on any OrderStatus instance and it will notify differently depending on which implementation is active.

Internal Transitions

One of the ways to make the most of the typechecked transitions is to trigger the transitions internally, within the state itself. e.g. in a state machine for the regex “A+B” the A state can transition either

  • Back to A
  • To B
  • To a match failure state

If we do this we can make them typechecked even though we don’t know in advance what string we’ll be matching.

static class A implements APlusB, TriTransitionTo<A,B,NoMatch> {
    public APlusB match(String s) {
        if (s.length() < 1) return transition(NoMatch::new);
        if (s.charAt(0) == 'A') return transition(A::new).match(s.substring(1));
        if (s.charAt(0) == 'B') return transition(B::new).match(s.substring(1));
        return transition(NoMatch::new);
    }
}

Full example here

Capturing State

If we use non-static classes we could also capture state from the enclosing class. Supposing these OrderStatuses are contained within an Order class that already has an EmailSender available, we’d no longer need to pass in the emailSender and the customer to the notifyProgress() method.

class Order {
    EmailSender emailSender;
    Customer customer;
    class Purchased implements OrderStatus, BiTransitionTo<Shipped, Failed> {
        public void notifyProgress() {
            emailSender.sendEmail("fulfillment@mycompany.com", "Customer order pending");
            emailSender.sendEmail(customer.email(), "Your order is on its way");
        }
    }
}

Guards

Another feature we might want is the ability to execute some code before or after transitioning into a new state. This is something we can add to our base State interface. Let’s add two methods, beforeTransition() and afterTransition().

interface State<T> {
    default void afterTransition(T from) {}
    default void beforeTransition(T to) {}
}

We can then update our transition implementation to invoke these guard methods before and after a transition occurs.
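
A sketch of that update for one of the transition() overloads, using raw State casts for brevity (the real implementation may treat the generics more carefully):

// Run the guards around construction of the new state.
default T transition(OneTransition<T> constructor) {
    T next = constructor.get();
    if (this instanceof State && next instanceof State) {
        ((State) this).beforeTransition(next); // guard on the state we are leaving
        ((State) next).afterTransition(this);  // guard on the state we have just entered
    }
    return next;
}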

We could use this to log all transitions into the Failed state.

class Failed implements OrderStatus {
    @Override
    public void afterTransition(OrderStatus from) {
        failureLog.warning("Oh bother! failed from " + from.getClass().getSimpleName());
    }
}

We could also combine state capturing and guard methods to build a stateful-state machine that updates its state on transition instead of just returning the new state. Here’s an example where we use a guard method to mutate the state of lightSwitch after each transition.

class LightExample {
    Switch lightSwitch = new Off();
 
    public class Switch implements State<Switch> {
        @Override
        public void afterTransition(Switch from) {
            LightExample.this.lightSwitch = Switch.this;
        }
    }
    public class On extends Switch implements TransitionTo<Off> {}
    public class Off extends Switch implements TransitionTo<On> {}
 
    @Test
    public void stateful_switch() {
        assertTrue(lightSwitch instanceof Off);
        lightSwitch.tryTransition(On::new).ignoreIfInvalid();
        assertTrue(lightSwitch instanceof On);
        lightSwitch.tryTransition(Off::new).ignoreIfInvalid();
        assertTrue(lightSwitch instanceof Off);
    }
}

Show me the code

The code is on github if you’d like to play with it/see full executable examples

Posted by & filed under Java.

Another use of lambda parameter reflection could be to write html inline in Java. It allows us to create builders like this, where we’d previously have had to use a language like Kotlin and a library like Kara.

String doc =
    html(
        head(
            meta(charset -> "utf-8"),
            link(rel->stylesheet, type->css, href->"/my.css"),
            script(type->javascript, src -> "/some.js")
        ),
        body(
            h1("Hello World", style->"font-size:200%;"),
            article(
                p("Here is an interesting paragraph"),
                p(
                    "And another",
                    small("small")
                ),
                ul(
                    li("An"),
                    li("unordered"),
                    li("list")
                )
            )
        )
    ).asString();

Which generates html like

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
 
<html>
  <head>
    <meta name="generator" content=
    "HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
<meta charset="utf-8"><script type="text/javascript" src=
"/some.js">
</script>
 
    <title></title>
  </head>
 
  <body>
    <h1>Hello World</h1>
 
    <p>Here is an interesting paragraph</p>
 
    <p>And another<small>small</small></p>
 
    <ul>
      <li>An</li>
 
      <li>unordered</li>
 
      <li>list</li>
    </ul>
  </body>
</html>

Code Generation

Why would you do this? Well, it lets us generate markup programmatically. e.g. we can programmatically generate paragraphs.

body(
    asList("one","two","three")
        .stream()
        .map(number -> "Paragraph " + number)
        .map(content -> p(content))
)

Help from the Type System

We can also use the Java type system to help us write valid code.

It will be a compile-time error to specify an invalid value for a link’s rel attribute.

It’s a compile-time error to omit a mandatory tag.

It’s also a compile-time error to have a body tag inside a p tag, because body is not phrasing content.

We can also ensure that image sizes are in pixels.
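
The trick behind most of these guarantees is just to give the DSL parameters more specific types. Here is a tiny self-contained sketch of the principle (the names are made up for illustration and are not the DSL’s actual types):

import java.util.function.Function;

public class TypedAttributeSketch {
    enum Rel { stylesheet, icon }

    // Analogous to Attribute<Rel>: a lambda from the attribute name to a Rel value.
    interface RelAttribute extends Function<String, Rel> {}

    static String link(RelAttribute rel) {
        return "<link rel=\"" + rel.apply("rel") + "\">";
    }

    public static void main(String[] args) {
        System.out.println(link(rel -> Rel.stylesheet)); // compiles: the lambda returns a Rel
        // link(rel -> "stylesheet");                    // would not compile: a String is not a Rel
    }
}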

Safety

We can also help reduce injection attacks when inserting content from users into our markup, by having the DSL HTML-encode any content passed in.

e.g.

assertEquals(
    "<p>&lt;script src=&quot;attack.js&quot;&gt;&lt;/script&gt;</p>", 
    p("<script src=\"attack.js\"></script>").asString()
);

How does it work?

See this previous blogpost that shows how to get lambda parameter names with reflection. This allows us to specify the key value pairs for html attributes quite cleanly.

I’ve created an Attribute type that converts a lambda to a html attribute.

public interface Attribute<T> extends NamedValue<T> {
    default String asString() {
        return name() + "=\"" + value()+"\"";
    }
}

For the tags themselves we declare an interface per tag, with a hierarchy to allow certain tags in certain contexts. For example Small is PhrasingContent and can be inside a P tag.

public interface Small extends PhrasingContent {
    default Small small(String content) {
        return () -> tag("small", content);
    }
}

To make it easy to have all the tag names available in the context without having to static import lots of things, we can create a “mixin” interface that combines all the tags.

public interface HtmlDsl extends
        Html,
        Head,
        Body,
        Link,
        Meta,
        P,
        Script,
        H1,
        Li,
        Ul,
        Article,
        Small,
        Img
        ...

Then where we want to write html we just make our class implement HtmlDsl (or we could statically import the methods instead).
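
For example, a class producing a page might look something like this (a sketch; exactly which overloads are available depends on how much of the proof-of-concept DSL has been fleshed out):

public class GreetingPage implements HtmlDsl {
    public String render() {
        return html(
            head(
                meta(charset -> "utf-8")
            ),
            body(
                p("Welcome to the page"),
                ul(
                    li("one"),
                    li("two")
                )
            )
        ).asString();
    }
}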

We can place restrictions on which child tags are valid by overloading the methods for the tag names, e.g. Html

public interface Html extends NoAttributes {
    default Html html(Head head, Body body) { 
    ...

and restrict the types of attributes using enums or other wrapper types. Here the Img tag can only have measurements in pixels

public interface Img extends NoChildren {
    default Img img(Attribute<String> src, Attribute<PixelMeasurement> dim1, Attribute<PixelMeasurement> dim2) {
    ...

All the code is available on github to play with. Have a look at this test for executable examples. n.b. it’s just a proof of concept at this point. Only sufficient code exists to illustrate the examples in this blog post.

What other creative uses can you find for parameter reflection?

Posted by & filed under Java.

Java 8 introduced a compiler flag, -parameters, which makes method parameter names available at runtime with reflection. Up to now this has not worked for lambda parameter names. However, Java 8u60 now has a back-ported fix for this bug which makes it possible.
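
As a quick illustration of what the flag gives us for ordinary methods (compile with javac -parameters), parameter names become available through the reflection API:

import java.lang.reflect.Method;

public class ParameterNames {
    public void greet(String firstName) {
    }

    public static void main(String[] args) throws Exception {
        Method greet = ParameterNames.class.getMethod("greet", String.class);
        // Prints "firstName" when compiled with -parameters, "arg0" otherwise.
        System.out.println(greet.getParameters()[0].getName());
    }
}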

Some uses that spring to mind (and which work as of a recent 8u60-ea build) are things like hash literals

Map<String, String> hash = hash(
    hello -> "world",
    bob   -> bob,
    bill  -> "was here"
);
 
assertEquals("world",    hash.get("hello"));
assertEquals("bob",      hash.get("bob"));
assertEquals("was here", hash.get("bill"));

Or just another tool for creating DSLs in Java itself. For example, look how close to puppet’s DSL we can get now.

static class Apache extends PuppetClass {{
    file(
        name -> "/etc/httpd/httpd.conf",
        source -> template("httpd.tpl"),
        owner -> root,
        group -> root
    );
 
    cron (
        name -> "logrotate",
        command -> "/usr/sbin/logrotate",
        user -> root,
        hour -> "2",
        minute -> "*/10"
    );
}}

Doing

System.out.println(new Apache().toString());

Would print

class Apache {
 
	file{
		name	=> "/etc/httpd/httpd.conf",
		source	=> template(httpd.tpl),
		owner	=> root,
		group	=> root,
	}
	cron{
		name	=> "logrotate",
		command	=> "/usr/sbin/logrotate",
		user	=> root,
		hour	=> "2",
		minute	=> "*/10",
	}
 
}

The code examples for the hash literal example and the puppet example are on Github.

They work by making use of a functional interface that extends Serializable. This makes it possible to get access to a SerializedLambda instance, which then gives us access to the synthetic method generated for the lambda. We can then use the standard reflection API to get the parameter names.

Here’s a utility MethodFinder interface, which our functional interfaces can extend to make this easier.

public interface MethodFinder extends Serializable {
    default SerializedLambda serialized() {
        try {
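            // The runtime-generated class of a serializable lambda declares a writeReplace() method
            // that returns a SerializedLambda describing the lambda.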
            Method replaceMethod = getClass().getDeclaredMethod("writeReplace");
            replaceMethod.setAccessible(true);
            return (SerializedLambda) replaceMethod.invoke(this);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
 
    default Class<?> getContainingClass() {
        try {
            String className = serialized().getImplClass().replaceAll("/", ".");
            return Class.forName(className);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
 
    default Method method() {
        SerializedLambda lambda = serialized();
        Class<?> containingClass = getContainingClass();
        return asList(containingClass.getDeclaredMethods())
                .stream()
                .filter(method -> Objects.equals(method.getName(), lambda.getImplMethodName()))
                .findFirst()
                .orElseThrow(UnableToGuessMethodException::new);
    }
 
    default Parameter parameter(int n) {
        return method().getParameters()[n];
    }
 
    class UnableToGuessMethodException extends RuntimeException {}
}

Given the above, we can create a NamedValue type that allows a lambda to represent a mapping from a String to a value.

public interface NamedValue<T> extends MethodFinder, Function<String, T> {
    default String name() {
        checkParametersEnabled();
        return parameter(0).getName();
    }
    default void checkParametersEnabled() {
        if (Objects.equals("arg0", parameter(0).getName())) {
            throw new IllegalStateException("You need to compile with javac -parameters for parameter reflection to work; You also need java 8u60 or newer to use it with lambdas");
        }
    }
 
    default T value() {
        return apply(name());
    }
}

Then we can simply ask our lambda for the name and value.

NamedValue<Integer> pair = five -> 5;
assertEquals("five", pair.name());
assertEquals(5, pair.value());

See also HTML in Java for another example use of this.

I imagine there are more uses too, any ideas?

Posted by & filed under Java.

Java allows casting to an intersection of types, e.g. (Number & Comparable)5. When combined with default methods on interfaces, it provides a way to combine behaviour from multiple types into a single type, without a named class or interface to combine them.

Let’s say we have two interfaces that provide behaviour for Quack and Waddle.

interface Quacks {
    default void quack() {
        System.out.println("Quack");
    }
}
interface Waddles {
    default void waddle() {
        System.out.println("Waddle");
    }
}

If we wanted something that did both we’d normally declare a Duck type that combines them, something like

interface Duck extends Quacks, Waddles {}

However, by casting to an intersection of types we can do something like this

with((Anon & Quacks & Waddles)i->i, ducklike -> {
    ducklike.quack();
    ducklike.waddle();
});

What’s going on here? Anon is a functional interface compatible with the identity lambda, so we can safely cast the identity lambda i->i to Anon.

interface Anon {
    Object f(Object o);
}

Since Quacks and Waddles are both interfaces with no abstract methods, we can also cast to those and there’s still only a single abstract method, which is compatible with our lambda expression. So the cast to (Anon & Quacks & Waddles) creates an anonymous type that can both quack() and waddle().

The with() method is just a helper that accepts our anonymous type along with a consumer of it, giving us somewhere to use it.

static <T extends Anon> void with(T t, Consumer<T> consumer) {
    consumer.accept(t);
}

This also works when calling a method that accepts an intersection type. We might have a method that accepts anything that quacks and waddles.

public static <Ducklike extends Quacks & Waddles> void doDucklikeThings(Ducklike ducklike) {
    ducklike.quack();
    ducklike.waddle();
}

We can now invoke the above method with an intersection cast

doDucklikeThings((Anon & Quacks & Waddles)i->i);

Source code

You can find complete examples on github

Enhancing existing objects

Some have asked whether one can use this to add functionality to existing objects. This trick only works with lambdas, but for common types like List that we might want to enhance we can make use of a delegating lambda.

Let’s say we are fed up with having to call list.stream().map(f).collect(toList()) and wish List had a map() method on it directly.

We could do the following

List<String> stringList = asList("alpha","bravo");
with((ForwardingList<String> & Mappable<String>)() -> stringList, list -> {
    List<String> strings = list.map(String::toUpperCase);
    strings.forEach(System.out::println);
});

Where Mappable declares our new method

interface Mappable<T> extends DelegatesTo<List<T>> {
    default <R> List<R> map(Function<T,R> mapper) {
        return delegate().stream().map(mapper).collect(Collectors.toList());
    }
}

And DelegatesTo is a common ancestor with our ForwardingList

interface DelegatesTo<T> {
    T delegate();
}
interface ForwardingList<T> extends DelegatesTo<List<T>>, List<T> {
    default int size() {
        return delegate().size();
    }
    //... and so on
}

Here’s a full example