Standard Conformance in Open Source BPMN Engines and Design Time Adaptability

In three weeks, from March 30th until April 4th, I will attend the 9th International IEEE Symposium on Service Oriented System Engineering (SOSE 2015) in Redwood City, USA. I have the pleasure of presenting two of my recent research papers there. Both deal with my ongoing efforts in benchmarking process engines and computing metrics for process models. In this blog post, I’d like to provide a short outline of the two papers.

BPMN Conformance in Open Source Engines

The first paper is joint work with my colleagues Simon Harrer and Matthias Geiger and emerged from a student project involving Mathias Casar and Andreas Vorndran. Matthias, Simon, and I supervised the project and improved and extended it after completion. Based on these results, we wrote the paper.

As indicated by its name, the study is quite similar to prior work by Simon and me. At SOCA 2012, we presented a paper on BPEL Conformance in Open Source Engines. We built a benchmarking tool for process engines (it is called betsy and is freely available on GitHub) and used it to benchmark the standard conformance of BPEL engines. Betsy can easily be used for performance benchmarking of process engines as well, but our main interest lies in standard conformance. For the paper I’ll present at SOSE, we adapted betsy to benchmark the standard conformance of BPMN engines. We implemented a set of 70 standard-conformant BPMN processes, each of which uses a single feature of the language in an isolated fashion. Betsy takes these process models, generates executable artifacts for specific engines from them, automatically deploys them to the engines, and executes them.

But more on the actual results: We benchmarked Activiti, camunda BPM, and jBPM, i.e., the major players in the Open Source BPMN market. Here is a general overview (more details in the paper) of the number of BPMN language constructs each engine supports:

[Figure: engine support]

ACT stands for activities (tasks, subprocesses, …), BA for basics (sequenceFlows, lanes, participants, …), ERR for errors (illegal combinations of gateways), EV for events, and GW for gateways. We count a construct as fully supported if an engine passes all our tests for it and as unsupported if an engine passes none of them; everything in between is partial support. Without discussing everything in detail, you can see that support varies a lot. Just to highlight a few aspects: Surprisingly, even a supposedly simple feature such as the standard looping mechanism in BPMN is not supported by a single engine. Moreover, not even the exclusive gateway is implemented in the same fashion by all engines. Consequently, you can’t do conditional branching in BPMN if you want your process model to remain portable. In contrast, parallel gateways are supported by all engines. Another interesting aspect can be observed if we look at the feature sets that the engines have in common:

[Figure: Common Features]

Around two fifths of the total language is supported by all engines in the same manner with respect to semantics (lanes, parallel gateway). That is quite a small part of the language. Around one fifth is not supported by any engine (looping, complex gateway, …). The remaining two fifths are supported by either one engine or a combination of two. Interestingly, the features camunda BPM supports are a strict superset of those Activiti supports. Processes will always be portable from Activiti to camunda BPM, but not vice versa. In summary: If you want to build portable process models in BPMN, you can currently use only two fifths of the language; everything else will lock you into your vendor. Isn’t that a surprising result, given the promises of BPMN?
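A minimal Python sketch of the counting rule and the overlap analysis, in case it helps to see them spelled out. The function names are hypothetical and not part of betsy, `someEvent` is a placeholder construct, and the pass counts are made up for illustration; they are not the numbers from the paper.

```python
from typing import Dict, Set, Tuple

# engine -> construct -> (passed tests, total tests)
Results = Dict[str, Dict[str, Tuple[int, int]]]


def classify_support(passed: int, total: int) -> str:
    """Full if all tests pass, none if no test passes, partial otherwise."""
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("expected total > 0 and 0 <= passed <= total")
    if passed == total:
        return "full"
    if passed == 0:
        return "none"
    return "partial"


def fully_supported(results: Results) -> Dict[str, Set[str]]:
    """Map each engine to the set of constructs it fully supports."""
    return {
        engine: {
            construct
            for construct, (passed, total) in constructs.items()
            if classify_support(passed, total) == "full"
        }
        for engine, constructs in results.items()
    }


def portable_constructs(results: Results) -> Set[str]:
    """Constructs that every engine fully supports (set intersection)."""
    per_engine = list(fully_supported(results).values())
    return set.intersection(*per_engine) if per_engine else set()


# Made-up pass counts for illustration only, NOT the results from the paper.
EXAMPLE: Results = {
    "Activiti":    {"parallelGateway": (2, 2), "complexGateway": (0, 1), "someEvent": (1, 3)},
    "camunda BPM": {"parallelGateway": (2, 2), "complexGateway": (0, 1), "someEvent": (3, 3)},
    "jBPM":        {"parallelGateway": (2, 2), "complexGateway": (0, 1), "someEvent": (0, 3)},
}

if __name__ == "__main__":
    supported = fully_supported(EXAMPLE)
    print("portable:", portable_constructs(EXAMPLE))
    # Subset test: can everything Activiti supports also run on camunda BPM?
    print("Activiti subset of camunda BPM:",
          supported["Activiti"] <= supported["camunda BPM"])
```

The subset comparison at the end mirrors the portability argument: a process that only uses constructs from the smaller set stays executable on the engine with the larger set.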

We are currently working on an extended test set, since we do not yet cover all features of executable process models. Moreover, we are working on benchmarking more engines. If you feel that your engine outshines the ones above and should be benchmarked, please let me know!

On the Measurement of Design-Time Adaptability for Process-Based Systems

My second paper is a bit more theory-heavy than the first. It is part of my ongoing effort to develop a measurement framework for the portability of process models. The ISO/IEC systems and software quality model defines portability as a major quality characteristic with several subcharacteristics, one of which is adaptability. In my case, adaptability refers to the ease with which you can modify a process model at design-time to enable its execution on another engine. In the paper, I develop several metrics for measuring this concept and validate them theoretically with two measurement frameworks and practically with a large set of real-world BPMN process models.

The idea for measuring adaptability presented in the paper works as follows: Every programming language has a large vocabulary and there are many ways in which you can express the same functionality. This also applies to process languages like BPMN. For instance, in BPMN you have receive tasks and receive message events. Both do the same thing: consume a message. So, if an engine does not support receive tasks, but supports receive events, you can just replace the receive tasks in your process model with receive events and execute it on the engine. This is just a single example, but for nearly every language element in BPMN there are multiple alternatives. The more such alternatives exist, the more adaptable the language element is. In the paper, I formalized this idea and defined metrics based on it. I won’t talk about the math or the theoretical validation in this blog post, but you can read up on that in the paper. Instead, I’ll provide a little information on the practical evaluation.
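As a rough illustration of the substitution idea, here is a small Python sketch. It is a deliberate simplification and not the metrics defined in the paper: the alternatives table contains only the receive task example from above, the element labels are illustrative rather than exact BPMN XML names, and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Illustrative alternatives table with only the example from the text.
# A real table would cover many more elements.
ALTERNATIVES: Dict[str, Set[str]] = {
    "receiveTask": {"receiveMessageEvent"},
    "receiveMessageEvent": {"receiveTask"},
}


def alternative_count(element: str,
                      alternatives: Dict[str, Set[str]] = ALTERNATIVES) -> int:
    """Number of known substitutes: more substitutes, more adaptable."""
    return len(alternatives.get(element, set()))


def adapt(elements: List[str],
          supported: Set[str],
          alternatives: Dict[str, Set[str]] = ALTERNATIVES) -> Optional[List[str]]:
    """Replace unsupported elements with a supported alternative.

    Returns the adapted element list, or None if some element is neither
    supported by the target engine nor replaceable.
    """
    adapted = []
    for element in elements:
        if element in supported:
            adapted.append(element)
            continue
        # Sort the candidates so the choice is deterministic.
        candidates = sorted(alternatives.get(element, set()) & supported)
        if not candidates:
            return None
        adapted.append(candidates[0])
    return adapted


if __name__ == "__main__":
    process = ["startEvent", "receiveTask", "endEvent"]
    engine = {"startEvent", "receiveMessageEvent", "endEvent"}
    print(adapt(process, engine))
```

The sketch treats a process as a flat list of element labels. A real adaptation would also have to rewire sequence flows and carry over attributes, which is left out here.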

For the evaluation, I implemented the metrics computation in my metrics suite, prope, which is freely available on GitHub. Then I used the Open HUB Open Source network to query for BPMN process models. This network crawls and indexes the most important open source software repositories (GitHub, SourceForge, etc.). I queried the network for files ending in .bpmn, .bpmn2, or .xml that contain the BPMN 2.0 namespace and the top-level definitions element. This resulted in a large number of files, which I downloaded for further analysis. I performed some sanity checks on the files to see if they actually contain BPMN code, in particular BPMN processes, and also performed reference checking on these processes using an early version of the BPMNspector. This resulted in a set of around 2700 BPMN processes for which I computed metric values. Based on this set, I performed several statistical analyses and tests to verify quality properties of the metrics, such as their discriminative power, stability, and their ability to analyse processes of vastly different size. If you’re interested, I encourage you to look at the paper. Let me just provide one take-home point:

[Figure: Element Numbers]

I also replicated the analysis of a famous paper, “How Much Language is Enough? Theoretical and Practical Use of the Business Process Model and Notation” by zur Muehlen and Recker, published in 2008. In that paper, the authors look at which elements of BPMN are actually used in practice and find that only the most basic elements are used; anything complicated is hardly ever used. The picture above lists the most frequent BPMN elements and the percentage of process models in which they occur for the models I gathered. These results are basically the same as in the study by zur Muehlen and Recker. Unsurprisingly, sequence flows and ordinary start and end events are used in basically every process model. Next most frequent are certain tasks (or rather tasks in general), but everything else is hardly used. To draw a connection to the first paper I’ll present at SOSE: If no advanced constructs of the language are used, porting process models is just so much easier :-)
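For readers who want to try something similar on their own files, the filtering step and the element count described above could be sketched in Python roughly as follows. This is an assumption-laden sketch and not the code in prope: it uses only the standard library, the function names are hypothetical, it skips the reference checking done with BPMNspector, and it counts every BPMN-namespaced descendant of a process, including helper elements such as incoming and outgoing.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable, Optional, Set

# Namespace of the BPMN 2.0 model schema.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
TAG_PREFIX = "{" + BPMN_NS + "}"


def parse_bpmn(data: bytes) -> Optional[ET.Element]:
    """Return the root element if data looks like a BPMN 2.0 document.

    Sanity checks: well-formed XML, a top-level definitions element in
    the BPMN 2.0 namespace, and at least one process child.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    if root.tag != TAG_PREFIX + "definitions":
        return None
    if root.find(TAG_PREFIX + "process") is None:
        return None
    return root


def element_names(process: ET.Element) -> Set[str]:
    """Local names of all BPMN-namespaced descendants of one process."""
    return {
        node.tag[len(TAG_PREFIX):]
        for node in process.iter()
        if node is not process and node.tag.startswith(TAG_PREFIX)
    }


def element_frequencies(documents: Iterable[bytes]) -> Dict[str, float]:
    """Percentage of processes in which each element occurs at least once."""
    counts: Counter = Counter()
    processes = 0
    for data in documents:
        root = parse_bpmn(data)
        if root is None:
            continue  # not a usable BPMN document, skip it
        for process in root.findall(TAG_PREFIX + "process"):
            processes += 1
            counts.update(element_names(process))
    if processes == 0:
        return {}
    return {name: 100.0 * n / processes for name, n in counts.items()}
```

Feeding it the raw bytes of each downloaded file, for example via `pathlib.Path(name).read_bytes()`, yields a dictionary that can be sorted by value to get a ranking like the one in the picture.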