Betsy – A BPEL Engine Test System

In my work over the last years, I have dealt quite often with the Web Services Business Process Execution Language (BPEL). BPEL is an open specification of a (Web services-based) process language, and thanks to that you can write process definitions in this language without locking yourself into a specific BPEL engine. So if your engine vendor decides to raise the prices for the next release, you can just switch to one of the several open source engines available without having to modify your actual program code. That – at least – is the theory.
Practically, we have used different engines in our group over the last years, and it has always annoyed us that each engine comes with its own peculiarities, specialties, gaps in support, and add-ons. No engine actually supports the complete specification, only varying parts of it, which essentially ties a process definition to the platform it was developed on and makes the portability of process definitions an illusion. Furthermore, for some engines the gap between what is defined in the specification and what actually works (as opposed to what the engine providers claim works) was so large that development became really frustrating from time to time. That – at least – was my impression.

Recently, I teamed up with my colleague Simon Harrer to replace this impression with some hard facts. That is, we wanted to get a comparable picture of which parts of the BPEL specification are actually supported in today’s engines. The outcome of our conspiracy is betsy, a tool for testing BPEL engines, in particular for determining their standard conformance. It is freely available on GitHub and licensed under the LGPL, so feel free to use it; participation and improvements are welcome as well. In this blog post, I give a short outline of its structure and describe how it works. A more comprehensive description is available in its architectural whitepaper.

Betsy consists of a testing engine that can transform test cases written in pure standard BPEL into deployment artifacts for specific engines, execute the tests, and aggregate the results into a set of reports. On top of that, betsy provides a large set of test cases (140) for checking conformance to the BPEL specification.

Requirements and Execution

Betsy is written in Groovy and makes heavy use of soapUI for sending and validating SOAP messages. The build tool we use is Gradle. To install and run betsy, you need:

  • JDK 1.7.0_3 (64 bit) or higher (including the setting of the JAVA_HOME environment variable)
  • Ant 1.8.3 or higher (including the setting of ANT_HOME)
  • soapUI 4.5.1 (64 bit)

In the current version of the tool, we link soapUI by its installation path (which defaults to C:\Program Files\SmartBear\soapUI-4.5.0) and use .bat scripts. Please note that this ties the tool to the Windows operating system family. However, the scripts and installation paths can be modified to work on Linux as well. Most of the scripts do nothing more than start up and shut down specific Tomcat instances, so this should not be too difficult.
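To illustrate, here is a hedged sketch in POSIX shell of what such a script boils down to (the directory layout and engine name are assumptions; betsy's real scripts are .bat files doing the Windows equivalent, and they execute the commands instead of printing them):

```shell
#!/bin/sh
# Hypothetical sketch of an engine lifecycle script.
# The path below is an assumption, not betsy's actual layout.
TOMCAT_HOME="server/apache-ode/apache-tomcat"

engine_cmd() {
  # Print the command that would be run, so the sketch is safe to execute.
  case "$1" in
    start) echo "$TOMCAT_HOME/bin/startup.sh" ;;
    stop)  echo "$TOMCAT_HOME/bin/shutdown.sh" ;;
  esac
}

engine_cmd start
engine_cmd stop
```

Porting to Linux would essentially mean replacing the printed startup/shutdown batch calls with their shell counterparts.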
You can download the software by cloning our git repository at GitHub. Simply use this command:

git clone

On execution, you provide betsy with the name of the engine(s) you want to test and the names of the test cases you want to execute. A run of betsy works like this:

That is, betsy first organizes execution and result directories, then executes each specified test case for each specified engine, and thereafter aggregates the test results. Each test case execution is strictly sequential, and each engine is reinstalled from scratch for each test case. This implies that a run can take quite a long time. For 140 test cases and five engines, it takes around seven hours on our testing server (i7 with 16 GB RAM). However, this is a necessary restriction: parallel test executions can corrupt the results – some engines turned out to not handle parallelism very well – and single test cases were capable of disabling engines for any further use, making a reinstallation necessary. This indicates that performance testing might be worthwhile (and would likely produce outrageous results), but so far we have limited the tool to conformance testing.
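The steps above can be sketched as a simple sequential loop. This is not betsy's actual code (which is Groovy), and the engine and test names below are mere placeholders:

```shell
#!/bin/sh
# Rough sketch of betsy's sequential test loop: for every engine and every
# test case, the engine is reinstalled from scratch before the test runs;
# at the end, the collected results are aggregated into reports.
run_betsy() {
  engines="$1"; tests="$2"
  for engine in $engines; do
    for test in $tests; do
      echo "reinstall $engine from scratch"
      echo "run $test on $engine and collect the result"
    done
  done
  echo "aggregate reports"
}

run_betsy "ode bpelg" "SEQUENCE FLOW"
```

The per-test reinstallation is what dominates the seven-hour runtime, but it is also what makes the results deterministic.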

To fire up betsy, you can use Gradle with the appropriate parameters. A call would, for instance, be:

gradlew run -Pargs="ode SEQUENCE,FLOW"

which executes the test cases for the sequence and flow activities on Apache ODE. If you leave out the arguments and just execute gradlew run in the project root, all tests for all supported engines will be executed, so keep in mind that this can take time.

Betsy natively supports five engines. All are open source and written in Java. The engines are:


Betsy currently provides reports in HTML, CSV, and LaTeX table format. The HTML reports offer the possibility to drill down to the SOAP messages exchanged for every test case and engine. Here is a rough outline of how they look (I cannot embed the generated HTML in the blog post, so it is just an image):
Exemplary Result Report

These are the results for all engines and three tests for certain structured activities. If you look at the complete set of all 140 test cases, the picture is not at all (repeat: not at all) that green. A discussion of the implications of our findings will be part of a future blog post. For now, I hope I could spark your interest in betsy! If you want to know more, visit the project page.

8 thoughts on “Betsy – A BPEL Engine Test System”

  1. You have a point about the invoke-catch test case, I think. I’ll have a look into it. One of the engines correctly deals with the case, but defining the fault in the WSDL might make more engines pass the test. Thanks for the feedback!

  2. You’re right about validation, but I recall there being some problem where it wasn’t running for some reason. I’m sure I fixed that in 5.3-SNAPSHOT but it’s been a while since I’ve looked at the code.

    I read through the summary of results for bpel-g in your paper and extracted all of the problems into the project’s issue list:

    There are a few things in that list that I don’t see how they could be broken but ultimately it requires writing a test to see. I won’t have time to look at this for a while (perhaps December/January) but I took a quick look at the BPEL/WSDL for your invoke/catch example. The partner’s WSDL doesn’t have a wsdl:fault for the operation. Shouldn’t the WSDL define faults for the operation? I’ve been away from BPEL for a while, but I don’t recall what the expected behavior is for unexpected faults from a service.

  3. Hi Mark,

    I thought validation was bpel-g’s default behaviour? Looking at I can only find a way to turn it off (which we don’t). Let me know if we are missing a configuration option to turn on validation.

    Concerning bpel-g’s failures: most of the problems relate to not replying with certain faults in cases where they should be replied. Furthermore, bpel-g has problems handling xsd:dateTime and xsd:duration values in for and until expressions. All in all, however, bpel-g passes more tests than any other engine!

    Betsy helping to improve bpel-g would be great news indeed :-)

  4. I’m surprised that bpel-g had so many failures. I’m curious if you ran the tests with process validation enabled. With this feature enabled, invalid process definitions are rejected at deployment time as opposed to being allowed to fail at runtime. There was something about parallel non-start activities that made me think you ran without validation.

    In any case, BETSY seems like a very useful tool. One of the things sorely lacking with bpel-g is a complete suite of unit and integration tests since these were never part of the open source release. Perhaps BETSY could serve as a good replacement for these tests.

    I’ll review the failures when I have time and see if I can get it to 100%.

  5. In addition to Jörg’s post, I want to add that we are investigating the issue with the log files of Tomcat and ODE, too. As a “quick fix”, you can invoke a single test via gradlew run -Pargs="ode WHILE", as the log files will not be deleted in that case. Furthermore, the log4j configuration is changed to DEBUG for most of ODE’s loggers during installation, leading to a very detailed log.

  6. Thanks for your interest in the project! It would be really great if it is actually used to improve the engines. Also thanks for the hint on ODE’s in-memory mode, we weren’t aware of that and will look into it.

    As for the testing speed, we know it is really slow. We spent quite some time parallelizing the whole process, but as it turned out, there were several nasty edge cases with race conditions, and the only way to really produce deterministic results is to sequentialize the whole thing. Nevertheless, you can tell the tool to install each engine only once using a specific flag. Use this command to run all tests for ODE:

    gradlew run -Pargs="-s ode ALL"

    However, you will see that ODE runs into an endless loop when executing one of the repeat until test cases (see this process: ). It seems that ODE has problems with the equality comparison in the repeatUntil condition. This also makes all subsequent test cases fail. We could only resolve this by reinstalling the engine.
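    To illustrate, a repeatUntil with an equality-based exit condition of the kind involved looks roughly like this (the variable name, values, and omitted namespace declarations are made up for this sketch; this is not the exact test process):

    ```xml
    <!-- Hypothetical BPEL fragment: loop until the counter equals 3. -->
    <repeatUntil>
      <assign>
        <copy>
          <from>$counter + 1</from>
          <to variable="counter"/>
        </copy>
      </assign>
      <condition>$counter = 3</condition>
    </repeatUntil>
    ```

    If the engine evaluates the equality in the condition incorrectly, the condition never becomes true and the loop never terminates.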

  7. Very cool project, thanks for making it available. I just opened two issues at the project page, since I’d like to use the results to improve the standard compliance of Apache ODE. Unfortunately, I could not locate the engine’s log, and in addition I figured that you’re currently using ODE’s in-memory mode, which is very restricted and thus not really suitable for a feature comparison. Also, the tests run really slowly. Is it really necessary to unpack, start, deploy, and delete Tomcat and ODE for every single test? I understand that this might be needed to get deterministic results, however it would be great to have a dev mode that speeds things up.
