
Unlimited scaling, easy!

Friday, August 1st, 2008

Suppose you want to develop a high-volume transaction processing system in Java/J2EE. How would you do it? Most people would say: don’t use JTA/XA transactions because they kill performance. Wrong. And they would also say: use an appserver to scale. Again, they couldn’t be more wrong.

Here is the magic recipe on how we build systems with virtually unlimited scalability at Atomikos:

  • Kick out your appserver as soon as you can, as explained here. J2EE is not limited to an appserver. J2EE is a set of APIs. The appserver ties these APIs to a programming model that almost nobody needs. Conclusion: drop the latter.
  • Use a persistent JMS queue to store transaction requests. This allows easy load-balancing and provides crash resilience for ongoing requests. It also de-couples the clients from the transaction processing system.
  • Use ExtremeTransactions to process the requests (stored in JMS). This allows for reliable, exactly-once message processing as outlined here. Make sure to use the supplied JMS and JDBC drivers!
  • To add more power, just add a second VM (process) on a separate CPU.
  • Repeat until performance is high enough.

You will reach the required performance because of the intra-VM nature of each process you add. The only potential bottlenecks are your own database or JMS backend. So scaling comes down to scaling your backends, which is much simpler than scaling your application itself (which has already been done in a natural way as outlined above).
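
The recipe above can be sketched with plain JDK classes. This is only an illustration under loud assumptions: an in-memory `BlockingQueue` stands in for the persistent JMS queue, each worker thread stands in for one extra VM, and the class and method names are hypothetical. It is not the ExtremeTransactions API, and it has no persistence or transactions; it only shows why adding consumers on a shared queue needs no coordination between them.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in: the in-memory queue plays the persistent JMS queue,
// and each worker thread plays one extra VM added to the system.
class CompetingConsumers {

    // Drains 'requests' with 'workers' parallel consumers and returns
    // how many requests were processed in total.
    static int process(List<String> requests, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(requests);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                // poll() hands each request to exactly one consumer,
                // so the workers never need to talk to each other
                while (queue.poll() != null) {
                    processed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> requests = List.of("tx-1", "tx-2", "tx-3", "tx-4", "tx-5", "tx-6");
        System.out.println("1 worker:  " + process(requests, 1));
        System.out.println("3 workers: " + process(requests, 3));
    }
}
```

Whatever the number of workers, every request is handled once; only the elapsed time changes. In a real deployment the queue and the database remain the shared parts, which is why they are the only potential bottlenecks.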

So don’t let anybody fool you: transactions do scale - even without limits!

Loosely-coupled deployment vs loosely-coupled design

Thursday, July 31st, 2008

I have talked to a number of people who claim to be doing SOA, when in the end all they do is loosely-coupled design. Let me explain what I mean by an example.

A team of enterprise architects was designing an SOA infrastructure for a bank I know. The system they were building would be based on interfaces, so that it would be possible to deploy parts of the system as separate instances later on. This was their notion of SOA…

The good thing about it is that there are interfaces in their design, meaning it is likely to be loosely-coupled. The bad news is that this is not SOA, at least not in my view: one of the biggest advantages of SOA - reuse in place - is never realized in this way. So, whereas this approach to ‘SOA’ may be loosely coupled in design, it is not loosely coupled in deployment (which is at least as important).

The consequence? Whenever a ‘service’ is upgraded, they will need to upgrade all the dependent services and redeploy them. This is because each ‘service’ is really an embedded module inside other parts of the system.

I guess this also holds for the debate on cloud vs grid computing: in my view, a cloud is more loosely coupled than a grid in its deployment.

BPEL and compensation

Friday, October 19th, 2007

Is BPEL a good tool for implementing compensation? It really depends, and you really have to know what you are doing - which (with all respect) doesn’t seem to be the case for most people (not even BPEL specialists). So if not even those experts know, how can we expect the rest of us to know? Hence this blog entry.

For instance, on repeated occasions I have heard renowned BPEL and workflow experts mention that compensating transactions are “perhaps” best modeled at the business logic level. This, by the way, includes Bill Burke in the case of JBoss/jBPM - see here. Note that I emphasized the word “perhaps”: this indicates the shade of misunderstanding usually present in the arguments.

I have been saying this here and there in the past (and in fine detail in this article), but I want to repeat it: neither BPEL, nor workflow, nor WS-BA is ideal for compensation unless the compensating party doesn’t care whether it eventually needs to compensate. In other words, if the compensation is business as usual to the provider of the compensatable service then BPEL might be OK (though certainly not desirable - see below).

Why is that? Put yourself in the place of a service that is asked to compensate by a BPEL engine somewhere. Also suppose that you are in a B2B ecosystem where you don’t necessarily trust the party that owns the BPEL engine. Now what would you rather do: trust the BPEL to compensate - eventually (which might be never!) or rather deal with compensation yourself, say after a timeout? I would definitely choose the latter. I don’t want someone else to decide when I need to compensate. I want to decide for myself, and the Atomikos TCC model allows for that. BPEL and jBPM don’t.
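
To make the idea of deciding for yourself concrete, here is a minimal sketch of a reservation that the service provider cancels on its own once a timeout passes, unless a confirmation arrived first. It is an assumption-laden illustration, not the Atomikos TCC API: the `Reservation` class, its states and its methods are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical participant-side reservation; NOT the Atomikos TCC API.
class Reservation {
    enum State { TRIED, CONFIRMED, CANCELED }

    private final Instant expiresAt;
    private State state = State.TRIED;

    Reservation(Instant createdAt, Duration timeout) {
        this.expiresAt = createdAt.plus(timeout);
    }

    // The provider checks the state itself (for example from a periodic sweep);
    // an expired reservation is canceled without waiting for anybody.
    State stateAt(Instant now) {
        expireIfDue(now);
        return state;
    }

    // Confirmation from the caller only succeeds before the timeout.
    boolean confirm(Instant now) {
        expireIfDue(now);
        if (state == State.TRIED) {
            state = State.CONFIRMED;
        }
        return state == State.CONFIRMED;
    }

    private void expireIfDue(Instant now) {
        if (state == State.TRIED && !now.isBefore(expiresAt)) {
            state = State.CANCELED;
        }
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2007-10-19T10:00:00Z");
        Reservation inTime = new Reservation(t0, Duration.ofMinutes(5));
        System.out.println(inTime.confirm(t0.plusSeconds(60)));   // confirmed before expiry
        Reservation tooLate = new Reservation(t0, Duration.ofMinutes(5));
        System.out.println(tooLate.confirm(t0.plusSeconds(600))); // arrives after expiry
    }
}
```

The point of the design is that the remote party can only confirm; canceling never depends on a message from the party that owns the workflow engine.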

So BPEL is ruled out for me - at least as far as compensation goes. What about WS-BA? It is a step in the right direction, but unfortunately it is a bloated protocol, very inefficient and loaded with application-level messages that pollute the compensating part. Even worse, it also suffers in large part from the lack of a timeout and depends on the BPEL engine to at least trigger compensation.

Also, WS-BA doesn’t allow for application logic on close - I won’t bother you with the entire spec details but it is like a try...catch...finally where the exception is raised by the client (ugly!) and where the finally block can only be empty! Again, Atomikos TCC is far superior, more efficient and more elegant. It is also more natural for compensation than any BPEL engine will ever be.

One last note on BPEL and this supposed “modeling the compensation in the business process”: I was talking to an IBM architect the other day. He said that they were doing a large telco project with BPEL to co-ordinate things. One of the things he complained about was exactly this: they had to model the compensation and error logic as explicit workflow paths, and it was literally overloading everything with complexity. Moreover, this complexity is hard to test. As he correctly put it, they were implementing a transaction manager at the business logic (BPEL) level, over and over again in every process model. He said that this was virtually killing the project - especially when there were change requests to consider. I believe him:-) I gave him the URL to our TCC article above.

Atomikos and TCC allow you to focus on the happy path of your workflow models. We take care of the rest. Now imagine what a reduction in complexity that is, and how much more reliable things get! So no, compensation should NOT be modeled at the business level. Except on rare occasions maybe.

REST and reliability

Friday, October 19th, 2007

Whenever I see a presentation on REST I am impressed by its simplicity. With just four operations (GET, POST, PUT, DELETE) it seems to accomplish a simple model for service-oriented architectures, where every business resource has a URL.

With this simplicity, REST also leverages the ubiquitous HTTP protocol as the underlying mechanism. More and more people seem to like this, including me.

However, the big question for me is: how do you make this reliable? Imagine that you integrate 4 systems in a REST style. You would be using HTTP and a synchronous invocation mechanism for each service. Now comes the question: how reliable is this? The answer: less than the least reliable system that you are using! More precisely, availability goes down quickly because your aggregated service fails as soon as one of the services fails…
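
A quick calculation shows how fast this goes down. The sketch below assumes that the services fail independently and that the aggregated call succeeds only if every service succeeds, so the availabilities multiply; the 99% figures are made-up inputs, not measurements.

```java
// Availability of a synchronous call chain, ASSUMING independent failures:
// the chain works only if every service works, so the availabilities multiply.
class ChainAvailability {

    static double of(double... availabilities) {
        double result = 1.0;
        for (double a : availabilities) {
            result *= a;
        }
        return result;
    }

    public static void main(String[] args) {
        // Four services at an assumed 99% each
        System.out.println(of(0.99, 0.99, 0.99, 0.99));
    }
}
```

With four services at 99% each the aggregate is roughly 96%, which is worse than any single one of them.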

With transports like JMS you can improve reliability, but how do you do REST over JMS, given REST’s close relationship with HTTP and URLs? That is the problem with REST for me.

Data Replication in SOA: The Price of Loose Coupling

Thursday, October 11th, 2007

When designing a corporate SOA architecture you are often faced with a tough choice: do you rely on a common database (centralized) or do you implement replication instead?

Let me explain what I mean. The idea in SOA is that you define more or less independent services that correspond (hopefully) to clearly defined and business-related activities. For instance, you could have a customer management service and a payment/invoicing service. The customer management service belongs to CRM, the invoicing to the billing department. However, both of these services might need the same customer data. Now what do you do? Basically, you have the following options:

  1. Use the same centralized customer database. This gives you the benefit of easy maintenance because there is only one copy. However, this also means that you are coupling your services into the same database schema, and updates to the schema are likely to affect more than one service.
  2. Replicate the customer database, by identifying one master (the CRM?) that regularly pushes or publishes updates (in an XML feed, for instance). While you lose the benefit of easy maintenance, this does give you loose coupling: as long as the XML format is the same, you can change DBMS schemas as much as you like - without affecting other services.
  3. Merge the customer and invoicing services into one. However, this may not always be possible or desirable, and may even defeat the purpose of service-orientation altogether.
  4. Have the invoicing service query the customer service for each payment. This seems to incur a lot of dependencies and network traffic.

So what do you do? My preference tends to go to the second option. However, it means that realistic SOA architectures are likely to have an event-driven nature.
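
The second option can be sketched as follows. Everything here is hypothetical and simplified: `CustomerUpdate` stands in for the published feed format (the XML in the text), `CrmMaster` for the master service and `InvoicingReplica` for a subscriber that keeps its own copy in its own structure. Delivery is a direct in-process call; a real setup would publish over a feed or a message queue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// The published format: the only contract shared between the services
// (field names are assumptions for illustration).
final class CustomerUpdate {
    final String customerId;
    final String name;
    final String billingAddress;

    CustomerUpdate(String customerId, String name, String billingAddress) {
        this.customerId = customerId;
        this.name = name;
        this.billingAddress = billingAddress;
    }
}

// The master: owns the customer data and publishes every change.
class CrmMaster {
    private final List<Consumer<CustomerUpdate>> subscribers = new ArrayList<>();

    void subscribe(Consumer<CustomerUpdate> subscriber) {
        subscribers.add(subscriber);
    }

    void updateCustomer(String id, String name, String billingAddress) {
        // (the write to the CRM's own schema would happen here)
        CustomerUpdate update = new CustomerUpdate(id, name, billingAddress);
        subscribers.forEach(s -> s.accept(update));
    }
}

// A replica: stores only what invoicing needs, in its own layout.
class InvoicingReplica {
    private final Map<String, String> billingAddressById = new HashMap<>();

    void onUpdate(CustomerUpdate update) {
        billingAddressById.put(update.customerId, update.billingAddress);
    }

    String billingAddress(String customerId) {
        return billingAddressById.get(customerId);
    }
}

class ReplicationDemo {
    public static void main(String[] args) {
        CrmMaster crm = new CrmMaster();
        InvoicingReplica invoicing = new InvoicingReplica();
        crm.subscribe(invoicing::onUpdate);
        crm.updateCustomer("c-1", "Ann", "Main Street 1");
        System.out.println(invoicing.billingAddress("c-1"));
    }
}
```

Either side can change its internal storage freely as long as `CustomerUpdate` stays the same, and the services only ever interact through events, which is the event-driven nature mentioned above.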

Atomikos Offers 3rd Generation TP Monitors

Monday, October 8th, 2007

This post on InfoQ was made by Arjuna, one of our (ex) competitors after JBoss (and then Red Hat) bought their transaction technology.

More interesting than the referenced paper are the comments, which I would like to discuss here. Most posts seem to rule out transactions as something that doesn’t scale. I agree with none of these comments.

The main complaints uttered seem to fall into these categories:

  1. Transaction managers are supposedly centralized.
  2. Transaction managers are accused of overhead for two-phase commit and synchronization.

I will now show that both statements are misconceptions, and argue that the 3rd generation transaction monitor already exists. Moreover, I will show that 3rd generation transaction managers are better than (or at least as good as) the alternatives - when used correctly.

The product I am talking about is Atomikos ExtremeTransactions, including its JTA/XA open source edition named TransactionsEssentials. Let me now outline why none of the above objections are actually accurate:

  1. Atomikos ExtremeTransactions is a peer-to-peer system for transactions. Whenever two or more processes are involved in the same transaction, the transaction manager component (library) in each process collaborates with its peer counterpart in the other processes. Consequently, there is no centralized component and no bottleneck. Our studies have shown that this gives you linear (i.e., perfect) scalability. This invalidates the first criticism above.
  2. While two-phase commit does incur some synchronization, the same is true for any other solution (assuming that you want to push operations to one or more backends). A simple example to illustrate my point: many people think that queuing is a way to avoid the need for transactions (and two-phase commit). Is it? Hardly: even if we neglect the resulting risk of message loss, you have to realize that most queuing systems use two-phase commit internally anyway. This invalidates the second criticism above.
  3. The often-heard criticism that transactions may block your data is not fair either.
    There is some interesting theoretical work done by Nancy Lynch (MIT) et al - I believe it is this one. Basically, this is mathematics that proves that you cannot have a non-blocking (read: perfect) solution for distributed agreement in realistic scenarios.
    In practice, this means that a queued operation may not make it if the connection to the receiver is down too long. So your system is ‘blocked’ in the queue, even though you don’t use transactions. This is the equivalent of the perceived ‘blocking’ but now placed in a non-transactional scenario.
  4. Again on the perceived synchronization overhead: if you don’t keep track of “what” you have done and “where” (by synchronizing) then you end up with an error-prone process. This is especially true for many critical applications that consume messages and insert the results in a database. If you don’t use transactions then you will find yourself implementing duplicate message detection and/or duplicate elimination, none of which are safe without the proper commit ordering. Basically, you are implementing a transaction manager yourself (yuk!).
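
The last point can be illustrated with a small simulation of a consumer that receives a message, inserts the result in a database and then acknowledges the message. It is a toy model under stated assumptions: the queue and the table are in-memory collections, the crash is simulated, and `atomic = true` merely imitates what a transaction spanning both the insert and the acknowledgement would guarantee. No real JMS, JDBC or Atomikos API is involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RedeliveryDemo {

    // Consumes one message with a simulated crash between insert and ack.
    // atomic == false: insert and ack are independent steps.
    // atomic == true:  models one transaction over both, so the crash
    //                  also undoes the insert.
    static List<String> run(boolean atomic) {
        Deque<String> queue = new ArrayDeque<>(List.of("order-42"));
        List<String> table = new ArrayList<>();
        boolean crashedOnce = false;
        while (!queue.isEmpty()) {
            String message = queue.peek();       // receive, not yet acknowledged
            table.add(message);                  // insert the result
            if (!crashedOnce) {                  // simulated crash before the ack
                crashedOnce = true;
                if (atomic) {
                    table.remove(table.size() - 1); // rollback undoes the insert
                }
                continue;                        // the message gets redelivered
            }
            queue.poll();                        // acknowledge
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println("without transaction: " + run(false));
        System.out.println("with transaction:    " + run(true));
    }
}
```

Without the transaction the redelivered message is inserted a second time, and that is the gap hand-written duplicate detection has to close; with insert and acknowledgement in one atomic unit the row ends up in the table exactly once.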

Am I saying that transactions and two-phase commit don’t block? Not exactly - especially with XA, things can block. However, Atomikos avoids this in two ways:

  • Very strong heuristic support: unilateral decisions are encouraged both in the backend and in the Atomikos transaction manager. If a transaction takes too long, it is terminated anyhow. Where classical scenarios would block, Atomikos enforces a unilateral termination by either party. The resulting anomaly is reflected in the transaction logs, so the transaction manager can track problem cases (instead of letting you chase different systems to find out what happened - the alternative without transactions). Ironically, we have seen more blocks caused by non-XA transactions: if your database does not support an internal timeout mechanism for non-XA (which seems to be so in the most commonly used DBMS) then it will be non-XA transactions that cause the blocking! I can go on for hours about this - but that is another post.
  • Atomikos also offers local transactions with compensation instead of rollback: you can use our TCC (Try-Cancel/Confirm) API to handle overall rollback. This allows you to use non-XA, local transactions. It never blocks your application, ever! TCC is similar to WS-BA, only better because we have been working on it for much longer than anybody else in the world. See for more on TCC.

Summing up then: do I recommend two-phase commit? Yes, if needed. In the past, this need arose out of legacy integration. In the present and future, that need arises out of up-front requirements. The most typical use cases are:

  • Processing persistent messages with exactly-once guarantees. There is no substitute for the reliability and ease of Atomikos ExtremeTransactions here. Note that this can be done intra-process!
  • Across processes/services if you have a reservation model inherent in your business process. Our TCC technology will make sure that your database never blocks.

More information about Atomikos products can be found here

My JP06 talk on Parleys…

Tuesday, May 8th, 2007

My JP06 talk on WS-AT and WS-BA is now online here

Thanks to Stephan Janssen, Guy Crets and the rest of the BEJUG crew!


Controlling Lock Duration in the DBMS

Tuesday, March 6th, 2007

Some databases, Oracle in particular, don’t seem to allow you to set the maximum duration for transactions (and hence locks). This implies that some applications (those that don’t behave well) can hold long-lived locks on your data. The result is that some data may become unavailable (even for days in one particular case I have seen!!!).

The solution? I am not sure about other products, but the Atomikos transaction libraries make sure that none of your applications can hold locks longer than the configured XA transaction timeout. Meaning: you get the benefit of ensured control and availability of your data. It’s ironic really; many people believe that XA can block your data but as this case shows it is exactly the opposite!

Building a SOA with JINI

Monday, February 26th, 2007

I have been waiting for ages to see web services get ready for SOA. Recently, following a hint from a customer, I (re)discovered JINI. What was that moment like? Well, looking at JINI (in combination with JavaSpaces) I saw a dynamic lookup platform based on interfaces (read: capabilities - not names) and with scalability, self-healing characteristics and the performance of RMI. JavaSpaces even adds the best of messaging and asynchrony. It sounds too good to be true.

To be continued…

WS-BA for Compensation-Based Web Services

Saturday, February 24th, 2007

For my Javapolis 2006 talk I decided to have a closer look at the WS-BA specification (then still in draft) and its relationship to BPEL 2.0 (then also still in draft). While I was at it, I also decided to use the committee’s minutes to clarify any remaining questions I had. This exercise took me a few days but the result made clear that the WS-BA protocol has serious limitations that make it less useful than it could be:

The WS-BA protocol is almost entirely modeled after BPEL. WS-BA participants map one-to-one to BPEL compensation scopes. Because BPEL doesn’t provide close handlers, neither does the WS-BA protocol allow application logic on close. The implication? If you model your services as WS-BA services then you remain ‘in-doubt’ about every service invocation (in theory, the WS-BA close event would notify you that the deal is closed, but you’re not supposed to do business logic in that callback so it might as well not be there).

To give an example: if you are an airline and want to use WS-BA to make seat reservations transactional then you would never know whether any reservation needs to be canceled or not. More precisely: it will always be possible for any of your current reservations to be compensated at some later time.

The bottom line for you as a service provider: compensation is always possible. The consequence is far-reaching: how do you produce sales reports? You can’t, unless you accept that you are dealing with temporary data (that may later be compensated for). Every single sale you made can theoretically still be compensated.

Fortunately, WS-BA and BPEL allow you to model compensation as something that costs the customer, so your sales reports may not suffer that much from compensation after all. But this leads us to another problem I have with WS-BA/BPEL: if you model compensation as something that leaves tangible effects (costs?) for the customer then what good is it for me to have that kind of transactional guarantee? After all, BPEL also says that compensation can be triggered by the failure of a parent task. So my customer may have to pay for my service just because some intermediary task has failed! I am not sure if it is just me, but I think this is a big problem.

One more point I have to make about WS-BA is that it appears polluted with workflow messages that don’t really contribute to the purpose of an agreed outcome across services. For instance, the ‘Completed’ message seems to be there just to indicate whether a participating service should be canceled (leave no effects) or compensated. But like I argued before, cancelation can still lead to compensation somewhere down the call stack so this is an utterly useless protocol message anyway. It only makes sense in the context of BPEL. And since BPEL is workflow, WS-BA is a workflow protocol and not a transaction termination protocol. In terms of efficiency it isn’t exactly very good either: there are too many unnecessary message rounds involved. It could all have been much simpler.

My advice: use the Atomikos TCC (Try-Cancel/Confirm) paradigm if you want really reliable and compensation-based web services. It is faster, better and leads to real business-level consistency across service invocations. You will at least know that your sales reports are permanent and correct, and your customers won’t pay for failed business transactions.