Optimistic Commit - SharePlex has time on your side

If you haven’t seen my blog on Oracle Open World 2015, this topic, “Optimistic Commit”, came out of Oracle GoldenGate’s opening session.

When a transaction runs against a database, it is either saved or discarded, typically through processes called commit or rollback. Replication can either begin in the same timeframe as the transaction or wait for the commit. Beginning replication at nearly the same time as the transaction, before its commit is known, is sometimes referred to as the optimistic commit approach.

The optimistic commit can be critical in replication. Let’s examine the use cases of reporting and disaster recovery as two examples. 

First, let’s take the fictitious but very plausible case of an eCommerce site that carefully monitors its pricing versus that of competitors. Obviously, if prices are too low, the site loses margin. If they are too high, sales volume suffers. The site carries a large number of products, and pricing updates are done on individual SKUs as well as in bulk. For example, on Cyber Monday, there might be a bulk adjustment of all prices. Given the number of SKUs and the regional and currency adjustments, these bulk updates can take 20 minutes and are resource intensive.

At this web site (and many others), the ratio of inquiry activity, i.e. “how much does this cost?”, to purchasing is about 1000:1. In order to support significant scaling, the application’s updates, i.e. purchases, are directed to one database. All inquiries are directed to a series of reporting databases which replication keeps up to date.

Last year on Cyber Monday, an error was discovered with pricing after adjustments were run, and sales went to 5% of expected volume. The pricing was to have decreased by 31% and instead increased by 31%. With SharePlex’s optimistic commit approach to replication, the corrected prices are distributed across all the reporting databases typically less than a second after the update completes on the source database. Without optimistic commit, it would take an additional 20 minutes for these pricing changes to appear to customers. With 95% of the business lost for another 20 minutes on the busiest shopping day of the year, how important is optimistic commit?

Secondly, let’s take the same eCommerce company, where replication has been chosen as the means of disaster recovery. Why? Replication sends data to a database that is already started and running. Previously this company had used asynchronous disk mirroring, and while the disk mirroring worked perfectly, it wasn’t configured to pick up all the data files, so the database didn’t mount. Oracle’s physical standby had also been eliminated: while it is a good disaster recovery solution, it didn’t help maintain availability, and replication did. Its removal was also part of an initiative to reduce operational costs from a labor and infrastructure standpoint.

The following year, the same pricing update was being put in place prior to Cyber Monday when a Mr. Cutter A. Cable from Murphy Construction severed the network fiber connection to the building. This occurred 3 minutes after the pricing update completed. IT operations invoked their business continuity plan and resumed services within 2 minutes. Fortunately, with SharePlex running, the updated pricing had completed across all servers. Without optimistic commit, the update would have had to start over, resulting in additional lost business.

Time matters. What is the cost per minute of these pricing changes? SharePlex’s optimistic approach puts time on your side instead of working against you.

Let’s change the tone of this blog by going from the hypothetical to some specific numbers around replication performance.

Suppose a batch transaction or other process takes 20 minutes to run. With Dell SharePlex, this transaction begins to post changes to the target system typically within 1 second over a LAN and 3 seconds over a WAN connection. Just over 20 minutes later, all the data which appears on the source will be committed on the target system. Oracle GoldenGate waits for this commit before it begins to apply changes.

Let’s assume this transaction runs as fast on the target as it does on the source. In actuality, with logical replication systems, there are cases where the target transaction will run faster than the source, but this tends to be quite rare. For smaller transactions, i.e. a call center record, a customer order, and other customer business transactions, run times on the source and the target are fairly consistent. It is not unusual, however, for very large batch transactions to take significantly longer to run on the target. When this occurs, the detrimental effect of not having optimistic commit grows, because the latency increases significantly. For purposes of a fair comparison, let’s assume the transaction takes as long to run on the target as it does on the source.
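The difference between the two approaches can be put into a rough timing model. The sketch below is only an illustration under simplifying assumptions: a single transaction, a fixed transport delay, and a target that applies changes at a steady rate. The function name and parameters are hypothetical and are not part of SharePlex or GoldenGate.

```python
def target_lag_seconds(source_s, target_s, delay_s, optimistic):
    """Seconds between the source commit and the matching target commit.

    source_s: run time of the transaction on the source
    target_s: run time of the same work on the target
    delay_s:  transport delay (e.g. about 1 s on a LAN, 3 s on a WAN)
    """
    if optimistic:
        # Posting starts delay_s after the source transaction starts, so the
        # target can finish at delay_s + target_s, but never before the
        # source commit has arrived (source_s + delay_s).
        finish = max(source_s, target_s) + delay_s
    else:
        # Apply only starts once the source commit has been shipped.
        finish = source_s + delay_s + target_s
    return finish - source_s


batch = 20 * 60  # the 20-minute batch from the text, in seconds

print(target_lag_seconds(batch, batch, 1, optimistic=True))   # 1
print(target_lag_seconds(batch, batch, 1, optimistic=False))  # 1201
```

With equal run times, the commit-wait model is behind by the full length of the batch plus the transport delay, while the optimistic model is behind by the transport delay only. If the target is slower than the source, the commit-wait lag grows by the whole target run time rather than by the difference.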

With GoldenGate, the apply process starts after 20 minutes and then takes another 20 minutes to run. This means that for 1.4% of the day (20 of 1,440 minutes) the target is 20 minutes behind the source. Let’s assume the batch processing really helped the business, so the batch is now run 4 times a day. However, a single follow-on process which depends on data from the first batch was added and takes 5 minutes each time it runs. This creates latency 6.9% of the time. Again the batch process really helps the business, and it is now desired to run it every hour. With this much activity, some optimization was done and the batch was shortened by 25% to 15 minutes. However, the follow-on process now takes an additional two minutes per run (7 minutes) and has to be run twice. At this point, the target system is behind nearly 50% of the time. A further 1-minute increase in the follow-on process, with one more iteration of it, means there is latency during an additional 16.67% of the day. No tuning or amount of hardware on the target will change that. None of this latency occurs with SharePlex. None.

This can be summarized below.
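The percentages above follow from simple arithmetic, which can be checked with a short script. This is a sketch under the article’s own assumption that the target lags for as long as the batch and its follow-on runs took on the source. The reading of the follow-on process as 5, then 7, then 8 minutes per run is an interpretation of the text, and the function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def lagging_share(runs_per_day, batch_min, follow_on_min=0, follow_on_runs=0):
    """Percentage of the day during which the target is behind the source."""
    lag_minutes = runs_per_day * (batch_min + follow_on_min * follow_on_runs)
    return 100 * lag_minutes / MINUTES_PER_DAY


scenarios = [
    ("1 run/day, 20-minute batch", lagging_share(1, 20)),
    ("4 runs/day, one 5-minute follow-on", lagging_share(4, 20, 5, 1)),
    ("hourly, 15-minute batch, two 7-minute follow-ons", lagging_share(24, 15, 7, 2)),
    ("hourly, 15-minute batch, three 8-minute follow-ons", lagging_share(24, 15, 8, 3)),
]

for label, pct in scenarios:
    print(f"{label}: {pct:.1f}% of the day")
```

The four scenarios come out at roughly 1.4%, 6.9%, 48.3%, and 65.0% of the day. The gap between the last two is the additional 16.67% mentioned in the text.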

Perhaps this kind of batch expansion seems unrealistic. It is the result of the following:

 

  • Single process introduced
  • Usage increase
  • Existing batch tuned, but a new additional process added
  • Usage of the new additional process expanded

 

It may well be that your application doesn’t do any of that today. It is difficult, though, to ensure the application will never require that kind of process support. Dell SharePlex does not limit an application architecture that needs to expand with your business. Don’t let your replication architecture box you in. Keep things current, in more ways than one, with Dell SharePlex’s low-latency replication solution.

Anonymous