Quest Customer Story: Tuning SharePlex with Analyze

In an earlier blog, I showed how the SharePlex ANALYZE CONFIG command can be used to get a picture of how an application interacts with your database. It's also a great tool for tuning SharePlex itself, especially with very large databases.

A Tuning Scenario

A customer recently came to us with the need to speed up reporting. They have an 8 TB database collecting billing data from a variety of sources, including real-time inputs and periodic but randomly timed batch updates. The batch updates arrive several times per day and range in size upward from a few hundred bytes.

Our customer needed a way for their customers to look up their bills, but wanted to be sure those customers had a consistent experience. With everything on the same database, response time for the customer queries was unacceptable while the batch processes were running. In some cases, the customer queries also caused slowdowns in the real-time billing updates.

The Reporting Use Case

The customer purchased and installed SharePlex and did some basic tuning of the target database, making sure tables and indexes were in separate tablespaces and so on. Data replication was proceeding, and their customers were able to query the target database. But during the large batch updates on the source, they saw latencies (the amount of time it took for a transaction to appear on the target after it had completed on the source) in excess of 30 minutes, which was not acceptable to their customers.

Using Analyze

To help bring the latencies down to an acceptable level (5 minutes or less), we used ANALYZE to see what was happening on the source database. To recap, ANALYZE uses SharePlex to capture statistics on changes to the source database at the table level. Here's the ANALYZE command syntax:

analyze config filename n {minutes | hours | days}
   [on host |
    on host:portnumber |
    on login/password@host |
    on login/password@host:portnumber]
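For example, to analyze three weeks of activity (the duration we used below) against a config file, the command issued in sp_ctrl might look like the following; billing_config is a hypothetical config file name:

```
sp_ctrl> analyze config billing_config 21 days
```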

We let ANALYZE run for 3 weeks to be sure we captured data across all of the customer's processing periods.

Once we had the ANALYZE results, we broke the data up by groups of tables. Here's an example of ANALYZE output:

Group 1 of related tables: 1000 total operations in group

"TEST"."SS2_TEST1" 346

"TEST"."SS2_TEST2" 348

"TEST"."SS2_TEST3" 306

Group 2 of related tables: 1124 total operations in group

"TEST"."SRC_TEST1" 232

"TEST"."SRC_TEST2" 177

"TEST"."SRC_TEST3" 178

"TEST"."SRC_TEST4" 175

"TEST"."SRC_TEST5" 188

"TEST"."SRC_TEST6" 174

Note that this example is from a test case much smaller than our customer's environment.
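The bookkeeping behind that grouping can be sketched in a few lines of Python. This is not a SharePlex tool; it is a rough illustration that assumes output in the shape shown above, and the greedy "least-loaded queue first" assignment is our own simplification of what we did by hand:

```python
# Parse ANALYZE-style group output, then spread the groups across a number
# of post queues so the total operation count per queue stays roughly even.
import re
from heapq import heappop, heappush

sample = """\
Group 1 of related tables: 1000 total operations in group
"TEST"."SS2_TEST1" 346
"TEST"."SS2_TEST2" 348
"TEST"."SS2_TEST3" 306
Group 2 of related tables: 1124 total operations in group
"TEST"."SRC_TEST1" 232
"TEST"."SRC_TEST2" 177
"TEST"."SRC_TEST3" 178
"TEST"."SRC_TEST4" 175
"TEST"."SRC_TEST5" 188
"TEST"."SRC_TEST6" 174
"""

def parse_groups(text):
    """Return a list of (total_ops, [table_name, ...]) per related group."""
    groups = []
    for line in text.splitlines():
        m = re.match(r"Group \d+ of related tables: (\d+) total operations", line)
        if m:
            groups.append((int(m.group(1)), []))
        elif line.strip() and groups:
            table, _, _ops = line.rpartition(" ")
            groups[-1][1].append(table)
    return groups

def assign_queues(groups, nqueues):
    """Place each group, heaviest first, on the least-loaded queue, so that
    related tables always share a queue and transactions are never split."""
    heap = [(0, q) for q in range(nqueues)]  # (operations so far, queue id)
    placement = {}                           # table name -> queue id
    for ops, tables in sorted(groups, reverse=True):
        load, q = heappop(heap)
        for t in tables:
            placement[t] = q
        heappush(heap, (load + ops, q))
    return placement
```

In our customer's environment we balanced across 20 queues; for the small test output above, `assign_queues(parse_groups(sample), 2)` keeps each related group intact on its own queue.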

Building Parallel Post Queues

In our customer's environment, we saw immediately that there were several related groups, that is, sets of tables that are modified within the same transactions. We also saw many tables that were not in any group. Using this information, we were able to logically group tables into different post queues, to parallelize the Post process on the target machine. We grouped the tables so that the total number of operations was spread evenly across 20 queues, keeping each related group together on a single queue.
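As a sketch of what those config file entries can look like, the routing map names a post queue in the form host:queue@o.SID; the table, host, queue, and SID names below are illustrative:

```
datasource:o.SRCSID

#source table      target table       routing map
TEST.SS2_TEST1     TEST.SS2_TEST1     target01:ppq1@o.TGTSID
TEST.SS2_TEST2     TEST.SS2_TEST2     target01:ppq1@o.TGTSID
TEST.SS2_TEST3     TEST.SS2_TEST3     target01:ppq1@o.TGTSID
TEST.SRC_TEST1     TEST.SRC_TEST1     target01:ppq2@o.TGTSID
TEST.SRC_TEST2     TEST.SRC_TEST2     target01:ppq2@o.TGTSID
```

Each distinct queue name in the routing map (ppq1, ppq2, and so on) gets its own post queue and Post process on the target, while related tables share a queue so their transactions are posted together.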

By creating multiple post queues, we were able to reduce the overall latency for this customer's replication from a maximum of around 35 minutes to less than 5 minutes, even during periods of extremely high transaction volume (more than 500 GB of redo per hour). We also implemented some data collection scripts that allowed the customer to monitor and graph latency. I'll cover those scripts and more on monitoring in another blog.

Learn more to see how SharePlex data replication can be tuned for your environment.