Exception handling by SharePlex when dealing with out of sync?

 

  

The conventional methodology followed by SharePlex and most other replication solutions is to ignore out of sync DML within a transaction while applying rest of the successful DML to the target database. 

A transaction may span one or more objects, each having one or more rows, or DML. When posting that transaction, if any row runs into out of sync conditions, the default behavior is to simply discard that row, record the failed DML in a file and move on to the next row and so on till the complete transaction is processed. In the process the failed DML are logged in a file named SID_errlog.sql (where SID denotes the Oracle system identifier of the source database), whereas the successful DML make it to the target table(s). Though rarely used, it is possible to configure SharePlex to stop posting altogether when running into out of sync and manually start post again after correcting that out of condition. There is a Post parameter named SP_OPO_OUT_OF_SYNC_SUSPEND which can be configured to 1 to disallow posting at the first encounter of out of sync DML. 

The reasons behind allowing successful DML to post while discarding the rest are, targets are usually meant for reporting so speed is of essence and a near perfect accuracy is acceptable though it never means that the integrity of the target data should be compromised. It is just that reporting involves aggregates and some missing rows will not alter result in a significant way. Moreover, such out of sync are not a regular phenomenon and if they were, they need to be urgently investigated and corrected. Since the failed DML are logged separately, it is possible to correct the target data for the tables impacted at a convenient time. If this was a configuration for High Availability or Peer to Peer replication, then the out of sync should be investigated and fixed sooner as the nodes of SharePlex may undergo role reversal at a short notice and it is not a good idea to make the erstwhile secondary as primary when the out of sync conditions lurk in that secondary. 

With the advent of version 8.6.6, it is possible to take a different approach when running into out of sync situations while posting to the target. Call it an exception handling methodology. The 8.6.6 has a Post parameter SP_OPO_SAVE_OOS_TRANSACTION, which, if enabled on target, logs the out of sync transaction in a separate sql file (other than the *errlog.sql file) along with the “alter session” entries so that the failed transaction can be applied after taking corrective action. The complete transaction is discarded instead of only the failed DML within the transaction. The latter is the default behavior where only failed DMLs within a transaction are discarded. 

This feature is also referred in SharePlex jargon as logical transaction rollback. Here is the way to enable it: 

On target sp_ctrl set the parameter to 1 and bounce Post: 

sp_ctrl>set param SP_OPO_SAVE_OOS_TRANSACTION 1 

sp_ctrl>stop post (make sure it is “stopped by user” and not “stopping” and wait till it is stopped by user) 

sp_ctrl>start post 

From this point on, all the transactions that can result in out of sync will not be applied but would be written to a separate .sql file located in /vardir/log/oos subdirectory. The following are some of the points to be noted. 

  • Each rolled back transaction has its own SQL file. The file name is SCN_queue.sql, where: 

SCN is the commit System Change Number (SCN) of the transaction and queue is the name of the Post queue that contains the transaction.

Example file name:

4346118046_postq1.sql 

Here is a sample file generated when the parameter is enabled: 

view 002404049121_alvsupu15.sql 

alter session set nls_date_format = 'MMDDSYYYYHH24MISS';

alter session set nls_timestamp_format = 'MMDDSYYYYHH24MISS.FF';

alter session set nls_timestamp_tz_format = 'MMDDSYYYYHH24MISS.FF TZR';

alter session set time_zone = '+00:00';

UPDATE "SHA86"."TABLE1" SET "NUM_COL" = '5' WHERE "NUM_COL" = '4'

AND ROWNUM = 1;

-- Prior line has out of sync

-- redolog seq#/offset 14340/686608

-- redolog timestamp 933528685 (01/17/17 17:31:25)

-- original rowid AAA1qHAAEAAAXaEAAD

-- Row Not Found

DELETE FROM "SHA86"."TABLE1" WHERE "NUM_COL" = '5' AND ROWNUM = 1;

-- Prior line has out of sync

-- redolog seq#/offset 14340/744464

-- redolog timestamp 933528711 (01/17/17 17:31:51)

-- original rowid AAA1qHAAEAAAXaEAAD

COMMIT; 

  • The logging in the SCN_queue.sql and SID_errlog.sql is mutually exclusive. If the parameter is enabled and if out of sync occurs, then the complete transaction is logged in SCN_queue.sql. On the other hand if the parameter is disabled, then the failed DML in a transaction are logged in SID_errlog.sql. So the logging is in either of the files but not in both. 
  • The SCN_queue.sql is ready to be applied to the target database to sync it, so long as the underlying out of condition are corrected. Otherwise the sql listed in it will fail to apply. It has all the “alter session…” entries in it that are needed to apply the sql listed in it. 
  • There is no way to undo the skipping of the out of sync transaction once it is skipped. 
  • The enabling of the parameter will only make logical sense if the application apply all the interdependent changes in one transaction only. 
  • The parameter will operate on tables having most of the common data type but may exclude some data type. For the exhaustive list of supported data type, please refer to SharePlex Reference Guide. 
  • When the parameter is enabled, the statusdb file which keeps track of the out of sync tables is still updated for out of sync. The only difference is, all tables forming part of the discarded transaction go out of sync and the statusdb is updated for all of these tables. Without the parameter, the statusdb is only updated for table(s) in the specific DML(s) which fail to apply in a given transaction.   
  • Due to the fact that each failed transaction generates its own .sql file, it is imperative that periodic housekeeping be done for the files generated. When correcting the out of sync using repair or other means, the .sql file should be removed from the SCN_queue.sql file. 
  • Currently the feature is only supported for Oracle targets. 

Please refer to SharePlex Reference Guide for details.

About the Author
Shailendra.Jhalani
Shailendra Jhalani is a Senior Technical Support Engineer for SharePlex Support group at Quest Software. He joined Quest Software as Technical Support Engineer in 2001. He is an Oracle Certified DBA. ...