Introduction
SharePlex customers often have questions about how SharePlex detects, processes and resolves situations where rows in the source and target databases are different.
SharePlex is designed to maintain synchronization between the source and target databases in a replication stream and to detect and correct cases where the data in the two databases is not the same. In SharePlex, we differentiate between “out-of-sync” which can occur in any replication environment, and “conflict” which only occurs in bi-directional, or peer-to-peer replication.
Definitions
In any replication configuration, a given transaction or change has one and only one source; and while there may be one or more target databases, each target database is considered separately. Also, in a configuration that uses named post queues, each post process will operate only on the tables associated in the configuration file with the specific queue.
In SharePlex, “out-of-sync” is defined as a condition in which the contents of a row in the source database do not match the contents of the same row in the target database.
A “conflict” arises in bi-directional replication when conflicting changes to the same row are made in the databases involved that would result in an “out-of-sync” condition.
Out of Sync Detection
SharePlex can detect out-of-sync conditions in two ways, during the replication process, or using the COMPARE or REPAIR commands. Note that at the present time, COMPARE and REPAIR are limited to Oracle and PostgreSQL databases only.
Out-of-Sync detection during replication
To detect out-of-sync conditions during replication, SharePlex relies on the target database. The SharePlex creates a “where” clause for each message in the replication stream. The predicate in this clause is used to determine which row in the target database should be processed AND to detect out-of-sync conditions. In the default configuration, the predicate contains the changed columns and key columns from the source database.
For example, in the DEMO_SRC table, if we insert a record like ‘Clay’, ‘Prosser’, ‘1234’, on source database, post will generate a SQL statement that looks like this:
INSERT INTO "DEMO"."DEMO_SRC" ("NAME","ADDRESS","PHONE") VALUES ('CLAY','PROSSER','1234');
Since the DEMO_SRC table does not have a key defined, for an UPDATE, SharePlex will use all the non-binary columns of the table as a key. If we issue this SQL on the source:
UPDATE DEMO_SRC SET ADDRESS = ‘Tacoma’ WHERE NAME = ‘CLAY’
Post will generate this:
UPDATE "DEMO"."DEMO_SRC" SET "ADDRESS"='Tacoma' WHERE ROWNUM= 1 and "NAME"='CLAY' AND "ADDRESS"='PROSSER' AND "PHONE"='1234';
Where ROWNUM=1 is used to identify the change must be applied to only one row.
For tables with defined primary keys or unique indexes, SharePlex, by default, will pass the key and any changed columns. In the above example, if the PHONE column was declared as Primary key, and the source table contained these values:
NAME ‘Clay’
ADDRESS ‘Prosser’
PHONE ‘1234’
Post would generate the following SQL:
UPDATE "DEMO"."DEMO_SRC" set "ADDRESS"='Tacoma' WHERE ROWNUM=1 AND "ADDRESS"='Prosser' AND "PHONE"='1234';
As the diagram below shows, since the target table ADDRESS column contains ‘TACOMA’, not ‘Prosser’; the database will return Row not Found” and SharePlex will issue an “Out of Sync” error.
Which columns are included in the replication stream and the where clause are controlled by the following parameters:
SP_OCT_REDUCED_KEY
SP_OCT_USE_SUPP_KEYS
SP_OPO_REDUCED_KEY
SP_CAP_REDUCED_KEY
SP_OPX_REDUCED_KEY
See the SharePlex Reference Guide for detailed information about these parameters and their interactions.
When a change is applied to the target database, if an attempt is made to insert a duplicate unique key, the database returns a “unique constraint violated” error.
In the same fashion, if the database returns a “row not found” error on an UPDATE or DELETE SharePlex will treat that as an out-of-sync condition.
For these conditions, SharePlex logs an error in the event_log on the target system, and the SQL that generated the error can be found in the errlog.sql log. By default, these errors are logged, but post will continue processing. This behavior can be changed with either the SP_OPO_OUT_OF_SYNC_SUSPEND or SP_OPX_ OUT_OF_SYNC_SUSPEND parameters.
Note that for Oracle targets only, if any of the following conditions are met:
- All values (including the key value) of a replicated INSERT match the existing values of a row in the target.
- The key value of an UPDATE matches a row in the target and the existing values in that row match the after (change) values that were replicated from the source
- The target row of a DELETE operation does not exist
SharePlex will suppress the out-of-sync message. This is controlled by the SP_OP_SUPRESSED_OOS parameter.
Hidden Out of Sync
There may be conditions where a row is changed on the target independent of replication, which create an out-of-sync condition that will not be detected until the same row is changed on the source. This is referred to as “hidden” out-of-sync and can be detected using the COMPARE command.
Compare
SharePlex for Oracle and PostgreSQL includes a COMPARE command, which will compare a source table with Its corresponding target table and generate a log containing the SQL to correct the condition.
Out of Sync Correction
There are several options available to assist in correcting an out of sync condition.
Background
First, we can decide if we want SharePlex to continue replication, which is the default, or stop replication after it detects an out of sync condition. This behavior is controlled by the SP_OPO_OUT_OF_SYNC_SUSPEND or SP_OPX_OUT_OF_SYNC_SUSPEND parameter. Note that while these parameters apply to the entire SharePlex instance, in a configuration with multiple post process, only the post process that detected the out of sync will be stopped.
For replication to an Oracle target, we can also decide how SharePlex deals with the out of sync transaction. By default, the transaction is discarded, the SQL operation that caused the error is written to the errlog.sql log, and processing continues. If the SP_OPO_SAVE_OOS_TRANSACTION parameter is set, the ENTIRE transaction or SCN, rather than just the operation that caused the out of sync, is written to a file in the OOS subdirectory of the VARDIR directory.
Caution should be used with the *OUT_OF_SYNC_SUSPEND parameters, since post will stop, which will increase latency. Also, note that when post is stopped, the transaction is retained in the post queue, so the condition in the database that caused the out of sync condition must be corrected BEFORE restarting post; or a support case should be opened to obtain assistance in removing the transaction from the queue.
Correcting an out of sync condition
When an out of sync condition is detected, for an update, the row(s) in the source and target database involved in the transaction that caused the out of sync condition no longer contain the same data. For an insert, the row already exists in the target, and for a delete the record did not exist in the target.
Depending on the state the *OUT_OF_SYNC_SUSPEND parameter, there are several options that can be used to repair the out of sync rows in the target database
OUT_OF_SYNC_SUSPEND not set (default 0)
The out of sync condition can be corrected using the SharePlex repair command, issued from the source command line processor (sp_ctrl), which will read the source database tables specified and execute SQL against the appropriate rows in the target database to re-synchronize the databases. See the SharePlex Reference Guide for options for the repair command that can be used to limit the number of rows processed. Note that for “Hidden” out of sync conditions, using repair is the recommended option.
The out of sync condition can also be corrected manually, using the SQL statements logged by SharePlex. The statements are logged in errlog.sql log file found in the <VARDIR>/log directory; or, in the case of an Oracle target, if the SP_OPO_SAVE_OOS_TRANSACTION parameter is set, from the <VARDIR>/oos directory.
Either of these options can be used while SharePlex continues replication.
OUT_OF_SYNC_SUSPEND set (1)
If OUT_OF_SYNC_SUSPEND is set, when the out of sync condition is detected, the post process will stop and the transaction will remain in the post queue. In this case, the underlying records in the target database must be corrected prior to (re)starting post and then once post is restarted, the transactions will be applied.
The SQL saved in either the errlog.sql log or the OOS subdirectory can be used as a guide to determine what operations need to be performed on the target database to allow the transaction to be successfully applied. For example, if the out of sync was caused by a “row not found” and the row truly does not exist on the target, inserting the row into the target database would allow the incoming transaction to successfully delete the row. If the row exists and the operation was an update, update the row in the target database so that the “before” values in all of the columns match the values from the SQL found in the log or OOS directories.
Once the condition that caused the out of sync is corrected, post can be (re) started with the “start post” command. The transaction will be processed and SharePlex will continue replication.
Summary
In this blog I’ve offered a detailed description of how SharePlex detects an Out of Sync or conflict and how an out of sync condition can be corrected in SharePlex configurations using one-way replication.
In a future blog, I’ll describe how SharePlex conflict resolution can be used to correct conflicts in peer-to-peer or bi-directional replication configurations.