There is a driving need for peer-to-peer replication, which raises a concern: how do you handle data collisions? Any system that supports peer-to-peer replication needs some sort of conflict resolution interface that allows the user to specify how conflicts should be resolved. The conflict resolution logic should not only resolve the conflict; as an end result, the data should be the same on all nodes within the replication scheme. Sometimes the goal of keeping all nodes in sync is forgotten, and as a result the data on the different nodes ends up out of sync.
When we look at data replication, conflicts can occur on three types of operations: INSERT, UPDATE and DELETE. With peer-to-peer replication, collisions on an INSERT operation should not occur if the database design is correct. That is, collisions on INSERTs are due to UNIQUE key violations, and UNIQUE keys are generated by a SEQUENCE in most cases. There are mechanisms for handling SEQUENCE generation in a peer-to-peer environment to ensure uniqueness; these will be discussed in a later topic. Collisions on a DELETE operation can be ignored, since the DELETE operation is the end of the life cycle for the data. The only conflict that proves problematic occurs with the UPDATE operation. Here are some methods that customers are using to resolve UPDATE conflicts:
- Timestamp Resolution – the update that occurred last should be the one that wins. This means that the server ignores UPDATEs with an older timestamp than the last UPDATE for the same row. If the timestamp on the UPDATE is newer than the last one applied to that row, then the UPDATE is applied.
- Host Resolution – even in a peer-to-peer environment, the customer can dictate the priority of operations based upon where the data originated. In this scenario, the data from a lower priority host will be ignored if there is a conflict. If the data came from a server with higher priority, it will be applied.
- Business Logic Resolution – a record’s fate is dependent on the business logic applied. Since everyone’s business logic is different, the mechanism of resolving conflict should be provided by customer code.
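The first two rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not SharePlex code: the `Update` record, the `HOST_PRIORITY` table, the host names and the `apply` helper are all hypothetical. The point it tries to show is that a rule is only useful if every node reaches the same answer regardless of which change it saw first.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Update:
    """A replicated UPDATE for one row (hypothetical shape, for illustration)."""
    value: str
    timestamp: datetime  # when the change was made at its origin
    host: str            # node where the change originated


# Assumed priority table: a lower number means a higher priority.
HOST_PRIORITY = {"nyc": 0, "lon": 1}


def timestamp_wins(current: Update, incoming: Update) -> bool:
    """Timestamp resolution: apply only if the incoming change is newer."""
    return incoming.timestamp > current.timestamp


def host_wins(current: Update, incoming: Update) -> bool:
    """Host resolution: apply only if the incoming host has higher priority."""
    return HOST_PRIORITY[incoming.host] < HOST_PRIORITY[current.host]


def apply(current: Update, incoming: Update, rule) -> Update:
    """Return the row state after offering `incoming` to a node holding `current`."""
    return incoming if rule(current, incoming) else current


# Two nodes change the same row at nearly the same time.
a = Update("A", datetime(2024, 1, 1, 12, 0, 0), "nyc")
b = Update("B", datetime(2024, 1, 1, 12, 0, 5), "lon")

# Each node holds its own change and then receives the other one.
nyc_row = apply(a, b, timestamp_wins)
lon_row = apply(b, a, timestamp_wins)
```

With the timestamp rule both nodes end up holding `b`, and with `host_wins` both end up holding `a`, so the nodes stay in sync either way. Note that this sketch leaves an incoming change unapplied when the timestamps are exactly equal, which would leave each node with its own value; a real procedure would need a deterministic tiebreak for that case.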
SharePlex for Oracle has some built-in conflict resolution procedures for Timestamp Resolution and Host Resolution. It also provides the ability to write Business Logic conflict resolution procedures through PL/SQL. With the power of PL/SQL, customers can apply any business logic to resolve the data. Remember, the goal is to resolve the conflict as well as to bring the data back in sync on all nodes.
With all these mechanisms for resolving conflicts, the cheapest mechanism is avoidance. If conflicts can be avoided, the integrity of the data is preserved without any cost. Sometimes, conflict resolution on the target cannot be done automatically due to the complexity of the logic. In this case, avoiding conflicts is the best scenario. Here are some methodologies that customers use to avoid conflicts:
- In web-based applications, persisting one connection throughout the transaction. This isolates the changes from one customer to one server, which reduces conflicts.
- In some applications, the isolation can be done by the login. If you route each customer login to a particular machine, then the changes from that customer are isolated to that machine, avoiding conflicts altogether.
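One way to picture login-based isolation is a routing function that always sends the same login to the same server. The sketch below is a minimal illustration under assumptions: the server names in `SERVERS` and the function `server_for_login` are hypothetical, and hashing the login is just one possible way to pick a server (a lookup table maintained by the application would work as well).

```python
import hashlib

# Hypothetical node names for a two-node peer-to-peer setup.
SERVERS = ["db-east", "db-west"]


def server_for_login(login: str, servers=SERVERS) -> str:
    """Map a customer login to one server, the same one on every call."""
    # Normalize case so "Alice" and "alice" are treated as the same customer.
    digest = hashlib.sha256(login.lower().encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a stable integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Every request from this customer is sent to the same node.
target = server_for_login("alice")
```

Because the mapping depends only on the login, all changes made by one customer originate on a single node, so two nodes never update that customer's rows concurrently. Changing the number of servers changes the mapping, so a deployment would have to plan for that.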
Avoidance mechanisms cannot always address all conflicts, but they can reduce them to a small, manageable number. By deploying conflict avoidance, you might narrow your actual conflicts down to a handful of tables, and then conflict resolution procedures only need to be applied to those tables.
Written by Tom Chu, Product Manager, SharePlex for Oracle