Oracle to SQL Server Migration and Replication/ CDC : Handling Different Character Sets/endian and Collations

18 Mar 2025

Migrating databases from Oracle to SQL Server can be challenging, particularly when character sets between the two systems differ. Mismatched character sets can lead to data loss, corruption, or issues with character encoding. In this blog, we'll explore best practices and key considerations to ensure a smooth migration from Oracle to SQL Server while handling character set differences effectively.

Understanding Character Sets in Oracle and SQL Server

Character sets, also known as encoding standards, define how characters are stored in a database. Oracle and SQL Server support different character sets, so understanding them is crucial for a seamless migration.

- Oracle Character Sets: Oracle databases typically use UTF-8 (AL32UTF8) or single-byte character sets such as WE8ISO8859P1, which support ASCII and Latin-based characters.

- SQL Server Character Sets: SQL Server predominantly uses Windows collations, such as ‘SQL_Latin1_General_CP1_CI_AS’ and supports Unicode character encoding (UTF-16) with the `NVARCHAR` data type.

Character Set Compatibility Check

Before migrating, evaluate compatibility between the Oracle and SQL Server character sets to avoid data loss or character mismatches. Start by running the following steps:

Identify Source Character Set: Use the following query in Oracle to determine the database character set:

SQL> SELECT * FROM NLS_DATABASE_PARAMETERS
WHERE PARAMETER IN (‘NLS_LENGTH_SEMANTICS’, ‘NLS_CHARACTERSET’)

Another important factor while migrating database is to check NLS_LENGTH_SEMANTICS value and make sure similar values on both sides.

-Identify Target Character Set: In SQL Server, check the collation and character set of the target database using:

SELECT DATABASEPROPERTYEX('database_name', 'Collation') AS Collation;

Type USE db_name

Type SELECT @@character_set_database, @@collation_database

Comparing these outputs helps you determine if you need to implement conversion strategies during migration.

Character Set Conversion Strategies

If there are character set incompatibilities, consider these options:

- Use NVARCHAR Data Types in SQL Server Since `NVARCHAR` supports Unicode (UTF-16), it can store data from different languages and symbols without corruption. Converting Oracle ‘VARCHAR2’ data to ‘NVARCHAR’ in SQL Server can help preserve character integrity during migration.

- Apply Data Transformation Tools Leverage tools like SQL Server Migration Assistant (SSMA) or Oracle SQL Developer to automate data type conversions, including adjustments for character encoding.

- Custom Scripts for Character Set Conversion For custom migration, scripts can convert data from Oracle’s encoding to a compatible format in SQL Server. This may involve using the `CONVERT` function in SQL Server to manage incompatible characters during the import.

Handling Data Type Mapping and Character Conversion

Oracle and SQL Server support different data types for character storage, which can cause issues if not mapped correctly:

- VARCHAR2 and NVARCHAR2 (Oracle) to NVARCHAR (SQL Server) For Unicode data, map Oracle `VARCHAR2` and `NVARCHAR2` columns to SQL Server `NVARCHAR` to maintain character encoding.

- CLOB and NCLOB Map Oracle `CLOB` or `NCLOB` columns to SQL Server’s `NTEXT` or `NVARCHAR(MAX)`. This helps store large text data while preserving special characters.

Using Quest Shareplex automates these mappings, making the migration process more efficient and less prone to human error.

Shareplex is a logical replication solution which offers variety of replication and migration use cases, one of the unique use cases will fit into this migration scenario as well

Oracle Database Setup:

enable force logging on (schema) which need to be migrated to MSSQL.

Supplemental log should be enabled on atleast PK level .

An intermediary system will be used in this scenario for replicating initial load (historical data) and to minimize downtime on production environment, we will configure shareplex on actual production database for capture real time activity during migration on historical data: illustrate below.

Shareplex Character set conversion -

SharePlex posts data to open target (MSSQL) in UTF8 format. If source is US7ASCII or Unicode and you are not replicating LOB, then no character set conversion needed and you don’t need to install Oracle client on target server and SP_OPX_NLS_CONVERSION can be set to 0 to disable conversion on target.

Otherwise for other character sets or if you are replicating LOB, install Oracle client on target and SP_OPX_NLS_CONVERSION must be set to 1 on target to enable conversion.

Testing and Validation

Testing is essential to ensure the correct migration of character sets:

Data Validation Scripts: Run the cust_DBinfo script to compare all out-of-the-box supported data types, as well as any specific data types that require attention during the migration. Carefully check the original data in Oracle against the data in SQL Server, paying special attention to special characters, non-English text, and symbols.
Application Testing: If applications interact with the database, verify their performance on SQL Server to ensure that data is displayed correctly.
Performance Testing: After the migration, evaluate query performance and indexing on the new SQL Server system. Since SQL Server’s collation affects sorting and searching behavior, ensure that indexes are optimized accordingly.

Addressing Common Issues

During character set migration, a few common issues may arise:

- Data Truncation If character length differs between systems, data may be truncated. Ensure the column sizes in SQL Server accommodate the full length of Oracle data.

- Character Replacement Special characters that are not supported may appear as `?` or `□`. Implement character mapping or custom transformation scripts to avoid this.

- Collation Conflicts SQL Server collation impacts character comparison and sorting. Choose a collation that aligns with the intended usage and data requirements.

Post-Migration Considerations

Once the migration is complete, continue with these best practices:

- Optimize Database Performance Use SQL Server’s indexing and collation settings to optimize performance, especially for string-based queries.

- Monitor Data Integrity Run integrity checks periodically to ensure that special characters remain intact.

- Documentation Document any custom mappings or transformations applied during migration for future reference or audits.

Conclusion

Migrating from Oracle to SQL Server with differing character sets requires careful planning, testing, and validation. By understanding the nuances of character encoding, using the right data types, and employing automation tools, you can ensure a seamless migration process that preserves data integrity and supports your applications.

https://www.quest.com/products/shareplex/

Author: Muhammad Shahzad Hussain
Date: Mar 2025