Business Scenario:
Organizations are considering migrating their relational database objects into Azure Cosmos DB (SQL API) as Cosmos DB offers high availability, throughput, 99.99% read availability, low latency, and consistency across regions.
The legacy database objects don’t have any relationship\ reference with other existing objects, and these objects need to handle massive amounts of data, read and write at a global scale. Some of the challenges include:
- Data ingestion is massively increasing.
- Increased cost due to large volume of data storage and scalability.
- Database performance issues leading to unexpected application outages.
Solution Strategy:
To overcome challenges like massive data ingestion, Performance, scalability, and global scale availability, we suggest migrating current SQL databases to Azure cosmos databases.
To facilitate this, we need to create two pipelines in Azure Data Factory to perform the following.
- Initial load of data: Create a pipeline with a copy activity – to copy the entire data from the source data store (Azure SQL Database) to the destination data store (Azure Cosmos DB).
- Enable change tracking technology (CDC) in the source database in Azure SQL Database.
- Get the initial value in the database as the baseline to capture changed data using change data capture (CDC) functions.
- Load complete data from the source database into Azure Cosmos DB.
- Incremental load of delta changes: Create pipelines with the following activities and run them periodically:
- Create a lookup activity to get the change count value on the source using the change data capture (CDC) table and function.
- Create a conditional activity to capture delta changes.
- Create one copy activity to copy the inserts and updates, except deletes from Azure SQL Database to Azure Cosmos DB.
- Load the delta data using the table created as part of the CDC.
- For deletes, we should refer to the CDC table (the table name has _CT extension to the original table) and _$operation column values for deleted records and manually delete those from the destination.
Outcome & Benefits:
- Azure Cosmos DB is a scalable multi-model database providing low latency and high availability.
- Multi-region writes and data distribution to any Azure region.
- Supports multiple APIs (Core SQL API)
- Azure Cosmos DB offers single-digit millisecond response times and automatic and instant scalability.
- No need for Schema Management.
Mastanvali Shaik
Solution Architect – Managed Services