What are legacy data warehouses?
Workload: SQL-based access can be extended to offer ML capabilities, but not natively. Sequential ETL operations are used to ingest and transform the data before making it available for analytics.
Schema: Predefined Schema-On-Write gives structure to the data assets stored in the data platform.
Scale: The cost and complexity of operations increase as larger datasets are imported for processing and analytics.
Agility: Fixed predefined database configurations govern data management and orchestration processes, limiting flexibility and scalability.
Storage: Storage costs are high while compute costs are comparatively low, because all data assets are stored in a consistent structure and format.
Security: Limited flexibility to enforce data access that complies with stringent security policies while also accommodating diverse and evolving access needs of the growing list of data consumers.
Administrative Overhead: High administrative and infrastructure costs to run highly scalable and complex analytics use cases.
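The Schema-On-Write behavior described above can be sketched with an in-memory SQLite table (the `sales` schema here is hypothetical, chosen purely for illustration): the structure is declared before any data arrives, and records that violate it are rejected at ingestion time.

```python
import sqlite3

# Schema-on-write: the table structure is declared before any data is loaded,
# and every incoming record must conform to it at write time.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT    NOT NULL,
        amount_usd REAL    NOT NULL CHECK (amount_usd >= 0)
    )
    """
)

# A well-formed record is accepted.
conn.execute("INSERT INTO sales VALUES (1, 'acme', 120.50)")

# A record that violates the predefined schema is rejected at ingestion.
try:
    conn.execute("INSERT INTO sales VALUES (2, NULL, 99.0)")
except sqlite3.IntegrityError as exc:
    print("rejected at ingestion:", exc)
```

This up-front enforcement is what gives a warehouse its consistency, and also what makes schema changes costly: every producer must conform before loading.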
Why migrate your enterprise data warehouse to a modern architectural platform like a data lake?
- Workload: Native support for programmatic data processing frameworks such as Apache Spark, TensorFlow, and more. These frameworks are designed for batch processing, real-time analytics, ML processing, streaming analytics applications, and continuous data engineering.
- Schema: Schemaless data processing – data assets are stored in raw formats; processing resources and efforts are assigned where required (end of the pipeline).
- Data Movement and Orchestration: A variety of data movement and orchestration strategies can be adopted. A modernized data platform can consist of purpose-built data stores, central and unified data lake repositories, and data warehouse systems that aggregate data assets for various integrated IT services and analytics tools depending on varied requirements from data consumers.
- Security and Governance: Flexible configurations and access controls can be adopted based on the specific policies and procedures applicable to a variety of data consumers and data producers. This allows organizations to maintain data security while gaining the agility to develop advanced and complex data analytics use cases.
- Cost: Administrative cost and complexity are reduced with automated and programmable data orchestration and management practices. Cloud-based enterprise data platform infrastructure resources can be provisioned on demand, as a subscription-based service, and at a lower Total Cost of Ownership.
- Long development times: complex ETL logic and processing; heavily normalized models with limited flexibility to change.
- Highly interdependent processes and datasets: tight coupling at data ingestion, growing numbers of new relationships, and heavily engineered, complex processes and models.
- Acquisition of real-time data streams at scale and low cost.
- Digital transformation: exploring new use cases powered by advanced AI/ML applications.
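The schemaless (schema-on-read) processing mentioned above can be sketched in a few lines of Python, using hypothetical JSON event records: raw records land in the lake unmodified, and a consumer imposes only the structure it needs at read time.

```python
import json

# Schema-on-read: raw records are ingested as-is; structure is imposed only
# when a consumer reads them. Records with extra or missing fields still
# land successfully.
raw_events = [
    '{"user": "a", "amount": 10.0, "country": "US"}',
    '{"user": "b", "amount": 25.5}',                      # missing "country"
    '{"user": "c", "amount": 7.25, "device": "mobile"}',  # extra "device"
]

def read_with_schema(lines, fields):
    """Project each raw record onto the schema the consumer needs,
    filling absent fields with None instead of failing ingestion."""
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}

rows = list(read_with_schema(raw_events, ["user", "amount", "country"]))
print(rows[1])  # {'user': 'b', 'amount': 25.5, 'country': None}
```

The cost of structuring data is thus paid only where and when a consumer actually needs it, at the end of the pipeline.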
The migration process may involve transferring data from one database system to another, or to an off-site cloud-based system, and may require migrating the dependent application components. Different types of data transfer are involved in the following data migration activities:
- Database Migration: Transferring a database from one resource to another. The migrated database resides entirely in the target database system and may be restructured into a different format.
- Datacenter Transfer: Transferring data from one infrastructure system to another located at a different site. This may be required for data replication and backup as part of a disaster recovery strategy, and to guarantee availability targets defined in an SLA.
- Application Migration: Moving application components to a new infrastructure system. This may be required to improve application performance and scalability, which may be possible with a cloud-based infrastructure or platform service.
- Business Process Migration: Transferring applications and data assets from one infrastructure system to another. Business processes are discovered in an existing IT environment, and dependent database systems and applications are migrated to ensure optimized business processes in the target IT environment.
- Cloud Migration: Transferring applications and data assets from one cloud vendor to another. This choice may be motivated by cost, performance, security requirements, or a multi-cloud strategy.
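The database migration with restructuring described above can be sketched with in-memory SQLite standing in for the source and target systems (the `customers` table and the name-splitting rule are hypothetical, chosen to illustrate a format change during transfer):

```python
import sqlite3

# Hypothetical source and target databases; in practice these might be an
# on-premises system and a cloud data warehouse.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

# Source stores the customer name as a single column.
source.execute("CREATE TABLE customers (id INTEGER, full_name TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada Lovelace"), (2, "Alan Turing")],
)

# Target uses a restructured model: the name is split into two columns.
target.execute(
    "CREATE TABLE customers (id INTEGER, first_name TEXT, last_name TEXT)"
)

# Migrate row by row, restructuring each record into the new format.
for cid, full_name in source.execute("SELECT id, full_name FROM customers"):
    first, _, last = full_name.partition(" ")
    target.execute("INSERT INTO customers VALUES (?, ?, ?)", (cid, first, last))
target.commit()

print(target.execute("SELECT * FROM customers ORDER BY id").fetchall())
# → [(1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing')]
```

Real migrations batch and parallelize this transfer, but the essential shape is the same: extract from the source model, transform into the target model, load, and commit.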
What are the steps in a data migration?
- Plan: The initial stage involves discovering data sources, devising a migration scope and validation strategy, and producing an actionable work plan with milestones. The documentation defines the data model changes; the data movement and orchestration plan; the source data extraction mechanism; and the tools, solutions, and training programs that will help set up and run the modernized enterprise data warehouse.
- Migrate: The goal of this stage is to begin the execution of the migration plan. Execution may be divided into small activities that are non-disruptive and can be automated. Various migration strategies may be adopted: simple lift and shift (rehosting), re-platforming, refactoring, retiring, or retaining.
- Validate: A data verification process verifies the migrated records on the target data platform. Issues and exceptions are fixed at this stage, and migration validation reports and metrics are reviewed and updated until the desired levels of accuracy are achieved before signing off on the project.
- Data Discovery and Evaluation: Input data sources, data ingestion pipelines, data transformation requirements, and target data models are identified at this stage.
- Data Warehouse Remodeling: Since a lift-and-shift migration strategy is insufficient to meet the desired objectives, the data warehouse is remodeled, and the ETL process is rearchitected for the Amazon Redshift cloud environment.
- Data and ETL Migration: A variety of data warehouse remodeling tools are analyzed, and the chosen tool is used to design an ETL framework that supports critical considerations such as fault tolerance, multi-cut data migration, data extraction, and data validation and testing.
- Reports and Business Process Migration: New workload management and business validation strategies are designed. The workloads are scaled by allocating the required resources and by isolating and protecting predictable workloads from unpredictable ones.
- Go Live: Once all requirements are met, the Oracle data warehouse is decommissioned, and all systems run optimally on the modern Amazon Redshift platform.
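The validation step in this plan can be sketched as a row-count and checksum comparison between the source table and its migrated copy (the `sales` table is hypothetical, and the sketch assumes an order-independent comparison is sufficient for sign-off):

```python
import hashlib
import sqlite3

def table_fingerprint(conn, table):
    """Return a row count plus an order-independent digest of all rows,
    used to compare a source table with its migrated copy."""
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    digest = 0
    for row in rows:
        h = hashlib.sha256(repr(row).encode()).hexdigest()
        digest ^= int(h, 16)  # XOR so row order does not matter
    return len(rows), digest

# Hypothetical source table and its migrated target copy.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
source.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
target.executemany("INSERT INTO sales VALUES (?, ?)", [(2, 20.0), (1, 10.0)])

src_count, src_digest = table_fingerprint(source, "sales")
tgt_count, tgt_digest = table_fingerprint(target, "sales")
assert (src_count, src_digest) == (tgt_count, tgt_digest)
print("validation passed:", src_count, "rows match")
```

A mismatch in either the count or the digest flags the table for the issue-and-exception handling described in the Validate stage, before the sign-off and go-live.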