Staging Blog

Deep Dive: Enterprise Data Warehouse Migrations

Written by Balkrishna Heroor | Jan 31, 2023 8:00:00 AM
An enterprise data warehouse (EDW) is a central repository for an organization's structured data to support business intelligence (BI) activities, such as reporting and analytics. Enterprise data warehouse migrations involve moving an organization's EDW from one technology platform to another or from one environment to another (such as from on-premises to the cloud). This process can be complex, as it involves transferring large amounts of data and ensuring that the migrated EDW functions correctly in the new environment. Migrations may be necessary for various reasons, such as to take advantage of new features or technologies, improve performance or scalability, or reduce costs.
 
Is it time to migrate your enterprise data warehouse? Do you know how to go about this migration or why you should prioritize it? We’ve created this deep dive into enterprise data warehouse migration in hopes of answering your questions and helping you to prepare for your next project.

What are legacy data warehouses?

A legacy data warehouse is an older data warehouse that has been used for an extended period and is based on outdated technology. These systems can be challenging to maintain and may need help to meet the needs of an organization as it evolves and grows. They may also be expensive to operate and maintain due to the outdated technology they are based on. Organizations may migrate from a legacy data warehouse to a more modern platform to take advantage of newer features and technologies, improve performance and scalability, and reduce costs. Migrating from a legacy data warehouse can be a complex and time-consuming process. It involves transferring large amounts of data and ensuring that the migrated data warehouse functions correctly on the new platform.
 
Business organizations are adopting data platform technologies that support all levels of data access and analytics against multiple distributed sources of real-time information streams across a growing number of third-party services integrated into the data platform. This trend comes as a transition from legacy enterprise data warehouse systems, which refer to a collection of hardware resources deployed on-premise to access data assets, transform the information into consumable formats and run analytics and reporting on the stored databases. A traditional enterprise data warehouse architecture typically runs as a three-tier system: bottom-tier hardware resources such as servers that extract data assets from multiple sources; middle-tier systems that perform data orchestration and management activities such as data transformation, cleansing, and enrichment; the top-tier layer that runs analytics and reporting tools.
 
Legacy enterprise data warehouse systems offer limited flexibility, scalability, and cost-optimization for running data-intensive real-time analytics and machine learning operations. Organizations running traditional data warehouse technologies face the following key challenges that ultimately motivate investments in a modern data platform.
 
In contrast, modern data lake technologies are cloud-based systems that load data directly from multiple distributed sources and store it as untransformed assets at the leaf level. Let’s review the legacy enterprise data warehouse for running modern analytics and machine learning use cases:
  • Workload: SQL-based access can be extended to offer ML capabilities, but not natively. Sequential ETL operations are used to ingest and transform the data before making it available for analytics.
  • Schema: Predefined Schema-On-Write gives structure to the data assets stored in the data platform.
  • Scale: The cost and complexity of operations increases as larger datasets are imported for processing and analytics.
  • Agility: Fixed predefined database configurations govern data management and orchestration processes, limiting flexibility and scalability.
  • Storage: The storage cost is high but low on compute processing as all data assets are stored with a consistent structure and format.
  • Security: Limited flexibility to enforce data access that complies with stringent security policies while also accommodating diverse and evolving access needs of the growing list of data consumers.
  • Administrative Overhead: High administrative and infrastructure costs to run highly scalable and complex analytics use cases.

Why migrate your enterprise data warehouse to a modern architectural platform like a data lake?

Let’s compare the characteristics of a modern data platform such as a data lake, especially for analytics use cases that involve large-scale processing of multiple real-time data streams, strict security requirements, and a flexible governance framework. A modernized enterprise data platform offers the following capabilities:
  • Workload: Native support for programmatic data processing frameworks such as Apache Spark, Tensorflow, and more. They are designed for batch processing, real-time analytics, ML processing, streaming analytics applications, and continuous data engineering.
  • Schema: Schemaless data processing – data assets are stored in raw formats; processing resources and efforts are assigned where required (end of the pipeline).
  • Data Movement and Orchestration: A variety of data movement and orchestration strategies can be adopted. A modernized data platform can consist of purpose-built data stores, central and unified data lake repositories, and data warehouse systems that aggregate data assets for various integrated IT services and analytics tools depending on varied requirements from data consumers.
  • Security and Governance: Flexible configurations and access control can be adopted based on specific policies and procedures applicable to a variety of data consumers and data producers. This allows organizations to maintain data security and improved agility to develop advanced and complex significant data analytics use cases.
  • Cost: Administrative cost and complexity is reduced with automated and programmable data orchestration and management practices. Cloud-based enterprise data platform infrastructure resources can be provisioned on demand, on a subscription-based service, and at a lower Total Cost of Ownership.
What happens when your legacy enterprise data warehouse is modernized? Business organizations can expect the following key benefits:
  • Long development time: complex ETL logic and processing; heavily normalized models and limited flexibility to change
  • Highly dependent processes and data sets: coupling at data ingestion; growth of new relationships. Highly designed and complex processes and models.
  • Acquisition of real-time data streams at scale and low cost.
  • Digital transformation: exploring new use cases. Advanced AI/ML applications power them.
What are the different types of data migration?

The migration process may involve the transfer of the data from one database system to another, to a cloud-based system located off-site and require migration of the dependent application components. Different types of data transfers are necessary for the following data migration activities:

  • Database Migration: Transferring database from one resource to another. The migrated database resides entirely in the target database system and may be restructured into a different format.
  • Datacenter Transfer: Transferring data from one infrastructure system to another located at a different site. This may be required for data replication and backup as part of a disaster recovery strategy and guarantee availability targets as part of an SLA agreement.
  • Application Migration: Moving application components to a new infrastructure system. This may be required to improve application performance and scalability, which may be possible with a cloud-based infrastructure or platform service.
  • Business Process Migration: Transferring applications and data assets from one infrastructure system to another. Business processes are discovered in an existing IT environment, and dependent database systems and applications are migrated to ensure optimized business processes in the target IT environment.
  • Cloud Migration: Transferring applications and data assets from one cloud vendor to another. This choice may be motivated by cost, performance, security requirements, or a multi-cloud strategy.

What are the steps in a data migration?

Your data migration strategy to a modern data warehouse system will typically include three key stages: Plan, Migrate and Validate. Activities may be divided across multiple milestones and phases depending on the project's complexity.
  • Plan: The initial stage involves discovering data sources, devising a migration scope and validation strategy, and producing an actionable work plan with milestones. The documentation defines data model changes; data movement and orchestration plan; source data extract mechanism; tools, solutions, and training programs that will help set up and run the modernized enterprise data warehouse.
  • Migrate: The goal of this stage is to begin the execution of the migration plan. Execution may be divided into small activities that are non-disruptive and can be automated. Various migration strategies may be adopted: simple lift and shift (rehosting), re-platforming, refactoring, retiring, or retaining.
  • Validate the Data verification process designed to verify migrated records on the target data platform. Issues and exceptions are fixed at this stage, and migration validation reports and metrics are reviewed and updated until desired levels of accuracy are achieved before signing off on the project.
Mactores recently completed a large manufacturing company's enterprise data warehouse modernization project. The project involved migrating 120 GB of on-premise database systems running on the Oracle Data Warehouse to AWS Redshift. Data migration followed a complete remodeling of the schema structure.
  • Data Discovery and Evaluation: Input data sources, data ingestion pipelines, data transformation requirements, and target data models are identified at this stage. 
  • Data Warehouse Remodeling: Since a Lift and Shift migration strategy is insufficient to meet the desired objectives, the data warehouse is remodeled, and the ETL process is rearchitected for the AWS Redshift cloud environment.
  • Data and ETL Migration: A variety of data warehouse remodeling tools are analyzed, and the chosen tool is used to design an ETL framework that supports critical considerations such as high tolerance, multi-cut data migration, data extraction, and data validation and testing.
  • Reports and Business Process Migration: New workload management and business validation strategy are designed. The workloads are scaled by allocating required resources, isolating and protecting predictable workloads from unpredictable workloads.
  • Go Live: Once all requirements are met, the Oracle data warehouse is decommissioned, and all systems run optimally in the modern AWS Redshift platform.
With this five-step data migration approach, Mactores could triple the system performance and improve security while reducing operational costs: data refresh rates were reduced from 48 hours to 3 hours, and the TCO was decreased by 30 percent.