This migration guide provides a comprehensive overview of migrating from legacy ETL tools to modern cloud-based architectures using Prophecy. Whether you’re running Informatica, Ab Initio, IBM DataStage, or other legacy systems, this guide outlines an approach to help you understand the migration process.

Phases

The following table summarizes the phases of migration. Each phase is covered in depth in its own section below.
| Phase | What Happens | Key Deliverable |
| --- | --- | --- |
| Discovery | Map current state, define target | Architecture analysis & target stack |
| Scoping | Plan approach, build team | Migration plan & team setup |
| Creating test datasets | Create validation data | 1 historic + 2 incremental datasets |
| Transpilation | Convert pipelines to Spark | Migrated pipelines (automated + manual) |
| Data validation | Test converted pipelines | Validated pipeline functionality |
| Integration testing | End-to-end orchestration test | Working production-ready system |
| Optimization | Performance & cost tuning | SLA-compliant pipelines |
| Deployment | Production rollout | Live system |
| Handover | Documentation & training | Independent operation capability |

Discovery

Solidify your understanding of current systems in use and research the outcomes you would like to see in Prophecy. Specifically, you might want to:
  • Understand business use cases: How do you use your existing ETL system? Document how your data flows, how you transform your data, and what outputs you expect for each use case.
  • Analyze your current ETL system architecture: Identify the ETL tools, cloud data stack, and other data tools that you employ. For example, you might use Alteryx for ETL and send output data to Tableau from each workflow.
  • Determine your target stack: Based on your requirements, Prophecy will collaborate with you to determine the ideal target stack to migrate to. For example, you might be best suited to move to Databricks for its Spark engine, or your use case might better suit BigQuery for data storage and processing.

Scope

Determine the scope for migration. Scope depends on factors such as cost, team readiness, and infrastructure preparedness. In many cases, you may start by migrating a subset of workflows from your ETL tool. This phase may involve the following roles to facilitate scoping:
| Role | Team Size | Responsibility | Team |
| --- | --- | --- | --- |
| Project Manager | 1 | Responsible for project coordination and removing bottlenecks. | Prophecy |
| Data Steward | 1 | Helps in understanding data structures and resolves data issues. | Customer |
| Legacy ETL Expert | 2 | Assists in understanding existing ETL architecture and code. | Customer |
| Spark Architect/Data Engineer | 2 | Builds and ensures stability of Spark infrastructure. | Prophecy and Customer |
| ETL Developer | 5 | Modernizes data pipelines and debugs issues. | Prophecy |
| ETL Tester | 4 | Tests converted data pipelines and identifies discrepancies. | Customer/System Integrators |
This is one example. The configuration can vary depending on the business use case and complexity.
Once the scope is determined, Prophecy will assist you in setting up your target infrastructure. This includes:
  • Implementing a unified identity system
  • Creating development, QA, and production environments
  • Integrating Git and CI/CD pipelines
  • Configuring Prophecy for your environment

Creating test datasets

Reliable test data is essential for validating migrated pipelines. Poor or incomplete datasets can cause long delays. At minimum, create the following for each pipeline:
  • One historic dataset, often from the target table in your data warehouse
  • Two incremental datasets containing new raw data for transformation and merging
Two incremental datasets allow you to validate pipelines that perform merge operations with high confidence.
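
As an illustration of why a second incremental dataset is useful, here is a minimal Python sketch of a merge (upsert) over in-memory rows. It is not Prophecy code: the `merge_increment` helper, the `id` key, and the sample rows are all hypothetical. In this example, the second increment updates a row that exists only because the first increment inserted it, a case that a single increment cannot exercise.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_increment(target: Dict[object, Row], increment: List[Row], key: str = "id") -> Dict[object, Row]:
    """Hypothetical upsert: insert rows with new keys, replace rows with existing keys."""
    merged = dict(target)  # copy so the input target is left unchanged
    for row in increment:
        merged[row[key]] = row
    return merged


# Historic dataset: the current state of the target table (sample data).
historic = {
    1: {"id": 1, "status": "active"},
    2: {"id": 2, "status": "active"},
}

# First increment: updates a historic row (id 2) and inserts a new one (id 3).
increment_1 = [{"id": 2, "status": "closed"}, {"id": 3, "status": "active"}]

# Second increment: updates a row inserted by the first increment (id 3)
# and inserts another new one (id 4).
increment_2 = [{"id": 3, "status": "closed"}, {"id": 4, "status": "active"}]

after_first = merge_increment(historic, increment_1)
after_second = merge_increment(after_first, increment_2)
```

Running the same three datasets through both the legacy pipeline and the migrated pipeline, then comparing the final tables, checks that inserts and updates behave the same way across successive runs.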

Transpilation

Pipeline conversion begins in this phase. Prophecy’s Import tool converts existing pipelines to optimized, open-source Spark or SQL code. Approaches vary:
  • Automated migration works best for pipelines of low to medium complexity.
  • Manual migration is more suitable for complex workflows with custom logic.
  • In practice, some manual review is always necessary to fully understand workflow structures.
Migration is collaborative. Prophecy usually handles the first application alongside your team, establishing the process before enabling you or your system integrators to manage future migrations independently.

Data validation

Validation ensures that migrated pipelines produce correct and reliable results. In the early stages, Prophecy performs validation for the first application using automated testing tools. Over time, your team takes over responsibility for validation, with Prophecy available for guidance.
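
The source does not describe how the automated testing tools work. As a rough sketch of the underlying idea, the following Python compares the rows produced by a legacy pipeline with the rows produced by its migrated counterpart. The `compare_outputs` function and the sample rows are hypothetical, and the sketch assumes that each row is a flat dictionary of hashable values.

```python
from collections import Counter


def compare_outputs(legacy_rows, migrated_rows):
    """Return the rows found on only one side, respecting duplicate counts."""
    # Convert each row to a sorted tuple of (column, value) pairs so that
    # rows can be counted regardless of column order.
    legacy = Counter(tuple(sorted(row.items())) for row in legacy_rows)
    migrated = Counter(tuple(sorted(row.items())) for row in migrated_rows)
    return {
        "missing_in_migrated": list((legacy - migrated).elements()),
        "unexpected_in_migrated": list((migrated - legacy).elements()),
    }


# Sample outputs: the migrated pipeline computes a different amount for id 2.
legacy_output = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
migrated_output = [{"amount": 10.0, "id": 1}, {"id": 2, "amount": 25.0}]

diff = compare_outputs(legacy_output, migrated_output)
```

An empty result on both sides means the two outputs contain the same rows. At real data volumes, the same comparison would normally run inside the target engine rather than in memory.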

Orchestration and integration testing

Integration testing verifies that all components—from raw data ingestion to final output tables—work as intended in production-like conditions. This includes:
  • Parallel production runs to compare outputs against legacy systems.
  • Testing both historical and incremental datasets.
  • Migrating orchestration logic to modern tools such as the native Prophecy scheduler or Databricks jobs. Prophecy can reverse-engineer certain legacy orchestration formats like Ab Initio plans if needed.

Optimization

After functional validation, the focus shifts to meeting performance and cost SLAs. Typically:
  1. You (the customer) define target execution times and resource budgets.
  2. Pipelines that don’t meet targets are optimized by a data/platform architect or engineer.
  3. Optimized pipelines are re-tested before moving to production.
Prophecy handles optimization for the first application and supports your team in subsequent ones.
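
The first two steps above can be sketched as a simple check of measured runtimes against customer-defined targets. This is an illustration only: the `PipelineRun` structure, the pipeline names, and the numbers are hypothetical, and a real SLA would likely cover resource budgets as well as execution time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipelineRun:
    name: str
    runtime_minutes: float  # measured execution time
    target_minutes: float   # target defined by the customer


def needs_optimization(runs: List[PipelineRun]) -> List[str]:
    """Return the names of pipelines whose runtime exceeds the target."""
    return [run.name for run in runs if run.runtime_minutes > run.target_minutes]


# Hypothetical measurements from a test run.
runs = [
    PipelineRun("orders_daily", runtime_minutes=42.0, target_minutes=30.0),
    PipelineRun("customers_merge", runtime_minutes=12.5, target_minutes=20.0),
]

to_optimize = needs_optimization(runs)
```

Pipelines returned by the check would go to an architect or engineer for tuning, then be re-tested before production.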

Deployment

Modernized pipelines can be deployed through Prophecy’s built-in deployment features or integrated into your existing CI/CD process (for example, Jenkins). Deployment includes:
  • Continuous integration testing
  • Artifact creation
  • Pipeline orchestration

Documentation and handover

The final step is enabling your team to operate independently. Prophecy provides:
  • Framework and process documentation
  • Guidance for adding or modifying tables
  • Procedures for data rollback and debugging
After handover, your team manages daily operations, with Prophecy available for complex issues or ongoing advisory support.

Conclusion

Migrating to Prophecy is a structured process that helps you move from legacy ETL platforms to a modern, scalable architecture. By following these phases (discovery, scoping, test dataset creation, transpilation, data validation, integration testing, optimization, deployment, and handover), you can reduce risk and achieve a smooth transition. For questions or support, contact us at contact.us@prophecy.io or join our Slack community.