This migration guide provides a comprehensive overview of migrating from legacy ETL tools to modern cloud-based architectures using Prophecy. Whether you’re running Informatica, Ab Initio, IBM DataStage, or other legacy systems, this guide outlines an approach to help you understand the migration process.
Phases
Review the following table for a quick look into the different phases of migration before reviewing each section in-depth.
| Phase | What Happens | Key Deliverable |
|---|---|---|
| Discovery | Map current state, define target | Architecture analysis & target stack |
| Scoping | Plan approach, build team | Migration plan & team setup |
| Creating test datasets | Create validation data | 1 historic + 2 incremental datasets |
| Transpilation | Convert pipelines to Spark | Migrated pipelines (automated + manual) |
| Data validation | Test converted pipelines | Validated pipeline functionality |
| Integration testing | End-to-end orchestration test | Working production-ready system |
| Optimization | Performance & cost tuning | SLA-compliant pipelines |
| Deployment | Production rollout | Live system |
| Handover | Documentation & training | Independent operation capability |
Discovery
Solidify your understanding of current systems in use and research the outcomes you would like to see in Prophecy. Specifically, you might want to:
- Understand business use cases: How do you use your existing ETL system? Document how your data flows, how you transform your data, and what outputs you expect for each use case.
- Analyze your current ETL system architecture: Identify the ETL tools, cloud data stack, and other data tools that you employ. For example, you might use Alteryx for ETL and send output data to Tableau from each workflow.
- Determine your target stack: Based on your requirements, Prophecy will collaborate with you to determine the ideal target stack to migrate to. For example, you might be best suited to move to Databricks for its Spark engine, or your use case might better suit BigQuery for data storage and processing.
Scoping
Determine the scope for migration. Scope depends on factors such as cost, team readiness, and infrastructure preparedness. In many cases, you may start by migrating a subset of workflows from your ETL tool.
This phase may involve the following roles to facilitate scoping:
| Role | Team Size | Responsibility | Team |
|---|---|---|---|
| Project Manager | 1 | Responsible for project coordination and removing bottlenecks. | Prophecy |
| Data Steward | 1 | Helps in understanding data structures and resolves data issues. | Customer |
| Legacy ETL Expert | 2 | Assists in understanding existing ETL architecture and code. | Customer |
| Spark Architect/Data Engineer | 2 | Builds and ensures stability of Spark infrastructure. | Prophecy and Customer |
| ETL Developer | 5 | Modernizes data pipelines and debugs issues. | Prophecy |
| ETL Tester | 4 | Tests converted data pipelines and identifies discrepancies. | Customer/System Integrators |
This is one example. The configuration can vary depending on the business use case and complexity.
Once the scope is determined, Prophecy will assist you in setting up your target infrastructure. This includes:
- Implementing a unified identity system
- Creating development, QA, and production environments
- Integrating Git and CI/CD pipelines
- Configuring Prophecy for your environment
Creating test datasets
Reliable test data is essential for validating migrated pipelines. Poor or incomplete datasets can cause long delays.
At minimum, create the following for each pipeline:
- One historic dataset, often from the target table in your data warehouse
- Two incremental datasets containing new raw data for transformation and merging
Two incremental datasets allow you to validate pipelines that perform merge operations with high confidence.
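To illustrate why two incremental batches matter, here is a minimal, hypothetical sketch in plain Python (dicts stand in for warehouse tables; `merge_incremental` is an invented helper, not a Prophecy API). The first batch exercises inserts; the second exercises updates to existing keys, which is where merge pipelines most often break.

```python
def merge_incremental(target, batch):
    """Upsert a batch of rows into the target table by primary key."""
    merged = dict(target)  # copy so each validation run is repeatable
    for key, row in batch.items():
        merged[key] = row  # insert new keys, overwrite changed ones
    return merged

# Historic snapshot, e.g. exported from the target warehouse table.
historic = {1: {"id": 1, "amount": 100}, 2: {"id": 2, "amount": 200}}

# Two incremental batches: the second validates merge-on-existing-keys.
incr_1 = {3: {"id": 3, "amount": 300}}   # pure insert
incr_2 = {2: {"id": 2, "amount": 250}}   # update of an existing row

after_first = merge_incremental(historic, incr_1)
after_second = merge_incremental(after_first, incr_2)

assert len(after_first) == 3             # insert landed
assert after_second[2]["amount"] == 250  # update landed
```

A single incremental batch would only prove the insert path; the second batch is what confirms updates merge correctly.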
Transpilation
Begin the migration during the transpilation phase. Prophecy’s import tooling converts existing pipelines to optimized, open-source Spark or SQL code.
Approaches vary:
- Automated migration works best for pipelines of low to medium complexity.
- Manual migration is more suitable for complex workflows with custom logic.
- In practice, some manual review is always necessary to fully understand workflow structures.
Migration is collaborative. Prophecy usually handles the first application alongside your team, establishing the process before enabling you or your system integrators to manage future migrations independently.
Data validation
Validation ensures that migrated pipelines produce correct and reliable results. In the early stages, Prophecy performs validation for the first application using automated testing tools. Over time, your team takes over responsibility for validation, with Prophecy available for guidance.
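The core of validation is comparing migrated output against legacy output row by row. The sketch below is illustrative only (plain Python dicts stand in for pipeline outputs; `diff_outputs` is an invented helper, not part of Prophecy’s tooling), but it shows the three discrepancy classes a validation report typically surfaces.

```python
def diff_outputs(legacy_rows, migrated_rows, key="id"):
    """Return keys missing from either side plus keys whose rows differ."""
    legacy = {r[key]: r for r in legacy_rows}
    migrated = {r[key]: r for r in migrated_rows}
    missing = sorted(set(legacy) - set(migrated))   # dropped by migration
    extra = sorted(set(migrated) - set(legacy))     # produced only by new pipeline
    changed = sorted(k for k in set(legacy) & set(migrated)
                     if legacy[k] != migrated[k])   # same key, different values
    return {"missing": missing, "extra": extra, "changed": changed}

legacy_out = [{"id": 1, "total": 10}, {"id": 2, "total": 20}]
migrated_out = [{"id": 1, "total": 10}, {"id": 2, "total": 21}]

report = diff_outputs(legacy_out, migrated_out)
assert report["changed"] == [2]
assert report["missing"] == [] and report["extra"] == []
```

In practice the same comparison runs at Spark scale against the test datasets created earlier, but the discrepancy categories are the same.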
Orchestration and integration testing
Integration testing verifies that all components—from raw data ingestion to final output tables—work as intended in production-like conditions. This includes:
- Parallel production runs to compare outputs against legacy systems.
- Testing both historical and incremental datasets.
- Migrating orchestration logic to modern tools such as the native Prophecy scheduler or Databricks jobs. Prophecy can reverse-engineer certain legacy orchestration formats like Ab Initio plans if needed.
Optimization
After functional validation, the focus shifts to meeting performance and cost SLAs. Typically:
- You (the customer) define target execution times and resource budgets.
- Pipelines that don’t meet targets are optimized by a data/platform architect or engineer.
- Optimized pipelines are re-tested before moving to production.
Prophecy handles optimization for the first application and supports your team in subsequent ones.
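As a rough sketch of the triage step, the snippet below flags pipelines whose runtime exceeds an agreed SLA so they can be routed to an engineer for tuning. Pipeline names, timings, and the threshold are invented for illustration.

```python
def pipelines_over_sla(run_times_minutes, sla_minutes):
    """Return pipeline names whose runtime exceeds the SLA target."""
    return sorted(name for name, minutes in run_times_minutes.items()
                  if minutes > sla_minutes)

# Hypothetical runtimes gathered from a parallel production run.
runs = {"orders_daily": 42, "customers_merge": 95, "inventory_sync": 18}
over = pipelines_over_sla(runs, sla_minutes=60)
assert over == ["customers_merge"]  # candidate for optimization
```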
Deployment
Modernized pipelines can be deployed through Prophecy’s built-in deployment features or integrated into your existing CI/CD process (for example, Jenkins). Deployment includes:
- Continuous integration testing
- Artifact creation
- Pipeline orchestration
Documentation and handover
The final step is enabling your team to operate independently. Prophecy provides:
- Framework and process documentation
- Guidance for adding or modifying tables
- Procedures for data rollback and debugging
After handover, your team manages daily operations, with Prophecy available for complex issues or ongoing advisory support.
Conclusion
Migrating to Prophecy is a structured process that helps you move from legacy ETL platforms to a modern, scalable architecture. By following these phases—discovery, scoping, test dataset creation, transpilation, validation, integration testing, optimization, deployment, and handover—you can reduce risk and achieve a smooth transition.
For questions or support, contact us at contact.us@prophecy.io or join our Slack community.