Exit Oracle Data Integrator (or other ELT): migrate and permanently break down the complexity of flows.


ODI (Oracle Data Integrator) and other legacy ELT solutions automatically generate SQL queries that execute directly in the database. We're not telling you anything new 😉.


As the system evolves, the ELT engine generates ever more nested subqueries and joins. These transformations end up buried inside massive SQL blocks that teams barely understand, as in the sketch below.
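
Here is a hypothetical illustration of the kind of machine-generated SQL this produces (table and column names are invented for the example):

```sql
-- Hypothetical example of generated, deeply nested ELT SQL.
-- Each layer was added by a different job revision; nobody owns the whole.
SELECT c.customer_id, c.customer_name, t.total_amount
FROM (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM (
        SELECT o.customer_id, o.amount
        FROM orders o
        JOIN (
            SELECT order_id FROM order_status WHERE status = 'VALID'
        ) s ON s.order_id = o.order_id
        WHERE o.order_date >= DATE '2024-01-01'
    ) filtered_orders
    GROUP BY customer_id
) t
JOIN customers c ON c.customer_id = t.customer_id;
```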

 

Furthermore, traditional ELTs are facing significant challenges from IT modernization projects, the cornerstone of which is often a "move to the cloud."
Cloud-native ELTs, which use the same logic as their predecessors, are rapidly expanding: Matillion, DBT, Dataflow (Google), and Glue (Amazon) are just a few examples. They are coupled with powerful cloud databases such as Redshift, BigQuery, and Snowflake. The combination is impressively efficient! 

 

In a migration context, the goal is to move transformations from the source to the target while implementing upstream mechanisms that create simplicity and observability.
Two approaches then compete for how the flows are executed in the target:

  • Continue to run the transformations in the database (ELT mode), to leverage the power of the target Data Warehouse.
  • Or offload job execution to containers (ETL mode), for maximum control.

 

We'll explain it to you.

 

Essentialize, migrate, simplify data pipelines

👉 Only essential transformations

 

  • Jobs and sub-jobs from ODI (or another legacy ELT) will be analyzed to identify sources, targets, filters, and the various transformation logics.
  • Intersecting actual information usage with technical data lineage will make it possible to eliminate unnecessary transformations upstream of the migration (sometimes more than 50% of the whole); see the sketch after this list.
  • Thus, only truly useful transformations will be carried over to the target.
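
As a minimal sketch of that intersection, assuming hypothetical `lineage` and `usage_stats` metadata tables (all names invented for illustration):

```sql
-- Hypothetical metadata: 'lineage' maps each target table to the job that
-- produces it; 'usage_stats' logs actual reads of each table.
-- Targets never read during the observation window are candidates to drop.
SELECT l.target_table,
       l.producing_job
FROM lineage l
LEFT JOIN usage_stats u
       ON u.table_name = l.target_table
      AND u.last_read_at >= CURRENT_DATE - INTERVAL '1' YEAR
WHERE u.table_name IS NULL;  -- no read in a year: likely unnecessary
```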

👉 Conversion to SQL

 

  • Once the key transformations are isolated, they will be converted into SQL queries, step by step, as in the sketch below.
  • Each query will correspond to a distinct phase of the source processing.
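
Continuing the invented example from above, the single nested query becomes a sequence of small, clearly scoped queries, one per phase:

```sql
-- Phase 1: keep only valid, recent orders.
CREATE TABLE stg_valid_orders AS
SELECT o.customer_id, o.amount
FROM orders o
JOIN order_status s ON s.order_id = o.order_id
WHERE s.status = 'VALID'
  AND o.order_date >= DATE '2024-01-01';

-- Phase 2: aggregate per customer.
CREATE TABLE stg_customer_totals AS
SELECT customer_id, SUM(amount) AS total_amount
FROM stg_valid_orders
GROUP BY customer_id;

-- Phase 3: enrich and load the target.
CREATE TABLE tgt_customer_totals AS
SELECT c.customer_id, c.customer_name, t.total_amount
FROM stg_customer_totals t
JOIN customers c ON c.customer_id = t.customer_id;
```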

👉 Break down the complexity

 

  • This step-by-step SQL approach will allow the processing to be structured into several levels, avoiding the complex monolithic queries found in the source.
  • The nested SQL will be reworked, which is crucial for the maintainability of the flows.
  • Materialized tables or volatile tables can also be used to store intermediate results, which gives a much better understanding of the data flow.
  • We will create explicit checkpoints at each SQL transformation step (something "legacy" ELTs such as ODI do not offer); see the sketch below.
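
A minimal sketch of such a checkpoint, again with invented names: after each intermediate table is built, its row count (or any other invariant) is recorded, making every step observable.

```sql
-- Hypothetical audit table: one row per pipeline step per run.
CREATE TABLE IF NOT EXISTS etl_checkpoints (
    run_id     VARCHAR(36),
    step_name  VARCHAR(100),
    row_count  BIGINT,
    checked_at TIMESTAMP
);

-- Recorded right after phase 1 above; repeat after each phase.
INSERT INTO etl_checkpoints
SELECT 'run-2024-06-01',   -- run identifier supplied by the orchestrator
       'stg_valid_orders',
       COUNT(*),
       CURRENT_TIMESTAMP
FROM stg_valid_orders;
```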
 

Option #1:

Execute the transformations in the target database, in ELT mode

👉 Adaptation to the target

 

  • SQL transformations will be directly integrated into Cloud-native ELT jobs (see the dbt-style sketch after this list).
  • The generated SQL can be adjusted to the target's syntax.
  • Optionally: add the jobs to the project configuration, define dependencies between jobs, etc.
  • Jobs will be tested to verify their behavior, and possibly the data as well, to guarantee its integrity.
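
As one possible incarnation, assuming dbt as the cloud-native ELT (file paths, sources, and model names are invented for the example), each phase above becomes a model, and ref() declares the dependencies so dbt orders the runs:

```sql
-- models/staging/stg_valid_orders.sql
-- Phase 1 as a dbt model, materialized in the target warehouse.
{{ config(materialized='table') }}

SELECT o.customer_id, o.amount
FROM {{ source('sales', 'orders') }} o
JOIN {{ source('sales', 'order_status') }} s ON s.order_id = o.order_id
WHERE s.status = 'VALID'

-- models/marts/customer_totals.sql
-- Phase 2; the ref() below makes the dependency explicit.
SELECT customer_id, SUM(amount) AS total_amount
FROM {{ ref('stg_valid_orders') }}
GROUP BY customer_id
```

The data tests mentioned above can then be expressed as dbt tests on these models.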

👉 Advantages

 

  • Latency reduction through transformations within the database.
  • Native exploitation of the computing capabilities of the target Data Warehouse.
 

Option #2:

Execute the transformations in containers, in ETL mode

👉 Containers

 

  • The transformations will be executed in an external container, detached from the target database and the target ELT. It serves as a pre-configured application gateway: simple to set up and easy to duplicate in order to parallelize certain pipelines if necessary.
  • Temporary containers will serve as intermediate storage areas where data is loaded before transformation (see the sketch after this list).
  • The volume of the container holding the buffer database will be externalized so that developers can access it.
  • The jobs will be tested to verify their functionality (and possibly the data, to guarantee its integrity).
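
A minimal sketch of the in-container flow, assuming a PostgreSQL buffer database inside the container (all names and paths invented): data lands in a staging table on the externalized volume, is transformed locally, and only the result is handed to the target's loader.

```sql
-- Runs inside the container's buffer database (hypothetical PostgreSQL).

-- 1. Load the raw extract into a staging table from the mounted volume.
CREATE TABLE buffer_orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      NUMERIC(12,2),
    status      VARCHAR(20)
);
COPY buffer_orders FROM '/data/in/orders.csv' WITH (FORMAT csv, HEADER true);

-- 2. Transform locally, without consuming target-warehouse compute.
CREATE TABLE buffer_customer_totals AS
SELECT customer_id, SUM(amount) AS total_amount
FROM buffer_orders
WHERE status = 'VALID'
GROUP BY customer_id;

-- 3. Export the result for bulk-loading into the target database.
COPY buffer_customer_totals TO '/data/out/customer_totals.csv' WITH (FORMAT csv, HEADER true);
```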

👉 Advantages

 

  • Reduced load on the target database: jobs will be executed in parallel with a "multi-threaded" architecture.
  • Continuity of service ensured even in the event of intensive processing (heavy processing or batch operations).
  • Increased flexibility for workflow execution and orchestration.
 
