The new Cloud data pipelines: The principles of ELT and lots of SQL


For years, ETL tools like BODS, DataStage, Informatica, Ab Initio, and Talend dominated the data pipeline building landscape. 
This dominance reflected a window of opportunity:

  • The hardware constraints and limited performance of on-premises Data Warehouses argued for moving transformations onto dedicated third-party servers, hence the lesser success of ELT tools (Oracle Data Integrator, ODI, is one example).
  • Companies wanted to democratize the creation of data pipelines with business-oriented tools.


But cloud architectures have changed the game, particularly due to the power of cloud data warehouses.

Are we witnessing a massive, but SQL-boosted, revival of the principles that were at work in ELT?

If so, how can we be part of this dynamic?

 

Why are companies abandoning ETL logic in the Cloud?

Historical ETL technologies revolutionized the world of data, but their design relies on paradigms that now have limitations:

  • Optimizing ETL chains and parallelizing processes is neither simple nor intuitive, and depends essentially on the tool.
  • The workflows designed this way work well with relatively "static" systems. In the era of the "data-driven" enterprise, however, systems grow and change constantly.
  • Specialists in specific ETL tools are becoming rare (and the same is true for ELT), which is problematic for maintenance! Especially since everyone has been "tuning" the ETL tools, with countless nested features that create opacity.

Moving towards new data pipeline tools: combining power and SQL!

Low-cost power 

 

  • The transformations take place within the Data Warehouse as with the old ELTs, but this time in the Cloud: the native performance of Cloud DWHs (BigQuery, Snowflake, Redshift) scales almost without limit, which boosts the performance of these tools (Matillion, dbt, Airbyte, Fivetran, etc.), as in the sketch after this list.
  • These data pipeline tools are also more accessible in terms of cost: they often rely on open source modules (such as dbt).
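A minimal sketch of this ELT pattern, assuming illustrative table and column names: the data has already been loaded raw into the warehouse, and the transformation is plain SQL executed by the warehouse engine itself.

    -- ELT pattern sketch: transform already-loaded raw data inside the warehouse.
    -- Table and column names are illustrative assumptions.
    CREATE OR REPLACE TABLE analytics.customers_clean AS
    SELECT
      customer_id,
      TRIM(LOWER(email)) AS email,
      created_at
    FROM raw.customers
    WHERE email IS NOT NULL;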

SQL as a technical pivot

 

  • The SQL transformations encapsulated in these data pipeline tools are open, modularized, and designed for collaboration: versioning, reusable models, etc. (see the model sketch after this list).
  • SQL (or its dialects), which forms the technical basis of these data pipeline tools, is mastered by most IT teams, which in times of "talent" shortage is not insignificant!
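For instance, in a dbt-style project each transformation is a plain SQL file, versioned in Git and reusable by other models through ref(); all names below are illustrative.

    -- models/customer_lifetime_value.sql (dbt-style model; names are illustrative)
    -- ref() resolves to another versioned model, making dependencies explicit.
    SELECT
      c.customer_id,
      SUM(o.amount) AS lifetime_value
    FROM {{ ref('customers_clean') }} AS c
    LEFT JOIN {{ ref('orders') }} AS o
      ON o.customer_id = c.customer_id
    GROUP BY c.customer_id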

Is migrating ETL jobs to a Cloud data pipeline tool "challenging"?

 

For example, imagine a company (like some we know) that has accumulated more than 10,000 ETL jobs built over several years, with complex transformations: joins, aggregations, or even specific business rules...


Migrating ETL jobs to a Cloud data pipeline tool requires a complete overhaul of their execution, especially when adopting an SQL approach.

 

Automation is essential.

 

80% of the effort = regaining control over your existing system. How?

Perform a comprehensive and automated inventory of the source system with {openAudit}

 

  • Through an analysis of internal processes via physical data lineage, in the field, in the source database and in the ETL: analysis of views, nested views, and other scripts involved in feeding the flows.
  • Through an analysis of information usage, based on the database audit logs, covering both data consumption and injection (see the log-analysis sketch after this list).
  • By analyzing the impacts on the data visualization tools associated with the source database, to understand the related complexity (calculation rules) and enable true end-to-end data lineage.
  • Through an analysis of task scheduling, in order to link it to data lineage and data usage.
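To give a concrete idea of this kind of log analysis (a hand-rolled sketch, not {openAudit}'s internal method), here is how one could count, on BigQuery, which tables have actually been read over the last 90 days:

    -- Which tables were actually read in the last 90 days? (BigQuery example)
    SELECT
      ref.dataset_id AS dataset_id,
      ref.table_id   AS table_id,
      COUNT(*)       AS reads_last_90_days
    FROM `region-us`.INFORMATION_SCHEMA.JOBS,
         UNNEST(referenced_tables) AS ref
    WHERE job_type = 'QUERY'
      AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
    GROUP BY 1, 2
    ORDER BY reads_last_90_days DESC;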

Optimize the scope of what needs to be migrated by cross-referencing usage with data lineage using {openAudit}

 

  • To reduce the migration effort by identifying living branches and dead branches to be decommissioned upstream (a simplified sketch follows this list).
  • To define a roadmap by business domain, by type of flow, etc.
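As a simplified sketch of the cross-referencing, assume two hypothetical tables produced by the previous steps: lineage_edges (one row per source-to-target dependency) and table_usage (read counts from the log analysis).

    -- Hypothetical sketch: tables that are never read and feed nothing downstream
    -- are candidate dead branches. lineage_edges(source_table, target_table) and
    -- table_usage(table_name, reads_last_90_days) are assumed inputs.
    SELECT u.table_name
    FROM table_usage AS u
    LEFT JOIN lineage_edges AS l
      ON l.source_table = u.table_name   -- does anything consume this table?
    WHERE u.reads_last_90_days = 0
      AND l.target_table IS NULL;        -- nothing downstream depends on it

A real analysis has to walk the lineage graph transitively; this sketch only checks the last hop.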
 

 

Migration: it absolutely has to involve SQL!

Converting ETL logic into SQL source code using {openAudit}

 

  • Sequencing of elementary transformations: {openAudit} detects the sources, targets, filters, and transformation logic of all ETL jobs and subjobs in the source, and isolates the transformations that will be carried over into the target tool.
  • Conversion to SQL: these ETL transformations are converted into SQL queries by {openAudit}, step by step (see the sketch after this list).
  • Control points: each SQL step corresponds to a step in the source job, which allows {openAudit} to introduce real control points (which do not exist in ETL).
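As a hypothetical illustration of the principle (all names are invented): an ETL job made of a filter, a lookup join, and an aggregation becomes a chain of elementary SQL steps, each materialized so that it can be reconciled with the corresponding step of the source job.

    -- Step 1: source filter (mirrors the ETL job's input filter).
    CREATE TABLE stg_orders_filtered AS
    SELECT * FROM raw_orders WHERE status <> 'CANCELLED';

    -- Step 2: lookup join (mirrors the ETL job's enrichment step).
    CREATE TABLE stg_orders_enriched AS
    SELECT o.*, c.segment
    FROM stg_orders_filtered AS o
    JOIN dim_customers AS c ON c.customer_id = o.customer_id;

    -- Step 3: aggregation (mirrors the ETL job's output).
    CREATE TABLE fct_orders_daily AS
    SELECT order_date, segment, SUM(amount) AS total_amount
    FROM stg_orders_enriched
    GROUP BY order_date, segment;

    -- Control point: each intermediate table can be checked (row counts,
    -- checksums) against the matching step of the source ETL job.
    SELECT COUNT(*) AS step1_rows FROM stg_orders_filtered;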

Conversion of ETL logic into the open models of a data pipeline tool with {openAudit}, plus tests:

 

  • SQL transformations are directly integrated into models of the target tool. The generated SQL can be adjusted by {openAudit} to fit the target syntax.
  • The models are then added to the project configuration, the dependencies between models are defined, etc.
  • The models are tested to verify their functionality.
  • The data can be tested to ensure its integrity (a minimal test is sketched below).
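For example, in a dbt-style project a data test is itself just a SQL file that must return zero rows to pass; the model and column names are illustrative.

    -- tests/assert_no_negative_amounts.sql (dbt-style "singular test").
    -- The test fails if this query returns any rows.
    SELECT *
    FROM {{ ref('fct_orders_daily') }}
    WHERE total_amount < 0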
 
