What if the real complexity of BI migrations lies in the sources?

In any data visualization migration project, we tend to think that the main obstacle lies in changing tools. Yes, but that's not the whole story 😊!

Moving from SAP Business Objects to Looker, for example, might seem to be simply a matter of "translating" dashboards and semantic layers.

We know how to do it very well 👇.

But our experience shows that one structural aspect compounds the complexity of a technical migration: managing the data sources on the target side.

In the case of an SAP BO to Looker migration, the data is rarely migrated 100% to GCP BigQuery. The sources may remain on Oracle, for example, rely partly on custom queries (free-hand SQL), or combine several databases. Complex scenarios abound.

We have built a technical solution that allows us to efficiently handle all of these scenarios. 

  • In the context we are discussing, we envisioned implementing an intermediate data layer in Parquet format, stored in Google Cloud Storage (GCS). Parquet is a columnar, compressed file format, which makes it particularly well suited to analytical queries (a build sketch follows this list).

  • This intermediate layer will be queried through an SQL orchestrator, i.e. a distributed SQL query engine. It will query this optimized cache (built on the fly or on a schedule), taking the original Business Objects prompt values into account to dynamically replay the Business Objects reporting logic. The results will then be served to Looker (in the case at hand). This infrastructure can be containerized to absorb load increases (a federated-query sketch appears further down).
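
To make the first point concrete, here is a minimal sketch of how such a Parquet cache could be materialized, assuming a Python stack with python-oracledb, pandas and pyarrow; the DSN, credentials, bucket and table names are purely illustrative, not part of the actual design:

```python
# Minimal sketch: materialize an Oracle query result as a compressed
# Parquet file in GCS. All names (DSN, bucket, table) are hypothetical.
import oracledb
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs

def build_parquet_cache(sql: str, gcs_path: str) -> None:
    # 1. Pull the (already filtered) result set from the legacy source.
    with oracledb.connect(user="bo_reader", password="...",
                          dsn="oracle-legacy:1521/ORCLPDB") as conn:
        df = pd.read_sql(sql, conn)

    # 2. Write it to GCS as columnar, compressed Parquet.
    table = pa.Table.from_pandas(df)
    gcs = fs.GcsFileSystem()  # relies on Application Default Credentials
    pq.write_table(table, gcs_path, filesystem=gcs, compression="snappy")

build_parquet_cache(
    "SELECT order_id, region, amount FROM sales WHERE fiscal_year = 2024",
    "my-bucket/bo_cache/sales_2024.parquet",
)
```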

Technological decoupling: we will remove the direct link between Looker and the legacy databases (Oracle, to take our example) by creating a pivot format readable by any tool.

Efficiency: Parquet files will be generated on the fly (i.e., on demand) and will act as a cache: once built, the source database is no longer hit, which prevents repeated requests to the source systems. Generation can also be scheduled if necessary (one possible freshness check is sketched below).
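
As an illustration of that cache behavior, here is one possible freshness check using the google-cloud-storage client; the 24-hour TTL and object names are assumptions for the sketch:

```python
# Sketch: serve from the Parquet cache when it is present and fresh,
# otherwise rebuild it from the source. Names and TTL are illustrative.
from datetime import datetime, timedelta, timezone
from google.cloud import storage

MAX_AGE = timedelta(hours=24)  # hypothetical freshness window

def cache_is_fresh(bucket_name: str, blob_name: str) -> bool:
    blob = storage.Client().bucket(bucket_name).get_blob(blob_name)
    if blob is None:  # cache never built yet
        return False
    return datetime.now(timezone.utc) - blob.updated < MAX_AGE

if not cache_is_fresh("my-bucket", "bo_cache/sales_2024.parquet"):
    # Rebuild on demand (see the extraction sketch above).
    build_parquet_cache(
        "SELECT order_id, region, amount FROM sales WHERE fiscal_year = 2024",
        "my-bucket/bo_cache/sales_2024.parquet",
    )
```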

Load reduction: we will apply the BO filters upstream, as close as possible to the data providers, to limit the volumes to be processed (sketched below).
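
For instance, the original BO prompt values could be injected as bind parameters so the filter executes inside the source database rather than after extraction; the prompt names and the query are hypothetical:

```python
# Sketch: push BO prompt values down to the source as bind parameters,
# so filtering happens in Oracle and only the reduced volume travels.
import oracledb
import pandas as pd

prompt_values = {"region": "EMEA", "fiscal_year": 2024}  # captured from the BO report

sql = """
    SELECT order_id, region, amount
    FROM sales
    WHERE region = :region
      AND fiscal_year = :fiscal_year
"""

with oracledb.connect(user="bo_reader", password="...",
                      dsn="oracle-legacy:1521/ORCLPDB") as conn:
    df = pd.read_sql(sql, conn, params=prompt_values)
```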

Moving the intelligence: the SQL orchestrator will merge the different sources and replicate the logic of even complex BO dashboards, which will allow a real performance gain (see the federated-query sketch below).
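
As a sketch of what that merge could look like, here is a federated query through Trino (one possible choice of distributed SQL engine, not necessarily the one we use); the catalog, schema and table names are assumptions:

```python
# Sketch: one federated query joining the Parquet cache (exposed via a
# Hive catalog over GCS) with a table still living in Oracle.
import trino

conn = trino.dbapi.connect(host="trino.internal", port=8080, user="looker_svc")
cur = conn.cursor()
cur.execute("""
    SELECT c.region, SUM(c.amount) AS revenue, MAX(r.target) AS target
    FROM hive.bo_cache.sales_2024 AS c
    JOIN oracle.finance.revenue_targets AS r
      ON c.region = r.region
    GROUP BY c.region
""")
for row in cur.fetchall():
    print(row)
```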

This strategy will not only allow us to handle cases excluded from the initial migration scope; it will also open the door to broad interoperability of dataviz sources: the tools will no longer connect to the databases, but to an agnostic abstraction layer.

In summary, the migration of the dataviz tool alone will not guarantee the complete success of a data transformation project.

The near-systematic complexity of the sources in the target architecture will often require adaptive mechanisms. Anticipating future architectural developments may also justify implementing the solution described here.

The next step will be to build a universal semantic layer sitting above the SQL orchestrator (work in progress ⏳), for complete interoperability not only of the sources but also of the semantic layer.
