What if the real complexity of BI migrations lies in the sources?

In any data visualization migration project, we tend to think that the main obstacle lies in changing tools. Yes, but that's not the whole story 😊!

Moving from SAP Business Objects to Looker, for example, might seem to be simply a matter of "translating" dashboards and semantic layers.

We know how to do it very well 👇.

But our experience shows that one structural aspect compounds the complexity of a technical migration: managing the data sources on the target side.

In the case of an SAP BO to Looker migration, the data is rarely migrated 100% to GCP BigQuery. The sources may remain on Oracle, for example, rely partly on custom queries (free-hand SQL), or combine several databases. Complex scenarios abound.

We have built a technical solution that allows us to efficiently handle all of these scenarios. 

  • In the context we are discussing, we envisioned implementing an intermediate data layer in Parquet format, stored in Google Cloud Storage (GCS). Parquet is a columnar, compressed file format, which makes it particularly well suited to analytical queries (a build sketch follows this list).

  • This intermediate layer will be queried through an SQL orchestrator, i.e. a distributed SQL query engine. It will query this optimized cache (built on the fly or on a schedule), taking the original Business Objects prompt values into account to dynamically replay the Business Objects reporting logic. The results will then be served to Looker (in the case at hand). This infrastructure can be containerized to absorb load increases (a federated-query sketch appears further down).
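
To make the first point concrete, here is a minimal sketch of how such a Parquet cache could be materialized, assuming a Python stack with python-oracledb, pandas and pyarrow; the DSN, credentials, bucket and table names are purely illustrative, not part of the actual design:

```python
# Minimal sketch: materialize an Oracle query result as a compressed
# Parquet file in GCS. All names (DSN, bucket, table) are hypothetical.
import oracledb
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs

def build_parquet_cache(sql: str, gcs_path: str) -> None:
    # 1. Pull the (already filtered) result set from the legacy source.
    with oracledb.connect(user="bo_reader", password="...",
                          dsn="oracle-legacy:1521/ORCLPDB") as conn:
        df = pd.read_sql(sql, conn)

    # 2. Write it to GCS as columnar, compressed Parquet.
    table = pa.Table.from_pandas(df)
    gcs = fs.GcsFileSystem()  # relies on Application Default Credentials
    pq.write_table(table, gcs_path, filesystem=gcs, compression="snappy")

build_parquet_cache(
    "SELECT order_id, region, amount FROM sales WHERE fiscal_year = 2024",
    "my-bucket/bo_cache/sales_2024.parquet",
)
```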

Technological decoupling: we will remove the direct link between Looker and the legacy databases (Oracle, to take our example) by creating a pivot format readable by any tool.

Efficiency: Parquet files will be generated on the fly (i.e., on demand) and will act as a cache: once built, the source database is no longer hit, which prevents repeated requests to the source systems. Generation can also be scheduled if necessary (one possible freshness check is sketched below).
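
As an illustration of that cache behavior, here is one possible freshness check using the google-cloud-storage client; the 24-hour TTL and object names are assumptions for the sketch:

```python
# Sketch: serve from the Parquet cache when it is present and fresh,
# otherwise rebuild it from the source. Names and TTL are illustrative.
from datetime import datetime, timedelta, timezone
from google.cloud import storage

MAX_AGE = timedelta(hours=24)  # hypothetical freshness window

def cache_is_fresh(bucket_name: str, blob_name: str) -> bool:
    blob = storage.Client().bucket(bucket_name).get_blob(blob_name)
    if blob is None:  # cache never built yet
        return False
    return datetime.now(timezone.utc) - blob.updated < MAX_AGE

if not cache_is_fresh("my-bucket", "bo_cache/sales_2024.parquet"):
    # Rebuild on demand (see the extraction sketch above).
    build_parquet_cache(
        "SELECT order_id, region, amount FROM sales WHERE fiscal_year = 2024",
        "my-bucket/bo_cache/sales_2024.parquet",
    )
```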

Load reduction: we will apply the BO filters upstream, as close as possible to the data providers, to limit the volumes to be processed (sketched below).
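
For instance, the original BO prompt values could be injected as bind parameters so the filter executes inside the source database rather than after extraction; the prompt names and the query are hypothetical:

```python
# Sketch: push BO prompt values down to the source as bind parameters,
# so filtering happens in Oracle and only the reduced volume travels.
import oracledb
import pandas as pd

prompt_values = {"region": "EMEA", "fiscal_year": 2024}  # captured from the BO report

sql = """
    SELECT order_id, region, amount
    FROM sales
    WHERE region = :region
      AND fiscal_year = :fiscal_year
"""

with oracledb.connect(user="bo_reader", password="...",
                      dsn="oracle-legacy:1521/ORCLPDB") as conn:
    df = pd.read_sql(sql, conn, params=prompt_values)
```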

Moving the intelligence: the SQL orchestrator will merge the different sources and replicate the logic of even complex BO dashboards, which will allow a real performance gain (see the federated-query sketch below).
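
As a sketch of what that merge could look like, here is a federated query through Trino (one possible choice of distributed SQL engine, not necessarily the one we use); the catalog, schema and table names are assumptions:

```python
# Sketch: one federated query joining the Parquet cache (exposed via a
# Hive catalog over GCS) with a table still living in Oracle.
import trino

conn = trino.dbapi.connect(host="trino.internal", port=8080, user="looker_svc")
cur = conn.cursor()
cur.execute("""
    SELECT c.region, SUM(c.amount) AS revenue, MAX(r.target) AS target
    FROM hive.bo_cache.sales_2024 AS c
    JOIN oracle.finance.revenue_targets AS r
      ON c.region = r.region
    GROUP BY c.region
""")
for row in cur.fetchall():
    print(row)
```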

This strategy will not only allow us to handle cases excluded from the initial migration scope; it will also open the door to broad interoperability of dataviz sources: the tools will no longer connect to the databases, but to an agnostic abstraction layer.

In summary, the migration of the dataviz tool alone will not guarantee the complete success of a data transformation project.

The near-systematic complexity of the sources in the target architecture will often require adaptive mechanisms. Anticipating future architectural developments may also justify implementing the solution described here.

The next step will be to build a universal semantic layer sitting above the SQL orchestrator (work in progress ⏳), for complete interoperability not only of the sources but also of the semantic layer.
