The first action to modernize an Information System: Eliminate unnecessary pipelines?



A data system could be compared to a vast road network, composed of various roads, each created to meet a specific need at a given time.


As this network expands and ages, some paths (data pipelines) become underutilized or unused (replicated, obsolete).

The financial and organizational impacts are numerous:

  • 60% of data in the Cloud is not being used, according to NTT (source: IT Social).
  • According to Civo, for almost half of companies with more than 500 employees, the annual cost of the Cloud exceeds one million dollars, with growth rates that are difficult to sustain (source: Channel News).

 

The origin of these unnecessary data pipelines, these "phantom roads"?

Over time, Information Systems aggregate pipelines that have become obsolete:

  • Pipelines created for projects that have since been abandoned.
  • Duplicate pipelines, created due to a lack of coordination between teams. The advent of "data mesh" architectures appears to be a major accelerator of this situation.
  • Obsolete pipelines kept as a precaution ("you never know!") or to cover some hypothetical risk.


These "phantom roads" consume significant Cloud resources (storage, processing, bandwidth) that could be put to better use!

We have made progress on a software solution that addresses this natural drift, one as old as physics itself: entropy, the "degree of disorder reflecting the natural tendency of things to evolve towards a state of chaos".

 

This drift is not inevitable. It is, however, a race against time: systems have such a propensity for entropy that only industrialized mechanisms can keep pace with it.

 

This response is one of the features of {openAudit}, which relies on two mechanisms:

 

Continuously and technically identify pipelines to be decommissioned

It is possible to accurately map these complex entanglements and identify unused pipelines. 

This process requires two coordinated technical actions, which we offer with our {openAudit} software:

Analysis of data usage, to identify "informational dead ends"

 

  • {openAudit} analyzes the main technical stack to identify all the data consumed at the input and output of batch chains.
  • Data consumed by satellite (non-parsed) applications is also analyzed, to capture the full scope of useful information.
  • This dual analysis can be subtle and is configured to take the target business into account: regulatory information, for example, may be consumed only periodically while still having significant added value.

 

Through a "mirror analysis", informational dead ends are identified factually and continuously.
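A mirror analysis of this kind can be sketched as a set difference between what the batch chains produce and what is actually consumed downstream. The table names, the whitelist, and the data structures below are purely illustrative assumptions, not openAudit's internals:

```python
# Hypothetical sketch of a "mirror analysis": cross-reference the data
# produced by batch chains with the data actually consumed downstream.
# All table names here are illustrative assumptions.

produced = {              # outputs of the parsed batch chains
    "sales_raw", "sales_agg", "customer_dim", "legacy_kpi",
}
consumed = {              # inputs observed across batch chains + satellite apps
    "sales_raw", "sales_agg", "customer_dim",
}
# Data that is legitimately consumed only periodically (e.g. regulatory
# reports) must be whitelisted so it is not flagged as a dead end.
regulatory_whitelist = set()

dead_ends = produced - consumed - regulatory_whitelist
print(sorted(dead_ends))  # → ['legacy_kpi']
```

In practice, the consumed set would be refreshed continuously from the parsed stack and the satellite applications, so the list of dead ends stays current.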

Data Lineage: Tracing data streams to isolate unnecessary chains

 

  • Data lineage makes it possible to trace a pipeline back from unused data to the first table whose information is consumed in another branch.
  • From that branch onward, the unnecessary fraction of the chain can be removed without consequence.
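The backward walk described above can be sketched as a graph traversal: starting from an unused output, prune each upstream table until reaching one that also feeds a still-used branch. The lineage graph and table names below are hypothetical, not openAudit's actual implementation:

```python
# Illustrative sketch: walk a lineage graph backwards from an unused
# output, pruning tables until we hit one that feeds another branch.

lineage = {                      # child -> parents (hypothetical tables)
    "dead_report": ["agg_old"],
    "agg_old": ["staging"],
    "live_report": ["staging"],
    "staging": ["source"],
}

def consumers(table):
    """Tables that read directly from `table`."""
    return [child for child, parents in lineage.items() if table in parents]

def prunable(unused_output):
    """Chain fraction that can be decommissioned without side effects."""
    to_remove, frontier = [], [unused_output]
    while frontier:
        node = frontier.pop()
        to_remove.append(node)
        for parent in lineage.get(node, []):
            # Keep any table that still feeds a branch outside the
            # chain being removed; only fully orphaned parents are pruned.
            others = [c for c in consumers(parent) if c not in to_remove]
            if not others:
                frontier.append(parent)
    return to_remove

print(prunable("dead_report"))   # → ['dead_report', 'agg_old']
```

Here "staging" is kept because it also feeds "live_report"; only "dead_report" and "agg_old" are flagged for decommissioning.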

Clean the Information System

 

{openAudit} runs continuously, which allows the decommissioning of all unnecessary flows to be organized over the long term with internal teams.

A further classification can be carried out by business line, tool, or other criteria, to prioritize the process.

 

 

Modeling a harmonious system

We are currently developing an algorithm, which we have named "Harmony", that will allow the automated modeling of a system so that it is as rational and efficient as possible, even when many proprietary technologies are in use (ETL, data visualization tools). More news to come!

And if you would like to discuss automated migration topics, we would be happy to do so as well.

 
