Switching from Teradata to BigQuery? Automate the process!

The Teradata release: a technical challenge

Teradata is an appliance chosen by countless organizations. Its specialization in data warehousing and analytics has made it possible to build solutions with exceptional computing power and high scalability. But most of Teradata's historical users are now moving to the cloud. One of our clients decided to move its Teradata assets to Google Cloud Platform (BigQuery) and, at the same time, to migrate a number of data visualization technologies (SAP BO, Power BI).
We are sharing the methodology implemented as part of this migration.
Some key indicators regarding the platform to be migrated
|
|
|
|
|
Data assets:
- 400,000 data "containers";
- 270,000 tables;
- 11,000 files.
|
|
|
|
|
Script assets:
- 122,000 views;
- 1,200 macros;
- 500 BTEQ injection processes;
- 600 BO universes;
- 100,000 web reports;
- 30,000 data manipulation processes distributed across 450 Stambia ETL projects.
|
|
|
|
|
Usage statistics*:
- 30% of the data used;
- 30% of tables/views/files used;
- 50% of the transformations used.
*usage that leads to a decision-making report or to application use
|
|
|
|
Processing:
- 1,500,000 requests per day;
- 880,000 insert/update/merge operations;
- ...
|
|
|
Three major challenges were identified to ensure the success of this migration:
- continuously mapping the existing state of the source platform, with all its dependencies;
- maintaining a permanent inventory of the migration's progress: what has been migrated and what remains to be migrated (see the sketch after this list);
- sharing the migration process with everyone, to avoid misunderstandings.
Automating these tasks was essential.
|
|
|
|
|
Step #1: Mastering the source platform by mapping it
|
|
|
{openAudit} made it possible to understand internal processes via physical data lineage, at field level, in BTEQ scripts, but also in the ETL/ELT flows, views, macros, and other scripts involved in feeding the data.
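As an illustration of the idea (this is not openAudit's engine), field-level lineage can be extracted by parsing the SQL of a view. The sketch below uses the open-source sqlglot parser on a hypothetical Teradata view.

```python
# Sketch: extract field-level lineage from a Teradata view definition
# with the open-source sqlglot parser. The view below is hypothetical.
import sqlglot
from sqlglot import exp

view_sql = """
CREATE VIEW dwh.v_sales AS
SELECT s.order_id, s.amount * r.rate AS amount_eur
FROM dwh.sales AS s
JOIN dwh.fx_rates AS r ON s.currency = r.currency
"""

tree = sqlglot.parse_one(view_sql, read="teradata")

# Every projected field, with the physical columns it depends on
for select in tree.find_all(exp.Select):
    for projection in select.expressions:
        inputs = sorted(c.sql() for c in projection.find_all(exp.Column))
        print(projection.alias_or_name, "<-", inputs)
# order_id   <- ['s.order_id']
# amount_eur <- ['r.rate', 's.amount']
```

Applied to 122,000 views and the associated macros and BTEQ scripts, this kind of parsing is what makes a field-level map of the platform tractable.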
|
|
|
|
|
{openAudit} helped identify information usage patterns through an analysis of the audit database logs, covering both data consumption and data injection.
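A hedged sketch of the same idea, assuming the query-log entries have been exported to flat files; the file and column names are illustrative only.

```python
# Sketch: spot used vs. dormant objects by crossing the full catalog
# with the audit/query log. File and column names are illustrative.
import pandas as pd

catalog = pd.read_csv("dbc_tables.csv")        # every table/view/file
access_log = pd.read_csv("dbql_access.csv")    # one row per object access

used = set(access_log["object_name"])
dormant = set(catalog["object_name"]) - used

print(f"{len(used)} objects actually queried, {len(dormant)} dormant")
```

Cross-checks of this kind are what produce usage figures like the ~30% quoted above, and they decide what is worth migrating at all.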
|
|
|
|
|
{openAudit} analyzed task scheduling and linked it to data lineage, as well as to data usage.
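One way to picture this linkage, sketched here with the networkx graph library; the job, table, and report names are hypothetical.

```python
# Sketch: tie scheduling to lineage by modelling jobs, tables and reports
# as one directed graph, then asking what a given job ultimately feeds.
import networkx as nx

g = nx.DiGraph()
g.add_edge("job:load_sales", "table:dwh.sales")      # scheduler -> table
g.add_edge("table:dwh.sales", "view:dwh.v_sales")    # lineage
g.add_edge("view:dwh.v_sales", "report:sales_board") # usage

# Everything downstream of the nightly load job:
print(nx.descendants(g, "job:load_sales"))
# {'table:dwh.sales', 'view:dwh.v_sales', 'report:sales_board'}
```

With scheduling, lineage, and usage in one graph, the impact of moving or retiring any job can be read directly off its descendants.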
|
|
|
|
|
{openAudit} highlighted the impacts on the data visualization tools connected to Teradata (e.g., Power BI, SAP BO) in order to expose the related complexity (business rules) and enable truly end-to-end data lineage.
|
|
|
|
|
Step #2: Automating the migration
|
|
|
|
|
Through a series of mechanisms, {openAudit} reproduced the essential processing in BigQuery: parsing the source scripts, then producing enriched standard SQL. Note that some encapsulations (Shell, among others) may degrade the output. Note also that an ETL/ELT process in the source system requires a transposition of its processes; for some of these, {openAudit} can accelerate the project.
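The core idea, parse then re-emit, can be sketched with the open-source sqlglot transpiler. This is not openAudit's engine, and the query below is hypothetical.

```python
# Sketch: parse Teradata SQL and re-emit BigQuery standard SQL.
# Constructs without a direct equivalent may pass through unchanged
# and still need manual review, as noted above.
import sqlglot

teradata_sql = """
SELECT order_id, ZEROIFNULL(amount) AS amount
FROM dwh.sales
QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY load_ts DESC) = 1
"""

print(sqlglot.transpile(teradata_sql, read="teradata", write="bigquery")[0])
```

At the scale of 1,500,000 requests per day, automating this translation rather than rewriting by hand is what keeps the project duration bounded.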
|
|
|
|
|
Step #3: Mastering deployment in GCP through mapping
|
|
|
{openAudit} performed dynamic parsing of BigQuery: scheduled queries, view scripts, and load files such as JSON and CSV, to enable intelligent reconstruction of the flows.
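For instance, view definitions can be pulled for parsing from BigQuery's INFORMATION_SCHEMA. This sketch uses the official google-cloud-bigquery client; the project and dataset names are placeholders.

```python
# Sketch: pull every view definition of a dataset so it can be parsed.
# `my_project.my_dataset` is a placeholder.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT table_name, view_definition
FROM `my_project.my_dataset.INFORMATION_SCHEMA.VIEWS`
"""
for row in client.query(sql).result():
    print(row.table_name, len(row.view_definition), "chars of SQL")
```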
|
|
|
|
|
{openAudit} analyzed the logs in Google Cloud's Operations suite (formerly Stackdriver) to immediately understand how the information is used.
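A sketch of the kind of query involved, using the google-cloud-logging client; the filter targets BigQuery audit entries and may need adjusting to your audit-log configuration.

```python
# Sketch: read BigQuery audit entries from Cloud Logging (ex-Stackdriver)
# to see which jobs ran and when. The filter below is an assumption
# based on the legacy jobservice.jobcompleted audit events.
import itertools
from google.cloud import logging

client = logging.Client()
log_filter = (
    'resource.type="bigquery_resource" '
    'AND protoPayload.methodName="jobservice.jobcompleted"'
)
for entry in itertools.islice(client.list_entries(filter_=log_filter), 10):
    print(entry.timestamp, entry.log_name)
```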
|
|
|
|
|
{openAudit} mapped out the task scheduling in order to link it to data lineage and data usage, as on the source platform.
|
|
|
|
|
{openAudit} introspected certain "target" data visualization technologies that rely on GCP (Looker, Data Studio, BO Cloud, Power BI, etc.) in order to reconstruct the reporting intelligence by comparing responses between source and target.
|
|
|
|
|
Furthermore, the connectors could be migrated to BigQuery (in some cases via the Datometry Hyper-Q middleware, at the cost of some performance degradation).
Conclusion

We do not believe a migration of this ambition can be driven purely by a "kick-off" and "deadlines". It requires an intelligent process grounded in real mastery of both the source platform and the target platform, via continuous technical introspection of processes and usage, together with a graphical representation of the information systems that everyone can understand and use. In this context, automating the migration brings undeniable added value.
|
|
|
|
|
|
|