A Data Lake on GCP with Data Fusion
With an ever-increasing number of organisations migrating their data platforms to the
cloud, there is growing demand for cloud technologies that let organisations leverage their
existing skill sets while also ensuring a successful migration.
ETL developers often form a substantial part of the data teams in many organisations. These
developers are proficient in GUI-based ETL tools as well as complex SQL, and many have, or
are beginning to build, programming skills in languages such as Python.
There are some broad requirements for the GCP data lake:
Leverage the existing ETL skill set within the organisation
Ingest from heterogeneous sources, such as on-premise SQL Server databases
Support complex dependency management in job orchestration, not just for the ingestion
jobs, but also for typical pre- and post-ingestion tasks
Enable data discoverability while still enforcing appropriate access controls
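The dependency-management requirement above is worth making concrete. In an orchestrator such as Cloud Composer, pre-ingestion checks, ingestion jobs, and post-ingestion tasks form a dependency graph, and the scheduler runs them in a topologically valid order. The following pure-Python sketch (the task names are hypothetical, invented for illustration) shows the kind of ordering such an orchestrator enforces:

```python
from graphlib import TopologicalSorter

# Hypothetical pre-ingestion, ingestion, and post-ingestion tasks,
# each mapped to the set of upstream tasks it depends on.
dependencies = {
    "check_source_available": set(),                     # pre-ingestion check
    "ingest_sqlserver_sales": {"check_source_available"},
    "ingest_sqlserver_customers": {"check_source_available"},
    "update_data_catalog": {"ingest_sqlserver_sales",    # post-ingestion task
                            "ingest_sqlserver_customers"},
    "notify_consumers": {"update_data_catalog"},
}

def execution_order(deps):
    """Return one valid run order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)  # pre-ingestion check first, notification last
```

In a real deployment these tasks would be Airflow operators in a Composer DAG rather than plain dictionary entries, but the scheduling principle is the same.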
The architecture for the GCP data lake that meets the above requirements is shown
below. The key GCP services involved in this architecture include services for
data integration, storage, orchestration and data discovery.
GCP provides a comprehensive set of data and analytics services. There are multiple service
options available for each capability, and choosing a service requires architects and designers to
weigh a few aspects that apply to their specific scenarios.
There are multiple ways to design the architecture with different service combinations, and
what is described here is just one of them. Depending on your own requirements,
priorities and considerations, there are other ways to architect a data lake on GCP.
Data Integration Service
The decision tree below details the considerations involved in choosing a data integration service.