Abstract
Talend is an open-source software vendor providing data integration, data management, enterprise application integration, and big data software and services. Talend Integration Cloud is a Platform as a Service (PaaS) solution supporting customers throughout the entire process of collecting, organizing, enriching, analyzing, and acting on Big Data in their enterprises. Talend needed to launch the product without the large upfront commitment of its own data center. Since customers could run their own code in the PaaS environment, they needed a provider that could offer fine-grained access control to all compute, storage, and network resources.
CONTENT
PART 1 – INTRODUCTION
- GOOGLE BIGQUERY
- ON PREMISES SQL SERVER DATABASE
- TALEND
PART 2 – DATA LOADING INTO GOOGLE BIGQUERY
- Using Talend
1. Introduction
1.1 Introduction to Google BigQuery
- Serverless, highly scalable, and cost-effective multi-cloud data warehouse designed for business agility.
- Democratize insights with a secure and scalable platform with built-in machine learning
- Petabytes of data using ANSI SQL at blazing-fast speeds, with zero operational overhead
- Insights with real-time and predictive analytics
- BigQuery ML: BigQuery ML enables data scientists and data analysts to build and operationalize ML models on planet-scale structured or semi-structured data.
- BigQuery GIS: BigQuery GIS uniquely combines the serverless architecture of BigQuery with native support for geospatial analysis.
- BigQuery BI Engine: An in-memory analysis service built into BigQuery that enables users to analyze large and complex datasets interactively with sub-second query response time and high concurrency.
- Connected Sheets: Connected Sheets allows users to analyze billions of rows of live BigQuery data in Google Sheets without requiring SQL knowledge.
1.2 Overview of SQL Server Database
- A database in SQL Server is made up of a collection of tables that stores a specific set of structured data. A table contains a collection of rows, also referred to as records or tuples, and columns, also referred to as attributes. Each column in the table is designed to store a certain type of information, for example, dates, names, dollar amounts, and numbers.
- SQL Server Databaseis a version of Microsoft’s relational database management system (RDBMS).
- Integrate and secure machine learning applications with scalable performance.
- Run applications and deploy databases across multiple operating systems and platforms with compatibility.
- Reduce the risk of upgrades while using new SQL capabilities when you are ready though inbuilt database compatibility levels.
1.3 Overview on Talend
Talend is an open-source data integration platform. It provides various software and services for data integration, data management, enterprise application integration, data quality, cloud storage and Big Data. Talend first came into the market in 2005 as the first commercial open-source software vendor of data integration software.
2. DATA LOADING INTO BIGQUERY USING TALEND
2.1 Assumptions for the load
- You have configured Google BigQuery.
- You have connected to On premises database.
- You have installed Talend.
2.2 Job Creation
- Connect to Talend Open Studio and Create a project.
- On the “Job Design” section, Right click and click on “Create Job”
- Provide a specific name to the job and click on “Finish”.
2.2 Database Connection
- Expand the “Metadata” section.
- Right click on Db Connections > Create Connection.
- Select the Database Type from the dropdown list.
- Enter the Database connection credentials and test the connection.
- On Successful Validation, Click on “Finish”.
- Once the DB connection is setup, Right Click on it and select “Retrieve Schema”.
- This should fetch a list of all the objects in the schema.
- Select the table(s) that you want to migrate to BigQuery.
- Once Done, Click on “Next” and then “Finish”.
- Drag the DB connection that you just created to the designer pane.
- On the Component pop-up, Select “tDBInput(Microsoft SQL Server)” as we are using SQL Server as our source.
- Click on “OK”.
- Drag “tMap” and “tBigQueryOutput” from the component pane to designer pane.
- Select the SQL Server Component.
- On the component tab, Change the “Property Type” value change the value to the connection that we created.
- Enter the name of the Table in “Table Name” field.
2.3 Mapping
- Connect all the components.
- Click on the tMap component and click on ellipses icon against “Map Editor”.
- Drag all columns you want to load to the right pane.
- Once done, Click on Apply and OK.
- Select the “tBigQueryOutput”, On the component tab enter the GCP Project details.
- Enter the Google Storage details in the “Google Storage Configuration” block.
- Save the changes.
- Navigate to “Run” tab, click on “Run”.
- The load should start and once completed should show the number of row loaded.
- Navigate to BigQuery, Check the table has been created with all the records available.
Conclusion
Talend is one of the most used integration tool for its vast capabilities and open source license. It is also cabable of integrating modern cloud data warehouse infrastructures such as Google BigQuery, AWS Redshift, Snowflake etc. In this blog connection to BigQuery with SQL Server is performed.
Nice work!