Amazon Redshift data warehouse is an enterprise-class relational database query and management system.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service.
You can start with just a few hundred gigabytes of data and scale to a petabyte or more.
Amazon Redshift offers fast query performance using the same SQL-based tools and business intelligence applications that you use today.
Features of Amazon Redshift
Massively parallel processing
A large processing job is organized into smaller jobs which are then distributed among a cluster of processors.
Columnar data storage
column-oriented databases allow for increased speed when it comes to accessing large amounts of data.
Data compression
It takes lesser read operations to copy data from disk to memory
Query optimization compiled code
Redshift query optimizer is programmed to compile SQL in a fashion that supports parallel processing and data compression.
Customer Use Cases, Is it for You?
Traditional Enterprise DW
Reduce costs by extending DW rather than adding hardware
Migrate completely from existing DW systems
Respond faster to business
Companies with Big Data
Improve performance by an order of magnitude.
Make more data available for analysis
Access business data via standard reporting tools.
SaaS Companies
Add analytic functionality to applications.
Scale DW capacity as demand grows.
Reduce HW & SW costs by an order of magnitude.
1.2 Matillion Introduction
Matillion is an ETL/ELT tool built specifically for cloud database platforms including Amazon Redshift, Google BigQuery and Snowflake.
It is a modern, browser-based UI, with powerful, push-down ETL/ELT functionality.
Unlocks the power of your data warehouse.
Modern, beautiful, browser-based environment,
Launch and be developing ETL jobs within minutes.
Matillion extracts your data, loads it into your cloud data warehouse, then uses the power and flexibility of the cloud to transform your data into meaningful business insights.
Advantages of Matillion
Simplicity
Intuitive UI and approach to data transformation makes complex tasks simple.
Speed
Fast time to value, from launch to develop to production.
Scale
Built to take advantage of the power and features of Amazon Redshift, Snowflake, and Google BigQuery.
Savings
Pay as you go with no long-term commitments.
2. DATA LOADING INTO REDSHIFT USING MATILLION
Assumptions for the load:
You have a configured Amazon Redshift cluster.
You have a configured S3 bucket.
You have configured Matillion for Amazon Redshift.
Logging in to Matillion
Enter your URL of Matillion.
On the login page, Enter your “User ID” and “Password”.
Click on Log In.
On the Join Project Page, Click on Create Project.
On the Project Details page, Enter a name for “Project Group” and “Project Name”
Click on Next
On the next page, We will need to provide our AWS connection details.
Enter a arbitrary name for environment in the “Environment Name” field.
Click on Manage and enter the AWS credentials when ever required.
Click on Next.
In the Redshift Connection page, Enter the Endpoint, Port, DB Username and Password of Redshift database.
Click on Next.
On the Redshift Default page, Enter the Redshift Database, Schema and S3 Bucket name.
Click on Test and make sure you get “Success” message.
Click on Finish.
On the left pane, Right-Click on the folder.
Click on Add Orchestration job.
Now we will see a start component on the right pane.
Drag the “S3 Load Generator” from the Components pane and drop it near the start element.
Click on the ellipse on the right corner of the page and select the file you want to load from S3 bucket.
Click on Get Sample to see the sample data.
Click on Next.
Review the file and column details.
Click on Next.
On The validation page, Click on Test.
Once successful, Click on Finish.
Two elements (Create Table Elementa and S3 Load) should appear now.
Connect the Start element with Create Table element.
Enter a name for New Table Name.
On the Create/Replace option, Select a option as per preference.
Click on S3 Load element.
Enter the name of target table same as you entered in the last step
Right-click on the right pane and click on Run Job.
In the task pane, You should be able to monitor the task status.
Once completed you should see a completed status (Green tick).
Thank you for reading this blog. Hope you liked it. For more enquiries contact pdash@prisoft.com