Updated: Feb 17
Organizations with huge amounts of data thrive on advanced analytics and are constantly looking for different ways to manage the increasing volumes of data. Often a company's success rests on how effectively its data has been managed and how the business has taken strategic decisions based on the available statistical data.
The Snowflake Data Cloud is powered by an advanced data platform provided as SaaS (Software as a Service).
Snowflake enables every organization to mobilize its data to the Snowflake Data Cloud. During data migration, the following are the key areas to focus on:
1. Data Storage
2. Data Processing
3. Cloud Analytic Solutions
Data Storage:
When data is loaded into Snowflake, it is organized into a compressed, columnar format, similar to the HANA database. The data and its metadata are stored in a predefined structural format in Snowflake, which can be easily queried on the database.
Data Processing:
Snowflake processes data through queries using 'Virtual Warehouses'. Every Virtual Warehouse is composed of several compute nodes, which can be utilized in parallel. A single database can have multiple Virtual Warehouses, and each Virtual Warehouse is an independent compute cluster that does not contend with the others for resources.
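As an illustration, a virtual warehouse can be created with a single SQL statement; the warehouse name and settings below are illustrative:

```sql
-- Create a small virtual warehouse that suspends itself after
-- 5 minutes of inactivity and resumes automatically on the next query.
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND   = 300      -- seconds of inactivity before suspending
  AUTO_RESUME    = TRUE
  INITIALLY_SUSPENDED = TRUE;
```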
Cloud Analytic Solutions:
The cloud services layer is a collection of services that help process user requests, from login through to query results.
These include infrastructure management, metadata management, authentication, and query optimization.
Stages in Snowflake Implementation
Below is a pictorial representation of a Snowflake implementation.
Best Practices for Snowflake Implementation:
Below are a few of the key considerations during a Snowflake implementation.
Quicker data load times mean greater business value. With huge volumes of data, the business would like the data loaded into different tables and schemas in an organized format. Each database should have a minimum of three layers:
1. Staging layer - holds data ingested as-is from different sources.
2. Transformation layer - applies the different business rules/logic.
3. Presentation layer - serves the user interface based on the reporting solutions.
Additional layers for data security may also be included.
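A minimal sketch of this layering, assuming a single database with one schema per layer (the database and schema names are hypothetical):

```sql
CREATE DATABASE IF NOT EXISTS sales_db;

-- One schema per layer; names are illustrative.
CREATE SCHEMA IF NOT EXISTS sales_db.staging;        -- raw data, ingested as-is
CREATE SCHEMA IF NOT EXISTS sales_db.transformation; -- business rules/logic applied
CREATE SCHEMA IF NOT EXISTS sales_db.presentation;   -- reporting-ready data
```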
After data is loaded into the staging area, Snowflake can leverage several tools to cleanse the data and store it in an organized format.
Complex pipelines can be broken down into smaller pieces that write data into intermediate tables.
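For instance, a long transformation can be materialized step by step through intermediate tables; the table and column names below are hypothetical:

```sql
-- Step 1: filter raw orders into an intermediate table.
CREATE OR REPLACE TABLE transformation.orders_filtered AS
SELECT order_id, customer_id, amount
FROM staging.orders
WHERE amount > 0;

-- Step 2: aggregate the intermediate result for the presentation layer.
CREATE OR REPLACE TABLE presentation.revenue_by_customer AS
SELECT customer_id, SUM(amount) AS total_revenue
FROM transformation.orders_filtered
GROUP BY customer_id;
```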
By leveraging the Snowflake API, data scientists can use Spark or Python to analyze the data statistically.
Selecting an optimal data warehouse is a key factor for many organizations. Data engineers have various options to determine the right combination of warehouse settings. Snowflake caters for warehouses in different T-shirt sizes based upon how the warehouse will be used.
Snowflake is quite flexible in scaling resources up or down in a matter of seconds through SQL queries.
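For example, resizing a warehouse is a single statement (the warehouse name is illustrative):

```sql
-- Scale up before a heavy historical load...
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- ...and scale back down for incremental BAU loads.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XSMALL';
```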
Best practices for choosing a data warehouse:
Consider Virtual Warehouse options. Virtual warehouses are clusters of compute resources in Snowflake that let users execute DML operations through simple SQL statements. A best practice is to try various T-shirt sizes before actually deciding on the correct size. Snowflake has a default 'auto suspend' feature that pauses a warehouse based on usage to reduce maintenance costs.

The cost of a warehouse is calculated based on how long it runs continuously. For instance, if an X-Small warehouse runs for 60 seconds in one go, 0.017 credits are used, whereas the same 60 seconds on an X-Large warehouse uses 0.267 credits.

Also consider that warehouse size is not directly proportional to data loading performance, because several other factors come into play: the geographical region where the virtual warehouse is situated, the location of the on-premise data, and the various mapping rules applied during data ingestion all contribute to data load performance. It is good practice to scale the warehouse up while loading historical data and scale it down during BAU for the incremental loads.
Fig(1). Warehouse sizes. Image from the official Snowflake documentation website.
Scaling resources up is quick and can also be automated. Below is a simple table that illustrates sizing with respect to the activity.
Resource Monitoring: Resource monitoring is a key objective of the operations team during the sustain phase. Monitoring has the below attributes:
Credit quota - assigns Snowflake credits to monitor over regular intervals. For instance, set a credit quota of 100 credits if the usual usage is around 70 credits.
Schedule level - schedules regular checks relative to the start date, such as weekly, monthly, and so on.
Actions - notify users and/or suspend warehouses that reach a specific assigned credit limit.
The syntax for creating a resource monitor is shown below.
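A sketch of that syntax, with an illustrative monitor name, quota, and thresholds:

```sql
-- Monitor 100 credits per month; warn at 80%, suspend at 100%.
CREATE RESOURCE MONITOR monthly_monitor
  WITH CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

-- Attach the monitor to a warehouse (name illustrative).
ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_monitor;
```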
It is essential for all organizations to adhere to best practices to ensure that a Snowflake implementation is seamless and successful. Data engineers should also comply with development standards to minimize operational costs in the long run. Snowflake's cloud services are quite flexible and robust, enabling organizations to easily scale resources up or down at any juncture should the need arise.