Optimizing Storage Costs with Cloudflare R2 for Data Ingestion into Snowflake

In an increasingly cost-conscious business landscape, one area ripe for exploration is migrating data lakes away from hyperscalers. Egress fees, the charges incurred when transferring data out of a cloud provider's network, are a significant cost driver with hyperscalers. Cloudflare R2 not only offers lower storage costs but also eliminates egress fees altogether, which can result in substantial savings depending on a company's data usage patterns. Moreover, Cloudflare provides a generous free tier of up to 10 GB of storage per month.
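To make the trade-off concrete, the arithmetic can be sketched in a few lines of Python. The per-GB prices below are illustrative placeholders, not quoted rates from any provider; only the 10 GB free tier comes from the text above. Swap in the current numbers from each vendor's pricing page before drawing conclusions.

```python
def monthly_cost(stored_gb, egress_gb, storage_price_per_gb,
                 egress_price_per_gb, free_storage_gb=0.0):
    """Estimate one month's bill: billable storage plus data transferred out."""
    billable_gb = max(stored_gb - free_storage_gb, 0.0)
    return billable_gb * storage_price_per_gb + egress_gb * egress_price_per_gb


# ASSUMED, illustrative prices in USD per GB. Check current vendor pricing.
HYPERSCALER_STORAGE = 0.023
HYPERSCALER_EGRESS = 0.09
R2_STORAGE = 0.015
R2_EGRESS = 0.0          # no egress fees
R2_FREE_GB = 10.0        # free tier mentioned above

# Hypothetical workload: 100 GB stored, 200 GB read out per month.
hyperscaler = monthly_cost(100, 200, HYPERSCALER_STORAGE, HYPERSCALER_EGRESS)
r2 = monthly_cost(100, 200, R2_STORAGE, R2_EGRESS, R2_FREE_GB)

print(f"hyperscaler: ${hyperscaler:.2f}  R2: ${r2:.2f}")
```

Under these assumed prices, egress dominates the hyperscaler bill, which is why the savings depend so heavily on how often the data is read back out.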

Snowflake, a leader in next-gen cloud computing, often offers faster and more cost-effective solutions than its competitors, hyperscalers included. Although Snowflake is designed for high data consumption, it is frequently more affordable for small businesses than a hyperscaler environment. I will discuss this further in a future blog post.

Recently, I stumbled upon a blog by Felipe Hoffa that detailed using Cloudflare R2 as the data storage location for external tables in Snowflake (https://medium.com/snowflake/snowflake-tables-on-cloudflare-r2-b5496c4ae9ec). I had a project that required less than 5 GB of storage per month, and I wanted to avoid paying for storage if possible. Therefore, I decided to experiment with Cloudflare's free tier as my data lake. The results were nothing short of impressive.

Here's a step-by-step guide to what I did:

1. Create a free Cloudflare account

2. Enable Cloudflare R2 (a credit card is required, but it's free as long as you stay within the free tier limits)

3. Create an API token for the bucket (you will need it when creating the Snowflake stage)

4. Create a free Snowflake trial account

  • https://signup.snowflake.com

5. Because S3-compatible storage is a preview feature, open a ticket with Snowflake Support to have your endpoint enabled for use

6. Create Snowflake Table

7. Create File Format

8. Create External Stage (with API token information)

9. Stage file in Cloudflare bucket

  • I just did a drag & drop

10. Run a COPY INTO command to load the data into my RAW table
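Steps 6 through 10 can be sketched as the SQL below, generated here from Python so the placeholders are easy to swap. This is a minimal sketch under stated assumptions, not the exact statements used for this project: the table columns, object names, bucket, account ID, and keys are all hypothetical, and the `s3compat://` URL with a separate `ENDPOINT` assumes Snowflake's S3-compatible stage syntax together with R2's `<account_id>.r2.cloudflarestorage.com` endpoint.

```python
from typing import List


def build_r2_load_statements(table: str, file_format: str, stage: str,
                             bucket: str, account_id: str,
                             access_key_id: str,
                             secret_access_key: str) -> List[str]:
    """Return the SQL for steps 6, 7, 8 and 10, in execution order.

    Values are interpolated without escaping, so pass trusted input only.
    """
    # R2's S3 API endpoint is derived from the Cloudflare account ID.
    endpoint = f"{account_id}.r2.cloudflarestorage.com"
    return [
        # Step 6: target table (hypothetical two-column layout).
        f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER, payload VARCHAR)",
        # Step 7: file format for a CSV file with one header row.
        f"CREATE FILE FORMAT IF NOT EXISTS {file_format} "
        f"TYPE = CSV SKIP_HEADER = 1",
        # Step 8: external stage pointing at the R2 bucket via the API token.
        f"CREATE STAGE IF NOT EXISTS {stage} "
        f"URL = 's3compat://{bucket}/' "
        f"ENDPOINT = '{endpoint}' "
        f"CREDENTIALS = (AWS_KEY_ID = '{access_key_id}' "
        f"AWS_SECRET_KEY = '{secret_access_key}')",
        # Step 10: load the staged file (step 9, the upload, must come first).
        f"COPY INTO {table} FROM @{stage} "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}')",
    ]


# Placeholder values for illustration only.
statements = build_r2_load_statements(
    table="raw_events", file_format="csv_with_header", stage="r2_stage",
    bucket="my-bucket", account_id="abc123",
    access_key_id="<R2_ACCESS_KEY_ID>",
    secret_access_key="<R2_SECRET_ACCESS_KEY>",
)
for sql in statements:
    print(sql + ";")
```

The printed statements can be pasted into a Snowflake worksheet or run through any Snowflake client. Keeping the keys out of source control, for example by reading them from environment variables, is a sensible precaution.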

Here are my results!

These are the vendors whose S3-compatible storage currently works with Snowflake:

  • Cloudflare

  • Cloudian

  • Dell

  • Hitachi Content Platform

  • MinIO

  • NetApp (StorageGRID)

  • PureStorage

  • Scality

By leveraging Cloudflare R2 as a data lake for ingestion into Snowflake, businesses can optimize their storage costs and potentially benefit from significant cost savings.

Contact Moser Consulting for more information.

Andy Wickman

A seasoned technology leader with over 20 years of experience in the IT industry, Andy has consistently demonstrated success in various leadership roles. With a strong background in databases and a proven track record of delivering projects on time, he has a keen ability to identify and execute corporate strategic goals.

A forward-thinking innovator known for strong problem-solving skills and an unwavering work ethic, Andy effectively manages multiple complex projects for a diverse range of customers. His extensive experience in the IT domain allows him to provide valuable insights and share his technical expertise with clients and senior management alike.

Andy’s ability to "coach up" staff, coupled with his strong business acumen and technical vision, has contributed to his success in improving processes across the board. His effective people skills have also made him a sought-after leader and collaborator in the industry.

In his technical blog posts, Andy shares his wealth of knowledge and experience, providing readers with valuable insights into the rapidly evolving world of technology and database management. As a self-motivated professional who requires minimal supervision, Andy continues to pave the way for innovation and progress in the IT sector.
