Snowflake: Simplifying Machine Learning

Introduction: Snowflake's Strategy

Snowflake's mission revolves around making data accessible, usable, and valuable to everyone. One of the pillars that has made Snowflake an industry leader is its commitment to being easy to use and turnkey. Snowflake's CEO, Frank Slootman, put it this way on the Q2 '23 earnings call:

Where you see huge differences is in the total cost of ownership, and that is not the cost of compute and storage. In other words, what is the cost to run that technology? This is where Snowflake has a huge advantage, and our customers know that. It’s just reduced skills sets, far fewer people, and not having to touch the complexity of the underlying platforms. We’re more descendants of the Apple and Tesla than being the descendants of Hadoop, like some people are in the marketplace, right? So we have really abstracted the complexity. And that’s what generates these TCO advantages. But the raw cost of computing and storage, there’s not that much opportunity to be had there.

Building on this strategy, Snowflake continues to innovate and streamline even the most complex of tasks. Looking ahead, we can expect the following items to be shaped by Snowflake's signature user-friendly approach:

Containers: Offering isolated environments for apps.

  • Low Management: Reducing operational overheads.

Unstructured Data: Simplifying the management of diverse data.

  • DocumentAI: Advanced document processing and insights.

Machine Learning: User-friendly ML tools and functionalities.

  • ML SQL Functions: Embedding ML capabilities in SQL.

AI

  • AI with NVDA: Collaboration for cutting-edge AI tools.

  • Microsoft: Partnership to bring MicrosoftAI directly to the Data Cloud.

  • LLMs over Company Data: Expanding data reach and utility.

Data Applications: Making data-centric apps more accessible.

  • Streamlit

  • Native Application Framework: Seamless integration for app development.

Machine Learning SQL Functions in Open Preview

Snowflake's ML SQL functions, currently in open preview, bring machine learning directly into SQL. The first three are:

  1. Forecasting: Predict future values based on past data. Ideal for sales predictions, stock trends, and more.

  2. Anomaly Detection: Identify unusual patterns in data that don't conform to expected behaviors. Useful in fraud detection, system health monitoring, etc.

  3. Contribution Explorer: Understand contributing factors to a particular outcome. It’s like getting a 'why' to every 'what'.

Requirements & Limitations

As with any tool in development, there are requirements and limitations. Here are the current constraints for these functions:

  • Maximum of 500,000 rows for model training.

  • Minimum of 12 rows for model training.

  • 1-second minimum granularity.

  • Seasonal components have a 1-minute minimum granularity.

  • Timestamps must have FIXED intervals.

  • The season length of autoregressive features is tied to the input frequency.

  • Existing models cannot be updated; a new one must be trained.

  • Outliers can influence the algorithm. Users may need to remove them before training if that influence is undesired.

  • Model cloning across accounts is not possible.
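Several of these constraints can be checked before any data reaches Snowflake. The following is a minimal sketch in Python, not part of Snowflake itself: the function name `check_series` is hypothetical, and applying the row limits to a single time series is my assumption about how the limits are counted.

```python
from datetime import datetime, timedelta

MIN_ROWS = 12                    # minimum rows for model training (from the list above)
MAX_ROWS = 500_000               # maximum rows for model training (from the list above)
MIN_STEP = timedelta(seconds=1)  # 1-second minimum granularity


def check_series(timestamps):
    """Return a list of problems found; an empty list means the series passes."""
    problems = []
    n = len(timestamps)
    if n < MIN_ROWS:
        problems.append(f"only {n} rows, need at least {MIN_ROWS}")
    if n > MAX_ROWS:
        problems.append(f"{n} rows exceeds the {MAX_ROWS} row limit")

    # Collect the distinct gaps between consecutive timestamps.
    ordered = sorted(timestamps)
    steps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    if len(steps) > 1:
        problems.append("timestamps are not at a fixed interval")
    if steps and min(steps) < MIN_STEP:
        problems.append("interval is finer than 1 second")
    return problems


start = datetime(2019, 1, 1)
daily = [start + timedelta(days=i) for i in range(30)]
weekdays_only = [d for d in daily if d.weekday() < 5]

print(check_series(daily))          # every calendar day present
print(check_series(weekdays_only))  # weekend gaps break the fixed interval
```

The weekday-only series is the interesting case: market data naturally skips weekends, which is exactly what trips the fixed-interval requirement.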

Getting Started with ML SQL Functions

Diving into these functions involves a systematic process:

  1. Prepare Data: Organize and clean your data to ensure its readiness. This is the most important step.

  2. Create Model: Set up the foundation for your machine learning model.

  3. Train Model: Use your data to train and refine the model.

  4. Harvest Data: Extract insights and results.

Example:

I have a dataset of closing prices for every stock in the Nasdaq and the Dow. I want to forecast the next 2 months, training the model on data beginning on 1/1/2019.

PREPARE DATA

Views are your friend in this step. This is where the data is shaped further to meet the ML requirements. For this stock dataset, there are a few things to handle:

  1. Some tickers have fewer than 12 rows (a new IPO, or a stock that came off the market within 12 days of the start date).

    • Exclude these records through a view.

  2. There is a date column, but it needs to be a timestamp data type.

    • Cast the column to a timestamp in a view.

  3. Weekend and holiday data does not exist. To meet the FIXED interval requirement, mock up rows for those dates.

    • Have missing dates show the previous close price through a view.

  4. When training on larger sets, it's important that the final view be ordered by the TIMESTAMP column.
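In Snowflake these steps live in views, but the logic is easier to see on a small in-memory example. Below is a minimal Python sketch of the four preparation steps. The `prepare` function, the `(ticker, date, close)` row shape, and the sample tickers are all hypothetical and only stand in for the real dataset.

```python
from datetime import date, datetime, timedelta

MIN_ROWS = 12  # minimum training rows per ticker, from the requirements


def prepare(rows, min_rows=MIN_ROWS):
    """rows: iterable of (ticker, date, close). Returns (ticker, datetime, close) tuples."""
    by_ticker = {}
    for ticker, day, close in rows:
        by_ticker.setdefault(ticker, {})[day] = close

    prepared = []
    for ticker, closes in by_ticker.items():
        # Step 1: exclude tickers with fewer than the minimum number of rows.
        if len(closes) < min_rows:
            continue
        day, last_day = min(closes), max(closes)
        previous_close = closes[day]
        while day <= last_day:
            # Step 3: a missing day (weekend, holiday) reuses the previous close.
            previous_close = closes.get(day, previous_close)
            # Step 2: promote the date to a timestamp at midnight.
            ts = datetime.combine(day, datetime.min.time())
            prepared.append((ticker, ts, previous_close))
            day += timedelta(days=1)
    # Step 4: order the final result by the timestamp column.
    prepared.sort(key=lambda row: (row[1], row[0]))
    return prepared


def weekdays(start, count):
    """Return the first `count` weekdays on or after `start`."""
    day, out = start, []
    while len(out) < count:
        if day.weekday() < 5:
            out.append(day)
        day += timedelta(days=1)
    return out


sample = [("AAA", d, 100.0 + i) for i, d in enumerate(weekdays(date(2019, 1, 1), 14))]
sample += [("NEW", d, 5.0) for d in weekdays(date(2019, 1, 1), 3)]
result = prepare(sample)
for row in result[:7]:
    print(row)
```

Note that the 12-row filter runs on the original rows, before any weekend filling, so a ticker cannot pass the minimum only because of mocked-up dates.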

CREATE MODEL

Now the hard work is done. We create the model.
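As a rough illustration, here is a small Python helper that builds the CREATE statement for a forecast model over the prepared view. The statement text reflects my understanding of the preview syntax (`SNOWFLAKE.ML.FORECAST` with `INPUT_DATA`, `SERIES_COLNAME`, `TIMESTAMP_COLNAME`, and `TARGET_COLNAME`), so check it against the current Snowflake documentation. The model, view, and column names are hypothetical.

```python
import re

# Conservative pattern for unquoted identifiers, optionally qualified (db.schema.name).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}")


def _checked(name):
    """Reject anything that is not a plain identifier before it reaches SQL text."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return name


def create_forecast_sql(model, view, series_col, ts_col, target_col):
    """Return the statement that creates a forecast model over a prepared view."""
    return (
        f"CREATE SNOWFLAKE.ML.FORECAST {_checked(model)}(\n"
        f"  INPUT_DATA => SYSTEM$REFERENCE('VIEW', '{_checked(view)}'),\n"
        f"  SERIES_COLNAME => '{_checked(series_col)}',\n"
        f"  TIMESTAMP_COLNAME => '{_checked(ts_col)}',\n"
        f"  TARGET_COLNAME => '{_checked(target_col)}'\n"
        ");"
    )


# Hypothetical names for the stock example.
sql = create_forecast_sql("stock_forecast", "stock_prices_prepared", "TICKER", "TS", "CLOSE")
print(sql)
```

The series column is what lets one model cover every ticker in the view. The resulting string can be pasted into a worksheet or passed to whichever client you use to run SQL.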

TRAIN MODEL

Train the model for 60 forecasting periods, which covers roughly two months of daily data. This step can take a long time, but scaling up the warehouse can reduce that time.
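A sketch of what this step might look like, again as Python helpers that build the SQL text. As far as I understand the preview syntax, the forecast horizon is passed in a CALL to the model's FORECAST method, and the warehouse can be resized with ALTER WAREHOUSE around whichever statement is slow. The warehouse name `ml_wh`, the model name, and the sizes are hypothetical, and the names are interpolated as-is, so they are assumed to be trusted.

```python
def forecast_call_sql(model, periods=60):
    """Return the CALL that asks the model for `periods` future steps."""
    if not isinstance(periods, int) or periods < 1:
        raise ValueError("periods must be a positive integer")
    return f"CALL {model}!FORECAST(FORECASTING_PERIODS => {periods});"


def resize_warehouse_sql(warehouse, size):
    """Return the ALTER WAREHOUSE statement for a size change."""
    return f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}';"


# Scale up, run the long step, then scale back down to limit cost.
statements = [
    resize_warehouse_sql("ml_wh", "LARGE"),
    forecast_call_sql("stock_forecast", 60),
    resize_warehouse_sql("ml_wh", "XSMALL"),
]
for statement in statements:
    print(statement)
```

Scaling back down afterwards is a design choice to keep a larger warehouse from billing longer than the slow step needs.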

HARVEST DATA

If using straight SQL, use the RESULT_SCAN function to put the results from the previous step into a table for further analysis.
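One way to do this is to combine RESULT_SCAN with LAST_QUERY_ID in a CREATE TABLE AS statement. The Python helper below only builds that statement; the table name is hypothetical, and the sketch assumes the statement runs in the same session immediately after the forecast step, since LAST_QUERY_ID refers to the most recent query.

```python
def save_last_result_sql(table):
    """Return a CTAS statement that stores the previous query's result set."""
    return (
        f"CREATE TABLE {table} AS "
        "SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));"
    )


# Hypothetical table name for the stock example.
save_sql = save_last_result_sql("stock_forecast_results")
print(save_sql)
```

If any other query runs in between, LAST_QUERY_ID will point at that query instead, so it is safer to capture the query ID of the forecast step explicitly when working in a busy session.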

Conclusion

Snowflake continues to shape the future of data analysis and machine learning by introducing powerful yet user-friendly tools. As we anticipate further innovations and improvements, it's clear that with Snowflake, machine learning truly is for everyone.

Dive in, explore, and harness the power of data like never before!

Andy Wickman

A seasoned technology leader with over 20 years of experience in the IT industry, Andy has consistently demonstrated success in various leadership roles. With a strong background in databases and a proven track record of delivering projects on time, Andy has a keen ability to identify and execute corporate strategic goals.

A forward-thinking innovator known for strong problem-solving skills and an unwavering work ethic, he effectively manages multiple complex projects for a diverse range of customers. His extensive experience in the IT domain allows him to provide valuable insights and share his technical expertise with clients and senior management alike.

Andy’s ability to "coach up" staff, coupled with his strong business acumen and technical vision, has contributed to his success in improving processes across the board. His effective people skills have also made him a sought-after leader and collaborator in the industry.

In his technical blog posts, Andy shares his wealth of knowledge and experience, providing readers with valuable insights into the rapidly evolving world of technology and database management. As a self-motivated professional who requires minimal supervision, Andy continues to pave the way for innovation and progress in the IT sector.
