Snowflake, the popular cloud data platform, is making significant strides into the world of artificial intelligence (AI) with the introduction of Snowflake Cortex. This groundbreaking framework empowers users to seamlessly integrate generative AI and large language models (LLMs) directly within their Snowflake environment. Launched in public preview on May 22, 2023, Cortex promises to revolutionize data analysis, streamline workflows, and open up exciting new possibilities for extracting value from your data. Let’s dive in and see how Cortex can transform your data operations.
1. What is Snowflake Cortex, and Why Should You Care?
Snowflake Cortex is a comprehensive framework designed to bridge the gap between your data and the vast potential of AI. It offers a suite of tools and functionalities that enable you to:
Build Generative AI Applications: Leverage LLMs directly within Snowflake to create intelligent applications for tasks like text summarization, content generation, and even chatbots.
Enhance Data Analysis: Gain deeper insights from your data using LLMs for advanced analytics, natural language processing, and pattern recognition.
Simplify Data Pipelines: Automate data preparation, cleaning, and transformation using AI-powered capabilities.
Strengthen Data Security: Benefit from Snowflake’s robust security measures while utilizing AI models.
With Cortex, you can harness the power of AI without the need for complex integrations or external tools. It’s all about making AI accessible and actionable within the familiar Snowflake ecosystem.
2. Getting Started with Snowflake Cortex — It’s Easier Than You Think!
One of Cortex’s standout features is its user-friendly approach. You can interact with LLMs using standard SQL queries, which opens up AI capabilities to a broader audience of data professionals. Here’s a simple example:
SELECT SNOWFLAKE.CORTEX.COMPLETE('snowflake-arctic', 'What are large language models?');
I believe this is Snowflake’s superpower: a developer can start asking questions about the data that already lives in their Snowflake tables.
In this example, we’re querying the “snowflake-arctic” LLM for a response to a general question about large language models. The result comes back as part of your SQL query, like any other column. The ease of getting started is a major plus! The same function can also be applied to a column of a table, as in the following query against a reviews table:
SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large',CONCAT('Critique this review in bullet points: <review>', content, '</review>')) FROM reviews LIMIT 10;
In one of our early use cases, I had the email messages sent by all of our customers. I wanted to know what business each customer was in, and I figured that the topics of their emails would be a good way to tell. After some trials, I found that the subject lines were the best data for this purpose: they are much shorter than the email bodies, so I can concatenate all of a customer’s subject lines into one long text blob.
SELECT SNOWFLAKE.CORTEX.COMPLETE( 'snowflake-arctic', CONCAT('Use 3 to 5 words to describe the email sender\'s business or the common topics, based on all the email subject lines below, separated by `|`: <subjects>', subjects , '</subjects> \nThe email sender\'s business or the common topics based on the email subject lines are ') ) AS BUSINESS_INTEREST FROM EMAILS;
By applying an LLM to your email subjects, you can quickly extract insights about potential leads, customer needs, or industry trends.
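The subject-line blob described above has to be assembled before it reaches the LLM. Here is a minimal sketch of that step in Python, assuming the emails are available as (customer, subject) pairs. The function names, the `max_chars` cap, and the sample rows are hypothetical and not part of Cortex.

```python
from collections import defaultdict

PROMPT_HEAD = (
    "Use 3 to 5 words to describe the email sender's business or the common "
    "topics, based on all the email subject lines below, separated by `|`: <subjects>"
)
PROMPT_TAIL = (
    "</subjects> \nThe email sender's business or the common topics "
    "based on the email subject lines are "
)

def join_subjects(rows, max_chars=4000):
    """Group (customer, subject) pairs and join each customer's subjects with '|'.

    max_chars is a hypothetical cap that keeps the blob from growing without bound.
    """
    grouped = defaultdict(list)
    for customer, subject in rows:
        subject = subject.strip()
        if subject:  # skip empty subject lines
            grouped[customer].append(subject)
    return {c: "|".join(s)[:max_chars] for c, s in grouped.items()}

def build_prompt(subjects):
    """Wrap one customer's subject blob in the prompt used in the query above."""
    return PROMPT_HEAD + subjects + PROMPT_TAIL

# Hypothetical sample data
rows = [
    ("acme", "Invoice for March"),
    ("acme", "Payment reminder"),
    ("globex", "Re: Shipping delay"),
]
blobs = join_subjects(rows)
print(build_prompt(blobs["acme"]))
```

Keeping the `<subjects>` tags around the blob makes it clear to the model where the data starts and ends, which is the same trick the SQL version uses.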
3. Error Handling with Cortex: complete() vs. try_complete()
Cortex offers two functions for interacting with LLMs:
complete(): Provides a direct response from the LLM but may throw an error if the model encounters issues.
try_complete(): Returns a response if successful or NULL if an error occurs.
The try_complete() function is ideal for scenarios where you want to gracefully handle potential errors and avoid disruptions in your data pipelines.
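The difference between the two functions is the same as the difference between letting an exception propagate and catching it. Here is a minimal Python sketch of that pattern. The `complete` function below is a local stand-in that fails on an empty prompt, not the Cortex API, and its behavior is an assumption made only for illustration.

```python
def complete(model, prompt):
    """Stand-in for an LLM call that raises on failure (not the real Cortex API)."""
    if not prompt:
        raise ValueError("empty prompt")
    return f"[{model}] response to: {prompt}"

def try_complete(model, prompt):
    """Same call, but any failure becomes None, mirroring the NULL from try_complete()."""
    try:
        return complete(model, prompt)
    except Exception:
        return None

prompts = ["What are large language models?", ""]
results = [try_complete("snowflake-arctic", p) for p in prompts]

# Rows that came back as None can be retried later instead of aborting the whole run
failed = [p for p, r in zip(prompts, results) if r is None]
print(results)
print(f"{len(failed)} prompt(s) need another attempt")
```

The trade-off is that a NULL result hides the reason for the failure, so it helps to keep track of which rows came back empty.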
4. Handling Failures and Batch Processing
In real-world scenarios, individual LLM calls can fail, and a single failure should not abort the whole job. To keep processing robust, wrap your Cortex calls in a loop, process your data in batches, and log any batch that fails. The stored procedure below works through the EMAILS table 20 rows at a time.
CREATE OR REPLACE PROCEDURE run_sql_with_retries(iterations INT, starting INT)
RETURNS VARCHAR NOT NULL
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'run_queries'
AS $$
def run_queries(session, iterations, starting):
    error_log = []  # Collects one message per failed batch
    for i in range(1, iterations + 1):
        # Each batch covers 20 rows; 'starting' skips batches handled by earlier calls
        offset = 20 * (i + starting)
        # NOTE: 'id' is an assumed unique key on EMAILS; replace it with your own key column.
        # Single quotes inside the prompt are doubled ('') so they survive Python and SQL quoting.
        query = f"""
            UPDATE EMAILS
            SET business = SNOWFLAKE.CORTEX.TRY_COMPLETE(
                'snowflake-arctic',
                CONCAT('Use 3 to 5 words to describe the email sender''s business or the common topics, based on all the email subject lines below, separated by `|`: <subjects>',
                    subjects, '</subjects> \\nThe email sender''s business or the common topics based on the email subject lines are ')
            )
            FROM (SELECT id FROM EMAILS ORDER BY id LIMIT 20 OFFSET {offset}) AS batch
            WHERE EMAILS.id = batch.id"""
        try:
            session.sql(query).collect()  # Run the update for this batch
            print(f"Iteration {i} successful")  # Optional logging
        except Exception as e:
            # Record which batch failed and why
            error_log.append(f"Iteration {i} (offset {offset}) failed: {e}")
    if error_log:
        # Persist the collected error messages for later inspection
        error_df = session.create_dataframe(error_log, schema=["error_message"])
        error_df.write.mode("append").save_as_table("your_error_log_table")
        return "Some iterations failed. Check your_error_log_table for details."
    else:
        return "All iterations completed successfully."
$$;
-- executing the procedure
call run_sql_with_retries(30,31);
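The procedure above records failed batches but does not retry them. If you want actual retries, one common pattern is to re-run a failed batch a few times with a growing delay. The following is a minimal sketch of that idea in plain Python. The helper names, the default of three attempts, and the simulated flaky task are all assumptions for illustration, and nothing here calls Snowflake.

```python
import time

def batch_offsets(total_rows, batch_size=20):
    """Return the starting offset of each batch needed to cover total_rows."""
    return list(range(0, total_rows, batch_size))

def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))

# Hypothetical flaky task that fails twice before succeeding
calls = {"n": 0}

def flaky_batch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(batch_offsets(50))
# The sleep function is swapped out here so the demo does not actually wait
print(run_with_retries(flaky_batch, sleep=lambda seconds: None))
```

In a stored procedure, `task` would be a small function that runs the UPDATE for one offset, and only batches that still fail after the last attempt would go to the error log.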
5. Pricing and the Free Arctic Model
Like most LLM providers today, Snowflake prices Cortex based on the compute resources used and the number of tokens the LLM processes. At the time of writing, the highest-priced model is Reka-core, while the lowest price is $0 per 1M tokens.
To encourage experimentation and adoption, Snowflake is offering the “snowflake-arctic” model for free until July 1, 2024. Make sure you subscribe to my articles so that you don’t come across posts like this one only after the free period is over.
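Because pricing scales with token volume, a quick back-of-the-envelope estimate is useful before running a large batch. The sketch below shows the arithmetic in Python. The workload size and the non-zero price are hypothetical placeholders, not Snowflake’s actual rates, and the result is in whatever unit the price list uses.

```python
def estimate_cost(num_tokens, price_per_million):
    """Cost of processing num_tokens at a given price per 1M tokens."""
    if num_tokens < 0 or price_per_million < 0:
        raise ValueError("inputs must be non-negative")
    return num_tokens / 1_000_000 * price_per_million

# Hypothetical workload: 10,000 prompts of about 500 tokens each
total_tokens = 10_000 * 500

free_cost = estimate_cost(total_tokens, 0.0)  # a model priced at 0 per 1M tokens
paid_cost = estimate_cost(total_tokens, 2.0)  # hypothetical price of 2.0 per 1M tokens
print(free_cost, paid_cost)
```

Plug in the current rates from Snowflake’s published price list for the model you plan to use.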
Conclusion
Snowflake Cortex is a game-changer, bringing the power of AI directly into your data warehouse. Its ease of use, versatility, and integration with Snowflake’s security features make it an attractive option for businesses looking to unlock new insights and streamline data operations. By following the examples and best practices outlined in this article, you can start leveraging Cortex to gain a competitive edge in your industry.