Snowflake's Baris Gultekin on Unlocking the Value of Data With Large Language Models - Ep. 231

Summary notes created by Deciphr AI

https://podcasts.apple.com/us/podcast/snowflakes-baris-gultekin-on-unlocking-the-value-of/id1186480811?i=1000666122486

Abstract

In this episode of the NVIDIA AI Podcast, host Noah Kravitz talks with Baris Gultekin, head of AI at Snowflake, about the company's role in AI and data management. Gultekin outlines Snowflake's evolution from a data warehouse provider to an AI data cloud and describes its AI products, including Snowflake Cortex and Arctic. He explains why an AI strategy depends on a data strategy and cites customer use cases such as Bayer and Siemens, which use AI for data analysis and research. Gultekin also reflects on his journey from Google to Snowflake and on the potential of AI in enterprise applications.

Summary Notes

Introduction to Snowflake

  • Snowflake is an AI data cloud company focused on making data easily accessible and processable for companies.
  • The company has evolved from providing a data warehouse to a data cloud, and now focuses on unlocking more value from data using AI.

"Snowflake is the AI data cloud. We started the journey over a decade ago, focusing on the how do we make data a lot more easily accessible?"

  • Snowflake's journey began with separating data storage from compute, which was a significant innovation in the data space.

Key Terms in Data Management

  • Data Warehouse: Allows efficient large-scale analysis of structured data.
  • Data Lake: Expands the data warehouse concept to include unstructured data.
  • AI Data Cloud: Integrates AI to unlock more value from data.

"Data warehouse allows our customers to very efficiently run massive scale analysis across large volumes of data."

  • Data warehouses are essential for managing and analyzing large volumes of structured data.

Snowflake's AI Strategy

  • AI strategy is deeply intertwined with data strategy.
  • Snowflake provides an AI platform that brings AI compute close to the data, enhancing efficiency and governance.

"There is no AI strategy without a data strategy, is it? AI is fueled with data."

  • AI requires robust data management strategies to be effective.

"We built an AI platform. And with this AI platform, our customers can run natural language analysis, can build chatbots, can talk to their data in natural language."

  • Snowflake's AI platform supports various AI applications, including natural language processing and chatbots.

Product Offerings: Snowflake Cortex

  • Snowflake Cortex: A managed service running large language models within Snowflake, providing easy access to AI for customers.
  • Cortex aims to make AI easy, efficient, and trusted by running it close to the data and providing secure, efficient models.

"Snowflake Cortex is our managed service. It is our offering where we're running a series of large language models inside Snowflake."

  • Cortex simplifies AI deployment by integrating it directly with data storage and governance.

"With Cortex, it is incredibly easy, because AI is running right next to where the data is."

  • Running AI close to the data eliminates the need for complex data pipelines and enhances security.

Use Cases and Industry Applications

  • 2023 was marked by proofs of concept and demos, while 2024 is seeing real production use cases.
  • Example: Bayer uses Snowflake to enable internal teams to query structured data, moving beyond traditional dashboards.

"We're working with global pharmaceutical company Bayer in building an experience where Bayer's internal teams and sales organizations, marketing organizations, can ask questions of their structured data."

  • Bayer's use case highlights the shift from dashboards to more interactive, AI-driven data querying.

Customer Concerns and Challenges

  • Customers are interested in leveraging AI to improve processes and explore new ways of working.
  • Concerns include understanding AI capabilities and integrating AI with existing data strategies.

"What our customers are interested in is they are trusting Snowflake with their data. How can they now get the most out of this data?"

  • Trust and effective utilization of data are key concerns for customers.

"In 2023 was, I'd say, the year of proof of concepts, got their hands on AI and then started building demos. And this year, we're starting to see these turn into real production use cases."

  • The transition from proof of concepts to production use cases is a significant trend in AI adoption.

Democratizing Data Access

  • Traditional dashboards are rigid and often lead to more questions.
  • Empowering business users to ask questions of their data using natural language.
  • Companies like Bayer find it valuable to democratize access to data.

"So now we give that power to not only the analysts, but to business users. So business users ask questions of their data drill in natural language, and that's super powerful."

  • This quote highlights the shift from data being accessible only to analysts to being accessible to all business users via natural language queries.

Research Chatbot for Siemens

  • Siemens has developed a research chatbot.
  • The chatbot has access to 700,000 pages of research.
  • Enhances productivity by making data easily accessible.

"They have a large research organization. They've just recently built a research chatbot that has 700,000 pages of research that's now unlocked and available for this research organization."

  • The quote underscores the scale of data made accessible and the productivity gains for the research team.

Technical Process for Data Utilization

  • Different processes are used depending on the customer.
  • The main concerns are quality (hallucinations), data security and governance, and cost.
  • Cortex Search is a new product offering designed to address these concerns.
  • Hybrid search combines vector search with traditional keyword-based text search.

"Usually three big things emerge. One is they are concerned about quality hallucinations. The second one is they're concerned about security of their data, governance of that system, and finally, the cost."

  • This quote outlines the primary concerns customers have when moving from proof of concept to production.

"We've tuned Cortex Search to be the highest quality in terms of a RAG solution. We implemented a custom RAG solution, we have our own embedding model, and we've built a hybrid search engine that can provide high quality."

  • This quote explains the technical measures taken to ensure high-quality search results and reduce hallucinations.
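
As a rough illustration of what "hybrid search" means, the sketch below ranks documents once by a vector similarity and once by exact keyword overlap, then merges the two rankings. Everything in it is an assumption made for illustration: the character-trigram "embedding" is a toy stand-in for a learned embedding model, and reciprocal rank fusion is one common way to merge rankings, not something the episode attributes to Cortex Search.

```python
import math
from collections import Counter

DOCS = [
    "gas turbine blade cooling study",
    "turbines maintenance schedule",
    "quarterly revenue report",
]

def trigram_vector(text):
    # Toy stand-in for an embedding model: counts of character trigrams.
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, doc):
    # Traditional keyword signal: number of exact token matches.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def rank(scores):
    # Document indices ordered from best to worst score.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def reciprocal_rank_fusion(rankings, k=60):
    # Each ranking contributes 1 / (k + rank) to a document's fused score.
    fused = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused, key=fused.get, reverse=True)

def hybrid_search(query, docs):
    qv = trigram_vector(query)
    vector_rank = rank([cosine(qv, trigram_vector(d)) for d in docs])
    keyword_rank = rank([keyword_score(query, d) for d in docs])
    return [docs[i] for i in reciprocal_rank_fusion([vector_rank, keyword_rank])]

ranked = hybrid_search("turbine cooling", DOCS)
print(ranked[0])
```

The keyword signal misses "turbines" because it is not an exact token match, while the vector signal still gives it partial credit; fusing both rankings lets each cover for the other's blind spots.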

Reducing Hallucinations in Language Models

  • Model tuning and hybrid search are crucial.
  • Hybrid search helps determine the relevance of documents to questions.
  • LLMs tend to hallucinate when not grounded on data.

"Usually LLMs tend to hallucinate when they are not grounded on data. So the system can know that the match to that question is low. And rather than trying to answer the question, it should just reject it."

  • The quote explains how grounding the model in data helps reduce hallucinations.
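
The "reject rather than answer" behavior described in the quote can be sketched as a threshold on retrieval relevance. This is a minimal sketch under stated assumptions: the function name, the 0.5 threshold, and the score scale are hypothetical, and the model call itself is left out so the example runs offline.

```python
def answer_or_reject(question, scored_passages, min_score=0.5):
    """scored_passages: (passage, relevance) pairs from a retriever, relevance in [0, 1]."""
    # Keep only passages that match the question well enough to ground an answer.
    grounded = [passage for passage, score in scored_passages if score >= min_score]
    if not grounded:
        # Low match: decline instead of letting the model guess.
        return {"answered": False,
                "text": "I could not find this in the provided documents."}
    context = "\n".join(grounded)
    # A real system would send this prompt to a language model (not shown here).
    prompt = ("Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return {"answered": True, "prompt": prompt}

low = answer_or_reject("Who won the 1998 World Cup?",
                       [("Turbine blade cooling notes", 0.08)])
high = answer_or_reject("How are turbine blades cooled?",
                        [("Turbine blade cooling notes", 0.91)])
print(low["answered"], high["answered"])
```

The threshold is a tuning knob: set too low, weakly related passages slip through; set too high, answerable questions get rejected.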

Arctic Language Model

  • Arctic is a family of language models developed by Snowflake.
  • Combines a mixture-of-experts model with a dense architecture.
  • Focuses on enterprise intelligence, coding, and SQL.
  • Achieves high benchmarks while being cost-effective.

"Arctic is our own language model. It's actually a family of language models. We have the Arctic LLM as well as an embedding model and a document model."

  • This quote introduces the Arctic language model and its components.

"By combining both a mixture of experts model with a dense architecture, we were able to have a very efficient and high quality model."

  • The quote explains the unique architecture of the Arctic model that makes it efficient and high-quality.

Mixture of Experts Model

  • Dense models use all of their parameters during both inference and training.
  • A mixture-of-experts model has a larger set of parameters but activates only a subset of them.
  • Activating only a subset lets the model reach the accuracy needed at lower cost.

"In the dense model, all of the parameters are active and they're being used when you're doing inference. Also during training, all of these parameters are active. Mixture of experts model has larger set of parameters, but only a subset of them gets used."

  • This quote explains the difference between dense models and mixture of experts models.

"So you can hone in on the accuracy that you're looking for, but then also it's more efficient because you're turning things on and off as you need them instead of just leaving all the lights on."

  • The quote highlights the efficiency and accuracy benefits of the mixture of experts approach.
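
A toy version of the routing idea may make the "lights on and off" analogy concrete. The sketch below is a generic top-k mixture-of-experts layer in plain Python, not Arctic's actual architecture; the sizes, the random weights, and the choice of top-2 routing are all assumptions for illustration.

```python
import math
import random

random.seed(0)
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

experts = [rand_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]  # one weight matrix per expert
gate = rand_matrix(NUM_EXPERTS, DIM)                           # router: one score per expert

def moe_forward(x):
    # The router scores every expert, but only the TOP_K best are evaluated.
    logits = matvec(gate, x)
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    weights = softmax([logits[i] for i in chosen])
    out = [0.0] * DIM
    for weight, idx in zip(weights, chosen):
        y = matvec(experts[idx], x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen

total_expert_params = NUM_EXPERTS * DIM * DIM   # parameters that exist
active_expert_params = TOP_K * DIM * DIM        # parameters used for one input

output, chosen = moe_forward([1.0, 0.5, -0.5, 2.0])
print(f"experts used: {chosen}, active {active_expert_params} of {total_expert_params} expert parameters")
```

A dense layer of the same total size would multiply through all 128 expert parameters for every input; here only 32 are touched per input, which is where the efficiency comes from.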

Cortex Analyst Product

  • Recently announced Cortex Analyst product.
  • Aims to unlock large amounts of data and make it easily accessible.
  • Converts natural language into SQL, which is a difficult task.

"Allowing more people to have easy access to that data is really important. So far, data teams have to run SQL analysis to get insights from these datasets."

  • This quote emphasizes the importance of making data more accessible and the challenges involved in converting natural language into SQL.

Snowflake's Text to SQL Experience

  • Snowflake has developed a state-of-the-art text to SQL experience.
  • This technology allows business users to ask natural language questions, which are then converted into SQL to generate answers.
  • The complexity of data, including tens of thousands of tables and hundreds of thousands of columns, is managed efficiently.

"We work really hard to have the world's best text to SQL experience, and then we've achieved it."

  • Snowflake's text to SQL system simplifies querying for business users by converting natural language questions into SQL.

"Something like, how is my revenue growing in this region for this product now becomes an easy question to ask for a business user."

  • The technology enables straightforward querying about business metrics, improving accessibility for non-technical users.
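
The overall shape of a text-to-SQL flow can be sketched as: describe the schema, ask a model for SQL, check the SQL, then run it. The sketch below uses SQLite from the Python standard library so it runs anywhere; the table, the data, and `fake_model` (a canned stand-in for a language model call) are hypothetical and say nothing about how Snowflake's own system is built.

```python
import sqlite3

SCHEMA = "sales(year INTEGER, region TEXT, product TEXT, amount REAL)"

def build_prompt(question, schema):
    # The model needs the schema to know which tables and columns exist.
    return (f"Schema:\n{schema}\n\n"
            f"Write one SQLite SELECT statement that answers:\n{question}\nSQL:")

def fake_model(prompt):
    # Stand-in for a language model call; returns canned SQL so the sketch runs offline.
    return ("SELECT year, SUM(amount) AS revenue FROM sales "
            "WHERE region = 'EMEA' AND product = 'Widget' "
            "GROUP BY year ORDER BY year")

def is_read_only(sql):
    # Crude guard: accept a single statement that starts with SELECT.
    statement = sql.strip().rstrip(";")
    return statement.lower().startswith("select") and ";" not in statement

def ask(conn, question, schema, model=fake_model):
    sql = model(build_prompt(question, schema))
    if not is_read_only(sql):
        raise ValueError("refusing to run non-SELECT SQL")
    return sql, conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (2022, "EMEA", "Widget", 100.0),
    (2023, "EMEA", "Widget", 150.0),
    (2023, "APAC", "Widget", 80.0),
])

sql, rows = ask(conn, "How is my revenue growing in EMEA for Widget?", SCHEMA)
print(rows)  # [(2022, 100.0), (2023, 150.0)]
```

With tens of thousands of tables, the hard parts are the ones this sketch skips: picking the relevant slice of the schema for the prompt and verifying that the generated SQL means what the user asked.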

Baris Gultekin's Journey in AI

  • Baris Gultekin, Head of AI at Snowflake, has a long history in AI, starting at Google.
  • He contributed to the development of Google Now and Google Assistant.
  • His work focused on making technology provide helpful, context-aware information.

"I started this journey at Google a long time ago, and at some point around 2010, 2011, we started building Google Now."

  • Gultekin's early work involved creating a product that could provide timely, context-aware information.

"We were able to give helpful information like, hey, there's traffic on your commute and you should just take this alternate route or your flight is delayed."

  • Google Now aimed to deliver practical, real-time information, enhancing user convenience.

"After that, I worked on Google Assistant. And Google Assistant is again, exciting because it understands language. It can respond in natural language."

  • The evolution from Google Now to Google Assistant represented a significant advancement in natural language understanding.

Trust and Verification in Generative AI

  • Generative AI models have creative capabilities but require careful use for factual information.
  • Users need to differentiate between creative and factual queries.
  • Ensuring grounding in data is crucial to prevent hallucinations.

"The types of use cases that are great are when you ask the language models to generate something, to generate content."

  • Generative models excel in creative tasks, such as content generation.

"If my question is a factual question, then I know to be careful."

  • Users should exercise caution when using generative models for factual information.

"If an LLM is provided with grounding, if an LLM is provided with the data, it does not hallucinate."

  • Proper grounding in data can prevent hallucinations in large language models (LLMs).

Snowflake's Approach to Grounding and Evaluation

  • Snowflake emphasizes building systems with proper grounding to ensure accuracy.
  • The company acquired Truera to enhance ML and LLM observability and evaluation.
  • Snowflake provides tools to help customers evaluate model quality and grounding.

"We work hard to provide technology to help our customers, to make their systems, their products, their chatbots, a lot more grounded with the data that they provide."

  • Snowflake aims to ensure that customer systems are grounded and accurate.

"We've acquired a company called Truera just recently. Truera is a company that focuses on ML LLM observability."

  • The acquisition of Truera enhances Snowflake's capabilities in evaluating and ensuring model quality.

Partnerships and Collaborations

  • Snowflake collaborates with major tech companies like Nvidia, Meta, Mistral, and Reka.
  • These partnerships help Snowflake build specific solutions and maintain openness.
  • Snowflake provides high-quality proprietary models and ensures transparency in model training data.

"We have very close partnerships with Nvidia, with Meta, as well as Mistral and Reka, the large language model providers."

  • Snowflake's partnerships with leading tech companies are crucial for developing advanced AI solutions.

"We work very closely with our partners in helping us build specific solutions."

  • Collaboration with partners helps Snowflake create tailored, high-quality solutions.

Snowflake's Global Presence and Infrastructure

  • Snowflake operates globally with around 40 offices worldwide.
  • The company runs its services on major cloud platforms like AWS, Azure, and Google Cloud.

"Snowflake, as I understand it, is a global company, has more than around 40 offices worldwide."

  • Snowflake's extensive global presence supports its wide-reaching operations.

"We're running on the three clouds: AWS, Azure, and Google Cloud."

  • Utilizing major cloud platforms ensures robust and scalable infrastructure for Snowflake's services.

Current Use Cases of AI in Snowflake

  • Production Use Cases: Customers are building practical, production-level use cases with AI integrated into Snowflake.
    • Running large-scale data analysis.
    • Using English language for data categorization and information extraction.
    • Example: Sigma, a BI provider, runs analysis on sales logs to understand win/loss reasons in sales calls.

"We're seeing a lot of super simple, just using English language to be able to create categorization, extract information, and make sense of a lot of data."

  • High-Quality Chatbots and BI Use Cases: Use of AI to enhance chatbots and business intelligence (BI) applications.
    • Chatbots interacting with structured data for BI purposes.

"The bread and butter high-quality chatbots, as well as being able to talk to your structured data for BI type use cases."

  • Rapid Evolution: The AI industry is evolving at an incredibly fast pace, with new advancements happening weekly.
    • The next significant phase involves agentic systems.

"Week over week, we get a new announcement, something new, exciting."

  • Agentic Systems: The future lies in agentic systems capable of reasoning, self-healing, and taking actions.
    • These systems can collaborate and communicate with each other.

"The next big phase that is coming that's already kind of getting traction is the world of agents... this ability to reason, this ability to self-heal, the ability to take action, for agents to talk to each other, collaborate."

Implementation of Agentic Systems

  • Current Status: Agentic systems are currently behind the scenes in Snowflake.
    • Text-to-SQL BI experiences are part of these systems, using a series of tools to deliver products.

"Right now, the agentic systems that we've built are kind of behind the scenes. The text to SQL BI experience that uses a series of tools to deliver the product."
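
One way to picture an agent that "uses a series of tools" and can self-heal is a loop that drafts SQL, runs it, and feeds any error back into the next draft. The sketch below is a hypothetical illustration using SQLite; `draft_sql` is a scripted stand-in for a model that corrects itself after seeing an error, and none of the names come from Snowflake's products.

```python
import sqlite3

def draft_sql(question, last_error=None):
    # Scripted stand-in for a model: the first draft guesses a wrong column name,
    # the second draft "reads" the error message and fixes it.
    if last_error is None:
        return "SELECT SUM(revenue) FROM sales"
    return "SELECT SUM(amount) FROM sales"

def run_sql(conn, sql):
    # Tool: execute the query and return all rows.
    return conn.execute(sql).fetchall()

def agent(conn, question, max_attempts=3):
    error = None
    trace = []  # (sql, outcome) for every attempt, useful for auditing
    for _ in range(max_attempts):
        sql = draft_sql(question, error)
        try:
            rows = run_sql(conn, sql)
        except sqlite3.Error as exc:
            error = str(exc)  # passed to the next draft so it can repair itself
            trace.append((sql, error))
            continue
        trace.append((sql, "ok"))
        return rows, trace
    raise RuntimeError(f"gave up after {max_attempts} attempts: {error}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(2022, 100.0), (2023, 150.0)])

rows, trace = agent(conn, "What is total revenue?")
for sql, outcome in trace:
    print(outcome, "<-", sql)
```

Capping the number of attempts and keeping a trace are the two design choices that matter here: the cap bounds cost, and the trace shows what the agent tried when an answer looks wrong.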

Key Learnings and Experiences

  • Simplifying Complex Processes: Early prototypes showed that integrating compute and AI directly with data simplifies processes significantly.
    • Example: Replacing a complex two-month pipeline with a single line of code.

"Our early prototype was able to replace the full thing with literally a single line of code faster."

  • Challenges in Production Systems: Building production systems, especially with structured data, is challenging.
    • Generating SQL and ensuring high-quality responses are particularly difficult.

"Demos are easy to build, but production systems are hard, especially when it comes to working with structured data. Generating SQL is difficult."

Advice for Aspiring AI Enthusiasts

  • Follow Your Passion: Everyone has a unique path, and it's important to connect with what you're drawn to.
    • AI can seem complex, but diving in and experimenting is the best approach.

"Everyone has their own unique path, and everyone is drawn to something, and it's important to be able to connect to what you're drawn to."

  • Ease of Use: Modern AI systems are user-friendly and powerful, often just an API call away.
    • Creativity will drive the development of new technologies.

"Even though AI sounds intimidating... however, the use of the AI is going to unlock. It's incredibly easy. All of these systems are now an API away and they're incredibly powerful."

Resources and Further Learning

  • Snowflake's Offerings: For those interested in Snowflake's AI capabilities and solutions, the primary resource is their website.
    • Snowflake provides solutions for data analysis and AI integration.

"Our website, snowflake.com, if you are trying to figure out how do I use AI just in seconds and bring my data, analyze my data. We have a solution for you."

Conclusion

  • Ongoing Development: Snowflake's AI journey is just beginning, with many exciting developments to come.
    • Future conversations will delve deeper into these advancements.

"The Snowflake story is a great one and it seems like it's just getting started."


© 2024 Deciphr
