Charles Giancarlo, Pure Storage | AI Infrastructure Silicon Valley - Executive Series

Summary notes created by Deciphr AI

https://youtu.be/PPIgJNkm0N4?si=H9EU2KpYBG9l5iTO
Abstract

Charlie Giancarlo, CEO of Pure Storage, discusses the resurgence of hardware systems and the transformative impact of AI on compute, networking, and storage. He highlights the hybrid cloud model's dominance, emphasizing the need for on-prem solutions to mimic cloud functionality. Giancarlo underscores Pure Storage's innovations, particularly in virtualizing storage, which enhances performance and cost efficiency. He also notes the critical role of AI in data management and security, and previews future advancements, including potential replacements for spinning disks in hyperscaler environments. Pure Storage's continued growth and market share expansion reflect its strategic focus and technological leadership.

Summary Notes

Evolution of Hardware Systems and AI

  • Hardware systems are experiencing a resurgence in importance, focusing on maximizing performance, energy efficiency, and capabilities.
  • The current excitement around AI is driving innovations in hardware systems.
  • AI is creating opportunities in machine learning, training environments, inference, retrieval augmented generation, and product simplification for customers.

"Hardware systems are cool again right, uh we've gone back, uh from you know very high level applications with lots of words and sound bites but down to... how do we squeeze more performance, more energy, more capabilities out of the systems that we're put in place."

  • Emphasizes the shift back to focusing on hardware performance and efficiency in the context of AI advancements.

"What we're seeing in the AI space is that it's opening up multiple layers of opportunity, one is in the entire machine learning environment... but I think in our world that's going to be secondary to what Enterprises do with AI and how they use it for inference and retrieval augmented generation."

  • Highlights the various layers of opportunity AI presents, particularly in enterprise applications beyond just machine learning.

Hybrid Cloud Environment

  • The trend is shifting from an all-cloud approach to a hybrid environment where some workloads remain on-premises due to cost, performance, and security concerns.
  • Enterprises aim to make their on-premises environments operate like the cloud, ensuring transparency and efficiency.

"Last decade the view was everything was moving to the cloud and eventually it would just be Cloud. This decade it's very, very clear to everybody and to the Enterprises that it's going to be a hybrid environment."

  • Indicates the shift in enterprise strategy from a pure cloud approach to a hybrid model.

"What everybody wants is to act like the cloud. They want to be able to have their environment behave like the cloud so a lot of the challenge for companies that supply Enterprises is how do we make their environment their data centers operate more like the cloud."

  • Explains the demand for cloud-like operations in on-premises environments to achieve transparency and efficiency.

Cost Implications of Cloud vs. On-Premises

  • Large organizations have the potential to operate at lower costs than the cloud, but this depends on their willingness and capability.
  • For midsize businesses, the cloud is generally less expensive.
  • Production workloads in the cloud can be very costly for large enterprises.

"You can't separate the cloud itself from the business model that supports the cloud... any large organization has the economics to be able to operate at lower cost than the cloud."

  • Discusses the economic considerations for large organizations when choosing between cloud and on-premises solutions.

"For midsize business, the cloud is going to be less expensive. For any large-scale Enterprise business Global 2000, then it's going to be well where how do they want to balance it because the economics for production workloads is very expensive in the cloud."

  • Highlights the cost dynamics for midsize businesses versus large enterprises regarding cloud usage.

Data Security and AI

  • Data security and role-based access controls are significant challenges in the widespread adoption of AI in business.
  • Enterprises are cautious about sharing data with cloud-based models due to security concerns.
  • These issues may be mitigated in the next few years, but currently, they impede the adoption of AI.

"One of the many challenges still with AI in terms of its use every day by business is data security role-based access controls which there's not a good answer for right now."

  • Points out the current challenges in data security and access controls that hinder AI adoption in enterprises.

"A lot of companies rightfully so are retaining their own data and being very, very careful with what they allow, especially cloud-based models, but frankly any model to be able to get access to."

  • Emphasizes the cautious approach enterprises are taking towards data security in the context of AI and cloud models.

Storage and Networking for AI

  • Storage and networking requirements change based on the type of AI workload, such as preprocessing, training, and real-time iteration.
  • High read/write intensity is crucial for AI models, particularly during training and real-time iteration phases.

"I just wrote a research piece around... storage and networking for Gen I want to get your thoughts on my storage the read write intensity changes based upon things so pre-processing read WR intensities High training a lot of reading."

  • Discusses the varying storage and networking needs for different AI workloads.

"Writing is extraordinarily important... take a large AI model, it may run days."

  • Highlights the importance of writing in the context of training large AI models, which can take days to complete.

Checkpointing in High-Performance Computing

  • Checkpointing is a critical process in high-performance computing (HPC) to prevent data loss during long computational tasks.
  • It involves periodically saving the state of the system's memory to storage, allowing tasks to resume from the last checkpoint in case of a failure.
  • Checkpointing must be performed quickly and efficiently to minimize downtime and avoid bottlenecks.

"What they do is every period of time, could be five minutes, could be half an hour, they what's called checkpoint. They take all the data that's in memory of all the systems and they write it."

  • Explanation: This quote explains the basic concept of checkpointing, where data in memory is periodically saved to storage to prevent loss during long computational tasks.

"If you can't write at very high speed, you just can't provide those systems are just not going to work. The bottlenecks, they'd be serious bottlenecks in writing."

  • Explanation: High-speed writing is crucial for efficient checkpointing. Slow writing speeds can create significant bottlenecks, hindering the performance of HPC systems.
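The write pressure this creates can be made concrete with a minimal sketch, assuming a framework-agnostic training loop; the directory, the interval, and the train_step/model_state names below are illustrative assumptions, not Pure Storage's or any framework's actual API.

```python
import pickle
import time
from pathlib import Path

# Illustrative values only: the mount point and interval are assumptions.
CHECKPOINT_DIR = Path("/mnt/shared-flash/checkpoints")
CHECKPOINT_INTERVAL_S = 300  # "could be five minutes, could be half an hour"


def save_checkpoint(step: int, model_state: dict) -> None:
    """Write the in-memory training state out to shared storage.

    While this write runs, forward progress is at risk (and in naive setups
    the accelerators sit idle), which is why write throughput matters.
    """
    CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)
    path = CHECKPOINT_DIR / f"step_{step:08d}.pkl"
    with path.open("wb") as f:
        pickle.dump(model_state, f)


def training_loop(train_step, model_state: dict) -> None:
    """Run train_step repeatedly, checkpointing on a wall-clock interval."""
    last_checkpoint = time.monotonic()
    step = 0
    while True:
        model_state = train_step(model_state)  # hypothetical user-supplied step
        step += 1
        if time.monotonic() - last_checkpoint >= CHECKPOINT_INTERVAL_S:
            save_checkpoint(step, model_state)
            last_checkpoint = time.monotonic()
```

On a failure, the job restarts from the most recent checkpoint file rather than from the beginning; the faster each checkpoint can be written, the shorter the pause and the less compute is wasted.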

Changes in Storage for AI and Analytics

  • Traditional AI and analytics involved replicating data from production systems to dedicated systems like data lakes or data warehouses.
  • This process is costly and inefficient, leading to redundant storage and outdated data.
  • Modern AI, especially Retrieval-Augmented Generation (RAG), requires real-time access to data, making traditional methods less viable.

"What you would do is you would take your data that is stored on your different production systems and you would replicate it or you would copy it onto a dedicated system typically called a data lake or a data warehouse."

  • Explanation: This quote describes the traditional method of handling data for AI and analytics, which involves duplicating data onto dedicated systems.

"Why should you have to replicate and have twice the amount of storage or three times the amount of storage to be able to do that and then it's old data?"

  • Explanation: The inefficiency and redundancy of traditional data handling methods are highlighted, questioning the necessity of duplicating data and dealing with outdated information.
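To make the contrast concrete, here is a minimal sketch of the retrieval step in a RAG pipeline, assuming two hypothetical helpers: embed, which turns text into a vector, and fetch_live_documents, which reads current records straight from the production system rather than from a copied data lake. Neither is a specific product's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question, embed, fetch_live_documents, top_k=3):
    """Pick the documents most relevant to the question, at query time.

    Because retrieval happens against the live source, the model is grounded
    in current data instead of a second or third copy that has gone stale.
    """
    q_vec = embed(question)
    docs = fetch_live_documents()
    ranked = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_docs):
    """Augment the user's question with the retrieved context."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```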

Modern Data Stack and Virtualization

  • The modern data stack is evolving from physical, fixed environments to virtual, federated systems.
  • Developers now prefer abstracted, virtual stacks where resources like compute, networking, and storage are dynamically allocated and managed.
  • Virtualization has been common in compute and networking but is now being extended to storage.

"What they want is yes, you have to build a stack, but they want to be a virtual stack and they want it Federated."

  • Explanation: This quote emphasizes the shift towards virtual and federated stacks, where resources are abstracted and managed dynamically.

"Compute has been virtualized since the let's say almost two decades now starting with VMS right virtual machines and now with containers in Kubernetes even more exciting more virtualized more Dynamic."

  • Explanation: Virtualization in compute has evolved over decades, starting with virtual machines and advancing to containers and Kubernetes, offering more dynamic and efficient resource management.

Fusion and Virtualized Storage

  • Fusion is a technology that virtualizes storage, making it appear as a pool of resources rather than a dedicated array.
  • This approach allows applications to write to an API, with data being stored based on policies without worrying about physical locations.
  • Virtualized storage extends beyond single physical boxes and data centers, providing a cloud-like storage experience within enterprises.

"Storage has never been virtualized before. This is what we're doing with something we call Fusion that again allows instead of it being a dedicated array now it just looks like a pool of storage."

  • Explanation: Fusion virtualizes storage, transforming it from dedicated arrays to a pool of resources, simplifying data management and access.

"The computer, the application has an API, it writes to the API, it gets written somewhere based on policies, and the developer doesn't have to worry about where it might be."

  • Explanation: With Fusion, applications interact with an API for storage, and data is managed based on predefined policies, freeing developers from concerns about physical storage locations.
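The notes describe Fusion only at this level: the application writes to an API and policies decide where data lands. Below is a hedged, purely illustrative sketch of that pattern; Policy, StoragePool, and their fields are invented for the example and are not Fusion's actual API.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical placement policy; the real policy model may differ."""
    tier: str        # e.g. "performance" or "capacity"
    replicas: int    # how many copies to keep
    region: str      # logical placement hint, not a physical array


class StoragePool:
    """Callers name a key and a policy; they never address a physical array."""

    def __init__(self):
        self._policies = {}
        self._objects = {}

    def define_policy(self, name: str, policy: Policy) -> None:
        self._policies[name] = policy

    def write(self, key: str, data: bytes, policy_name: str) -> None:
        policy = self._policies[policy_name]
        # In a real system, placement across arrays and sites happens here;
        # the developer only ever sees the key and the policy name.
        self._objects[key] = (policy, data)

    def read(self, key: str) -> bytes:
        return self._objects[key][1]


pool = StoragePool()
pool.define_policy("prod-db", Policy(tier="performance", replicas=2, region="us-east"))
pool.write("orders/2024-06", b"...", policy_name="prod-db")
```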

Evolution of Storage and Systems

  • The evolution of storage technology, particularly with flash storage, has significantly improved performance and efficiency.
  • Companies like Pure Storage have transitioned from being storage-centric to systems-centric, focusing on comprehensive data management solutions.
  • The goal is to create seamless, efficient systems where data is readily accessible and managed effectively.

"The evolution of storage just gets better. Flash changed the game continued and even just as we see you're not a storage company, you're a systems company."

  • Explanation: The quote highlights the continuous improvement in storage technology and the shift of companies like Pure Storage towards broader systems solutions.

"Everything you're talking about is just the data needs to be somewhere exactly. I mean, you're going to make a pool."

  • Explanation: This quote underscores the fundamental need for efficient data storage and management, aligning with the concept of creating a virtualized pool of storage resources.

Making Hardware Disappear

  • The goal is to make hardware invisible to the developer, abstracting it away.
  • At the end of the day, everything runs on hardware, but the modern layer aims to make it look like a developer platform.
  • This platform is horizontally scalable with built-in governance and an open catalog.

"We actually want to make the hardware disappear. At the end of the day, everything is hardware; it's got to run on something."

  • Emphasizes the importance of abstracting hardware complexities away from developers.

AI and Partnerships

  • AI is pushing the business forward, with significant partnerships such as Nvidia.
  • Enterprises are moving quickly to adopt AI, recognizing data as intellectual property.
  • Private AI and Sovereign AI are hot topics, with development happening on-premise.

"AI is pushing you guys hard into the development. You got a partnership with Nvidia."

  • Highlights the influence of AI and strategic partnerships on business growth.

AI Environments and Deployments

  • Examples of large-scale AI environments include Meta's research supercluster with 24,000 GPUs and half an exabyte of storage.
  • Partnership with Nvidia includes deployments in RAG (Retrieval-Augmented Generation) and inference designs for domain-specific data.

"Meta for a long time has been a longtime customer in their AI environments. Still, I think the largest in the world is the Meta research supercluster."

  • Demonstrates the scale and significance of AI environments and partnerships.

Adoption of AI in Different Verticals

  • Adoption varies across different verticals like Pharmaceuticals, Medical Technology, and Finance.
  • Parameter-based AI environments may not require massive GPU resources.
  • LLMs (Large Language Models) can enhance the usability of parameter-based AI by allowing non-experts to ask questions and get answers.

"In many of these environments, they don't need massive amounts of GPUs, especially for parameter-based environments."

  • Explains the varied requirements and adoption rates of AI across different industries.

Managing Storage with AI

  • Transition from managing individual storage arrays to a cloud of storage requires AI.
  • AI can help storage administrators manage and diagnose storage performance issues.
  • LLMs can assist in querying the storage cloud for performance and diagnostics.

"By putting AI in front of our analytics tools, your storage administrator or IT manager can ask general questions about the entire cloud of storage."

  • Highlights the role of AI in managing and optimizing storage infrastructure.
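A minimal sketch of the idea, under stated assumptions: a natural-language question from an administrator is translated into a structured query over fleet-wide telemetry. The plan_query stub stands in for the language model and query_metrics for the analytics backend; neither is a real Pure Storage interface.

```python
def plan_query(question: str) -> dict:
    """Translate an admin's question into a structured metrics query.

    In practice this is where the language model does the work; here it is
    a keyword stub so the example stays self-contained.
    """
    q = question.lower()
    if "slow" in q or "latency" in q:
        return {"metric": "latency_ms", "aggregate": "top", "n": 5}
    return {"metric": "capacity_used_pct", "aggregate": "top", "n": 5}


def answer(question: str, query_metrics) -> str:
    """Answer a fleet-wide question using a hypothetical analytics backend."""
    plan = plan_query(question)
    rows = query_metrics(plan)  # e.g. [("array-17", 9.3), ("array-04", 8.1), ...]
    lines = [f"{name}: {value}" for name, value in rows]
    return f"Top {plan['n']} arrays by {plan['metric']}:\n" + "\n".join(lines)
```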

Competitive Advantages of Being Independent

  • The company thrives independently due to several core technologies and sustainable competitive advantages.
  • Focus on writing software for raw flash rather than SSDs provides performance and cost benefits.
  • Excess performance allows for superior data reduction, saving money, power, space, and cooling.

"We wrote our stack, our software fundamentally to work with raw flash. Flash is going to take over storage; there's just no question about it."

  • Emphasizes the strategic decision to focus on raw flash for competitive advantages.

Evergreen Model

  • The evergreen model ensures that sold products remain new and do not need replacement.
  • Every few years, all components in the product are replaced, maintaining its newness and performance.

"We can consistently make a sold product new so that it never needs to be replaced."

  • Describes the evergreen model's role in maintaining product longevity and customer satisfaction.

Evergreen Forever Support

  • Customers typically replace hardware every 5 years, but Pure Storage offers a different model.
  • Pure Storage provides continuous updates and improvements without the need for full hardware replacement.

"With almost every piece of hardware including storage, customers are used to getting about a 5-year life out of it and then basically throwing it away and buying it all over again."

  • Traditional hardware lifecycle involves frequent replacements.

"Our Evergreen Forever support, you want the best products all the time for the customer."

  • Pure Storage's Evergreen model ensures customers always have the latest technology without full replacements.

Software Defined Networking and Storage

  • Early definitions of software-defined networking involved using open-source software on generic hardware.
  • True software-defined networking is defined by engineers through code, not just by the software purchased.
  • The same concept applies to software-defined storage, which should be defined by code written by engineers.

"The first definition of software-defined networking was that I'm going to take open-source software, put it onto generic hardware, and that's software-defined."

  • Initial understanding of software-defined networking was limited to using open-source software on any hardware.

"Software-defined means that an engineer can define the way my networking works with code."

  • True software-defined networking involves engineers defining network operations through coding.

"Software-defined storage... means that an engineer can write a few lines of code and completely change the way the storage works."

  • Software-defined storage should allow engineers to modify storage operations through coding.
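As a hedged illustration of that claim, the sketch below shows an engineer changing how a volume behaves with a few lines of code; StorageClient and its methods are invented for the example and are not any vendor's real SDK.

```python
class StorageClient:
    """Hypothetical control-plane client; configuration lives in code."""

    def __init__(self):
        self.volumes = {}

    def set_qos(self, volume: str, max_iops: int, max_mbps: int) -> None:
        """Cap a volume's performance entirely in software."""
        self.volumes.setdefault(volume, {})["qos"] = {
            "max_iops": max_iops,
            "max_mbps": max_mbps,
        }

    def set_snapshot_schedule(self, volume: str, every_minutes: int, keep: int) -> None:
        """Change data-protection behavior with one call."""
        self.volumes.setdefault(volume, {})["snapshots"] = {
            "every_minutes": every_minutes,
            "keep": keep,
        }


client = StorageClient()
# A few lines of code change how the storage works; no hands on hardware.
client.set_qos("orders-db", max_iops=50_000, max_mbps=2_000)
client.set_snapshot_schedule("orders-db", every_minutes=15, keep=96)
```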

Virtualizing Storage

  • Virtualizing storage means the storage behavior is set by IP and defined by developers.
  • This approach does not depend on the underlying hardware.
  • Kubernetes plays a significant role in this virtualization process.

"We're virtualizing storage by which I mean that the storage will behave in a way that is set by IP and defined by the way that a developer wants the storage to be able to operate."

  • Virtualizing storage involves defining its behavior through IP and developer settings.

"Portworks gives us the ability to do that, but everything that we do is Kubernetes-based now."

  • Kubernetes is central to Pure Storage's virtualization efforts.
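Since everything is Kubernetes-based, a developer-facing request for storage can look like the sketch below, which uses the official Kubernetes Python client to create a PersistentVolumeClaim. The storage class name "px-fast", the namespace, and the size are assumptions for illustration; in a Portworx-backed cluster, the class would map to whatever policy the operator has defined.

```python
from kubernetes import client, config

# Assumes a reachable cluster and a kubeconfig on this machine.
config.load_kube_config()
core = client.CoreV1Api()

# The developer asks for capacity and a class (policy); which array or node
# actually backs the volume is the platform's problem, not the developer's.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "px-fast",  # hypothetical class name
        "resources": {"requests": {"storage": "100Gi"}},
    },
}
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```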

Infrastructure as Code

  • Infrastructure as code involves defining infrastructure through application code, not just management tools like DevOps and Terraform.
  • This concept is about creating virtual environments and stacks through software.

"Infrastructure as code... means I want to define software in my application to form my infrastructure at will."

  • Infrastructure as code is about defining infrastructure directly through application code.
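A hedged sketch of the distinction the quote is drawing: rather than a separate provisioning pipeline, the application itself declares the infrastructure it needs and has it formed at startup. The provision helper below is a hypothetical stand-in for whatever platform API (cloud, a Fusion-style pool, Kubernetes) would actually realize the request.

```python
# The application's own code carries its infrastructure definition.
APP_INFRA = {
    "volumes": [{"name": "app-data", "size_gb": 200, "policy": "performance"}],
    "networks": [{"name": "app-net", "ingress_ports": [443]}],
    "compute": {"replicas": 3, "cpu": 4, "memory_gb": 16},
}


def provision(spec: dict) -> None:
    """Hypothetical stand-in for the platform call that realizes the spec."""
    for volume in spec["volumes"]:
        print(f"provisioning volume {volume['name']} ({volume['size_gb']} GiB)")
    for network in spec["networks"]:
        print(f"provisioning network {network['name']} (ports {network['ingress_ports']})")
    print(f"provisioning {spec['compute']['replicas']} replicas")


def main() -> None:
    provision(APP_INFRA)  # infrastructure is formed by the application, at will
    # ... application logic starts here ...


if __name__ == "__main__":
    main()
```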

Generative AI and Virtualization

  • Generative AI requires well-prepared data and information.
  • Virtualization of storage is in its early stages, with customers still adapting to the concept.
  • Full integration of AI in business environments is still years away.

"Where we are right now in the virtualization of storage is we're right at the very beginning."

  • The concept of virtualizing storage is still new to customers.

"We're still quite a few years away from allowing AI to be really useful in a flexible way in a business environment."

  • Full, flexible business integration of AI is not yet realized.

Pure Storage's Market Position and Growth

  • Pure Storage has seen significant growth, with an 18% year-over-year increase last quarter.
  • The company is now the second-highest in market share for all-flash storage and aims to be number one.
  • Pure Storage is positioned to replace spinning disks in hyperscaler data centers, offering better space, power, and cooling efficiency.

"Last quarter we grew 18% year-over-year, continuing to take market share very significantly."

  • Pure Storage is experiencing significant growth and market share gains.

"We are now the second-highest market share in all-flash storage."

  • Pure Storage holds a strong position in the all-flash storage market.

"We will see our first design win for replacing disks within a hyperscaler in their customer-facing services."

  • Pure Storage is set to replace spinning disks in hyperscaler data centers, enhancing efficiency.

Advantages of Replacing Spinning Disks

  • Spinning disks are being replaced due to their frequent failures and inefficiencies.
  • Pure Storage offers solutions that are more space, power, and cooling efficient, making them more cost-effective in the long run.

"90% of the storage infrastructure in hyperscalers is spinning disk because it's been cheap."

  • Spinning disks are prevalent due to their low cost but have significant downsides.

"We're now at a parity system-level price point and we're one-tenth of the space, power, and cooling."

  • Pure Storage solutions offer better efficiency and cost-effectiveness compared to spinning disks.

"Friends don't let friends use spinning disks in clusters."

  • A shift away from spinning disks is advocated due to their inefficiencies.
