On the 20VC podcast hosted by Harry Stebbings, AI experts debate the significance of model size versus data size and where value in AI will accrue between startups and incumbents. Noam Shazeer of Character AI emphasizes the importance of computational power and training duration over data or model size alone. Chris from Runway ML suggests that while larger models have advantages, model specificity can be crucial, and a single-model approach is unlikely to dominate. Emad Mostaque of Stability AI advocates for culturally relevant, high-quality national datasets to feed AI models. The discussion also touches on the competitive edge of startups versus incumbents, with varying opinions. Sarah Guo of Conviction underscores startups' speed advantage, while Clem Delangue of Hugging Face sees a unique opportunity for startups in model innovation. Conversely, Douwe Kiela of Contextual AI and Richard Socher of You.com acknowledge that incumbents have data advantages but also face the innovator's dilemma, potentially leaving room for startups to disrupt. Tom Tunguz and Emad both recognize incumbents' distribution power, but Tunguz believes superior execution can propel startups to success.
"Yeah, probably. The size of the model is the bigger challenge. We can get a lot of data, but actually the number one thing that's important is how much computation you do to train it."
This quote by Noam Shazeer highlights the importance of computational resources in training AI models. It suggests that while data is essential, the computational effort required to train larger models is a significant challenge.
"I think size of model matters in the sense that we've seen that larger models, parameter wise, are going to get better at doing more things in multimodalities."
Chris indicates that larger models with more parameters tend to perform better on tasks involving multiple modalities, such as processing both text and images.
"I don't think there's going to be a single model to rule them all. That's like saying that the Internet would only have one e-commerce site."
Chris analogizes the idea of a universal model to the concept of having only one e-commerce site on the internet, implying that just as the internet supports a variety of e-commerce platforms, the AI field will likely support a diversity of models tailored to different tasks.
"Models eventually don't matter. What matters most is the people building those models and how fast can you change and learn from those models."
Chris argues that the long-term value does not reside in the models themselves but in the capabilities of the teams that build and iterate on these models quickly.
"It's totally not wrong. It's just not mutually exclusive. You need a large model, and you need a lot of training data for that model..."
Chris addresses the false dichotomy between model size and data size, suggesting that both are necessary and not mutually exclusive for developing effective AI systems.
"Computation. So the model we're serving now, we trained last summer and spent about $2 million worth of compute cycles doing it."
Noam Shazeer identifies computation as the most significant constraint for AI models, illustrating this with the substantial cost associated with training their current model.
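For intuition, that spend can be translated into GPU time with simple arithmetic. A minimal sketch, assuming an illustrative cloud price of $2 per GPU-hour and a hypothetical 1,024-GPU cluster (neither figure comes from the podcast):

```python
# Back-of-envelope only: translate a training budget into GPU time.
# The hourly rate and cluster size are assumptions, not Character AI figures.
budget_usd = 2_000_000          # quoted training spend
usd_per_gpu_hour = 2.0          # assumed price for one A100-class GPU

gpu_hours = budget_usd / usd_per_gpu_hour
print(f"{gpu_hours:,.0f} GPU-hours")                 # 1,000,000 GPU-hours

fleet = 1_024                   # hypothetical cluster size
days = gpu_hours / fleet / 24
print(f"~{days:.0f} days on a {fleet}-GPU cluster")  # ~41 days
```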
"We will do a lot better in the near future. But if we get a lot more better hardware, which we are getting, and spend longer training the thing, we can train something smarter."
This quote by Noam Shazeer suggests optimism for future advancements in AI, contingent on improvements in hardware and extended training durations, which would enable the training of more intelligent models.
"There's so many opportunities to build different type of models and different ways of working with those models, and it's still very early to be so specific to say, oh, we're going to only use that thing or that other thing."
Chris emphasizes the vast potential for creating a variety of models tailored to different applications, indicating that the field is too nascent to settle on one model or approach.
"I think models are not a mode. Models eventually don't matter. What matters most is the people building those models and how fast can you change and learn from those models."
Chris argues that models, in isolation, do not provide a sustainable competitive edge. Instead, the focus should be on the teams behind the models and their ability to adapt and improve rapidly.
"Long story short, if you now give this linear regression model billions and billions of training data, it's not going to learn magically anything but a simple linear line."
Chris explains that simply having a large amount of training data does not enable a basic model to learn complex patterns, emphasizing the need for models with sufficient parameters to capture complexity.
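The underfitting claim is easy to verify empirically. The sketch below fits a straight line to a sine wave with NumPy; the error never improves with more data, because the model class is the bottleneck (the target function and sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_fit_error(n_samples: int) -> float:
    """Fit y = sin(x) with a straight line and return the mean squared error."""
    x = rng.uniform(-3, 3, n_samples)
    y = np.sin(x)
    slope, intercept = np.polyfit(x, y, deg=1)  # a two-parameter linear model
    return float(np.mean((y - (slope * x + intercept)) ** 2))

for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  MSE={linear_fit_error(n):.3f}")
# The error plateaus at roughly the same value for every n: adding data
# cannot fix a model class that is too simple for the target function.
```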
"But if you give the model billions and billions of parameters, it can learn all kinds of very complex predictive functions and abilities."
This quote by Chris highlights the potential for large models with many parameters to learn and perform complex tasks, reinforcing the importance of model size in AI development.

## Infusion of World Knowledge into Language Models
"You infuse world knowledge into that large language model, but only if you have enough parameters to learn it all."
The quote emphasizes the necessity for LLMs to have a substantial number of parameters to capture and utilize the extensive world knowledge available in text data across the internet.
"But there are still a lot of data sets out there that are not out there. They're actually stored in private databases."
This quote highlights the existence of private databases that contain valuable data sets not publicly available, which can provide an advantage to companies that own them.
"That's total bullshit you're asking very good question that often have a more subtle answer than what would fit in a tweet."
This quote disputes the oversimplified view that all AI startups are merely thin layers over foundational models, pointing out the nuanced and complex nature of creating effective AI solutions.
"If you train a smaller model on more data for longer, then you get a better model."
This quote summarizes the finding that data size and quality can be more crucial than the sheer size of the model for achieving optimal performance.
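This echoes the compute-optimal scaling result popularized by DeepMind's Chinchilla work, often summarized as roughly 20 training tokens per parameter. A minimal sketch of that rule of thumb (the constant is an approximation, not an exact law):

```python
def chinchilla_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal training tokens (Chinchilla rule of thumb)."""
    return n_params * tokens_per_param

for params in (1e9, 7e9, 70e9):
    print(f"{params / 1e9:>4.0f}B params -> ~{chinchilla_tokens(params) / 1e9:,.0f}B tokens")
# Output:
#    1B params -> ~20B tokens
#    7B params -> ~140B tokens
#   70B params -> ~1,400B tokens
```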
"It really is a function of the number of GPUs that you have available."
The quote indicates that the resources available, such as GPUs, determine the feasibility of training models, influencing the balance between model size and data quantity.
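One way to make that concrete is the common approximation that training a dense transformer costs about 6 × N × D floating-point operations for N parameters and D tokens. The sketch below turns a GPU count into an estimated wall-clock time; per-GPU throughput and utilization are assumed values, not measured figures:

```python
def training_days(n_params: float, n_tokens: float, n_gpus: int,
                  flops_per_gpu: float = 3e14, utilization: float = 0.4) -> float:
    """Wall-clock estimate from the ~6 * N * D FLOPs approximation.

    flops_per_gpu (~300 TFLOP/s peak) and 40% utilization are assumptions,
    not measurements from any specific cluster.
    """
    total_flops = 6 * n_params * n_tokens
    effective_flops_per_sec = n_gpus * flops_per_gpu * utilization
    return total_flops / effective_flops_per_sec / 86_400  # seconds -> days

# Example: a 7B-parameter model trained on 140B tokens.
for gpus in (64, 256, 1024):
    print(f"{gpus:>5} GPUs -> ~{training_days(7e9, 140e9, gpus):.1f} days")
# More GPUs shrink wall-clock time; the available fleet decides feasibility.
```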
"We need to feed these models better data and other stuff should be no more webscape data near."
The quote emphasizes the urgent need to improve the quality of data fed into models to enhance their performance and reliability.

## Importance of Contextual and Cultural Data Sets
"You need national data sets, you need cultural data sets, you need personal data sets that can interact with these base models and customize to you and your stories."
This quote highlights the necessity for diverse and personalized data sets to ensure AI models can be tailored to individual contexts and remain relevant over time.
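One concrete mechanism for letting personal datasets "interact with" a frozen base model is retrieval augmentation: embed the user's documents, fetch the most relevant one per query, and prepend it to the prompt. A toy sketch, with a hashed bag-of-words vector standing in for a real embedding model (all names and data are illustrative):

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a real embedding model: a hashed bag of words."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# A "personal dataset": the user's own notes, never seen by the base model.
notes = [
    "My dentist appointment is on March 3rd.",
    "Grandma's bread recipe uses rye flour and caraway.",
    "The wifi password at the cabin is taped to the fridge.",
]
index = np.stack([embed(n) for n in notes])

def retrieve(query: str) -> str:
    """Return the note most similar to the query (cosine similarity)."""
    scores = index @ embed(query)
    return notes[int(np.argmax(scores))]

query = "When do I see the dentist?"
prompt = f"Context: {retrieve(query)}\nQuestion: {query}"
print(prompt)  # this prompt would go to the unmodified base model
```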
"Classically, the only real advantage startups have is speed."
Sarah Guo points out that the agility of startups allows them to adapt quickly, which is critical in the rapidly evolving AI landscape.
"This is really hard to do for the incumbents."
Clem indicates that incumbents struggle with innovation in AI, particularly in developing and optimizing new models, which gives startups a competitive edge.
"You want to start with a lot of data and then have a way to generate lots more data, and that data is going to be your moat."
This quote by Douwe Kiela stresses the importance of proprietary data in establishing a strong foundation and ongoing growth for AI startups.
"GPT-4 might end up disrupting, not knowledge workers necessarily, but it might just disrupt Mechanical Turk; it's just an annotator on steroids."
Douwe Kiela suggests that AI like GPT-4 could revolutionize data annotation, leading to more specialized and cost-effective models for startups.
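As a sketch of the "annotator on steroids" workflow, a large general model can label raw text that then becomes training data for a small specialized classifier. The prompt, label set, and corpus below are illustrative assumptions; the call uses the OpenAI chat completions API:

```python
# Sketch of LLM-as-annotator: label raw text with a large model, then use
# the labels to train a small specialized classifier. The prompt, labels,
# and corpus are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
LABELS = ["positive", "negative", "neutral"]

def annotate(text: str) -> str:
    """Ask the model for a single sentiment label."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system",
             "content": f"Classify the sentiment as one of: {', '.join(LABELS)}."
                        " Reply with the label only."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

raw_corpus = ["The new release is fantastic.", "Support never replied to me."]
labeled = [(text, annotate(text)) for text in raw_corpus]
# `labeled` is now cheap training data for a small task-specific model.
```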
"The truth is distribution won't be ever fully solved and it's a constant uphill battle."
Richard Socher acknowledges the perpetual challenge of distribution for startups, implying the need for continuous effort and strategy.
"It's been very carefully tuned. Every shade of blue has been a b tested to death."
Richard Socher uses Google's optimization of their search page to illustrate the difficulty incumbents face when implementing major changes, highlighting the opportunity for startups to innovate.
"So incumbents have a huge advantage through distribution."
Alex recognizes the significant head start that incumbents have due to their established user base and market reach.
"But then who else was as fast as Adobe?"
Alex questions the ability of most incumbents to move as rapidly as Adobe, indicating that speed remains a critical factor in the AI arena and a potential advantage for startups.

## AI Integration in Current Products
"And this is what all the one you mentioned did, the notion you still have a document, you edit, you have a cursor, you write, hello, by the way, you can call Chat GPT. And to summarize a paragraph, it's what I call spreading a little bit of AI dust on the magic dust on your existing product."
The quote suggests that current companies add AI functionalities to their existing products in a minimalistic way, without rethinking the entire product design to fully leverage AI's capabilities.
"The reason is nobody could predict that LLMs would be so useful and powerful before you train one at this scale. And who in the google.org chart had the incentive to invest $500 million? And just to see this without any business benefit for the company."
This quote explains that the lack of foresight into the potential of LLMs and the absence of incentives within the company's structure were barriers to Google's investment in AI at the scale that OpenAI did with Chat GPT.
"I think if you're a venture capitalist or if you're a startup founder, you have to believe, I think it's in your fabric that no matter how big the incumbent is or the advantages that they have, that if you have really great execution, you can still win and you can win big."
The quote emphasizes the startup mindset that execution is key to success, even when facing larger, well-established competitors.
"Again, we know that value and moats are not necessarily innovation first."
This quote suggests that creating a competitive advantage in AI does not always require being the first mover, but rather building a strong position in the market.
"I think it's going to be us. Nvidia, Google, Microsoft, OpenAI and Meta and Apple probably are the ones that train these models."
This quote lists the companies anticipated to be the key players in developing foundational AI models, indicating a consolidation of power in the industry.