Aug 29, 2024 · 3 min read

Are smaller models the path to AGI?

Smaller, efficient models with active learning may be the key to AGI, not just scaling up.


Sudarshan Kamath

Data Scientist | Founder


The Scaling Fallacy in the Quest for Artificial General Intelligence

There is a common misconception in the AI community that the key to achieving Artificial General Intelligence (AGI) lies in scaling up the size of Large Language Models (LLMs). The belief is that the larger the model—meaning more parameters—the closer we get to true intelligence. With GPT-4 speculated to have around 1 trillion parameters, and hints that GPT-5 will be even larger, this view seems to be driving the current development trajectory. But if we compare the size of these models to the number of neurons in biological brains, the correlation between size and intelligence appears misleading.

Understanding Parameters and Neurons

Consider these comparisons:

  • GPT-4: Speculated to have around 1 trillion parameters.
  • Human Brain: Approximately 86 billion neurons.
  • Elephant Brain: Around 257 billion neurons.
  • Dolphin Brain: Approximately 10.5 billion neurons.
  • Chimpanzee Brain: About 6-7 billion neurons.

At first glance, these numbers might suggest that GPT-4, with its enormous parameter count, is more 'intelligent' than animals with far fewer neurons. However, the reality is more nuanced. Despite its size, GPT-4 lacks many fundamental capabilities that even a chimpanzee possesses, such as the ability to understand and interact with the physical world, recognize and respond to emotional cues, and exhibit basic planning and problem-solving skills.

The Limitations of Scaling LLMs

The discrepancy between parameter count and true intelligence highlights the limitations of current LLMs. These models are incredibly powerful at processing and generating language based on vast amounts of data, but their understanding is superficial. They can simulate conversation and generate text that appears coherent and meaningful, yet they lack a deeper understanding of context, goals, and real-world interaction. This is because LLMs, as they are currently designed, do not incorporate mechanisms for planning, prioritization, or understanding long-term goals. They operate without any innate model of the physical world or an understanding of cause-and-effect relationships beyond what is encoded in their training data.

Beyond LLMs: Exploring New Architectural Paradigms

While LLMs have shown great promise, it's becoming increasingly clear that they are only part of the solution. Emerging architectural paradigms like Joint Embedding Predictive Architecture (JEPA) provide a better world model by creating more accurate and coherent representations of the environment. However, JEPA and similar architectures do not inherently incorporate mechanisms for planning, prioritization, or goal-directed behavior. Instead, they offer a richer understanding of the world, which can serve as a foundation upon which other systems might build to incorporate these advanced cognitive functions.
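As a toy illustration only (not code from JEPA itself), the core idea can be sketched as predicting a target encoder's embedding from a context encoder's embedding, with the loss computed in embedding space rather than over raw pixels or tokens. All names, dimensions, and data below are hypothetical assumptions, and the predictor network is folded into the context encoder for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_emb, n = 4, 3, 64
# Each "scene" has a visible context part and a hidden target part;
# here the target is a slightly perturbed copy of the context.
X_ctx = rng.normal(size=(n, d_in))
X_tgt = X_ctx + 0.05 * rng.normal(size=(n, d_in))

W_ctx = 0.1 * rng.normal(size=(d_emb, d_in))  # trainable context encoder
W_tgt = 0.1 * rng.normal(size=(d_emb, d_in))  # frozen target encoder

def loss_and_grad():
    Z_hat = X_ctx @ W_ctx.T   # predicted embeddings, shape (n, d_emb)
    Z_tgt = X_tgt @ W_tgt.T   # target embeddings to match
    diff = Z_hat - Z_tgt
    loss = float(np.mean(diff ** 2))          # loss lives in embedding space
    grad = 2.0 / diff.size * diff.T @ X_ctx   # d(loss) / d(W_ctx)
    return loss, grad

initial, _ = loss_and_grad()
for _ in range(300):                          # plain gradient descent
    _, grad = loss_and_grad()
    W_ctx -= 0.5 * grad
final, _ = loss_and_grad()
```

The point of the sketch is the objective, not the architecture: the model is never asked to reconstruct the raw input, only to agree with the target encoder's representation of it.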

Active Learning - Continuous Training Cycles

In human cognition, much of our intelligence stems from our brains constantly updating plans based on new information and sensory inputs, recalibrating priorities in real time. LLMs, in contrast, lack this ability: they operate on a predefined understanding fixed at training time and cannot dynamically adapt their goals as the real world changes. This capacity for constant learning from new experiences, retraining again and again on real-world data, is called active learning, and it is missing from current AI systems.
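As a minimal sketch of such a continuous training cycle (all names and numbers here are illustrative assumptions, not the author's system), a deployed model can update its weights online from each new observation instead of freezing after offline training:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical edge deployment: a tiny linear model that keeps learning
# from every new sensory input it encounters.
true_w = np.array([2.0, -1.0])   # the real-world relationship to discover
w = np.zeros(2)                  # deployed model starts untrained
lr = 0.1

errors = []
for step in range(200):          # continuous stream of observations
    x = rng.normal(size=2)       # new sensory input
    y = true_w @ x               # real-world outcome
    pred = w @ x
    err = pred - y
    errors.append(err ** 2)
    w -= lr * err * x            # online update: learn from this sample now

early = float(np.mean(errors[:20]))   # error during early deployment
late = float(np.mean(errors[-20:]))   # error after continual updates
```

The contrast with a frozen LLM is the update inside the loop: the model's error shrinks over the deployment itself, rather than only between offline training runs.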

The Path to True General Intelligence

At smallest.ai, we believe that AGI will not be achieved simply by scaling up LLMs. Instead, it will emerge from systems that can think and learn more like humans—systems that can run on edge devices, constantly taking in sensory inputs, and updating their understanding of the world. These systems would incorporate automated active learning loops, allowing them to refine their knowledge and adjust their plans and priorities based on new experiences.

The Future: Small Models with Big Impact

By focusing on smaller, more efficient models that incorporate planning, prioritization, and sensory-driven learning, we can develop systems that exhibit signs of general intelligence with far fewer parameters than today's LLMs. These models would not only be more scalable and sustainable but also more capable of interacting with and understanding the world in a truly intelligent manner. Such systems would be able to operate closer to humans, integrating seamlessly into everyday environments and tasks.

Conclusion

Humans learn through multiple sensory interactions. Training needs to be a continuous process: models must keep learning while they are deployed on the edge. Larger models are a temporary distraction from achieving AGI. The key lies in on-edge active learning that gives models a better understanding of the real world.

Instead of one large intelligent model, the future belongs to billions of smaller, hyper-intelligent models, each specialized for a particular use case.

This is exactly how humans have evolved.