Aug 29, 2024 · 3 min read

Are smaller models the path to AGI?

Smaller, efficient models with active learning may be the key to AGI, not just scaling up.


Sudarshan Kamath

Data Scientist | Founder


The Scaling Fallacy in the Quest for Artificial General Intelligence

There is a common misconception in the AI community that the key to achieving Artificial General Intelligence (AGI) lies in scaling up the size of Large Language Models (LLMs). The belief is that the larger the model—meaning more parameters—the closer we get to true intelligence. With GPT-4 speculated to have around 1 trillion parameters, and hints that GPT-5 will be even larger, this view seems to be driving the current development trajectory. But if we compare the size of these models to the number of neurons in biological brains, the correlation between size and intelligence appears misleading.

Understanding Parameters and Neurons

Consider these comparisons:

  • GPT-4: Speculated to have around 1 trillion parameters.
  • Human Brain: Approximately 86 billion neurons.
  • Elephant Brain: Around 257 billion neurons.
  • Dolphin Brain: Approximately 10.5 billion neurons.
  • Chimpanzee Brain: About 6-7 billion neurons.

At first glance, these numbers might suggest that GPT-4, with its enormous parameter count, is more 'intelligent' than animals with far fewer neurons. However, the reality is more nuanced. Despite its size, GPT-4 lacks many fundamental capabilities that even a chimpanzee possesses, such as the ability to understand and interact with the physical world, recognize and respond to emotional cues, and exhibit basic planning and problem-solving skills.

The Limitations of Scaling LLMs

The discrepancy between parameter count and true intelligence highlights the limitations of current LLMs. These models are incredibly powerful at processing and generating language based on vast amounts of data, but their understanding is superficial. They can simulate conversation and generate text that appears coherent and meaningful, yet they lack a deeper understanding of context, goals, and real-world interaction. This is because LLMs, as they are currently designed, do not incorporate mechanisms for planning, prioritization, or understanding long-term goals. They operate without any innate model of the physical world or an understanding of cause-and-effect relationships beyond what is encoded in their training data.

Beyond LLMs: Exploring New Architectural Paradigms

While LLMs have shown great promise, it's becoming increasingly clear that they are only part of the solution. Emerging architectural paradigms like Joint Embedding Predictive Architecture (JEPA) provide a better world model by creating more accurate and coherent representations of the environment. However, JEPA and similar architectures do not inherently incorporate mechanisms for planning, prioritization, or goal-directed behavior. Instead, they offer a richer understanding of the world, which can serve as a foundation upon which other systems might build to incorporate these advanced cognitive functions.
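As a toy illustration only (not code from JEPA itself), the core idea can be sketched as predicting a target encoder's embedding from a context encoder's embedding, with the loss computed in embedding space rather than over raw pixels or tokens. All names, dimensions, and data below are hypothetical assumptions, and the predictor network is folded into the context encoder for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_emb, n = 4, 3, 64
# Each "scene" has a visible context part and a hidden target part;
# here the target is a slightly perturbed copy of the context.
X_ctx = rng.normal(size=(n, d_in))
X_tgt = X_ctx + 0.05 * rng.normal(size=(n, d_in))

W_ctx = 0.1 * rng.normal(size=(d_emb, d_in))  # trainable context encoder
W_tgt = 0.1 * rng.normal(size=(d_emb, d_in))  # frozen target encoder

def loss_and_grad():
    Z_hat = X_ctx @ W_ctx.T   # predicted embeddings, shape (n, d_emb)
    Z_tgt = X_tgt @ W_tgt.T   # target embeddings to match
    diff = Z_hat - Z_tgt
    loss = float(np.mean(diff ** 2))          # loss lives in embedding space
    grad = 2.0 / diff.size * diff.T @ X_ctx   # d(loss) / d(W_ctx)
    return loss, grad

initial, _ = loss_and_grad()
for _ in range(300):                          # plain gradient descent
    _, grad = loss_and_grad()
    W_ctx -= 0.5 * grad
final, _ = loss_and_grad()
```

The point of the sketch is the objective, not the architecture: the model is never asked to reconstruct the raw input, only to agree with the target encoder's representation of it.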

Active Learning - Continuous Training Cycles

In human cognition, much of our intelligence stems from our brains constantly updating plans based on new information and sensory inputs, recalibrating priorities in real time. LLMs, in contrast, lack this ability: they operate on a predefined understanding fixed at training time and cannot dynamically adapt their goals as the real world changes. This capacity for constant learning from new experiences, retraining again and again on real-world data, is called active learning, and it is missing from current AI systems.
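As a minimal sketch of such a continuous training cycle (all names and numbers here are illustrative assumptions, not the author's system), a deployed model can update its weights online from each new observation instead of freezing after offline training:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical edge deployment: a tiny linear model that keeps learning
# from every new sensory input it encounters.
true_w = np.array([2.0, -1.0])   # the real-world relationship to discover
w = np.zeros(2)                  # deployed model starts untrained
lr = 0.1

errors = []
for step in range(200):          # continuous stream of observations
    x = rng.normal(size=2)       # new sensory input
    y = true_w @ x               # real-world outcome
    pred = w @ x
    err = pred - y
    errors.append(err ** 2)
    w -= lr * err * x            # online update: learn from this sample now

early = float(np.mean(errors[:20]))   # error during early deployment
late = float(np.mean(errors[-20:]))   # error after continual updates
```

The contrast with a frozen LLM is the update inside the loop: the model's error shrinks over the deployment itself, rather than only between offline training runs.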

The Path to True General Intelligence

At smallest.ai, we believe that AGI will not be achieved simply by scaling up LLMs. Instead, it will emerge from systems that can think and learn more like humans—systems that can run on edge devices, constantly taking in sensory inputs, and updating their understanding of the world. These systems would incorporate automated active learning loops, allowing them to refine their knowledge and adjust their plans and priorities based on new experiences.

The Future: Small Models with Big Impact

By focusing on smaller, more efficient models that incorporate planning, prioritization, and sensory-driven learning, we can develop systems that exhibit signs of general intelligence with far fewer parameters than today's LLMs. These models would not only be more scalable and sustainable but also more capable of interacting with and understanding the world in a truly intelligent manner. Such systems would be able to operate closer to humans, integrating seamlessly into everyday environments and tasks.

Conclusion

Humans learn through multiple sensory interactions. Training needs to be a continuous process: models must keep learning while they are deployed on the edge. Larger models are a temporary distraction from achieving AGI. The key lies in on-edge active learning that gives models a better understanding of the real world.

Instead of one large intelligent model, the future belongs to billions of smaller, hyper-intelligent models, each specialized for a particular use case.

This is exactly how humans have evolved.