Jan 26, 2026
Smallest AI
Recent progress in artificial intelligence has been driven by empirical scaling laws linking performance improvements to increases in model parameters, data, and compute. These results have fueled the widespread belief that artificial general intelligence (AGI) will emerge primarily through continued scaling of large language models (LLMs). In this paper, we argue that this assumption conflates benchmark performance with intelligence and overlooks fundamental architectural limitations of current models. We propose Artificial Special Intelligence (ASI) as an alternative framework: intelligence arising from collections of small, specialized models that operate asynchronously, learn continuously, and interact with large-scale external memory. Drawing on evidence from machine learning, neuroscience, and cognitive science, we argue that intelligence is better characterized by structural properties—such as specialization, separation of compute and memory, and lifelong learning—than by parameter count alone.
1. Introduction
The dominant paradigm in contemporary AI research is grounded in scaling laws, which demonstrate predictable improvements in language modeling loss and downstream task performance as a function of model size, dataset size, and compute budget [1, 2]. These findings have motivated the construction of increasingly large foundation models and the assumption that general intelligence will emerge as an extrapolation of scale.
However, scaling laws describe empirical trends within a fixed architectural family; they do not constitute a theory of intelligence. Despite dramatic gains, large language models remain brittle under distributional shift, lack causal and counterfactual reasoning, and struggle with long-horizon planning and abstraction [3, 4, 5]. These limitations persist even at extreme scale, suggesting that they are architectural rather than parametric in nature.
2. The Limits of Artificial General Intelligence
The concept of Artificial General Intelligence (AGI) presupposes the desirability of a single system capable of broadly competent behavior across domains. However, beyond its lack of a precise operational definition, AGI implicitly optimizes for capability breadth without regard to efficiency. Contemporary training paradigms for generally capable models prioritize aggregate task coverage and benchmark performance, while largely ignoring constraints on sample efficiency, compute efficiency, energy consumption, and adaptation cost [1, 2].
This objective stands in contrast to biological and social intelligence. While, in principle, most humans possess the capacity to learn a wide range of skills, societies do not converge toward universally general agents. Instead, they evolve toward specialization, where individuals focus on narrow domains and coordinate through shared knowledge and memory. Classical economic theory formalizes this phenomenon as the division of labor, demonstrating that specialization yields higher collective productivity under resource constraints [6]. From a game-theoretic perspective, specialization emerges as a stable equilibrium that minimizes redundant learning and optimizes resource allocation [7].
Empirical evidence from cognitive science further supports this view: human expertise is domain-specific, with skill acquisition exhibiting strong locality and transfer limitations [8, 9]. Even when general learning capacity exists, specialization remains the most efficient strategy for sustained performance. Intelligence at the societal level thus arises not from universal generality, but from distributed, specialized agents operating over shared memory and coordination mechanisms.
Optimizing artificial systems for generality without efficiency therefore misaligns with both biological precedent and rational system design. Monolithic general models internalize vast amounts of irrelevant information for any given task, incur high training and inference costs, and require repeated global updates to remain current. In contrast, systems composed of specialized components can allocate compute selectively, update knowledge locally, and scale capabilities through coordination rather than parameter growth [10, 4].
These considerations suggest that efficiency is not a secondary concern but a defining property of intelligence. An intelligent system should minimize the resources required to acquire, retain, and apply knowledge. Framing intelligence solely in terms of generality obscures this requirement and biases architectures toward inefficiency. This further motivates a shift from AGI toward frameworks—such as Artificial Special Intelligence—that explicitly optimize for specialization, coordination, and efficient use of compute and memory.
3. Artificial Special Intelligence
3.1. Definition
Definition (Artificial Special Intelligence). Artificial Special Intelligence (ASI) refers to a class of artificial systems composed of multiple specialized models, each optimized for a narrow set of tasks, which collectively exhibit adaptive, intelligent behavior through coordination, shared memory, and continual learning.
Formally, an ASI system consists of:
A set of specialized computational modules {M_i} with bounded capacity,
A shared memory substrate M with persistent and extensible storage,
Coordination mechanisms governing information flow, learning, and action,
Learning dynamics that support continual adaptation over time.
Intelligence, under this framework, is an emergent property of system structure and interaction rather than model generality.
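To make the definition concrete, the following Python sketch instantiates the four components with deliberately simple choices. All names (SpecializedModule, SharedMemory, ASISystem) and the domain-keyed routing rule are illustrative assumptions of ours, not a proposed implementation.

```python
# A minimal structural sketch of the ASI definition above; the routing
# rule and all class names are illustrative assumptions, not the paper's.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class SpecializedModule:
    """A bounded-capacity module M_i optimized for a narrow task set."""
    domain: str
    policy: Callable[[Any, dict], Any]  # maps (input, memory view) -> output

@dataclass
class SharedMemory:
    """Persistent, extensible memory substrate shared by all modules."""
    store: dict = field(default_factory=dict)

    def read(self, key: str) -> Any:
        return self.store.get(key)

    def write(self, key: str, value: Any) -> None:
        self.store[key] = value  # extensible: new keys can be added anytime

class ASISystem:
    """Coordination mechanism: routes each input to the matching specialist."""

    def __init__(self, modules: list[SpecializedModule], memory: SharedMemory):
        self.modules = {m.domain: m for m in modules}
        self.memory = memory

    def act(self, domain: str, x: Any) -> Any:
        module = self.modules[domain]            # selective compute allocation
        y = module.policy(x, self.memory.store)  # narrow, specialized inference
        self.memory.write(f"last:{domain}", y)   # persistence across episodes
        return y

# Usage: two specialists coordinating through the same shared memory.
memory = SharedMemory()
system = ASISystem(
    [SpecializedModule("arithmetic", lambda x, m: sum(x)),
     SpecializedModule("echo", lambda x, m: m.get("last:arithmetic"))],
    memory,
)
system.act("arithmetic", [1, 2, 3])  # writes 6 into shared memory
print(system.act("echo", None))      # reads it back: 6
```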
3.2. Motivation
Specialization enables tighter inductive biases, improved sample efficiency, and faster adaptation. Empirical results increasingly show that small, task-specific models can outperform larger generalist models in real-world, latency- and cost-constrained settings [11, 12, 13]. Recent work from industry and academia further suggests that future agentic systems will rely on ensembles of small models rather than a single foundation model [14].
4. Separation of Compute and Memory
A fundamental limitation of current LLMs is the entanglement of computation and memory within a single parameter space. All knowledge—procedural, declarative, transient, and obsolete—is encoded in model weights, making updates expensive and specialization difficult.
In contrast, biological systems exhibit a clear separation between fast, specialized computation and large-scale, persistent memory. The Complementary Learning Systems theory formalizes this distinction, separating rapid acquisition from slow consolidation processes [15]. Analogous separations in artificial systems—through retrieval-augmented generation, external memory, or symbolic stores—allow small models to remain adaptive while accessing effectively unbounded information [16, 17, 18].
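The sketch below illustrates this separation in the retrieval-augmented style: declarative knowledge lives in an external store that is read at query time, so updating knowledge is a memory write rather than a weight update. The embed and small_model callables are assumed placeholders, not any particular library's API.

```python
# A sketch of compute/memory separation via retrieval: the small model's
# weights stay fixed, and knowledge updates are writes to external memory.
import numpy as np

class ExternalMemory:
    """Persistent store with cosine-similarity lookup over embedded facts."""

    def __init__(self):
        self.keys: list[np.ndarray] = []   # normalized embedding vectors
        self.values: list[str] = []        # stored declarative knowledge

    def write(self, key: np.ndarray, value: str) -> None:
        self.keys.append(key / np.linalg.norm(key))
        self.values.append(value)          # knowledge update: no retraining

    def read(self, query: np.ndarray, k: int = 3) -> list[str]:
        q = query / np.linalg.norm(query)
        scores = np.array([key @ q for key in self.keys])
        return [self.values[i] for i in np.argsort(-scores)[:k]]

def answer(question: str, memory: ExternalMemory, embed, small_model) -> str:
    """Fast, specialized compute reads from slow, persistent memory."""
    facts = memory.read(embed(question))            # retrieval step
    context = "\n".join(facts)
    return small_model(context + "\n" + question)   # weights stay fixed
```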
5. Asynchronous Cognition
Current LLMs operate synchronously: all reasoning occurs at inference time, triggered by an external prompt, and internal state is discarded afterward [19, 17]. In contrast, human cognition is asynchronous and predictive. Humans begin reasoning and planning before receiving complete contextual information, continuously updating hypotheses as new evidence arrives [20, 21]. This allows anticipatory inference, rapid adaptation, and efficient allocation of cognitive resources, rather than waiting for full context before acting or thinking.
Predictive processing theories posit that the brain constantly generates and updates a predictive model of the world, using incoming sensory signals to correct its expectations [22]. Similarly, LeCun has argued that autonomous intelligence requires agents to form an internal predictive representation of their environment, allowing them to evaluate actions and consequences asynchronously and in parallel with sensory input [19]. This predictive, asynchronous computation enables agents to plan, simulate potential outcomes, and maintain long-term situational awareness.
The absence of asynchronous, predictive mechanisms in contemporary AI limits an agent's ability to form long-term strategies, refine internal representations, and accumulate knowledge over time. Without them, models must process inputs sequentially and fully contextualize each inference, which increases latency, reduces adaptability, and prevents continuous background learning. Architectures that support event loops, predictive world models, and self-directed computation are therefore necessary to approach human-like adaptive behavior.
Incorporating predictive, asynchronous cognition enables AI systems to operate on partial information, update beliefs incrementally, and interleave computation with memory consolidation, reflecting a structural property of intelligence that current monolithic LLMs lack.
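A minimal sketch of such an event loop, using Python's asyncio: a background task keeps refining the agent's belief between observations, and incoming evidence corrects it through prediction errors. The exponential-smoothing update is an assumed toy rule, not a claim about how predictive processing should be implemented.

```python
# Asynchronous, predictive cognition as an event loop: reasoning runs in
# the background and is corrected incrementally as partial evidence arrives.
import asyncio

class PredictiveAgent:
    def __init__(self):
        self.belief = 0.0                # internal estimate of the world state

    async def background_reasoning(self):
        # Self-directed computation: runs even when no input has arrived.
        while True:
            self.belief *= 0.99          # toy extrapolation of the prediction
            await asyncio.sleep(0.01)

    async def observe(self, stream):
        # Incremental belief updates from partial information.
        async for x in stream:
            error = x - self.belief      # prediction error
            self.belief += 0.5 * error   # correct the expectation

async def sensor(values):
    for v in values:                     # evidence arrives piece by piece
        await asyncio.sleep(0.05)
        yield v

async def main():
    agent = PredictiveAgent()
    task = asyncio.create_task(agent.background_reasoning())
    await agent.observe(sensor([1.0, 2.0, 3.0]))
    task.cancel()                        # stop the background loop
    print(f"final belief: {agent.belief:.2f}")

asyncio.run(main())
```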
6. Continuous and Lifelong Learning
Traditional neural models, including most LLMs, are stateless during deployment: all learning occurs offline during training, and model weights remain fixed at inference time [1, 2]. This approach limits adaptability and makes agents brittle in dynamic environments, as new patterns or tasks cannot be incorporated without retraining.
In contrast, human intelligence exhibits continuous, stateful adaptation. Humans integrate new observations, experiences, and context into their internal models in real time, allowing rapid adjustment of behavior and predictions without wholesale re-learning [23, 24]. Translating this property to AI requires models or agents that maintain state across interactions, enabling inference-time learning—i.e., updating internal representations, memory, or policies incrementally as new data arrives, without full gradient-based retraining.
Several architectures support this principle:
Memory-augmented networks [17, 18] maintain external memory that can be written to and read from during inference, allowing the system to incorporate new facts dynamically.
Meta-learning approaches [25, 26] enable rapid adaptation of model parameters based on a small number of observations at deployment.
Continual learning algorithms [24, 23] mitigate catastrophic forgetting, supporting incremental updates to task-specific components while preserving prior competence (see the sketch after this list).
Prediction-driven state updates (predictive coding) [22, 19] allow agents to refine internal models continuously by comparing expectations with observed outcomes.
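To ground the third mechanism, here is a toy numpy sketch in the spirit of elastic weight consolidation [24]: a gradient step on a new task is penalized in proportion to how far it moves parameters that an importance estimate marks as critical for earlier tasks. All numbers and the diagonal importance vector are illustrative assumptions.

```python
# Toy continual-learning update in the spirit of EWC [24]: anchor
# parameters that mattered for earlier tasks while learning a new one.
import numpy as np

def ewc_step(theta, grad_new, theta_old, fisher, lr=0.1, lam=1.0):
    """One new-task gradient step, penalized for disturbing old knowledge.

    fisher holds a per-parameter importance estimate (diagonal Fisher);
    important parameters are pulled back toward their old values.
    """
    penalty_grad = lam * fisher * (theta - theta_old)  # forgetting penalty
    return theta - lr * (grad_new + penalty_grad)

theta_old = np.array([1.0, -0.5])   # parameters after the previous task
fisher = np.array([5.0, 0.1])       # first parameter was important before
grad_new = np.array([1.0, 1.0])     # new task pulls both parameters equally

theta = theta_old.copy()
for _ in range(50):
    theta = ewc_step(theta, grad_new, theta_old, fisher)

print(theta - theta_old)  # the high-importance parameter drifts far less
```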
Stateful, inference-time learning offers multiple advantages over traditional static models:
Adaptivity: Agents can respond to distributional shifts or novel tasks in real time.
Efficiency: Only relevant submodules are updated, reducing compute and energy overhead.
Groundedness: Models maintain a coherent internal state that can be queried and reasoned over across interactions, supporting long-term planning and memory-guided inference.
Implementing inference-time learning in practice could involve modular architectures where specialized submodels update local parameters or memory slots, guided by meta-learned adaptation rules or error-driven signals. Combined with asynchronous and predictive cognition, this enables AI agents to operate statefully, continuously, and autonomously, bringing them closer to the adaptive properties of biological intelligence.
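One hedged sketch of such a submodule follows, assuming fixed random slot keys for content-based addressing and a simple delta-rule write as a stand-in for the meta-learned or error-driven adaptation signals mentioned above: the core weights stay frozen while a few local memory slots absorb new information at inference time.

```python
# Inference-time learning in a modular submodel: frozen specialist weights
# plus writable local memory slots updated by an error-driven (delta) rule.
# The addressing scheme and update rule are assumptions for illustration.
import numpy as np

class AdaptiveSubmodule:
    def __init__(self, dim: int, n_slots: int = 4, lr: float = 0.2):
        rng = np.random.default_rng(0)
        self.W = rng.normal(size=(dim, dim)) / np.sqrt(dim)  # frozen weights
        self.keys = rng.normal(size=(n_slots, dim))          # fixed addresses
        self.values = np.zeros((n_slots, dim))               # writable memory
        self.lr = lr

    def _slot(self, x: np.ndarray) -> int:
        return int(np.argmax(self.keys @ x))   # content-based addressing

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Prediction = frozen computation + retrieved slot content.
        return self.W @ x + self.values[self._slot(x)]

    def adapt(self, x: np.ndarray, target: np.ndarray) -> None:
        # Local update: write the error into one slot only, leaving W and
        # all other slots untouched (no global gradient-based retraining).
        i = self._slot(x)
        self.values[i] += self.lr * (target - self.forward(x))

# Usage: repeated local writes drive the error down for this input.
m = AdaptiveSubmodule(dim=3)
x, target = np.ones(3), np.array([1.0, 0.0, -1.0])
for _ in range(20):
    m.adapt(x, target)
print(np.linalg.norm(target - m.forward(x)))  # approaches zero
```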
7. Outlook: Towards Practical Artificial Special Intelligence
The preceding analysis suggests that intelligence is defined less by raw parameter count and more by structural properties that enable adaptive, efficient, and grounded reasoning. Asynchronous cognition allows agents to process partial information and anticipate future states [20, 21, 19], continuous and stateful learning permits incremental adaptation at inference time [23, 24, 17], and the separation of compute and memory ensures that specialized knowledge remains accessible without overburdening the active reasoning process [15, 16, 14].
Together, these properties point to a modular, small-model paradigm as the most plausible substrate for Artificial Special Intelligence (ASI). Instead of attempting to encode all knowledge in a single monolithic model, intelligence emerges from collections of specialized, task-aligned submodels that can remain continuously updated, coordinate through shared memory, and execute asynchronously. This architecture aligns with both biological precedent—where human cognition is distributed and specialized—and computational efficiency: small models are cheaper to update, easier to adapt, and more robust to dynamic environments than large monolithic systems.
Building on this perspective, we propose the following research hypotheses and future directions:
Specialization is fundamental: Small, task-aligned models can achieve greater adaptive efficiency than equivalently parameterized monolithic models.
Separation of compute and memory enhances groundedness: Offloading irrelevant or long-term information to external memory while keeping active reasoning focused enables better generalization and context-sensitive inference.
Asynchronous, predictive processing improves planning and anticipation: Agents capable of forming internal predictions based on partial inputs will outperform strictly synchronous models in dynamic environments.
Inference-time, continual learning reduces catastrophic forgetting and supports long-horizon adaptation: Agents that update state continuously can maintain relevance in non-stationary environments.
Evaluation and benchmarking: Future work should define benchmarks that measure long-horizon adaptation, memory usage, task efficiency, and robustness under changing contexts.
Collectively, these ideas provide a principled framework for ASI and offer strong justification for small, specialized models as the future of adaptive intelligence. By focusing on architectural structure, statefulness, and dynamic learning, this approach moves beyond the limitations of monolithic scaling and points toward deployable, efficient, and adaptive AI systems.
References
[1] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeff Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
[2] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 2022.
[3] Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021.
[4] Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. Building machines that learn and think like people. Behavioral and Brain Sciences, 40, 2017.
[5] Gary Marcus and Ernest Davis. Rebooting AI: Building Artificial Intelligence We Can Trust. Pantheon Books, 2019.
[6] Adam Smith. An Inquiry into the Nature and Causes of the Wealth of Nations. W. Strahan and T. Cadell, 1776.
[7] Friedrich A. Hayek. The use of knowledge in society. The American Economic Review, 35(4):519–530, 1945.
[8] Michelene T.H. Chi, Paul J. Feltovich, and Robert Glaser. Categorization and representation of physics problems by experts and novices. Cognitive Science, 5(2):121–152, 1981.
[9] K. Anders Ericsson, Ralf Th. Krampe, and Clemens Tesch-Römer. The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3):363–406, 1993.
[10] Jerry A. Fodor. The Modularity of Mind. MIT Press, 1983.
[11] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019.
[12] Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, and Donald Metzler. Efficient transformers: A survey. ACM Computing Surveys, 2022.
[13] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. arXiv preprint arXiv:2305.14314, 2023.
[14] NVIDIA Research. Small language models are the future of agentic AI. arXiv preprint arXiv:2506.02153, 2025.
[15] James L. McClelland, Bruce L. McNaughton, and Randall C. O'Reilly. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419–457, 1995.
[16] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv preprint arXiv:2005.11401, 2020.
[17] Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.
[18] Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. arXiv preprint arXiv:1410.3916, 2014.
[19] Yann LeCun. A path towards autonomous machine intelligence. OpenReview preprint, 2022.
[20] Dana H. Ballard, Mary M. Hayhoe, Polly K. Pook, and Rajesh P. N. Rao. Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20(4):723–742, 1997.
[21] Jeffrey L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.
[22] Karl Friston. The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11:127–138, 2010.
[23] German I. Parisi, Ronald Kemker, Jose Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. Neural Networks, 113:54–71, 2019.
[24] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
[25] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017.
[26] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
