The three phases of AGI evolution


One reason the emergence of AGI (Artificial General Intelligence) has been difficult to predict is that AGI depends on a constellation of innovations arriving in a specific sequence. Given sufficient knowledge of the current state of technology, an analyst can reasonably predict the next incremental advance; it is far less obvious when and how paradigm-shifting technologies will come to the fore. There are several examples of such paradigm shifts that were not widely expected until they landed.

One example is the personal computer. The miniaturization of processors and storage, the introduction of lightweight floppies, the ability to connect these machines to cost-effective computer screens, the invention of general-purpose, user-friendly single-user operating systems such as DOS, and the introduction of point-and-click interfaces and window-based UIs all contributed to the PC becoming as typical a household device as the flat-iron. All of these innovations had to occur, and in a specific order, to make the PC a desirable and affordable product and, eventually, a must-have for the contemporary household.

We can trace an analogous history in the emergence of mobile phones. Further miniaturization of processors and the resulting power savings, better batteries, even smaller storage form factors, more resilient screens at higher resolution, cellular data networks with sufficient bandwidth, network protocols that could support robust services on internet-connected devices, and touch interfaces on displays all had to happen for the iPhone, and eventually Android, BlackBerry, and Microsoft devices, to add up to a multi-billion-dollar market.

Each of these innovations was non-trivial to predict on its own, but predicting the confluence of all these components coming together at just the right time was a daunting challenge. We face a similarly daunting challenge with AGI today.

Walking, talking, joke-cracking, hand-shaking, and business-deal-making AGI will not happen overnight. No single algorithmic or technical innovation will make this technology possible. Rather, it is reasonable to expect that it will arrive in phases. Each phase will require its own constellation of innovations, and each phase will enable the next.

There are three major phases that AGI evolution will likely traverse on its way to becoming the next household flat-iron.

The first phase we can call “Comprehension AI”. Comprehension is the crudest and most fundamental form of AGI. Through comprehension, the AGI can, given sufficient input, surmise the context of its environment, identify objects in this context, and abstract specific objects into their generic representative models (think how you and any other person can recognize reading glasses of different shapes and colors as a variant of the generic “glasses”). Importantly, in comprehension, the AGI can learn new models, objects, and contexts.

If you are thinking that this sounds similar to what existing AI/ML algorithms can do, you are right. Today’s AI/ML algorithms are exciting precisely because they implement a portion of AGI comprehension. However, we are still missing key elements needed to stand up Comprehension as a working technology. These include the ability to abstract observed patterns into hierarchical models, the ability to “imagine” models analogous to those found in the real world, the ability to assess the appropriate context of a given situation, the ability to recall situations and patterns from the past, and finally the ability to make decisions and perform actions based on all of the above. Today’s AI certainly captures some of the key flavors, but comprehension is a sophisticated stew with multiple dimensions of taste and mouthfeel, and it will take several leaps of innovation to achieve.

On its own, comprehension is not very useful. We can think of basic comprehension as a wild beast in the jungle: perhaps a bird, with teeth and big ears (all imaginary creatures are more fun with big ears). The bird can make sense of its surroundings, navigate them, and learn where it is safe to fly and where there may be danger. Our bird, with its walnut-sized brain, can’t do much, but it is already smarter than the latest self-driving cars.

Just as with wild animals, we can interact directly with a Comprehension AI only in limited ways. Giving the command to sit, followed by vigorous hand-waving and the offer of a treat, may, with varying degrees of success, get our AGI to proverbially park its hindquarters, perhaps with less slobber and shedding. Going further requires the next phase of AGI evolution: socialization.

The crux of socialization is language, that is to say, a representation of internal models that can be shared with others. Socialization builds on top of comprehension, since language is simply an extension of the existing model landscape in which certain models are tagged or labeled with some audio/visual element. “Cat”, “dog”, and “car” are audio/visual representations of the abstract models of cat, dog, and car. Once language exists, our AGI can tell us how it perceives the world, what objects it understands in the scene, what utility it thinks these objects have, and so on. It can share with us the contents of its imagination, its goals, and its plans for achieving those goals.

Although language will enable socialization, a properly interactive AGI will also need basic human social instincts to direct its concept formation into patterns familiar to people. To that end, we will need to digitally engender the concepts of empathy, tribalism, social hierarchy, fairness, authority, and conformity, to name a few. Without these built-in concepts, we could at best mimic human behavior superficially. For an AGI to spontaneously generate human-like responses in appropriate situations, these innate drives will be a fundamental requirement.

An AGI equipped with comprehension and language will be akin to a curious toddler or pre-teen, running around asking questions, drawing conclusions (many of them false), making mistakes, and learning from those mistakes. It will be a fun system to be sure, fascinating even, but still not very useful for tasks beyond those you could entrust to a 7-year-old, albeit one without fine motor control. For AGI to become more than a novelty, indeed to become a useful contributor to our society, it will require specialization.

Specialization is the third level of behavioral complexity, and it builds on top of comprehension and socialization. In specialization, the AGI uses comprehension and socialization to build competency in one or more skill sets, much as people learn new skills: first by learning the theory or process behind the skill (aligning the learning to its innate collection of models), then by watching the skill in action, and finally by attempting the skill and becoming proficient through practice.

In the specialization phase, we can begin to train AGIs to become doctors, lawyers, and babysitters. Unlike a human learner, an AGI’s learning will be replicable: once an AGI reaches some acceptable level of proficiency in a skill, it can be copied as fast as one can copy a storage device. The learning process can also be “forked”, creating skillful AGIs with different specializations. Moreover, AGIs could, in theory, teach each other separately learned skills through a more optimized “language”, significantly reducing the time it takes each successive generation of AGI to learn a new and more complicated skill. At this point, AGIs can be expected to branch off into a variety of functional, fully interactive digital agents.

Each of the three phases of AGI development will require innovations in algorithms, hardware, and perhaps even theories of mind. Just as the first PCs were slow to take off and were primarily the domain of the techie enthusiast, we should expect comprehension AI, and even socialization AI, to be useful in only limited domains. However, once comprehension, socialization, and specialization are stood up and working together, we will see commercialization take off at mind-blowing speed. Suddenly, much like PCs in the 1990s or mobile phones in the 2000s, AGIs will become a pervasive and ever-present part of our daily lives, at home, at work, and everywhere in between.