Summary
I currently find it most plausible that future artificial general intelligence (AGI) will look somewhat like an amplified version of present-day trends in computing and governance. A reasonable default way to envision the future seems to be to extend the kinds of things we see today.
Caveats
It's very difficult to write anything original on most topics, including the topics of AGI and futurism. No doubt what I say here has been discussed numerous times before. For example, my stance is in many ways similar to Eric Drexler's framework of "Comprehensive AI Services" (of which I have only read two summaries, not the original monograph), although I don't agree with all of Drexler's specific views.
I should also note that I haven't followed the literature on AGI and AGI safety very closely during the second half of the 2010s, so what I write here may be somewhat out of date. This piece should be considered a random dump of some ideas rather than a product of thorough research.
Is AGI sharply different from narrow AI?
In science fiction about AGI, a common trope is the idea that at some point, a mere automaton "wakes up" or "becomes sentient" and then suddenly has human-like general intelligence. While I doubt that many serious thinkers hold a view this simplistic, there is a more reasonable debate about whether there's a relatively crisp point at which narrow artificial intelligence (AI) becomes qualitatively different as it grows into being full AGI.
I think there is something qualitatively different about general intelligence compared with narrow intelligence. Humans can read articles on just about any topic and come to understand them. Neither a rabbit nor a present-day laptop can do this. For example, rabbits and laptops couldn't learn calculus merely by watching enough math lecture videos. (Of course, there may well be higher levels of general intelligence inaccessible to human minds as well.)
However, I personally suspect that the transition from "narrow" to "general" intelligence is gradual, resulting from lots of small steps in cognitive ability. This is perhaps the core point of contention in the debate, and there are no knock-down arguments on either side. I think my main intuition in favor of gradualness comes from the fact that intelligent cognition, like the economy or a present-day computer, seems to be an emergent product of large numbers of components interacting in complex ways. This is in contrast to something like the invention of artificial flight or nuclear weapons, which are based on insights into relatively simple principles of physics. There was a discontinuous, qualitative jump in the performance of human-made aircraft in a short time, but as far as I'm aware, there has never been a similarly sharp jump in the size of the world economy or the capabilities of human-made computers.
I assume this is because even if you make a huge breakthrough along one dimension (e.g., explosive growth of one industry in a short period of time), that dimension is only a small piece of the larger whole. For example, suppose that the world economy is a sum of 100 different components that are each equally important. Even if one of the components grows by 10,000% in one year (i.e., becomes 101 times its former size), if the other components only grow by, say, 2%, then the overall multiplier on the size of the economy in that year is merely 0.01 * 101 + 0.99 * 1.02 ≈ 2.
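As a quick sanity check on that arithmetic, here's a minimal Python sketch of the weighted-average calculation; the component count and growth rates are just the hypothetical numbers from the example above:

```python
# Toy version of the example above: one of 100 equally weighted components
# of the economy grows by 10,000% in a year while the rest grow by 2%.
num_components = 100        # hypothetical number of equally important components
boom_multiplier = 101       # growth of 10,000% means the component becomes 101x larger
normal_multiplier = 1.02    # the other components grow by 2%

overall = (1 / num_components) * boom_multiplier + \
          ((num_components - 1) / num_components) * normal_multiplier
print(overall)  # ~2.02: the economy roughly doubles rather than exploding
```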
I expect something similar to be true for the growth in capabilities of intelligent systems. And indeed, this is what we've observed historically, both in the overall economic output of the information-technology sector and in the performance of narrow-AI systems on most specific tasks, like speech recognition or web search.
In other words, I expect that narrow AI will transition into "AGI" step by step. For example, DeepMind's 2015 Atari-playing AI was more general than past game-specific AIs, though it was still "narrow" in the sense that it couldn't do most other tasks. OpenAI's 2019 GPT-2 language model was general enough to perform "rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training" (Radford et al. 2019), though it was still narrow in most other senses. Following this trend, we should expect to see a gradual widening of the spheres of generality of AIs, plausibly with no single point at which there's a discontinuous jump in generality.
Continuing current trends as the default model
I think the simplest model of our technological future is that it roughly continues present-day trends, just at a much faster pace and with ever-greater complexity of the systems involved. This default assumption is probably wrong; several unforeseeable black swans will likely occur along the way. But unless we have strong reason to expect qualitatively different behavior of a specific kind at some point, it seems reasonable to focus on something like the default model. In a similar way, Occam's razor applied to science is very often "wrong" in the sense that reality is in fact quite complicated above the level of fundamental physics, but it's still methodologically wise to assume the simple hypothesis first and then update it when evidence forces you to.
In particular, I think a plausible vision of AGI is that it's a gradual accumulation of greater and greater automation throughout all parts of human life. This includes automation of specific tasks, like truck driving or psychotherapy, as well as automation of meta-level tasks like controlling other automated systems.
This is basically how our computers work: they have programs to do specific things (e.g., office software or music players) as well as programs to coordinate/monitor other programs (e.g., operating systems or network firewalls). Many of these programs run silently in the background without most human users having any idea of what they do.
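As a loose illustration of this layered structure (not a description of any real operating system), here is a minimal Python sketch in which a hypothetical supervisor restarts special-purpose "worker" tasks that fail, analogous to coordination and monitoring programs quietly running in the background; all of the names and behaviors are made up:

```python
import random

# Hypothetical special-purpose "programs".
def play_music():
    return "playing music"

def edit_document():
    if random.random() < 0.2:               # simulate an occasional crash
        raise RuntimeError("editor crashed")
    return "document edited"

# A toy coordination/monitoring layer, loosely analogous to an operating
# system or watchdog that quietly handles failures in the background.
def supervise(workers, rounds=3):
    for _ in range(rounds):
        for name, task in workers.items():
            try:
                print(name, "->", task())
            except RuntimeError as err:
                print(name, "failed:", err, "(will retry next round)")

supervise({"music_player": play_music, "office_app": edit_document})
```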
This is also how the larger economy works: industries specialize in various products or services and interact with one another. Coordination and monitoring among these actors is done by trade associations, governments, nonprofit organizations, and so on.
Arguably, this is even how the human brain works. The brain is an agglomeration of many subsystems that work together (sometimes tightly, sometimes loosely). Humans can be modeled to some degree as "unified agents", but this abstraction often breaks down, because humans have many competing goals, interests, moods, loyalties, etc., some of which dominate at some times while others dominate at other times. Likewise, companies or governments can to some degree be modeled as unified, rational agents, but this abstraction fails to account for internal political disputes, the rise and fall of social movements over time, and so on.
So, too, I think futuristic AI systems will to some degree be approximable as rational agents with various goals, but at least in the short run, these systems are likely to be emergent results of complex underlying interactions, without any overall utility function being written down somewhere. (It's possible that such systems would eventually aim to make their goals crisp. By analogy, humans create concrete laws, mission statements, and ethical systems to formalize their goals.)
AI safety in this framework
If the future of AI looks like a continuation of present-day trends, then the future of "AI safety" would presumably also look like an extension of the kinds of things that occur now.
Automated systems will presumably continue to be tested mostly through experimentation to see whether they behave appropriately. As the generality of automation increases, it might become increasingly difficult to identify the source of particular failure modes for AIs. Sometimes the root causes will be fixed, and sometimes people may develop kludges around them, such as having additional automation that watches for and guards against undesirable behavior.
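To make that last idea a bit more concrete, here is a hedged Python sketch of a hypothetical guard layer that screens an automated system's proposed actions against a simple blocklist before executing them; the action names and blocklist are invented for illustration, and real guard systems would be far more elaborate:

```python
# Hypothetical guard layer: extra automation that vetoes actions matching
# known failure modes before the underlying system can execute them.
BLOCKED_ACTIONS = {"delete_all_files", "disable_logging"}   # illustrative blocklist

def proposed_actions():
    # Stand-in for whatever actions the underlying automation wants to take.
    return ["compress_logs", "disable_logging", "send_report"]

def guarded_execution(actions):
    for action in actions:
        if action in BLOCKED_ACTIONS:
            print("guard: blocked", action)
        else:
            print("executing", action)

guarded_execution(proposed_actions())
```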
One concern with trying to get AI safety mostly by practical experimentation is the so-called "treacherous turn": as AIs grow more sophisticated, they may do things that outwardly please their human owners, while secretly plotting to do things humans don't want once they get more power. A framework that predicts the future based on present trends should agree that treachery is an issue, because it's already commonplace today. Humans (and even other animals) deceive each other all the time in small and large ways. Software can also be treacherous. For example, some malware serves a legitimate purpose while also doing shady stuff behind the scenes.
Often, malware researchers study potentially malicious programs inside a virtual machine environment (VME), to avoid infecting their main computers. Liston and Skoudis (2006) explain (p. 4):
Because so many security researchers rely on VMEs to analyze malicious code, malware developers are actively trying to foil such analysis by detecting VMEs. If malicious code detects a VME, it can shut off some of its more powerful malicious functionality so that researchers cannot observe it and devise defenses. Given the malicious code’s altered functionality in light of a VME, some researchers may not notice its deeper and more insidious functionality.
We are seeing an increasing number of malicious programs carrying code to detect the presence of virtual environments.
This is a simple version of the treacherous turn: behave nicely while someone is watching, but execute your real goal when people won't notice.
Of course, presumably most if not all malware treachery so far has been explicitly engineered. But it seems plausible that complex learning systems will also stumble upon treachery of increasing complexity. The fact that treachery itself may be a rather graded concept that can increase in sophistication over time agrees with my general assumption that these trends will ramp up gradually.
Devising AI algorithms that are inherently less prone to treachery will likely be part of the solution. However, unless there are creative tricks I haven't thought of, the problem seems quite difficult, because complex systems engaged in broad learning and self-modification seem to inherently have some nontrivial risk of goal drift. By analogy, even if you indoctrinate someone in an ideology from childhood, that person may eventually come to disavow his past beliefs.
So AI treachery is likely to remain an obstacle to AI safety. This includes both human-engineered treachery (AIs instilled with one set of values that pretend to have a different set of values, such as for purposes of infiltrating an enemy organization) and accidental treachery that emerges spontaneously from misconfigured goals or goal drift. The treachery problem will likely necessitate significant efforts at detection and mitigation. Therefore, I expect that a decent fraction of AI-safety work will be an extension of present-day computer security. Treachery could be combated by a combination of automated monitoring systems and human security analysts using software tools, which themselves may be augmented with some degree of AI.
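As a rough illustration of what the simplest form of such automated monitoring might look like, the following hypothetical Python sketch flags action types that occur far more often than a system's historical baseline would suggest, in the spirit of present-day anomaly detection; the action names, counts, and threshold are invented, and actual tools would be vastly more sophisticated:

```python
from collections import Counter

# Hypothetical baseline: how often an AI system performed each kind of action
# during a trusted observation period.
baseline = Counter({"read_sensor": 950, "write_log": 40, "open_network_conn": 10})

# Recent behavior to audit.
recent = Counter({"read_sensor": 900, "write_log": 45, "open_network_conn": 300})

def flag_anomalies(baseline, recent, ratio_threshold=5.0):
    """Flag action types whose recent frequency far exceeds the baseline rate."""
    total_base = sum(baseline.values())
    total_recent = sum(recent.values())
    for action, count in recent.items():
        base_count = max(baseline[action], 1)   # avoid division by zero for unseen actions
        base_rate = base_count / total_base
        recent_rate = count / total_recent
        if recent_rate / base_rate > ratio_threshold:
            print("anomaly: unusually frequent action:", action)

flag_anomalies(baseline, recent)   # flags "open_network_conn" in this toy example
```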
I think this is consistent with how cybersecurity works today. Software is not usually designed from the ground up with a really secure architecture and thorough research into all possible vulnerabilities. Instead, people write complex software, test it out to some degree, start using it, watch out for bugs and security exploits, write patches, watch for more vulnerabilities, patch again, and so on. This can be supplemented by various other security measures, both to prevent exploits and limit the damage if they occur. Very few software systems have bulletproof security from the outset (or ever, for that matter).
While individual components of cybersecurity like encryption algorithms can be mathematically proved to work, practical cybersecurity as a whole is a huge mess. In real-world computer systems, "There is no such thing as perfect security, only varying levels of insecurity" (Salman Rushdie, qtd. in Singh 2012). Given that future AI systems will be more complex and more subsymbolic than most present-day software, I expect AI security to be even harder. Of course, people will also have more powerful tools for monitoring other software systems, to detect both unintentional and intentional misbehavior.
I think Eliezer Yudkowsky would disagree with the above discussion. In Harris (2018), Yudkowsky says (at 1h31m34s):
There's no continuity between what you have to do to fend off little pieces of code trying to break onto your computer and what you have to do to fend off something smarter than you. These are totally different realms and regimes and separate magisteria [...] of how you would even start to think about the problem. We're not going to get automatic defense against superintelligence by building better and better antivirus software.
There is indeed a huge gulf between present-day malware and superintelligence, but I think the path between them plausibly will be somewhat continuous, as AIs become smarter in incremental steps. Along with more capable AIs will come more capable tools for understanding, monitoring, and controlling AIs—not just antivirus software but also debuggers, visualization tools, institutional practices, etc.
History has already shown that advanced versions of "antivirus scanning" can work against human-level intelligent agents. One of many examples was the East German Stasi, which "has been described as one of the most effective and repressive intelligence and secret police agencies ever to have existed. [...] The Stasi employed one secret policeman for every 166 East Germans[ and] counting part-time informers, the Stasi had one agent per 6.5 people" (Wikipedia "Stasi"). Of course, this abstract analogy between totalitarian repression and anti-malware software should not be taken as downplaying the fear and brutality endured by human victims of the former. If anything, the comparison raises ethical questions about using force to control AIs who have near-human-level or human-level intelligence. Rather than seeking and destroying traitorous AIs, it seems ethically preferable to prevent AIs from undergoing goal drift in the first place, insofar as this is possible. One could also aim to build AIs who, unlike human dissidents, don't mind being terminated, which is part of the concept of "corrigibility" in AI alignment. Of course, ensuring this property in complex AIs may be difficult.
In the above discussion, I focused a lot on the treachery problem, though AI alignment involves numerous additional challenges. I expect that many of these other problems would similarly be grappled with in an iterative fashion based on experimentation, with failures being noticed, researched, and addressed as they occur. That said, it's certainly plausible that conceptual work of the type that the AI-alignment community has already been doing is pretty helpful in terms of drawing early attention to these topics.
The long run
I do worry that the picture of AI systems that I've painted might be too anthropomorphic and lacking in creativity, since it so closely resembles how human organizations work. Human societies have workers, people who oversee and coordinate other workers, systems to detect problems, and systems to detect deception. By some miracle, the interaction of all these agents, with some agents watching and stopping the efforts of others, produces a functional (if not necessarily ethical) overall society. I expect that machine intelligence will continue a roughly similar trend, with lots of interacting agents playing various roles, many in conflict with one another, with no global utility function being optimized for.
In the very long run, plausibly this would change. Artificial minds can be controlled in ways that would be much harder with biological creatures. For example, with software, it's possible to dramatically rewrite the architecture of minds, tweak parameters, reset minds to previous states, and achieve relatively complete surveillance of the internal computations that AIs run (although there will also be increasing sophistication in disguising those internal computations). Perhaps a society of interacting AIs would eventually coordinate around a more organized, top-down social structure to put an end to competition.
Predicting the future based on analogies to the past doesn't speak against the possibility of a small group of actors "taking over the world". For example, perhaps the USA could have taken over the world when it had a monopoly on nuclear weapons in 1945, had it aimed to do so. And to the extent the USA can be considered the global hegemon since the 1990s, it did kind of take over the world anyway. Cabals and even individual strongmen have seized control of entire countries throughout history. While no past leader has maintained permanent control, it's plausible that an AI future could be different because machine intelligence can be more completely re-engineered than human brains can. By analogy, except in the case of malware or extreme software bugs, our computers never "rebel" against us, because they're designed not to. Google and Facebook have never lost their hegemony over the collection of worker machines that they use, even if individual machines have "gone rogue" from time to time.
Implications of this framework
The framework I've sketched predicts that developments in AI will proceed step by step. There will be incremental gains in AI performance, which will have ripple effects throughout society. Various safety and security risks will be identified, and patches and precautions will be put in place. Advances in narrow AI will dramatically reshape the world, and as a result, large amounts of attention by governments, academics, journalists, and others will be devoted to these kinds of topics, making it relatively difficult for a small group of altruists to have an outsized influence.
Complex systems are hard to predict, and the AI future that I envision will be an extremely complex system, with large numbers of interacting components. This makes it difficult for altruists to know how to positively influence such a future, although very broad trends might still be discernible. For example, greater international cooperation may change the dynamics of AI development in not-totally-random ways. If society cares more about a given moral value, it seems more likely that value will transfer over to AIs (although whether existing moral values can survive intact through the tumult of the coming centuries is dubious).
One could make an argument that even if the above scenario is the most likely to happen, there might be higher utilitarian expected value in focusing on improving AI scenarios where the future is less chaotic, where high-level theoretical advances in AI safety have relatively more impact, and where AI comes quickly and silently enough that there won't be an extraordinary amount of mainstream attention devoted to it. This is a reasonable viewpoint, though I feel uneasy about it. Rather than wagering on what I see as a relatively less likely AI scenario, I would rather think more about the space of AI scenarios in general, to get a better grasp of the landscape of possibilities before taking action on a specific one. (Other people who have thought more about these issues may have already progressed to the "let's take action relative to a specific scenario" stage.)