If you haven’t heard yet, OpenAI, an organization that gained widespread news coverage for its large financial backing and its mission to ensure that strong AI benefits humanity, has shifted from a non-profit to a “Capped Profit” entity. This has led to a lot of discussion about the motivations behind the move.
There is a lot of nuance to such changes, and I cannot pretend to understand the economics and details of their specific implementation. Nevertheless, I claim that this decision, and experiments like it, are a very important component of safe AI research.
To the public, a dangerous AI is actively malevolent. From the Terminator’s robotic grin to the human power plants of the Matrix, machines have malfunctioned and are out to get us.
People more familiar with machine learning realize that an AI does not have to be evil, nor does it need a system error to be a danger to humankind. An indifferent AI can be equally destructive. This is beautifully summed up by Eliezer Yudkowsky’s quote:
The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.
The example of a paperclip maximizer is likewise chilling: when given the task of manufacturing as many paperclips as possible, a superintelligent AI could enslave all humans to work in paperclip-producing factories, or wipe them out entirely to stop them from interfering with its plans (you can see this from the AI’s point of view in this very addictive game).
The question of safe AI is therefore: is it possible to set up the machine’s goals so that they will be aligned with ours? How do we ensure that the machines, once they arrive, will be beneficial to humanity?
Setting up the correct goals turns out to be much more difficult than it initially seems, because the goal must be encoded as a number to be optimized, called a “loss function” (although there is research on alternatives). We can’t currently tell an AI to “make the world better”. We can only tell it to minimize infant mortality, maximize GDP, or minimize the number of people dying of cancer. Notice how easy it is for a machine to misunderstand our intent: one way to achieve all of these goals at once is to kill all humans and convert the entire country into factories. Then nobody would die of cancer, no babies would ever die again, and the GDP would be astronomical.
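To make this concrete, here is a deliberately silly toy (my own illustration, with made-up numbers): an optimizer asked to minimize a proxy loss, “number of people dying of cancer”, with nothing in the objective that values keeping people alive.

```python
def proxy_loss(population, cancer_rate=0.002):
    """The only thing we measure: yearly cancer deaths."""
    return population * cancer_rate

# Search over candidate "policies", represented here simply by the
# population size each policy results in.
candidates = range(0, 8_000_000_001, 1_000_000)
best = min(candidates, key=proxy_loss)
print(best)  # 0: eliminating everyone drives the measured loss to its minimum
```

The optimizer is doing exactly what it was told; the problem is entirely in what we chose to measure.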
This is not just theoretical. Anecdotes abound of researchers wanting artificial agents to do one thing, and having them do something completely different. In fact, even OpenAI had a post dedicated to this issue, where they trained a neural network to play a racing game by having it maximize score. Instead of finishing the race, the AI found that it could maximize score by going in a circle, collecting points by repeatedly crashing into things:
Imagine a genie in a bottle trying to do the least amount of work: when you say “I wish to maximize X”, doing what you actually want must be the easiest possible way of achieving X.
An organization or company can be seen as an AI built of humans: a machine whose computational elements are human brains. I am certainly not the first to propose this, but I believe that I can convince you that this isn’t just a surface-level similarity.
The philosophy of groups and entities is an attractive topic of discussion (When does a group of cells become an organism? When does a group of people become a different entity altogether?), but my approach here will be more direct and mundane. I claim that the structure of an organization literally encodes a learning algorithm that minimizes its loss function, whatever it might be.
To show this, we can create a toy organization structure that roughly encodes the idea of people working for a company. The model uses the mechanic of employees being promoted and fired based on whether they help or hurt the “bottom line” (i.e., the loss function). You can see the full model specification here:
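A minimal sketch of this mechanic (my own reconstruction with illustrative parameters, not the original model specification) can be written in a few lines:

```python
import random
import statistics

def simulate_org(n_employees=100, n_steps=500, turnover=5, seed=0):
    """Each employee has a 'focus on the loss function' score, drawn
    from a standard normal over the general population. Each step, the
    least-focused members are fired and replaced by random hires."""
    rng = random.Random(seed)
    org = [rng.gauss(0, 1) for _ in range(n_employees)]
    for _ in range(n_steps):
        org.sort()
        org = org[turnover:]  # fire the least-focused members
        org += [rng.gauss(0, 1) for _ in range(turnover)]  # hire at random
    return statistics.mean(org)

print(simulate_org())  # average focus ends up roughly 2 std devs above the population
```

No individual hire needs to be exceptional: selection pressure alone concentrates the organization in the tail of the distribution.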
Summarizing the results, we get the following:
This plot shows a curve familiar to everyone who has played with machine learning. At the beginning, the organization’s employees are chosen at random. Over time, those most focused on the organization’s goals are promoted, and those without the same singular focus are fired. The result is an organization whose average member is more than two standard deviations above the general population in focus on the company’s loss function.
Just like with an AI, this is not bad by itself - all it says is that the organization will end up behaving much like an AI given the same task. If the organization’s loss function stands against the benefit of humanity, it will select for members who are either good at ignoring or excusing their cognitive dissonance, or who simply care more about having a job to feed their family than about the nebulous fate of faceless numbers in a database. To clarify: individual humans are known not to behave as rational self-interested agents in these situations, which makes the corporation’s predicted behavior distinctly nonhuman.
It could be argued that since people make up an organization, there is only so far the organization could go before its constituent elements would rebel. While there might be an element of truth to this, it looks like the answer to “how far” is far, FAR beyond what can be considered acceptable:
Our malleability and ability to cooperate as part of a group might plausibly be the single most important trait that has led to the rise of human civilization¹, but it also makes organizations and groups of humans behave in sometimes terrifying and unscrupulous, even murderous ways - exactly the traits that we fear will show up in AI.
To end this section on a positive note, it does seem like there might be a line most humans will not cross, no matter what their group’s goals, and that is performing an action that will guarantee total human extinction. So indeed, unlike a “pure” AI, a machine made of humans does have a limit… somewhere between “kill all the Jews/Cambodians/Tutsis/etc.” and “sterilize the planet”.
Maybe that’s not such a positive note after all…
Let’s recap. I have established that organizations behave like non-human AIs, optimizing whatever goal they were tasked with: even just hiring and firing workers is sufficient to guarantee this. I have also claimed that these machines being made of humans does not stop them from behaving in decidedly anti-human, or even genocidal ways, if that ends up being what they try to optimize.
The question, then, is what are corporations optimizing? I will focus on corporations here, but the same ideas apply to governments, religious groups, and any other large organized group of humans.
I’ll use the example of Facebook, since it seems to be a popular scapegoat these days. Facebook’s purpose, according to its certificate of incorporation, is:
The purpose of the corporation is to engage in any lawful act or activity for which corporations may be organized under the General Corporation Law of the State of Delaware (“General Corporation Law”).
There is nothing about social networks here - Facebook as an entity exists to do whatever it damn-well pleases, so long as it is not explicitly against the law. The company could completely restructure itself as a producer of lollipops or machine guns if it so desired. ML practitioners will realize that this isn’t a loss function at all - it is a regularizer.
To those unfamiliar with the terminology, a regularizer is a term added to the loss function to nudge an optimizer, which might otherwise converge to an undesired result, towards a solution with good properties. The regularizer here consists of the laws and regulations governing corporate behavior: a corporation is punished if it dumps toxic waste into rivers, or otherwise behaves in a manner society deems unacceptable.
Of course, examples immediately come to mind of corporations caught doing all sorts of illegal things. This is because a penalty for breaking the law is only paid if one is caught. Just as a regularizer multiplied by 0 has no effect, laws have no effect unless there is a non-negligible chance of them actually being enforced, and unless the penalty outweighs the benefits of the unwanted behavior².
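The enforcement argument can be sketched as a one-line expected-value calculation (illustrative numbers only, not data about any real company):

```python
def pays_to_break_law(benefit, fine, p_enforcement):
    """A corporation-as-optimizer ignores a regulation whenever the
    expected penalty is smaller than the gain from breaking it."""
    expected_penalty = p_enforcement * fine
    return benefit > expected_penalty

# A never-enforced law is a regularizer multiplied by zero:
print(pays_to_break_law(benefit=1_000_000, fine=100_000_000, p_enforcement=0.0))   # True
# The same fine with a 5% chance of enforcement changes the optimum:
print(pays_to_break_law(benefit=1_000_000, fine=100_000_000, p_enforcement=0.05))  # False
```

Note that raising either the fine or the probability of enforcement works; what matters is the product.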
That being said, Facebook’s loss function is not fully specified by its articles of incorporation. Let’s look at the company mission:
Facebook’s mission is to give people the power to build community and bring the world closer together. People use Facebook to stay connected with friends and family, to discover what’s going on in the world, and to share and express what matters to them.
Whatever you think of Mark Zuckerberg, thus far his company has been faithfully optimizing this objective, despite the darker parts of human nature it has exposed. Even people at the company noticed that the goal seemed to be blindly pursued, which was raised in 2016 with The Ugly memo:
We connect people. Period. That’s why all the work we do in growth is justified. All the questionable contact importing practices. All the subtle language that helps people stay searchable by friends. All of the work we do to bring more communication in. The work we will likely have to do in China some day. All of it. (…) That can be bad if they make it negative. Maybe it costs someone a life by exposing someone to bullies. Maybe someone dies in a terrorist attack coordinated on our tools.
The mission can be seen as the current loss function that is powering the corporation. This isn’t the end, though - there is another level above the company’s mission.
In a public corporation, shareholders hold the power to modify the mission. This is known as a “pivot”. Big investors, such as banks, seem largely interested in ensuring that the mission statement, whatever it may be, guarantees profit. Notice that shareholders here are not people³ - they are usually other corporations and organizations.
Corporate law and AI safety have much more in common than is initially evident: they both have the goal of ensuring superhuman entities are beneficial to humanity. The only difference between a corporation and a strong computer-based AI is that a strong AI is hypothesized to be much faster.
If you want to research safe AI, don’t waste time with hypotheticals. You have a very concrete problem statement: how do you set up government regulations, default corporate law, and the definition of a corporation to guarantee that the resulting entity won’t behave in an anti-human manner? If you can’t manage it in the real corporate environment, then you have no chance against an entity that thinks and acts thousands of times faster than a corporation.
Similarly, board members of a corporation should be aware that the corporation will interpret the company’s mission literally, and blindly. Sure, maybe the current CEO has humanity’s best interests in mind, but the company is not the CEO. The CEO can die, leave, or be fired. Each such departure and hire is just another step in the corporate optimizer.
The same reasoning suggests that jailing an executive for crimes committed in pursuit of the company’s mission is ultimately ineffective long-term, for the same reason as previously stated: if it is beneficial for the corporation to perform illegal actions, the optimizer will eventually select another executive willing to perform them.
Finally, there is a simple test of corporate and AI safety. If the corporation’s very existence was causing harm to humanity, would the corporation/AI commit suicide? That is, does there exist an internal watchdog with the power to terminate the company, or otherwise is there a government regulation that when broken leads to the death sentence for an entire corporation?
The only such mechanism that I am aware of is corporate bankruptcy and liquidation. Unfortunately, this mechanism is focused on corporations failing at a profitability goal. There seems to be no equivalent mechanism for corporations that fail at being beneficial to humanity - a corporation cannot go to jail for actions that would warrant life sentences were they committed by humans.
Given the above discussion, I believe that there should be a lot more experimentation in specifying loss functions for corporations. In my view, OpenAI’s new structure is exactly a step in that direction.
On one hand, you want to restrict an AI’s space of actions to limit the harm it can do, but on the other hand, the resulting safe AI must be competitive with unsafe AI, otherwise unsafe AI will always win.
As far as I understand, this matches perfectly with the motivations specified for OpenAI’s “Capped Profit” entity. It remains to be seen if their specific setup is sufficient to guarantee safety, but at least they seem to recognize the problem!
1. Human individuals are weak and useless in nature without the power of our entire civilization providing them with technology, communications, and support. Think of humanity’s greatest achievements, such as visiting the moon. This involved thousands of people working together to create something with complexity beyond the abilities of a single human - no one human understood the intricacies of everything from mining metals to the circuit design of each subsystem of the lunar module.
2. The machine can also avoid a loss by modifying the regularizer itself. The entire lobbying industry is built for this purpose.
3. I believe that corporate personhood comes from the flawed intuition that a group of people behaves similarly to a big human. I hypothesize that corporations behave more similarly to a pure AI than to a human when given enough time (see the hiring/firing model above).