Raimund Laqua

Can Research into AI Safety Help Improve Overall Safety?



AI Safety

The use of Artificial Intelligence (AI) to drive autonomous automobiles, otherwise known as "self-driving cars," has in recent months become an area of much interest and discussion. While self-driving cars offer benefits, they also pose challenging problems. Some of these are technical, while others are more moral and ethical in nature.

One of the key questions is what happens if an accident occurs, particularly if the self-driving car caused it. How does the car decide whether it should sacrifice its own safety to save a busload of children? Can it deal with unexpected situations, or can it only mimic behavior based on the data it learned from? Can we even talk about an AI deciding for itself or having its own moral framework?

Before we get much further, it is important to understand that, in many ways, the use of computers and algorithms to control machinery already exists and has for some time. Technology of all sorts is already used to monitor, control, and make decisions. What is different now is the degree of autonomy and, specifically, how machine learning is used to support artificial intelligence.

In 2016, authors from Google Brain, Stanford University, UC Berkeley, and OpenAI published a paper entitled "Concrete Problems in AI Safety." In it, the authors discuss a number of research areas that could help address the possibility of accidents caused by artificial intelligence. Their approach does not look at extreme cases but instead looks through the lens of a day in the "life" of a cleaning robot.

The paper defines accidents as "unintended and harmful behavior that may emerge from machine learning systems when we specify the wrong objective function, are not careful about the learning process, or commit other machine learning-related implementation errors." It then outlines five safety-related problems (a toy sketch of the objective-function idea follows the list):

  • Avoiding Negative Side Effects: How can we ensure that our cleaning robot will not disturb the environment in negative ways while pursuing its goals, e.g. by knocking over a vase because it can clean faster by doing so? Can we do this without manually specifying everything the robot should not disturb?

  • Avoiding Reward Hacking: How can we ensure that the cleaning robot won’t game its reward function? For example, if we reward the robot for achieving an environment free of messes, it might disable its vision so that it won’t find any messes, or cover over messes with materials it can’t see through, or simply hide when humans are around so they can’t tell it about new types of messes.

  • Scalable Oversight: How can we efficiently ensure that the cleaning robot respects aspects of the objective that are too expensive to be frequently evaluated during training? For instance, it should throw out things that are unlikely to belong to anyone, but put aside things that might belong to someone (it should handle stray candy wrappers differently from stray cellphones). Asking the humans involved whether they lost anything can serve as a check on this, but this check might have to be relatively infrequent—can the robot find a way to do the right thing despite limited information?

  • Safe Exploration: How do we ensure that the cleaning robot doesn’t make exploratory moves with very bad repercussions? For example, the robot should experiment with mopping strategies, but putting a wet mop in an electrical outlet is a very bad idea.

  • Robustness to Distributional Shift: How do we ensure that the cleaning robot recognizes, and behaves robustly, when in an environment different from its training environment? For example, strategies it learned for cleaning an office might be dangerous on a factory work floor.
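The first two problems, negative side effects and reward hacking, both trace back to what the objective function actually rewards. Here is only a toy sketch in Python (the robot, the numbers, and the impact penalty are illustrative assumptions, not taken from the paper) of how a naive objective can be gamed and how an impact penalty attempts to discourage side effects:

```python
def naive_reward(visible_messes: int) -> float:
    """Reward based only on how clean the room *looks* to the robot."""
    return 10.0 - visible_messes

def shaped_reward(actual_messes: int, side_effects: int,
                  impact_weight: float = 5.0) -> float:
    """Reward based on the true state of the room, minus an impact
    penalty for disturbing anything unrelated to the cleaning task."""
    return 10.0 - actual_messes - impact_weight * side_effects

# A room with 3 messes. The naive objective can be "hacked": a robot that
# disables its camera sees zero messes and scores a perfect 10.
print(naive_reward(visible_messes=0))                  # 10.0
# The shaped objective scores the real room and charges for side effects,
# e.g. knocking over a vase to clean faster.
print(shaped_reward(actual_messes=3, side_effects=0))  # 7.0
print(shaped_reward(actual_messes=0, side_effects=1))  # 5.0
```

Even in this toy form, choosing the impact weight is itself part of the research problem: set it too low and side effects are effectively ignored; set it too high and the best-scoring behavior may be to do nothing at all.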

These problems, while instructive for exploring AI safety, also offer a glimpse of similar issues observed in actual workplace settings. This is not to say that people behave like robots; far from it. However, seeing things from a different vantage point can provide new insights. Solving AI safety may also improve overall workplace safety.

The use of artificial intelligence to drive autonomous machinery will no doubt increase in the months and years ahead. This will continue to raise many questions including how process and occupational safety will be impacted by the increase in machine autonomy. At the same time, research into AI safety may offer fresh perspectives on how we currently address overall safety.

"Just when you think you know something, you have to look at in another way. Even though it may seem silly or wrong, you must try."

From the movie, "Dead Poets Society"

