A new ‘common sense’ test for AI could lead to smarter machines
Content provided by IBM and TNW.
Today’s AI systems are quickly evolving to become humans’ new best friend. We now have AIs that can concoct award-winning whiskey, write poetry, and help doctors perform extremely precise surgical operations. But one thing they can’t do — which is, on the surface, far simpler than all those other things — is use common sense.
Common sense differs from intelligence in that it is usually innate: it comes naturally to humans, helps them navigate daily life, and cannot really be taught. In 1906, philosopher G. K. Chesterton wrote that “common sense is a wild thing, savage, and beyond rules.”
Robots, of course, run on algorithms that are just that: rules.
So no, robots can’t use common sense — yet. But thanks to current efforts in the field, we can now measure an AI’s core psychological reasoning ability, bringing us one step closer.
So why does it matter if we teach AI common sense?
Really, it comes down to the fact that common sense will make AI better at helping us solve real-world problems. Many argue that AI-driven solutions designed for complex problems, like diagnosing Covid-19 or identifying treatments for it, often fail because the system can’t readily adapt to real-world situations where problems are unpredictable, vague, and not defined by rules.
Injecting common sense into AI could mean big things for humans: better customer service, where a robot can actually help a disgruntled customer instead of sending them into an endless “Choose from the following options” loop. It could help autonomous cars react better to unexpected roadway incidents. It could even help the military draw life-or-death information from intelligence.
So why haven’t scientists been able to crack the common sense code thus far?
Called the “dark matter of AI”, common sense is both crucial to AI’s future development and, thus far, elusive. Equipping computers with common sense has actually been a goal of computer science since the field’s very start; in 1958, pioneering computer scientist John McCarthy published a paper titled “Programs with common sense” which looked at how logic could be used as a method of representing information in computer memory. But we’ve not moved much closer to making it a reality since.
Common sense includes not only social abilities and reasoning but also a “naive sense of physics” — this means that we know certain things about physics without having to work through physics equations, like why you shouldn’t put a bowling ball on a slanted surface. It also includes basic knowledge of abstract things like time and space, which lets us plan, estimate, and organize. “It’s knowledge that you ought to have,” says Michael Witbrock, AI researcher at the University of Auckland.
All this means that common sense is not one precise thing, and therefore cannot be easily defined by rules.
We’ve established that common sense requires a computer to infer things based on complex, real-world situations, something that comes easily to humans and starts to form in infancy.
Computer scientists are making slow but steady progress toward building AI agents that can infer mental states, predict future actions, and work with humans. But in order to see how close we actually are, we first need a rigorous benchmark for evaluating an AI’s “common sense,” or its psychological reasoning ability.
Researchers from IBM, MIT, and Harvard have created just that: AGENT, which stands for Action-Goal-Efficiency-coNstraint-uTility. Testing and validation show that the benchmark can evaluate an AI model’s core psychological reasoning ability. In other words, it can indicate whether a model has a sense of social awareness and could interact with humans in real-world settings.
So what is AGENT? AGENT is a large-scale dataset of 3D animations inspired by experiments that study cognitive development in kids. The animations depict an agent interacting with different objects under different physical constraints. According to IBM:
“The videos comprise distinct trials, each of which includes one or more ‘familiarization’ videos of an agent’s typical behavior in a certain physical environment, paired with ‘test’ videos of the same agent’s behavior in a new environment, which are labeled as either ‘expected’ or ‘surprising,’ given the behavior of the agent in the corresponding familiarization videos.”
A model must then judge how surprising the agent’s behavior in the ‘test’ videos is, based on what it learned from the ‘familiarization’ videos. The model’s judgments can then be compared with large-scale human-rating trials, in which people rated the ‘surprising’ test videos as more surprising than the ‘expected’ ones.
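To make the scoring idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a model outputs one numeric surprise score per test video and that its performance is summarized as the fraction of trial pairs in which the ‘surprising’ video scores higher than the ‘expected’ one. The `TrialPair` type, the `paired_accuracy` function, and the scores are hypothetical; they are not AGENT’s actual data format or official metric.

```python
from dataclasses import dataclass


@dataclass
class TrialPair:
    """One hypothetical trial (illustrative only, not AGENT's real format)."""
    expected_score: float    # model's surprise rating for the 'expected' test video
    surprising_score: float  # model's surprise rating for the 'surprising' test video


def paired_accuracy(pairs: list[TrialPair]) -> float:
    """Fraction of trials where the 'surprising' video is rated more surprising."""
    if not pairs:
        raise ValueError("need at least one trial pair")
    correct = sum(1 for p in pairs if p.surprising_score > p.expected_score)
    return correct / len(pairs)


# Made-up scores: the model gets three of four trials "right".
pairs = [
    TrialPair(expected_score=0.1, surprising_score=0.9),
    TrialPair(expected_score=0.4, surprising_score=0.3),
    TrialPair(expected_score=0.2, surprising_score=0.7),
    TrialPair(expected_score=0.5, surprising_score=0.8),
]
print(paired_accuracy(pairs))  # 0.75
```

A pairwise comparison like this only asks whether the model ranks the two videos the same way people do, so it does not depend on the absolute scale of the model’s scores.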
IBM’s trial shows that to demonstrate common sense, an AI model must have built-in representations of how humans plan. That requires combining a basic sense of physics with ‘cost-reward trade-offs’: an understanding that an agent acts “based on utility, trading off the rewards of its goal against the costs of reaching it.”
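The cost-reward idea can be sketched as a simple utility calculation. In this hypothetical Python example, an agent’s utility for a plan is the goal’s reward minus the cost of reaching it, and an observer’s “surprise” is how much utility the agent gave up compared with the best plan available. The plan names, the numbers, and the surprise formula are illustrative assumptions, not IBM’s model.

```python
def utility(reward: float, cost: float) -> float:
    """Hypothetical utility: reward of the goal minus the cost of reaching it."""
    return reward - cost


def best_plan(reward: float, plan_costs: dict[str, float]) -> str:
    """Return the plan with the highest utility (the cheapest, for a fixed reward)."""
    return max(plan_costs, key=lambda name: utility(reward, plan_costs[name]))


def surprise(observed: str, reward: float, plan_costs: dict[str, float]) -> float:
    """Illustrative surprise score: utility given up by the observed plan
    relative to the best available plan (0.0 means the agent acted efficiently)."""
    best = best_plan(reward, plan_costs)
    return utility(reward, plan_costs[best]) - utility(reward, plan_costs[observed])


# Made-up scene: nothing blocks the way, so the straight path costs less than a detour.
costs = {"straight": 3.0, "detour": 7.0}
print(best_plan(10.0, costs))             # straight
print(surprise("straight", 10.0, costs))  # 0.0
print(surprise("detour", 10.0, costs))    # 4.0
```

With these made-up numbers, taking the detour when a straight path is open gives up 4 units of utility, one example of behavior an observer who reasons about costs and rewards might find surprising.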
While not yet perfect, the findings show that AGENT is a promising diagnostic tool for developing and evaluating common sense in AI, something IBM is also working on. They also suggest that traditional developmental psychology methods, like those used to study how human children learn how objects and ideas relate, can be applied to AI.
In the future, this could significantly reduce the amount of training these models need, allowing businesses to save computing energy, time, and money.
Robots don’t understand human consciousness yet — but with the development of benchmarking tools like AGENT, we’ll be able to measure how close we’re getting.