Scientists have trained an artificial intelligence (AI) system to take in its surroundings the way humans do, inferring a whole environment from just a few quick glimpses, an advance that may pave the way for more effective search-and-rescue robots.
Most AI tools are trained for very specific tasks, such as recognising an object or estimating its volume in an environment they have experienced before. Scientists at the University of Texas at Austin in the US wanted to develop a general-purpose AI that gathers visual information usable for a wide range of tasks. “We want an agent that is generally equipped to enter environments and be ready for new perception tasks as they arise,” said Kristen Grauman, a professor at the University of Texas at Austin. “It behaves in a way that’s versatile and able to succeed at different tasks because it has learned useful patterns about the visual world,” Grauman said in a statement.
The researchers, who published their results today in the journal Science Robotics, used deep learning, a type of machine learning inspired by the brain’s neural networks, to train their agent on thousands of 360-degree images of different environments. Now, when presented with a scene it has never seen before, the agent draws on that experience to choose a few glimpses (like a tourist standing in the middle of a cathedral and taking a few snapshots in different directions) that together cover less than 20 per cent of the full scene. What makes the system so effective is that it does not simply take pictures in random directions: after each glimpse, it chooses the next shot that it predicts will add the most new information about the whole scene. Based on these glimpses, the agent infers what it would have seen had it looked in all the other directions, reconstructing a full 360-degree image of its surroundings.
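To make the glimpse-then-choose loop described above concrete, here is a toy Python sketch. It is not the researchers’ method: their agent uses a deep network trained on real 360-degree images, whereas this sketch swaps in a simple hand-written heuristic. The scene is assumed to be split into a hypothetical grid of 4 elevations by 8 azimuths (32 views), the “predicted new information” of a view is approximated by how far it lies from every view already seen, and the reconstruction just copies the nearest observed view. Unlike the real agent, this heuristic ignores what the glimpses actually show. The names `choose_glimpses`, `predicted_gain` and `reconstruct` are illustrative only.

```python
import numpy as np

# Hypothetical discretisation of the 360-degree scene into a grid of views.
N_ELEV, N_AZIM = 4, 8


def view_distance(a, b):
    """Grid distance between two views; azimuth wraps around, elevation does not."""
    (e1, z1), (e2, z2) = a, b
    dz = abs(z1 - z2)
    return abs(e1 - e2) + min(dz, N_AZIM - dz)


def predicted_gain(view, observed):
    """Stand-in for the learned prediction: a view far from everything seen
    so far is assumed to add the most new information."""
    return min(view_distance(view, o) for o in observed)


def choose_glimpses(start, budget):
    """Greedily pick `budget` views, starting from `start`, one glimpse at a time."""
    observed = [start]
    all_views = [(e, z) for e in range(N_ELEV) for z in range(N_AZIM)]
    while len(observed) < budget:
        candidates = [v for v in all_views if v not in observed]
        # After each glimpse, pick the view with the highest predicted gain.
        best = max(candidates, key=lambda v: predicted_gain(v, observed))
        observed.append(best)
    return observed


def reconstruct(panorama, observed):
    """Fill every view with its nearest observed view (a crude stand-in
    for the learned reconstruction)."""
    out = np.empty_like(panorama)
    for e in range(N_ELEV):
        for z in range(N_AZIM):
            nearest = min(observed, key=lambda o: view_distance((e, z), o))
            out[e, z] = panorama[nearest]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    panorama = rng.random((N_ELEV, N_AZIM))  # synthetic scene, one value per view
    budget = int(0.2 * N_ELEV * N_AZIM)  # 6 of 32 views, under 20 per cent
    glimpses = choose_glimpses(start=(1, 0), budget=budget)
    estimate = reconstruct(panorama, glimpses)
    print("glimpses:", glimpses)
    print("mean reconstruction error:", np.abs(estimate - panorama).mean())
```

The greedy “look where you know least” rule captures the spirit of choosing each next shot for maximum new information; the real system learns that choice, and the reconstruction, from data rather than from a fixed distance rule.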
“It learns to make intelligent guesses about where to gather visual information to succeed in perception tasks,” Grauman said. One of the main challenges the scientists set themselves was to design an agent that can work under tight time constraints, which would be critical in search-and-rescue applications. For example, in a burning building a robot would need to quickly locate people, flames and hazardous materials and relay that information to firefighters. For now, the new agent operates like a person standing in one spot: it can point a camera in any direction but cannot move to a new position. The same setup applies to an object the agent is holding, where it could decide how to turn the object to inspect another side of it. The researchers are now developing the system further to work on a fully mobile robot.