
The Robots of Gotham (2018): even drones have blind spots in their computer vision systems


Recommendation: If you have a low tolerance for GameLit, you may want to skip this book, but you’d be missing a smorgasbord of interesting ideas about AI and robotics, and the opportunity to learn about computer vision.


The Robots of Gotham is a highly rated GameLit adventure (the book version of a videogame) where the protagonist, Barry Simcoe, leaves his hotel on a quest to retrieve something important while evading big robots, drones, sexy robots, dinosaurs, and more drones. Barry then returns to his hotel only to have to go out again and get something else from someplace harder and more sinister. Wash. Spin. Repeat. Eventually, Barry saves the world from a pandemic clock-stopper virus. There is even a John Wick moment: rescuing a trapped dog named Croaker (the dog has frequent near-death experiences, don’t ask) is one of the first quests.


GameLit may not be everyone’s favorite genre, but The Robots of Gotham should have a universal appeal because of its smorgasbord of interesting ideas about AI and robots. It poses the concept of machine intelligences gaining sentience, then being granted citizenship (which is not too far-fetched, as we’ve seen Saudi Arabia grant citizenship to the robot Sophia), but The Robots of Gotham takes it further, with some of the AI agents becoming elected officials and a few becoming fascist dictators. The sentient AIs develop a form of sex (really, genetic algorithms) to create new AIs. Following Doug Lenat’s Cyc idea, AIs take years to become intelligent enough, and so they gestate in AI nurseries. And there is a lot about computer vision and cybersecurity, which by itself makes The Robots of Gotham a great teachable moment.


One of the earliest and most realistic ideas in the book is the use of computer vision by the drones. Surveillance drones are everywhere in the streets of Chicago, giving the book a Dark Angel vibe. However, Barry and his new best friend, Sergei, exploit a weakness in unmanned systems and computer vision to evade detection.


The weakness in computer vision for drones, and in surveillance systems in general, is the sheer size and complexity of the imagery. In real life, a basic quadcopter with a 4K camera that you can buy from Amazon may produce over 1 gigabyte of high resolution video and imagery in a single 20 minute flight. Streaming that much data in real time is effectively impossible given wireless bandwidth limits, even for the US Department of Defense. When the Global Hawk drone was hastily pressed into service surveying Afghanistan and Iraq with its high resolution cameras, the military discovered that it did not have enough satellite communications bandwidth to stream the data, and bandwidth for these types of assets remains a problem.
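As a back-of-envelope check on the paragraph above, here is the sustained bitrate implied by 1 gigabyte per 20-minute flight. The figures come from the review; everything else is just arithmetic, and note this is per drone, per flight, before any fleet-wide multiplication.

```python
# Sustained bitrate needed to stream one flight's worth of imagery
# (roughly 1 GB per 20-minute flight, per the figure above).
def required_bitrate_mbps(data_gb: float, flight_minutes: float) -> float:
    bits = data_gb * 8e9          # gigabytes -> bits
    seconds = flight_minutes * 60
    return bits / seconds / 1e6   # megabits per second

print(required_bitrate_mbps(1.0, 20))  # roughly 6.7 Mbps sustained, per drone
```

Multiply that floor by a city full of surveillance drones and the aggregate demand quickly outruns shared wireless spectrum.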


Drones generally do stream, but what the operator sees on the display from consumer grade UAVs is usually a lower resolution stream, with the drone recording the high resolution version to an onboard SD card. The lower resolution video is sufficient for navigation and mission situation awareness but generally not high enough resolution to detect small items or indicators of missing people, etc. That’s because the lower resolution is usually achieved with a lossy compression algorithm that eliminates the nearly invisible, subtle details that computers use for object and scene identification. As a result, what is good enough for the human eye to control the drone is not good enough for a computer to process and analyze.
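A toy calculation illustrates why the downscaled live stream misses small items. The object size, scene width, and simple proportional model here are all assumptions of mine for the sketch, not anything from the book or a real camera pipeline.

```python
# Hypothetical illustration: how many pixels span a small object in the
# recorded 4K frame versus a downscaled live stream. All numbers are
# made-up assumptions for the sketch.
def pixels_on_target(object_m: float, scene_width_m: float, image_width_px: int) -> float:
    # The object's share of the field of view, times the horizontal pixel count.
    return object_m / scene_width_m * image_width_px

scene = 60.0   # assumed ground width covered by one frame, in meters
keys = 0.08    # an 8 cm set of keys
print(pixels_on_target(keys, scene, 3840))  # 4K recording: about 5 px across
print(pixels_on_target(keys, scene, 640))   # low-res stream: under 1 px
```

At under a pixel, no algorithm (or human) can recover the object from the stream, which is why the onboard SD card matters.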


The current solution to the real-time streaming limitation is to hope that the drone returns so the SD card can be extracted, but in reality this just defers the problem. Now the operator has a physical copy of a large amount of high resolution data and three options. One is to examine the data manually, which takes hours and takes time away from flying more missions. A second is to upload the imagery to the cloud, which takes bandwidth and hours, so that others can examine the data manually (more hours) or a high performance computer can run sophisticated detection algorithms (very fast once it gets the data). Cloud computing is often called “fog computing” because the bandwidth limitations and intermittent loss of wireless connectivity are so prevalent: you don’t get a pretty fluffy cloud, you get wispy, soupy, gray crap. The third option is to bring a high performance computer along (unlikely due to logistics).
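The "hours to upload" claim is easy to sanity-check. The link speed and efficiency factor below are assumptions I chose to represent a constrained field connection, not figures from the review.

```python
# Rough upload-time estimate for pushing one flight's imagery to the
# cloud over a constrained wireless link (link speed and efficiency
# are assumed for illustration).
def upload_hours(data_gb: float, link_mbps: float, efficiency: float = 0.5) -> float:
    bits = data_gb * 8e9
    effective_bps = link_mbps * 1e6 * efficiency  # real links rarely hit rated speed
    return bits / effective_bps / 3600

print(round(upload_hours(1.0, 2.0), 1))  # a 2 Mbps field link: about 2.2 hours per GB
```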


The options for reducing the dependency on the cloud are to put more processing onboard the drone, put more processing in the operator’s control unit, or combine the two. These solutions are called “edge computing” because the computations are performed at the forward edge of the wireless network. Edge computing helps, but problems still arise. If the computing is pushed onboard the drone, the cost of the drone may increase significantly because it will likely need dedicated custom computing chips. If processing is done on the operator’s control unit, there is still a wireless bottleneck between the drone and the operator control unit.


One solution to the drone-to-OCU wireless bandwidth problem is for the drone to examine its own video feed and send only the images that are relevant. This filtering reduces the bandwidth. Filtering has another advantage: privacy. If the drone is tracking a targeted person through a crowd, it can cut out the regions of the image that aren’t of interest, or reduce their resolution beyond use, to save bandwidth. But note that this also means any bystanders are now effectively invisible. The original data would still be stored on the SD card for later, more complete examination if needed, but there would be less opportunity for unauthorized surveillance and mayhem. Sure, if you were a Bad Person, you could shoot down the drone so that the SD card would be lost, but shooting down a drone suggests that someone was doing bad things at that location and time, and would trigger attention. A better strategy is to count on the data about your bad activities being effectively buried under the data avalanche, so long as you never give anyone a reason to go back and look.
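The filter-then-redact idea can be sketched in a few lines. Everything here is hypothetical: the detector, the frame format, and the redaction rule are toy stand-ins for illustration, not anything from the book or a real drone stack.

```python
# Sketch of onboard filtering, assuming a detector that returns a
# bounding box for the tracked target. Frames without the target are
# dropped; in frames that have it, everything outside the box is
# blanked before transmission.
def redact_outside(frame, box):
    """Zero every pixel outside box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    return [
        [px if top <= r < bottom and left <= c < right else 0
         for c, px in enumerate(row)]
        for r, row in enumerate(frame)
    ]

def filter_stream(frames, detect):
    """Send only frames where the target is detected, redacted to its box."""
    for frame in frames:
        box = detect(frame)
        if box is not None:        # irrelevant frames never leave the drone
            yield redact_outside(frame, box)

# Toy usage: 'detect' finds the pixel value 9 in a 3x3 "frame".
def detect(frame):
    for r, row in enumerate(frame):
        for c, px in enumerate(row):
            if px == 9:
                return (r, c, r + 1, c + 1)
    return None

frames = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],   # no target: dropped, zero bandwidth
    [[1, 9, 1], [1, 1, 1], [1, 1, 1]],   # target at (0, 1): sent, redacted
]
print(list(filter_stream(frames, detect)))
# Only the second frame survives, with the bystander pixels zeroed out.
```

The full frames would still go to the SD card; only the redacted subset crosses the wireless link, which is exactly why bystanders (and anyone off the watchlist) vanish from the live feed.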


Edge-based drone filtering introduces two problems that help Bad People evade drones. The biggest is that the drone can only perceive in real time what it is pre-programmed to look for or mark as interesting. If you can hack the system to change the configuration file of who and what to flag and alert on, then you can pretty much do as you want. Get yourself on the not-interesting list and you become invisible, at least until the original data is reviewed in more detail. This induced “blind spot” was also part of the science narrative of the TV show Person of Interest. It’s also what we humans do to handle the data from our own visual system: we internally reduce the data via expectations. For example, how many times have you walked past the set of keys you were looking for because they were not where you expected them to be? Psychologists have documented this filtering phenomenon since the 1980s.
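The watchlist blind spot is almost trivial to demonstrate. The names and the flat-set watchlist below are invented for the sketch; a real system would be more elaborate, but the failure mode is the same: real-time alerting only sees what the configuration names.

```python
# Toy illustration of the configuration-file blind spot: the drone only
# flags what its watchlist names, so one edit makes a person invisible
# to real-time alerts. All names are invented for the sketch.
watchlist = {"barry_simcoe", "sergei"}

def alerts(detections, watchlist):
    """Real-time alerting: anything not on the list is silently ignored."""
    return [d for d in detections if d in watchlist]

seen = ["barry_simcoe", "random_pedestrian", "sergei"]
print(alerts(seen, watchlist))        # ['barry_simcoe', 'sergei']

watchlist.discard("barry_simcoe")     # the hack: a single config edit
print(alerts(seen, watchlist))        # ['sergei'] -- Barry is now invisible
```

Barry is still on the SD card, of course; he is only invisible until someone bothers to review the archived footage.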


The second problem is that it is harder to share information in real-time throughout the larger system. If the authorities discover a new person or situation to target, they have to somehow quickly push the new configuration onto the drone. Pushing updates in real-time reduces the cybersecurity of the system and puts the drone into MurderBot territory of vulnerability to hacking. Essentially, Edge computing is playing the percentages that most situations will be what it is programmed to detect or react to and that truly novel situations will be identified through some other means.
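One standard mitigation for risky over-the-air config pushes (my assumption, not something the book describes) is to require updates to carry a message authentication code, so a drone rejects tampered or forged watchlists even if the radio link is compromised. The key handling and names below are placeholders.

```python
# Sketch: HMAC-signed configuration updates, so the drone can verify a
# pushed watchlist came from the real authority. SECRET is a placeholder
# for a key provisioned before the drone launches.
import hashlib
import hmac
import json

SECRET = b"shared-provisioning-key"

def sign_update(config):
    payload = json.dumps(config, sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload, tag

def apply_update(payload, tag):
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return None               # reject: tampered or forged update
    return json.loads(payload)

payload, tag = sign_update({"watchlist": ["barry_simcoe"]})
print(apply_update(payload, tag))                               # accepted
print(apply_update(payload.replace(b"barry", b"nobod"), tag))   # None: rejected
```

Signing raises the bar but does not eliminate the exposure: whoever holds the key, or the machine that pushes updates, is still a single point of failure.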


What doesn’t help Bad People evade drones is disguises. Pulling a gray hoodie over your head may fool a human, but the computer vision system is often looking at much harder-to-hide attributes, such as gait and unique clothing (Barry’s shoes are a key feature in identifying him). In some regards, the computer has to work harder to find and track a person, and so it uses many more cues than we humans apparently do.
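A crude way to see why the hoodie fails is to model identification as a weighted combination of independent cues. The cues, weights, and threshold below are all invented for the sketch; real re-identification systems learn these from data rather than hand-picking them.

```python
# Toy multi-cue matcher: a hoodie defeats the face cue, but gait and
# distinctive clothing (Barry's shoes) still carry the match over the
# threshold. Cue names, weights, and scores are invented assumptions.
WEIGHTS = {"face": 0.40, "gait": 0.35, "shoes": 0.25}

def match_score(cues):
    """Weighted sum of per-cue similarity scores in [0, 1]."""
    return sum(WEIGHTS[name] * cues.get(name, 0.0) for name in WEIGHTS)

undisguised = {"face": 0.9, "gait": 0.9, "shoes": 0.9}
hoodie_only = {"face": 0.0, "gait": 0.9, "shoes": 0.9}  # face cue defeated

print(match_score(undisguised) > 0.5)  # True
print(match_score(hoodie_only) > 0.5)  # True -- the disguise fails
```

Defeating this kind of system means suppressing most of the cues at once (new shoes, altered gait, different build), which is far harder than putting up a hood.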


Computer vision is just one of the many interesting ideas in The Robots of Gotham. If you don’t like GameLit, you might enjoy reading The Robots of Gotham as a scientific detective story: play an internal game of “hmmm, could I write a better algorithm or introduce a better data review procedure to keep Barry and Sergei from evading detection?” The answer may be a surprising “no,” which makes The Robots of Gotham one of the most scientifically accurate books of the past few years. And who knows, maybe one day Venezuela will become a superpower through its adoption of artificial intelligence while the US becomes vulnerable because it outlaws AI… Until then, keep reading!


Read The Robots of Gotham today by ordering it from Amazon.



- Robin





