Since last August, the city of Cincinnati has issued body-worn cameras to 650 police officers at a cost of more than $5 million. When a radio call comes in, officers are supposed to switch on the cameras and start recording. As a result, the department has logged an average of about 90 hours of video a day, every day, since the program was unveiled.
“We have a total of 80,798 individual videos on the system,” says Cincinnati Police Captain Douglas Wiesman. “That is 14,690 hours of video. 24.29 terabytes of data. It’s crazy, isn’t it? That’s insane. I’m not even an IT guy, but I know 24 terabytes is a lot of data.”
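For a sense of scale, the figures Wiesman cites can be turned into a few rough averages. The short calculation below is only a back-of-the-envelope sketch: it assumes decimal units (1 terabyte = 1,000 gigabytes) and takes the 90-hours-a-day average quoted above at face value.

```python
# Figures quoted by Captain Wiesman
videos = 80_798
hours = 14_690
terabytes = 24.29

# Average length of a single recording, in minutes
avg_minutes_per_video = hours * 60 / videos

# Storage per recorded hour (assumption: 1 TB = 1,000 GB)
gb_per_hour = terabytes * 1000 / hours

# Days of recording implied by the quoted 90-hours-a-day average
days_at_90_hours = hours / 90

print(f"Average video length: {avg_minutes_per_video:.1f} minutes")
print(f"Storage per hour of video: {gb_per_hour:.2f} GB")
print(f"Days of recording at 90 hours a day: {days_at_90_hours:.0f}")
```

On these numbers, a typical clip runs about 11 minutes and each hour of footage takes up a little under 2 gigabytes.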
As proponents like the American Civil Liberties Union advocate for greater transparency in what have too often become deadly interactions between police and the public, several U.S. states have passed legislation on body-worn police cameras, and interest from departments is growing — supported by $75 million in grant money set aside by the U.S. Department of Justice. Today, more than 40 major American cities are using or planning to use the lightweight, pager-sized devices, and the trend is spreading globally, with England, France and Singapore among those countries that have jumped on board.
In response, a whole industry has sprung up to supply the video-capture hardware. Taser, the company long known for stun guns, dominates the U.S. body camera market. In the first nine months of 2016, Taser rolled out more than 50,000 of them — including more than 11,000 head-mounted units — in 35 cities, according to the company. It also offers a suite of cloud-based tools for uploading, storing and editing the footage. Taser estimates that the body-worn camera and police video management industry will be worth $1 billion a year in revenue.
But as the use of body-worn cameras rises rapidly among police departments both in the United States and around the world, more and more municipalities are facing a challenge similar to the one the Cincinnati Police Department is now confronting: how to review and manage massive amounts of video data. After all, isolating and cataloging key moments of video footage, which can often encompass several hours of an officer’s shift, is in itself daunting. Multiply that across a whole department, and the technology has left back offices inundated with footage that must be watched, edited, and processed for court proceedings, or carefully redacted for public records and media requests.
Wiesman says the Cincinnati Police Department has hired additional staff just to keep up with the data deluge, but the issue now has some technology companies angling to develop computer-based solutions.
One of the early leaders is Dextro, a computer-vision firm based in New York City.
“We see police departments like the Seattle Police Department that are recording thousands of videos a day,” says Erin Fleischli, Dextro’s director of business development. “That quickly adds up to hundreds of thousands of hours that need to be reviewed. Manual analysis isn’t scalable.”
Dextro has already been developing computer algorithms and vision techniques to help amateur videographers catalog large volumes of video, and to help journalists sift through YouTube videos and identify newsworthy clips. But after meeting with police departments and body-worn camera companies at the invitation of the White House Office of Science and Technology Policy in December 2015, the company realized it could provide similar assistance to police departments.
“We’re naturally fitted to tackle those problems,” says David Luan, Dextro’s co-founder. “That’s what we’ve built everything to do.”
Luan says he isn’t ready to identify which police departments the company is currently working with, or to say when its technology will be commercially available, though the company is backed by several angel investors and venture funds. It is also far from the only computer-vision company focusing on video discovery. AlchemyAPI, Clarifai and MetaMind (acquired last year by Salesforce) apply similar algorithms to video.
But thanks to that White House invitation, Dextro has set its sights squarely on police video, and it is now teaching its computer-vision software to recognize patterns specific to police encounters by combing through thousands of hours of stock footage. Along the way, the artificial intelligence system is learning to identify and tag the most important moments in an officer’s video, Fleischli says. That will allow it to automatically extract thumbnails of valuable footage, “like the best view of a possible weapon within a three-minute video.”
Finding relevant footage in police video is not as simple as it sounds. “You need to be able to clearly identify actions like a foot chase, handcuffing, frisking,” she says. “You can’t do that from a series of still images. It’s really hard to extract that information without looking at movement.”
Unlike automatic reading of license plates, which applies optical character recognition to a single image, processing police video means sifting through dozens of frames per second and then recognizing and tracking an object or motif over time. For a foot chase, for example, a computer vision algorithm must learn what a running person looks like and how that person behaves across hundreds of frames, which will inevitably contain a fair amount of motion blur. To achieve a reliable level of digital discernment, Dextro must feed its algorithms all the police footage it can find, blurry and grainy footage included, from the public domain as well as from police clients, to train them to recognize the complex actions it wants to extract. This has Dextro playing at the cutting edge of computer vision, but not everyone is convinced that the technology is anywhere near ready for work as sensitive as police video analysis.
Recognizing complex motion in video, after all, has been a difficult problem for computer vision scientists for years, says Carl Vondrick, an artificial intelligence researcher at the Massachusetts Institute of Technology. Computers can now recognize basic actions like a handshake or a person sitting down, or predict a person’s next move, provided it’s simple enough, as Vondrick’s research has shown. But distinguishing an action that occurs over a longer period of time, such as a foot chase, requires the algorithm to remember and integrate information it catalogued long ago. “Humans are good at this,” Vondrick says. “Machines don’t do that yet.”
“When you start trying to recognize events that are unfolding over minutes or hours, you have to be integrating information across I don’t even know how many frames,” Vondrick says. “That is a very difficult question for A.I. right now.”
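The gap between labeling a single frame and recognizing a sustained action can be illustrated with a toy example. The sketch below is not Dextro’s method, which the company has not described in detail. It assumes a hypothetical classifier has already assigned one label to each frame, and it simply looks for stretches where a target label such as “running” persists long enough to count as an event, bridging short gaps of the kind motion blur might cause. The function name, parameters and thresholds are all invented for illustration.

```python
from typing import List, Tuple


def find_sustained_actions(
    labels: List[str],
    target: str,
    fps: float,
    min_seconds: float = 2.0,
    max_gap: int = 5,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans where `target` persists.

    `labels` holds one hypothetical per-frame label. Gaps of up to
    `max_gap` frames without the target label are bridged, and spans
    shorter than `min_seconds` are discarded.
    """
    spans = []
    start = None     # first frame of the current run
    last_hit = None  # most recent frame labeled `target`

    for i, label in enumerate(labels):
        if label == target:
            if start is None:
                start = i
            last_hit = i
        elif start is not None and i - last_hit > max_gap:
            # The gap is too long to bridge, so close the current run.
            spans.append((start, last_hit))
            start = None

    if start is not None:
        spans.append((start, last_hit))

    min_frames = min_seconds * fps
    return [
        (s / fps, (e + 1) / fps)
        for s, e in spans
        if (e - s + 1) >= min_frames
    ]


# Made-up labels at 10 frames per second
frames = (
    ["walking"] * 20
    + ["running"] * 15
    + ["blur"] * 3      # briefly unreadable frames
    + ["running"] * 12
    + ["standing"] * 20
    + ["running"] * 4   # too short to count
    + ["standing"] * 10
)
chases = find_sustained_actions(frames, "running", fps=10)
print(chases)
```

Even this simple rule has to carry state from one frame to the next. Stretching that memory from seconds to minutes or hours, with unreliable per-frame labels, is the difficulty Vondrick describes.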
Dextro hopes to address those shortcomings by integrating its computer vision systems with existing police video storage platforms, so-called “evidence management systems,” such as Taser’s Axon and Evidence.com products. But a more fundamental question still lingers: Just how much video really needs to be evaluated by automated systems such as Dextro’s? Or put another way: Why buy a machine to sift through hours of unremarkable but time-coded footage if a police officer, lawyer or defendant knows exactly when the event in question occurred?
Those concerns have left some in the space questioning the value of Dextro’s project.
“Ninety-nine times out of 100,” users are able to find the video they need without the help of algorithms, says Alasdair Field, CEO of Reveal Media. His company provides body-worn cameras and a video management system to police departments, as well as parking lot attendants and nightclub staff, in 35 countries. Generally, Field says, data like the time, date, place and a case’s identification number are enough to find video of interest. “Frankly there’s no need for any computer intelligence beyond a database that ties X number with Y video,” Field says.
While body-worn cameras are new, he adds, the techniques used to log evidence haven’t changed.
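Field’s point, that a database tying a case number to a video is usually enough, can be sketched in a few lines. The example below uses Python’s built-in sqlite3 module with an in-memory database. The table layout, column names and sample records are hypothetical and do not reflect Reveal Media’s or any other vendor’s actual system.

```python
import sqlite3

# In-memory database with a hypothetical metadata table
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE videos (
           video_id    TEXT PRIMARY KEY,
           case_number TEXT,
           recorded_at TEXT,  -- ISO 8601 timestamp
           location    TEXT
       )"""
)
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?, ?)",
    [
        ("vid-001", "CASE-0001", "2016-09-03T14:05:00", "Main St"),
        ("vid-002", "CASE-0001", "2016-09-03T14:07:30", "Main St"),
        ("vid-003", "CASE-0002", "2016-09-04T22:15:00", "Elm St"),
    ],
)


def videos_for_case(conn, case_number):
    """Return the IDs of all videos filed under one case number."""
    rows = conn.execute(
        "SELECT video_id FROM videos WHERE case_number = ? "
        "ORDER BY recorded_at",
        (case_number,),
    )
    return [row[0] for row in rows]


def videos_between(conn, start, end):
    """Return the IDs of videos recorded in the window [start, end)."""
    rows = conn.execute(
        "SELECT video_id FROM videos "
        "WHERE recorded_at >= ? AND recorded_at < ? "
        "ORDER BY recorded_at",
        (start, end),
    )
    return [row[0] for row in rows]


print(videos_for_case(conn, "CASE-0001"))
print(videos_between(conn, "2016-09-04T00:00:00", "2016-09-05T00:00:00"))
```

A lookup like this works only when someone already knows the case number or the time window, which is the situation Field says covers nearly every request.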
Still, as police departments around the world collect more and more video, Dextro is betting that people will be looking for automated solutions to help process the data — and perhaps to avoid the cost of additional staffing. Research programs applying algorithms to police video are already underway at the University of California, Los Angeles, which has teamed up with the Los Angeles Police Department.
WatchGuard Video, a Texas-based body-worn camera supplier, has also recognized the need for automated tools. It has 60 engineers working to develop them, some using computer vision technology like Dextro’s, according to spokeswoman Jaime Carlin. Last October it released Redactive, an add-on to its evidence-management system that automatically scans police videos to identify faces that officers can choose to blur out. This removes the burden of having to hire more personnel to handle facial redaction for public records, court or media requests, Carlin says. Many departments simply don’t realize how much additional work body-worn cameras require, she adds. “It’s beyond buying hardware.”
The Cincinnati Police Department’s Wiesman, who is organizing the department’s body-worn camera program, agrees. “We didn’t know how to staff it,” he says. Cincinnati has already hired five technicians and a supervisor to process the body-worn camera video. The department plans to add an administrative clerk and has also assigned two police officers to help civilians understand the video. That’s a total of nine people dedicated to processing video, he says, which is on top of the millions the department is paying Taser for the cameras and video storage.
Wiesman says he wasn’t aware of emerging technologies like Dextro’s, but admits that if its integration were seamless, his team would give it a try. The number of cameras deployed by the Cincinnati Police Department will soon exceed 1,000, as every sworn officer is slated to get one, Wiesman says.
Dextro sees that as an opportunity. Using its technology as a productivity multiplier, the company believes police departments will be able to crunch unwieldy amounts of data in a tiny fraction of the time it would take human staffers — although human analysts would still need to review the isolated footage before shipping it off to the courthouse or newsroom.
“Our job is not to eliminate humans from the loop,” Luan says. “We’re doing the hard legwork, and the judgment call is still made by people.”