The tool is called Automatic Alternative Text, and it dovetails with text-to-speech engines that allow blind people to use Facebook in other ways. Using deep neural networks, the system can identify particular objects in a photo, from cars and boats to ice cream and pizza. It can pick out particular characteristics of the people in the photo, including smiles and beards and eyeglasses. And it can analyze a photo in a more general sense, determining that a photo depicts sun or ocean waves or snow. The text-to-speech engine will then “read” these things aloud.
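The pipeline described above can be sketched in miniature. The function below is a simplified illustration, not Facebook's actual implementation: it assumes the neural network has already produced a dictionary of hypothetical concept labels with confidence scores, keeps only the confident ones, and composes the kind of "Image may contain" string a text-to-speech engine would read aloud. The labels, scores, and threshold are all invented for the example.

```python
# A minimal sketch (NOT Facebook's implementation) of turning classifier
# output into alt text. Assumes an upstream image model has produced
# {concept label: confidence score} pairs; all values here are hypothetical.

def compose_alt_text(concepts, threshold=0.8):
    """Keep concepts above the confidence threshold, highest score first,
    and join them into a single sentence for a screen reader."""
    kept = [label
            for label, score in sorted(concepts.items(),
                                       key=lambda kv: kv[1],
                                       reverse=True)
            if score >= threshold]
    if not kept:
        return "Image may contain: no recognizable objects"
    return "Image may contain: " + ", ".join(kept)

# Hypothetical scores for an outdoor photo like the one described below.
scores = {"outdoor": 0.99, "grass": 0.95, "tree": 0.92,
          "cloud": 0.88, "water": 0.85, "pizza": 0.10}
print(compose_alt_text(scores))
# -> Image may contain: outdoor, grass, tree, cloud, water
```

The confidence threshold reflects the design trade-off the article hints at: the system reports only what it is reasonably sure of, which is why a caption lists broad concepts like "outdoors" and "water" rather than a full sentence.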
A Facebook employee named Matt King showed me a prototype of the service last fall. King, 49, is blind, and though he acknowledged that the service was far from perfect, he said it was a notable improvement over the status quo. He wasn’t wrong. King showed the system a photo of a friend and his bike as he traveled through Europe. Facebook’s AI described the scene as outdoors. It said the photo included grass and trees and clouds, and that the scene was near water. If the photo had turned up in his News Feed in the past, King would have known only that his friend had posted a photo.
“My dream is that it would also tell me that it includes Christoph with his bike,” King told me. “But from my perspective as a blind user, going from essentially zero percent satisfaction from a photo to somewhere in the neighborhood of half … is a huge jump.”
As King told me, the system doesn’t always get things right. And it hasn’t yet reached a point where it generates captions in full and complete sentences. But this will come. Others have already used deep neural nets to do much the same thing. As King pointed out, a service that only gets part of the way there is still important now—Facebook says that more than 50,000 people are already using the service with text-to-speech engines.