The app uses AI to recognize people, objects, and scenes
By James Vincent
Jul 12, 2017
Microsoft has released Seeing AI, a smartphone app that uses computer vision to describe the world for the visually impaired. With the app downloaded, users can point their phone’s camera at a person and it’ll say who they are and how they’re feeling. They can also point it at a product and it’ll tell them what it is. All of this is done using artificial intelligence that runs locally on their phone.
The company showed off a prototype of Seeing AI in March last year at its Build conference, but starting today, the app is available to download for free in the US on iOS. However, there’s no word yet on when it’ll come to Android or other countries.
The app works in a number of scenarios. As well as recognizing people it’s seen before and guessing strangers’ age and emotion, it can identify household products by scanning barcodes. It also reads and scans documents, and recognizes US currency. This last function is a good example of how useful it can be. As all dollar bills are the same size and color regardless of value, spotting the difference can be difficult or even impossible for the visually impaired. An app like Seeing AI helps them find that information.
The app uses neural networks to identify the world around it, the same basic technology that’s being deployed all over Silicon Valley, powering self-driving cars, drones, and more. The app’s most basic functions are carried out directly on the device itself. This means they can be accessed more quickly and in situations where there’s no stable internet connection. However, Seeing AI’s experimental features like describing an entire scene or recognizing handwriting require a connection to the cloud.
Speaking to The Verge at a Microsoft event in London, Saqib Shaikh, the tech lead on Seeing AI, said he most commonly used the app for reading documents like signs and menus. He points out that the app doesn’t just perform basic optical character recognition, but also directs the user, telling them to move the camera left or right to get the target in shot.
Shaikh says that the difference between this and similar apps is the speed of the neural nets: “One of the things we wanted to do was face recognition on device, and we’ve done that so within a few milliseconds you’ll hear the result. It’s all about the speed, and we try to do as much as we can on the device.”