Computer vision is considered an AI-complete problem. In other words, solving it would be equivalent to creating a program that’s as smart as humans. Needless to say, such a program is yet to be created. However, if you’ve ever used apps like Google Goggles or Google Photos—or watched the segment on Google Lens in the keynote of Google I/O 2017—you probably realize that computer vision has become very powerful.
Through a REST-based API called Cloud Vision API, Google shares its revolutionary vision-related technologies with all developers. By using the API, you can effortlessly add impressive features such as face detection, emotion detection, and optical character recognition to your Android apps. In this tutorial, I’ll show you how.
Prerequisites
To be able to follow this tutorial, you must have:
- a Google Cloud Platform account
- a project on the Google Cloud console
- the latest version of Android Studio
- and a device that runs Android 4.4 or higher
If some of the above requirements sound unfamiliar to you, I suggest you read the following introductory tutorial about the Google Cloud Machine Learning platform:
- How to Use Google Cloud Machine Learning Services for Android
1. Enabling the Cloud Vision API
You can use the Cloud Vision API in your Android app only after you’ve enabled it in the Google Cloud console and acquired a valid API key. So start by logging in to the console and navigating to API Manager > Library > Vision API. In the page that opens, simply press the Enable button.
If you’ve already generated an API key for your Cloud console project, you can skip to the next step because you will be able to reuse it with the Cloud Vision API. Otherwise, open the Credentials tab and select Create Credentials > API key.
In the dialog that pops up, you will see your API key.
2. Adding Dependencies
Like most other APIs offered by Google, the Cloud Vision API can be accessed using the Google API Client library. To use the library in your Android Studio project, add the following compile dependencies in the app module’s build.gradle file:
compile 'com.google.api-client:google-api-client-android:1.22.0'
compile 'com.google.apis:google-api-services-vision:v1-rev357-1.22.0'
compile 'com.google.code.findbugs:jsr305:2.0.1'
Furthermore, to simplify file I/O operations, I suggest you also add a compile dependency for the Apache Commons IO library.
compile 'commons-io:commons-io:2.5'
Because the Google API Client can work only if your app has the INTERNET permission, make sure the following line is present in your project’s manifest file:
<uses-permission android:name="android.permission.INTERNET" />
3. Configuring the API Client
You must configure the Google API client before you use it to interact with the Cloud Vision API. Doing so primarily involves specifying the API key, the HTTP transport, and the JSON factory it should use. As you might expect, the HTTP transport will be responsible for communicating with Google’s servers, and the JSON factory will, among other things, be responsible for converting the JSON-based results the API generates into Java objects.
For modern Android apps, Google recommends that you use the NetHttpTransport class as the HTTP transport and the AndroidJsonFactory class as the JSON factory.
The Vision class represents the Google API Client for Cloud Vision. Although it is possible to create an instance of the class using its constructor, doing so using the Vision.Builder class instead is easier and more flexible.
While using the Vision.Builder class, you must remember to call the setVisionRequestInitializer() method to specify your API key. The following code shows you how:
Vision.Builder visionBuilder = new Vision.Builder(
        new NetHttpTransport(),
        new AndroidJsonFactory(),
        null);
visionBuilder.setVisionRequestInitializer(
        new VisionRequestInitializer("YOUR_API_KEY"));
Once the Vision.Builder instance is ready, you can call its build() method to generate a new Vision instance you can use throughout your app.
Vision vision = visionBuilder.build();
At this point, you have everything you need to start using the Cloud Vision API.
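For orientation, it can help to see where the configured client sends its requests. The following is a minimal sketch, assuming version 1 of the REST API, whose annotate endpoint is https://vision.googleapis.com/v1/images:annotate, and assuming the API key travels as a key query parameter. The class and method names (EndpointSketch, annotateUrl) are hypothetical, and in the app the client library builds this URL for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, for illustration only: the Vision client does this internally.
final class EndpointSketch {

    // Assumed v1 endpoint for image annotation requests
    static final String ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate";

    // Appends the API key as a URL-encoded "key" query parameter
    static String annotateUrl(String apiKey) {
        try {
            return ANNOTATE_URL + "?key=" + URLEncoder.encode(apiKey, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(annotateUrl("YOUR_API_KEY"));
    }
}
```

Seeing the key in the URL is also a reminder that anyone who can inspect your app's traffic or binary can read it, so consider restricting the key in the Cloud console.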
4. Detecting and Analyzing Faces
Detecting faces in photographs is a very common requirement in computer vision-related applications. With the Cloud Vision API, you can create a highly accurate face detector that can also identify emotions, lighting conditions, and face landmarks.
For the sake of demonstration, we’ll be running face detection on the following photo, which features the crew of Apollo 9:
I suggest you download a high-resolution version of the photo from Wikimedia Commons and place it in your project’s res/raw folder.
Step 1: Encode the Photo
The Cloud Vision API expects its input image to be encoded as a Base64 string that’s placed inside an Image object. Before you generate such an object, however, you must convert the photo you downloaded, which is currently a raw image resource, into a byte array. You can quickly do so by opening its input stream using the openRawResource() method of the Resources class and passing it to the toByteArray() method of the IOUtils class.
Because file I/O operations should not be run on the UI thread, make sure you spawn a new thread before opening the input stream. The following code shows you how:
// Create new thread
AsyncTask.execute(new Runnable() {
    @Override
    public void run() {
        try {
            // Convert photo to byte array
            InputStream inputStream = getResources().openRawResource(R.raw.photo);
            byte[] photoData = IOUtils.toByteArray(inputStream);
            inputStream.close();
            // More code here
        } catch (IOException e) {
            // toByteArray() and close() can throw a checked IOException
            e.printStackTrace();
        }
    }
});
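If you would rather not depend on Commons IO, the same conversion can be done with the JDK alone. The following is a minimal sketch, not the library's implementation: the names StreamBytes and readAll are hypothetical, and a ByteArrayInputStream stands in for the stream returned by openRawResource().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: copies an InputStream into a byte array using only the JDK.
final class StreamBytes {

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        // read() returns -1 once the end of the stream is reached
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for getResources().openRawResource(R.raw.photo)
        byte[] fakePhoto = {1, 2, 3, 4, 5};
        try (InputStream in = new ByteArrayInputStream(fakePhoto)) {
            System.out.println(readAll(in).length + " bytes read");
        }
    }
}
```

The try-with-resources statement closes the stream even if reading fails, which the explicit close() call in a straight-line version would not do.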
You can now create an Image object by calling its default constructor. To add the byte array to it as a Base64 string, all you need to do is pass the array to its encodeContent() method.
Image inputImage = new Image();
inputImage.encodeContent(photoData);
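To make the encoding step less opaque, here is a minimal sketch of Base64-encoding a byte array with the JDK's java.util.Base64 class. It only illustrates the idea and is meant to be run on a desktop JVM: in the app, encodeContent() does the encoding for you, and the exact Base64 variant it emits is not something this tutorial specifies. The class name Base64Sketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper showing what "bytes as a Base64 string" means.
final class Base64Sketch {

    // Encodes raw bytes as a standard Base64 string
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decodes a standard Base64 string back into the original bytes
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Three ASCII bytes stand in for the photo data
        byte[] photoData = "Man".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode(photoData)); // prints "TWFu"
    }
}
```

Base64 turns every three bytes into four characters, so the encoded photo is roughly a third larger than the original file.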
Step 2: Make a Request
Because the Cloud Vision API offers several different features, you must explicitly specify the feature you are interested in while making a request to it. To do so, you must create a Feature object and call its setType() method. The following code shows you how to create a Feature object for face detection only:
Feature desiredFeature = new Feature();
desiredFeature.setType("FACE_DETECTION");
Using the Image and the Feature objects, you can now compose an AnnotateImageRequest instance.
AnnotateImageRequest request = new AnnotateImageRequest();
request.setImage(inputImage);
request.setFeatures(Arrays.asList(desiredFeature));
Note that an AnnotateImageRequest object must always belong to a BatchAnnotateImagesRequest object because the Cloud Vision API is designed to process multiple images at once. To initialize a BatchAnnotateImagesRequest instance containing a single AnnotateImageRequest object, you can use the Arrays.asList() utility method.
BatchAnnotateImagesRequest batchRequest = new BatchAnnotateImagesRequest();
batchRequest.setRequests(Arrays.asList(request));
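Because the API is REST-based, the batch request is ultimately sent as JSON. The sketch below builds such a body by hand to show how the image, the feature, and the list of requests nest inside one another. It assumes the field names used by version 1 of the REST API (requests, image.content, features.type). The names RequestBodySketch and buildBody are hypothetical, and in the app the client library performs this serialization for you.

```java
// Hypothetical helper, for illustration only: shows the shape of a batch request body.
final class RequestBodySketch {

    // Builds the JSON body for a batch containing a single image and a single feature.
    // base64Image is assumed to contain only Base64 characters and featureType only
    // letters and underscores, so neither value needs JSON escaping here.
    static String buildBody(String base64Image, String featureType) {
        return "{\"requests\":[{"
                + "\"image\":{\"content\":\"" + base64Image + "\"},"
                + "\"features\":[{\"type\":\"" + featureType + "\"}]"
                + "}]}";
    }

    public static void main(String[] args) {
        // "TWFu" stands in for the Base64-encoded photo
        System.out.println(buildBody("TWFu", "FACE_DETECTION"));
    }
}
```

The outer requests array is what lets one call carry several images, which is why even a single AnnotateImageRequest has to be wrapped in a list.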
To actually make the face detection request, you must call the execute() method of an Annotate object that’s initialized using the BatchAnnotateImagesRequest object you just created. To generate such an object, you must call the annotate() method offered by the Google API Client for Cloud Vision. Here’s how:
BatchAnnotateImagesResponse batchResponse =
        vision.images().annotate(batchRequest).execute();
Step 3: Use the Response
Once the request has been processed, you get a BatchAnnotateImagesResponse object containing the response of the API. For a face detection request, the response contains a FaceAnnotation object for each face the API has detected. You can get a list of all FaceAnnotation objects using the getFaceAnnotations() method.
List<FaceAnnotation> faces = batchResponse.getResponses()
        .get(0).getFaceAnnotations();
A FaceAnnotation object contains a lot of useful information about a face, such as its location, its angle, and the emotion it is expressing. As of version 1, the API can only detect the following emotions: joy, sorrow, anger, and surprise.
To keep this tutorial short, let us now simply display the following information in a Toast:
- The count of the faces
- The likelihood that they are expressing joy
You can, of course, get the count of the faces by calling the size() method of the List containing the FaceAnnotation objects. To get the likelihood of a face expressing joy, you can call the intuitively named getJoyLikelihood() method of the associated FaceAnnotation object.
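The likelihood comes back as a name rather than a number, so comparing two likelihoods takes a little extra work. The following minimal sketch ranks the names, assuming the six values defined by version 1 of the API (UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY). The class LikelihoodSketch and the method isProbable are hypothetical and not part of the client library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for interpreting likelihood names returned by the API.
final class LikelihoodSketch {

    // Likelihood names in ascending order of confidence (assumed v1 values)
    private static final List<String> ORDER = Arrays.asList(
            "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY");

    // Returns true when the likelihood is LIKELY or VERY_LIKELY.
    // Unrecognized names rank below every known value and so return false.
    static boolean isProbable(String likelihood) {
        return ORDER.indexOf(likelihood) >= ORDER.indexOf("LIKELY");
    }

    public static void main(String[] args) {
        System.out.println(isProbable("VERY_LIKELY")); // prints "true"
        System.out.println(isProbable("POSSIBLE"));    // prints "false"
    }
}
```

A helper like this could be given the value returned by getJoyLikelihood() to decide, for example, whether to count a face as smiling.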
Note that because a simple Toast can only display a single string, you’ll have to concatenate all the above details. Additionally, a Toast can only be displayed from the UI thread, so make sure you show it from inside a Runnable passed to the runOnUiThread() method. The following code shows you how:
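// Count faces
int numberOfFaces = faces.size();
// Get joy likelihood for each face
String likelihoods = "";
for (int i = 0; i < numberOfFaces; i++) {
    likelihoods += "\nFace " + i + " joy likelihood: " + faces.get(i).getJoyLikelihood();
}
// Concatenate everything into a single string
final String message = "This photo has " + numberOfFaces + " faces" + likelihoods;
// Display the Toast on the UI thread
runOnUiThread(new Runnable() {
    @Override
    public void run() {
        Toast.makeText(getApplicationContext(), message, Toast.LENGTH_LONG).show();
    }
});
You can now go ahead and run the app to see the following result: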
5. Reading Text
The process of extracting strings from photos of text is called optical character recognition, or OCR for short. The Cloud Vision API allows you to easily create an optical character reader that can handle photos of both printed and handwritten text. What's more, the reader you create will have no trouble reading angled text or text that's overlaid on a colorful picture.
The API offers two different features for OCR:
- TEXT_DETECTION, for reading small amounts of text, such as that present on signboards or book covers
- and DOCUMENT_TEXT_DETECTION, for reading large amounts of text, such as that present on the pages of a novel
The steps you need to follow in order to make an OCR request are identical to the steps you followed to make a face detection request, except for how you initialize the Feature object. For OCR, you must set its type to either TEXT_DETECTION or DOCUMENT_TEXT_DETECTION. For now, let's go with the former.
Feature desiredFeature = new Feature();
desiredFeature.setType("TEXT_DETECTION");
You will, of course, also have to place a photo containing text inside your project's res/raw folder. If you don't have such a photo, you can use this one, which shows a street sign:
You can download a high-resolution version of the above photo from Wikimedia Commons.
In order to start processing the results of an OCR operation, after you obtain the BatchAnnotateImagesResponse object, you must call the getFullTextAnnotation() method to get a TextAnnotation object containing all the extracted text.
final TextAnnotation text = batchResponse.getResponses()
        .get(0).getFullTextAnnotation();
You can then call the getText() method of the TextAnnotation object to actually get a reference to a string containing the extracted text.
The following code shows you how to display the extracted text using a Toast:
Toast.makeText(getApplicationContext(), text.getText(), Toast.LENGTH_LONG).show();
If you run your app now, you should see something like this:
Conclusion
In this tutorial you learned how to use the Cloud Vision API to add face detection, emotion detection, and optical character recognition capabilities to your Android apps. I'm sure you'll agree with me when I say that these new capabilities will allow your apps to offer more intuitive and smarter user interfaces.
It's worth mentioning that there's one important feature that's missing in the Cloud Vision API: face recognition. In its current form, the API can only detect faces, not identify them.
To learn more about the API, you can refer to the official documentation.
And meanwhile, check out some of our other tutorials on adding machine learning to your Android apps!
- How to Use Google Cloud Machine Learning Services for Android
- Create an Intelligent App With Google Cloud Speech and Natural Language APIs
- Android Things and Machine Learning