Wouldn’t it be great if an Android app could see and understand its surroundings? Can you imagine how much better its user interface could be if it could look at its users and instantly know their ages, genders, and emotions? Well, such an app might seem futuristic, but it’s totally doable today.
With the IBM Watson Visual Recognition service, creating mobile apps that can accurately detect and analyze objects in images is easier than ever. In this tutorial, I’ll show you how to use it to create a smart Android app that can guess a person’s age and gender and identify prominent objects in a photograph.
Prerequisites
To be able to follow this tutorial, you must have:
- an IBM Bluemix account
- Android Studio 3.0 Canary 8 or higher
- and a device or emulator running Android 4.4 or higher
1. Activating the Visual Recognition Service
Like all Watson services, the Visual Recognition service must be manually activated before it can be used in an app. So log in to the IBM Bluemix console and navigate to Services > Watson. On the page that opens, press the Create Watson service button.
From the list of available services shown next, choose Visual Recognition.
You can now give a meaningful name to the service and press the Create button.
Once the service is ready, an API key will be generated for it. You can view it by opening the Service credentials tab and pressing the View Credentials button.
2. Project Setup
In this tutorial, we’ll be using the Watson Java and Android SDKs to interact with the Visual Recognition service. We’ll also be using the Picasso library to fetch and display images from the internet. Therefore, add the following implementation dependencies to your app module’s build.gradle file:

implementation 'com.ibm.watson.developer_cloud:visual-recognition:3.9.1'
implementation 'com.ibm.watson.developer_cloud:android-sdk:0.4.2'
implementation 'com.squareup.picasso:picasso:2.5.2'
To be able to interact with Watson’s servers, your app will need the INTERNET permission, so request it in your project’s AndroidManifest.xml file. Additionally, the app we’ll be creating today needs access to the device’s camera and external storage, so you must also request the CAMERA and WRITE_EXTERNAL_STORAGE permissions. Note that on Android 6.0 and higher, those last two permissions must additionally be granted by the user at runtime.
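The article doesn’t show the manifest entries themselves, but these are the standard declarations for the three permissions named above:

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />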
Lastly, add your Visual Recognition service’s API key to the strings.xml file. The resource must be named api_key, because that’s the name the code in the next step looks up:

<string name="api_key">a1234567890bcdefe</string>
3. Initializing a Visual Recognition Client
The Watson Java SDK exposes all the features the Visual Recognition service offers through the VisualRecognition class. Therefore, you must now initialize an instance of it using its constructor, which expects both a version date and the API key as its arguments.

While using the Visual Recognition service, you’ll usually want to take pictures with the device’s camera. The Watson Android SDK has a CameraHelper class to help you do so. Although you don’t have to, I suggest you also initialize an instance of it inside your activity’s onCreate() method.
private VisualRecognition vrClient;
private CameraHelper helper;

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    // Initialize Visual Recognition client
    vrClient = new VisualRecognition(
        VisualRecognition.VERSION_DATE_2016_05_20,
        getString(R.string.api_key)
    );

    // Initialize camera helper
    helper = new CameraHelper(this);
}
At this point, you have everything you need to start analyzing images with the service.
4. Detecting Objects
The Visual Recognition service can detect a large variety of physical objects. As input, it expects a reasonably well-lit picture whose resolution is at least 224 x 224 pixels. For now, let’s use the device camera to take such a picture.
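The tutorial doesn’t add an explicit size check, but if you want to guard against undersized input before calling the service, a sketch like the following could be dropped into the activity. MIN_DIMENSION and isLargeEnough() are illustrative names of my own, not part of the Watson SDK:

// Hypothetical guard: the service expects images of at least 224 x 224 pixels
private static final int MIN_DIMENSION = 224;

private boolean isLargeEnough(Bitmap photo) {
    return photo.getWidth() >= MIN_DIMENSION
            && photo.getHeight() >= MIN_DIMENSION;
}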
Step 1: Define a Layout
The user must be able to press a button to take the picture, so your activity’s layout XML file must have a Button widget. It must also have a TextView widget to list the objects detected. Optionally, you can throw in an ImageView widget to display the picture. A sketch of such a layout follows.
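The article’s original layout XML isn’t reproduced here, so the following is only a minimal sketch. The widget IDs (preview and detected_objects) and the takePicture handler name are taken from the Java code later in this section; everything else is an assumption:

<!-- activity_main.xml (sketch): preview image, results label, capture button -->
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical">

    <ImageView
        android:id="@+id/preview"
        android:layout_width="match_parent"
        android:layout_height="0dp"
        android:layout_weight="1" />

    <TextView
        android:id="@+id/detected_objects"
        android:layout_width="match_parent"
        android:layout_height="wrap_content" />

    <Button
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:onClick="takePicture"
        android:text="Take Picture" />

</LinearLayout>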
In the above layout, we’ve added an on-click event handler to the Button widget using the android:onClick attribute. You can generate a stub for this handler in your activity by clicking on the light bulb shown beside the attribute.
public void takePicture(View view) {
    // More code here
}
Step 2: Take a Picture
You can take a picture by simply calling the CameraHelper object’s dispatchTakePictureIntent() method, so add the following code inside the event handler:

helper.dispatchTakePictureIntent();
The above method uses the device’s default camera app to take the picture. That means that to gain access to the picture taken, you must override your activity’s onActivityResult() method and look for results whose request code is REQUEST_IMAGE_CAPTURE. Here’s how you can do that:
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if(requestCode == CameraHelper.REQUEST_IMAGE_CAPTURE) {
        // More code here
    }
}
Once you find the right result, you can extract the picture from it in the form of a Bitmap object using the getBitmap() method of the CameraHelper class. You can also get the absolute path of the picture using the getFile() method. We’ll need both the bitmap and the absolute path, so add the following code next:
final Bitmap photo = helper.getBitmap(resultCode);
final File photoFile = helper.getFile(resultCode);
If you chose to add the ImageView widget to your layout, you can display the picture now by passing the bitmap directly to its setImageBitmap() method.

ImageView preview = findViewById(R.id.preview);
preview.setImageBitmap(photo);
Step 3: Classify the Picture
To detect items in the picture, you must pass the image as an input to the classify() method of the VisualRecognition object. Before you do so, however, you must wrap it in a ClassifyImagesOptions object, which can be created using the ClassifyImagesOptions.Builder class.
The return value of the classify() method is a ServiceCall object, which supports both synchronous and asynchronous network requests. For now, let’s call its execute() method to make a synchronous request. Of course, because network operations are not allowed on the UI thread, you must remember to do so from a new thread.
AsyncTask.execute(new Runnable() {
    @Override
    public void run() {
        VisualClassification response = vrClient.classify(
            new ClassifyImagesOptions.Builder()
                .images(photoFile)
                .build()
        ).execute();

        // More code here
    }
});
The classify() method is built to handle multiple pictures at once. Consequently, its response is a list of classification details. Because we are currently working with a single picture, we just need the first item of the list. Here’s how you can get it:
ImageClassification classification = response.getImages().get(0);
VisualClassifier classifier = classification.getClassifiers().get(0);
The Visual Recognition service treats each item it has detected as a separate class of type VisualClassifier.VisualClass. By calling the getClasses() method, you can get a list of all the classes.
Each class has, among other details, a name and a confidence score associated with it. The following code shows you how to loop through the list of classes and display the names of only those whose scores are greater than 70% in the TextView widget.
final StringBuffer output = new StringBuffer();
for(VisualClassifier.VisualClass object : classifier.getClasses()) {
    if(object.getScore() > 0.7f)
        output.append("<")
              .append(object.getName())
              .append("> ");
}

runOnUiThread(new Runnable() {
    @Override
    public void run() {
        TextView detectedObjects = findViewById(R.id.detected_objects);
        detectedObjects.setText(output);
    }
});
Note that the above code uses the runOnUiThread() method because the contents of the TextView widget can be updated only from the UI thread.
If you run the app now and take a picture, you will be able to see Watson’s image classification working.
5. Analyzing Faces
The Visual Recognition service has a dedicated method to process human faces. It can determine the age and gender of a person in any photograph. If the person is famous, it can also identify them by name.
Step 1: Define a Layout
Analyzing faces with the Visual Recognition service is not too different from classifying objects, so you are free to reuse the layout you created earlier. However, to introduce you to a few more features the service offers, I’m going to create a new layout with slightly different functionality.
This time, instead of taking pictures using the camera and passing them to the service, let’s directly pass an image URL to it. To allow the user to type in a URL and start the analysis, our layout will need an EditText widget and a Button widget. It will also need a TextView widget to display the results of the analysis. I suggest you also add an ImageView widget to the layout so that the user can see the image the URL points to. A sketch of such a layout follows.
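As before, the original layout XML isn’t reproduced here, so this is only a minimal sketch. The widget IDs (image_url, preview, and person_details) come from the Java code later in this section; the analyzeFaces handler name is my own invention, since the article never names it:

<!-- face analysis layout (sketch): URL input, analyze button, preview, results -->
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical">

    <EditText
        android:id="@+id/image_url"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:hint="Image URL" />

    <Button
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:onClick="analyzeFaces"
        android:text="Analyze" />

    <ImageView
        android:id="@+id/preview"
        android:layout_width="match_parent"
        android:layout_height="0dp"
        android:layout_weight="1" />

    <TextView
        android:id="@+id/person_details"
        android:layout_width="match_parent"
        android:layout_height="wrap_content" />

</LinearLayout>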
Step 2: Display the Image
Inside the on-click handler of the Button widget, you can call the getText() method of the EditText widget to determine the image URL the user typed in. Once you know the URL, you can simply pass it to Picasso’s load() and into() methods to download and display the image in the ImageView widget.

EditText imageURLInput = findViewById(R.id.image_url);
final String url = imageURLInput.getText().toString();

ImageView preview = findViewById(R.id.preview);
Picasso.with(this).load(url).into(preview);
Step 3: Run Face Analysis
To run face analysis on the URL, you must use the detectFaces() method of the VisualRecognition client. Just like the classify() method, it needs a VisualRecognitionOptions object as its input.
Because you already know how to use the execute() method to make synchronous requests, let’s now call the enqueue() method instead, which runs asynchronously and needs a callback. The following code shows you how:
vrClient.detectFaces(new VisualRecognitionOptions.Builder()
        .url(url)
        .build()
).enqueue(new ServiceCallback<DetectedFaces>() {
    @Override
    public void onResponse(DetectedFaces response) {
        // More code here
    }

    @Override
    public void onFailure(Exception e) {
    }
});
As you can see in the above code, inside the onResponse() method of the callback object, you have access to a DetectedFaces object, which contains a list of face analysis results. Because we used a single image as our input, we’ll be needing only the first item of the list. By calling its getFaces() method, you get a list of all the Face objects detected.
List<Face> faces = response.getImages().get(0).getFaces();
Each Face object has a gender and age range associated with it, which can be accessed by calling the getGender() and getAge() methods.
The getGender() method actually returns a Gender object. You must call its own getGender() method to get the gender as a string, which will be either “MALE” or “FEMALE”. Similarly, the getAge() method returns an Age object. By calling its getMin() and getMax() methods, you can determine the approximate age of the face in years.
The following code shows you how to loop through the list of Face objects, generate a string containing the genders and ages of all the faces, and display it in the TextView widget:
String output = "";
for(Face face : faces) {
    output += "<" + face.getGender().getGender() + ", "
            + face.getAge().getMin() + " - "
            + face.getAge().getMax() + " years old>\n";
}

TextView personDetails = findViewById(R.id.person_details);
personDetails.setText(output);
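Earlier I mentioned that the service can name famous people, something the loop above doesn’t demonstrate. The raw detect-faces response contains an identity object for recognized faces; assuming the SDK exposes it through a getIdentity() accessor on the Face class (an assumption worth verifying against the SDK’s model classes), you could extend the loop roughly like this:

// Sketch only: identity is present only when the service recognizes a
// celebrity, and these accessor names are assumptions, not confirmed API
if(face.getIdentity() != null) {
    output += " (recognized as " + face.getIdentity().getName() + ")";
}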
Here’s a sample face analysis result:
Conclusion
The Watson Visual Recognition service makes it extremely easy for you to create apps that are smart and aware of their surroundings. In this tutorial, you learned how to use it with the Watson Java and Android SDKs to detect and analyze generic objects and faces.
To learn more about the service, you can refer to the official documentation.
And be sure to check out some of our other posts about machine learning here on Envato Tuts+!
- Android Things and Machine Learning
- Android Things: Creating a Cloud-Connected Doorman
- How to Use Google Cloud Machine Learning Services for Android
- Create an Intelligent App With Google Cloud Speech and Natural Language APIs
- How to Use the Google Cloud Vision API in Android Apps