How Machine Learning Is Winning the Photographic Face Race
Several years ago, some photo applications started to include face recognition features. Faces are broadly consistent in appearance, yet distinct enough from most surroundings that software algorithms can spot them easily.
From there, the software focused on identifying the faces as people, not just human shapes, giving you a way to find specific people in your photo library. Planning an anniversary party for your aunt and uncle, and want to create a display of them over the years? Using Adobe Lightroom, Apple Photos, Google Photos, or several other apps and services, it’s trivial to find every image in your library that includes them.
We’re now deep into the next phase of these features, where software can automatically identify all sorts of objects and scenes. Using machine learning and a healthy amount of network processing power, applications can pick out which photos contain sunsets or snowy scenes. They can show you all the dogs in your library; or dogs on beaches; or dogs on beaches under stormy skies. Some can now identify your pooch, even amid groups of other pups.
Beyond the Library
That’s fine for searching your photos, but what percentage of time do people spend organizing compared to editing? Spoiler: It’s not high. (One reason I go on and on about organizing one’s library is that it pays dividends in situations when you really need it. Collecting those photos for an anniversary party can be a quick task with metadata intact; or, it can be a long, last-minute grind if you have to scroll through all your photos looking for the ones you want. But I digress…)
However, the technology isn’t being used just to find pictures of your friends. If the software can identify a face, it can identify a person.
More important, it can treat that area differently from other sections of a photo. Knowing more about what’s in an image affects how it’s edited, and this is where things are getting interesting.
For example, consider a scene where a person is standing in front of a detailed background. To add definition to that background or make it pop, one method is to increase the amount of clarity or structure in the image. That adjustment adds local contrast throughout the scene, which benefits the background, but is usually not flattering for the main subject, enhancing wrinkles and other skin details.
To get the best result, you need to work on the two areas separately, masking out the person so the clarity adjustment affects only the background. That involves creating selections: brushing the affected area or adding a radial selection in Lightroom, or working with layer masks in Photoshop. In other words, it requires more time.
But if the software already knows which area is the face and can determine the rest of the visible body, that selection work is done for you.
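To make the idea concrete, here is a minimal Python sketch of a masked clarity adjustment. It is an illustration under simple assumptions, not how any particular app implements it: the image is a grayscale grid of values between 0 and 1, the subject mask (1 for the person, 0 for everything else) is assumed to come from some detector, and the function names and the box-blur approach to local contrast are my own choices.

```python
def box_blur(image, radius=1):
    """Average each pixel with its neighbours (edge pixels use what is available)."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            blurred[y][x] = total / count
    return blurred


def add_clarity_outside_mask(image, subject_mask, strength=0.5):
    """Boost local contrast everywhere except where subject_mask is 1."""
    blurred = box_blur(image)
    result = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            if subject_mask[y][x]:
                # Protected subject pixel: leave it untouched.
                new_row.append(value)
            else:
                # Push the pixel away from its local average, then clamp to 0..1.
                boosted = value + strength * (value - blurred[y][x])
                new_row.append(min(1.0, max(0.0, boosted)))
        result.append(new_row)
    return result
```

The point of the sketch is the branch on the mask: once something else has supplied the mask, protecting the subject costs one condition instead of a manual selection.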
The developer Skylum has teased a feature of its forthcoming Luminar 4 application, due later this year, that is smart about adding texture and clarity to an image. Called AI Structure, it automatically excludes the person’s face as you increase the strength of the control—which is a single slider. You expend two seconds of work to get the same result as 10 minutes or more in Lightroom or Photoshop.
These same concepts apply to other areas, like skies, which are often edited separately using gradient masks to bring out more texture or color.
So, going back to our face example, software can identify which areas in a photo contain faces, and can automatically generate masks to treat those areas differently from the rest of the photo.
The software can also identify the different sections of faces that we all share, and act upon those. It knows which blobs are eyes, can tell whether they’re open or closed, can identify teeth in a smile, and so on. And with that knowledge, it allows you to make adjustments.
Photoshop and Photoshop Elements can distort faces, sometimes to absurd effect, but as retouching tools they’re good for small adjustments. In the past, this was done using the general controls in the Liquify tool, which bend and warp pixels in broad strokes. Now, the Face Tool includes a column’s worth of sliders to control elements such as individual eye size, height, width, and tilt; nose height and width; and the shape of the mouth, from individual control of the lips to how broad the smile is.
Once again, when the software can identify these particular areas, it can take action on them. The application Portraiture, from Imagenomic, is geared entirely toward smoothing skin and adjusting skin color, tasks that traditionally take more work when done manually.
Faster than the Blink of a Corrected Eye
This type of subject recognition isn’t just happening on the editing side, of course. Camera manufacturers are using these technologies to help you when you’re shooting. Today’s cameras identify faces to track focus, even locking onto individual eyes with some models. And some software uses predictive algorithms to anticipate the direction of action in a scene to maintain focus, or to stay locked on a moving person even if they turn their head and hide their face.
The forefront of these advancements is happening not in DSLRs or mirrorless cameras, but in smartphones. Apple’s Face ID feature, for example, uses facial recognition as a security measure to unlock an iPhone or iPad Pro, but the same cameras that identify the face also judge its depth in the scene and distance from the camera lens. That data is used to create a live mask to selectively blur the background and simulate a shallower depth of field.
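As a rough illustration of how depth data can drive that kind of blur, here is a small Python sketch. It is not Apple’s implementation; it assumes you already have a sharp image, a blurred copy of it, and a per-pixel depth map, all as grids of numbers, and the function name and the focus_depth and falloff parameters are hypothetical.

```python
def composite_by_depth(sharp, blurred, depth, focus_depth, falloff=1.0):
    """Blend sharp and blurred images using distance from the focal plane."""
    result = []
    for y, row in enumerate(sharp):
        new_row = []
        for x, sharp_value in enumerate(row):
            # Weight is 0 at the subject's depth and reaches 1 once a pixel
            # is `falloff` or more away from it (in the depth map's units).
            distance = abs(depth[y][x] - focus_depth)
            weight = min(1.0, distance / falloff)
            new_row.append(sharp_value * (1.0 - weight) + blurred[y][x] * weight)
        result.append(new_row)
    return result
```

Pixels at the subject’s depth stay sharp, distant pixels take the blurred value, and anything in between is a mix, which gives the mask a soft edge rather than a hard cutout.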
And yet, that’s still surface-level technology. The latest flagship phones from the major manufacturers include night modes that boost the amount of visible light to illuminate dark scenes. Typically, that would be accomplished by leaving the shutter open longer, but that creates blurry results unless the camera is locked down on a tripod. So instead, the phones use a variety of technologies to capture several exposures at different speeds, then piece together fragments from each shot and blend them together to create a sharp, better-lit photo.
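A heavily simplified Python sketch shows why several frames help. This is a stand-in technique, not what any phone actually does: it assumes the frames are already aligned grayscale grids with values between 0 and 1, averages them pixel by pixel to reduce random noise, and then brightens the result. Real night modes also align frames, handle motion, and mix different exposure lengths, none of which is modeled here. The function name and the gain parameter are hypothetical.

```python
def merge_frames(frames, gain=1.0):
    """Average aligned frames pixel by pixel, then brighten and clamp to 1.0."""
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    merged = []
    for y in range(height):
        row = []
        for x in range(width):
            # Random noise differs from frame to frame, so it partly cancels
            # out in the average while the real scene content remains.
            average = sum(frame[y][x] for frame in frames) / len(frames)
            row.append(min(1.0, average * gain))
        merged.append(row)
    return merged
```

Because the averaged image is cleaner than any single short exposure, it can be brightened further before noise becomes objectionable.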
Even photos shot in daylight are often composites assembled by the phone’s silicon and bits. Apple’s Smart HDR feature automatically merges exposures to balance a scene’s lighting, resulting in photos where the foreground is discernible and the sky isn’t blown out to white.
Thanks to software and dedicated processors, all of this happens in the blink of an eye. On some cameras, in fact, the shutter can wait until it detects that both eyes are open.
What started as a way to easily pick out one’s relatives from an assortment of photos has evolved into technology that’s used every time we tap the shutter button on our phones. And we’re still in the early days of machine-learning-assisted photography.