Image Identification: Speed and Precision for Automated Video Production

By Paula Viana (INESC) 

Artificial Intelligence, Computer Vision, Context-Aware Applications: a set of keywords used across many application areas. But what is their real impact? How can they improve content-creation workflows, reduce costs, and speed up the delivery of richer content?

FotoInMotion is at the forefront of a new era in which, by integrating all of these concepts, it becomes easier and faster to create new videos based on a deeper understanding of an image and its surrounding context.

The project’s initial results have been deployed and tested in three application scenarios: fashion, photojournalism and festivals. They show that the project is on track to meet the requirements of several types of content creators.

Let’s see how!

Scenario 1: Photojournalism/Crowdsourcing

Independence Day is approaching, and FiM-TV would like to produce short clips to be broadcast online throughout the day to keep visitors on the site. A call for contributions is launched: users should take photos with the FiM App and upload them to the repository. The only requirement is that the American flag appears in the photo, regardless of its position, background, or whether it is folded or unfolded.

Thousands of photos arrive at FiM-TV, and there is no time to annotate them manually or to have an editor work on each photo. The IT Director therefore launches the FiM automatic image analysis module: the flags are detected and located in the images, and this information is saved together with other contextual information coming from the FiM App and made available to the storytelling tools. 3D effects are then applied to the flags, and an animation is produced based on the template created for Independence Day. In a few minutes, dozens of new clips are available to start populating the web site.
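
The FiM modules themselves are not published with this article, but the hand-off described above (detections plus app context feeding the storytelling tools) can be sketched in a few lines. The following is a minimal Python sketch, assuming a hypothetical detect_flags() stand-in for the analysis module and an illustrative JSON record format; the field names, threshold and sample values are not the project’s actual API.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List

@dataclass
class Detection:
    label: str        # e.g. "us_flag"
    score: float      # detector confidence in [0, 1]
    box: List[int]    # [x_min, y_min, x_max, y_max] in pixels

def detect_flags(image_path: str) -> List[Detection]:
    """Hypothetical stand-in for the FiM automatic image analysis module.
    A real system would run a trained object detector over the image;
    here a canned detection is returned so the sketch is runnable."""
    return [Detection(label="us_flag", score=0.93, box=[210, 80, 640, 360])]

def build_record(image_path: str, app_context: Dict) -> Dict:
    """Combine detections with contextual metadata from the FiM App so the
    storytelling tools know which regions to animate with 3D effects."""
    detections = [d for d in detect_flags(image_path) if d.score >= 0.5]
    return {
        "image": image_path,
        "context": app_context,                      # GPS, timestamp, device, ...
        "objects": [asdict(d) for d in detections],  # regions for the template
        "template": "independence_day",
    }

if __name__ == "__main__":
    record = build_record(
        "uploads/photo_0001.jpg",
        {"gps": [38.8977, -77.0365], "captured_at": "2019-07-04T10:12:00Z"},
    )
    print(json.dumps(record, indent=2))
```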

Scenario 2: Live Fashion Show

A set of images is taken at a fashion show for the winter collection of a jewelry brand. FiM tools identify relevant objects, such as earrings and necklaces, and localize them in the photo. Additionally, because the photos were acquired with the FiM mobile App, GPS information and other contextual data are also collected; this makes it possible to identify the city where the show took place, as well as the name of the brand, given that the company had advertised the event on the web.

The photos and metadata are uploaded into the FiM repository and, as the brand wants to quickly create a short video about the event, FiM picks up this information and launches its automatic video-creation module. Based on the templates customized for fashion/jewelry, the creative tools start by applying zooming effects to each of the jewelry objects, navigating into the photo from the biggest to the smallest object. The video ends with a full view of the photo and a caption that reads FiM on Gold Winter Collection Show at the Luxembourg Museum of Arts.
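
The “biggest to smallest” ordering of the zooming effects reduces to ranking the detected objects by bounding-box area. Below is a short, self-contained sketch under that assumption; the detection tuples and sample values are purely illustrative and do not reflect the project’s actual data structures.

```python
from typing import List, Tuple

# Each detection: (label, confidence, (x_min, y_min, x_max, y_max)) in pixels
Detection = Tuple[str, float, Tuple[int, int, int, int]]

def box_area(box: Tuple[int, int, int, int]) -> int:
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def zoom_sequence(detections: List[Detection]) -> List[Detection]:
    """Order objects from the largest to the smallest bounding box,
    which defines the order of the zooming effects in the clip."""
    return sorted(detections, key=lambda d: box_area(d[2]), reverse=True)

if __name__ == "__main__":
    detections = [
        ("earring",  0.87, (500, 180, 560, 260)),
        ("necklace", 0.91, (120, 300, 420, 620)),
        ("earring",  0.83, (300, 175, 358, 250)),
    ]
    for label, score, box in zoom_sequence(detections):
        print(f"zoom to {label} (area={box_area(box)} px^2, score={score:.2f})")
```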
