In a previous post, we explained how to build a Media Analysis Solution that extracts key details from media files without machine learning expertise. However, for media content in the cloud, there are many additional applications and business opportunities to explore.

In this blog post, we show how to increase the monetization potential of media content by using the breadth and depth of the AWS Cloud. Customers can benefit from this solution by automatically generating ad inventories for their new and historical content, thereby creating targeted strategies for monetizing that content. Take a journey through building an AWS solution that helps you automatically insert ads in suitable positions within your content.

Before diving into the solution, the following is an introduction to the AWS technologies and standards used throughout this post.

AWS Elemental MediaConvert

AWS Elemental MediaConvert is a file-based video transcoding service with broadcast-grade features. It allows you to easily create video-on-demand (VOD) content for broadcast and multiscreen delivery at scale. MediaConvert is used in this article to transcode the original uploaded content and prepare it for streaming.

HLS (HTTP Live Streaming)

HLS is an HTTP adaptive bitrate streaming communications protocol. To host and stream HTTP Live Streams, you only need a conventional web server: media files are split into downloadable segments and organized in playlists (m3u8 files).

The player downloads the playlist first and then downloads segments as the user progresses through the media. The following is an example of an m3u8 file.

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:11
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:11,
mediafile_00001.ts
#EXTINF:10,
mediafile_00002.ts
#EXTINF:11,
mediafile_00003.ts
#EXTINF:10,
mediafile_00004.ts
#EXTINF:11,
mediafile_00005.ts
...

HLS supports the H.264 video codec and AAC, MP3, AC-3, or EC-3 audio, encapsulated in MPEG-2 Transport Streams.

AWS Elemental MediaTailor

AWS Elemental MediaTailor is a content personalization and monetization service. It allows video providers to serve targeted ads to end users while maintaining broadcast quality-of-service in multiscreen video applications.

This post shows you how to use MediaTailor with an automatically generated ad inventory.

VMAP and VAST

VMAP (Video Multiple Ad Playlist) is an XML template that content owners can use to specify the structure of an ad inventory. It determines when and how many ads should be displayed in a stream. MediaTailor uses VMAP files to inject ads into your content. The ads included in an ad inventory can be described using the VAST (Video Ad Serving Template) specification.

Big Buck Bunny

Big Buck Bunny is a 2008 short, open-source film made by the Blender Institute. In this post, Big Buck Bunny is used to test the end-to-end processing. Big Buck Bunny is copyright of the Blender Foundation and is distributed under the Creative Commons Attribution 3.0 license.

Metadata of the sample video used in this example. Big Buck Bunny (c) 2008 Blender Foundation

FFmpeg

IMPORTANT LEGAL NOTICE: this solution uses FFmpeg to analyze the low-level visual features of the uploaded media files. FFmpeg is a free and open-source software suite for handling video, audio, and other multimedia files and streams. FFmpeg is distributed under the LGPL license. For more information about FFmpeg, please see the following link: https://www.ffmpeg.org/. Your use of the solution will cause you to use FFmpeg. If you do not want to use FFmpeg, do not use the solution.

Prerequisites

To get the best out of this blog post, you need an AWS account.

Prior experience with AWS CloudFormation, Docker and Python is advised, but not required.

Solution overview

The idea behind the solution is that, with computer vision techniques, we can automate the process of finding suitable positions for ad insertion within media content.

The media content is analyzed with various techniques to find suitable spots for ad insertion. The result is an ad-featured stream, ready to be presented to end users.

There are many ways a piece of content can be segmented so that an ad is inserted in a way that is neither intrusive nor disruptive. For example:

  • visual features
    • chapter and scene detection – by simply analyzing low-level visual descriptors such as Dominant Color or Color Layout
    • long sequences of black frames – these sequences are often used by content producers for ad insertion.
  • audio features
    • silence detection – so as not to place ads midsentence (see the sketch after this list)
    • volume or frequency distribution changes – semantic changes in content are often accompanied by a change in dynamics, rhythm, or tone of the soundtrack
    • signature detection – often new scenes in a video are introduced by the same jingle or musical passage
  • semantic features
    • high variation of sentiment – it’s common for ads to be placed in positions that generate the so-called cliffhanger effect. This is often accompanied by a strong change in sentiment to get an even more dramatic effect. The sentiment gradient can be obtained by either exploiting visual features (facial expressions, for example) or by transcribing and analyzing the dialogue.
    • face and celebrity detection – insert an ad right after or right before a celebrity exits the scene
    • semantic analysis via AI/ML – using Amazon AI and ML services such as Amazon Transcribe, Amazon Comprehend, and Amazon Rekognition
    • it is important to mention that semantics could also inform the ADS (Ad Decision Server) about which product to advertise. This could also be used to blacklist ads that might be considered inappropriate, illegal, or offensive in a particular context.
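
As a concrete example of one of these techniques, the audio silence detection mentioned above could be prototyped with FFmpeg's silencedetect filter. The following is a minimal sketch; the input file name and thresholds are illustrative and would need tuning per content.

import subprocess

SILENCE_COMMAND = [
    "ffmpeg",
    "-i", "mediafile.mp4",
    "-af", "silencedetect=noise=-30dB:d=1",
    "-f", "null",
    "-",
]

# ffmpeg writes its log to stderr, with lines such as
# "[silencedetect @ ...] silence_start: 12.34" and "silence_end: 14.01"
p = subprocess.run(SILENCE_COMMAND, capture_output=True, text=True)
silences = [line for line in p.stderr.split("\n") if "silence_" in line]
print(silences)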

The presented solution focuses on detecting long sequences of black frames.

Solution architecture

Architecture of the presented solution. It uses Amazon S3 to store the source, the transcoded contents, and the VMAP files. Lambda functions are used to start a transcoding job and the analysis tasks on Amazon ECS. Amazon DynamoDB is used to store metadata. AWS Elemental MediaConvert is used to produce HLS playlists, while MediaTailor operates the ad insertion based on the generated VMAP file.

Process

This solution is aimed at automating the ingestion and analysis of media files to produce ad-featured streams.

The input is a source media file (an mp4 file), and the output of the solution is an ad-featured HLS stream delivered via AWS Elemental MediaTailor.

The process to automate the creation of VMAP files and ad-featured streams.

Ingestion

The ingestion process begins with the upload of a media file to an Amazon S3 bucket. Whenever a media source file is uploaded, a Lambda function creates a transcoding job in AWS Elemental MediaConvert, triggers a custom analysis task on Amazon Elastic Container Service (Amazon ECS) on AWS Fargate, and stores the media file metadata in a DynamoDB table.
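
As an illustration, the following is a minimal sketch of such a fan-out handler; the helper functions are hypothetical stand-ins for the logic shown in the rest of this post.

import urllib.parse

def handler(event, context):
    # the S3 notification carries the bucket and key of the uploaded media file
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    start_transcoding_job(bucket, key)  # enqueue the MediaConvert job (see Transcode)
    start_analysis_task(bucket, key)    # run the black frame analysis on ECS/Fargate
    store_metadata(bucket, key)         # persist the media file metadata in DynamoDB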

Transcode

In this step, the source media file is encoded and split into segments. The segments and the HLS playlist file are stored in the destination Amazon S3 bucket and are ready for streaming. In the next step, the source media file is analyzed to find suitable positions for ads, using FFmpeg for black frame detection.

The MediaConvert transcoding job is put on a custom queue deployed with the following AWS CloudFormation snippet.

cloudformation/fanout-lambda.yml, lines 164 through 169

MediaConvertQueue:
  Type: AWS::MediaConvert::Queue
  Properties:
    Description: !Sub 'Media Convert Queue - ${SolutionName} (${EnvironmentSuffix})'
    Name: !Sub ${EnvironmentSuffix}-${SolutionName}-queue
    Status: PAUSED

The Status property in this deployment is set to PAUSED by default, which means that the queue accepts new jobs but doesn't start processing them.

Once you are ready to enable the processing, log in to the MediaConvert console. Then select Queues from the menu on the left and select the new queue.

AWS Elemental MediaConvert queues

Choose the Edit queue button.

Edit AWS Elemental MediaConvert queue.

From the Status dropdown, select Active, then choose Save queue.

Make the queue active.

Alternatively, you can set the Status property to Active in the CloudFormation template and redeploy the stack.

The following snippet, from the fan-out Lambda, uses the MediaConvert SDK to enqueue a job on the MediaConvert queue. It uses a job JSON template you can configure and download from the MediaConvert console.

fanout-lambda.py, lines 97 through 107

  print("**********MEDIA CONVERT REQUEST**********")
  print(json.dumps(job_body, indent=2))
  response = mediaconvert.create_job(
    Queue=job_body["Queue"],
    AccelerationSettings=job_body["AccelerationSettings"],
    Role=job_body["Role"],
    Settings=job_body["Settings"],
    StatusUpdateInterval=job_body["StatusUpdateInterval"],
    Priority=job_body["Priority"],
  )
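
Note that the boto3 MediaConvert client must target an account-specific endpoint. The following is a minimal sketch of how the mediaconvert client used above might be created; the endpoint lookup is a one-time call whose result you can cache.

import boto3

# MediaConvert uses account-specific endpoints: discover once, then reuse
endpoints = boto3.client("mediaconvert").describe_endpoints(MaxResults=1)
mediaconvert = boto3.client(
    "mediaconvert",
    endpoint_url=endpoints["Endpoints"][0]["Url"],
)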

Analysis

To build this sample solution, the scope of the analysis is narrowed down to a simple use case: finding long sequences of black frames.

Black frame detection

As the analysis of a media file is generally a long-lived task, Amazon ECS on AWS Fargate is a better deployment target than Lambda: a Lambda function's timeout may not allow the full analysis of the video, and its file system might not be large enough to hold a full download of the media file.

A custom task (a Docker container with Python and FFmpeg installed) analyzes the uploaded media file for long segments of black frames (> 1s). The timestamps found in this way are then used to compile a VMAP ad inventory.

The following snippet runs the FFmpeg black frame detection in the Fargate task and generates the VMAP XML file.

The FFmpeg command is invoked in a subprocess, capturing its output and parsing it with the ad hoc function build_manifest.

IMPORTANT LEGAL NOTICE: By using this snippet, the solution makes use of FFmpeg to analyze the low-level visual features of the uploaded media files. FFmpeg is a free and open-source software suite for handling video, audio, and other multimedia files and streams. FFmpeg is distributed under the LGPL license. For more information about FFmpeg, please see the following link: https://www.ffmpeg.org/. Your use of the solution will cause you to use FFmpeg. If you do not want to use FFmpeg, do not use the solution.

tasks/black-frames/task/task.py, lines 33 through 59

# 2. download the media file to the local filesystem
with open(TASK_FILE_NAME, "wb") as fp:
    s3.download_fileobj(INPUT_MEDIA_BUCKET, INPUT_MEDIA_KEY, fp)

FFMPEG_COMMAND = [
    "ffmpeg",
    "-i",
    TASK_FILE_NAME,
    "-vf",
    f"blackdetect=d={FFMPEG_BLACK_DURATION}:pix_th={FFMPEG_BLACK_THRESHOLD}",
    "-an",
    "-f",
    "null",
    "-",
]

# 3. run ffmpeg, capturing its output
p = subprocess.Popen(FFMPEG_COMMAND, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()

# 4. build the VMAP file from the output of ffmpeg
# (parsing err instead of out because ffmpeg writes its log to stderr)
manifest = build_manifest(err.decode("utf-8").split("\n"))
print(manifest)

You can find more details on this Fargate task here.
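
The repository implements build_manifest; the following is an illustrative sketch of how such a parser could work, assuming the blackdetect log format shown in the comment and a placeholder ADS URL. The placement policy (midpoint of each black sequence) is also an assumption made for the example.

import re

# Hypothetical ADS endpoint; replace it with your Ad Decision Server.
ADS_URL = "https://my-ads-server.example.com/vast"

# blackdetect logs lines such as
# "[blackdetect @ 0x...] black_start:10.76 black_end:12.04 black_duration:1.28"
BLACKDETECT_RE = re.compile(r"black_start:(?P<start>[0-9.]+)\s+black_end:(?P<end>[0-9.]+)")

def to_offset(seconds):
    """Format seconds as the HH:MM:SS.mmm timeOffset used by VMAP."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    millis = round((seconds - int(seconds)) * 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"

def build_manifest(lines):
    """Turn ffmpeg blackdetect log lines into a minimal VMAP document."""
    ad_breaks = []
    for line in lines:
        match = BLACKDETECT_RE.search(line)
        if not match:
            continue
        # place the ad break at the midpoint of the black sequence
        midpoint = (float(match["start"]) + float(match["end"])) / 2
        i = len(ad_breaks) + 1
        ad_breaks.append(
            f'  <vmap:AdBreak timeOffset="{to_offset(midpoint)}" breakType="linear" breakId="midroll-{i}">\n'
            f'    <vmap:AdSource id="ads-{i}" allowMultipleAds="false" followRedirects="true">\n'
            f'      <vmap:AdTagURI templateType="vast3"><![CDATA[{ADS_URL}]]></vmap:AdTagURI>\n'
            f"    </vmap:AdSource>\n"
            f"  </vmap:AdBreak>"
        )
    return (
        '<vmap:VMAP xmlns:vmap="http://www.iab.net/videosuite/vmap" version="1.0">\n'
        + "\n".join(ad_breaks)
        + "\n</vmap:VMAP>"
    )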

Campaign generation

The Fargate task also uploads the VMAP file to an S3 bucket; a Lambda function is then invoked to generate a campaign in MediaTailor. The resulting ad-featured stream URL is stored in DynamoDB.
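
In the MediaTailor API, such a campaign maps to a playback configuration. The following is a minimal sketch of the call that the Lambda could make; the names and URLs are illustrative.

import boto3

mediatailor = boto3.client("mediatailor")

response = mediatailor.put_playback_configuration(
    Name="bigbuckbunny-campaign",
    # the Ad Decision Server queried for each ad break
    AdDecisionServerUrl="https://my-ads-server.example.com/ads",
    # the location of the transcoded HLS content
    VideoContentSourceUrl="https://my-media-bucket.s3.amazonaws.com/hls/bigbuckbunny",
)

# the ad-featured HLS stream is served from this prefix
print(response["HlsConfiguration"]["ManifestEndpointPrefix"])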

VMAP file

The following is a sample VMAP manifest of the kind generated by analyzing Big Buck Bunny.
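
The snippet below is an illustrative reconstruction of its shape; the timestamps and the ADS URL are placeholders, and the actual file is produced by the Fargate task.

<vmap:VMAP xmlns:vmap="http://www.iab.net/videosuite/vmap" version="1.0">
  <vmap:AdBreak timeOffset="00:05:12.000" breakType="linear" breakId="midroll-1">
    <vmap:AdSource id="ads-1" allowMultipleAds="false" followRedirects="true">
      <vmap:AdTagURI templateType="vast3"><![CDATA[https://my-ads-server.example.com/vast]]></vmap:AdTagURI>
    </vmap:AdSource>
  </vmap:AdBreak>
  ...
</vmap:VMAP>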

The vmap:AdBreak elements report the suggested positions (the timeOffset attribute) for the ad breaks.

Within each break, the vmap:AdTagURI element delegates the description of the single ad break to an external ADS (Ad Decision Server).

Distribution

The ad-featured stream is ready to be distributed. You can retrieve the playlist URL in the DynamoDB metadata table.

The metadata generated by the processing pipeline, stored in DynamoDB.

Before you can stream the playlist directly from the S3 bucket, you have to change the ACL of the playlists and the segments to public-read.

Make the playlist and the segments public in S3.
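
Instead of clicking through the console, a short script can flip the ACLs; a minimal sketch, assuming an illustrative bucket name and prefix:

import boto3

s3 = boto3.client("s3")

BUCKET = "my-transcoded-media-bucket"
PREFIX = "hls/"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        # make each playlist and segment publicly readable
        s3.put_object_acl(Bucket=BUCKET, Key=obj["Key"], ACL="public-read")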

For a production environment, the recommendation is to keep playlists and segments private and distribute the content via Amazon CloudFront.

Reporting

AWS Elemental MediaTailor supports both client-side and server-side reporting. A further development of this solution could add a serverless reporting API built on top of Amazon API Gateway and DynamoDB, allowing admins to monitor the campaigns and automatically generate reports with Amazon QuickSight.

Summary

In this post, we built a minimal ingestion and analysis pipeline to automatically insert ads in media content. AWS Elemental MediaConvert and MediaTailor make it possible to automatically prepare your content for monetization, and moving your media content to AWS opens your business to new opportunities.

Call to action!

Extract from the ad-featured stream. Big Buck Bunny (c) 2008 Blender Foundation

Before exploring the code base, have a look at the final result of the ingestion of Big Buck Bunny here.

The code is hosted in this repository and you can find more information on how to deploy it to your account here.