In a previous post, we explained how to build a Media Analysis Solution that extracts key details from media files without machine learning expertise. However, for media content in the cloud, there are many additional applications and business opportunities to explore.
In this blog post, we show how to increase the monetization potential of media content by using the breadth and depth of the AWS Cloud. Customers can benefit from this solution by automatically generating ad inventories for their new and historical content, thereby creating targeted strategies for monetizing that content. Take a journey in building an AWS solution that helps you automatically insert ads at suitable positions within your content.
Before diving into the solution, the following is an introduction to the AWS technologies and standards used throughout this post.
AWS Elemental MediaConvert
AWS Elemental MediaConvert is a file-based video transcoding service with broadcast-grade features. It allows you to easily create video-on-demand (VOD) content for broadcast and multiscreen delivery at scale. In this post, MediaConvert is used to transcode the original uploaded content and prepare it for streaming.
HLS (HTTP Live Streaming)
HLS is an HTTP adaptive bitrate streaming communications protocol. To host and stream HTTP Live Streams, you only need a conventional web server: media files are split into downloadable segments and organized in playlists (m3u8 files).
The client player downloads the playlist first, then downloads segments as the user progresses through the media. The following is an example of an m3u8 file.
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:11
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:11,
mediafile_00001.ts
#EXTINF:10,
mediafile_00002.ts
#EXTINF:11,
mediafile_00003.ts
#EXTINF:10,
mediafile_00004.ts
#EXTINF:11,
mediafile_00005.ts
...
HLS supports the H.264 video codec and audio in AAC, MP3, AC-3, or EC-3, encapsulated in MPEG-2 Transport Streams.
AWS Elemental MediaTailor
AWS Elemental MediaTailor is a content personalization and monetization service. It allows video providers to serve targeted ads to end users while maintaining broadcast quality-of-service in multiscreen video applications.
This post shows you how to use MediaTailor with an automatically generated ad inventory.
VMAP and VAST
VMAP (Video Multiple Ad Playlist) is an XML template that content owners can use to specify the structure of an ad inventory: when ad breaks occur and how many ads each break contains. MediaTailor uses VMAP files to inject ads into your content. The ads included in an ad inventory can be described using the VAST (Video Ad Serving Template) specification.
Big Buck Bunny
Big Buck Bunny is a 2008 open-source short film made by the Blender Institute. In this post, Big Buck Bunny is used to test the end-to-end processing. Big Buck Bunny is copyright of the Blender Foundation and is distributed under the Creative Commons Attribution 3.0 license.
FFmpeg
IMPORTANT LEGAL NOTICE: this solution uses FFmpeg to analyze the low-level visual features of the uploaded media files. FFmpeg is a free and open-source software suite for handling video, audio, and other multimedia files and streams. FFmpeg is distributed under the LGPL license. For more information about FFmpeg, please see the following link: https://www.ffmpeg.org/. Your use of the solution will cause you to use FFmpeg. If you do not want to use FFmpeg, do not use the solution.
Prerequisites
To get the best out of this blog post, you need an AWS account.
Prior experience with AWS CloudFormation, Docker, and Python is advised, but not required.
Solution overview
The idea behind the solution is that with Computer Vision, we can automate the process of finding suitable positions for ads insertion within media content.
There are many ways a piece of content can be segmented so that an ad is inserted without being intrusive or disruptive. For example:
- Visual features
  - Chapter and scene detection – by simply analyzing low-level visual descriptors such as Dominant Color or Color Layout.
  - Long sequences of black frames – these sequences are often used by content producers to mark ad insertion points.
- Audio features
  - Silence detection – so as not to place ads mid-sentence (see the sketch after this list).
  - Volume or frequency distribution changes – semantic changes in content are often accompanied by a change in dynamics, rhythm, or tone of the soundtrack.
  - Signature detection – new scenes in a video are often introduced by the same jingle or musical passage.
- Semantic features
  - High variation of sentiment – it's common for ads to be placed at positions that generate the so-called cliffhanger effect. This is often accompanied by a strong change in sentiment for an even more dramatic effect. The sentiment gradient can be obtained either by exploiting visual features (facial expressions, for example) or by transcribing and analyzing the dialogue.
  - Face and celebrity detection – insert an ad right after or right before a celebrity exits the scene.
  - Semantic analysis via AI/ML – using Amazon AI and ML services such as Amazon Transcribe, Amazon Comprehend, and Amazon Rekognition.
It is important to mention that semantics could also inform the ADS (Ad Decision Server) about which product to advertise. This could also be used to exclude ads that might be considered inappropriate, illegal, or offensive in a particular context.
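As a taste of how one of these detectors could work, the following is a minimal sketch that scans a file for silences with FFmpeg's silencedetect filter, in the same spirit as the black frame detection used later in this post. The input file name and thresholds are illustrative assumptions.

import subprocess

# illustrative thresholds (assumptions): audio below -40 dB for at least
# 1.5 seconds is considered a silence long enough to host an ad break
SILENCE_COMMAND = [
    "ffmpeg",
    "-i", "mediafile.mp4",
    "-af", "silencedetect=noise=-40dB:d=1.5",
    "-f", "null",
    "-",
]

# ffmpeg reports silence_start / silence_end markers on stderr
result = subprocess.run(SILENCE_COMMAND, capture_output=True, text=True)
print("\n".join(line for line in result.stderr.splitlines() if "silence_" in line))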
The presented solution focuses on detecting long sequences of black frames.
Solution architecture
Process
This solution is aimed at automating the ingestion and analysis of media files to produce ad-featured streams.
The input is a source media file – an mp4 file – and the output of the solution is an ad-featured HLS stream delivered via AWS Elemental MediaTailor.
Ingestion
The ingestion process begins with the upload of a media file to an Amazon S3 bucket. Whenever a media source file is uploaded, a Lambda function creates a transcoding job in AWS Elemental MediaConvert, triggers a custom analysis task on Amazon Elastic Container Service (Amazon ECS) on AWS Fargate, and stores the media file metadata in a DynamoDB table.
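The following is a minimal sketch of such a fan-out handler. The environment variable names, the container name, and the table schema are illustrative assumptions; the actual implementation lives in the repository.

import os
import boto3

ecs = boto3.client("ecs")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["METADATA_TABLE"])

def handler(event, context):
    # 1. locate the uploaded media file from the S3 event notification
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # 2. enqueue the MediaConvert transcoding job (shown later in this post)

    # 3. launch the Fargate analysis task, passing the file location through
    #    container environment variables
    ecs.run_task(
        cluster=os.environ["ECS_CLUSTER"],
        taskDefinition=os.environ["TASK_DEFINITION"],
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": [os.environ["SUBNET_ID"]],
                "assignPublicIp": "ENABLED",
            }
        },
        overrides={
            "containerOverrides": [
                {
                    "name": "black-frames",
                    "environment": [
                        {"name": "INPUT_MEDIA_BUCKET", "value": bucket},
                        {"name": "INPUT_MEDIA_KEY", "value": key},
                    ],
                }
            ]
        },
    )

    # 4. store the media file metadata
    table.put_item(Item={"mediaKey": key, "sourceBucket": bucket})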
Transcode
In this step, the source media file is encoded and split into segments. The segments and the HLS playlist file are stored in the destination Amazon S3 bucket, ready for streaming. In the next step, the source media file is analyzed to find suitable positions for ads, using FFmpeg to detect black frames.
The MediaConvert transcoding job is put on a custom queue deployed with the following AWS CloudFormation snippet.
cloudformation/fanout-lambda.yml, lines 164 through 169
MediaConvertQueue:
  Type: AWS::MediaConvert::Queue
  Properties:
    Description: !Sub 'Media Convert Queue - ${SolutionName} (${EnvironmentSuffix})'
    Name: !Sub ${EnvironmentSuffix}-${SolutionName}-queue
    Status: PAUSED
The Status property in this deployment is set to PAUSED by default, which means that the queue accepts new jobs but doesn't start processing them.
Once you are ready to enable processing, log in to the MediaConvert console. Then select Queues from the menu on the left and select the new queue.
Choose the Edit queue button.
From the Status drop-down, select Active, then choose Save queue.
Alternatively, you can set the Status property to Active in the CloudFormation template and redeploy the stack.
The following snippet, from the fan-out Lambda, uses the MediaConvert SDK to enqueue a job on the MediaConvert queue. It uses a job JSON template you can configure and download from the MediaConvert console.
fanout-lambda.py, lines 97 through 107
print("**********MEDIA CONVERT REQUEST**********")
print(json.dumps(job_body, indent=2))
response = mediaconvert.create_job(
Queue=job_body["Queue"],
AccelerationSettings=job_body["AccelerationSettings"],
Role=job_body["Role"],
Settings=job_body["Settings"],
StatusUpdateInterval=job_body["StatusUpdateInterval"],
Priority=job_body["Priority"],
)
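Note that, unlike most AWS services, MediaConvert requires an account-specific endpoint. A common pattern, and an assumption about how this solution constructs its client, is to resolve the endpoint once with describe_endpoints:

import boto3

# resolve the account-specific MediaConvert endpoint once, then reuse it
mc = boto3.client("mediaconvert")
endpoint_url = mc.describe_endpoints()["Endpoints"][0]["Url"]
mediaconvert = boto3.client("mediaconvert", endpoint_url=endpoint_url)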
Analysis
To build this sample solution, the scope of the analysis is narrowed down to a simple use case: finding long sequences of black frames.
As the analysis of a media file is generally a long-lived task, Amazon ECS on AWS Fargate is a better deployment target than Lambda: a Lambda function's timeout may not allow the full analysis of the video, and its file system might not be large enough to hold the downloaded file.
A custom task (a Docker container with Python and FFmpeg installed) analyzes the uploaded media file for long segments of black frames (> 1s). The timestamps found in this way are then used to compile a VMAP ad inventory.
The following snippet runs FFmpeg black frames detection in the Fargate task and generates the VMAP XML file.
The FFmpeg command is invoked in a subprocess, capturing its output and parsing it with the ad hoc function build_manifest.
IMPORTANT LEGAL NOTICE: By using this snippet, the solution makes use of FFmpeg to analyze the low-level visual features of the uploaded media files. FFmpeg is a free and open-source software suite for handling video, audio, and other multimedia files and streams. FFmpeg is distributed under the LGPL license. For more information about FFmpeg, please see the following link: https://www.ffmpeg.org/. Your use of the solution will cause you to use FFmpeg. If you do not want to use FFmpeg, do not use the solution.
tasks/black-frames/task/task.py, lines 33 through 59
# 2. download the media file to the local filesystem
with open(TASK_FILE_NAME, "wb") as fp:
    s3.download_fileobj(INPUT_MEDIA_BUCKET, INPUT_MEDIA_KEY, fp)

FFMPEG_COMMAND = [
    "ffmpeg",
    "-i",
    TASK_FILE_NAME,
    "-vf",
    f"blackdetect=d={FFMPEG_BLACK_DURATION}:pix_th={FFMPEG_BLACK_THRESHOLD}",
    "-an",
    "-f",
    "null",
    "-",
]

# 3. run ffmpeg in a subprocess, capturing its output
p = subprocess.Popen(FFMPEG_COMMAND, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()

# 4. build the VMAP file from the output of ffmpeg
# (parsing err instead of out because ffmpeg writes its log to stderr)
manifest = build_manifest(err.decode("utf-8").split("\n"))
print(manifest)
More details on this Fargate task here.
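The build_manifest function is not reproduced in this post. The following is a minimal sketch of how it could parse the blackdetect log lines into a VMAP document, placing each ad break in the middle of a detected black sequence; the regular expression, the timeOffset format, and the ADS URL are assumptions for illustration.

import re

# matches ffmpeg blackdetect log lines such as:
# [blackdetect @ 0x...] black_start:10.083 black_end:11.517 black_duration:1.433
BLACKDETECT_RE = re.compile(
    r"black_start:(?P<start>\d+(?:\.\d+)?).*?black_end:(?P<end>\d+(?:\.\d+)?)"
)
ADS_URL = "https://my.ads.example.com/vast"  # placeholder Ad Decision Server

def to_offset(seconds):
    # VMAP timeOffset uses the HH:MM:SS.mmm format
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

def build_manifest(lines):
    # collect the midpoint of every detected black sequence
    positions = []
    for line in lines:
        match = BLACKDETECT_RE.search(line)
        if match:
            start, end = float(match.group("start")), float(match.group("end"))
            positions.append((start + end) / 2)
    ad_breaks = "\n".join(
        f'  <vmap:AdBreak timeOffset="{to_offset(t)}" breakType="linear" breakId="midroll-{i}">\n'
        f'    <vmap:AdSource id="ads-{i}" allowMultipleAds="false" followRedirects="true">\n'
        f'      <vmap:AdTagURI templateType="vast3"><![CDATA[{ADS_URL}]]></vmap:AdTagURI>\n'
        f"    </vmap:AdSource>\n"
        f"  </vmap:AdBreak>"
        for i, t in enumerate(positions, start=1)
    )
    return (
        '<vmap:VMAP xmlns:vmap="http://www.iab.net/videosuite/vmap" version="1.0">\n'
        f"{ad_breaks}\n"
        "</vmap:VMAP>"
    )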
Campaign generation
The Fargate task also uploads the VMAP file to an S3 bucket; a Lambda function is then invoked to generate a campaign in MediaTailor. The resulting ad-featured stream URL is stored in DynamoDB.
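A minimal sketch of that campaign-generation Lambda could look like the following. The put_playback_configuration call is the real MediaTailor API, while the environment variables, naming scheme, and table layout are assumptions.

import os
import boto3

mediatailor = boto3.client("mediatailor")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["METADATA_TABLE"])

def handler(event, context):
    # the S3 event tells us which VMAP file was just uploaded
    key = event["Records"][0]["s3"]["object"]["key"]
    media_id = key.rsplit("/", 1)[-1].replace(".vmap.xml", "")

    # pair the transcoded HLS content with the generated VMAP ad inventory
    response = mediatailor.put_playback_configuration(
        Name=f"campaign-{media_id}",
        VideoContentSourceUrl=os.environ["HLS_ORIGIN_URL"],
        AdDecisionServerUrl=os.environ["VMAP_BASE_URL"] + key,
    )

    # store the ad-featured stream URL next to the media file metadata
    table.update_item(
        Key={"mediaKey": media_id},
        UpdateExpression="SET streamUrl = :u",
        ExpressionAttributeValues={
            ":u": response["HlsConfiguration"]["ManifestEndpointPrefix"]
        },
    )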
VMAP file
This is a sample VMAP manifest generated by analyzing Big Buck Bunny.
Each vmap:AdBreak element reports a suggested position for an ad break, while its nested vmap:AdTagURI element delegates the description of the single ad break to an external ADS (Ad Decision Server).
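The following is an illustrative sketch of such a manifest; the time offsets and the ADS URL are placeholders, not the actual analysis output.

<vmap:VMAP xmlns:vmap="http://www.iab.net/videosuite/vmap" version="1.0">
  <vmap:AdBreak timeOffset="00:01:25.000" breakType="linear" breakId="midroll-1">
    <vmap:AdSource id="ads-1" allowMultipleAds="false" followRedirects="true">
      <vmap:AdTagURI templateType="vast3"><![CDATA[https://my.ads.example.com/vast?break=1]]></vmap:AdTagURI>
    </vmap:AdSource>
  </vmap:AdBreak>
  <vmap:AdBreak timeOffset="00:03:50.000" breakType="linear" breakId="midroll-2">
    <vmap:AdSource id="ads-2" allowMultipleAds="false" followRedirects="true">
      <vmap:AdTagURI templateType="vast3"><![CDATA[https://my.ads.example.com/vast?break=2]]></vmap:AdTagURI>
    </vmap:AdSource>
  </vmap:AdBreak>
</vmap:VMAP>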
Distribution
The ad-featured stream is ready to be distributed. You can retrieve the playlist URL in the DynamoDB metadata table.
Before you can stream the playlist directly from the S3 bucket, you have to change the ACL of the playlists and the segments to public-read.
For a production environment, the recommendation is to keep playlists and segments private and distribute the content via Amazon CloudFront.
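For development and testing, a short script along the following lines could flip those ACLs; the bucket name and prefix are assumptions.

import boto3

s3 = boto3.client("s3")
BUCKET = "my-destination-bucket"  # assumption: your HLS destination bucket
PREFIX = "output/"                # assumption: the stream's output prefix

# mark every playlist and segment under the prefix as publicly readable
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        s3.put_object_acl(Bucket=BUCKET, Key=obj["Key"], ACL="public-read")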
Reporting
AWS Elemental MediaTailor supports both client-side and server-side reporting. A further development of this solution could add a serverless reporting API built on top of Amazon API Gateway and DynamoDB, allowing admins to monitor campaigns and automatically generate reports with Amazon QuickSight.
Summary
In this post, we built a minimal ingestion and analysis pipeline to automatically insert ads in media content. AWS Elemental MediaConvert and MediaTailor make it possible to automatically prepare your content for monetization, and moving your media content to AWS opens your business to new opportunities.
Call to action!
Before exploring the code base, have a look at the final result of the ingestion of Big Buck Bunny here.
The code is hosted in this repository and you can find more information on how to deploy it to your account here.