
Deploying an Application to One of Vodafone's AWS Wavelength Zones

DevRel | 04 Oct 2024
12 minutes reading time


Introduction

As the next entry in the Multi-access Edge Computing (MEC) series (see the previous blog post on how to create and access an AWS EC2 instance on AWS Wavelength), we will now focus on how to create an AI-based YouTube video summariser, deploy it to an instance and then access it. Just as a reminder, Vodafone is a unique partner to AWS in that it is the first to offer commercially available AWS Wavelength deployments across multiple edge zones and countries (London and Manchester in the UK; Berlin, Dortmund and Munich in Germany) and AWS Regions (eu-west-2 and eu-central-1).

The steps below will guide you through the creation of a Python web app where the user is prompted to enter the link to a YouTube video and the application produces a synopsis of it. Some proficiency in the Python programming language is therefore expected from the reader.

And at the end of this tutorial, I will let you know how Vodafone is making it easier for developers to utilise different AWS Wavelength Zones (WLZ) to ensure that end users always have the best possible experience!

It should take around 20 minutes to complete all the steps in this guide.

Prerequisites

To complete this guide you will need:

  • A device on Vodafone's network (a mobile or a computer connected to a Vodafone 5G hotspot, for example).

    • Attention: currently, WLZs are only available in England and Germany.

    • The best way to experience the benefits of Vodafone's MEC in the UK is by using an IoT SIM card, but a regular Vodafone SIM card will work as well. Information on how to acquire an IoT SIM card can be found here.

  • An active AWS account.

  • AWS CLI installed and configured. Here is the official installation/update guide for AWS CLI.

  • WLZs are not automatically available and need to be opted in to. This guide describes how to do that (a minimal CLI sketch also follows this list).

  • Python 3.11.

  • pip 22.3, the recommended package management system for Python that allows developers to install and manage software packages written in Python.

  • Since the app will rely on the usage of AI, we will need NVIDIA drivers and software to make use of the EC2 instance's GPU. The guide at this link contains the necessary steps.
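
As a minimal sketch of the opt-in step mentioned in the list above, the AWS CLI can be used to list the zone groups available in a Region and opt in to one of them (eu-west-2-wl1, the London area Wavelength Zone group, is used here purely as an example):

# list all Availability Zones, Local Zones and Wavelength Zones for the Region,
# including the ones not yet opted in to
aws ec2 describe-availability-zones \
    --region eu-west-2 \
    --all-availability-zones

# opt in to a Wavelength Zone group (the group name below is an example)
aws ec2 modify-availability-zone-group \
    --region eu-west-2 \
    --group-name eu-west-2-wl1 \
    --opt-in-status opted-in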

Warning

  1. Not all of the AWS resources utilised during this exercise qualify for AWS' Free Tier, so be aware that there will be charges to your account. To reduce costs, it is strongly advised that any EC2 instances not being used are kept in a "STOPPED" state (see the sketch at the end of this warning). Using two EC2 instances (a g4dn.2xlarge in a WLZ and a t2.micro in eu-west-2), one carrier IP and one VPC for an hour a day would generate an approximate cost of 39.48 USD per month. A g4dn.2xlarge instance is used instead of the t3.medium from the previous guide because it is the only supported instance type with a GPU, which the AI-based app requires. This estimate is based on certain assumptions and might vary depending on your own usage and needs. A summary of this estimate can be found here. You can learn about the pricing assumptions here.

  2. EC2 instances deployed on WLZ have a few constraints. The main ones for this walkthrough are:

    1. UDP traffic between the EC2 instance and the internet is not allowed, and for TCP only outbound connections (and their responses) are allowed. Between the EC2 instance and a device on the carrier network, all traffic is allowed.

    2. Only the following instance types are supported:

      • t3.medium

      • t3.xlarge

      • r5.2xlarge

      • g4dn.2xlarge

    3. EBS volumes must be gp2.

A full list of considerations can be found on this link.
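
As flagged in the first warning above, a simple way to keep costs down is to stop the instances whenever they are not in use. A minimal sketch with the AWS CLI (using the $APP_INST_ID variable defined later in this guide; the bastion instance from the previous post can be stopped in the same way):

# stop the instance to avoid compute charges while it is not in use
aws ec2 stop-instances --region eu-west-2 --instance-ids $APP_INST_ID

# start it again when needed
aws ec2 start-instances --region eu-west-2 --instance-ids $APP_INST_ID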

Setting Up

This walkthrough starts from the end of the last guide, meaning that the final architecture will remain unchanged:

final architecture containing 2 EC2 instances

Figure 1: final architecture

There are only two basic changes that need to be made to the original steps:

  1. Because the aim of this tutorial is to deploy an AI-based application, the larger g4dn.2xlarge instance type will be used, as it is the only one with a GPU that can be deployed to an AWS Wavelength Zone.

  2. The instance will need a larger storage capacity (100 GB versus the original 8 GB) to account for the software that needs to be installed.

In short, the command to launch the application server instance needs to be changed to this:

export REGION=eu-west-2
export AMI_ID_LINUX=$(aws ssm get-parameter \
    --region $REGION \
    --name /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64 \
    --query "Parameter.Value" \
    --output text)
export KEY_NAME=VFWLTutorial
export APP_INST_ID=$(aws ec2 run-instances \
    --region $REGION \
    --instance-type g4dn.2xlarge \
    --network-interface '[{"DeviceIndex":0,"NetworkInterfaceId":"'$APP_ENI_ID'"}]' \
    --image-id $AMI_ID_LINUX \
    --key-name $KEY_NAME \
    --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"Encrypted":false,"DeleteOnTermination":true,"VolumeSize":100,"VolumeType":"gp2"}}]' \
    --output text \
    --query 'Instances[0].InstanceId')
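
Optionally, before moving on, you can wait for the new instance to reach the running state; a small sketch using the variables defined above:

aws ec2 wait instance-running \
    --region $REGION \
    --instance-ids $APP_INST_ID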

That took care of the AWS side of things. Now, to configure the instance, SSH into it as described in the previous blog post and install Python 3.11, pip 22.3 and the necessary libraries:

sudo dnf install python3.11 python3.11-pip -y
pip3.11 install pytubefix youtube_transcript_api transformers flask torch openai-whisper
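
At this point it is worth confirming that the NVIDIA drivers from the prerequisites are working and that PyTorch can see the GPU; a quick, hedged check could look like this:

# the driver should list the instance's NVIDIA T4 GPU
nvidia-smi
# should print "True" if torch was able to initialise CUDA
python3.11 -c "import torch; print(torch.cuda.is_available())"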

The application

With the environment set up, it is time to write the application that will be deployed to the server located in one of Vodafone's AWS Wavelength Zones. AWS has a service capable of building, training and deploying machine learning models, called Amazon SageMaker; however, it is not natively supported in WLZs. There are workarounds, such as the one mentioned in this article, but for the purposes of this guide we will not be using them.

As described in the introduction, the application takes a YouTube URL as input, gets the video's transcript and feeds it through a text-generation AI model to produce a summary.

There are two options here:

  • Create the files locally and transfer them to the server (a sketch of this follows this list); or
  • Log in to the server and create the files there.
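
If you go with the first option, here is a rough sketch of copying the files to the server once they have been created locally (assuming the key pair created in the previous post and the same address you use to SSH into the application server; the file names are placeholders for the ones described below):

# copy the Python file and the templates directory to the application server's home directory
scp -i VFWLTutorial.pem -r <nameOfPyFile>.py templates ec2-user@<serverAddress>:~/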

Either way, first you need to create a .py file, load the necessary libraries and add a function that will retrieve the transcript directly from YouTube:

from pytubefix import YouTube, extract
from youtube_transcript_api import YouTubeTranscriptApi
from transformers import pipeline
from flask import Flask, render_template, request
import torch
import whisper

# given an URL, retrieve the transcript directly from YouTube
def get_yt_transcript(url):
    # place holder for the transcription
    text = ''
    # extract the video ID from the video in the URL
    vid_id = extract.video_id(url)
    # retrieve the transcript directly from YouTube and return it once finished
    temp = YouTubeTranscriptApi.get_transcript(vid_id)
    for t in temp: 
        text += t['text'] + ' '
    return text

Sometimes YouTube videos do not have a transcript available. The function below extracts the video's audio and feeds it through an open-source automatic speech recognition (ASR) neural net named Whisper, running locally, to transcribe it. Add this function to the same file.

# given an URL, transcribe a YouTube video
def transcribe_yt_vid(url):
    # download YouTube video's audio
    save_dir = "./docs/youtube"
    yt = YouTube(str(url))
    try: 
        audio = yt.streams.filter(only_audio = True).first()
    except:
        return ''
    out_file = audio.download(filename = "audio.wav",
                              output_path = save_dir)    
    # loading the model
    # here device="cuda" informs the server to use the GPU
    model = whisper.load_model("base", device="cuda")
    # performing the transcription via the AI
    result = model.transcribe(out_file)
 
    text = result['text']    
    # return the video's transcript
    return text

Putting the two functions together:

def transcribe_youtube_video(url):
    # creates a YouTube object
    yt = YouTube(str(url))
    # place holder for transcription
    text = ''
    # get the transcript from YouTube if available
    try: 
        text = get_yt_transcript(url)
    # if it fails...    
    except:
        print('Transcription not available, transcribing now ...')
        pass
    # transcribes the video if YouTube did not provide a transcription
    if text == '':
        text = transcribe_yt_vid(url)
    # returns the video title and the transcription
    return yt.title, text

Once the transcription is available, it will serve as input to the summarisation step, together with the title and the maximum number of words for the summary.

For this part, the program will utilise Gemma, a lightweight open text-generation model from Google. This model will run locally, using the instance's GPU, and will receive as input a prompt containing the transcript and title of the video.

Here is the code for the function:

# given a video title, a transcript and the maximum number of words, returns a summary of the video
def summarize_text(title, text, words):
    # this is the prompt
    # feel free to edit 
    prompt = """
    You are an intelligent AI assistant that is tasked to review the content of a video and provide a concise and accurate summary.\n
    You do not provide information that is not mentioned in the video. You only provide information that you are absolutely sure about.\n
    The following is the title and a transcript of a video:\n
    ----------------------- \n
    TITLE: 
    """ + title + """
    \n
    TRANSCRIPT:\n
    """ + text + """
    \n 
    ----------------------- \n
    Take these and distil them into a consolidated summary.\n
    SUMMARY:\n
    """
    
    # here, "device=0" informs the host to use the GPU
    summarizer = pipeline("text-generation", model="google/gemma-2b", device=0)
    summary = summarizer(prompt, do_sample=False, max_new_tokens=words, batch_size=1)
    
    # returns the summary
    return summary[0]["generated_text"]

Now those functions need to go into a web application, which comprises two parts: the Python code and a couple of HTML pages. We will use the Flask framework, which is lightweight and flexible.

First, we will create two simple HTML pages: in the same directory where the .py file is located, create a subdirectory named "templates" and place two files in it: data.html and form.html. Here are their contents:

  • data.html
{% for key,value in form_data.items() %}
<h2>{{key}}</h2>
<p>{{value}}</p>
{% endfor %}
  • form.html
<!-- once the user presses the "Summarise" button, the contents of the "Url" and "words" fields will be POSTed to the "data" route -->
<form action="/data" method="POST">
    <p>YouTube URL<input type="text" name="Url" /></p>
    <p>maximum number of words <input type="number" name="words" placeholder="200" /></p>
    <p><input type="submit" value="Summarise" /></p>
</form>

Back to the .py file, add the following between the imports and the functions previously created:

# initialise the Flask app
app = Flask(__name__)
# this is the landing page. It will render the “form.html” page defined earlier
@app.route('/')
def form():
    return render_template('form.html')
# when the user clicks on "Summarise", the information will be passed to the "data" route below:
@app.route('/data', methods = ['POST', 'GET'])
def data():
    # test if the user typed in "/data" directly into the address bar, instead of submitting the form
    if request.method == 'GET':
        return f"The URL /data is accessed directly. Try going to '/form' to submit form."
    if request.method == 'POST':
        # retrieve the URL and the maximum number of words for the summary
        url = request.form['Url']
        words = request.form['words']
        # in case no value was entered, set it to 200; otherwise convert it to an integer
        if words == '':
            words = 200
        else:
            words = int(words)
        # retrieve the title and the transcription of the YouTube video
        title, text = transcribe_youtube_video(url)
        # input the title, transcription and number of words into the function and retrieve the summary
        out = summarize_text(title, text, words)
        # render the 'data.html' file passing the summary to it
        return render_template('data.html', form_data = dict(Summary = out))

# the following functions have already been defined earlier on
def transcribe_youtube_video...
def get_yt_transcript...
def transcribe_yt_vid...
def summarize_text...

# the below will make sure that the app is externally accessible on the carrier IP address and at port 8181
app.run(host='0.0.0.0', port=8181)

The port mentioned above (8181) is the same one that was configured in the security group for the application server in the previous blog post. If you wish to use a different port, make sure that it is configured in the security group as well.
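
As a hedged example, assuming the application server's security group ID is at hand (shown below as a placeholder), allowing inbound traffic on a different port (8282 is used purely as an illustration) would look something like this:

aws ec2 authorize-security-group-ingress \
    --region eu-west-2 \
    --group-id <appServerSecurityGroupId> \
    --protocol tcp \
    --port 8282 \
    --cidr 0.0.0.0/0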

Now, to run the app, execute the following command on the application server's terminal:

python3.11 ./<nameOfPyFile>.py

The following output should show up:

 * Serving Flask app 'yourAppName'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8181
 * Running on http://<internalIPAddress>:8181
Press CTRL+C to quit
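
Note that this runs the Flask development server in the foreground, so it stops as soon as the SSH session is closed. If you want the app to keep running after you disconnect, one simple option (as a sketch, not a production setup) is nohup:

# run the app in the background, keep it alive after logout and send its output to app.log
nohup python3.11 ./<nameOfPyFile>.py > app.log 2>&1 &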

Ensuring the device is connected to Vodafone's mobile network, open a browser and type in http://<carrierIpAddressFromEC2Instance>:8181. Make sure it is HTTP, as HTTPS has not been configured! The following content should be displayed:

web form containing a field for the YouTube video URL and the maximum number of words for the summary

Figure 2: input form for the YouTube summariser

Paste in the URL of a YouTube video. Given the limitations of the large language model (LLM), the recommendation is to use videos around the five-minute mark with dialogue or narration in them, as the app needs text content to work. Longer videos will generate longer transcripts, which can be problematic when fed through the LLM. Movie trailers tend to hit that sweet spot.

After clicking the "Summarise" button, a new page should load after a while containing a synopsis of the video.

Simple Edge Discovery API

Now that the application is up and running and users are flocking to it to get the much-needed summary of the latest movie trailer, there might be a need to ensure that they are having the best possible experience, both in terms of latency and security. As mentioned in previous posts, the combination of Vodafone and AWS Wavelength Zones can help developers ensure this by allowing them to deploy their application on different instances in different WLZs.

For example, when building the front end for the AI-based YouTube video summariser discussed in this guide, you can embed the logic to call Vodafone's Simple Edge Discovery API on the client side, so that you know which WLZ to point the end user to, ensuring they always have the fastest and safest experience possible.

The code for this could look something like the following (clientID and clientSecret can be generated at the apps section of the Developer Marketplace portal):

# program to retrieve the name of the closest Edge Computing platform
import requests
import json
# Define the token API URL and parameters
# Staging URL
token_url = "https://api.vf-dmp.engineering.vodafone.com/oauth2/v1/token"
token_headers = {
    'Authorization': 'Basic <clientID:clientSecret encoded in Base64 format>',
    'Content-Type': 'application/x-www-form-urlencoded',
} 
token_data = {
    'grant_type': 'client_credentials',
} 
# Call the token API and get the response 
token_response = requests.post(token_url, headers=token_headers, data=token_data) 
# Check if the token API call was successful
if token_response.status_code == 200:
    # Extract the access token from the response
    access_token = token_response.json().get("access_token")
    # Define the API endpoint URL and headers
    endpoint_url = "https://api.vf-dmp.engineering.vodafone.com/eds/v0/mec-platforms"
    endpoint_headers = {
        "IP-Address": f"<deviceIPAddress>",
        "Authorization": f"Bearer {access_token}"
    }    
    params = {
    'filter': 'closest',
    }
    # Call the API endpoint and get the response
    endpoint_response = requests.get(endpoint_url, params=params, headers=endpoint_headers)
    # Check if the API endpoint call was successful
    if endpoint_response.status_code == 200:
        # Print the response data
        print(endpoint_response.json()["Ern"])
    else:
        # Print the error message
        print(f"API endpoint call failed with status code {endpoint_response.status_code}")
else:
    # Print the error message
    print(f"Token API call failed with status code {token_response.status_code}")

Summary

In this tutorial, I guided you through deploying a genAI-based Python web application to one of Vodafone's AWS Wavelength Zones. I also showed how to utilise Vodafone's Simple Edge Discovery API to check for the closest WLZ. Just as a reminder, "closest" is not necessarily meant in the geographical sense, but in terms of latency.

In our next post in this series, we will extend the architecture we are currently working on by adding an application load balancer to the WLZ.

To stay up to date with the latest on Vodafone’s APIs and the Developer Marketplace, make sure to register your interest in our form. You can also use the form to reach out to us if you are having any issues with the steps listed here.

Keep watching this space!
