Export MySQL data to Amazon S3 using Lambda and Python
The AWS documentation officially recommends exporting a MySQL database to Amazon S3 using the Data Pipeline service, but what if we want something a bit more dynamic? In this article, we will see how to back up a MySQL database and save it in an Amazon S3 bucket using a simple script written in Python. We will use AWS Lambda to execute the Python script and an Amazon EventBridge Event Rule to schedule the Lambda execution.
In this project we will:
- create the Python script to export MySQL data to the S3 bucket;
- containerize the Python script in a Docker image;
- create an Elastic Container Registry repository and push the Docker image to it;
- create the CloudFormation stack template;
- deploy the CloudFormation stack;
- review the complete CloudFormation template.
Create the Python script to export MySQL data to the S3 bucket
First, we need to write the Python script that exports the MySQL data to the Amazon S3 bucket. To be fair, we could use any other programming language instead of Python, for example NodeJS: the only thing the script does is run the mysqldump command and upload the exported data to an Amazon S3 bucket, and this can be accomplished with pretty much every scripting language.
Also, because we are developers and lazy by definition, we want to build a reusable service. In the end, we will be able to back up multiple MySQL databases into the same S3 bucket simply by generalizing a few small pieces of code.
That said, let’s get to the script. In just a few lines of code, we run the mysqldump command to export the database and the AWS CLI to upload the dump to an Amazon S3 bucket. Also, note that the configuration is passed to the script through environment variables and the invocation event:
import subprocess
import os
import time

S3_BUCKET = os.environ['S3_BUCKET']


def handler(event, context):
    print("Function started")

    DB_HOST = event['DB_HOST']
    DB_NAME = event['DB_NAME']
    DB_USER = event['DB_USER']
    DB_PASS = event['DB_PASS']
    print("%s %s" % (DB_HOST, DB_NAME))

    # Generate the timestamp per invocation so every backup gets a unique key,
    # even when Lambda reuses a warm container
    timestamp = time.strftime('%Y-%m-%d-%I:%M')

    # Dump the database, compress it on the fly and stream it to the S3 bucket
    command = "mysqldump -h %s -u %s -p%s %s | gzip -c | aws s3 cp - s3://%s/%s.gz" % (
        DB_HOST, DB_USER, DB_PASS, DB_NAME, S3_BUCKET, DB_NAME + "_" + timestamp)
    subprocess.Popen(command, shell=True).wait()

    print("MySQL backup finished")
    return "backup finished"
Basically, we are wrapping a shell command in a Python subprocess. Also, to make the script more reusable we are doing the following:
- we get the database host, name, user, and password from the event, so we can invoke the function dynamically (as shown in the sketch below);
- we retrieve the S3 bucket name from an environment variable, so every export is stored in the same bucket.
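For example, once the function is deployed, this is a minimal sketch of how it could be invoked manually with boto3; the function name and the database credentials below are placeholders:

# Minimal sketch: invoke the backup Lambda manually with boto3.
# "mysql-backup-lambda" and the credentials are hypothetical placeholders.
import json
import boto3

lambda_client = boto3.client("lambda")

event = {
    "DB_HOST": "mydb.example.com",
    "DB_NAME": "db",
    "DB_USER": "user",
    "DB_PASS": "example",
}

response = lambda_client.invoke(
    FunctionName="mysql-backup-lambda",   # hypothetical function name
    InvocationType="RequestResponse",     # wait for the result
    Payload=json.dumps(event).encode("utf-8"),
)

print(response["Payload"].read().decode("utf-8"))  # e.g. "backup finished"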
Containerizing the script in a Docker image
Let’s now quickly wrap our simple script in a Docker image.
But first, some context: why are we using a Docker image? The answer is simple: in this case, we prefer to have full control of the OS and the environment, because the function is strongly tied to the environment where it is executed. We need an environment where mysqldump
and the AWS CLI
are installed, and a Docker image gives us a more flexible, customizable, and safe way to build it. In general, we don’t need to build a Docker image to work with AWS Lambda, but in this case it is the most convenient option.
That said, create a Dockerfile in the same folder as the Python script, with the following lines:
ARG FUNCTION_DIR="/home/app/"

FROM python:slim-buster

ARG FUNCTION_DIR
RUN mkdir -p ${FUNCTION_DIR}
COPY *.py ${FUNCTION_DIR}

# Installing the build dependencies for the Lambda Runtime Interface Client
RUN apt-get update \
    && apt-get install -y \
    g++ \
    make \
    cmake \
    unzip \
    libcurl4-openssl-dev
RUN python3 -m pip install awslambdaric

# Installing the AWS CLI to upload to the S3 bucket
RUN pip3 install \
    awscli

# Installing mysqldump (MariaDB client) and cleaning the apt cache
RUN apt-get update && apt-get install -y mariadb-client && \
    apt-get clean autoclean && \
    apt-get autoremove --yes && \
    rm -rf /var/lib/apt/lists/*

WORKDIR ${FUNCTION_DIR}

ENTRYPOINT [ "/usr/local/bin/python3", "-m", "awslambdaric" ]
CMD [ "main.handler" ]
Note that the CMD instruction exposes “main.handler”, where:
- main is the name of the Python file (“main.py”);
- handler is the name of the function inside that file (def handler()).
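Before building the image, you can sanity-check the handler mapping with a quick local smoke test, assuming the script is saved as main.py and that mysqldump and the AWS CLI are installed locally; the bucket name and credentials here are placeholders:

# Local smoke test for main.handler (hypothetical values throughout).
import os

os.environ["S3_BUCKET"] = "my-backup-bucket"   # must be set before importing main

from main import handler

event = {
    "DB_HOST": "127.0.0.1",
    "DB_NAME": "db",
    "DB_USER": "user",
    "DB_PASS": "example",
}

print(handler(event, None))   # prints "backup finished" when the dump completes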
Create the Elastic Container Registry image
In order to make the script work, we need to create an Elastic Container Registry repository where we will push the Docker image used by our Lambda function.
To create the repository, let’s search for ECR in the AWS console and click on “Elastic Container Registry”:
now click on “Create Repository”, in “Visibility settings” select “Private”, and give the repository a name of your choice:
Finally, click again on “Create repository”.
Once you create the repository, you can open the registry details by clicking on the repository name. Right now, there are no images inside it:
To push the image to our repository, click on “View push commands” to open the window with the instructions to follow:
Copy and run the commands from the instructions to push the image to the Elastic Container Registry. If everything went well, you’ll see the uploaded image on the dashboard, like this:
Click on “Copy URI” next to the “latest” tag; we will need this URI in the next step!
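If you prefer to grab the image URI programmatically instead of copying it from the console, a small boto3 sketch like this should do it (the repository name is a placeholder):

# Fetch the repository URI and confirm the "latest" image exists (hypothetical repo name).
import boto3

ecr = boto3.client("ecr")

repo = ecr.describe_repositories(repositoryNames=["mysql-backup"])["repositories"][0]
print(repo["repositoryUri"])   # e.g. xxx.dkr.ecr.REGION.amazonaws.com/mysql-backup

images = ecr.describe_images(
    repositoryName="mysql-backup",
    imageIds=[{"imageTag": "latest"}],
)
print(images["imageDetails"][0]["imagePushedAt"])   # when the image was pushed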
Create the CloudFormation stack template
From now on, each AWS resource will be followed by its corresponding CloudFormation template.
The stack has the following resources:
- Lambda: the serverless function which will execute the Python script and export the MySQL database to the destination S3 bucket using mysqldump and AWS CLI;
- S3: the bucket that will contain every backup generated by the Lambda function;
- SNS Topic: every time a new export is uploaded into the bucket, we will receive an email notification;
- Lambda Role and Topic Policy: we will handle the permissions between Lambda, S3 bucket, and the SNS Topics;
- Event Rules: one or more scheduled events to trigger the Lambda functions during the day, for one or more databases.
In the following sections, we will see how to create each resource in detail using CloudFormation. If you are not interested in the step-by-step explanation for each resource, you can jump to the end of the article, where you will find the complete CloudFormation template for the whole stack.
That said, we can start writing the CloudFormation template. Create a cloudformation.yml file with the following lines of code:
AWSTemplateFormatVersion: "2010-09-09"
Description: MySQL backup to Amazon S3 with Python

Parameters:
  ECRImage:
    Type: String
    Default: xxx.dkr.ecr.REGION.amazonaws.com/IMAGE_NAME:VERSION
    Description: Elastic Container Registry image URL
  CronExpression:
    Type: String
    Default: cron(0 12 * * ? *)
    Description: Cron expression for MySQLdump event

Resources:
  # ...here we will create every stack resource
The parameters are variables of the CloudFormation template that we can reference from the resources we are going to define. In particular:
- ECRImage: paste into ECRImage.Default the URI of the latest Elastic Container Registry image;
- CronExpression: in CronExpression.Default, insert the cron expression, in the cron(fields) format, that specifies when the Python script will be triggered during the day. For more info, take a look at the official AWS schedule expression documentation page.
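While building the template, it can be handy to validate it before deploying. This is a minimal sketch using boto3, assuming the file is saved as cloudformation.yml:

# Validate the CloudFormation template locally before deploying (sketch).
import boto3

cloudformation = boto3.client("cloudformation")

with open("cloudformation.yml") as template_file:
    result = cloudformation.validate_template(TemplateBody=template_file.read())

# Lists the parameters CloudFormation detected, e.g. ECRImage and CronExpression
for parameter in result["Parameters"]:
    print(parameter["ParameterKey"], "->", parameter.get("DefaultValue"))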
Create Lambda function to export MySQL database to S3 Bucket
At this point, we need to create a Lambda function from the Elastic Container Registry image we have previously created. As we already said, the Lambda function will execute the Python script to connect to the database, export it, and upload the backup to an Amazon S3 bucket.
Take a look at the following template:
Lambda:
  Type: AWS::Lambda::Function
  Properties:
    Code:
      ImageUri: !Ref ECRImage
    Description: Python Lambda Function to mysqldump
    Environment:
      Variables:
        S3_BUCKET: !Ref Bucket
    FunctionName: !Join ["-", [!Ref AWS::StackName, "lambda"]]
    MemorySize: 512
    PackageType: Image
    Timeout: 300
    Role: !GetAtt LambdaRole.Arn
Some of the values are references from other resources:
- ECRImage: the Elastic Container Registry image link;
- Bucket: the S3 bucket name is passed as an environment variable;
- FunctionName: generated using the stack name, followed by the -lambda suffix;
- Role: the ARN of the Lambda Role that we will create in the next step.
Keep in mind that you can also customize some properties, in particular Timeout: 300 and MemorySize: 512. You should adjust these values based on your needs; if you are planning to export a large amount of data and tables, you will probably need higher values.
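If a backup starts hitting the timeout or memory limit later on, you can also raise these values on the deployed function with boto3, as in this sketch (the function name is a placeholder, and ideally you would keep the CloudFormation template in sync):

# Raise the Lambda timeout and memory for larger databases (hypothetical values).
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_function_configuration(
    FunctionName="mysql-backup-lambda",   # hypothetical function name
    Timeout=900,                          # maximum allowed value, in seconds
    MemorySize=1024,                      # in MB
)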
The Lambda cost is based on the execution time and memory of the function. For more information about Lambda pricing, please take a look at the official AWS documentation.
Let’s now create the Lambda Role to give the function the permission to put objects (s3:PutObject) into the S3 bucket:
LambdaRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - lambda.amazonaws.com
          Action:
            - sts:AssumeRole
    Policies:
      - PolicyName: S3Policy
        PolicyDocument:
          Statement:
            - Action:
                - s3:PutObject
              Effect: Allow
              Resource:
                - !Join ["", ["arn:aws:s3:::", !Ref Bucket, "/*"]]
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
In particular, under Policies we create the “S3Policy”, which allows the function to perform s3:PutObject on the objects of the S3 bucket.
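If you want to double-check the role once the stack is deployed, the IAM policy simulator can be queried from boto3; this is just a sketch, and both ARNs below are placeholders:

# Check whether the Lambda role is allowed to put objects into the backup bucket.
# Both ARNs below are hypothetical placeholders.
import boto3

iam = boto3.client("iam")

results = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::123456789012:role/my-stack-lambda-role",
    ActionNames=["s3:PutObject"],
    ResourceArns=["arn:aws:s3:::my-stack-bucket/*"],
)

for result in results["EvaluationResults"]:
    print(result["EvalActionName"], result["EvalDecision"])   # expect "allowed"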
Create the Amazon S3 Bucket to store the MySQL backups
Now we have to create the Amazon S3 bucket resource where the Python script will store the MySQL exports. To do so, we can simply add:
Bucket:
  Type: AWS::S3::Bucket
  DependsOn:
    - SNSTopicPolicy
  Properties:
    BucketName: !Join ["-", [!Ref AWS::StackName, "bucket"]]
    AccessControl: Private
    PublicAccessBlockConfiguration:
      BlockPublicAcls: True
      BlockPublicPolicy: True
      IgnorePublicAcls: True
      RestrictPublicBuckets: True
    LifecycleConfiguration:
      Rules:
        - Id: GlacierRule
          Status: Enabled
          Transitions:
            - TransitionInDays: "1"
              StorageClass: GLACIER
    NotificationConfiguration:
      TopicConfigurations:
        - Topic: !Ref Topic
          Event: s3:ObjectCreated:*
In particular, we are creating the S3 bucket with the following properties:
- AccessControl and PublicAccessBlockConfiguration: we make the S3 bucket private;
- LifecycleConfiguration: after 1 day, every object in the bucket will be transitioned to the Glacier storage class, which is more cost-efficient when we are dealing with backups and objects that are not frequently accessed;
- NotificationConfiguration: the reference to the SNS Topic that will send us an email every time an object is created (s3:ObjectCreated:*).
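Once a few backups have been produced, you can list them and check their storage class with a short boto3 script (the bucket name below is a placeholder):

# List the backups in the bucket and show their storage class (hypothetical bucket name).
import boto3

s3 = boto3.client("s3")

response = s3.list_objects_v2(Bucket="my-stack-bucket")

for obj in response.get("Contents", []):
    # StorageClass becomes "GLACIER" once the lifecycle rule has transitioned the object
    print(obj["Key"], obj["Size"], obj.get("StorageClass", "STANDARD"))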
Create SNS Topic to send an email when the script creates an S3 object
We now create a simple SNS Topic that will send us an email every time the Python script uploads an object into the S3 bucket. To do so, we just need to add:
Topic:
  Type: AWS::SNS::Topic
  Properties:
    DisplayName: !Join ["-", [!Ref AWS::StackName, "topic"]]
    TopicName: !Join ["-", [!Ref AWS::StackName, "topic"]]
    Subscription:
      - Protocol: email
        Endpoint: myemail@example.com
This is straightforward; in particular, note the following properties:
- Subscription.Protocol: “email” is how the topic will notify us;
- Subscription.Endpoint: is where it will notify us (insert your email address here).
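After confirming the email subscription, you can publish a test message to verify that notifications arrive; a minimal sketch with a placeholder topic ARN:

# Publish a test notification to the backup topic (hypothetical topic ARN).
import boto3

sns = boto3.client("sns")

sns.publish(
    TopicArn="arn:aws:sns:eu-west-1:123456789012:my-stack-topic",
    Subject="MySQL backup test",
    Message="If you can read this, the SNS subscription works.",
)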
The last step is to create the TopicPolicy, which gives the S3 bucket permission to publish to the SNS Topic when an object is created:
SNSTopicPolicy:
  Type: AWS::SNS::TopicPolicy
  Properties:
    PolicyDocument:
      Statement:
        - Effect: Allow
          Principal:
            Service: s3.amazonaws.com
          Action: sns:Publish
          Resource: !Ref Topic
          Condition:
            ArnLike:
              aws:SourceArn:
                !Join ["", ["arn:aws:s3:::", !Ref AWS::StackName, "-bucket"]]
    Topics:
      - !Ref Topic
Event Rule to schedule the execution of the Lambda function
Finally, the last step is to create an Event Rule to trigger the Lambda function at a specific daily schedule.
We start by adding:
Cron:
  Type: AWS::Events::Rule
  Properties:
    Description: MySQLdump Cron
    Name: !Join ["-", [!Ref AWS::StackName, "cron"]]
    ScheduleExpression: !Ref CronExpression
    State: ENABLED
    Targets:
      - Arn: !GetAtt Lambda.Arn
        Id: Lambda
        Input: >-
          {
            "DB_HOST": "host",
            "DB_NAME": "db",
            "DB_USER": "user",
            "DB_PASS": "example"
          }
In the Input property, we define the event that will be sent to the Lambda function, as a valid JSON string. To be clear: the Event Rule triggers the Lambda function with an event containing the MySQL database credentials, and the Python script takes the credentials from that event and uploads the exported data into the S3 bucket. One caveat: for a rule defined in CloudFormation to actually invoke the function, the Lambda also needs a resource-based permission (an AWS::Lambda::Permission resource) that allows events.amazonaws.com to call it, so if the scheduled backups never run, this is the first thing to check.
Also, if we want to, we can create multiple Event Rules in order to schedule multiple MySQL exports. One thing to note is that Event Rule names must be unique: in particular, the Cron.Properties.Name property must be different for each cron we define. This is an example:
Cron1:
  Type: AWS::Events::Rule
  Properties:
    Description: MySQLdump Cron
    Name: !Join ["-", [!Ref AWS::StackName, "cron-1"]]
    ScheduleExpression: !Ref CronExpression
    State: ENABLED
    Targets:
      - Arn: !GetAtt Lambda.Arn
        Id: Lambda
        Input: >-
          {
            "DB_HOST": "host",
            "DB_NAME": "first_db",
            "DB_USER": "user",
            "DB_PASS": "example"
          }
Cron2:
  Type: AWS::Events::Rule
  Properties:
    Description: MySQLdump Cron
    Name: !Join ["-", [!Ref AWS::StackName, "cron-2"]]
    ScheduleExpression: !Ref CronExpression
    State: ENABLED
    Targets:
      - Arn: !GetAtt Lambda.Arn
        Id: Lambda
        Input: >-
          {
            "DB_HOST": "host2",
            "DB_NAME": "second_db",
            "DB_USER": "user",
            "DB_PASS": "example"
          }
Deploy the CloudFormation stack
To deploy the CloudFormation stack, open a terminal window and go to the folder where the “cloudformation.yml” file is located. Once in the right directory, run the following command:
aws cloudformation deploy --template-file cloudformation.yml --stack-name YOUR_STACK_NAME --capabilities CAPABILITY_NAMED_IAM
Replace YOUR_STACK_NAME
with the name that you want to give to the stack. If you want to override some of the parameters that we set up in the stack, you simply need to use the --parameter-overrides
argument, for example:
aws cloudformation deploy --template-file cloudformation.yml --stack-name YOUR_STACK_NAME --parameter-overrides ECRImage=xxxx.dkr.ecr.eu-west-1.amazonaws.com/ANOTHER_ECR_IMAGE:VERSION --capabilities CAPABILITY_NAMED_IAM
Keep in mind that the CloudFormation template derives the S3 bucket name from the stack name. Give the stack a reasonably unique name, so that the S3 bucket name is also unique: if the bucket name already exists, you will not be able to deploy the stack.
Once the stack is deployed, we should see something similar to this in the AWS CloudFormation console:
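If you prefer to check from code instead of the console, the stack status can also be read with boto3 (replace the stack name with your own):

# Check the deployment status of the stack (hypothetical stack name).
import boto3

cloudformation = boto3.client("cloudformation")

stack = cloudformation.describe_stacks(StackName="YOUR_STACK_NAME")["Stacks"][0]
print(stack["StackStatus"])   # e.g. CREATE_COMPLETE when the deployment succeeds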
Complete CloudFormation template stack
As we promised, this is the complete CloudFormation template:
AWSTemplateFormatVersion: "2010-09-09"
Description: MySQL dump to Amazon S3 with Python

Parameters:
  ECRImage:
    Type: String
    Default: xxx.dkr.ecr.REGION.amazonaws.com/IMAGE_NAME:VERSION
    Description: Elastic Container Registry image URL
  CronExpression:
    Type: String
    Default: cron(0 12 * * ? *)
    Description: Cron expression for MySQLdump event

Resources:
  Lambda:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ImageUri: !Ref ECRImage
      Description: Python Lambda Function to mysqldump
      Environment:
        Variables:
          S3_BUCKET: !Ref Bucket
      FunctionName: !Join ["-", [!Ref AWS::StackName, "lambda"]]
      MemorySize: 512
      PackageType: Image
      Timeout: 300
      Role: !GetAtt LambdaRole.Arn

  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Policies:
        - PolicyName: S3Policy
          PolicyDocument:
            Statement:
              - Action:
                  - s3:PutObject
                Effect: Allow
                Resource:
                  - !Join ["", ["arn:aws:s3:::", !Ref Bucket, "/*"]]
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

  Bucket:
    Type: AWS::S3::Bucket
    DependsOn:
      - SNSTopicPolicy
    Properties:
      BucketName: !Join ["-", [!Ref AWS::StackName, "bucket"]]
      AccessControl: Private
      PublicAccessBlockConfiguration:
        BlockPublicAcls: True
        BlockPublicPolicy: True
        IgnorePublicAcls: True
        RestrictPublicBuckets: True
      LifecycleConfiguration:
        Rules:
          - Id: GlacierRule
            Status: Enabled
            Transitions:
              - TransitionInDays: "1"
                StorageClass: GLACIER
      NotificationConfiguration:
        TopicConfigurations:
          - Topic: !Ref Topic
            Event: s3:ObjectCreated:*

  Topic:
    Type: AWS::SNS::Topic
    Properties:
      DisplayName: !Join ["-", [!Ref AWS::StackName, "topic"]]
      TopicName: !Join ["-", [!Ref AWS::StackName, "topic"]]
      Subscription:
        - Protocol: email
          Endpoint: myemail@example.com

  SNSTopicPolicy:
    Type: AWS::SNS::TopicPolicy
    Properties:
      PolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: s3.amazonaws.com
            Action: sns:Publish
            Resource: !Ref Topic
            Condition:
              ArnLike:
                aws:SourceArn:
                  !Join ["", ["arn:aws:s3:::", !Ref AWS::StackName, "-bucket"]]
      Topics:
        - !Ref Topic

  Cron:
    Type: AWS::Events::Rule
    Properties:
      Description: MySQLdump Cron
      Name: !Join ["-", [!Ref AWS::StackName, "cron-1"]]
      ScheduleExpression: !Ref CronExpression
      State: ENABLED
      Targets:
        - Arn: !GetAtt Lambda.Arn
          Id: Lambda
          Input: >-
            {
              "DB_HOST": "host",
              "DB_NAME": "db",
              "DB_USER": "user",
              "DB_PASS": "example"
            }