Airflow — The Easy Way

“Running Airflow on AWS EC2 & RDS using docker-compose”

Hello Folks,

Let's start the year on a roll. Wishing you all a successful year of learning.

I am Kunal Shah, an AWS Certified Solutions Architect who helps clients achieve optimal solutions on the cloud. I am a Cloud Enabler by choice, with 6+ years of experience in the IT industry.

I love to talk about Cloud Technology, Digital Transformation, Analytics, DevOps, Operational efficiency, Cost Optimization, Cloud Networking & Security.

You can reach out to me @ www.linkedin.com/in/kunal-shah07

Abstract

To set up Apache Airflow quickly, we will deploy Airflow using docker-compose and run it on AWS EC2 & RDS instances.

Some readers reached out to me asking for an easier, more development-friendly playground for setting up Airflow on AWS.

Here I am with Airflow — The Easy Way

Table Of Contents

  • Introduction
  • Prerequisites
  • Architecture
  • AWS Infrastructure Provisioning
  • Airflow Provisioning
  • Environment Validation
  • Cleanup

Introduction -

Airflow: Please check my first blog.

docker-compose: A tool used to run multiple containers as a single service. For example, if an application requires NGINX and MySQL, you can create one file that starts both containers as a service, without starting each one separately.
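To make the NGINX and MySQL example concrete, here is a minimal sketch of such a file, written out from the shell. The folder name compose-demo, the service names web and db, the image tags and the password are all placeholders I picked for illustration; they are not part of the Airflow setup in this blog.

```shell
# Hypothetical demo folder, kept separate from the Airflow files.
mkdir -p compose-demo

# One file that describes both containers as a single service.
cat > compose-demo/docker-compose.yaml <<'EOF'
version: "3"
services:
  web:
    image: nginx:latest
    ports:
      - "8080:80"
    depends_on:
      - db
  db:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: example   # placeholder value, change before use
EOF

echo "Wrote compose-demo/docker-compose.yaml"

# To start both containers together (needs Docker and docker-compose):
#   cd compose-demo && docker-compose up -d
```

A single docker-compose up -d then starts both containers, and docker-compose stop stops them again.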

The docker-compose.yaml contains several service definitions:
airflow-scheduler: The scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.
airflow-webserver: The webserver, available at http://localhost:8080.
airflow-worker: The worker that executes the tasks given by the scheduler.
airflow-init: The initialization service.
flower: The Flower app for monitoring the environment, available at http://localhost:5555.
redis: The Redis broker that forwards messages from the scheduler to the worker.

Some directories in the container are mounted, which means that their contents are synchronized between the EC2 host and the containers.

  • ./dags: you can put your DAG files here.
  • ./logs: contains logs from task execution and the scheduler.
  • ./plugins: you can put your custom plugins here.

Prerequisites -

  • You must have access to an AWS account with the required roles or permissions. The steps below can be run from an AWS EC2 instance (Ubuntu) in that AWS account with the necessary access permissions.
  • AWS services: full access to RDS, EC2, IAM, S3, VPC
  • Tool dependencies: AWS CLI (v2), cron, docker-compose

Architecture -

High Level — Airflow on EC2 & RDS Architecture

AWS Infrastructure Provisioning -

  • Configure the AWS CLI on the EC2 instance.

$ aws configure

AWS Access Key ID [None]: (Your Access Key)

AWS Secret Access Key [None]: (Your Secret Key)

Default region name [None]: (Your Region)

Default output format [None]: json

  • Install Ubuntu Desktop & XRDP for remote RDP.

# sudo apt-get update && sudo apt-get upgrade

# sudo apt install tasksel

# sudo tasksel install ubuntu-desktop

# reboot (log in to the EC2 instance again, then run the command below)

# sudo apt-get install xrdp

  • Now you can either change the password of the ubuntu user or create a new user.
  • This user will be used for RDP authentication.
  • Install vim editor -> apt install vim
  • Install Cron -> apt install cron
  • (Optional) Install the Google Chrome browser. Run the commands below in the given order.

# wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb

# sudo apt install ./google-chrome-stable_current_amd64.deb

Airflow Provisioning -

  • Copy the docker-compose.yaml file to the AWS EC2 instance & update the parameters below so that they point to your RDS instance.

AIRFLOW__CORE__SQL_ALCHEMY_CONN

AIRFLOW__CELERY__RESULT_BACKEND
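As a rough sketch of what these two values can look like, the snippet below builds them from the RDS details. I am assuming a PostgreSQL RDS instance and the psycopg2 driver here; the user, password, endpoint and database name are made-up placeholders, so replace all of them with your own.

```shell
# Hypothetical RDS details (assumption: PostgreSQL engine); replace every value.
DB_USER="airflow"
DB_PASSWORD="change-me"
DB_HOST="mydb.example.us-east-1.rds.amazonaws.com"
DB_PORT="5432"
DB_NAME="airflow"

# Metadata database connection used by the scheduler, webserver and workers.
SQL_ALCHEMY_CONN="postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}"

# Celery result backend; note the "db+" prefix in front of the database URL.
RESULT_BACKEND="db+postgresql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}"

# Print the values to paste into docker-compose.yaml.
echo "AIRFLOW__CORE__SQL_ALCHEMY_CONN: ${SQL_ALCHEMY_CONN}"
echo "AIRFLOW__CELERY__RESULT_BACKEND: ${RESULT_BACKEND}"
```

If the password contains special characters such as @ or /, it has to be URL-encoded before it goes into the connection string.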

  • Set the environment variables -> echo -e "AIRFLOW_UID=50000\nAIRFLOW_GID=0" > .env
  • Create local folders on EC2 instance -> mkdir ./dags ./logs ./plugins
  • Install docker-compose ->

sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/bin/docker-compose

sudo chmod +x /usr/bin/docker-compose

  • Set up crontab entries to sync the S3 folder to the local folders on EC2.

# crontab -e

Add the lines below inside the editor. They run every minute, and the target paths assume that the dags & plugins folders were created under /root.

* * * * * /usr/local/bin/aws s3 sync s3://<S3 Folder Path> /root/dags/

* * * * * /usr/local/bin/aws s3 sync s3://<S3 Folder Path> /root/plugins/

Change the S3 folder path to match your environment's bucket folder.
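If you prefer not to type the entries by hand, a small helper can print them for you. This is only a sketch: the bucket path, the base folder and the function name print_sync_cron are my own placeholders, and it assumes the bucket has dags/ and plugins/ sub-folders.

```shell
# Hypothetical values; replace with your bucket path and the folder
# that holds docker-compose.yaml on the EC2 instance.
S3_BUCKET_PATH="my-airflow-bucket/airflow"
AIRFLOW_BASE_DIR="/root"

# Print one crontab entry per folder; "* * * * *" means every minute.
print_sync_cron() {
  local bucket_path="$1" base_dir="$2" folder
  for folder in dags plugins; do
    echo "* * * * * /usr/local/bin/aws s3 sync s3://${bucket_path}/${folder}/ ${base_dir}/${folder}/"
  done
}

print_sync_cron "$S3_BUCKET_PATH" "$AIRFLOW_BASE_DIR"
```

The printed lines can be pasted into crontab -e. Check the location of the AWS CLI with which aws first, since cron needs the full path.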

  • Start the Cron service -> service cron start
  • Deploy Airflow through docker-compose -> docker-compose up -d
  • Verify the container status using the commands below from the EC2 bash terminal.

# docker ps

# docker-compose run airflow-worker airflow info
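The containers can take a little while to become ready, so besides docker ps it can help to poll the webserver until it answers. The sketch below is one way to do that, assuming curl is installed and the webserver is published on port 8080 as in the service list above. The function names retry and webserver_healthy are my own.

```shell
# Retry a command up to max_tries times, sleeping delay seconds in between.
retry() {
  local max_tries="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_tries" ]; then
      return 1   # give up after the last allowed attempt
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Succeeds when the Airflow webserver health endpoint answers without an HTTP error.
webserver_healthy() {
  curl --silent --fail --output /dev/null "http://localhost:8080/health"
}

# Example call (run on the EC2 instance after docker-compose up -d):
#   retry 30 10 webserver_healthy && echo "Airflow webserver is up"
```

With 30 tries and a 10 second delay, the example call waits about five minutes at most before giving up.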

docker ps — output
  • To make custom DAGs appear in the Airflow Web UI, upload the DAG & plugin files to the respective folders of the S3 bucket. The cron job then syncs them to the EC2 instance.

Environment Validation -

Airflow Web UI
  • Enter Credentials

username — airflow

password — airflow

  • After logging in, check the DAGs & start running them.
Example DAGs
  • When you trigger a DAG, the scheduler queues its tasks and the airflow-worker container executes the code included in the DAG.
DAGs Running Status
  • Check the RDS connections on the AWS Console; it will show the current connections from the Airflow containers.
  • Voilaaaa..!! Airflow is ready on AWS EC2 & RDS.
  • Pros: easy, fast, developer-friendly setup
  • Cons: not production ready, performance issues, slowness

Cleanup -

  • Stop the containers -> docker-compose stop
  • Delete the CloudFormation stack of the AWS EC2 & RDS resources.
  • Delete the S3 buckets you created, from the console.
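The same cleanup can also be done from the terminal. Below is a minimal sketch that only prints the commands by default (dry run), so nothing is deleted by accident. The stack name and bucket name are placeholders I made up; use the names from your own account.

```shell
# Dry run by default: commands are printed, not executed.
# Run with DRY_RUN=0 to really execute them.
DRY_RUN="${DRY_RUN:-1}"

STACK_NAME="airflow-ec2-rds"      # hypothetical CloudFormation stack name
S3_BUCKET="my-airflow-bucket"     # hypothetical bucket name

# Either print the command or execute it, depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

cleanup() {
  # Stop and remove the Airflow containers and their network.
  run docker-compose down
  # Delete the stack that holds the EC2 & RDS resources.
  run aws cloudformation delete-stack --stack-name "$STACK_NAME"
  # Empty and remove the bucket used for DAGs & plugins.
  run aws s3 rb "s3://${S3_BUCKET}" --force
}

cleanup
```

Note that docker-compose down goes one step further than docker-compose stop: it also removes the stopped containers.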

THANK YOU & FOLLOW FOR MORE..

I had fun deploying this setup & playing around with AWS EC2, RDS & Airflow.

Hope you guys like it & start playing around.

More things are lined up around AWS. Stay tuned..

“Nothing is particularly hard if you break it down into small bits”


AWS | GCP | Cloud Enabler | Cloud Network & Security | CFT | Docker | K8s | Terraform | SysOps | Cricket | Life | Dance | Blog | Share |
