In my previous post I called out some of the AWS features that I have found most useful, and in this post I want to talk about one of them in particular: AWS Lambda. In our AWS environments we make extensive use of Lambda for automating general infrastructure maintenance as well as handling various data processing jobs. Lambda is cost-effective, simple to use and scalable. However, it does come with one potential drawback, which I blatantly cheat my way around when needed, as I will explain.

AWS Lambda is the serverless computing offering from Amazon Web Services. What does this mean? It means that you don’t need to provision a server instance on which to run your code – you simply deploy your Lambda functions and AWS handles all of the back-end infrastructure. This is great for a few reasons:

  1. You don’t have to worry about the hardware. Running this code yourself would require at least one EC2 instance (more, depending on scale), which you would then have to create and manage. Lambda functions run on AWS’s own platform and scale to whatever the workload you throw at them needs.
  2. It’s very cheap, and often free – Lambda functions are billed only for the compute time you use (rounded up to the nearest 100ms) whereas EC2 is billed by the hour. If the functions you need to run only take a few seconds, that is a substantial saving. In addition, AWS provide a free tier for Lambda which gives you one million free requests and 400,000 GB-seconds of compute time a month.
  3. It’s extremely easy to set up. Lambda functions can be written in Java, Node.js or (my personal favourite) Python 2.7. Once your function is written and tested you simply deploy it into the Lambda environment and you’re up and running.
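To put rough numbers on the billing point, here is a minimal sketch of the arithmetic. The free tier figures and the 100ms rounding come from the list above; the two prices are assumptions for illustration only and should be checked against the current AWS price list, and the function names are my own.

```python
# Free tier allowances as quoted in this post.
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000

# ASSUMED prices, for illustration only; check the current AWS price list.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.00001667


def billed_seconds(duration_ms):
    """Round a run time in whole milliseconds up to the next 100ms, in seconds."""
    return (duration_ms + 99) // 100 * 100 / 1000.0


def monthly_cost(invocations, duration_ms, memory_mb):
    """Estimate a month's Lambda bill in dollars after the free tier."""
    gb_seconds = invocations * billed_seconds(duration_ms) * memory_mb / 1024.0
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    billable_requests = max(0, invocations - FREE_REQUESTS)
    return (billable_requests * PRICE_PER_REQUEST
            + billable_gb_seconds * PRICE_PER_GB_SECOND)


if __name__ == "__main__":
    # A hypothetical twice-daily job: 60 runs a month, 3 seconds each, 128 MB.
    print(monthly_cost(invocations=60, duration_ms=3000, memory_mb=128))
```

A job like the hypothetical one above uses 22.5 GB-seconds a month, which sits comfortably inside the free tier.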

Lambda functions respond to ‘events’, which can be timed schedules or things like a file being placed in an S3 bucket. Now, whilst I’m the first to admit that I’m certainly not pushing the capability as far as it can go, it has still allowed us to set up some useful and interesting jobs to help maintain our environments. Some of the jobs that we use Lambda for are:

  • Automated start-up and shutdown of our EC2 instances. Many servers only need to be up during the working day, and so we have Lambda functions which run in the morning and the evening to start up or shut down servers. This is all tag based, so we are able to control when machines are available simply by updating the correct tags for the instance.
  • Automated backups – every evening we take AMI backups of all servers, and this is done with Lambda functions which take the backups and then tag them with a validity period (such as a week) after which the backup is deleted.
  • Processing incoming data – often we need to reach out to external sites to pull data into our system. The majority of this is handled by Lambda which will pull the data from the correct site and push a copy into an S3 bucket. Any servers which need to consume the data can then take it from S3 without needing access to the external internet.
  • Receiving incoming data – one data source is sent to us via e-mail as an attachment. Instead of us having to handle this manually, the data is sent to a mailbox set up at AWS, where a Lambda function reads in the e-mail, extracts the attachment and places a copy in an S3 bucket for further processing.
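As an illustration of the tag-based start-up/shutdown job, here is a minimal sketch using boto3. It is not our production code: the tag key `Schedule`, the tag value `office-hours` and the `action` field in the event are all hypothetical names chosen for the example. The EC2 client is passed in as a parameter so the logic can be exercised without touching a real account.

```python
def find_instances(ec2, tag_key, tag_value, state):
    """Return the IDs of instances with the given tag that are in the given state."""
    paginator = ec2.get_paginator("describe_instances")
    filters = [
        {"Name": "tag:" + tag_key, "Values": [tag_value]},
        {"Name": "instance-state-name", "Values": [state]},
    ]
    ids = []
    for page in paginator.paginate(Filters=filters):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                ids.append(instance["InstanceId"])
    return ids


def apply_schedule(ec2, action, tag_key="Schedule", tag_value="office-hours"):
    """Start stopped instances, or stop running ones, that carry the tag."""
    if action == "start":
        ids = find_instances(ec2, tag_key, tag_value, "stopped")
        if ids:
            ec2.start_instances(InstanceIds=ids)
    elif action == "stop":
        ids = find_instances(ec2, tag_key, tag_value, "running")
        if ids:
            ec2.stop_instances(InstanceIds=ids)
    else:
        raise ValueError("action must be 'start' or 'stop'")
    return ids


def lambda_handler(event, context):
    # Imported here so the helpers above can be used without boto3 installed.
    import boto3

    # The schedule is assumed to pass {"action": "start"} or {"action": "stop"}.
    return apply_schedule(boto3.client("ec2"), event["action"])
```

With this shape, one function can serve both the morning and the evening schedule; taking a server out of the rota is just a matter of changing its tag.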

This is all fairly routine stuff, but handing it off to Lambda gives us a robust and cheap framework for all our regular processing. In addition, Lambda is fully supported by CloudFormation, so we are able to control the creation and deployment of our Lambda code directly from our desktops by updating the Lambda CloudFormation template. Changing a schedule, removing a function or adding a new one can all be done by writing a few lines of code.
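To give a flavour of what such a template contains, here is a small sketch that builds one as a Python dictionary and prints it as JSON. It is an illustration rather than our actual template: the function name, handler, runtime, role ARN, bucket, key and schedule in the example call are all made-up placeholders. The three resource types are the usual trio for a scheduled function: the function itself, a schedule rule, and the permission that lets the rule invoke it.

```python
import json


def scheduled_lambda_template(name, handler, runtime, role_arn, bucket, key, schedule):
    """Build a CloudFormation template for one Lambda function run on a schedule.

    `name` is used to build logical IDs, so it must be alphanumeric.
    """
    function_id = name + "Function"
    rule_id = name + "Schedule"
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            function_id: {
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "Handler": handler,
                    "Runtime": runtime,
                    "Role": role_arn,
                    # The zipped code is assumed to be uploaded to S3 already.
                    "Code": {"S3Bucket": bucket, "S3Key": key},
                    "Timeout": 60,
                },
            },
            rule_id: {
                "Type": "AWS::Events::Rule",
                "Properties": {
                    "ScheduleExpression": schedule,
                    "Targets": [
                        {"Arn": {"Fn::GetAtt": [function_id, "Arn"]},
                         "Id": rule_id + "Target"}
                    ],
                },
            },
            name + "Permission": {
                "Type": "AWS::Lambda::Permission",
                "Properties": {
                    "Action": "lambda:InvokeFunction",
                    "FunctionName": {"Ref": function_id},
                    "Principal": "events.amazonaws.com",
                    "SourceArn": {"Fn::GetAtt": [rule_id, "Arn"]},
                },
            },
        },
    }


if __name__ == "__main__":
    # Every value below is a placeholder for illustration.
    template = scheduled_lambda_template(
        name="StartServers",
        handler="start_servers.lambda_handler",
        runtime="python3.12",
        role_arn="arn:aws:iam::123456789012:role/example-lambda-role",
        bucket="example-deploy-bucket",
        key="start_servers.zip",
        schedule="cron(0 7 ? * MON-FRI *)",
    )
    print(json.dumps(template, indent=2))
```

Changing when the job runs then means editing the `schedule` value and updating the stack.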

Lambda does have one drawback which can sometimes limit its usefulness. Lambda functions can run for a maximum of five minutes, an AWS-imposed limit that cannot be changed at this time. This means (for example) that large files are difficult to process with Lambda if they run the risk of taking more than five minutes to download or upload. I have at least one job where I need to download a file of a few hundred megabytes from an external SFTP site, which can often take longer than that depending on the connection speed at the time. But I would still really like to use Lambda to control the scheduling of the job, so that I have everything controlled from one place.

So what do I do? I blatantly cheat! I place the code on a server that I know will be running at the point I need my data, and then I call that code from Lambda. So my process looks like this:

  • Lambda schedule fires
  • Lambda function logs in to my server, and executes the script to be run as a background job
  • Lambda function logs off the server
  • Lambda function terminates
  • Job completes on the server, outside Lambda
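The steps above can be sketched roughly as follows. This is a hedged illustration, not the exact code I run: it assumes the login is done over SSH with the third-party paramiko library bundled into the deployment package, and the environment variable names (`JOB_HOST`, `JOB_USER`, `JOB_KEY_FILE`) and event fields (`script`, `log`) are invented for the example.

```python
import os
import shlex


def build_background_command(script_path, log_path):
    """Build a shell command that starts the script detached from the SSH session."""
    return "nohup {} > {} 2>&1 < /dev/null &".format(
        shlex.quote(script_path), shlex.quote(log_path)
    )


def start_remote_job(ssh, script_path, log_path):
    """Launch the script over a connected SSH client and return the launch status."""
    command = build_background_command(script_path, log_path)
    _stdin, stdout, _stderr = ssh.exec_command(command)
    # Wait only for the launching shell to exit; the job keeps running on the server.
    return stdout.channel.recv_exit_status()


def lambda_handler(event, context):
    # Third-party library, assumed to be packaged with the function.
    import paramiko

    ssh = paramiko.SSHClient()
    # Accepts unknown host keys for brevity; a real deployment should pin the key.
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(
        hostname=os.environ["JOB_HOST"],
        username=os.environ["JOB_USER"],
        key_filename=os.environ["JOB_KEY_FILE"],
    )
    try:
        status = start_remote_job(ssh, event["script"], event["log"])
    finally:
        ssh.close()  # log off; the background job is unaffected
    return {"launch_status": status}
```

The `nohup ... &` with all three streams redirected is what lets the Lambda function log off straight away: nothing keeps the SSH session open, so the function finishes in seconds however long the job itself takes.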

This gives me the flexibility of scheduling my function through Lambda but allows me to work around the five-minute timeout, because the main code does not run in Lambda. It is cheating, but as I have a server up already it doesn’t add cost and it works well.

Lambda gives you enormous flexibility to control your environments in a cost-effective and easy-to-use way. When used in conjunction with CloudFormation, you have the ability to manage all of those tedious housekeeping and control jobs entirely by writing code on your desktop. This has saved me significant amounts of time on existing environments, and it means I can re-use the same code for future implementations without having to constantly re-invent the wheel. If you are currently using AWS and not making use of Lambda (and CloudFormation) then you’re missing a big trick!

Martin Campbell

Data Wizard at Comet Global Consulting

Martin is a Data Wizard who has travelled all over the world practising the dark arts of Database Marketing. By day you will find him providing solution support across our range of AWS and EMM clients in the UK. By night he continues his quest to become a fully-fledged Parseltongue and wannabe Data Scientist by finding interesting things to look at with the Python programming language. Martin read Physics at University College, Oxford, which is probably the closest he’ll ever get to Hogwarts.