Saturday, July 13, 2024

AWS: Physical Resource ID in Custom Resources

Originally, Custom Resources in CloudFormation were designed to wrap, in a Lambda, AWS resources that the CloudFormation service does not support yet. However, we often use them for other purposes:

  • Retrieve information from other resources (like data sources in Terraform)
  • Trigger some actions
  • Implement some logic

In most of those cases, we do not care about resource deletion. Yet we are often surprised by calls from CloudFormation to delete the resource. The reason is a misunderstanding, or a misuse, of the Physical Resource ID. And in my opinion, the root cause is a bad design choice on AWS's part.

Let's have a look at how the send function of the cfnresponse module is written (excerpt):

def send(event, context, responseStatus, responseData, physicalResourceId=None, noEcho=False, reason=None):
    responseUrl = event['ResponseURL']
    responseBody = {
        'Status' : responseStatus,
        'Reason' : reason or "See the details in CloudWatch Log Stream: {}".format(context.log_stream_name),
        'PhysicalResourceId' : physicalResourceId or context.log_stream_name,
        'StackId' : event['StackId'],
        'RequestId' : event['RequestId'],
        'LogicalResourceId' : event['LogicalResourceId'],
        'NoEcho' : noEcho,
        'Data' : responseData
    }

There are two bad choices:

  • The Physical Resource ID parameter is optional. This makes you think that it is not important, and that if you don't set it, some default behavior will handle it correctly for you.
  • The default value is effectively random. Even worse, it is not consistently random: it is set to the log stream name, which changes on each Lambda cold start.

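That means that most of the time, your Physical Resource ID will change on each call, unless the calls come several times in a row and hit the same warm Lambda. When an update returns a Physical Resource ID that differs from the previous one, CloudFormation treats it as a replacement, and that is what triggers the call to the delete part of your Lambda for the old ID.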
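The effect of that default can be reproduced without AWS. The following is a minimal sketch: `FakeContext` and the log stream names are made up for illustration, and `default_physical_id` only mirrors the `physicalResourceId or context.log_stream_name` expression from the excerpt above.

```python
from dataclasses import dataclass


@dataclass
class FakeContext:
    """Hypothetical stand-in for the Lambda context; only log_stream_name is modeled."""
    log_stream_name: str


def default_physical_id(context, physical_resource_id=None):
    # Same fallback expression as in the cfnresponse excerpt.
    return physical_resource_id or context.log_stream_name


# Two invocations served after different cold starts (made-up log stream names).
create_ctx = FakeContext("2024/07/13/[$LATEST]aaaa1111")
update_ctx = FakeContext("2024/07/13/[$LATEST]bbbb2222")

# Without an explicit ID, the value differs between the Create and the Update call.
id_at_create = default_physical_id(create_ctx)
id_at_update = default_physical_id(update_ctx)
replacement_triggered = id_at_create != id_at_update

# With an explicit constant, the value is the same whatever the log stream is.
constant_at_create = default_physical_id(create_ctx, "ConstantPhysicalID")
constant_at_update = default_physical_id(update_ctx, "ConstantPhysicalID")
```

Here `replacement_triggered` ends up true, which is the situation where CloudFormation sends a Delete for the old ID, while the two constant values are equal.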

You can imagine that your resource behaves like an EC2 instance. If you modify a tag, the instance keeps its ID. But if you change its VPC, a new instance is created, with a new ID. In that case, the old instance must be deleted. Consider the Physical Resource ID to be like the instance ID: you decide, based on which parameter was modified, whether the old resource must be kept or deleted.
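As an illustration of that decision, here is a minimal sketch of a helper that keeps the ID when only "tag-like" properties change and issues a new one when an identity property changes. `REPLACEMENT_KEYS`, the `BucketName` and `Tag` properties, and the `custom-` prefix are assumptions made for the example; `RequestType`, `ResourceProperties`, `OldResourceProperties` and `PhysicalResourceId` are the fields CloudFormation puts in the event.

```python
import hashlib
import json

# Hypothetical: properties whose change means "this is a different resource".
REPLACEMENT_KEYS = ("BucketName",)


def identity(properties):
    """Keep only the properties that define the identity of the resource."""
    return {key: properties.get(key) for key in REPLACEMENT_KEYS}


def make_id(properties):
    """Build a deterministic ID from the identity properties."""
    payload = json.dumps(identity(properties), sort_keys=True).encode()
    return "custom-" + hashlib.sha256(payload).hexdigest()[:12]


def choose_physical_id(event):
    props = event.get("ResourceProperties", {})
    if event["RequestType"] == "Create":
        return make_id(props)
    old_props = event.get("OldResourceProperties", {})
    if event["RequestType"] == "Update" and identity(props) != identity(old_props):
        # New ID: CloudFormation will later send a Delete for the old one.
        return make_id(props)
    # Unchanged identity, or Delete: keep the ID CloudFormation already knows.
    return event["PhysicalResourceId"]


created_id = choose_physical_id(
    {"RequestType": "Create", "ResourceProperties": {"BucketName": "a", "Tag": "x"}}
)
tag_update_id = choose_physical_id({
    "RequestType": "Update",
    "PhysicalResourceId": created_id,
    "ResourceProperties": {"BucketName": "a", "Tag": "y"},
    "OldResourceProperties": {"BucketName": "a", "Tag": "x"},
})
rename_update_id = choose_physical_id({
    "RequestType": "Update",
    "PhysicalResourceId": created_id,
    "ResourceProperties": {"BucketName": "b", "Tag": "x"},
    "OldResourceProperties": {"BucketName": "a", "Tag": "x"},
})
```

In this sketch, `tag_update_id` equals `created_id` (no delete call follows), while `rename_update_id` differs, so the delete call for the old ID is expected and intended.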

This means that in most cases, you do not want your Physical Resource ID to change. So the default behavior is wrong. It leads to:

  • Your Lambda being called for deletion for no reason.
  • Accidental calls that you don't expect. We had a case where an S3 bucket was deleted in production because someone added a parameter to the Custom Resource.
  • Useless code written to guard against those unexpected calls, like checking that your CloudFormation stack is really deleting the resource.

So the correct approach is to always set the Physical Resource ID explicitly, usually to a constant value:

cfnresponse.send(event, context, cfnresponse.SUCCESS, responseData, "ConstantPhysicalID")

What about legacy code, those old Custom Resources whose Physical Resource ID is already set to a log stream name? The good thing is that the previously set Physical Resource ID is sent to the Lambda in the event parameter. So you can simply set it back to its previous value:

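# "PhysicalResourceId" is present in Update and Delete events, not in Create events,
# so fall back to a constant for newly created resources.
physicalId = event.get("PhysicalResourceId", "ConstantPhysicalID")

cfnresponse.send(event, context, cfnresponse.SUCCESS, responseData, physicalId)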

