Thursday, October 6, 2022

WTF: Enumeration Galore

I found some Python code from a guy who used lots of enumeration. He probably copied the code from somewhere and did not understand it well. In the end, we have some pretty confusing code. For example, he wanted to add some accounts from another list to a list of trusting accounts, avoiding duplicates. Here is his code:

for i, index in enumerate(account_list):
    if len([a for a in trusting_accounts if a == account_list[i]]) == 0:
        trusting_accounts += [account_list[i]]

The enumerate function returns both the index and the item in the list. However, the fact that he named them i and index shows me that he had no idea what the second part was about. He never actually used it, always going through the index i instead, while the item itself was confusingly called 'index'.
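To make the roles of the two values concrete, here is a minimal sketch. The account names are made-up placeholders, not taken from the original code. It only shows that enumerate yields (index, item) pairs, so the second variable is the item itself:

```python
# Hypothetical sample data, for illustration only.
account_list = ["alice", "bob"]

# enumerate yields (index, item) pairs.
pairs = list(enumerate(account_list))

for i, account in pairs:
    # account is the same value as account_list[i], no lookup needed.
    print(i, account)
```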

The if condition is even more confusing. It triggers only if the list of accounts equal to the current account in the loop is empty. Simply put, it triggers if the account is not already in the list. The corrected version of the code is here:

for account in account_list:
    if account not in trusting_accounts:
        trusting_accounts.append(account)

Of course, this pattern was used in lots of places in his code. The function for removing a list of accounts from the trusting accounts, for instance, looks like this:

for i, index in enumerate(account_list):
    for ii, iindex in enumerate(trusting_accounts):
        if account_list[i] == trusting_accounts[ii]:

There is an inner enumeration to find the index of the account to remove. He probably didn't know that lists have a remove method that does all the work. Here is the corrected code:

for account in account_list:
    if account in trusting_accounts:
        trusting_accounts.remove(account)

Much clearer...
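For reference, here is a small runnable sketch of both operations side by side. The function names and the account IDs are hypothetical and only serve the illustration. Note that remove deletes the first matching element only, which is enough here since the list holds no duplicates:

```python
def add_accounts(trusting_accounts, account_list):
    """Append each account that is not already trusted."""
    for account in account_list:
        if account not in trusting_accounts:
            trusting_accounts.append(account)


def remove_accounts(trusting_accounts, account_list):
    """Remove each listed account that is currently trusted."""
    for account in account_list:
        if account in trusting_accounts:
            trusting_accounts.remove(account)


# Made-up account IDs.
trusting = ["111", "222"]
add_accounts(trusting, ["222", "333"])
after_add = list(trusting)
remove_accounts(trusting, ["111", "999"])
```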

Thursday, September 22, 2022

AWS: Unblock CloudFormation stacks from UPDATE_ROLLBACK_FAILED state

In our project, we have lots of AWS accounts, and we are in charge of deploying base resources in each of them. To do this, we use CloudFormation to deploy several stacks, or more precisely several stacks nested into one master stack.

One problem we have is that our customers do not always update their stacks to the latest version. Also, when a stack fails to update, they sometimes let it roll back and do not bother to ask for a fix. Of course, this summer, there were even fewer updates, people being on vacation. There was also no release, since we felt there would be nobody to deploy it. So we ended up this September with a larger release than usual.

The result of all this is that we found ourselves staring at lots of accounts with a stack entering the UPDATE_ROLLBACK_FAILED state. The reason? Deprecation. As it happened, AWS decided to deprecate Python 3.6, and also a couple of managed policies (AWSConfigRole, AWSCloudTrailReadOnlyAccess). Of course we updated our stacks with the correct values for Python and the replacement policies some time ago. But as the stacks were not always up to date, and there were some issues while updating to the new release, many stacks started to roll back. And since some of the rolled-back values were deprecated, the rollbacks failed.

When you are in that situation, you have two choices. The first one is to delete everything and redeploy. We tried it on one account, and it was really painful: too many dependencies and manual actions. The second one is the one AWS support itself advised: continuing the rollback. When you continue a rollback, you have the possibility to skip some resources. In our case, we needed to skip all lambdas using Python 3.6, and all roles using the deprecated policies.

We tried it manually in the AWS Console, and there is one caveat: you can select resources from nested stacks, but not resources from stacks nested into nested stacks. Since we have many accounts to update, and many resources to rollback, we decided to script the whole process.

We thought it would be simple: using Python and boto3, we list all the resources in our stack, recursively entering nested stacks, and filter all lambdas and roles. We ran into several problems:

  • You cannot skip resources that are not in a failed stack
  • You cannot skip resources that are not in a failed state
  • You cannot skip resources that are in a failed state because CloudFormation cancelled the update
  • Once you run a rollback with skipped resources, CloudFormation discovers new failing resources, so you have to iterate until everything is fine or the list of resources to skip no longer changes between two iterations.
  • Names of resources in nested stacks are <nested_stack_name>.<resource_logical_id>. Even for resources nested several levels deep, you still use the same pattern, giving only the name of the direct parent stack.
  • Waiting for a rollback to complete will throw an exception if the rollback fails.
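The filtering and naming rules above can be sketched without touching AWS at all. This is only an illustration under assumptions: the helper names, the set of resource types and the sample data are hypothetical, and the dictionaries merely mimic the fields returned for each stack resource:

```python
# Assumed resource types to skip; adapt to your own stacks.
SKIPPABLE_TYPES = {"AWS::Lambda::Function", "AWS::IAM::Role"}


def is_skippable(resource):
    """True if a resource is in a failed state that can be skipped."""
    return (
        resource["ResourceType"] in SKIPPABLE_TYPES
        and resource["ResourceStatus"] == "UPDATE_FAILED"
        # Failed only because the update was cancelled: cannot be skipped.
        and resource.get("ResourceStatusReason") != "Resource update cancelled"
    )


def skip_name(resource, nested_stack_name=None):
    """Logical ID, prefixed by the direct parent stack name when nested."""
    logical_id = resource["LogicalResourceId"]
    if nested_stack_name:
        return f"{nested_stack_name}.{logical_id}"
    return logical_id


# Made-up resources in a made-up nested stack.
sample = [
    {"ResourceType": "AWS::Lambda::Function", "ResourceStatus": "UPDATE_FAILED",
     "ResourceStatusReason": "Some failure", "LogicalResourceId": "MyFunction"},
    {"ResourceType": "AWS::IAM::Role", "ResourceStatus": "UPDATE_FAILED",
     "ResourceStatusReason": "Resource update cancelled",
     "LogicalResourceId": "MyRole"},
    {"ResourceType": "AWS::IAM::Role", "ResourceStatus": "UPDATE_COMPLETE",
     "LogicalResourceId": "OtherRole"},
]
to_skip = [skip_name(r, "MyNestedStack") for r in sample if is_skippable(r)]
```

Only the first sample resource qualifies: the second one was merely cancelled, and the third one is not in a failed state.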

This is the script that helps turn a stack from UPDATE_ROLLBACK_FAILED to UPDATE_ROLLBACK_COMPLETE:

import sys

import boto3


STACK_NAME = "MyStack"
# Resource types we want to skip. Assumed here to be the Lambda functions
# and IAM roles mentioned above; adapt to your own case.
BLOCKABLE_RESOURCES = ["AWS::Lambda::Function", "AWS::IAM::Role"]


def get_stack_status(cf_client, stack_name):
    response = cf_client.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"]


def find_blocking_resources(cf_client, stack_name, parent, resources):
    # You cannot skip resources that are not in a failed stack.
    if get_stack_status(cf_client, stack_name) != "UPDATE_ROLLBACK_FAILED":
        return

    response = cf_client.describe_stack_resources(StackName=stack_name)

    for resource in response["StackResources"]:
        if (
            resource["ResourceType"] == "AWS::CloudFormation::Stack"
            and resource["ResourceStatus"] == "UPDATE_FAILED"
        ):
            # Recurse into the nested stack, prefixing with its own name only.
            nested_name = resource["PhysicalResourceId"].split("/")[1]
            find_blocking_resources(
                cf_client, nested_name, nested_name + ".", resources
            )
        elif (
            resource["ResourceType"] in BLOCKABLE_RESOURCES
            and resource["ResourceStatus"] == "UPDATE_FAILED"
            and resource.get("ResourceStatusReason")
            != "Resource update cancelled"
        ):
            resources.append(parent + resource["LogicalResourceId"])


cf_client = boto3.client("cloudformation")

status = get_stack_status(cf_client, STACK_NAME)
if status != "UPDATE_ROLLBACK_FAILED":
    print("Nothing to unblock. Exiting")
    sys.exit(0)

resources = []
previous_resources = ["DUMMY"]
waiter = cf_client.get_waiter("stack_rollback_complete")

print("Starting unblocking process")
# Iterate until the stack is unblocked or the skip list stops changing.
while status == "UPDATE_ROLLBACK_FAILED" and previous_resources != resources:
    previous_resources = resources
    resources = []
    find_blocking_resources(cf_client, STACK_NAME, "", resources)
    print("Skipping ", resources)

    try:
        cf_client.continue_update_rollback(
            StackName=STACK_NAME, ResourcesToSkip=resources
        )
        waiter.wait(StackName=STACK_NAME)
    except Exception as err:
        # The waiter raises if the rollback fails again; try another round.
        print("Rollback failed:", err)

    status = get_stack_status(cf_client, STACK_NAME)

print("Final stack status:", status)

Of course, once this is done, you still have to fix the stacks and run an update.

Thursday, August 25, 2022

Modifying AWS resource for testing

 If you are writing unit tests using boto3 and moto, you might want to modify a value in a resource. I had this case, for instance, where I wanted to set the state of an AMI to pending for a test. Using moto, AMIs are always created in the available state. However, if you try to do it this way, it will fail:

ec2 = boto3.resource("ec2")
image = ec2.Image(AMI_ID)
image.state = "pending"

You will end up with this message:

AttributeError: can't set attribute 'state'

It is possible to modify the resource object. However, this only modifies that object instance; the model itself will not be updated. So if you pass the resource as a parameter to a function, it will work. But if the function retrieves its data with boto3, it will get the original value.

Here is my solution:{"State": "pending"})

That helped in my case. 

Monday, June 13, 2022

WTF: Case for a Join

 I found this code in our repository:

roles = ""
for i, id in enumerate(account_ids):
    if i > 0:
        roles = roles + ","
        if i == len(account_ids) - 1:
            roles = roles + "\"arn:aws:iam::*:role/" + entity_name + "-" + id + "-admin"
        else:
            roles = roles + "\"arn:aws:iam::*:role/" + entity_name + "-" + id + "-admin\""
    if i == 0:
        if i == len(account_ids) - 1:
            roles = roles + "arn:aws:iam::*:role/" + entity_name + "-" + id + "-admin"
        else:
            roles = roles + "arn:aws:iam::*:role/" + entity_name + "-" + id + "-admin\""

It takes some time to understand what is going on here, but basically, we are building a comma-separated list of AWS roles from a list of account IDs.

When you see a condition based on the index of a for loop, you start to feel a very strong code smell.

First, the test for the index being bigger than zero, or equal to zero, is mainly there to handle the comma. But not only. Tests are then repeated, with nearly identical code, to check whether we are on the last iteration, all this to find out whether we have to add a " character at the beginning or the end of our string.

In the end, it produces a string in the form: role1","role2","role3

The reason there are no quotation marks at the beginning or the end of the string is that ultimately, it will be put in a template that is declared with the marks already there, in this form: "__ROLES__".

All this code is quite bad, and the comma and quotation mark handling can all be left to a call to the join function:

roles = '","'.join([f"arn:aws:iam::*:role/{entity_name}-{id}-admin"
    for id in account_ids])

From 13 lines to 1, I kind of like it.
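As a quick check of the output format, here is a minimal sketch. The function name, the entity name and the account IDs are made up, and the str.replace call is only an assumption about how the "__ROLES__" placeholder gets filled in:

```python
def build_roles(entity_name, account_ids):
    """Join the role ARNs with '","' so the template supplies the outer quotes."""
    return '","'.join(
        f"arn:aws:iam::*:role/{entity_name}-{account_id}-admin"
        for account_id in account_ids
    )


# Made-up values, for illustration only.
roles = build_roles("acme", ["111111111111", "222222222222"])

# Assumed substitution into a template that already carries the outer quotes.
fragment = '"__ROLES__"'.replace("__ROLES__", roles)
print(fragment)
```

With a single account ID, join adds no separator at all, so the special cases of the original loop disappear as well.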

Tuesday, June 7, 2022

AWS assume role one-liner

 A couple of months ago, I wanted to simplify the way I assume a role in AWS, an operation I perform several times a day. Usually, you would run a command like this one with the AWS CLI:

aws sts assume-role --role-arn $MY_ROLE_ARN --role-session-name test

It would return you a JSON document like this one:

{
    "AssumedRoleUser": {
        "AssumedRoleId": "AROA3XFRBF535PLBIFPI4:s3-access-example",
        "Arn": "arn:aws:sts::123456789012:assumed-role/xaccounts3access/s3-access-example"
    },
    "Credentials": {
        "SecretAccessKey": "9drTJvcXLB89EXAMPLELB8923FB892xMFI",
        "SessionToken": "AQoXdzELDDY//////////wEaoAK1wvxJY12r2IrDFT2IvAzTCn3zHoZ7YNtpiQLF0MqZye/qwjzP2iEXAMPLEbw/m3hsj8VBTkPORGvr9jM5sgP+w9IZWZnU+LWhmg+a5fDi2oTGUYcdg9uexQ4mtCHIHfi4citgqZTgco40Yqr4lIlo4V2b2Dyauk0eYFNebHtYlFVgAUj+7Indz3LU0aTWk1WKIjHmmMCIoTkyYp/k7kUG7moeEYKSitwQIi6Gjn+nyzM+PtoA3685ixzv0R7i5rjQi0YE0lf1oeie3bDiNHncmzosRM6SFiPzSvp6h/32xQuZsjcypmwsPSDtTPYcs0+YN/8BRi2/IcrxSpnWEXAMPLEXSDFTAQAM6Dl9zR0tXoybnlrZIwMLlMi1Kcgo5OytwU=",
        "Expiration": "2016-03-15T00:05:07Z",
        "AccessKeyId": "ASIAJEXAMPLEXEG2JICEA"
    }
}

And then you would need to export environment variables for setting the access key, secret key and session token.

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_SESSION_TOKEN="..."

So I looked for a simpler solution and I stumbled upon this StackOverflow question: AWS sts assume role in one command

Some suggestions use the very useful jq utility, which lets you retrieve information from JSON documents. But with AWS CLI commands it is normally not necessary, since they all accept the --query option, which supports JMESPath syntax. So, as one answer suggested, you can simply use the JMESPath built-in join function to construct your export command and let the shell evaluate it. This is the solution I have used for some months now:

eval $(aws sts assume-role \
 --role-arn $MY_ROLE_ARN \
 --role-session-name test \
 --query 'join(``, [`export AWS_ACCESS_KEY_ID=`,
 Credentials.AccessKeyId, ` ; export AWS_SECRET_ACCESS_KEY=`,
 Credentials.SecretAccessKey, `; export AWS_SESSION_TOKEN=`,
 Credentials.SessionToken])' \
 --output text)

This command has been really practical, so I decided to add an entry to my blog about it, so that I will always remember where to look for it. So I returned to the StackOverflow site and found out that a simpler solution had been suggested since. It uses the printf shell built-in, which I had absolutely no idea existed:

export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
$(aws sts assume-role \
--role-arn $MY_ROLE_ARN \
--role-session-name test \
--query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
--output text))

I guess you never stop learning.

Monday, May 16, 2022

WTF: Python Dict two liner

When working with DynamoDB in AWS, you know that the values you put in there are not your usual JSON. Instead of your usual key/value pairs, you rather have key/type/value. Something like this:

{
    'key': {
        'type': 'value'
    }
}

Of course, you can let your API build this mess for you. Or you can do it yourself, as I found in our code base. Except it had this little gem in it. For creating a map, you have to apply this key/type/value pattern to all values. And then you have to insert the map using the 'M' type. The code ended with these two lines:

data = dict({})
data['M'] = my_map

First, dict({}) is like creating an empty dict from an empty dict. You either use dict() or {}, but not both. But since you can create your dict inline, I'd rather use this one-liner:

data = {'M': my_map}

Does it look simpler only to me?
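To show the key/type/value pattern applied to a whole map, here is a minimal sketch. The helper name and the sample values are hypothetical, and it only handles plain strings, which DynamoDB marks with the 'S' type:

```python
def to_dynamodb_map(values):
    """Wrap a dict of plain strings as a DynamoDB map attribute."""
    # Each value becomes {'S': value}; the whole map is wrapped in {'M': ...}.
    return {"M": {key: {"S": value} for key, value in values.items()}}


# Made-up sample values.
data = to_dynamodb_map({"name": "demo", "owner": "team-a"})
print(data)
```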

Thursday, April 21, 2022

WTF: Oops, already in my list

 I found an interesting loop pattern, used at several places in our code. It goes like this: 

  • I create an empty list
  • I start a loop to fill it
  • Inside my loop, I create an object and add it to my list
  • Still in the loop, I need to modify the object I just added, so I retrieve it from the list

An example here:

templates = []
index = 0

json_templates = json.loads(templates_as_str)
for template_file in json_templates:

    template = templates[index]
    # Do some stuff on my template

    index += 1

So yes, I could have created my template reference before storing it in the list. But it's too late! I need to use an index now...
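For comparison, here is a minimal sketch of the index-free version. The input string and the dict standing in for the template object are hypothetical, since the real object type is not shown here. The point is only to keep a reference before appending:

```python
import json

# Hypothetical input: a JSON array of template file names.
templates_as_str = '["base.yaml", "network.yaml"]'

templates = []
for template_file in json.loads(templates_as_str):
    # Keep a reference to the new object before storing it.
    template = {"file": template_file}
    # Do some stuff on the template through that reference.
    template["loaded"] = True
    templates.append(template)
```

No index variable is needed, and the list still ends up holding the modified objects.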