Friday, December 30, 2022

Dict's get has a Default Value

 I found this pattern in several places in our Python code base:

state = "on"
if "State" in data and data["State"] == "off":
    state = "off"

The state here can have two values: on or off. So all this code can be pretty much replaced with this one line:

state = data.get("State", "on")

If the state is missing, it will get the value "on". Of course, you might say that it is not completely equivalent, since anything but "off" would be transformed into "on" in the first code. But then, this code is usually followed by this condition:

if state != "off":

I would rather have replaced the state variable with a boolean, with code like this one instead:

state_on = data.get("State") != "off"
if state_on:

If the state is missing, the get would return None, which is also different from "off".
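
To make the difference between the three variants concrete, here is a small sketch that runs them on a few sample dicts. The function names and the "standby" value are only there for illustration; they are not from our code base.

```python
def state_original(data):
    # The pattern found in the code base: anything but "off" ends up as "on".
    state = "on"
    if "State" in data and data["State"] == "off":
        state = "off"
    return state


def state_with_default(data):
    # The one-liner: a missing key becomes "on", other values pass through.
    return data.get("State", "on")


def is_on(data):
    # The boolean variant: a missing key gives None, which is not "off".
    return data.get("State") != "off"


samples = [{}, {"State": "on"}, {"State": "off"}, {"State": "standby"}]
for data in samples:
    print(data, state_original(data), state_with_default(data), is_on(data))
```

Only the unexpected "standby" value separates the first two functions, and the boolean variant agrees with the original code on every sample.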

Thursday, October 6, 2022

WTF: Enumeration Galore

I found some Python code from a guy who used lots of enumerations. He probably copied the code from somewhere and did not understand it well. In the end, we have some pretty confusing code. For example, he wanted to add some accounts from another list to a list of trusting accounts, avoiding duplicates. Here is his code:

for i, index in enumerate(account_list):
    if len([a for a in trusting_accounts if a == account_list[i]]) == 0:
        trusting_accounts += [account_list[i]]

The enumerate function returns both the index and the item in the list. However, the fact that he named them i and index shows me that he had no idea what the second part was about: the variable called 'index' actually holds the item. He never used it, always going back to the list through the real index, confusingly called 'i'.

The if condition is even more confusing. It triggers only if the list of accounts having the same name as the current account in the loop is empty. Simply said, it triggers if the account is not already in the list. The corrected version of the code is here:

for account in account_list:
    if account not in trusting_accounts:
        trusting_accounts.append(account)

Of course, this pattern was used in lots of places in his code. The function for removing a list of accounts from the trusting accounts, for instance, looks like this:

for i, index in enumerate(account_list):
    for ii, iindex in enumerate(trusting_accounts):
        if account_list[i] == trusting_accounts[ii]:
            trusting_accounts.pop(ii)

There is an inner enumeration to find the index of the account to remove. He probably didn't know that lists have a remove method doing all the work. Here is the corrected code:

for account in account_list:
    if account in trusting_accounts:
        trusting_accounts.remove(account)

Much clearer...
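
One caveat: list.remove only deletes the first matching item. If the trusting list could already contain duplicates (an assumption, I did not see it happen), a list comprehension removes every occurrence. This is a sketch with hypothetical function names, assuming the accounts are hashable values such as strings:

```python
def add_accounts(trusting_accounts, account_list):
    # Append each account once, keeping the original order.
    for account in account_list:
        if account not in trusting_accounts:
            trusting_accounts.append(account)


def remove_accounts(trusting_accounts, account_list):
    # Rebuild the list in place without any of the accounts to remove.
    to_remove = set(account_list)
    trusting_accounts[:] = [a for a in trusting_accounts if a not in to_remove]


trusting = ["111", "222"]
add_accounts(trusting, ["222", "333", "333"])
print(trusting)  # ['111', '222', '333']

remove_accounts(trusting, ["111", "333"])
print(trusting)  # ['222']
```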

Thursday, September 22, 2022

AWS: Unblock CloudFormation stacks from UPDATE_ROLLBACK_FAILED state

 In our project, we have lots of AWS accounts. And we are in charge of deploying base resources in each of them. To do this, we use CloudFormation to deploy several stacks. Actually several stacks nested into one master stack.

One problem we have is that our customers do not always update their stacks to the latest version. Also, when a stack fails to update, they sometimes let it roll back, and do not bother to ask for a fix. Of course, this summer, there were even fewer updates, people being on vacation. And also no release, since we felt there would be nobody to deploy it. So we ended up this September with a larger release than usual.

The result of all this is that we found ourselves staring at lots of accounts with a stack entering UPDATE_ROLLBACK_FAILED state. The reason? Deprecation. As it happened, AWS decided to deprecate Python 3.6, and also a couple of policies (AWSConfigRole, AWSCloudTrailReadOnlyAccess). Of course we updated our stacks with the correct values for Python and the replacement policies some time ago. But as the stacks were not always up to date, and there were some issues while updating to the new release, many stacks started to roll back. And since some of the rolled-back values were deprecated, the rollbacks failed.

When you are in that situation, you have two choices. The first one is to delete everything and redeploy. We tried it on one account, and it was really painful. Too many dependencies and manual actions. The second one is the one advised to us by AWS support itself: continuing the rollback. When you continue a rollback, you have the possibility to skip some resources. In our case, we needed to skip all lambdas using Python 3.6, and all roles using the deprecated policies.

We tried it manually in the AWS Console, and there is one caveat: you can select resources from nested stacks, but not resources from stacks nested into nested stacks. Since we have many accounts to update, and many resources to rollback, we decided to script the whole process.

We thought it would be simple: using Python and boto3, we list all the resources in our stack, recursively entering nested stacks, and filtering all lambdas and roles. We ran into several problems:

  • You cannot skip resources that are not in a failed stack
  • You cannot skip resources that are not in a failed state
  • You cannot skip resources that are in a failed state because CloudFormation cancelled the update
  • Once you run a rollback with skipped resources, CloudFormation discovers new failing resources, so you have to iterate until all is fine, or until the list of resources to skip does not change between two iterations.
  • Names of resources in a nested stack are <nested_stack_name>.<resource_logical_id>. Even for resources nested several levels deep, you still use the same pattern, giving only the name of the direct parent stack.
  • Waiting for a rollback to complete will throw an exception if the rollback fails.
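
Before the full script, the naming rule and the stop conditions from the list above can be sketched without touching AWS. Everything here is hypothetical: FakeStack only simulates a stack that reveals new failing resources on each round, and the three callables stand in for the boto3 calls.

```python
FAILED = "UPDATE_ROLLBACK_FAILED"


def skip_name(parent_stack, logical_id):
    # Resources in a nested stack are prefixed with their direct parent only.
    return f"{parent_stack}.{logical_id}" if parent_stack else logical_id


def unblock(get_status, find_blocking, continue_rollback):
    # Iterate until the stack leaves the failed state, or until the list of
    # resources to skip is the same as in the previous round.
    previous = None
    status = get_status()
    while status == FAILED:
        resources = find_blocking()
        if resources == previous:
            break
        continue_rollback(resources)
        previous = resources
        status = get_status()
    return status


class FakeStack:
    """Simulated stack: each entry of rounds fails in one rollback attempt."""

    def __init__(self, rounds):
        self.rounds = list(rounds)
        self.skipped = []

    def status(self):
        return FAILED if self.rounds else "UPDATE_ROLLBACK_COMPLETE"

    def blocking(self):
        return self.rounds[0]

    def continue_rollback(self, resources):
        self.skipped.append(resources)
        self.rounds.pop(0)


fake = FakeStack([["MyLambda"], [skip_name("Nested", "MyRole")]])
print(unblock(fake.status, fake.blocking, fake.continue_rollback))
print(fake.skipped)
```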

This is the script that helps turn a stack from UPDATE_ROLLBACK_FAILED to UPDATE_ROLLBACK_COMPLETE:

import boto3

BLOCKABLE_RESOURCES = [
    "AWS::Lambda::Function",
    "AWS::IAM::Role",
]

STACK_NAME = "MyStack"


def get_stack_status(cf_client, stack_name):
    response = cf_client.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"]


def find_blocking_resources(cf_client, stack_name, parent, resources):
    response = cf_client.describe_stack_resources(StackName=stack_name)

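    # Resources can only be skipped in a nested stack that is itself in the
    # failed state, so ignore any other nested stack.
    if (
        parent
        and get_stack_status(cf_client, stack_name) != "UPDATE_ROLLBACK_FAILED"
    ):
        return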

    for resource in response["StackResources"]:
        if (
            resource["ResourceType"] == "AWS::CloudFormation::Stack"
            and resource["ResourceStatus"] == "UPDATE_FAILED"
        ):
            nested_name = resource["PhysicalResourceId"].split("/")[1]
            find_blocking_resources(
                cf_client, nested_name, nested_name + ".", resources
            )
        elif (
            resource["ResourceType"] in BLOCKABLE_RESOURCES
            and resource["ResourceStatus"] == "UPDATE_FAILED"
            and resource.get("ResourceStatusReason")
            != "Resource update cancelled"
        ):
            resources.append(parent + resource["LogicalResourceId"])


cf_client = boto3.client("cloudformation")

status = get_stack_status(cf_client, STACK_NAME)

if status != "UPDATE_ROLLBACK_FAILED":
    print("Nothing to unblock. Exiting")
    exit(0)

resources = []
previous_resources = ["DUMMY"]
waiter = cf_client.get_waiter("stack_rollback_complete")

print("Starting unblocking process")
while status == "UPDATE_ROLLBACK_FAILED" and previous_resources != resources:
    previous_resources = resources
    resources = []
    find_blocking_resources(cf_client, STACK_NAME, "", resources)
    print("Skipping ", resources)

    cf_client.continue_update_rollback(
        StackName=STACK_NAME, ResourcesToSkip=resources
    )
    try:
        waiter.wait(StackName=STACK_NAME)
    except Exception as err:
        print(err)

    status = get_stack_status(cf_client, STACK_NAME)

print("Final stack status:", status)

Of course, once this is done, you still have to fix the stacks and run an update.

Thursday, August 25, 2022

Modifying AWS resource for testing

 If you are writing unit tests using boto3 and moto, you might want to modify a value in a resource. I had this case, for instance, where I wanted to set the state of an AMI to pending for a test. Using moto, AMIs are always created in the available state. However, if you try to do it this way, it will fail:

ec2 = boto3.resource("ec2")
image = ec2.Image(AMI_ID)
image.state = "pending"

You will end up with this message:

AttributeError: can't set attribute 'state'

It is possible to modify the resource object. However, it will just modify this object instance; the model itself will not be updated. So if you pass the resource as a parameter to a function, it will work. But if the function retrieves its data with boto3, it will get the original value.

Here is my solution:

image.meta.data.update({"State": "pending"})

That helped in my case. 
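
To picture why the assignment fails while the update works, here is a toy model. It is not boto3's code: it only assumes that a resource exposes its attributes as read-only properties backed by the meta.data dict, and the FakeImage and Meta classes are invented for the illustration.

```python
class Meta:
    def __init__(self, data):
        self.data = data


class FakeImage:
    """Toy stand-in for a resource object, with a read-only state property."""

    def __init__(self, data):
        self.meta = Meta(data)

    @property
    def state(self):
        # No setter is defined, so assigning to image.state raises.
        return self.meta.data.get("State")


image = FakeImage({"State": "available"})
try:
    image.state = "pending"
except AttributeError as err:
    print("AttributeError:", err)

# Updating the backing dict changes what this instance reports...
image.meta.data.update({"State": "pending"})
print(image.state)

# ...but a freshly loaded object still sees the original value.
fresh = FakeImage({"State": "available"})
print(fresh.state)
```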

Monday, June 13, 2022

WTF: Case for a Join

 I found this code in our repository:

roles = ""
for i, id in enumerate(account_ids):
    if i > 0:
        roles = roles + ","
        if i == len(account_ids) - 1:
            roles = roles + "\"arn:aws:iam::*:role/" + entity_name + "-" + id + "-admin"
        else:
            roles = roles + "\"arn:aws:iam::*:role/" + entity_name + "-" + id + "-admin\""
    if i == 0:
        if i == len(account_ids) - 1:
            roles = roles + "arn:aws:iam::*:role/" + entity_name + "-" + id + "-admin"
        else:
            roles = roles + "arn:aws:iam::*:role/" + entity_name + "-" + id + "-admin\""

It takes some time to understand what is going on here, but basically, we are building a comma-separated list of AWS roles from a list of account IDs.

When you see a condition based on the index of a for loop, you start to feel a very strong code smell.

First, the tests for the index being bigger than zero, or equal to zero, are mainly there to handle the comma. But not only. Further tests are then repeated, with almost identical code, to check whether we are on the last iteration, all this to find out if we have to add a " character at the beginning or the end of our string.

At the end, it produces a string in the form: role1","role2","role3

The reason that there are no quotation marks at the beginning or the end of the string is that ultimately, it will be put in a template that is declared with the marks already there, in this form: "__ROLES__".

All this code is quite bad, and the comma and quotation mark handling can all be left to a call to the join function:

roles = '","'.join([f"arn:aws:iam::*:role/{entity_name}-{id}-admin"
    for id in account_ids])

From 13 lines to 1, I kind of like it.
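
Here is a quick check of the join version with made-up values. The entity name, the account IDs and the template are hypothetical; only the "__ROLES__" placeholder comes from our code. I also renamed the loop variable to account_id, to avoid shadowing the id built-in.

```python
import json

entity_name = "acme"  # hypothetical entity name
account_ids = ["111111111111", "222222222222", "333333333333"]

roles = '","'.join(
    f"arn:aws:iam::*:role/{entity_name}-{account_id}-admin"
    for account_id in account_ids
)

# Hypothetical template: the outer quotation marks are already there.
template = '{"Resource": ["__ROLES__"]}'
rendered = template.replace("__ROLES__", roles)

# The rendered template is valid JSON with one entry per account.
print(json.loads(rendered)["Resource"])
```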

Tuesday, June 7, 2022

AWS assume role one-liner

 A couple of months ago, I wanted to simplify the way I assume a role in AWS, an operation I perform several times a day. Usually, you would run a command like this one with the AWS CLI:

aws sts assume-role --role-arn $MY_ROLE_ARN --role-session-name test

It would return you a JSON document like this one:

{
    "AssumedRoleUser": {
        "AssumedRoleId": "AROA3XFRBF535PLBIFPI4:s3-access-example",
        "Arn": "arn:aws:sts::123456789012:assumed-role/xaccounts3access/s3-access-example"
    },
    "Credentials": {
        "SecretAccessKey": "9drTJvcXLB89EXAMPLELB8923FB892xMFI",
        "SessionToken": "AQoXdzELDDY//////////wEaoAK1wvxJY12r2IrDFT2IvAzTCn3zHoZ7YNtpiQLF0MqZye/qwjzP2iEXAMPLEbw/m3hsj8VBTkPORGvr9jM5sgP+w9IZWZnU+LWhmg+a5fDi2oTGUYcdg9uexQ4mtCHIHfi4citgqZTgco40Yqr4lIlo4V2b2Dyauk0eYFNebHtYlFVgAUj+7Indz3LU0aTWk1WKIjHmmMCIoTkyYp/k7kUG7moeEYKSitwQIi6Gjn+nyzM+PtoA3685ixzv0R7i5rjQi0YE0lf1oeie3bDiNHncmzosRM6SFiPzSvp6h/32xQuZsjcypmwsPSDtTPYcs0+YN/8BRi2/IcrxSpnWEXAMPLEXSDFTAQAM6Dl9zR0tXoybnlrZIwMLlMi1Kcgo5OytwU=",
        "Expiration": "2016-03-15T00:05:07Z",
        "AccessKeyId": "ASIAJEXAMPLEXEG2JICEA"
    }
}

And then you would need to export environment variables for setting the access key, secret key and session token.

export AWS_ACCESS_KEY_ID="ASIAJEXAMPLEXEG2JICEA"
export AWS_SECRET_ACCESS_KEY="9drTJvcXLB89EXAMPLELB8923FB892xMFI"
export AWS_SESSION_TOKEN="..."

So I looked for a simpler solution and I stumbled upon this StackOverflow question: AWS sts assume role in one command

Some suggestions use the very useful jq utility, which extracts information from JSON documents. But for AWS CLI commands, it is normally not necessary, since they all accept the --query option that supports JMESPath syntax. So, as one answer suggested, you can simply use the JMESPath join built-in function to construct your export commands, and let the shell evaluate them. This is the solution I have used for some months now:

eval $(aws sts assume-role \
 --role-arn $MY_ROLE_ARN \
 --role-session-name test \
 --query 'join(``, [`export AWS_ACCESS_KEY_ID=`,
 Credentials.AccessKeyId, ` ; export AWS_SECRET_ACCESS_KEY=`,
 Credentials.SecretAccessKey, `; export AWS_SESSION_TOKEN=`,
 Credentials.SessionToken])' \
 --output text)

This command has been really practical, so I decided to add an entry to my blog about it, so that I will always remember where to look for it. I returned to the StackOverflow site and found out that a simpler solution had been suggested since. It uses the shell's built-in printf command, which I had absolutely no idea existed:

export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
$(aws sts assume-role \
--role-arn $MY_ROLE_ARN \
--role-session-name test \
--query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
--output text))

I guess you never stop learning.
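
For those who would rather do the same transformation in a script, here is a minimal Python sketch. It only assumes the JSON layout shown above; the export_lines function is my own invention and the credentials are shortened placeholders.

```python
import json
import shlex


def export_lines(assume_role_output):
    # Map the Credentials fields to the environment variables the CLI reads.
    creds = json.loads(assume_role_output)["Credentials"]
    variables = {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
    # shlex.quote protects values that contain shell-special characters.
    return [f"export {name}={shlex.quote(value)}" for name, value in variables.items()]


sample = json.dumps(
    {
        "Credentials": {
            "AccessKeyId": "ASIAJEXAMPLEXEG2JICEA",
            "SecretAccessKey": "9drTJvcXLB89EXAMPLELB8923FB892xMFI",
            "SessionToken": "AQoXdzELDDY//wEXAMPLE=",
        }
    }
)
for line in export_lines(sample):
    print(line)
```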

Monday, May 16, 2022

WTF: Python Dict two liner

When working with DynamoDB in AWS, you know that the values you put in there are not your usual JSON. Instead of your key/value pairs, you rather have key/type/value. Something like this:

{
    'key': {
        'type': 'value'
    }
}

Of course, you can let your API build this mess for you. Or you can do it yourself, as I found in our code base. Except it had this little gem in it. For creating a map, you have to apply this key/type/value pattern to all values. And then you have to insert the map using the 'M' type. The code ended with these two lines:

data = dict({})
data['M'] = my_map

First, dict({}) is like creating an empty dict from an empty dict. You either use dict() or {}, but not both. But since you can create your dict inline, I'd rather use this one-liner:

data = {'M': my_map}

Does it look simpler only to me?
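
For the record, the key/type/value pattern itself is easy to sketch. This is a hand-rolled illustration, not the code from our repository and not the serializer that comes with the API; it covers only a few of the DynamoDB type tags (S, N, BOOL, NULL, L, M).

```python
def to_dynamodb(value):
    # bool must be tested before int, since True is also an int in Python.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, list):
        return {"L": [to_dynamodb(item) for item in value]}
    if isinstance(value, dict):
        # The map case: apply the pattern to all values, then wrap in 'M'.
        return {"M": {key: to_dynamodb(item) for key, item in value.items()}}
    raise TypeError(f"Unsupported type: {type(value).__name__}")


print(to_dynamodb({"name": "demo", "count": 3, "active": True}))
```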

Thursday, April 21, 2022

WTF: Oops, already in my list

 I found an interesting loop pattern, used at several places in our code. It goes like this: 

  • I create an empty list
  • I start a loop to fill it
  • Inside my loop, I create an object and add it to my list
  • Still in the loop, I need to modify the object I just added, so I retrieve it from the list

An example here:

templates = []
index = 0

json_templates = json.loads(templates_as_str)
for template_file in json_templates:
    templates.append(get_template(template_file))

    template = templates[index]
    # Do some stuff on my template

    index += 1

So yes, I could have created my template reference before storing it in the list. But it's too late! I need to use an index now...
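
Joking aside, here is a sketch of the loop without the index. The get_template function and the "checked" flag are hypothetical stand-ins for the real helper and for the "stuff" done on each template.

```python
import json


def get_template(template_file):
    # Hypothetical stand-in for the real template loader.
    return {"file": template_file}


templates_as_str = '["network.yaml", "roles.yaml"]'

templates = []
for template_file in json.loads(templates_as_str):
    # Keep a reference to the new object, work on it, then store it.
    template = get_template(template_file)
    template["checked"] = True  # placeholder for "do some stuff"
    templates.append(template)

print(templates)
```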

Wednesday, April 6, 2022

Python: Over formatting

In the Python project I work on, one of my colleagues likes to use the string's format method. Sometimes a bit too much for my taste. For instance, I often see this kind of code:

my_string = "{}-{}".format(part1, part2)

Usually, I prefer to use an f-string:

my_string = f"{part1}-{part2}"

But it could be a matter of taste. However, in some cases, I would prefer to use another approach.

For instance, in the case of the print function, there is already an existing pattern. I often see this kind of code:

print("The value is {}".format(value))

I usually replace it with this code:

print("The value is", value)

A bit more annoying is the use of format inside logging. I often see this code:

logger.info("The value is {}".format(value))

In the case of logging, the pattern is there for a reason. If the logging level is set to WARNING for instance, you want to avoid the string formatting, which takes some processing time. The preferred pattern is the following:

logger.info("The value is %s", value)

Finally, there are the cases where the use of format is completely insane. Here is an example:

my_string = "{}".format(value)

Maybe my knowledge of Python is too limited, but value being already a string, is there any difference with this code:

my_string = value
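
Both points can be checked with a small experiment. The Expensive class is a hypothetical helper that counts how many times it is converted to a string, and the logger name is arbitrary.

```python
import logging


class Expensive:
    """Counts how many times the object is turned into a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive value"


logger = logging.getLogger("format_demo")
logger.setLevel(logging.WARNING)  # INFO messages are dropped

eager = Expensive()
logger.info("The value is {}".format(eager))  # formatted before the call

lazy = Expensive()
logger.info("The value is %s", lazy)  # never formatted, the message is dropped

print(eager.str_calls, lazy.str_calls)  # 1 0

# With a string, format gives back an equal string...
value = "already a string"
print("{}".format(value) == value)  # True

# ...so it only makes a difference when the value is not a string.
number = 42
print("{}".format(number) == number)  # False
```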


Friday, April 1, 2022

AWS: Read Timeout when Invoking Lambda

 I have an AWS Lambda that invokes another Lambda synchronously. Nothing was wrong with it until recently, when the other Lambda started performing more tasks and taking more time. Suddenly, the caller Lambda started failing with this weird message:

ReadTimeoutError: Read timeout on endpoint URL: "https://lambda.eu-central-1.amazonaws.com/2015-03-31/functions/MyOtherLambda/invocations"

Looking at the logs of my other Lambda, I could see that it ran fine, although it was executed several times. After some research on the Internet, I found this article from AWS support that explains that when invoking a Lambda synchronously, there is a default timeout of 60 seconds, and 3 retries. This can be configured when creating the client.

In Python using boto3 for instance, you have to use the Config object:

import boto3
from botocore.config import Config

lambda_client = boto3.client("lambda", config=Config(read_timeout=600))

Now, my invoked lambda has 10 minutes to perform its tasks.

Monday, March 7, 2022

WTF: retry by recursion

Often, in an application, you'll want to retry some actions when they fail. If you start thinking: "why not use recursion?", stop right there. It's a bad idea. Filling the call stack is never a good idea. It might slow down your whole application. And using a stack overflow (a RecursionError in Python) to tell you that you should stop retrying is not so great.

Unfortunately for me, someone thought it would be a good idea to introduce this pattern into our production code. Almost everywhere, I find this type of code:

def myfunc(params, tries=0):
    sys.setrecursionlimit(500) #By default 1,000

    try:
        dosomestuff()
    except:
        if tries < sys.getrecursionlimit():
            return myfunc(params, tries+1)
        else:
            print("RECURSION LIMIT HIT for myfunc !")

Please use loops...
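
For reference, here is a minimal sketch of a retry written with a loop. The retry function, its parameters and the flaky example are all hypothetical; real code would probably catch a narrower exception type and wait a bit between attempts.

```python
import time


def retry(action, max_tries=3, delay_seconds=0.0):
    # Call action until it succeeds or max_tries attempts have failed.
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        try:
            return action()
        except Exception:
            if attempt == max_tries:
                raise  # out of attempts: let the caller see the last error
            time.sleep(delay_seconds)


calls = {"count": 0}


def flaky():
    # Fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(retry(flaky, max_tries=5), calls["count"])  # ok 3
```

The call stack stays flat whatever the number of attempts, and the last error is propagated instead of being replaced by a print.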

Saturday, February 19, 2022

WTF: Adding a dict to a list

 I found this strange way of adding a dict to a list in some production Python code:

result = [...]
dict_to_add = (
    "{\'ParameterKey\': \'"
    + param_key
    + "\', \'ParameterValue\': \'"
    + param_value
    + "\'}"
)
result.append(ast.literal_eval(dict_to_add))

To start with, those backslashes in the string are completely useless. My guess is that the guy originally used double quotes, which needed to be backslashed. Then he realized that it did not work, so he switched to single quotes, leaving the backslash characters.

But the big WTF is to use the ast library for transforming the string to a dict. Why would you do that in such a simple case? Why not simply build the dict directly:

result.append({'ParameterKey': param_key, 'ParameterValue': param_value})

That's a strange disease, I call it ast-ma.
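
The string version is not only strange, it is also fragile. In this sketch, the two function names and the sample values are mine; a value containing a single quote is enough to break the parsing, while the direct version does not care.

```python
import ast


def build_with_ast(param_key, param_value):
    # Same idea as the production code: build a string, then parse it.
    text = (
        "{'ParameterKey': '" + param_key
        + "', 'ParameterValue': '" + param_value + "'}"
    )
    return ast.literal_eval(text)


def build_directly(param_key, param_value):
    return {"ParameterKey": param_key, "ParameterValue": param_value}


print(build_with_ast("Env", "prod") == build_directly("Env", "prod"))  # True

try:
    build_with_ast("Owner", "O'Brien")
except (SyntaxError, ValueError) as err:
    print("literal_eval failed:", type(err).__name__)

print(build_directly("Owner", "O'Brien"))
```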

Monday, February 7, 2022

WTF: Hair Splitting

You know how sometimes, you find some very bad code, but it manages to teach you something? I just had this funny experience with some production code in Python.

It happens that you have a full path to a file and you just need the filename. You do not need to be cross platform, or care for corner cases, so you just use the split method:

filename = filename.split('/')[-1]

You split around the slash characters, and you keep just the last part. Now I found this approach in production code:

boolean_slash_in_str = True
while boolean_slash_in_str:
    filename = filename.split('/', 1)[-1]
    if '/' not in filename:
        boolean_slash_in_str = False

It uses a badly named and useless boolean value for the loop, but it also uses split with a second parameter I did not know about. So thanks to this code, I now know that you can specify the maximum number of splits. Here, it splits only around the first slash, keeps the last part, and repeats until all slash characters are gone.

Thanks for the lesson...
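
To see the second parameter at work, here is a short sketch on a made-up path. It also shows rsplit, which splits from the right, and posixpath.basename from the standard library, for the day you do care about corner cases.

```python
import posixpath

path = "logs/app/output.txt"

# maxsplit=1 splits only around the first slash.
print(path.split("/", 1))  # ['logs', 'app/output.txt']

# Splitting around every slash and keeping the last part.
print(path.split("/")[-1])  # output.txt

# rsplit with maxsplit=1 splits once, around the last slash.
print(path.rsplit("/", 1)[-1])  # output.txt

# The standard library helper for slash-separated paths.
print(posixpath.basename(path))  # output.txt
```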