Thursday, April 21, 2022

WTF: Oops, already in my list

I found an interesting loop pattern, used in several places in our code. It goes like this:

  • I create an empty list
  • I start a loop to fill it
  • Inside my loop, I create an object and add it to my list
  • Still in the loop, I need to modify the object I just added, so I retrieve it from the list
An example here:
templates = []
index = 0

json_templates = json.loads(templates_as_str)
for template_file in json_templates:
    templates.append(get_template(template_file))

    template = templates[index]
    # Do some stuff on my template

    index += 1
So yes, I could have created my template reference before storing it in the list. But it's too late! I need to use an index now...
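
For comparison, here is a minimal sketch of the same loop without the index. The get_template function below is a hypothetical stand-in (the real one is not shown here), and "doing some stuff" is reduced to setting a flag:

```python
import json


def get_template(template_file):
    # Hypothetical stand-in for the real get_template
    return {"file": template_file, "processed": False}


def load_templates(templates_as_str):
    templates = []
    for template_file in json.loads(templates_as_str):
        # Keep a reference to the object before storing it
        template = get_template(template_file)
        template["processed"] = True  # "do some stuff on my template"
        templates.append(template)
    return templates


loaded = load_templates('["a.json", "b.json"]')
```

The list still ends up with the modified objects, and the index variable is gone.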

Wednesday, April 6, 2022

Python: Over formatting

In the Python project I work on, one of my colleagues likes to use the string's format method, sometimes a bit too much for my taste. For instance, I often see this kind of code:

my_string = "{}-{}".format(part1, part2)

Usually, I prefer to use an f-string:

my_string = f"{part1}-{part2}"

But it could be a matter of taste. However, in some cases, I would prefer to use another approach.

For instance, the print function already has its own pattern for this. I often see this kind of code:

print("The value is {}".format(value))

I usually replace it with this code:

print("The value is", value)

A bit more annoying is the use of format inside logging. I often see this code:

logger.info("The value is {}".format(value))

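With logging, the pattern exists for a reason. If the logging level is set to WARNING, for instance, an INFO message is dropped, and you want to avoid paying for string formatting that nobody will read. The preferred pattern passes the arguments separately, so that the logger only formats the message when it is actually emitted: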

logger.info("The value is %s", value)
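
To make the difference visible, here is a small sketch. The Expensive class and the logger name are made up for the demonstration; the class simply counts how many times it gets converted to a string:

```python
import logging


class Expensive:
    """Hypothetical value that counts its string conversions."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "expensive"


logger = logging.getLogger("demo_lazy_formatting")
logger.setLevel(logging.WARNING)  # INFO messages are dropped

eager = Expensive()
# The string is built before logger.info even gets a chance to drop it
logger.info("The value is {}".format(eager))

lazy = Expensive()
# The logger sees that INFO is disabled and never formats the message
logger.info("The value is %s", lazy)

print(eager.str_calls, lazy.str_calls)
```

With format, the conversion happens once even though nothing is logged. With the %s pattern, it does not happen at all.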

Finally, there are the cases where the use of format is completely insane. Here is an example:

my_string = "{}".format(value)

Maybe my knowledge of Python is too limited, but with value already being a string, is there any difference from this code:

my_string = value
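
As far as I can tell, a quick check suggests the answer is no for a plain string. The only thing format brings here is a conversion when value is not a string, and str() says that more clearly. The variable names below are just for illustration:

```python
value = "hello"
# For a plain string, the formatted result equals the original
same_for_str = "{}".format(value) == value

number = 42
# For a non-string, format behaves like str()
formatted_number = "{}".format(number)
same_as_str_call = formatted_number == str(number)
```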


Friday, April 1, 2022

AWS: Read Timeout when Invoking Lambda

 I have an AWS Lambda that invokes another Lambda synchronously. Nothing was wrong with it until recently, when the other Lambda started performing more tasks and taking more time. Suddenly, the caller Lambda started failing with this weird message:

ReadTimeoutError: Read timeout on endpoint URL: "https://lambda.eu-central-1.amazonaws.com/2015-03-31/functions/MyOtherLambda/invocations"

Looking at the logs of my other Lambda, I could see that it ran fine, although it was executed several times. After some research on the Internet, I found this article from AWS support that explains that when invoking a Lambda synchronously, there is a default timeout of 60 seconds, and 3 retries. This can be configured when creating the client.

In Python using boto3 for instance, you have to use the Config object:

import boto3
from botocore.config import Config

lambda_client = boto3.client("lambda", config=Config(read_timeout=600))

Now, my invoked Lambda has 10 minutes to perform its tasks.

Monday, March 7, 2022

WTF: retry by recursion

Often, in an application, you'll want to retry some actions when they fail. If you start thinking "why not use recursion?", stop right there. It's a bad idea. Every retry adds a frame to the call stack, which costs memory and might slow down your whole application. And relying on a stack overflow (a RecursionError in Python) to tell you that you should stop retrying is not so great.

Unfortunately for me, someone thought it would be a good idea to introduce this pattern into our production code. Almost everywhere, I find this type of code:

def myfunc(params, tries=0):
    sys.setrecursionlimit(500) #By default 1,000

    try:
        dosomestuff()
    except:
        if tries < sys.getrecursionlimit():
            return myfunc(params, tries+1)
        else:
            print("RECURSION LIMIT HIT for myfunc !")

Please use loops...
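
A minimal sketch of what the loop version could look like. The retry helper and its max_tries and delay parameters are names I made up for this example, not something from our code base:

```python
import time


def retry(action, max_tries=3, delay=0.0):
    """Call action() until it succeeds or max_tries attempts have failed."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")

    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return action()
        except Exception as error:
            last_error = error
            if attempt < max_tries:
                time.sleep(delay)  # wait a bit before the next attempt

    # All attempts failed: re-raise the last error instead of hiding it
    raise last_error


# Example: an action that fails twice, then succeeds
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("not yet")
    return "ok"


result = retry(flaky, max_tries=5)
```

The call stack stays flat, the number of attempts is explicit, and the caller gets the real exception when every attempt fails.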

Saturday, February 19, 2022

WTF: Adding a dict to a list

 I found this strange way of adding a dict to a list in some production Python code:

result = [...]
dict_to_add = (
    "{\'ParameterKey\': \'"
    + param_key
    + "\', \'ParameterValue\': \'"
    + param_value
    + "\'}"
)
result.append(ast.literal_eval(dict_to_add))

To start with, those backslashes in the string are completely useless. My guess is that the author originally used double quotes, which needed to be escaped. Then he realized that it did not work, so he switched to single quotes but left the backslashes in.

But the big WTF is using the ast module to transform the string into a dict. Why would you do that in such a simple case? Why not simply build the dict directly:

result.append({'ParameterKey': param_key, 'ParameterValue': param_value})

That's a strange disease, I call it ast-ma.
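
Besides being convoluted, the string approach is also fragile. Here is a small sketch comparing the two; the function names and sample values are invented for the example:

```python
import ast


def build_with_ast(param_key, param_value):
    # Same idea as the production code: build a string, then parse it
    dict_as_str = (
        "{'ParameterKey': '" + param_key
        + "', 'ParameterValue': '" + param_value + "'}"
    )
    return ast.literal_eval(dict_as_str)


def build_directly(param_key, param_value):
    return {"ParameterKey": param_key, "ParameterValue": param_value}


# Both give the same result for a harmless value
same_result = build_with_ast("Env", "prod") == build_directly("Env", "prod")

# A single quote in the value breaks the string version
try:
    build_with_ast("Owner", "O'Brien")
    ast_version_failed = False
except (SyntaxError, ValueError):
    ast_version_failed = True

direct_result = build_directly("Owner", "O'Brien")
```

The direct version does not care what the value contains.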

Monday, February 7, 2022

WTF: Hair Splitting

You know how sometimes you find some very bad code, but it manages to teach you something? I just had this funny experience with some production Python code.

It happens that you have a full path to a file and you just need the filename. You do not need to be cross-platform, or care about corner cases, so you just use the split method:

filename = filename.split('/')[-1]

You split around the slash characters, and you keep just the last part. Now I found this approach in production code:

boolean_slash_in_str = True
while boolean_slash_in_str:
    filename = filename.split('/', 1)[-1]
    if '/' not in filename:
        boolean_slash_in_str = False

It uses a badly named and useless boolean for the loop, but it also calls split with a second parameter I did not know about. So thanks to this code, I now know that you can specify the maximum number of splits. Here, it splits only around the first slash, keeps the last part, and repeats until no slash characters are left.
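
Here is a quick illustration of that second parameter, on a made-up path. I also included rsplit, which splits from the right and would have done the job in one call:

```python
path = "var/log/app/output.txt"

# No limit: split around every slash
all_parts = path.split("/")

# Limit of 1: split only around the first slash
first_split = path.split("/", 1)

# rsplit with a limit of 1: split only around the last slash
last_split = path.rsplit("/", 1)

filename = last_split[-1]
```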

Thanks for the lesson...

Wednesday, September 22, 2021

TLRU Cache in Python

Implementing an LRU cache in Python is quite easy: you just use the @lru_cache decorator from the functools module. For TLRU, where items have an expiry time, no standard implementation exists. So here is my implementation:

from collections import OrderedDict
import time


class TLRU:
    def __init__(self, func=None, maxsize=128, ttl=120):
        self.func = func
        self.maxsize = maxsize
        self.ttl = ttl
        self.cache = OrderedDict()
        self.decorator_with_parameters = func == None

    def get_value(self, key):
        result = self.cache.get(key)
        if not result:
            return None

        value, expires_at = result

        if expires_at < time.time():
            del self.cache[key]
            return None

        self.cache.move_to_end(key)
        return value


    def put_value(self, key, value):
        if len(self.cache) >= self.maxsize:
            self.cache.popitem(False)
        self.cache[key] = (value, time.time() + self.ttl)

    def _call(self, *args):
        value = self.get_value(args)
        if value:
            return value

        value = self.func(*args)
        self.put_value(args, value)

        return value

    def __call__(self, *args):
        if self.decorator_with_parameters:
            self.func = args[0]
            return self._call

        return self._call(*args)

    def clear_cache(self):
        self.cache.clear()

Now for an explanation of the code, for those interested.

Our cache implementation uses the OrderedDict class from the collections module. It behaves like a dict, but keeps the items in insertion order.
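
The two OrderedDict methods the cache relies on are move_to_end and popitem. Here is a small sketch of how they behave, with arbitrary keys and values:

```python
from collections import OrderedDict

cache = OrderedDict()
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3

# Accessing "a" in an LRU cache means moving it to the end
cache.move_to_end("a")
order_after_access = list(cache)

# popitem(last=False) removes the first item, the least recently used one
evicted = cache.popitem(last=False)
order_after_eviction = list(cache)
```

After the move, the order is b, c, a. The eviction then removes b, which is the item that has gone unused the longest.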

Here is the implementation of the value retrieval method:

    def get_value(self, key):
        result = self.cache.get(key)
        if not result:
            return None

        value, expires_at = result

        if expires_at < time.time():
            del self.cache[key]
            return None

        self.cache.move_to_end(key)
        return value

It first looks for the value in the cache. If it does not find it, it returns None immediately. Items in our cache are tuples packing both the value and the expiry time in seconds. So our second action is to check whether the value has expired. If so, we remove it from the cache and answer that we do not have it. If we find that the value is fresh enough, we move it to the end of our OrderedDict using the move_to_end method.

Storing a value goes like this:

    def put_value(self, key, value):
        if len(self.cache) >= self.maxsize:
            self.cache.popitem(False)
        self.cache[key] = (value, time.time() + self.ttl)

We do two things here. First, if we have reached the cache's maximum size, we remove the oldest value. Passing False to the popitem method makes it remove the first item instead of the last one, so the OrderedDict behaves like a FIFO queue. Then, we compute the expiry time and pack it together with the value inside our cache.

Here is the method that wraps our cached function:

    def _call(self, *args):
        value = self.get_value(args)
        if value:
            return value

        value = self.func(*args)
        self.put_value(args, value)

        return value

It looks for our value in the cache. If we find it, we just return it. Otherwise, we call the wrapped function and store the result in the cache before returning it.
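
One thing to keep in mind with this wrapper: the "if value:" test treats a cached result that is falsy (0, an empty string, None) as a miss, so such results get recomputed on every call. Here is a sketch of the effect on a plain dict, together with one possible workaround using a sentinel. The _MISSING object and the function names are assumptions for the example, not part of the class above:

```python
_MISSING = object()  # hypothetical sentinel meaning "not in the cache"
cache = {}
computations = []


def compute(key):
    computations.append(key)
    return 0  # a perfectly valid, but falsy, result


def cached_with_truthiness(key):
    value = cache.get(key)
    if value:  # same test as in _call: a cached 0 looks like a miss
        return value
    value = compute(key)
    cache[key] = value
    return value


def cached_with_sentinel(key):
    value = cache.get(key, _MISSING)
    if value is not _MISSING:  # only a truly absent key is a miss
        return value
    value = compute(key)
    cache[key] = value
    return value


cached_with_truthiness("a")
cached_with_truthiness("a")
truthiness_computations = len(computations)

computations.clear()
cache.clear()
cached_with_sentinel("a")
cached_with_sentinel("a")
sentinel_computations = len(computations)
```

With the truthiness test, the function is computed twice. With the sentinel, it is computed once. For functions that only return truthy values, the simpler test is good enough.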

Now for the trickier parts. First the constructor:

    def __init__(self, func=None, maxsize=128, ttl=120):
        self.func = func
        self.maxsize = maxsize
        self.ttl = ttl
        self.cache = OrderedDict()
        self.decorator_with_parameters = func == None

It doesn't look tricky, but you have to know that decorators work in two different ways, depending on whether you use them with parameters or not:

  • Decorators without parameters call the constructor with the wrapped function as the first parameter. 
  • Decorators with parameters call the constructor without parameters.

That is why the func parameter is optional. We detect which case is running by checking if func is None.

Now for the function called by the decorator pattern:

    def __call__(self, *args):
        if self.decorator_with_parameters:
            self.func = args[0]
            return self._call

        return self._call(*args)

Again, two cases:

  • Decorators without parameters call this function with the wrapped function's parameters, expecting the value as a result
  • Decorators with parameters call this function with the wrapped function's reference, expecting a reference to the wrapper as a result.

So if we have parameters in our decorator, we store the wrapped function, which is the first argument of the call, in the func attribute, then return the wrapper reference. Otherwise, we simply call the wrapper immediately.
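
To see the two paths in isolation, here is a stripped-down sketch of the same mechanism without any caching. The Recorder class and its label parameter are invented for the demonstration:

```python
class Recorder:
    """Hypothetical decorator class using the same two-way pattern."""

    def __init__(self, func=None, label="default"):
        self.func = func
        self.label = label
        self.decorator_with_parameters = func is None

    def _call(self, *args):
        return (self.label, self.func(*args))

    def __call__(self, *args):
        if self.decorator_with_parameters:
            # Called once with the wrapped function: store it, return the wrapper
            self.func = args[0]
            return self._call
        # Called with the wrapped function's arguments
        return self._call(*args)


@Recorder  # same as: double = Recorder(double)
def double(x):
    return x * 2


@Recorder(label="custom")  # same as: triple = Recorder(label="custom")(triple)
def triple(x):
    return x * 3


double_result = double(2)
triple_result = triple(2)
```

One side effect of this design: without parameters, the decorated name is the decorator instance itself, while with parameters it is the bound _call method. Instance methods such as clear_cache are then only reachable through the method's __self__ attribute.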

You can use it like this:

@TLRU
def calc(v, y):
    print("Calculating", v, y)
    return v * y

Or with parameters:

@TLRU(maxsize=10000, ttl=360)
def calc(v, y):
    print("Calculating", v, y)
    return v * y