Thursday, April 18, 2024

WTF: E-mail Validation

 E-mail validation is usually a hard task, but in our case, we had a simple regular expression that allowed us to accept a list of e-mails in a known format. Here is the regular expression that you could find in our code:

^$|^\s*[\w+.-]+@[a-zA-Z_-]+?\.[a-zA-Z]{2,3}(?:,[\w+.-]+@[a-zA-Z_-]+?\.[a-zA-Z]{2,3})*\s*$

There is a lot going on here, so let's break it down:

  • ^$|: we accept an empty string
  • ^\s*: we ignore leading white space characters
  • [\w+.-]+: the user name part of the e-mail. We accept all word characters, the plus sign, the dot and the dash.
  • @: the at sign
  • [a-zA-Z_-]+?: the domain name, which can contain any letter, the dash and the underscore. Note the use of the +? pattern here, which is very strange. I had to google it: it is the lazy quantifier, which matches as few characters as possible while still letting the rest of the pattern match. Completely useless here, since the character class cannot match the dot that must follow anyway.
  • \.: the dot character between the domain name and the extension
  • [a-zA-Z]{2,3}: the extension, which can be 2 or 3 letters (like .fr or .com)
  • (?:, ... )*: we repeat the whole pattern here to say that we can have any number of other e-mails separated by commas. Note the strange use of the ?: pattern. I had to google that one too. It is a non-capturing group, which means that its content cannot be retrieved later using the group() functions. Useless here, since we never retrieve any group anyway.
  • \s*$: we ignore all trailing space characters
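
To play with the original expression, here is a small Python sketch. The test addresses are made up for illustration; only the pattern comes from our code. It also checks the claim about +?: swapping the lazy +? for a greedy + does not change which strings are accepted, because the domain character class can never match the dot anyway.

```python
import re

# The original expression, as found in our code.
ORIGINAL = r"^$|^\s*[\w+.-]+@[a-zA-Z_-]+?\.[a-zA-Z]{2,3}(?:,[\w+.-]+@[a-zA-Z_-]+?\.[a-zA-Z]{2,3})*\s*$"
# Same expression with both lazy +? replaced by a greedy +.
GREEDY = ORIGINAL.replace("+?", "+")

samples = [
    "",                                   # empty string
    "john.doe@example.com",               # single address
    "  a@example.fr,b+tag@example.com ",  # list with surrounding spaces
    "a@example.com, b@example.com",       # a space after the comma is not allowed
    "a@example.co.jp",                    # two-part extension is rejected
]

for s in samples:
    lazy = re.match(ORIGINAL, s) is not None
    greedy = re.match(GREEDY, s) is not None
    assert lazy == greedy  # laziness never changes the verdict here
    print(repr(s), lazy)
```
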
A bit complicated, but still OK. But then, somebody complained that it did not support e-mails from our Japanese branch, which use addresses of the form @domain.co.jp. So someone was assigned the task, and came up with the following regular expression:
^$|^\s*[\w+.-]+@[a-zA-Z_-]+?\.[a-zA-Z]{2,3}(?:,[\w+.-]+@[a-zA-Z_-]+?\.[a-zA-Z]{2,3}\.[a-zA-Z]{2,3})*\s*$

The only difference from the previous one is a new \.[a-zA-Z]{2,3} added inside the parentheses. This means that you can have Japanese-style e-mails, but only after the first e-mail of the list. Worse, from the second e-mail onward, only Japanese-style addresses are accepted, so a plain list like a@domain.com,b@domain.com no longer validates. I notified the person who committed the code, and he said he would think about the problem. Of course, the code went to prod...
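
If you want to see the regression for yourself, this quick Python sketch runs the production expression against a few made-up addresses (the addresses and the accepts helper are hypothetical; only the pattern comes from our code):

```python
import re

# The expression that went to production after the first "fix".
BROKEN = r"^$|^\s*[\w+.-]+@[a-zA-Z_-]+?\.[a-zA-Z]{2,3}(?:,[\w+.-]+@[a-zA-Z_-]+?\.[a-zA-Z]{2,3}\.[a-zA-Z]{2,3})*\s*$"


def accepts(pattern, value):
    """Return True when the whole comma-separated list is accepted."""
    return re.match(pattern, value) is not None


print(accepts(BROKEN, "a@example.com"))                  # True: single regular address
print(accepts(BROKEN, "a@example.co.jp"))                # False: Japanese address in first position
print(accepts(BROKEN, "a@example.com,b@example.co.jp"))  # True: Japanese address in second position
print(accepts(BROKEN, "a@example.com,b@example.com"))    # False: two regular addresses
```

The last line is the real problem: a list that the original expression accepted is now rejected.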

So I decided to make a quick fix. I removed all the strange patterns and used the following regular expression:

^$|^\s*[\w+.-]+@[a-zA-Z_-]+(\.[a-zA-Z]{2,3}){1,2}(,[\w+.-]+@[a-zA-Z_-]+(\.[a-zA-Z]{2,3}){1,2})*\s*$

The fix uses the {1,2} quantifier to say that we can have one or two extensions. Meanwhile, the guy who made the first change had also started working on a fix. Small communication problem here: he didn't notice that I had already assigned the bug to myself. But the funny thing is the fix he had on a branch that was never merged. It looked like this:
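
Here is the same kind of check against the fixed expression, again with made-up addresses. The sketch also shows a possible refinement (not what was committed): write the single-address pattern once and build the list pattern from it, so that a change like the .co.jp support only has to be made in one place. The names ADDRESS and ADDRESS_LIST are just for illustration.

```python
import re

# The fixed expression, as committed.
FIXED = r"^$|^\s*[\w+.-]+@[a-zA-Z_-]+(\.[a-zA-Z]{2,3}){1,2}(,[\w+.-]+@[a-zA-Z_-]+(\.[a-zA-Z]{2,3}){1,2})*\s*$"

# Possible refinement: one address pattern, reused for the rest of the list,
# so the two copies can never drift apart again.
ADDRESS = r"[\w+.-]+@[a-zA-Z_-]+(\.[a-zA-Z]{2,3}){1,2}"
ADDRESS_LIST = rf"^$|^\s*{ADDRESS}(,{ADDRESS})*\s*$"

assert ADDRESS_LIST == FIXED  # same expression, just assembled differently

for value in [
    "a@example.com,b@example.com",                 # True: regular addresses work again
    "a@example.co.jp",                             # True: Japanese address in first position
    "a@example.com,b@example.co.jp,c@example.fr",  # True: mixed list
    "a@example.co.uk.fr",                          # False: at most two extension parts
]:
    print(repr(value), re.match(ADDRESS_LIST, value) is not None)
```
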

^$|^\s*[\w+.-]+@(?:domain)+?(\.[a-zA-Z]{2,3}|\.[a-zA-Z]{2,3}\.[a-zA-Z]{2,3})(?:,[\w+.-]+@(?:domain)+?(\.[a-zA-Z]{2,3}|\.[a-zA-Z]{2,3}\.[a-zA-Z]{2,3}))(?:,[\w+.-]+@(?:domain)+?(\.[a-zA-Z]{2,3}|\.[a-zA-Z]{2,3}\.[a-zA-Z]{2,3}))*\s*$

I don't even want to know if it is correct... 

Friday, March 15, 2024

AWS: Find Root Cause of Failure for CloudFormation Stacks

When a CloudFormation stack fails, you have to scroll back through the events to find the root cause of the failure. Recently, AWS even added a "Detect Root Cause" button to the Console that jumps straight to the relevant event. But how do you do it from a Python script?

import boto3

def find_root_cause(stack_name):
    cf_client = boto3.client('cloudformation')

    # dummy value so that the loop runs at least once
    next_values = "First Time"
    params = {
        "StackName": stack_name
    }
    root_cause = None

    # events come back newest first, one page at a time
    while next_values:
        result = cf_client.describe_stack_events(**params)

        next_values = result.get("NextToken")
        params["NextToken"] = next_values

        for event in result["StackEvents"]:
            status = event.get("ResourceStatus", "")
            reason = event.get("ResourceStatusReason")

            # start of deployment: the failure we kept is the oldest one
            if reason == "User Initiated":
                return root_cause

            # keep overwriting, so that the oldest failure wins
            if reason and "FAILED" in status:
                root_cause = reason

    return root_cause

You follow the same approach as in the Console: walk back through the event history (the API returns it newest first) until you reach the start of the deployment, keeping the oldest failure message seen along the way.
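
A variation on the same idea, sketched below, swaps the manual NextToken loop for the boto3 paginator of describe_stack_events and separates the search from the API calls, so the logic can be tested without AWS. The function names are hypothetical, and the sample events are hand-written for illustration, not real CloudFormation output.

```python
from typing import Iterable, Optional


def root_cause_from_events(events: Iterable[dict]) -> Optional[str]:
    """Return the oldest failure reason of the most recent deployment.

    `events` must be ordered newest first, as describe_stack_events returns them.
    """
    root_cause = None
    for event in events:
        status = event.get("ResourceStatus", "")
        reason = event.get("ResourceStatusReason")
        if reason == "User Initiated":
            # Reached the start of the deployment: stop consuming events.
            break
        if reason and "FAILED" in status:
            # Keep overwriting, so the oldest failure wins.
            root_cause = reason
    return root_cause


def stack_events(client, stack_name):
    """Yield stack events newest first, fetching pages lazily via the paginator."""
    paginator = client.get_paginator("describe_stack_events")
    for page in paginator.paginate(StackName=stack_name):
        yield from page["StackEvents"]


# Hand-written sample events, newest first.
sample_events = [
    {"ResourceStatus": "ROLLBACK_COMPLETE"},
    {"ResourceStatus": "CREATE_FAILED", "ResourceStatusReason": "Resource creation cancelled"},
    {"ResourceStatus": "CREATE_FAILED", "ResourceStatusReason": "my-bucket already exists"},
    {"ResourceStatus": "CREATE_IN_PROGRESS", "ResourceStatusReason": "User Initiated"},
    {"ResourceStatus": "UPDATE_FAILED", "ResourceStatusReason": "from an older deployment"},
]

print(root_cause_from_events(sample_events))  # my-bucket already exists
```

Against a real stack, you would call root_cause_from_events(stack_events(boto3.client("cloudformation"), "my-stack")). Since stack_events is a generator, breaking out of the loop also stops the pagination, so older pages are never fetched, just like the early return in the original function.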

Sunday, March 3, 2024

JFileChooser and the Lost Folder Selection

This article was originally posted on JRoller on July 7, 2005

It might sound like an Indiana Jones movie title, but it is an interesting problem we came across. We have a third-party product which at some point displays a JFileChooser in which you must select a directory. In the old Java 1.4, this dialog box worked properly. Now that we have switched to the brand new 5.0, when we select a folder and click Open, it does not return the folder as the selected value, but navigates into the folder instead. The main difference in behavior is that, when we selected a folder, its name used to appear in the file name text field, and now it does not.

The colleague who had to solve the problem tried running the program with the 1.4 version of JFileChooser in the bootclasspath. It did not help, so I suggested he try with the UI class instead. And, surprise, it worked as in the old days. So he started comparing the source code of both versions, and in the ListSelectionListener he found an interesting difference: a property which was always true before is now false by default. To solve the problem, he inserted the following line in the main method:

UIManager.put("FileChooser.usesSingleFilePane", Boolean.TRUE);

I wonder if these properties are documented somewhere. There seem to be so many of them...

I checked with a more recent version of Java. This parameter still exists, and still does not seem to be documented.

Wednesday, November 22, 2023

Dynamic Class Loading

 This article was originally posted on JRoller on June 30, 2005.

The other day, I wanted to write an Eclipse plugin (maybe more about that in a different post) in which I needed to read a selected class file from the project I was working on and execute one of its methods. Since I cannot have my project in the classpath, I found that the only solution is to load the class dynamically. If there is a better solution in Eclipse, please somebody tell me.

Before starting to write my plugin, I decided to write a small test application, because I had never used class loading before. So here is the class I want to load:

package hello;

public class HelloWorld
{
  public void run()
  {
    System.out.println ("Hello World!");
  }
}

To load it and execute the run method, you can then use the following lines of code:

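        // imports needed: java.io.File, java.io.IOException, java.io.RandomAccessFile, java.lang.reflect.Method
        ClassLoader loader = new ClassLoader(getClass().getClassLoader())
        {
            public Class findClass(String name) throws ClassNotFoundException
            {
                try
                {
                    // read the raw bytes of the .class file from disk
                    String path = "C:\\mypath\\hello";
                    File file = new File(path, name + ".class");
                    RandomAccessFile raf = new RandomAccessFile(file, "r");
                    byte[] content = new byte[(int)file.length()];
                    try
                    {
                        raf.readFully(content);
                    }
                    finally
                    {
                        raf.close();
                    }

                    // the name must match the fully qualified name compiled into the file
                    return defineClass("hello." + name, content, 0, content.length);
                }
                catch (IOException e)
                {
                    // report the failure to loadClass() instead of returning null
                    throw new ClassNotFoundException(name, e);
                }
            }
        };

        try
        {
            Class helloClass = loader.loadClass("HelloWorld");
            Object hello = helloClass.newInstance();
            Method m = helloClass.getMethod("run", new Class[0]);
            m.invoke(hello);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }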

I did not try the code on a more recent version of Java. Note that Class.newInstance(), used above, has been deprecated since Java 9 in favor of getDeclaredConstructor().newInstance(), so there are probably better ways to do this nowadays. I tried asking ChatGPT to produce this code, and the result was quite similar, except that it used the URLClassLoader class, which handles reading the file content for us.

Wednesday, November 15, 2023

Python: ruamel.yaml lib has a problem handling comments

In our project, we use the ruamel.yaml library for reading and writing YAML files. The reason we are not using the more basic PyYAML library is that ruamel better supports the YAML standard, keeps comments and formatting, and always dumps keys in the same order.

However, since version 0.18.3, we have seen some strange behavior in our file dumps: some newlines were removed from some files. I opened ticket #492 with the following code, which reproduces the problem:

import ruamel.yaml

y = ruamel.yaml.YAML()
with open("organizational_units.yaml", "r") as file:
    ou = y.load(file)

with open("organizational_units.yaml", "r") as file:
    content = y.load(file)

content["organizational_units"] = ou["organizational_units"]

with open("test.yaml", "w") as file:
    y.dump(content, file)
with open("test.yaml", "w") as file:
    y.dump(content, file)

with open("test.yaml", "r") as file:
    y.load(file)

It is of course an oversimplified version of what we do in our project. We normally load several YAML files and combine them into one big model. Then, when we need to save changes to one file, we first reload it into memory to retrieve the original comments at the beginning of the file, before replacing the old content with the new one.

You can also see that we save the file twice. In reality, the first save goes into an in-memory string stream whose content we log (at least in debug mode), and only then do we save to the file. Again, the code here is a simplification that only shows the problem.

The problem occurs on the second save. The first works fine. Using this file as an example:

# Organizational Unit Specification

organizational_units:

- Name: root
  Accounts:
  - FirstAccount
  - SecondAccount

After the second save, we have this result:

# Organizational Unit Specification

organizational_units: -
  Name: root
  Accounts:
  - FirstAccount
  - SecondAccount

Notice the missing newlines?

The last lines of the code load the resulting file back, just to show that it cannot be read.

After opening the ticket, I got the answer (on the same day, nice responsiveness!) that it is in fact a duplicate of ticket #410. Ticket #410 is a bit different, because it duplicates the complete structure, while we only replace part of it. Maybe that is why our code kept working until then. I think the change that broke it is this fix: "fix issue with spurious newline on first item after comment + nested block sequence".

As the developer explains, the issue comes from the way the library stores comments internally. The same comment can be referenced from several places, and when comments are dumped, some internal bookkeeping makes sure they are not written more than once. When we replaced the reference to the top key, we broke some of those comment references.

As a workaround, I saved the comment reference of the top key before replacing it, and restored it afterward:

comments = content["organizational_units"].ca.comment
content["organizational_units"] = ou["organizational_units"]
content["organizational_units"].ca.comment = comments

Worked for me...

Friday, November 3, 2023

JComboBox Editor Listening

This article was originally posted on JRoller on June 3, 2005.

To listen to edit events in the editor component of a JComboBox (the cast assumes the default editor, whose component is a text field):

((JTextComponent)comboBox.getEditor().getEditorComponent()).getDocument().addDocumentListener(listener);


Wednesday, October 25, 2023

Moto: Alias Issue when Creating S3 Access Point is Fixed

Two days ago, I discovered a small bug in the moto library, which I use to unit test my lambdas on AWS. I needed to create an S3 Access Point with boto3, and while retrieving its alias, I got different results from the return value of create_access_point and from get_access_point.

So I wrote a small unit test:

from moto import mock_s3control
import boto3

@mock_s3control
def test_access_point_alias():
    # boto3 needs a region even when moto mocks the service
    client = boto3.client("s3control", region_name="us-east-1")

    alias_from_create = client.create_access_point(
        AccountId="123456789012",
        Name="my-access-point",
        Bucket="MyBucket",
    )["Alias"]

    alias_from_get = client.get_access_point(
        AccountId="123456789012",
        Name="my-access-point",
    )["Alias"]

    assert alias_from_create == alias_from_get

I create an S3 Access Point and retrieve its alias in two ways: from the response of create_access_point, and from get_access_point. With moto 4.2.6, this test fails.

So I opened an issue on the project's repository. It was fixed and closed the same day. That's responsiveness!