The other day, I tried to run an AWS Glue script from our Airflow instance. Nothing fancy: it would just convert a Parquet file to CSV between two S3 buckets. Looking for an operator to use, I found that there is indeed a Glue Operator. It looked pretty easy to configure, so I tried it out.
After I triggered the DAG, it failed, telling me that I needed to set the region name. Well, I thought I did!
It probably comes from the fact that recent boto3 versions require the region to be set, while it was optional in the past. As Airflow is open source, I decided to have a look at the code. As it turns out, the bug was not too hard to spot. The Glue Operator creates a Glue Hook, and the problem sits in that hook's constructor.
The Hook derives from a base class, AwsBaseHook, that handles the common connection part for all AWS Hooks. The call to the constructor of the superclass does not forward the region name. It should probably pass the region name through to the base class.
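To illustrate the bug without pulling in Airflow, here is a minimal sketch of the pattern. The classes below are hypothetical stand-ins, not the actual Airflow source: `BaseHookSketch` plays the role of AwsBaseHook, and the two subclasses show a constructor that drops the region name next to one that forwards it. The `ValueError` only mimics a client that refuses to start without a region.

```python
from typing import Optional


class BaseHookSketch:
    """Hypothetical stand-in for AwsBaseHook: holds the shared connection settings."""

    def __init__(self, aws_conn_id: str = "aws_default", region_name: Optional[str] = None):
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name

    def client_config(self) -> dict:
        # Mimic a client that cannot be built when no region is known.
        if self.region_name is None:
            raise ValueError("You must specify a region.")
        return {"service": "glue", "region_name": self.region_name}


class BuggyGlueHookSketch(BaseHookSketch):
    """Accepts region_name but never hands it to the base class."""

    def __init__(self, job_name: str, region_name: Optional[str] = None, **kwargs):
        self.job_name = job_name
        # region_name is silently dropped here, so the base class keeps its default of None.
        super().__init__(**kwargs)


class FixedGlueHookSketch(BaseHookSketch):
    """Same constructor, but the region name is forwarded."""

    def __init__(self, job_name: str, region_name: Optional[str] = None, **kwargs):
        self.job_name = job_name
        super().__init__(region_name=region_name, **kwargs)
```

With these stand-ins, `FixedGlueHookSketch("parquet_to_csv", region_name="eu-west-1").client_config()` returns a config carrying the region, while the same call on `BuggyGlueHookSketch` raises the error even though a region was passed in. The job name and region here are made up for the example.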
I opened a bug report. In the meantime, though, I still needed my code to work, so I came up with quite an ugly patch. There is probably a better way, but since I could see that the boto3 session was created in the AwsBaseHook class, and since our Airflow instance runs on an EC2 instance whose profile I can simply inherit, I put a simple workaround in place.
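For comparison, here is a sketch of a different, less intrusive fallback. It is not the patch described above. boto3 reads the `AWS_DEFAULT_REGION` environment variable when no region is passed explicitly, so setting it in the environment of the Airflow workers may be enough, assuming it is in place before the hook builds its session. The helper name `ensure_default_region` and the region value are hypothetical.

```python
import os
from typing import MutableMapping, Optional


def ensure_default_region(region: str, env: Optional[MutableMapping[str, str]] = None) -> str:
    """Set AWS_DEFAULT_REGION when it is missing and return the effective value.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    target = os.environ if env is None else env
    # setdefault leaves an already configured region untouched.
    return target.setdefault("AWS_DEFAULT_REGION", region)


if __name__ == "__main__":
    # Hypothetical region; use the one where the Glue job actually lives.
    print(ensure_default_region("eu-west-1"))
```

Calling the helper at the top of the DAG file, or exporting the variable in the service's environment, keeps the region out of the hook entirely, which is why it survives an Airflow upgrade more easily than a patched class.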
Hopefully, it won't stay here long...