Wednesday, December 12, 2018

PyMySQL and the strange ubyte format error at connection

I haven't written much lately, mainly because I had to leave Java aside and turn to Python in my new job. However, interesting problems don't only spring up in Java, so this will probably no longer be a Java-only blog.

The problem I ran into recently is linked to the PyMySQL library, which we use to connect to our MySQL databases. Connecting to the databases had always worked fine, and then we started using dbdeployer to run a lightweight MySQL server for our unit tests. Running the tests on our local machines always worked, but some tests would fail on our Jenkins machine with this strange error message:

struct.error: ubyte format requires 0 <= number <= 255
The good news is that the Python traceback pinpoints the offending line:
data += struct.pack('B', len(connect_attrs)) + connect_attrs
The problem here is that we are trying to fit the length of a string into a single unsigned byte (the 'B' format), and the string is probably longer than 255 bytes. A nice thing about Python being interpreted is that you have access to all of the library's source code. So let's have a look...
So what is this connect_attrs variable? It is a concatenation of keys and values from the _connect_attrs attribute of the Connection class. But this map is supposed to be quite small:
self._connect_attrs = {
    '_client_name': 'pymysql',
    '_pid': str(os.getpid()),
    '_client_version': VERSION_STRING,
}
Looking a little further, we find an update to the map:
if program_name:
    self._connect_attrs["program_name"] = program_name
elif sys.argv:
    self._connect_attrs["program_name"] = sys.argv[0]
If we do not define a program name, it takes it from sys.argv[0], which is the name of the script being run, here including its complete path! That is why it only failed on Jenkins: the workspace sits quite deep under the root folder, so the path, and with it the whole block of connection attributes, grew past 255 bytes.
So the workaround becomes quite simple:
pymysql.connect(
    host=DB_HOST,
    port=DB_PORT,
    user=DB_USER,
    password=DB_PASSWORD,
    db=DB_NAME,
    program_name="my_program_name_that_fixes_the_bug",
)

Monday, June 4, 2018

WTF: Modulo the String way

I found this old code in one of our applications:
DecimalFormat df10 = new DecimalFormat("0000000000");
String idStr = df10.format(peer.execid);
idStr = idStr.substring(4);
exec.execId = Integer.parseInt(idStr);
It takes an int, formats it into a 10-character String padded with zeroes on the left, then strips away the first 4 characters, leaving the last 6 digits. Then, of course, it turns the result back into an int.

Knowing that the id is always positive, this code is equivalent to this much simpler one:

exec.execId = peer.execid % 1000000;
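If you want to convince yourself of the equivalence, you can run both versions side by side. This is a minimal sketch: the class and method names are made up for the comparison, and the int parameter stands in for peer.execid.

```java
import java.text.DecimalFormat;

// Compares the original String-based truncation with the plain modulo version.
class ExecIdTruncation {

    // The original approach: pad to 10 digits, drop the first 4, parse what is left.
    static int viaString(int execid) {
        DecimalFormat df10 = new DecimalFormat("0000000000");
        String idStr = df10.format(execid);
        return Integer.parseInt(idStr.substring(4));
    }

    // The arithmetic equivalent for non-negative ids: keep the last 6 digits.
    static int viaModulo(int execid) {
        return execid % 1000000;
    }

    public static void main(String[] args) {
        int[] samples = {0, 42, 999_999, 1_000_000, 123_456_789, Integer.MAX_VALUE};
        for (int id : samples) {
            System.out.println(id + " -> " + viaString(id) + " / " + viaModulo(id));
        }
    }
}
```

Since an int never has more than 10 digits, the padded String is always exactly 10 characters long for non-negative ids, so keeping its last 6 characters is the same as taking the remainder modulo 1,000,000. For negative ids the two versions diverge (the String version throws away the minus sign along with the first digits), hence the "always positive" condition.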

Thursday, February 8, 2018

Yoda Pattern

In the previous part, I talked about the Medusa Pattern, where I limited the number of threads. Now I want to go even further: I want to remove all threads.
In fact, there is one particular case where threads are more of a nuisance than a help. I am thinking of the small creature that likes the color green: the unit test. As Yoda would say: “too much faith in threads you have”. When faced with multithreading in a unit test, you often have to resort to inserting sleep() calls, or to more complex tricks using wait() and notify() or their equivalents.
But what you would really like is to get rid of the threads, just for testing. If you are using an Executor, it is easy to replace it with one that executes each task in the calling thread:
Executor forTest = new Executor() {
   public void execute(Runnable command) {
       command.run();
   }
};
Since we are using Java 8 (at least) and Executor is a functional interface, we can use a lambda instead:
Executor forTest = command -> command.run();
Or, even better, a method reference:
Executor forTest = Runnable::run;
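To see what this buys you in a test, here is a minimal sketch. PriceRecorder is a hypothetical component that does its work through an injected Executor; passing Runnable::run makes every task complete before onPrice() returns, so the result can be checked right away.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical component that processes prices on whatever Executor it is given.
class PriceRecorder {
    private final Executor executor;
    // Synchronized because a real pool may call add() from several threads.
    private final List<Double> processed = Collections.synchronizedList(new ArrayList<>());

    PriceRecorder(Executor executor) {
        this.executor = executor;
    }

    void onPrice(double price) {
        executor.execute(() -> processed.add(price));
    }

    List<Double> processed() {
        return processed;
    }

    public static void main(String[] args) {
        // Same-thread executor: execute() simply calls run() before returning.
        PriceRecorder recorder = new PriceRecorder(Runnable::run);
        recorder.onPrice(1.5);
        recorder.onPrice(2.5);
        // No sleep() and no wait()/notify(): both tasks have already run.
        System.out.println(recorder.processed()); // prints [1.5, 2.5]
    }
}
```

In production, you would inject a real pool instead, for instance one created with Executors.newFixedThreadPool(n); the component itself does not change.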

Sunday, February 4, 2018

Medusa Pattern

Last time, we saw the Star Trek Anti-Pattern, where the number of threads would get out of hand, to infinity and beyond. The easy solution is to limit the number of threads.

That is what I call the Medusa Pattern, after the famous Gorgon who could turn people to stone with a single look. We are going to freeze the number of threads in a thread pool:

Executors.newFixedThreadPool(n);

Now comes the big question: what should ‘n’ be? How many threads should I have in my pool? The commonly accepted answer is: as many threads as there are cores, plus one, because one thread is often waiting. In his book “Java Concurrency in Practice”, Brian Goetz answers this question with this formula:
t = c * (1 + w / s)


Where t is the number of threads, c is the number of cores on your machine, w is the waiting time, that is the average time your threads spend waiting, and s is the service time, that is the average time your threads spend doing useful work. You can see that if your threads are very busy, with almost zero waiting time, there should be as many threads as cores, which is close to the commonly accepted rule of thumb.
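The formula is easy to turn into code. This is a minimal sketch, assuming you have measured the average waiting and service times yourself; the 90 ms / 10 ms figures below are hypothetical, and rounding up with Math.ceil is my own choice. Runtime.availableProcessors() provides c.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizing {

    // t = c * (1 + w / s), rounded up, and never below one thread.
    // waitTime and serviceTime must use the same unit (e.g. milliseconds).
    static int poolSize(int cores, double waitTime, double serviceTime) {
        return Math.max(1, (int) Math.ceil(cores * (1 + waitTime / serviceTime)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Hypothetical measurements: 90 ms of waiting for every 10 ms of useful work.
        int n = poolSize(cores, 90, 10);
        ExecutorService pool = Executors.newFixedThreadPool(n);
        System.out.println(cores + " cores -> " + n + " threads");
        pool.shutdown();
    }
}
```

With no waiting time, poolSize(4, 0, 10) returns 4, one thread per core; with 90 ms of waiting per 10 ms of work on the same four cores, it returns 40.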


On the other hand, if your threads spend a lot of time waiting, t can get quite high. For instance, I work on a project with a monitoring application that monitors over 300 servers. Since the monitoring threads spend most of their time waiting for a ping, we decided to have as many threads as servers.


We’ve seen cases with one thread, an infinity of threads and a fixed number of threads, but there is a special case where you do not want any threads at all. We’ll cover it in the Yoda Pattern. See you next time.

Saturday, January 27, 2018

Star Trek Anti-Pattern

In previous installments, we tried to limit the number of Threads to the minimum, even to one. But what happens if you go the opposite way?
Often, we try to parallelize some process without paying too much attention to the number of Threads. We can let the executor spawn a new Thread each time it needs one:

Executor myExecutor = Executors.newCachedThreadPool();

Often, things will work correctly at first. But as soon as the amount of data increases, you’ll get the following error:

OutOfMemoryError: unable to create new native thread

You probably know this story: the support team has a problem with the tool, so they look at the logs. They see this message, so they call you to ask which parameter increases your application’s memory. You tell them about -Xmx, but 5 minutes later, they call you again to ask how to run the application in 64 bits. That’s when you get suspicious and ask for the logs.

In fact, this message is quite misleading for the unaware. You have to know that Java reserves some memory outside the heap for storing Thread stacks, and that is the memory that ran out. One way to solve it is to decrease -Xss, the size of each Thread’s stack, but the best way is still to avoid going to infinity and beyond, that is, to avoid the Star Trek Anti-Pattern.
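To watch the pool grow, here is a small sketch. It builds a ThreadPoolExecutor with the settings the Executors.newCachedThreadPool() Javadoc documents (zero core threads, Integer.MAX_VALUE maximum, 60 s keep-alive, SynchronousQueue) so that it can call getPoolSize(); StarTrekDemo, the blocking tasks and the task count are made up for the demonstration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class StarTrekDemo {

    // Submits `tasks` blocking tasks and returns how many threads the pool created for them.
    static int threadsForBlockingTasks(int tasks) {
        // Same settings as Executors.newCachedThreadPool(): no core threads, no upper bound,
        // idle threads kept for 60 s, and a SynchronousQueue that never stores tasks.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                // Each task blocks (think slow I/O), so no idle thread is ever available for reuse.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown(); // unblock every task
            pool.shutdown();     // let the threads exit once their task is done
        }
    }

    public static void main(String[] args) {
        // One new thread per blocked task: the thread count simply follows the load.
        System.out.println(threadsForBlockingTasks(200) + " threads for 200 tasks");
    }
}
```

Keep the count modest when you try it: push it high enough and you will meet the "unable to create new native thread" error for real.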

So how many threads should I have? We’ll talk about it in the Medusa Pattern.

Sunday, January 7, 2018

Zebra Pattern

Last time I explained the One Ring Pattern, where only one thread handles the data coming from a queue. I also explained why such a pattern might be preferred.


One of those reasons is when ordering is important. Some events must keep the order in which they arrive in the queue, so handling them with one thread is a must. But what if there are many events, and I would really like to handle them on several threads? And what if ordering is not required between all events, but only between some of them? Going back to last time’s market price example, order must be kept between prices for the same instrument. However, prices for different instruments can be handled in any order.

That’s when the Zebra Pattern comes in handy. Imagine that each stripe of the Zebra is a different thread, with its own queue. When a price arrives, you put it on the queue reserved for its instrument. That way, prices for one instrument stay in order, while different instruments are handled by different threads. To limit the number of threads, several instruments can share a stripe, chosen with a modulo calculation, much like the way a hash map dispatches keys among its buckets.
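Here is a minimal sketch of the idea, using one single-thread executor per stripe and Math.floorMod on the key’s hash code to pick the stripe. ZebraExecutor and recordTicks are hypothetical names for this illustration, not an existing library API; Heinz Kabutz’s Striped Executor Service is a more complete take on the same idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical striped executor: each stripe is a single-threaded executor with its own queue.
class ZebraExecutor {
    private final ExecutorService[] stripes;

    ZebraExecutor(int stripeCount) {
        stripes = new ExecutorService[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Same key -> same stripe -> same thread, so tasks for one key keep their submission order.
    void execute(Object key, Runnable task) {
        stripes[Math.floorMod(key.hashCode(), stripes.length)].execute(task);
    }

    boolean shutdownAndWait(long timeoutSeconds) {
        for (ExecutorService stripe : stripes) {
            stripe.shutdown();
        }
        try {
            for (ExecutorService stripe : stripes) {
                if (!stripe.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Demo: sends ticks 0..ticks-1 for every instrument and records the order each one saw.
    static Map<String, List<Integer>> recordTicks(int stripeCount, List<String> instruments, int ticks) {
        ZebraExecutor zebra = new ZebraExecutor(stripeCount);
        Map<String, List<Integer>> seen = new ConcurrentHashMap<>();
        for (String instrument : instruments) {
            seen.put(instrument, Collections.synchronizedList(new ArrayList<Integer>()));
        }
        for (int tick = 0; tick < ticks; tick++) {
            final int t = tick;
            for (String instrument : instruments) {
                zebra.execute(instrument, () -> seen.get(instrument).add(t));
            }
        }
        zebra.shutdownAndWait(10);
        return seen;
    }

    public static void main(String[] args) {
        recordTicks(3, Arrays.asList("EURUSD", "GBPUSD", "AAPL", "MSFT"), 5)
                .forEach((instrument, order) -> System.out.println(instrument + " " + order));
    }
}
```

Four instruments share three threads here, yet each instrument still sees its ticks in order, because all of its tasks land in the same single-threaded queue.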

If this Pattern interests you, have a look at Heinz Kabutz’s Striped Executor Service.

So far, we have tried to tie the processing of data to one thread. What if we go the opposite way? Stay tuned for the Star Trek Anti-Pattern.

Monday, January 1, 2018

One Ring Pattern


Last time, I finished talking about all my Queue Patterns. Now I’ll start with my Thread Patterns. Once you have your queues filled with tasks, the question that arises is: how many Threads do I need to deal with them?
Often enough, the answer to this question is: only one. The unique Thread. One Thread to bring them all and in the darkness bind them.

Executor myExecutor = Executors.newSingleThreadExecutor();

But why would I want only one Thread when I could maybe go faster with several working in parallel? The first reason is that if I come from a design where there were no queues, and I just introduced one (see the Marsupilami Pattern), using only one Thread means fewer changes to my code. Everything in this part of the code will work more or less as before, so there is less chance of introducing a regression.
Secondly, one thread means no concurrency, and therefore no synchronization problems. This also means simpler code, fewer bugs and less maintenance. Plus, if you’ve read Martin Thompson’s blog Mechanical Sympathy, you have probably heard that one of the big performance problems with several Threads accessing the same queue is contention. So there are real cases where one Thread actually performs better.
Another reason for using one Thread is that, even if you have many Threads processing the data at hyper speed, there might be only one Thread dealing with the consequences at the end. For instance, if you are developing a GUI, there is only one Event Thread drawing everything, and having several Threads throwing more and more data at it will not serve your purpose.
Last, an important reason for keeping only one Thread is when ordering is important. For instance, say you have an application that displays market prices, and one instrument is very volatile. If its updates are handled by several Threads, a newer price might be processed before an older one, and you will end up displaying the older price.
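A minimal sketch of why a single Thread preserves ordering: successive prices are pushed, in order, through Executors.newSingleThreadExecutor(), which runs its tasks one at a time in submission order. OneRingDemo and displayOrder are hypothetical names, and the integers stand in for successive prices of one instrument.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OneRingDemo {

    // Pushes prices 0..count-1 through one Thread and returns the order in which they were "displayed".
    static List<Integer> displayOrder(int count) {
        ExecutorService myExecutor = Executors.newSingleThreadExecutor();
        // Only the executor's single thread writes; the wrapper just makes the final read safe.
        List<Integer> displayed = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < count; i++) {
            final int price = i;
            myExecutor.execute(() -> displayed.add(price));
        }
        myExecutor.shutdown();
        try {
            myExecutor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return displayed;
    }

    public static void main(String[] args) {
        // Tasks run one at a time in submission order, so the last price shown is the newest.
        System.out.println(displayOrder(10)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Swap in a pool with several threads and the same loop can display prices out of order, which is exactly the volatile-instrument problem described above.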
Even if you feel you are stuck with one Thread because of ordering, a solution still exists. I’ll describe it in the Zebra Pattern next time.