Merging cron expressions

February 25, 2024

Have you ever wondered when 2 crons can be merged? Take some time to think about it for yourself, it's a fun problem.

Let's start by defining what merge means here. The cron expression that is produced by this operation would need to have all the trigger times of the first, plus all the trigger times of the second, without introducing any new ones.

Simple, right? It actually is. So long as the expressions differ in only one segment then they are pretty easy to merge:

'0 1 * * *' + '0 2-4 * * *' = '0 1-4 * * *'
'0 1 JAN * *' + '0 1 JUL * *' = '0 1 * JAN,JUL *'

If they differ in more than one segment then merging them would introduce new trigger times, which is unwanted.

Merging multiple cron expressions

Here is where it gets tricky. Consider the following 3 expressions:

  1. 0 1 * * *
  2. 30 1 * * *
  3. 0 2 * * *

Both 2 and 3 can be merged with 1 but, after doing so, the resulting cron would not be compatible with what is left. The order in which crons are merged matters! Thinking about crons as a list though does not help me, let's switch to using graphs for this. The expressions will be nodes and the edges will connect expressions that can be merged. Edges are bidirectional and include the segment over which the merge is possible.

Seeing the problem visualized, the first question that pops up in my head is: How many edges can there be between 2 nodes? It turns out there are only 3 options here. 0 if they cannot be merged, 1 if they can, and 5 if they are exactly the same. We can limit this to just the first 2 cases by considering only a unique set of crons, eliminating any duplicates.

The graph is great because it poses another very direct question: Which way do you merge? For this example it does not matter, you are going to end up with 2 expressions that cannot be further simplified, but let's look at a larger example to put things into perspective.

Coming up with an algorithm

Where do we even begin?

Before we start answering that we need to know how any merge affects the graph. We noted earlier that merging over the minute segment did produce a cron that was no longer able to merge over the hour segment. If you mentally go through the graph and do the merges you will notice that every merge of type A will result in a cron that procludes all merges of type B.

This is great, it let's us calculate the amount of "damage" we would be doing to the graph. I do call it damage because to perform the most amount of merges we do want the graph to preserve the most amount of connections between operations, otherwise we will end up with a sub-optimal solution.

On the other hand, edges of the same type are preserved, so we can count on those to stay after the merge. Knowing that we can keep following merges of one type does let us visualize paths in the graph and you might even recognize some Strongly Connected Components of the same edge type. It's an interesting concept but it goes against our principle of causing the least damage possible to the graph.

What if we started with the least harmful merge?

It's a simple idea, graph theory refers to this as a Minimum Cut, though ours is a little special... In order for minimum cut to work we need to alter it so that it fits our problem.

Minimum cut allows you to arbitarily split a graph in 2 but we want to limit this to just 2 nodes at a time. Our minimum cut would also need to take into account the edges that would not be lost because of the merge type, those don't really count as damage.

So, here is the process:

  1. Isolate 2 nodes through minimum cut
  2. Produce a new node by merging them
  3. Insert the new node into the graph
  4. Recalculate the edges
  5. Repeat until there are no more edges

Example of a minimum cut in the above graph:

Only 2 connections are permanently removed, 0 1 * * * -hour- 0 2 * * * and 0 1 * * * -hour- 0 3 * * *. There might be other pairs that would results in an equivalent minimum cut and at that point order should not really matter. Chosing one over the other could lead to slightly different results though everything should still be reduced down to the same number of expressions.