Intersection and diff of two structs

function reducer( condition ) { return function( a, b ) { return a.reduce(function(accumulator, key, value) { if (condition(b, key)) { accumulator[key] = value; } return accumulator; }, {}); }; }

I recently blogged about finding the union of two structs. Following on from that I wanted to look at finding the intersection (keys common to both) and diff (keys unique to the struct).

So without further ado, here’s how to find the intersection:

function intersection(a, b ) {
    return a.reduce(function(accumulator, key, value) {
        if (b.keyExists(key)) {
          accumulator[key] = value;
        }
        return accumulator;
    }, {});
}

set1 = {a:1,"b":2};
set2 = {"B":20,c:3};

// intersection - note only checks keys!
writeDump(intersection(set1,set2)) // output {b:2}

If you’ve not used structreduce before (or the member function version of mystruct.reduce as I do in the example), then it can be a bit hard to see what is happening. Essentially what it does is to iterate over each key/value pair in the struct and call the Anonymous function each time building up a result to return.

Set1 contains {"b":2} and set2 contains {"B":20}. CFML treats upper and lowercase keys the same, so that’s why we get the match. What the intersection does not check is the values – we are just interested in the keys. We can only have one value per key, so I’ve chosen to use the value from set1.

One thing we should check is if our original set1 and set2 have mutated. The answer is, no – they are not changed. Now you may be thinking that CFML passes structs by reference – and you’d be right – so why isn’t it mutated? One of the benefits of using structReduce is that it returns a new struct (the accumulator) and doesn’t change the original struct.

Finding the keys that only exist in set1 can be done with something like this:

function diff( a, b ) {
    return a.reduce(function(accumulator, key, value) {
        if (!b.keyExists(key)) {
          accumulator[key] = value;
        }
        return accumulator;
    }, {});
}

set1 = {a:1,"b":2};
set2 = {"B":20,c:3};

// intersection - note only checks keys!
writeDump(diff(set1,set2)) // output {a:1}

Running this does as expected, we just get back {a:1}.

Looking a bit closer, the code for diff and intersection are very similar – in fact, one extra character. That seems like unwanted duplication. Time to refactor!

Before refactoring, I know I have a working implementation so I’m not touching anything without tests (usually I’d have started off with a test before writing any code – forgive me reader!). Here’s a very simple test suite – normally I’d use a proper testing framework like TestBox, but I’m keeping it simple here for this blog post.

function assert( expected, actual ) {
     if (expected == actual) {
        writeoutput('PASS<br>');
     } else {
        writeoutput('FAIL - expected #expected# got #actual#<br>');
     }
}

set1 = {a:1,"b":2};
set2 = {"B":20,c:3};

// intersection - note only checks keys!
writeOutput("Intersection Tests...<br>");
result=intersection( set1, set2 );
assert( 1, structCount(result) );
assert( 2, result.b );

// diff - keys in set1 not in set2
writeOutput("Diff Tests...<br>");
result=diff( set1, set2 );
assert( 1, structCount(result) );
assert( 1, result.a );

writeOutput("Set1 Mutation Tests...<br>");
assert( 2, structCount(set1) );
assert( 1, set1.a );
assert( 2, set1.b );

writeOutput("Set2 Mutation Tests...<br>");
assert( 2, structCount(set2) );
assert( 20, set2.b );
assert( 3, set2.c );

The first step in refactoring is to extract the part that determines if I should include or exclude if the key is found – (b.keyExists(key)) vs (!b.keyExists(key)) – into new functions.

function include(b, key) {
    return b.keyExists(key);
}

and the inverse of that is:

function exclude(b, key) {
    return !include(b, key);
}

Now to use these functions in the diff and intersection functions.

function intersection( a, b ) {
    return a.reduce(function(accumulator, key, value) {
      if (include(b, key)) {
         accumulator[key] = value;
      }
      return accumulator;
    }, {});
}

function diff( a, b ) {
    return a.reduce(function(accumulator, key, value) {
        if (exclude(b, key)) {
          accumulator[key] = value;
        }
        return accumulator;
    }, {});
}

The tests still pass so onto the next refactoring step. I want to extract the common code and be able to pass in either the include or exclude function. In CFML functions are first class citizens, so we can pass the function in as an argument.

function reducer( condition ) {
    return function( a, b ) {
        return a.reduce(function(accumulator, key, value) {
            if (condition(b, key)) {
              accumulator[key] = value;
            }
            return accumulator;
        }, {});
    };
}

I’ve combined the previous intersection and diff functions into one function called reducer which accepts a function as the condition argument. The reducer function returns a new function. I can then call the reducer function passing in the condition I want to use and assign the returned function to a variable.

intersection = reducer(include);
diff = reducer(exclude);

The tests still pass, and we have no code duplication. For completeness here it is all together (without the testing code):

function include(b, key) {
    return b.keyExists(key);
}
function exclude(b, key) {
    return !include(b, key);
}

function reducer(condition) {
    return function( a, b ) {
        return a.reduce(function(accumulator, key, value) {
            if (condition(b, key)) {
              accumulator[key] = value;
            }
            return accumulator;
        }, {});
    };
}

intersection = reducer(include);
diff = reducer(exclude);

Now you may find this code confusing to read, but even if you choose not to write code like this hopefully you’ve found it interesting!