A UDF for sorting arrays of beans (instantiated CFC’s), or arrays of structures

27 09 2008

A while ago, I posted a discussion about different ways to represent complex data, and a UDF to sort arrays of structures on multiple keys. After reading this post, Brian Fitzgerald asked me if I knew of any UDF that could provide the same kind of functionality for an array of beans (instantiated CFC’s). This made me realize that my blog post had omitted perhaps the most useful way of representing complex data in ColdFusion (version 6 and up): ColdFusion Components!

This is not the place for an in-depth discussion of CFC’s, but if you are interested I can recommend this introduction by Hal Helms. For me, the essential advantage of CFC’s is that they make it possible to encapsulate both data and functionality inside one object (variable). This makes it possible for CFC’s to represent real-life objects more naturally than any other CF datatype. CFC’s also provide great possibilities for writing re-usable and portable code, because they can contain the logic that is necessary to handle the specific data they are holding.

As for Brian’s question, it inspired me to improve my sorting UDF, so that it is now able to sort arrays of objects, the objects being either instantiated CFC’s (beans) or structures. The UDF, together with two usage examples, can be downloaded here. The code of the examples is shown below.

<!--- Example: sorting an array of instantiated CFCs --->
<!--- Note: this template depends on person.cfc being in the same folder --->
<!--- Include UDF template --->
<cfinclude template="sortArrayOfObjects.cfm">
<cfscript>
//EXAMPLE USAGE:
//an array of 4 persons, with three properties...
variables.persons = arrayNew(1);
variables.person = createObject('component','person');
variables.person.setName('Martijn');
variables.person.setAge(29);
variables.person.setWeight(78);
arrayAppend(variables.persons, variables.person);
variables.person = createObject('component','person');
variables.person.setName('Jelle');
variables.person.setAge(29);
variables.person.setWeight(85);
arrayAppend(variables.persons, variables.person);
variables.person = createObject('component','person');
variables.person.setName('Lammert');
variables.person.setAge(51);
variables.person.setWeight(78);
arrayAppend(variables.persons, variables.person);
variables.person = createObject('component','person');
variables.person.setName('Bas');
variables.person.setAge(51);
variables.person.setWeight(78);
arrayAppend(variables.persons, variables.person);
// specify how to sort:
// first on age - descending
// second on weight - ascending
// third on name - ascending
variables.sortkeys = arrayNew(1);
variables.sortKey = structNew();
variables.sortKey.keyName = "age";
variables.sortKey.sortOrder = "descending";
arrayAppend(variables.sortKeys, variables.sortKey);
variables.sortKey = structNew();
variables.sortKey.keyName = "weight";
variables.sortKey.sortOrder = "ascending";
arrayAppend(variables.sortKeys, variables.sortKey);
variables.sortKey = structNew();
variables.sortKey.keyName = "name";
variables.sortKey.sortOrder = "ascending";
arrayAppend(variables.sortKeys, variables.sortKey);
// do the sorting
variables.persons_sorted = sortArrayOfObjects(
arrayToSort = variables.persons,
sortKeys = variables.sortKeys,
doDuplicate = false,
useGetterMethods = true);
</cfscript>
<h3>Unsorted:</h3>
<cfloop from="1" to="#arrayLen(variables.persons)#" index="variables.i">
<cfset variables.person = persons[variables.i]>
<cfoutput>
<p>
Position: #variables.i# <br />
Age: #variables.person.getAge()# <br />
Weight: #variables.person.getWeight()# <br />
Name: #variables.person.getName()# <br />
</p>
</cfoutput>
</cfloop>
<h3>Sorted:</h3>
<cfloop from="1" to="#arrayLen(variables.persons_sorted)#" index="variables.i">
<cfset variables.person = persons_sorted[variables.i]>
<cfoutput>
<p>
Position: #variables.i# <br />
Age: #variables.person.getAge()# <br />
Weight: #variables.person.getWeight()# <br />
Name: #variables.person.getName()# <br />
</p>
</cfoutput>
</cfloop>
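
The example above relies on a person.cfc that is not shown in this post. As a rough idea of what such a bean could look like, here is a minimal sketch. Everything in it other than the getter and setter names used above is an assumption: the internal variables.instance struct, the default values and the argument types are hypothetical and may differ from the downloadable file.

```cfml
<!--- person.cfc: hypothetical minimal bean, assumed for illustration only --->
<cfcomponent output="false">

    <!--- private instance data with (assumed) default values --->
    <cfset variables.instance = structNew()>
    <cfset variables.instance.name = "">
    <cfset variables.instance.age = 0>
    <cfset variables.instance.weight = 0>

    <cffunction name="setName" access="public" returntype="void" output="false">
        <cfargument name="name" type="string" required="true">
        <cfset variables.instance.name = arguments.name>
    </cffunction>

    <cffunction name="getName" access="public" returntype="string" output="false">
        <cfreturn variables.instance.name>
    </cffunction>

    <cffunction name="setAge" access="public" returntype="void" output="false">
        <cfargument name="age" type="numeric" required="true">
        <cfset variables.instance.age = arguments.age>
    </cffunction>

    <cffunction name="getAge" access="public" returntype="numeric" output="false">
        <cfreturn variables.instance.age>
    </cffunction>

    <cffunction name="setWeight" access="public" returntype="void" output="false">
        <cfargument name="weight" type="numeric" required="true">
        <cfset variables.instance.weight = arguments.weight>
    </cffunction>

    <cffunction name="getWeight" access="public" returntype="numeric" output="false">
        <cfreturn variables.instance.weight>
    </cffunction>

</cfcomponent>
```

Keeping the data in a private struct means other code can only reach it through the getters and setters, which is what lets the sorting UDF treat every bean the same way.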

<!--- Example: sorting an array of structures --->
<!--- Include UDF template --->
<cfinclude template="sortArrayOfObjects.cfm">
<cfscript>
//     EXAMPLE USAGE:
//an array of 4 persons, with three properties...
variables.persons = arrayNew(1);
variables.person = structNew();
variables.person.name = 'Martijn';
variables.person.age = 29;
variables.person.weight = 78;
arrayAppend(variables.persons, variables.person);
variables.person = structNew();
variables.person.name = 'Jelle';
variables.person.age = 29;
variables.person.weight = 85;
arrayAppend(variables.persons, variables.person);
variables.person = structNew();
variables.person.name = 'Lammert';
variables.person.age = 51;
variables.person.weight = 78;
arrayAppend(variables.persons, variables.person);
variables.person = structNew();
variables.person.name = 'Bas';
variables.person.age = 51;
variables.person.weight = 78;
arrayAppend(variables.persons, variables.person);
// specify how to sort:
// first on age - descending
// second on weight - ascending
// third on name - ascending
variables.sortkeys = arrayNew(1);
variables.sortKey = structNew();
variables.sortKey.keyName = "age";
variables.sortKey.sortOrder = "descending";
arrayAppend(variables.sortKeys, variables.sortKey);
variables.sortKey = structNew();
variables.sortKey.keyName = "weight";
variables.sortKey.sortOrder = "ascending";
arrayAppend(variables.sortKeys, variables.sortKey);
variables.sortKey = structNew();
variables.sortKey.keyName = "name";
variables.sortKey.sortOrder = "ascending";
arrayAppend(variables.sortKeys, variables.sortKey);
// do the sorting
variables.persons_sorted = sortArrayOfObjects(
arrayToSort = variables.persons,
sortKeys = variables.sortKeys,
doDuplicate = false,
useGetterMethods = false);
</cfscript>
<h3>Unsorted:</h3>
<cfdump var="#variables.persons#">
<h3>Sorted:</h3>
<cfdump var="#variables.persons_sorted#">
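
The only difference between the two examples is the useGetterMethods argument. The UDF's internals are not shown in this post, so the following is just a sketch of what such a flag implies: a property is read either through a getter named get + keyName, or directly as a structure key. The function name getSortValue is hypothetical and is not part of the downloadable UDF.

```cfml
<cfscript>
// Hypothetical helper (not taken from the downloadable UDF):
// reads one property from a bean or a structure.
function getSortValue(obj, keyName, useGetterMethods) {
    if (arguments.useGetterMethods) {
        // bean: call the getter dynamically, e.g. getAge() for keyName "age"
        return evaluate("arguments.obj.get#arguments.keyName#()");
    }
    // structure: plain key lookup
    return arguments.obj[arguments.keyName];
}

// usage with a structure
variables.somePerson = structNew();
variables.somePerson.age = 29;
variables.someAge = getSortValue(variables.somePerson, "age", false);
</cfscript>
```

Using evaluate() keeps the sketch compatible with older ColdFusion versions; on newer engines a dynamic invoke would be the more usual choice.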





It looks like a minus but it is not: –

24 09 2008

Today a colleague came to me with a rather curious error. He had made a UDF, containing the following line of code:
ZS = (arguments.rawScore – arguments.mean) / arguments.SD;
All referenced arguments existed (we checked that multiple times :) ) and, as you can see, the expression structure seems to be valid (the UDF was written in cfscript).

Still, calling the UDF consistently resulted in the following error:

Invalid CFML construct found on line XX at column XX.

ColdFusion was looking at the following text:

–

After a few minutes of astonishment, another colleague suggested taking a good look at the character at the position reported in the error message. This turned out to be the – in the above line. Nothing seemed wrong with it, but replacing it with a plain - typed on the keyboard fixed the problem.

Huh? I could not see the difference either, but the original – had been copied and pasted from an MS Word document, and was in fact not a minus but a Unicode ‘EN DASH’ (U+2013). This character is not a minus, so ColdFusion was correct in reporting the error. Apparently MS Word had figured out that my colleague actually meant to write an ‘EN DASH’ when he pressed the minus key on his keyboard. Just another reason not to use MS Word to write down code. Not even short snippets of code, as part of taking notes in a discussion!
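
Characters like this are hard to spot by eye, so a small check can help when a pasted snippet misbehaves. The sketch below is only an illustration: the function name findNonAscii is made up, and it simply reports the positions of all characters outside the 7-bit ASCII range, which also flags legitimate accented characters in string literals.

```cfml
<cfscript>
// Hypothetical helper: returns the positions of all non-ASCII characters in a string
function findNonAscii(text) {
    var positions = arrayNew(1);
    var i = 0;
    for (i = 1; i lte len(arguments.text); i = i + 1) {
        // asc() returns the character code; anything above 127 is not plain ASCII
        if (asc(mid(arguments.text, i, 1)) gt 127) {
            arrayAppend(positions, i);
        }
    }
    return positions;
}

// chr(8211) is the EN DASH (U+2013); building the string this way avoids encoding issues
variables.suspectLine = "a " & chr(8211) & " b";
variables.badPositions = findNonAscii(variables.suspectLine);
</cfscript>
```

For the test string, the returned array contains a single position: 3, the place of the en dash.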





UDF for linear transformation of numeric values

26 08 2008

Today I wrote a little UDF that linearly transforms a value from a given scale to an arbitrary scale with the specified range.

Say what?

Let’s say I have a test for which the maximum possible score is 25, and the minimum possible score is 5. I might want to rescale that score so that it has a minimum of 0 and a maximum of 10. This could be because I have to report the scores of many different tests, each with their own minimum and maximum, and I want an easy way to compare these scores. This UDF is a simple way to accomplish this. You can download the code here.
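
Written out as a formula, the transformation looks like this. The symbols are my own shorthand and do not appear in the code: x is the original value, y the rescaled value, and the subscripts "old" and "new" refer to the original and the new scale.

```latex
y = \min_{\text{new}} + (\max_{\text{new}} - \min_{\text{new}})
    \cdot \frac{x - \min_{\text{old}}}{\max_{\text{old}} - \min_{\text{old}}}
\qquad\text{e.g.}\qquad
y = 0 + (10 - 0) \cdot \frac{20 - 5}{25 - 5} = 7.5
```

So a score of 20 on the 5 to 25 scale becomes 7.5 on the 0 to 10 scale.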

UDF

<cfscript>
/**
* Linearly transforms a value from a scale with a given minimum and maximum,
* to a new scale with a given minimum and maximum.
*
* @param value - Numeric: the value that must be rescaled.
* @param originalMin - Numeric: the minimum of the original, untransformed scale.
* @param originalMax - Numeric: the maximum of the original, untransformed scale.
* @param newMin - Numeric: the desired minimum for the new, transformed scale.
* @param newMax - Numeric: the desired maximum for the new, transformed scale.
* @return - Numeric: the linearly transformed value
* @author Martijn van der Woud (http://martijnvanderwoud.wordpress.com) at Orga-Toolkit (http://www.orga-toolkit.nl)
* @version 1, August 26, 2008
*/
function linearRescale(value, originalMin, originalMax, newMin, newMax) {
return arguments.newMin +
(arguments.newMax - arguments.newMin) *
(arguments.value - arguments.originalMin) /
(arguments.originalMax - arguments.originalMin);
}
</cfscript>

Example usage

<!--- the properties of the original scale --->
<cfset variables.originalMin = 5>
<cfset variables.originalMax = 25>
<!--- the (desired) properties of the new, transformed scale --->
<cfset variables.newMin = 0>
<cfset variables.newMax = 10>
<!--- The value that must be transformed --->
<cfset variables.value = 20>
<!--- calculate the rescaled value --->
<cfset variables.rescaledValue = linearRescale(
variables.value,
variables.originalMin,
variables.originalMax,
variables.newMin,
variables.newMax)>
<cfoutput>#rescaledValue#</cfoutput>
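
One case the formula cannot handle is an original scale whose minimum and maximum are equal, because that leads to a division by zero. A guarded variant could look like the sketch below. The name linearRescaleChecked and the choice to return an empty string in that case are my own assumptions, not part of the downloadable UDF.

```cfml
<cfscript>
// Hypothetical guarded variant of the same formula:
// returns an empty string when the original scale has zero width
function linearRescaleChecked(value, originalMin, originalMax, newMin, newMax) {
    if (arguments.originalMax eq arguments.originalMin) {
        return "";
    }
    return arguments.newMin +
        (arguments.newMax - arguments.newMin) *
        (arguments.value - arguments.originalMin) /
        (arguments.originalMax - arguments.originalMin);
}

variables.checkedValue = linearRescaleChecked(20, 5, 25, 0, 10); // 7.5
variables.invalidValue = linearRescaleChecked(20, 5, 5, 0, 10);  // empty string
</cfscript>
```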





How to calculate the Pearson correlation for two keys in an array of structures

9 07 2008

Today, a colleague asked me to write a UDF to calculate the Pearson correlation. Of course, this kind of assignment is fun for me. After all, I like statistics AND ColdFusion, so what could be better than combining these? OK, that sounds a bit dorky :) , but it’s always nice to write code for doing calculations that you would not usually do by hand. It makes it easier to understand what is going on when you use software like SPSS to do the calculations for you.

So here is my code for calculating the Pearson correlation. It can also be downloaded HERE, and I will submit this to CFLIB.org as well. Comments and criticism very welcome!
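
For reference, this is the standard definition of the Pearson correlation that the UDF below works towards. The notation is mine: x_i and y_i are the values of the two keys in element i, and the barred symbols are their means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The code divides the cross-product sum and both sums of squared deviations by n - 1 (covariance divided by the two standard deviations). Those factors cancel, so the result is the same as in the formula above.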

UDF

<cfscript>
/**
* Returns the Pearson correlation between (the values inside) two keys in an array of structures.
* Values must be numeric, obviously.
*
* @param arrayOfStructs - An array of structures containing the specified keys for every element.
* @param xKey - A string: the structKey containing the first variable.
* @param yKey - A string: the structKey containing the second variable.
* @return Struct with two keys:
*     pearsonCorrelation: a number ranging from -1 to +1, or an empty string if inputValid is false.
*     inputValid - boolean: indicates if the input was valid, so that the pearson correlation could be computed
*         (pearson correlation cannot be computed if at least one of the variables has zero variance).
* @author Martijn van der Woud (http://martijnvanderwoud.wordpress.com) at Orga-Toolkit (http://www.orga-toolkit.nl)
* @version 1, July 9, 2008
*/
function pearsonCorrelation (arrayOfStructs, xKey, yKey) {
// numeric: holds the mean value for the xKey
var xMean = 0;
// numeric: holds the mean value for the yKey
var yMean = 0;
// numeric: just a loop index
var i=0;
// numeric: holds the sum of all values for the xKey
var xSum = 0;
// numeric: holds the sum of all values for the yKey
var ySum = 0;
// numeric: the number of elements in arrayOfStructs
var length = arrayLen(arguments.arrayOfStructs);
// numeric: the sum of squared deviations for the xKey
var sqDevX = 0;
// numeric: the sum of squared deviations for the yKey
var sqDevY = 0;
// numeric: the sum of cross-products
var crossProductSum = 0;
// numeric: holds the deviation from the mean for the xKey in a specific element
var xDeviation = 0;
// numeric: holds the deviation from the mean for the yKey in a specific element
var yDeviation = 0;
// numeric: the Pearson correlation
var pearsonCorrelation = 0;
// struct: the results to return
var results = structNew();
// loop over elements in argument arrayOfStructs
for(i = 1; i lte length; i = i+1) {
// add the xKey and yKey values of the current element to their corresponding sum variable
xSum = xSum + arguments.arrayOfStructs[i][arguments.xKey];
ySum = ySum + arguments.arrayOfStructs[i][arguments.yKey];
} // end of loop over elements in argument arrayOfStructs
// calculate the means of xKey and yKey
xMean = xSum / length;
yMean = ySum / length;
// again, loop over elements in argument arrayOfStructs
for(i = 1; i lte length; i = i+1) {
// calculate deviations from the mean for the current element
xDeviation = arguments.arrayOfStructs[i][arguments.xKey] - xMean;
yDeviation = arguments.arrayOfStructs[i][arguments.yKey] - yMean;
// update sums of squared deviations and cross-products
sqDevX = sqDevX + xDeviation^2;
sqDevY = sqDevY + yDeviation^2;
crossProductSum = crossProductSum + xDeviation * yDeviation;
} // end of loop over elements in argument arrayOfStructs
// if there is no variation in either xKey or yKey, the pearson correlation cannot be computed, so indicate an error
if (min(sqDevX, sqDevY) eq 0) {
results.inputValid = false;
results.pearsonCorrelation = "";
} else { // otherwise, calculate the pearson correlation
pearsonCorrelation = (crossProductSum / (length-1));
pearsonCorrelation = pearsonCorrelation / sqr(sqDevX / (length-1));
pearsonCorrelation = pearsonCorrelation / sqr(sqDevY / (length-1));
results.inputValid = true;
results.pearsonCorrelation = pearsonCorrelation;
}
return results;
} // end of function pearsonCorrelation()
</cfscript>

Example usage

<!--- example --->
<!--- An array of structs, with keys "X" and "Y"; all values in X and Y are numeric--->
<cfset variables.arrayOfStructs = arrayNew(1)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 3>
<cfset variables.element.Y = 1>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 6>
<cfset variables.element.Y = 2>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 7>
<cfset variables.element.Y = 3>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 4>
<cfset variables.element.Y = 4>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 5>
<cfset variables.element.Y = 5>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 3>
<cfset variables.element.Y = 6>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 7>
<cfset variables.element.Y = 7>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 6>
<cfset variables.element.Y = 8>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 5>
<cfset variables.element.Y = 9>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 2>
<cfset variables.element.Y = 1>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 3>
<cfset variables.element.Y = 2>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 5>
<cfset variables.element.Y = 3>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 6>
<cfset variables.element.Y = 4>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
<cfset variables.element = structNew()>
<cfset variables.element.X = 7>
<cfset variables.element.Y = 5>
<cfset arrayAppend(variables.arrayOfStructs, variables.element)>
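<!--- store the result under a name other than the function's, so the UDF in the variables scope is not overwritten --->
<cfset variables.correlationResult = pearsonCorrelation(
arrayOfStructs = variables.arrayOfStructs,
xKey = "X",
yKey = "Y")>
<cfdump var="#variables.correlationResult#">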





Representation and sorting of complex data

6 07 2008

Yesterday I saw a blog post by Ray Camden, showing an example of how to do a bubble sort on a two-dimensional array. I like the fact that this kind of sort is so easy to implement, with so few lines of code. As Ray pointed out, however, the bubble sort is not suitable for sorting large sets of data. As an alternative, Ray suggests creating the array as a query, which of course can be sorted on multiple keys by using ColdFusion’s in-memory query capabilities.

Another alternative is to replace the two-dimensional array with an array of structures. Representing a set of objects as an array of structures is something I often choose to do, and in this blog post I will try to explain my thoughts on this matter.

In cases such as the one described by Ray, where the information is about objects which have one or more properties, arrays of structures result in more readable and maintainable code than two-dimensional arrays. Compare, for instance, the following blocks of code:

<cfset persons = arrayNew(2)>
<cfset persons[1][1] = "Martijn">
<cfset persons[1][2] = 29>
<cfset persons[2][1] = "Franca">
<cfset persons[2][2] = 33>

<cfset persons = arrayNew(1)>
<cfset personData = structNew()>
<cfset personData.firstName = "Martijn">
<cfset personData.age = 29>
<cfset arrayAppend(persons, personData)>
<cfset personData = structNew()>
<cfset personData.firstName="Franca">
<cfset personData.age = 33>
<cfset arrayAppend(persons, personData)>

OK, so the second block of code is longer, but it also provides more information: from the first block of code we would have to guess that the number at position 2 of the second index is in fact the age of the person. In the second block of code, this is explicitly stated. Dumping the array would show all properties of the persons by name, so if we pass the data around in an application it will be instantly obvious what properties we are dealing with, even without looking at the code that created the array. This also applies to debugging your code: error messages like “Element firstName is undefined in a CFML structure referenced as part of an expression.” are much easier to decode than “Element 1 is undefined in a Java object of type class coldfusion.runtime.Array.”

Creating the array as in the second block is also less error-prone: often we would write the code for creating one record, then copy-paste-edit to create the rest of the records. Using arrays of structures, we would only have to edit the property values. When using two-dimensional arrays we would also have to edit the record number (the first index), and if you are not far more accurate than I am, you are going to end up skipping or duplicating a row number, or worse: editing the wrong index.

So what about presenting the data as a query object? You could do something like this:

<cfset rs = queryNew("firstName,age")>
<cfset queryAddRow(rs)>
<cfset querySetCell(rs,"firstName","Martijn")>
<cfset querySetCell(rs,"age",29)>
<cfset queryAddRow(rs)>
<cfset querySetCell(rs,"firstName","Franca")>
<cfset querySetCell(rs,"age",33)>

I am aware of two advantages of arrays of structures over queries. The first advantage is that an array of structures can be represented in almost any programming language, whereas query objects are very specific to ColdFusion. So if you are building a web service that is to be consumed by many different types of client technology, you might want to pick structures over query objects.

The second advantage is that, using an array, you can pass all data pertaining to one record in one reference, like this:

<cfset secondPerson = persons[2]>
… instead of
<cfset secondPerson.firstName = persons.firstName[2]>
<cfset secondPerson.age = persons.age[2]>

It is easy to see that the code referring to the query representation would have to be updated if we were to add a third property to each person. The code referring to the array-of-structs representation would stay the same.

So what are the advantages of the query object representation? The most important one to me is that queries provide a very easy way to select records based on one or more properties: you can query them as if they are database tables, like this:

<cfquery dbtype="query" name="selectedPerson">
SELECT * FROM persons WHERE age >=30
</cfquery>

Using the array-of-structs representation, you would have to loop through the entire set of records and determine for every record whether it should be selected. Obviously, this is more tedious to implement and will certainly perform poorly when using large datasets.
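
To make the comparison concrete, here is roughly what the same selection (age 30 or older) could look like for an array of structures. This is only a sketch; the function name selectByMinAge is made up for this example.

```cfml
<cfscript>
// Hypothetical helper: returns all persons whose age is at least minAge
function selectByMinAge(persons, minAge) {
    var selected = arrayNew(1);
    var i = 0;
    // every record has to be inspected, there is no index to help us
    for (i = 1; i lte arrayLen(arguments.persons); i = i + 1) {
        if (arguments.persons[i].age gte arguments.minAge) {
            arrayAppend(selected, arguments.persons[i]);
        }
    }
    return selected;
}

// usage with two example records
variables.people = arrayNew(1);
variables.onePerson = structNew();
variables.onePerson.firstName = "Martijn";
variables.onePerson.age = 29;
arrayAppend(variables.people, variables.onePerson);
variables.onePerson = structNew();
variables.onePerson.firstName = "Franca";
variables.onePerson.age = 33;
arrayAppend(variables.people, variables.onePerson);
variables.selectedPeople = selectByMinAge(variables.people, 30);
</cfscript>
```

Here variables.selectedPeople ends up with one element, the record for Franca.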

The second advantage would be that data in queries can easily be sorted on multiple keys by adding an ORDER BY clause to the query-of-query SQL. To overcome this, I have written a UDF that will do an insertion sort on an array of structures on multiple keys. Sort order can be specified separately per key. The code containing the UDF and a simple example can be downloaded here. Enjoy!
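
For readers who just want the idea without downloading anything, here is a compact sketch of an insertion sort on multiple keys. It is not the downloadable UDF: the function names compareOnKeys and insertionSortOnKeys are hypothetical, and the sort keys are assumed to be structures with a keyName and a sortOrder ("ascending" or "descending").

```cfml
<cfscript>
// Hypothetical helper: compares two structures on a list of sort keys.
// Returns -1 if a sorts before b, 1 if it sorts after b, 0 if equal on all keys.
function compareOnKeys(a, b, sortKeys) {
    var k = 0;
    var key = "";
    var result = 0;
    for (k = 1; k lte arrayLen(arguments.sortKeys); k = k + 1) {
        key = arguments.sortKeys[k].keyName;
        if (isNumeric(arguments.a[key]) and isNumeric(arguments.b[key])) {
            result = sgn(arguments.a[key] - arguments.b[key]);
        } else {
            result = compareNoCase(arguments.a[key], arguments.b[key]);
        }
        if (arguments.sortKeys[k].sortOrder eq "descending") {
            result = -result;
        }
        // the first key on which the two elements differ decides the order
        if (result neq 0) {
            return result;
        }
    }
    return 0;
}

// Hypothetical insertion sort: builds and returns a new, sorted array
function insertionSortOnKeys(arrayToSort, sortKeys) {
    var sorted = arrayNew(1);
    var i = 0;
    var j = 0;
    for (i = 1; i lte arrayLen(arguments.arrayToSort); i = i + 1) {
        // find the first position whose element sorts after the current one
        j = 1;
        while (j lte arrayLen(sorted)
                and compareOnKeys(arguments.arrayToSort[i], sorted[j], arguments.sortKeys) gte 0) {
            j = j + 1;
        }
        if (j gt arrayLen(sorted)) {
            arrayAppend(sorted, arguments.arrayToSort[i]);
        } else {
            arrayInsertAt(sorted, j, arguments.arrayToSort[i]);
        }
    }
    return sorted;
}
</cfscript>
```

Elements that are equal on all keys keep their original relative order, because a new element is always inserted after the equal ones already in place. Like the bubble sort, an insertion sort needs a number of comparisons that grows quadratically with the number of elements in the worst case, so for really large sets a query with ORDER BY remains the better choice.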