Notes from the field on ColdFusion (or related) technical issues.

Wednesday, March 30, 2005

CFDirectory can be Slow with Many Files

CFDirectory can be rather slow when working with directories that contain a large number of files (thousands or tens of thousands).

The underlying Java method, java.io.File.listFiles(), cannot pass a filter down to the operating system. Instead, the application has to filter the directory listing itself, entry by entry, with a FileFilter object.

If those thousands of files live on a network share, the problem is compounded by the added latency of the network file system.

If you're only looking for, say, a date stamp for a single file, the entire directory is iterated looking for the file, and only then are the details of the file made available to the ColdFusion application.
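
To make the difference concrete, here is a rough Java sketch of the two approaches: listing the directory with a FileFilter versus addressing the one file by path. This is only an illustration, not CFDirectory's actual implementation; the class name, the makeDemoDir helper and the file names are made up for the demo.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

final class SingleFileLookup {

    // Slow path: list the directory and test every entry in application code.
    static long lastModifiedViaListing(File dir, final String name) {
        File[] matches = dir.listFiles(new FileFilter() {
            @Override
            public boolean accept(File candidate) {
                // Called once per directory entry; the OS does no filtering for us.
                return candidate.getName().equals(name);
            }
        });
        if (matches == null || matches.length == 0) {
            return 0L;
        }
        return matches[0].lastModified();
    }

    // Fast path: address the one file by path, with no directory listing.
    static long lastModifiedDirect(File dir, String name) {
        // lastModified() returns 0L when the file does not exist.
        return new File(dir, name).lastModified();
    }

    // Hypothetical helper: builds a temp directory holding `count` empty files.
    static File makeDemoDir(int count) {
        try {
            File dir = Files.createTempDirectory("cfdir-demo").toFile();
            for (int i = 0; i < count; i++) {
                new File(dir, "file" + i + ".txt").createNewFile();
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File dir = makeDemoDir(5000);
        String name = "file4999.txt";
        long t0 = System.nanoTime();
        long viaListing = lastModifiedViaListing(dir, name);
        long t1 = System.nanoTime();
        long direct = lastModifiedDirect(dir, name);
        long t2 = System.nanoTime();
        System.out.println("same timestamp: " + (viaListing == direct));
        System.out.println("listing: " + (t1 - t0) / 1000 + " us, direct: "
                + (t2 - t1) / 1000 + " us");
    }
}
```

Both methods return the same timestamp; the timings will depend on the file system, and I'd expect the gap to widen on a network share.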

If you only want information about a single file, it's fairly easy to get it through direct calls to Java from within ColdFusion. Here are two ColdFusion functions that efficiently get the date stamp and the size of a single file:

<cfscript>
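function getFileDate(fn) {
    // var-scope the locals so they don't leak into the shared variables scope
    var theFile = createObject("java","java.io.File");
    var theDate = "";
    var dateString = "";
    theFile.init(fn);
    if( theFile.exists() ){
        // lastModified() returns milliseconds since the epoch
        theDate = createObject("java","java.util.Date");
        theDate.init(theFile.lastModified());
        // builds m/d/yyyy h:m:s (no zero padding)
        dateString = 1+theDate.getMonth() & "/" & theDate.getDate() & "/" & 1900+theDate.getYear() & " " & theDate.getHours() & ":" & theDate.getMinutes() & ":" & theDate.getSeconds();
    }
    // empty string if the file does not exist
    return dateString;
}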
function getFileSize(fn) {
    var theFile = createObject("java","java.io.File");
    theFile.init(fn);
    if( theFile.exists() ){
        // length() returns the size in bytes
        return theFile.length();
    }
    // file not found: return an empty string, as getFileDate() does
    return "";
}
</cfscript>
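
The date string above is built from java.util.Date's getter methods, which are deprecated and don't pad minutes or seconds. As a hedged alternative, here is a small Java sketch that formats the same timestamp with java.text.SimpleDateFormat; the class name, method names and format pattern are my own choices for illustration. The same class should be reachable from ColdFusion through createObject("java", ...).

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class FileDateFormat {

    // Formats epoch milliseconds in the default time zone.
    // M/d/yyyy leaves month and day unpadded; H:mm:ss pads minutes and seconds.
    static String formatTimestamp(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("M/d/yyyy H:mm:ss", Locale.US);
        return fmt.format(new Date(millis));
    }

    // Returns the formatted date stamp, or an empty string if the path does not exist.
    static String getFileDate(String path) {
        File f = new File(path);
        return f.exists() ? formatTimestamp(f.lastModified()) : "";
    }

    public static void main(String[] args) {
        // The temp directory is used only because it is known to exist.
        System.out.println(getFileDate(System.getProperty("java.io.tmpdir")));
    }
}
```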

TCP/IP Efficiency

Ran into an interesting problem the other day.

Seems that the TCP connection between JRun (which underpins ColdFusion) and its Web server connector can get somewhat overwhelmed by the bursty nature of ColdFusion pages.

(Normally, ColdFusion doesn't send any data back to the Web server until the page is completely finished, as the last line of a template could be a CFLOCATION or CFHEADER tag, and once the HTTP header is sent, it can't be altered nor retracted.)

Analyzing the traffic with Ethereal made it clear that the JRPP connections (JRPP is the protocol JRun's Web server connectors use to talk to JRun itself) were stalling because of something called "Silly Window Syndrome" avoidance (see RFC 1122), which kicked in when the TCP receive buffer at the Web server filled up quickly. Increasing the TCP max window size in Windows, through a few registry settings, prevented the stalling.

Here are the contents of the registry file that adjusts the TCP max window size to roughly 128k (instead of the default ~17k):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"GlobalMaxTcpWindowSize"=dword:00020148
"TcpWindowSize"=dword:00020148
"Tcp1323Opts"=dword:00000003

This also helped speed up Internet downloads on my home PC. I'd recommend applying the change to any Windows box with more than 256MB RAM.
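
For anyone wondering where the dword value comes from, here is a small Java sketch of the arithmetic. It assumes a 1460-byte maximum segment size (a 1500-byte Ethernet MTU minus 40 bytes of IP and TCP headers); the class and method names are made up for the example. A window above 65535 bytes doesn't fit in TCP's 16-bit window field, so it relies on RFC 1323 window scaling, which is what the Tcp1323Opts value of 3 turns on (along with timestamps).

```java
final class TcpWindowMath {

    // Assumption: typical Ethernet MSS (1500-byte MTU minus 40 bytes of headers).
    static final int MSS = 1460;

    // Largest window that fits in TCP's 16-bit window field without scaling.
    static final int MAX_UNSCALED_WINDOW = 65535;

    // Converts a registry dword written as hex digits into a byte count.
    static int dwordToBytes(String hexDword) {
        return Integer.parseInt(hexDword, 16);
    }

    // True when the window can only be advertised with RFC 1323 window scaling.
    static boolean needsWindowScaling(int windowBytes) {
        return windowBytes > MAX_UNSCALED_WINDOW;
    }

    public static void main(String[] args) {
        int window = dwordToBytes("00020148");
        System.out.println("window = " + window + " bytes");
        System.out.println("full segments = " + (window / MSS)
                + ", remainder = " + (window % MSS));
        System.out.println("needs window scaling = " + needsWindowScaling(window));
    }
}
```

Under that assumption, 0x00020148 works out to 131400 bytes, which is exactly 90 segments of 1460 bytes. Keeping the window a whole multiple of the segment size is the usual reason for picking a number like this rather than a round 131072.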

For more on TCP options under Windows, see this TechNet article.

Avoid Evaluate()

The ColdFusion Evaluate() function is often used to resolve dynamically named variables, such as form variables.

The evaluate function, however, forces ColdFusion to compile the expression being evaluated every time you call it, which can be a strain on performance, especially since parts of the compilation process are necessarily single-threaded.

Almost all instances of Evaluate() can be avoided by using structure syntax; instead of writing:

<cfset value=evaluate("form.field#i#")>

You can write:

<cfset value=form["field#i#"]>

This is much faster and, better yet, simpler.

This doesn't work quite so well when looping over a query, however. If you use structure syntax to access a query column, you must also specify the row number. So, where this would work:

<cfoutput query="q">
#evaluate("q.#fieldname#")#<br>
</cfoutput>

You'd have to use q.currentrow to get the correct row when avoiding evaluate():

<cfoutput query="q">
#q[fieldname][q.currentrow]#<br>
</cfoutput>

Update: 7Apr2005

Classes created for evaluate() (and de() and setVariable()) are cached by literal string, much as cached queries are keyed on the exact text of the query. So the performance impact isn't quite so bad, as long as the literal string passed to evaluate() doesn't change on every call.
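
Here is a toy Java sketch of what caching on a literal string implies. It is not ColdFusion's actual implementation, just a map keyed on the exact expression text with a counter standing in for the compile step; all names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class LiteralStringCache {

    // Compiled "classes" keyed on the exact expression text.
    private final Map<String, Object> compiledByLiteral = new HashMap<String, Object>();
    private int compileCount = 0;

    // Returns the cached entry, compiling only on the first sighting of a string.
    Object lookup(String expression) {
        Object compiled = compiledByLiteral.get(expression);
        if (compiled == null) {
            compiled = new Object(); // stand-in for the expensive compile step
            compileCount++;
            compiledByLiteral.put(expression, compiled);
        }
        return compiled;
    }

    int getCompileCount() {
        return compileCount;
    }

    // Runs the expressions through a fresh cache and reports how many compiles occurred.
    static int compilesFor(String[] expressions) {
        LiteralStringCache cache = new LiteralStringCache();
        for (String expression : expressions) {
            cache.lookup(expression);
        }
        return cache.getCompileCount();
    }

    public static void main(String[] args) {
        String[] same = {"form.field1", "form.field1", "form.field1"};
        String[] varied = {"form.field1", "form.field2", "form.field3"};
        System.out.println("same string:   " + compilesFor(same) + " compile(s)");
        System.out.println("varied string: " + compilesFor(varied) + " compile(s)");
    }
}
```

A repeated string costs one compile; every distinct string costs another. That is why a call like evaluate("form.field#i#") still pays once for each new value of i.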