Thursday, July 13, 2006

Faking a Left Outer Join in Query-of-Queries

Just picked up on Rip's post about Query-of-Queries tricks and thought I'd add share a trick I came up with some months ago.

One of the big things missing from Queries-of-Queries, IMHO, is support for Left Outer Joins. You can join two queries together using an "old-style" implied join like so:

<cfquery dbtype="query">
SELECT (fields)
FROM myQuery1, myQuery2
WHERE myQuery1.fooID = myQuery2.fooID
</cfquery>


...but this "implied join" syntax is only an INNER JOIN - i.e. it will only return rows where both queries have matching rows.

Often, you need the functionality of a LEFT OUTER JOIN. For the example above, a left outer join would return a row for each record in myQuery1, plus each matching record in myQuery2 if a matching row exists, or NULL in the myQuery2 fields if no matching row exists.

For instance, you might want to list all products, with the order numbers of any orders for each product, but if the product has no orders you would still want it to appear in the query. This is a perfect situation for a LEFT OUTER JOIN.

Unfortunately, query-of-queries only supports the implied join syntax, therefore it doesn't support left outer joins. But left outer joins are just so mind-buggeringly useful that I had to find a way to fake them.... hyukhyukhyuk...

Q-of-Q does support the UNION construct, for joining two result sets together. By default, a UNION filters out duplicate rows, unless you specify it as a UNION ALL - which returns all rows including duplicates.

So you do one q-of-q to get all the rows which are matched, and then another q-of-q to get all the rows in your "left" query that DON'T have a matching row in the "right" query, and you UNION ALL them together. There's a little bit of fun you have to go through in order to work out the columns in the "right" query that aren't in the "left" query, and fill them with "NULL" values - especially as you have to work around the fact that you can't really do NULLs....

...anyway, here we go. This code has been written for MX 6.1, and works pretty well on that, and on 7. I'm sure someone could make it more bulletproof if they really wanted to, and it would be fairly simple to introduce support for explicitly-named column types, but frankly, I couldn't be arsed :) Feel free to tinker with at your leisure.




<!------------------------------------------------->
<!--- leftOuterJoin --->
<!--- Emulates a left outer join in QofQ's --->
<!--- @author : Al Davidson --->
<!------------------------------------------------->

<cffunction name="leftOuterJoin" access="public" returntype="query" output="yes">
<cfargument name="qry_left" type="query" required="yes" />
<cfargument name="qry_right" type="query" required="yes" />
<cfargument name="sLeftJoinColumn" type="string" required="true" />
<cfargument name="sRightJoinColumn" type="string" required="true" />
<cfargument name="sOrderBy" type="string" required="false" default="" />

<cfscript>
// var'ed struct so that we don't have to var every local variable
var stLocals = structNew();

// check for an empty left query
if( arguments.qry_left.recordcount EQ 0 ){
return arguments.qry_left;
}

// get all the fields in qry_right that AREN'T in qry_left
stLocals.lstRightColumns = getUnMatchedListElems( lstFilter=arguments.qry_right.columnlist, lstCompareTo=arguments.qry_left.columnlist );

</cfscript>

<cfquery name="stLocals.qryDistinct" dbtype="query">
SELECT DISTINCT (#sRightJoinColumn#) AS sValues
FROM qry_right
</cfquery>
<cfset stLocals.lstRValues = valuelist( stLocals.qryDistinct.sValues ) />
<cfset stLocals.sEmptyClause = "0" />

<!--- numeric or string values? --->
<cfif NOT isNumeric( stLocals.qryDistinct.sValues[1] )>
<cfset stLocals.lstRValues = listQualify( stLocals.lstRValues, "'" ) />
<cfset stLocals.sEmptyClause = "''" />
</cfif>

<cfif listLen( stLocals.lstRValues ) EQ 0 >
<Cfset stLocals.lstRValues = stLocals.sEmptyClause />
</cfif>

<!--- try and guess the right type for each column in qry_right --->
<!--- by getting the first element in qry_right for each --->
<cfset stLocals.lstNullClause = "" />

<cfloop list="#stLocals.lstRightColumns#" index="stLocals.sCol">
<cfif isNumeric(arguments.qry_right[stLocals.sCol][1])
AND compareNoCase( stLocals.sCol, "project_code" )>
<cfset stLocals.sElem = "0 AS #stLocals.sCol#" />
<cfelse>
<cfset stLocals.sElem = "'' AS #stLocals.sCol#" />
</cfif>
<cfset stLocals.lstNullClause = listAppend( stLocals.lstNullClause, stLocals.sElem ) />
</cfloop>

<cfquery name="stLocals.qryUnion" dbtype="query">
<cfif arguments.qry_right.recordcount GT 0>
SELECT qry_left.*,
#stLocals.lstRightColumns#
FROM qry_left, qry_right
WHERE qry_left.#sLeftJoinColumn# = qry_right.#sRightjoinColumn#

UNION ALL
</cfif>
SELECT qry_left.*,
#stLocals.lstNullClause#
FROM qry_left
<cfif arguments.qry_right.recordcount GT 0>
WHERE qry_left.#sLeftJoinColumn# NOT IN( #stLocals.lstRValues# )
</cfif>
<cfif len(Trim(arguments.sOrderby))>
ORDER BY #arguments.sOrderBy#
</cfif>
</cfquery>

<cfreturn stLocals.qryUnion />
</cffunction>


<!--------------------------------------------------------->
<!--- getUnMatchedListElems --->
<!--- Returns a list containing all elements of --->
<!--- lstFilter that are NOT in lstCompareTo --->
<!--- @author : Al Davidson --->
<!--------------------------------------------------------->

<cffunction name="getUnMatchedListElems" access="public" returntype="string" output="no">
<cfargument name="lstFilter" type="string" required="true" />
<cfargument name="lstCompareTo" type="string" required="true" />

<cfset var stLocals = structNew() />
<cfset stLocals.lstReturn = "" />
<cfloop list="#arguments.lstFilter#" index="stlocals.sThisElem">
<cfif listFindNoCase( arguments.lstCompareTo, stlocals.sThisElem ) EQ 0>
<cfset stLocals.lstReturn = listAppend( stLocals.lstReturn, stlocals.sThisElem ) />
</cfif>
</cfloop>
<cfreturn stLocals.lstReturn />
</cffunction>

10 comments:

Anonymous said...

Hi I have problem in this case but I wonder if I will do in MYSQL How will I do? Thanks you so much

Alistair Davidson said...

MySQL supports LEFT OUTER JOIN syntax natively. This post is all about joining two *existing* query recordsets with a query-of-queries

Anonymous said...

Hi,

How would you handle the display part? I have a Region field from Q1 and CityName and Customer from Q2. I'm using this structure: (a [ replaces a <)

[cfoutput query="allRecords" group="Aregion"]
#Aregion#
[cfoutput group="BcityName"]
#BcityName#
[cfoutput]
#Bcustomer#
[/cfoutput]
[/cfouput]
[/cfoutput]

How would you display these same results using your function?

Thanks.

Alistair Davidson said...

Well, assuming that the join depends on a field called "region" in both queries, I guess you'd join them together like:

<cfset qry_allrecords = LeftOuterJoin( qry_left = Q1,
qry_right = Q2,
sLeftJoinColumn = "region",
sRightJoinColumn = "region",
sOrderBy = "region ASC,cityname ASC,customer ASC"
) /&rt;

and then your output code would work fine. You'd probably want to indent them a bit though:

<cfoutput query="qry_allrecords" group="region">
<h3 class="region">#qry_allrecords.Region#</h3>
<cfoutput group="cityname">
<h4 class="city">#qry_allrecords.CityName#</h4>
<ul class="customers>
<cfoutput>
<li>#qry_allrecords.customer#<>
</cfoutput>
</ul>
</cfoutput>
</cfoutput>

Alistair Davidson said...

whoops, missed a couple of bits in the HTML there, but you get the idea.

Anonymous said...

Thanks for the answer, I get the idea. My two queries are joined by a "regionID" field, which I don't display in the output. Now, I'm assuming that "qry_left" and "qry_right" in your original function refer to a (not displayed) set of 2 queries (not QoQ) you performed before the function.

The new snippet in your latest message, is that complementing your function, so that I include the snippet after it, or is it replacing your function? If the second, should I replace "Q1" and "Q2" with the syntax I used in those queries, or does the "qry_left = Q1" part just transfers the results from the original Q1 to the "qry_left" variable?

Sorry for all the questions, I'm trying to get it straight before implementing it :-)

BTW, I'm using Access, if that makes a difference.

Alistair Davidson said...

No problem Roberto - but don't be afraid to dive straight in there and start messing around with stuff until it works, then curse the person who didn't write it properly in the first place. It's the way I learnt :-)

Now, in your first question, you said :

"I have a Region field from Q1 and CityName and Customer from Q2"

so I took those names and put them in the example call to the function.

If you have a query of regions called, say, qry_regions, and a query of cities and customers called qry_citiesandcustomers, you would join them together by calling the leftOuterJoin function like this:

<cfset qry_allrecords = leftOuterjoin(
qry_left = qry_regions,
qry_right = qry_citiesandcustomers,
sLeftJoinColumn = "regionID",
sRightJoinColumn = "regionID",
sOrderBy = "region ASC,cityname ASC,customer ASC"
) />

The qry_left and qry_right parameters are the *actual query objects* that hold the results of previous queries - not the names of the variables, but the actual query recordsets themselves.

hope that helps!

Anonymous said...

Hi Allister,
I know I'm jumping in a bit late here, however, I've come across the circumstance where my two queries are joined by two different columns (EmplID, OCID). I can join them together as you mention, then get everything out of query 1 where the emplID is not in (get all EmplIDs from query 2). However, this will miss all rows that exist in query 1 where the emplID/OCID combination is NOT found in query 2, but the same emplID IS found in query 2 (with a different OCID). It doesn't happen that often, but often enough so that the resultset will return a few less rows as it gets larger and larger. I've thought about this for awhile, and can't seem to find a way to make sure ALL emplID/OCIC combinations that aren't found in query 2 will turn up in the second half of the union statement. If Q of Q would support T-SQL or PL/SQL, I could easily concatenate the two IDs, and then search for everything in query 1 where this concatenation is not in (get all emplID/OC concatenations from query 2). But there isn't T-SQl for Q of Q.
Anyway, nice post. I thought you might have a viewpoint on this one.

Thanks,

Peter

Anonymous said...

Hi Allister,

I settled on:

<cfset ColumnValues = arraynew(1)>
<cfset temp = queryAddColumn(getMarketingPromotions,"TotHrsBilled",columnvalues)>
<cfloop query="getMarketingPromotions">
<cfloop query="getHoursWorked">
<cfif getMarketingPromotions["EmplID"][getMarketingPromotions.currentrow] is getHoursWorked.EmplID and getMarketingPromotions["OCID"][getMarketingPromotions.currentrow] is getHoursWorked.fo_assignment_id>
<cfset getMarketingPromotions["TotHrsBilled"][getMarketingPromotions.currentrow] = getHoursWorked.TotHrsBilled>
</cfif>
</cfloop>
</cfloop>

Kind of a lame solution considering I'm doing a double loop, but I couldn't think of a better way to match the records.

Anyway, I like your solution, and have used it on single-column joins in Q of Q.

Thanks,

Peter

Anonymous said...

Have you enhanced this wonderful help to accommodate column data types? I have a timestamp in Q2 that getUnMatchedListElems adds as varchar :/