Else Logic was conceived while trying to work out how to add outer joins to other languages. While the link between outer joins and subqueries may not be immediately obvious, the origins of the current outer join syntax lies in trying avoid having to specify subqueries for them. In applying outer joins to other languages we will have to retrace the steps taken by the database world and start with explaining a more primitive approach to outer joins than the one in current use in SQL.
This page will first describe this ancient method - christened the outer product - and then explain how we use this method to perform outer joins. To show that this method is fairly language independent - and Else Logic as well - I will show the examples in both SQL and XQuery. Elsewhere on this site the Outer Join Tutorial will go into more detail on the outer product and its uses, but the tutorial covers only SQL.
| Sources | Product of Sources | Outer Products | Product of Outer Products |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Image 1: Outer Product Examples
The outer join combines data from multiple sources, both where the data from both sources is linked, as well as where the data in one or both sources is not linked to the other source. It does this quite well - and I do not propose to replace it - but few other languages implement outer joins as implementing them does little else than implementing outer joins. For this small advantage you require extra syntax and you have to deal with null values as in SQL.
The advantages of the outer product are that it is easier to implement than outer joins, does not need any special syntax and that it offers a wider range of results. The disadvantage is that using the outer product for outer joins and many other applications requires subqueries.
All the outer product does is append an empty row in SQL or "nothing" in XQuery behind a list of items. Image 1 shows examples with two tables R1 and R2 the outer products of these tables R1* and R2*. Furthermore image 1 shows the product of the two tables (R1 x R2) with the product of the outer products of those tables (R1* x R2).
Note that the outer product works on a single source while the normal product operator combines two (or more) sources. The reason to call it the outer product is because the normal use of the outer product is as input for a product operation, where it produces a result set similar to the result from outer joins, outer unions and other outer versions of operators. There are some cases where we can use the fact that the outer product has only one source as input, but the major advantage is that in contrast to the outer join we do not have to worry about composition when using the result of an outer product as input for a product operation. Also the outer product of two or more sources is just the product of the outer products of the input sources.
| SQL | XQuery |
|---|---|
FROM (
SELECT * FROM R1
UNION ALL
SELECT NULL, NULL
) AS R1
| for $r1 in (
doc('r.xml')/rs
|
<?null?>
) |
In SQL we can create an outer product by appending an empty list of NULLs to a table using the UNION operator. In XQuery this is slightly trickier because XQuery does not have an entity comparable to SQL's NULLs. However, we can temporarily use the processing instruction <?null?> with little fear of getting in the way of 99.99% of all real data. The union operator in XQuery is the "|" pipe operator so assuming there are two fields in R1, table 1 shops the SQL and XQuery we use for an outer product:
| Product of Outer Products | Full Outer Join | Left Outer Join | Right Outer Join |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Image 2: Outer Product versus Joins
Image 2 illustrates that you can get all the types of outer joins from the product of two outer products by writing the right row filter. This filter is not simple - as it contains a subquery - but it does follow a fixed formula. E.g. left outer join for the outer product consists of these three filter parts:
| (1) | the inner join |
| or | |
| (2a) | empty rows from the right source (i.e. having a null value in their primary key field) |
| and | |
| (2b) | rows from the left source not in the inner join |
| SQL | XQuery |
|---|---|
SELECT *
FROM R1*, R2*
WHERE
C1 = C3 -- (1)
OR
(
C3 IS NULL -- (2a)
AND
C1 NOT IN -- (2b)
(SELECT C3 FROM R2
WHERE C3 IS NOT NULL)
-- (end of 2b)
)
| <overview> {
for $r1 in $r1s*,
$r2 in $r2s*
where
($r1/C1 = $r2/C3) (: 1 :)
or
(
$r2 != <?null?> (: 2a :)
and
empty($r2s[C4 = $r1/C1]) (: 2b :)
)
return
<contact> {
$r1/C1, $r/C2, $2/C3, $r2/C4
} </contact>
} </overview> |
Table 2 shows how we can express this in SQL and XQuery - using table_name* as a shorthand for the outer product.
For the right outer join we just need to switch the right and the left columns in the formula. For the full outer join we combine the left column version with the right column version.
The previous queries do not look like candidates for shortening using Else Logic. The subqueries use only part of the main data source not the whole so there is little scope for code reuse. The trick is to rewrite the filter statements into queries that can use Else Logic.
| SQL | XQuery |
|---|---|
...
WHERE
C1 = C3 -- (1)
OR
(
C3 IS NULL -- (2a)
AND
C1 NOT IN -- (2b)
(SELECT C1 FROM R1*, R2*
WHERE C1 = C3)
-- (end of 2b)
)
| ...
where
($r1/C1 = $r2/C3) (: 1 :)
or
(
$r2 != <?null?> (: 2a :)
and
not($r1 (: 2b :)
intersect (
for $r1 in $r1s*,
$r2 in $r2s*
where ($r1/C1 = $r2/C3)
return $r1
)
) (: end of 2b :)
... |
Table 3 shows how to rewrite the subqueries to prepare them for Else Logic. This is not an obvious step as the 2B subqueries in table 3 are more complex than those in table 2. However, for Else Logic these queries are a huge improvement.
| SQL | XQuery |
|---|---|
...
WHERE
C1 = C3 -- (1)
OR ELSE
(
C3 IS NULL -- (2a)
AND
C1 NOT EXISTS() -- (2b)
)
| ...
where
($r1/C1 = $r2/C3) (: 1 :)
orelse
(
$r2 != <?null?> (: 2a :)
and
not($r1 intersect (for $*)) (: 2b :)
)
... |
All these changes mean that we can use Else Logic to shorten the queries. The result is in table 4 (the for $* in the XQuery example is used to recall the default data source). Of course we do not have to use the exact syntax as used in there - in particular for XQuery - but table 4 does show that using subqueries for outer joins does not have to involve writing long subqueries.
The syntax proposed in table 4 also demonstrates that this approach to outer joins will work for joins that compare more than one field from each source. The C1 = C3 statement on the right side of the OR ELSE is reused in the subquery. Combine multiple expressions using a pair of brackets and all these statements will be part of the subquery. This approach works even on filtered outer joins, i.e. when the right source (in case of the left outer join) is filtered before combining it with the left source.
This approach solves another problem. The SQL 92 standard adopted the current specification for outer joins in the FROM statement because the previous method of appending (+) to the outer joined fields did not allow the programmer to specify the relationship between multiple outer joins in a query. In Else Logic you can specify this order using brackets and the new logical operators to specify this relationship. The Outer Join Tutorial contains examples.
The conclusion is that outer product / Else Logic approach does give the programmer a simple method for implementing outer joins. That the approach also gives us a much wider choice of result sets than just the outer join is a very nice bonus. All this is demonstrated in the tutorial .
| SQL | XQuery |
|---|---|
... WHERE C1 = C3 -- (1) OR ELSE (C1 LEFT JOIN C3) -- (2) | ... where ($r1/C1 = $r2/C3) (: 1 :) orelse $r1 leftJoin $r2 (: 2 :) ... |
As the expression in table 4 has a fixed pattern for left, right and full outer joins we can use syntax (or functions, or macros) to simplify these expressions. Table 5 demonstrates this, combining 2a and 2b into 2.
So it comes to extra syntax in the end? Well, it is just an option. I think not using the extra syntax still results in manageable code. Furthermore: in this case all the extra syntax rules do more than just deliver the outer join.
I was tempted to put the whole outer join expression in a single statement, but this would only work in SQL when C1 and C3 are both the primary key of their data sources. This situation does not occur very often. So we stop at the (join filter expression) OR ELSE (join type expression) syntax as the end simplification in expressing joins in SQL. XQuery might have more room for this approach because the "rows" are single elements that we can compare whole and the join filter expression consists of XPath expressions starting at those row elements. However, a solution would have to work for more complex join types than those on a single field. Something I currently don't see a solution for.