Group selection made easy, Outer joins made powerful
Site owned & produced by MagnaFacta

Else Logic for Outer Joins

Else Logic was conceived while trying to work out how to add outer joins to other languages. While the link between outer joins and subqueries may not be immediately obvious, the origins of the current outer join syntax lies in trying avoid having to specify subqueries for them. In applying outer joins to other languages we will have to retrace the steps taken by the database world and start with explaining a more primitive approach to outer joins than the one in current use in SQL.

This page will first describe this ancient method - christened the outer product - and then explain how we use this method to perform outer joins. To show that this method is fairly language independent - and Else Logic as well - I will show the examples in both SQL and XQuery. Elsewhere on this site the Outer Join Tutorial will go into more detail on the outer product and its uses, but the tutorial covers only SQL.

The outer product

SourcesProduct
of Sources
Outer
Products
Product of
Outer Products
R1
C1C2
A1
B2
R1 x R2
C1C2C3C4
A1B10
A1D20
B2B10
B2D20
R1*
C1C2
A1
B2
NullNull
R1* x R2*
C1C2C3C4
A1B10
A1D20
A1NullNull
B2B10
B2D20
B2NullNull
NullNullB10
NullNullD20
NullNullNullNull
R2
C3C4
B10
D20
R2*
C3C4
B10
D20
NullNull

Image 1: Outer Product Examples

The outer join combines data from multiple sources, both where the data from both sources is linked, as well as where the data in one or both sources is not linked to the other source. It does this quite well - and I do not propose to replace it - but few other languages implement outer joins as implementing them does little else than implementing outer joins. For this small advantage you require extra syntax and you have to deal with null values as in SQL.

The advantages of the outer product are that it is easier to implement than outer joins, does not need any special syntax and that it offers a wider range of results. The disadvantage is that using the outer product for outer joins and many other applications requires subqueries.

All the outer product does is append an empty row in SQL or "nothing" in XQuery behind a list of items. Image 1 shows examples with two tables R1 and R2 the outer products of these tables R1* and R2*. Furthermore image 1 shows the product of the two tables (R1 x R2) with the product of the outer products of those tables (R1* x R2).

Note that the outer product works on a single source while the normal product operator combines two (or more) sources. The reason to call it the outer product is because the normal use of the outer product is as input for a product operation, where it produces a result set similar to the result from outer joins, outer unions and other outer versions of operators. There are some cases where we can use the fact that the outer product has only one source as input, but the major advantage is that in contrast to the outer join we do not have to worry about composition when using the result of an outer product as input for a product operation. Also the outer product of two or more sources is just the product of the outer products of the input sources.

Table 1: Executing the outer product
SQLXQuery
FROM (
  SELECT * FROM R1
    UNION ALL
  SELECT NULL, NULL   
) AS R1
for $r1 in (
  doc('r.xml')/rs   
    | 
  <?null?>
)

In SQL we can create an outer product by appending an empty list of NULLs to a table using the UNION operator. In XQuery this is slightly trickier because XQuery does not have an entity comparable to SQL's NULLs. However, we can temporarily use the processing instruction <?null?> with little fear of getting in the way of 99.99% of all real data. The union operator in XQuery is the "|" pipe operator so assuming there are two fields in R1, table 1 shops the SQL and XQuery we use for an outer product:

Subqueries for outer joins

Product of
Outer Products
Full Outer
Join
Left Outer
Join
Right Outer
Join
R1* x R2*
C1C2C3C4
A1B10
A1D20
A1NullNull
B2B10
B2D20
B2NullNull
NullNullB10
NullNullD20
NullNullNullNull
R1 fojC1=C3 R2
C1C2C3C4
A1NullNull
B2B10
NullNullD20
R1 lojC1=C3 R2
C1C2C3C4
A1NullNull
B2B10
R1 rojC1=C3 R2
C1C2C3C4
B2B10
NullNullD20

Image 2: Outer Product versus Joins

Image 2 illustrates that you can get all the types of outer joins from the product of two outer products by writing the right row filter. This filter is not simple - as it contains a subquery - but it does follow a fixed formula. E.g. left outer join for the outer product consists of these three filter parts:

(1)the inner join
or
(2a)empty rows from the right source (i.e. having a null value in their primary key field)
and
(2b)rows from the left source not in the inner join
Table 2: Left join filters for outer product
SQLXQuery

SELECT *
FROM  R1*, R2*
WHERE
  C1 = C3             -- (1)
OR
  (
     C3 IS NULL       -- (2a)
   AND 
     C1 NOT IN        -- (2b)
      (SELECT C3 FROM R2 
       WHERE C3 IS NOT NULL) 
                      -- (end of 2b)
  )


<overview> {
  for $r1 in $r1s*, 
      $r2 in $r2s*
  where 
    ($r1/C1 = $r2/C3)          (: 1 :)
  or 
    (
      $r2 != <?null?>          (: 2a :)
    and 
      empty($r2s[C4 = $r1/C1]) (: 2b :)
    )
  return
    <contact> {
      $r1/C1, $r/C2, $2/C3, $r2/C4
    } </contact>
} </overview>

Table 2 shows how we can express this in SQL and XQuery - using table_name* as a shorthand for the outer product.

For the right outer join we just need to switch the right and the left columns in the formula. For the full outer join we combine the left column version with the right column version.

Else Logic for outer joins

The previous queries do not look like candidates for shortening using Else Logic. The subqueries use only part of the main data source not the whole so there is little scope for code reuse. The trick is to rewrite the filter statements into queries that can use Else Logic.

Table 3: Left join filters prepared for Else Logic filtering
SQLXQuery
...
WHERE
  C1 = C3             -- (1)
OR
  (
     C3 IS NULL       -- (2a)
   AND 
     C1 NOT IN        -- (2b)
      (SELECT C1 FROM R1*, R2* 
       WHERE C1 = C3)
                      -- (end of 2b)
  )




...
where 
  ($r1/C1 = $r2/C3)     (: 1 :)
or 
  (
    $r2 != <?null?>     (: 2a :)
  and 
    not($r1             (: 2b :)
      intersect (
        for $r1 in $r1s*, 
            $r2 in $r2s*
        where ($r1/C1 = $r2/C3)
        return $r1
        )
    )                   (: end of 2b :)
...

Table 3 shows how to rewrite the subqueries to prepare them for Else Logic. This is not an obvious step as the 2B subqueries in table 3 are more complex than those in table 2. However, for Else Logic these queries are a huge improvement.

  • The subquery membership test is on the same column in the sub query as in the main query (C1).
  • The data source of the subquery is identical to the main query.
  • The filter in the subquery reuses the inner join filter from 1.
Table 4: Left join filters using Else Logic filtering
SQLXQuery
...
WHERE
  C1 = C3             -- (1)
OR ELSE
  (
     C3 IS NULL       -- (2a)
   AND 
     C1 NOT EXISTS()  -- (2b)  
  )

...
where 
  ($r1/C1 = $r2/C3)              (: 1 :)
orelse 
  (
    $r2 != <?null?>              (: 2a :)
  and 
    not($r1 intersect (for $*))  (: 2b :)
  )
...

All these changes mean that we can use Else Logic to shorten the queries. The result is in table 4 (the for $* in the XQuery example is used to recall the default data source). Of course we do not have to use the exact syntax as used in there - in particular for XQuery - but table 4 does show that using subqueries for outer joins does not have to involve writing long subqueries.

The syntax proposed in table 4 also demonstrates that this approach to outer joins will work for joins that compare more than one field from each source. The C1 = C3 statement on the right side of the OR ELSE is reused in the subquery. Combine multiple expressions using a pair of brackets and all these statements will be part of the subquery. This approach works even on filtered outer joins, i.e. when the right source (in case of the left outer join) is filtered before combining it with the left source.

This approach solves another problem. The SQL 92 standard adopted the current specification for outer joins in the FROM statement because the previous method of appending (+) to the outer joined fields did not allow the programmer to specify the relationship between multiple outer joins in a query. In Else Logic you can specify this order using brackets and the new logical operators to specify this relationship. The Outer Join Tutorial contains examples.

The conclusion is that outer product / Else Logic approach does give the programmer a simple method for implementing outer joins. That the approach also gives us a much wider choice of result sets than just the outer join is a very nice bonus. All this is demonstrated in the tutorial .

Else Logic outer join syntax

Table 5: Left join filter syntax
SQLXQuery
...
WHERE
  C1 = C3            -- (1)
OR ELSE
  (C1 LEFT JOIN C3)  -- (2)  

...
where 
  ($r1/C1 = $r2/C3)     (: 1 :)
orelse
  $r1 leftJoin $r2      (: 2 :)
...

As the expression in table 4 has a fixed pattern for left, right and full outer joins we can use syntax (or functions, or macros) to simplify these expressions. Table 5 demonstrates this, combining 2a and 2b into 2.

So it comes to extra syntax in the end? Well, it is just an option. I think not using the extra syntax still results in manageable code. Furthermore: in this case all the extra syntax rules do more than just deliver the outer join.

I was tempted to put the whole outer join expression in a single statement, but this would only work in SQL when C1 and C3 are both the primary key of their data sources. This situation does not occur very often. So we stop at the (join filter expression) OR ELSE (join type expression) syntax as the end simplification in expressing joins in SQL. XQuery might have more room for this approach because the "rows" are single elements that we can compare whole and the join filter expression consists of XPath expressions starting at those row elements. However, a solution would have to work for more complex join types than those on a single field. Something I currently don't see a solution for.

Conditions of use, Copyrights © & Intellectual Property Rights