Antijoins, semijoins, and some generalized quantifiers in SQL | DbSchema



Finding rows in one table that don't match any rows in another table is a common task, corresponding to questions like:

  • Which customers haven't placed any order?
  • Which orders have no shipments?

The SQL standard does not offer a built-in shortcut to write queries like these, which are usually called antijoins. A few database systems (such as Spark) support that term as a keyword in their SQL dialect.

In this article, we shall take a closer look at antijoins and some related problems, using both standard SQL and the visual query builder of DbSchema.

We shall use as running example a database with two tables: customers and orders. The relationship between these two entities is enforced by a foreign key constraint on customer_id in the Orders table.

Diagram showing Orders references Customer

If you'd like to review the foreign key concept, including how you can visually draw such diagrams in DbSchema, please refer to our prior tutorial on foreign keys.

'Not exists' using the DbSchema Query Builder

The DbSchema Query Builder allows you to easily construct an antijoin by picking 'not exists' on the connective arrow that links two tables.

Demo of  DbSchema Query Builder using 'not exists' (antijoin)

In the SQL generated by DbSchema query builder, namely

SELECT c.customer_id, c.name, c.email, c.country
FROM Customers c
WHERE  NOT EXISTS (
	SELECT 1
	FROM Orders o
	WHERE o.customer_id = c.customer_id  )

the inner SELECT 1 query refers to the outer table Customers alias c. This is called a correlated subquery, which produces fairly intuitive SQL that spells out what should be returned for every customer. SELECT 1 here merely returns the value 1 if there are any matching customer_id values in the Orders table for every customer in the outer query.

Brief diversion to set operators

A correlated subquery is not the only approach to the problem. If you are familiar with basic set theory, the SQL set operator EXCEPT (only available as MINUS in older Oracle versions, 19 and before) looks like a quicker solution to write down, especially if you're only interested in the IDs of the customers.

SELECT customer_id FROM Customers
EXCEPT
SELECT customer_id FROM Orders;

There is a downside to this approach if you need more than the customer_id value from Customers though. The SQL set operators only work on tables with the exact same number of columns. And these columns must be of compatible types. Since the Customer and Orders tables don't really have other columns in common besides customer_id, to get other information about the customers with no orders, we need another outer query, such as

SELECT * FROM Customers WHERE customer_id IN (
    SELECT customer_id FROM Customers
    EXCEPT
    SELECT customer_id FROM Orders
);

Antijoins using outer joins

There is a third way to write antijoins using left (or right) outer joins and selecting only the rows corresponding to the null values found in the referencing table (Orders in our example) in the so joined tables.

This type of query can also be built visually with the DbSchema query builder.

Demo of  DbSchema Query Builder writing an antijoin as a filtered left-join

The resulting SQL is slightly shorter, but requires a good recall of outer joins in order to comprehend.

SELECT c.customer_id, c.name, c.email, c.country
FROM Customers c
	LEFT OUTER JOIN Orders o ON ( o.customer_id = c.customer_id  )
WHERE o.order_id IS NULL

Semijoins, briefly

You may have noticed already in our DbSchema demos that Exists may also be selected instead of Not exists on the arrows connecting tables in the DbSchema query builder. The resulting queries are obvious counterparts to the antijoins. And they are usually called semijoins. In fact, some authors insist on calling antijoins anti-semijoins to emphasize that the antijoins are not the counterparts of some other kind of join.

The 'semi' prefix in the name of semijoin (and anti-semijoin, if you prefer to call them that) is meant to emphasize that only columns from one of the tables are included in the result of semijoin.

To recap, here is how semijoins and antijoins relate to other kinds of joins, as well as to set operators, in the form of a diagram.

R S A B C
OutputSQL set operators
(match on all attributes )
SQL join operators
(match on some attributes)
AR EXCEPT SR left anti-semijoin S
B (roughly)R INTERSECT SR NATURAL JOIN S or semijoins
(depends on attributes projected)
CS EXCEPT RR right anti-semijoin S
A, BRS LEFT OUTER JOIN R
B, CSS RIGHT OUTER JOIN R
A, B, CR UNION SS FULL OUTER JOIN R

In the summarizing table above, natural joins and semijoins occupy the same cell. This happened because there are in fact two dimensions of the output when it comes to joins: which rows are we interested in and which attributes. For antijoins, it is natural to keep (project on) just the attributes of the table we are subtracting from, since the subtracted table will have null attributes in the matched column. For natural joins and semijoins there are however at least three interesting choices: keep attributes from R, keep attributes from S, or from both.

Output rowsOutput attributesSQL join operators
BA, BR left semijoin S
BB, CR right semijoin S
BA, B, CR NATURAL JOIN S

Threshold counts (generalized quantifiers)

Instead of considering only the existence or non-existence of customers with orders, we may be interested in more nuanced categories, such as customers with few orders, where "few" is defined by a specified numeric threshold. To tackle such a problem, we can easily change the previous query we built in DbSchema query builder to something like the following.

Demo of  DbSchema Query Builder writing a generalized quantifier

Resulting in generated SQL like:

SELECT c.customer_id, c.name, c.email, c.country, count(DISTINCT o.order_id)
FROM Customers c
	LEFT OUTER JOIN Orders o ON ( o.customer_id = c.customer_id  )
GROUP BY c.customer_id, c.name, c.email, c.country
HAVING count(DISTINCT order_id) < 2;

In logic, 'there exists' (at least one) is called a quantifier. More general constraints like "there exist fewer than three" that we're dealing with in this section are called generalized quantifiers.

Finally, it is possible to write a correlated query equivalent of the above.

SELECT  c.customer_id, c.name, c.email, c.country,
        (SELECT COUNT(DISTINCT  o.order_id) FROM Orders o
         WHERE  o.customer_id = c.customer_id) AS order_count
FROM Customers c
WHERE  order_count < 2;

Such correlated subqueries are more difficult to represent graphically because they allow arbitrarily long SQL expressions to appear in column positions.

Conclusion

In this article, we explored antijoins, a common SQL pattern used to find rows in one table that have no matching rows in another. Using the Customers and Orders example, we demonstrated several equivalent approaches, including correlated NOT EXISTS subqueries, set operators such as EXCEPT, and selections on left outer joins. We also showed how several of these techniques can be constructed visually using the DbSchema Query Builder.

Along the way, we briefly introduced the related concept of semijoins and extended the discussion beyond simple existence tests to count-based conditions, such as finding customers with fewer than a specified number of orders. These generalized quantifier queries illustrate how SQL and the DbSchema Query Builder can express a wide range of relationship-based constraints between tables.

While different database systems and optimizers may favor one formulation over another, understanding these alternative representations helps you choose the style that is most readable and maintainable for your use case. Whether you prefer writing SQL by hand or building queries visually in DbSchema, antijoins and semijoins are essential tools for querying relational data effectively.

DbSchema ER Diagrams & Team Collaboration

Visual Desktop Software
DbSchema ER Diagram Try DbSchema
Visual Design & Schema Diagram

✓ Create and manage your database schema visually through a user-friendly graphical interface.

✓ Easily arrange tables, columns, and foreign keys to simplify complex database structures, ensuring clarity and accessibility.

GIT & Collaboration
Version Control & Collaboration

✓ Manage schema changes through version control with built-in Git integration, ensuring every update is tracked and backed up.

✓ Collaborate efficiently with your team to maintain data integrity and streamline your workflow for accurate, consistent results.

Data Explorer & Query Builder
Relational Data & Query Builder

✓ Seamlessly navigate and visually explore your database, inspecting tables and their relationships.

✓ Build complex SQL queries using an intuitive drag-and-drop interface, providing instant results for quick, actionable insights.

Interactive Documentation & Reporting
HTML5 Documentation & Reporting

✓ Generate HTML5 documentation that provides an interactive view of your database schema.

✓ Include comments for columns, use tags for better organization, and create visually reports.