The PostgreSQL CROSS JOIN
clause is a type of join that returns the Cartesian product of two tables.
What is a Cartesian product?
A Cartesian product is the result of combining every entry in one dataset with every entry in another. If dataset A has m
entries and dataset B has n
, the Cartesian product will have m*n
combinations. For instance, if m = 3
and n = 3
, the Cartesian product will have 3 * 3 = 9
combinations.
The CROSS JOIN
generates every possible combination between two datasets. Some examples include creating a list of clothing items and their variations for a clothing store or pairing students with all available subjects.
Let’s see an example using a PostgreSQL database.
Note: This article assumes you have a basic understanding of SQL queries and access to a PostgreSQL database. The queries are being executed using TablePlus.
Suppose we have an e-commerce store and want to store basic product details separately from the available options. We can do this using two tables: products to store the product details and product_options for the different product options.
The following SQL queries will create and populate the tables:
Create a products table.
CREATE TABLE Products ( id INT PRIMARY KEY, name VARCHAR(255) );
Create a product_options table.
CREATE TABLE Product_options ( id INT PRIMARY KEY, option VARCHAR(255) );
Populate the products table.
INSERT INTO products (id, name) VALUES(1, 'UrbanWave Slim-Fit Jeans'), (2, 'VelvetBliss Maxi Dress'), (3, 'StrideFlex Running Shoes');
Populate the product_options table.
INSERT INTO product_options (id, option) VALUES(1, 'Size X'), (2, 'Size XL'), (3, 'Color Red'), (4, 'Color Yellow'), (5, 'Material Cotton'), (6, 'Material Wool');
You should now have two tables - products and product_options - with three and six entries, respectively.
products:
product_options:
For this e-commerce store, we want to display a list of all possible product combinations, including the product id, name and option. We can achieve this with a query using the CROSS JOIN
statement.
Run the statement below:
SELECT
products.*, product_options.option
FROM
products
CROSS JOIN product_options;
This will return a cartesian product of the two tables.
The returned table contains 18 rows (3×6), where each row combines an entry from the products table (id, name) with an entry from the product_options table (options). For example, we can see that UrbanWave Slim-Fit Jeans
has an entry for every option in the product_options table and vice-versa.
It is important to note that using a cross-join clause can have a significant impact on the database performance as the query will go through the second table every time for each entry on the first table. If one or both tables has entries entering millions, this could significantly slow doe the performance.
CROSS-JOIN Empty tables
One observation I made while experimenting with the CROSS JOIN
clause is that it returns an empty set if either of the two datasets is empty. This is expected because if either dataset has zero rows, the Cartesian product (m\n*) will also have zero rows. However, it would be useful to have a workaround for cases where one dataset is empty but you still want to see the available data from the non-empty dataset. For example, if I have products but no options added yet, it would be helpful if my CROSS JOIN
query could be modified to return the list of products as is.
My current approach involves using separate queries: one to select all products, another to select all product options, and a CROSS JOIN
to generate the Cartesian product, which returns an empty set if one of the tables is empty.
It is important to note that using a CROSS JOIN
can significantly impact database performance, as the query processes every entry in the second table for each entry in the first table. If either or both tables contain millions of entries, this can greatly reduce performance and slow down the query execution.
Have you used the CROSS JOIN
clause before? Feel free to share any insights or ideas in the comments!