Tableau 2020.2 has recently been released, and the feature I’m most excited about is Relationships. If you’ve ever have to work with a data source that contains a many-to-many relationship, or multiple one-to-many relationships, you would have encountered the problem of data duplication, and having to resort to LOD or COUNTD calculations to report measures in the correct level-of-detail. Relationships allow us to avoid these challenges.

You may have already come across several blogs that provides an introduction to relationships. This post aims to do the same, but from a slightly different angle. Instead of just explaining what Tableau is doing, we are going to do a little bit deeper to see how Tableau deals with relationships. To do so, we are going to use the performance recorder to analyse the SQL that Tableau issues to the data source.

If you’re not familiar with the terminology yet (e.g. what is a logical layer vs. a physical layer), I recommend reading through the ‘Layers of the data model’ and ‘Understanding the data model’ sections in Tableau’s documentation first.

The Problem with Joins

To follow along with the examples, you can download the Bookshop data set on Tableau’s website, which also describes how you should build up the data model, although we will only use a few of the sheets here:

  • Book
  • Award
  • Checkouts
  • Ratings
  • Edition
  • Sales Q1 / Q2 / Q3 / Q4

Our main problem here is that our data set consists of multiple one-to-many relationships. For example, a book can have multiple awards, but also multiple checkouts, ratings and sales. If we join these sheets in our data model, we end up with a lot of duplication. Just to make sure we’re on the same page, let’s go ahead and create a data model using traditional joins anyway.

Note that Tableau defaults to the logical layer from version 2020.2 onwards. This is the layer where we build relationships. To build joins instead, we need to access the physical layer by double-clicking on a logical table.

Click on the GIF for a higher-resolution image.

If we build out our joins based on the tables above:

We will end up with a data source with more than 1 billion rows:

This is obviously not what we want. We also can’t report on the measures, such as number of sales or number of ratings without resorting to LODs or count distinct measures, which is complex to build and bad for performance.

Relationships

This is the type of problem relationships are designed to solve. Building relationships is similar to building joins in previous versions of Tableau. Drag the tables onto the canvas and select the columns in the relationships. One important difference is that you don’t have to define if you want an inner, left, right or full join. You will have access to the full data in each logical table in the model, and what data you see in your viz is dependent on which dimensions and measures you use. We will look at this in more detail in our next post.

Click on the GIF for a higher-resolution image.

Let’s have a look at the Data Preview pane. You’ll notice that each logical table has its own data set. This is different to joins, where we end up with a single data set with all the columns. This separation of logical tables is something we’ll see in other areas of the user interface. For example, if we go to a worksheet and examine the data pane, we see that the data is organized by logical table first, and each logical table has its own set of dimensions and measures. There is also not a single overall [Number of records] field anymore; instead each logical table now has its own (Count) measure.

We can drag measures from the different logical tables into a single worksheet, and Tableau will now report the measures on the correct level-of-detail.

That is awesome, but what is Tableau really doing when we use relationships? To answer that question, we’ll create a performance recording and refresh the query. In the Performance Summary worksheet, make sure to adjust the slider at the top to include events that took less than 0.01 sec to finish. If we then only keep the ‘Executing Query’ events, we see that Tableau issues 5 different queries to the data source.

Instead of joining all the tables first before returning the aggregated measures, Tableau now issues a separate query to each of the logical tables. Each of these queries will then return an aggregated result set on the correct level-of-detail, before they are merged in Tableau. This, in essence, is what happens when you use relationships, and it is how Tableau avoids duplicating the data.

Since we are not using any dimension right now in our viz, the queries are now very simple. For example, this is the query to determine the number of sales:

 SELECT SUM(1) AS "cnt:Sales Q1_70FA8C1B42CD4457AA0864B75255D71A:ok"
 FROM "TableauTemp"."Sales Q1+" "Sales Q1+"
 HAVING (COUNT(1) > 0)

These queries get more complex when we start using dimensions in our view. For example, let’s add the field [Title] from the Book logical table and look at the Sales query.

 SELECT "t0"."Title" AS "Title",
  SUM(1) AS "cnt:Sales Q1_70FA8C1B42CD4457AA0864B75255D71A:ok"
 FROM "TableauTemp"."Sales Q1+" "Sales Q1+"
  INNER JOIN (
   SELECT "Sales Q1+"."ISBN" AS "ISBN (Sales Q1+)",
     "Book"."Title" AS "Title"
   FROM "TableauTemp"."Sales Q1+" "Sales Q1+"
    LEFT JOIN "TableauTemp"."Edition$" "Edition" ON ("Sales Q1+"."ISBN" = "Edition"."ISBN")
    LEFT JOIN "TableauTemp"."Book$" "Book" ON ("Edition"."BookID" = "Book"."BookID")
   GROUP BY 1,
     2
 ) "t0" ON ("Sales Q1+"."ISBN" IS NOT DISTINCT FROM "t0"."ISBN (Sales Q1+)")
 GROUP BY 1

The subquery "t0" contains the part which we are interested in. Because we are using the Title dimension from the Book logical table, and the Count measure from the Sales logical table, the query now has to pick up information from multiple tables. The joins within the subquery is then what we have defined as relationships in our model.

The query above may be a bit more complex than what you expected. This is because Tableau assumes there is a many-to-many relationship by default. Therefore Tableau runs the query in two steps. The subquery "t0" does not contain any measure, but instead returns all the unique ISBN – Title combinations. The query then retrieves the Sales for each of the Titles. This works, but this subquery can cause performance issues if we have a very large data set, especially if the number of combinations in our "t0" subquery is also very large.

However, all the relationships in our model can be defined as a one-to-many relationship, e.g. a books can have multiple editions (ISBNs), but each edition (ISBN) can only be related to a single book, etc. So let’s go ahead and change the cardinality for all the relationships.

Click on the GIF for a higher-resolution image.

After we changed the cardinality settings, this is what the Sales query looks like:

 SELECT "Book"."Title" AS "Title",
  SUM(1) AS "cnt:Sales Q1_70FA8C1B42CD4457AA0864B75255D71A:ok"
 FROM "TableauTemp"."Sales Q1+" "Sales Q1+"
  LEFT JOIN "TableauTemp"."Edition$" "Edition" ON ("Sales Q1+"."ISBN" = "Edition"."ISBN")
  LEFT JOIN "TableauTemp"."Book$" "Book" ON ("Edition"."BookID" = "Book"."BookID")
 GROUP BY 1

Since we have a one-to-many relationship (or more accurately, a many-to-one relationship if we start from the Sales table), Tableau now understands that using straightforward joins will not duplicate the Sales data. You should obviously only change the setting if you are sure you have a one-to-many relationship. If this setting is incorrect, you will get incorrect results due to data duplication. So if you’re unsure about the cardinality, leave it in the default setting.

Summary

Relationships in Tableau help us to avoid data duplication and to report measures in the correct level of detail when we have a many-to-many relationship, or multiple one-to-many relationships. Tableau does this by issuing separate queries to each of the logical tables that we use in our viz. Once you grasp this brilliantly simple concept, relationships become much less of a mystery, and we can feel more confident when building our models and exploring different use cases.

In our next post, we will explore in more detail how relationships work in the front end. An important concept we’ll explore is how Tableau deals with INNER / OUTER / FULL joins, so stay tuned!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.