Connect to a Custom SQL Query
For most databases, you can connect to a specific query rather than the entire data set. Because databases have slightly different SQL syntax from each other, the custom SQL you use to connect to one database might be different from the custom SQL you might use to connect to another. However, using custom SQL can be useful when you know exactly the information you need and understand how to write SQL queries.
Common reasons to use custom SQL include unioning your data across tables, recasting fields to perform cross-database joins, and restructuring or reducing the size of your data for analysis.
For Excel and text file data sources, this option is available only in workbooks that were created before Tableau Desktop 8.2 or when using Tableau Desktop on Windows with the legacy connection. To connect to Excel or text files using the legacy connection, connect to the file, and in the Open dialog box, click the Open drop-down menu, and then select Open with Legacy Connection.
NOTE: Beginning with Tableau 2020.2, legacy Excel and Text connections are no longer supported. See the Legacy Connection Alternatives document in Tableau Community for alternatives to using the legacy connection.
After connecting to your data, double-click the New Custom SQL option on the Data Source page.
Type or paste the query into the text box. The query must be a single SELECT statement.
When finished, click OK.
When you click OK, the query runs and the custom SQL query table appears in the logical layer of the canvas. Only relevant fields from the custom SQL query display in the data grid on the Data Source page.
For more information about the logical and physical layers of the canvas, see The Tableau Data Model.
Combine your tables vertically (union)
If you need to append the rows of one table to another, you can use the union option in the physical layer of the canvas in Tableau. If your database does not support this option, you can use custom SQL instead.
For example, suppose you have the following two tables: November and December.
NOVEMBER | DECEMBER |
---|---|
You can use the following custom SQL query to append the second table, December, to the first table, November:
SELECT * FROM November UNION ALL SELECT * FROM December
The result of the query looks like this in the data grid:
For more information about the union option, see Union Your Data.
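One way to check a union query before pasting it into Tableau is to run it against a small test database. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database; the column names (Month, Sales) and the sample rows are hypothetical, because the contents of the November and December tables are not shown here. Note that UNION ALL requires both SELECT statements to return the same number of columns.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for your real database.
conn = sqlite3.connect(":memory:")

# Hypothetical columns and rows; the real November/December tables may differ.
conn.executescript("""
    CREATE TABLE November (Month TEXT, Sales INTEGER);
    CREATE TABLE December (Month TEXT, Sales INTEGER);
    INSERT INTO November VALUES ('November', 100), ('November', 150);
    INSERT INTO December VALUES ('December', 200);
""")

# Same query as in the example: append December's rows to November's rows.
union_rows = conn.execute(
    "SELECT * FROM November UNION ALL SELECT * FROM December"
).fetchall()

for row in union_rows:
    print(row)
```

UNION ALL keeps every row from both tables, including duplicates, so the result here has three rows.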
Change the data type of a field to do a cross-database join
When you want to perform a join between two tables in the physical layer of the canvas, the data type of the fields you join on must be the same. When the data types of the fields differ, you can use custom SQL to change the data type of (cast) the field before performing the join.
For example, suppose you want to join two tables, Main and Sub, using the Root and ID fields, respectively. The Root field is a number type and the ID field is a string type. You can use the following custom SQL query to change the data type of Root from a number to a string so that you can join the Main and Sub tables using the Root and ID fields.
SELECT [Main].[Root] AS [Root_Number],
CAST([Main].[Root] AS VARCHAR(50)) AS [Root_String]
FROM [Main]
The result of this query shows the original Root field and the Root field cast as a string.
For more information about joins and cross-database joins, see Join Your Data.
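The cast can be tried out on sample data before it is used in a data source. The sketch below again assumes SQLite (through Python's sqlite3 module) as a stand-in database, so the string type is written as TEXT; the type name your database expects may differ. The Label column and the sample rows are hypothetical. The join is written in SQL here only to confirm that the cast values match the string IDs; in Tableau you would create the join on the canvas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data: Root is numeric, ID is a string.
conn.executescript("""
    CREATE TABLE Main (Root INTEGER);
    CREATE TABLE Sub (ID TEXT, Label TEXT);
    INSERT INTO Main VALUES (1), (2);
    INSERT INTO Sub VALUES ('1', 'one'), ('2', 'two');
""")

# Keep the original numeric field and add a string version of it.
cast_rows = conn.execute("""
    SELECT Main.Root AS Root_Number,
           CAST(Main.Root AS TEXT) AS Root_String
    FROM Main
    ORDER BY Main.Root
""").fetchall()
print(cast_rows)

# Join on the string version so that both join fields have the same type.
joined_rows = conn.execute("""
    SELECT m.Root_String, s.Label
    FROM (SELECT CAST(Root AS TEXT) AS Root_String FROM Main) AS m
    JOIN Sub AS s ON m.Root_String = s.ID
    ORDER BY m.Root_String
""").fetchall()
print(joined_rows)
```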
Reduce the size of your data
When working with very large data sets, sometimes you can save time while working with your data if you reduce its size first.
For example, suppose you have a large table called FischerIris. You can use the following custom SQL query to retrieve only the specified columns and records, thereby reducing the size of the data set that you connect to from Tableau.
SELECT
[FischerIris].[Species] AS [Species],
[FischerIris].[Width] AS [Petal Width],
COUNT([FischerIris].[ID]) AS [Num of Species]
FROM [FischerIris]
WHERE [FischerIris].[Organ] = 'Petal'
AND [FischerIris].[Width] > 15.0000
GROUP BY [FischerIris].[Species], [FischerIris].[Width]
Restructure your data (pivot)
In some cases, you might be working with a table that needs to be restructured before analysis. Though this type of task can be done in the physical layer of the canvas in Tableau by using options like pivot, your database might not support it. In this case, you can use custom SQL instead.
For example, suppose you have the following table:
To change its structure and optimize your data for analysis in Tableau, you can use the following custom SQL query:
SELECT Table1.[Season ID] AS [Season ID],
Table1.[Items - Don't like] AS [Quantity],
'Don''t Like' AS [Reason]
FROM Table1
UNION ALL
SELECT Table1.[Season ID] AS [Season ID],
Table1.[Items - Defective] AS [Quantity],
'Defective' AS [Reason]
FROM Table1
UNION ALL
SELECT Table1.[Season ID] AS [Season ID],
Table1.[Items - Too big] AS [Quantity],
'Too Big' AS [Reason]
FROM Table1
UNION ALL
SELECT Table1.[Season ID] AS [Season ID],
Table1.[Items - Too small] AS [Quantity],
'Too Small' AS [Reason]
FROM Table1
The result of the query looks like this in the data grid:
For more information about the pivot option, see Pivot Data from Columns to Rows.
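To see how a UNION ALL pivot turns columns into rows, it can help to run it on a tiny table. The sketch below assumes SQLite (through Python's sqlite3 module) as a stand-in database, which happens to accept bracketed identifiers; the two sample rows are hypothetical. Each SELECT contributes one row per season for a single reason, so a table with N rows and four reason columns produces 4 × N rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Wide table: one column per return reason. Sample values are hypothetical.
conn.execute("""
    CREATE TABLE Table1 (
        [Season ID] INTEGER,
        [Items - Don't like] INTEGER,
        [Items - Defective] INTEGER,
        [Items - Too big] INTEGER,
        [Items - Too small] INTEGER
    )
""")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?)",
    [(1, 5, 2, 1, 0), (2, 3, 0, 4, 2)],
)

# One SELECT per reason column, stacked with UNION ALL.
pivot_sql = """
SELECT [Season ID], [Items - Don't like] AS [Quantity], 'Don''t Like' AS [Reason] FROM Table1
UNION ALL
SELECT [Season ID], [Items - Defective] AS [Quantity], 'Defective' AS [Reason] FROM Table1
UNION ALL
SELECT [Season ID], [Items - Too big] AS [Quantity], 'Too Big' AS [Reason] FROM Table1
UNION ALL
SELECT [Season ID], [Items - Too small] AS [Quantity], 'Too Small' AS [Reason] FROM Table1
"""
pivot_rows = conn.execute(pivot_sql).fetchall()

for row in pivot_rows:
    print(row)
```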
Combine (join) and aggregate your data
If you need to combine tables and aggregate your data, you can use both a join and default aggregation type options in the physical layer of the canvas in Tableau. In some cases you might need to use custom SQL instead.
For example, suppose you have the following two tables: Orders and Vendors.
ORDERS | VENDORS |
---|---|
You can use the following custom SQL query to count the number of orders and do a left join on the Orders and Vendors tables:
SELECT Vendors.Name, COUNT(Orders.[Order]) AS [Number Of Orders]
FROM Orders
LEFT JOIN Vendors
ON Orders.VendorID=Vendors.VendorID
GROUP BY Name;
The result of the query looks like this:
For more information about joins, see Join Your Data.
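The behavior of the left join combined with the aggregation can be checked on a few sample rows. This sketch assumes SQLite (through Python's sqlite3 module) as a stand-in database; the vendor names and order numbers are hypothetical. Because Orders is the left table, an order whose VendorID has no match in Vendors is still counted, under a null vendor name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data. Order 104 refers to a vendor that does not exist.
conn.executescript("""
    CREATE TABLE Vendors (VendorID INTEGER, Name TEXT);
    CREATE TABLE Orders ([Order] INTEGER, VendorID INTEGER);
    INSERT INTO Vendors VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Orders VALUES (101, 1), (102, 1), (103, 2), (104, 3);
""")

# Count orders per vendor name; ORDER BY is added only for stable output.
order_counts = conn.execute("""
    SELECT Vendors.Name, COUNT(Orders.[Order]) AS [Number Of Orders]
    FROM Orders
    LEFT JOIN Vendors ON Orders.VendorID = Vendors.VendorID
    GROUP BY Vendors.Name
    ORDER BY Vendors.Name
""").fetchall()

for name, count in order_counts:
    print(name, count)
```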
Errors when duplicate columns are referenced
If your custom SQL query references duplicate columns, you may get errors when trying to use one of the columns in your analysis in Tableau. This will happen even if the query is valid. For example, consider the following query:
SELECT * FROM authors, titleauthor WHERE authors.au_id = titleauthor.au_id
The query is valid, but the au_id field is ambiguous because it exists in both the “authors” table and the “titleauthor” table. Tableau will connect to the query, but you will get an error any time you try to use the au_id field, because Tableau doesn’t know which table you are referring to.
Note: It is a best practice to define column aliases with an AS clause whenever possible in a Custom SQL Query. This is because each database has its own rules when it comes to automatically generating a column name whenever an alias is not used.
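A duplicate column name can be spotted, and then removed with aliases, by inspecting the column names a query returns. The sketch below assumes SQLite (through Python's sqlite3 module) as a stand-in database; the au_lname and title_id columns, the sample rows, and the alias names are hypothetical. How an unaliased duplicate is named depends on the database, which is why explicit AS aliases are the safer choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both tables contain a column named au_id. Sample data is hypothetical.
conn.executescript("""
    CREATE TABLE authors (au_id TEXT, au_lname TEXT);
    CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
    INSERT INTO authors VALUES ('A1', 'Ringer');
    INSERT INTO titleauthor VALUES ('A1', 'T1');
""")

# SELECT * returns au_id twice, once from each table.
cur = conn.execute(
    "SELECT * FROM authors, titleauthor WHERE authors.au_id = titleauthor.au_id"
)
ambiguous_names = [col[0] for col in cur.description]
print(ambiguous_names)

# Listing the columns with AS aliases gives every field a unique name.
cur = conn.execute("""
    SELECT authors.au_id        AS author_id,
           authors.au_lname     AS last_name,
           titleauthor.au_id    AS titleauthor_author_id,
           titleauthor.title_id AS title_id
    FROM authors, titleauthor
    WHERE authors.au_id = titleauthor.au_id
""")
aliased_names = [col[0] for col in cur.description]
print(aliased_names)
```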
To edit a custom SQL query
- On the data source page, in the canvas, double-click the custom SQL query in the logical layer.
- Hover over the custom SQL table in the physical layer until the arrow displays.
- Click the arrow and then select Edit Custom SQL Query.
- In the dialog box, edit the custom SQL query.
To change a custom SQL query name
When you drag a custom SQL query to the logical layer of the canvas, Tableau gives it a default name: Custom SQL Query, Custom SQL Query1, and so on. You can change the default name to something more meaningful.
- On the data source page, in the logical layer of the canvas, select the drop-down arrow in the custom SQL query table and select Rename.
- Enter the name you want to use for your custom SQL query.
Use parameters in a custom SQL query
You can use parameters in a custom SQL query statement to replace a constant value with a dynamic value. You can then update the parameter in the workbook to modify the connection. For example, you may connect to a custom SQL query that provides web traffic data for a particular page that is specified by a pageID. Instead of using a constant value for the pageID value in the SQL query, you can insert a parameter. Then after finishing the connection, you can show a parameter control in the workbook. Use the parameter control to switch out the pageID and pull in data for each page of interest without having to edit or duplicate the connection.
In Tableau Desktop, you can create a parameter directly from the Custom SQL dialog box or use any parameters that are part of the workbook. If you create a new parameter, it becomes available for use in the workbook just like any other parameter. See Create Parameters to learn more.
For web authoring (in Tableau Online or Tableau Server), you can use an existing parameter published from Tableau Desktop. You cannot create a new parameter in web authoring.
To add a parameter to a custom SQL query
- On the data source page, in the canvas, hover over the table until the edit icon displays, and then click the edit button.
- At the bottom of the dialog box, click Insert Parameter.
- Select a constant value in the SQL statement and then, from the Insert Parameter drop-down menu select the parameter you want to use instead. If you have not created a parameter yet, select Create a new parameter. Follow the instructions in Create Parameters to create a parameter.
Note: Parameters can only replace literal values. They cannot replace expressions or identifiers such as table names.
In the example below, the custom SQL query returns all orders that are marked as Urgent priority. In the custom SQL statement, the order priority is the constant value. If you want to change the connection to see the High priority orders, you would have to edit the data source.
Instead of creating and maintaining many variations of the same query, you can replace the constant order priority value with a parameter. The parameter should contain all of the possible values for Order Priority.
After you create a parameter, you can insert it into the SQL statement to replace the constant value.
After you finish editing the connection, the new parameter is listed in the Parameters area at the bottom of the Data pane and the parameter control displays on the right side of the view. As you select different values, the connection updates.
Note: If you are using an extract, you must refresh the extract in order to reflect changes to the parameter. Publishing a data source that uses Custom SQL parameters includes the parameters. The parameters are transferred to any workbooks that connect to the data source.
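The idea of swapping a constant for a parameter can be illustrated outside Tableau with an ordinary parameterized query. The sketch below is an analogy only: it uses the `?` placeholder of Python's sqlite3 module, not Tableau's own parameter mechanism, and the table layout, sample rows, and the orders_with_priority helper are hypothetical. As with Tableau parameters, the placeholder stands in for a literal value and cannot replace a table or column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical orders table with a priority column.
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER, OrderPriority TEXT);
    INSERT INTO Orders VALUES (1, 'Urgent'), (2, 'High'), (3, 'Urgent');
""")


def orders_with_priority(priority):
    """Return the orders with the given priority (hypothetical helper).

    The ? placeholder takes the place of a constant such as 'Urgent',
    so one query text serves every priority value.
    """
    return conn.execute(
        "SELECT OrderID, OrderPriority FROM Orders "
        "WHERE OrderPriority = ? ORDER BY OrderID",
        (priority,),
    ).fetchall()


urgent_orders = orders_with_priority("Urgent")
high_orders = orders_with_priority("High")
print(urgent_orders)
print(high_orders)
```

Changing the value passed in plays the same role as selecting a different value in the parameter control: the query text stays the same and only the literal changes.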
Tableau Catalog support for custom SQL
Starting in 2019.3, Tableau Catalog is available as part of the Data Management offering for Tableau Server and Tableau Online. For more information about Tableau Catalog, see "About Tableau Catalog" in the Tableau Server or Tableau Online Help.
Supported queries
Catalog supports custom SQL queries that meet the ANSI SQL-2003 standard, with three known exceptions:
- Time zone expressions
- Multiset expressions
- Tableau parameters
Starting in 2021.4, Tableau Catalog also supports use of the Transact-SQL (T-SQL) dialect in Custom SQL, with the following exceptions:
- Hints
- FOR clauses
- OPENROWSET, OPENXML, and OPENJSON functions
- ODBC scalar functions
- FOR SYSTEM_TIME
- TABLESAMPLE
- MATCH expression
- CONTAINS expression
- FREETEXT expression
Supported features and functions
Catalog supports the following additional functionality for data sources, workbooks, and flows with connections that use the MySQL or PostgreSQL drivers (for example, Amazon Aurora for MySQL, Amazon Redshift, Pivotal Greenplum Database, MemSQL, Denodo, and others):
- MySQL GROUP_CONCAT function
- PostgreSQL arrays
- PostgreSQL EXTRACT() function
Other custom SQL scenarios and functionality might work, but Tableau doesn't specifically test for or support them.
Supported lineage
When an asset uses custom SQL, a message with a Show Custom SQL Query button appears on the Lineage tab of the asset page. Click the button to see the custom SQL used in the connection. Then, if you would like to copy the custom SQL to your clipboard, click Copy.
Some types of custom SQL can cause the upstream lineage to be incomplete. When this happens, a message appears with that information. Field details cards might not contain links to connected columns, or might not show any connected columns at all. Column details cards might not contain links to fields that use the column, or might not show any fields at all.
If you are examining a table’s lineage, note that Catalog doesn't support showing column information in the lineage for table metadata gathered using custom SQL. However, if other assets use the same table and don’t use custom SQL, Tableau Catalog might be able to display information about the columns that it has discovered through these other assets.
In the following screenshot, the factAccountOpportunityByQuarter table was indexed because it’s used by a data source. However, because it’s referenced by a custom SQL query, the column information isn't available.
When more than one data source, workbook, or flow uses a table, any assets downstream from that table that use a custom SQL query are excluded when column-level filters are applied. As a result, fewer downstream assets show in the lineage than are actually used.