We were excited to see the power of BigQuery receive a further boost in late 2020 with the release of 12 new BigQuery SQL functions. Google Cloud described these as “user-friendly SQL capabilities”.
What are SQL functions in BigQuery?
SQL functions are user-defined rules or requests used within the BigQuery platform to return results from large data sets. Using these functions makes it quicker for users to gather the insights they need from often complicated or overly large sources of data.
Try these 12 BigQuery SQL functions
So, let’s take a look at what’s now possible with these new SQL features.
1. Add table columns via DDL
New to BigQuery is the ability to add new columns via the ALTER TABLE DDL statement. This is something data professionals with a background in traditional on-prem database platforms would expect as standard, so nice to see that Google Cloud has acknowledged this.
We like the syntax for only adding a column if one doesn’t already exist, handy for idempotent deployments. Full support for records too. Learn more from the BigQuery documentation. |
2. TRUNCATE table
TRUNCATE TABLE is now supported, which will please those from an on-prem background. Unlike a DELETE DML statement in BigQuery which incurs a scan cost, TRUNCATE falls under the growing list of BigQuery free operations. We will certainly be using this one a lot.
|
To reduce query scan costs we recommend using TRUNCATE over delete whenever you need to delete the entire contents of a table. |
3. Unicode table naming
BigQuery now supports Unicode table naming. Acknowledging multi-language support, table names are no longer just restricted to letters, numbers and underscores.
Be careful with table naming and try and make sure they are consistent, readable and follow a naming convention within your organisation.
|
4. BigQuery federated tables just got (even) better
At Ancoris we love federated (EXTERNAL) tables in BigQuery; they act as a powerful, zero-data-engineering approach to easily ingest files from Google Cloud Storage, including commonly seen formats such as JSON and CSV.
Great to see that you can now create external tables using a DDL statement.
|
Here’s an example to create a table that reads csvs from two different storage URIs:
|
To learn more, see these examples of creating a federated table. We plan to do a tech how-to to over the coming weeks to show just what is possible.
5. EXPORT DATA from BigQuery to Google Cloud Storage using SQL
This is a little buried in the release notes, but in our opinion is a really powerful feature. You can now export the results of a BigQuery query to Google Cloud Storage using the new EXPORT DATA SQL command; all of the Bigquery supported data formats and compression types are supported.
Here’s an example taken from the BigQuery EXPORT DATA documentation.
|
Great to see there are no egress costs; you only pay for any BigQuery data scanned and data stored in GCS.
|
This appears to reinforce Google Cloud’s aspirations of reducing the need for complex data engineering, something we (and many of our clients) share. If you couple this with, say, a Cloud Function triggered from the target GCS bucket, this could provide a simple mechanism for delivering data into data pipelines outside of BigQuery.
6. EXECUTE IMMEDIATE (dynamic SQL)
BigQuery now supports the execution of dynamic SQL. The syntax is very familiar, especially to those from a MS SQL background.
|
See BigQuery dynamic SQL support for further examples.
As a general rule, we are very cautious of using dynamic SQL and we try and avoid it where possible (mainly because of the risk of SQL injection and it tends to make code more difficult to read and debug).
|
7. Authorized User-Defined Functions (UDFs)
For those of you not familiar with BigQuery UDFs, these are a scalar function in BigQuery that allow you to operate over data using either SQL or Javascript (and associated Javascript libraries).
BigQuery now supports authorized UDFs, which allows authorized consumers (via IAM) to query tables in the dataset, even if the user who calls the UDF does not have access to those tables. Those of you from an MS SQL background, this is similar to permissioning say, a table-value function in SQL Server; often these are used as a security layer over (locked down) tables and enforce row/column level security.
8. Duplicate column names in query results
BigQuery will now allow you to select the same (unaliased) column more than once in a query by appending a suffix of _n where n is the number of duplications observed.
9. New BigQuery LAST_DAY DATE function
BigQuery already has excellent support for DATE manipulations using SQL. It looks like in this release, Google Cloud has recognised a common business-lead use case; finding the last date for a given period e.g. the last day of the month or last day of the week.
The new LAST_DAY function has intuitive syntax, and is easy to read:
|
Note by default, this will return the next Saturday because weeks start on Sunday in BigQuery. To force the week to start on Monday: |
select last_day(’2020-10-23’, |
10. Date arithmetic in BigQuery
We are used to using data_add and date_sub functions for doing date arithmetic, but you can now do this using the + and - operators.
|
11. New BigQuery string functions
A number (14 in all) of new BigQuery SQL functions for manipulating strings. Here are the ones we particularly like:
BigQuery concatenation operator “||”
As a team we think this method for concatenating strings is easier to read than the concat function.
|
BigQuery INSTR
This returns the index of a search value in a string. Note the first character in the string is indexed as 1 and 0 is returned if the string is not found.
The third argument is the position and denotes where to start the search. The default is 1. If a negative number is supplied, this denotes to start -n chars from the end of the string.
The last argument is the occurrence and if supplied specifies the occurrence of the string to search for e.g. 2 implies to find the index of the second occurrence of the search value.
|
BigQuery Soundex
This made our list because we all thought it’s pretty neat. This function returns the phonetic of a word represented as a soundex code (alpha followed by 3 digits). The algorithm was first patented in 1918 and is useful in fuzzy logic matching, for example to detect typos of people’s name:
|
12. Expanded INFORMATION SCHEMA
INFORMATION_SCHEMA, specified in the ANSI SQL standard, allows users to query the metadata about various objects or entities that contain or reference data, such as tables, views, stored procedures and user-defined functions. Here is the BigQuery INFORMATION_SCHEMA documentation.
BigQuery has made this easier with the introduction of a number of new objects that cover tables, views, routines (stored procedures, UDFs) and datasets.
|
Note the tables are uppercase (BigQuery is case sensitive). |
Our opinion
In summary, a really good addition to the BigQuery SQL arsenal and we will certainly be using this on client projects. Some of these topics we will cover in our upcoming Data how-to series.
Read the official release notes