Understanding the difference between standard and legacy SQL in the context of Google Cloud BigQuery
BigQuery is a managed Data Warehouse service on Google Cloud Platform that allows organisations to persist their data as well as analysts to access it and extract valuable information.
While I’ve started working with BigQuery I came across with two terms that were not very clear to me back then, namely Standard and Legacy SQL. These are essentially two dialects supported by BigQuery and have different syntax, semantics and functions.
Legacy vs Standard SQL
In the past, BigQuery used to execute queries using a non-standard SQL dialect, called BigQuery SQL. However, since the release of BigQuery 2.0, the service now supports standard SQL whilst the previous BigQuery SQL was renamed to legacy SQL.
Standard SQL is an ANSI compliant query language. It’s important to mention that at the moment, the preferred dialect for running queries on BigQuery is Standard SQL.
Legacy SQL is still maintained mostly for backward compatibility purposes therefore, it is recommended to migrate to Standard SQL since we expect that at some point in the future, Legacy SQL will be deprecated. Features such as Data Definition Language (DDL) and Data Model Language (DML) are only supported by Standard SQL
The main differences
Every type in legacy SQL has an equivalent in standard SQL (and vice versa) which means that a type in legacy SQL has a different name in standard dialect. For the exact mapping between standard and legacy dialects you can refer to the official documentation.
Additionally, the standard dialect has a smaller range of valid values of type
TIMESTAMP compared to legacy SQL. The former only accepts values in the range in between
0001-01-01 00:00:00.000000 and
The way you can escape characters — such as hyphens — in queries is also different between standard and legacy dialects. In the former, we use backticks (
`) character whilst in the latter square brackets (
Additionally, legacy SQL uses a colon
: as a separator when referencing the project name whilst standard dialect requires a period
Advantages of Standard SQL
As mentioned already, Standard SQL has several advantages over the Legacy SQL dialect. More specifically it supports
- User-defined SQL functions
- Sub-queries in
- Inserts, Updates and Deletes
- More data types such as
- More accurate
COUND(DISTINCT ..)clause (compared to
EXACT_COUNT_DISTINCTof Legacy SQL dialect that used to have many significant limitations)
- Correlated subqueries
- and more complex
For hands-on examples around this functionality of Standard SQL dialect you can refer to the official BigQuery documentation.
Changing the default dialect
The default dialect on BigQuery is Standard SQL. However, this can be changed by including a prefix as part of your SQL queries. If you wish to switch to Legacy SQL you need to include the
#legacySQL prefix before you specify the query. The corresponding prefix for Standard SQL is
Note that these prefixes must precede the query, are case-insensitive and a newline character is expected to be present in between the query and the prefix itself.
As an example, consider the following query that makes use of the Legacy SQL dialect:
BigQuery is definitely among the most popular cloud services on Google Cloud Platform since pretty much every modern organisation needs a managed Data Warehouse service.
It is therefore quite important to take advantage of most of the features available that will help you solve problems efficiently and at scale.
In today’s article, we discussed about one of the most fundamental aspects of BigQuery which is the SQL dialect used in order to run operations of the service. Note that we’ve only covered a small subset of the overall differences between legacy and standard SQL dialects.
If you are still running on Legacy SQL I’d personally highly recommend migrating to the Standard SQL dialect since it is the recommended dialect by Google that also offers more powerful functionality. For the full list of differences, refer to the official guide “Migrating to standard SQL”.
Become a member and read every story on Medium. Your membership fee directly supports me and other writers you read. You’ll also get full access to every story on Medium.
Related articles you may also like
Standard vs Legacy SQL in BigQuery Republished from Source https://towardsdatascience.com/standard-vs-legacy-sql-bigquery-6d01fa3046a9?source=rss----7f60cf5620c9---4 via https://towardsdatascience.com/feed