More

    Standard vs Legacy SQL in BigQuery

    Understanding the difference between standard and legacy SQL in the context of Google Cloud BigQuery

    Photo by Sunder Muthukumaran on Unsplash

    BigQuery is a managed Data Warehouse service on Google Cloud Platform that allows organisations to persist their data as well as analysts to access it and extract valuable information.

    While I’ve started working with BigQuery I came across with two terms that were not very clear to me back then, namely Standard and Legacy SQL. These are essentially two dialects supported by BigQuery and have different syntax, semantics and functions.

    Legacy vs Standard SQL

    In the past, BigQuery used to execute queries using a non-standard SQL dialect, called BigQuery SQL. However, since the release of BigQuery 2.0, the service now supports standard SQL whilst the previous BigQuery SQL was renamed to legacy SQL.

    Standard SQL is an ANSI compliant query language. It’s important to mention that at the moment, the preferred dialect for running queries on BigQuery is Standard SQL.

    Legacy SQL is still maintained mostly for backward compatibility purposes therefore, it is recommended to migrate to Standard SQL since we expect that at some point in the future, Legacy SQL will be deprecated. Features such as Data Definition Language (DDL) and Data Model Language (DML) are only supported by Standard SQL

    The main differences

    Every type in legacy SQL has an equivalent in standard SQL (and vice versa) which means that a type in legacy SQL has a different name in standard dialect. For the exact mapping between standard and legacy dialects you can refer to the official documentation.

    Additionally, the standard dialect has a smaller range of valid values of type TIMESTAMP compared to legacy SQL. The former only accepts values in the range in between 0001-01-01 00:00:00.000000 and9999-12-31 23:59:59.999999.

    The way you can escape characters — such as hyphens — in queries is also different between standard and legacy dialects. In the former, we use backticks (`) character whilst in the latter square brackets ([]).

    Additionally, legacy SQL uses a colon : as a separator when referencing the project name whilst standard dialect requires a period .

    #standardSQL
    SELECT *
    FROM `bigquery-public-data.samples.shakespeare`;
    #legacySQL
    SELECT *
    FROM `bigquery-public-data:samples.shakespeare`;

    It is also worth mentioning that the standard dialect does not support table decorators and some other wildcard functions.

    Advantages of Standard SQL

    As mentioned already, Standard SQL has several advantages over the Legacy SQL dialect. More specifically it supports

    • WITH clauses
    • User-defined SQL functions
    • Sub-queries in SELECT and WHERE clauses
    • Inserts, Updates and Deletes
    • More data types such as ARRAY and STRUCT
    • More accurate COUND(DISTINCT ..) clause (compared to EXACT_COUNT_DISTINCT of Legacy SQL dialect that used to have many significant limitations)
    • Correlated subqueries
    • and more complex JOIN predicates

    For hands-on examples around this functionality of Standard SQL dialect you can refer to the official BigQuery documentation.

    Changing the default dialect

    The default dialect on BigQuery is Standard SQL. However, this can be changed by including a prefix as part of your SQL queries. If you wish to switch to Legacy SQL you need to include the #legacySQL prefix before you specify the query. The corresponding prefix for Standard SQL is #standardSQL.

    Note that these prefixes must precede the query, are case-insensitive and a newline character is expected to be present in between the query and the prefix itself.

    As an example, consider the following query that makes use of the Legacy SQL dialect:

    #legacySQL
    SELECT
    weight_pounds,
    state,
    year,
    gestation_weeks
    FROM
    [bigquery-public-data:samples.natality]
    ORDER BY
    weight_pounds DESC
    LIMIT
    10;

    Final Thoughts

    BigQuery is definitely among the most popular cloud services on Google Cloud Platform since pretty much every modern organisation needs a managed Data Warehouse service.

    It is therefore quite important to take advantage of most of the features available that will help you solve problems efficiently and at scale.

    In today’s article, we discussed about one of the most fundamental aspects of BigQuery which is the SQL dialect used in order to run operations of the service. Note that we’ve only covered a small subset of the overall differences between legacy and standard SQL dialects.

    If you are still running on Legacy SQL I’d personally highly recommend migrating to the Standard SQL dialect since it is the recommended dialect by Google that also offers more powerful functionality. For the full list of differences, refer to the official guide “Migrating to standard SQL”.

    Become a member and read every story on Medium. Your membership fee directly supports me and other writers you read. You’ll also get full access to every story on Medium.

    Related articles you may also like

    Standard vs Legacy SQL in BigQuery Republished from Source https://towardsdatascience.com/standard-vs-legacy-sql-bigquery-6d01fa3046a9?source=rss----7f60cf5620c9---4 via https://towardsdatascience.com/feed

    Recent Articles

    spot_img

    Related Stories

    Stay on op - Ge the daily news in your inbox