There are a lot of things to like about Snowflake, such as its ease of use, scalability and performance. I also like the many functionalities packed into its SQL dialect. One of my favourites so far is the ability to easily query semi-structured data (e.g. JSON, XML).


For this tutorial, we will use the weather data sets that you get access to with the free trial, which are stored in JSON format. The data is contained in the WEATHER schema of the SNOWFLAKE_SAMPLE_DATA database. There are several tables in the schema, but the one we’ll use is called DAILY_14_TOTAL.

Inspecting the JSON

Let’s inspect the JSON first by retrieving a single row of data. While we’re at it, let’s also create a table with only that one record, so that we keep working with the same record throughout the tutorial, even across different Snowflake sessions, as the LIMIT clause does not always return the same row. Note that your result sets will be different from mine in this tutorial.

create table demo_db.public.sample_data
as
select *
from snowflake_sample_data.weather.daily_14_total
limit 1

This returns the following result:

In this case, the JSON that we want to extract data from is stored in the column V. This column is defined with the VARIANT data type, which Snowflake uses to store semi-structured data. If you are using the Snowflake web interface, you can click on the cell to see the JSON in a nicely formatted view:
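
You can also confirm the data type of the V column directly in SQL, for example with a DESCRIBE TABLE statement (a quick sketch):

describe table snowflake_sample_data.weather.daily_14_total
-- the output lists each column together with its type; V is shown as VARIANT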

There’s a lot of information contained in this one JSON string. When inspecting such a large JSON string, I recommend copying it into a good text editor of your choice, such as Atom or Sublime Text. Below is a screenshot taken from Atom with most of the data collapsed, so we get a better overview of what information is in there and which keys we need to refer to.

We can look at the indentation to identify the top-level keys. An even easier way is to collapse everything up to the top level in the text editor:

There are 3 top-level keys and, conveniently enough for this tutorial, each of them returns a different type:

  • city returns another JSON object (which we can identify by the curly brackets).
  • data returns an array (which we can identify by the square brackets).
  • time returns a simple numeric value.
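
If you’d rather confirm the top-level keys in SQL than in a text editor, Snowflake’s OBJECT_KEYS function can list them directly (a minimal sketch against our one-row table):

select
  object_keys(v) as top_level_keys
from
  demo_db.public.sample_data
-- returns an array along the lines of [ "city", "data", "time" ]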

We’ll leave the data array for later, and have a look at the values for city and time first.

Querying JSON

Querying JSON in Snowflake is simple enough. Use the : operator to get the values associated with the keys we specify:

select
  V:city as city
, V:time as forecast_date
from
  demo_db.public.sample_data

The city key contains another JSON object:

{
  "coord": {
    "lat": 12.38333,
    "lon": 77.083328
  },
  "country": "IN",
  "id": 1264136,
  "name": "Malavalli"
}

We can chain together multiple : operators to get to the lower-level keys and retrieve the corresponding values, e.g.

select
  V:city:name as city_name
, V:city:country as country
, V:city:coord:lat as latitude
, V:city:coord:lon as longitude
, V:time as forecast_date
from
  demo_db.public.sample_data

Specifying and Converting the Data Types

By default, Snowflake returns the values with the VARIANT data type. Append the :: operator to cast each field to the data type we want.

We will also convert the time value into a timestamp. You may have noticed that the time field contains a large integer value. This value represents the number of seconds that have elapsed since the Unix epoch (1970-01-01 00:00:00 UTC). After converting the field into an INTEGER, use the TO_TIMESTAMP function to convert it into a timestamp.

select
  V:city:name::string as city_name
, V:city:country::string as country
, V:city:coord:lat::float as latitude
, V:city:coord:lon::float as longitude
, to_timestamp(V:time::integer) as forecast_date
from
  demo_db.public.sample_data

Flattening the Arrays

Now let’s see what to do with the array stored in the data key. An array is often used to store lists of data. We use the FLATTEN table function, combined with the LATERAL keyword, to flatten the array into separate rows. We’ll also assign the alias v1 to the flattened structure.

select
  v1.*
from
  demo_db.public.sample_data
, lateral flatten(input => v:data) as v1

In this case, the array in the data key contains 16 elements, one for each forecasted day, so flattening it gives us 16 rows of data. Each of these rows in turn contains another JSON object in the VALUE column.

{
  "clouds": 48,
  "deg": 7,
  "dt": 1489816800,
  "humidity": 53,
  "pressure": 938.77,
  "speed": 1.49,
  "temp": {
    "day": 293.52,
    "eve": 293.52,
    "max": 293.52,
    "min": 289.56,
    "morn": 293.52,
    "night": 289.56
  },
  "uvi": 14,
  "weather": [
    {
      "description": "scattered clouds",
      "icon": "03n",
      "id": 802,
      "main": "Clouds"
    }
  ]
}
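
The VALUE column we’ll query below is only one of the columns that FLATTEN produces; it also exposes columns such as SEQ, KEY, PATH, INDEX and THIS. As a small sketch, INDEX can be used as a quick sanity check on the position of each forecasted day within the array:

select
  v1.index as array_position
, to_timestamp(v1.value:dt::integer) as forecasted_date
from
  demo_db.public.sample_data
, lateral flatten(input => v:data) as v1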

Once we’ve flattened the data, we can query it as if it were a derived table. So first we query the v1.value column to retrieve the JSON object, and then use the : operator again to extract the values we want. Note that temperatures are stored in Kelvin, so we have to subtract 273.15 to get the equivalent in Celsius.

select
  v:city:name::string as city_name
, v:city:country::string as country
, v:city:coord:lat::float as latitude
, v:city:coord:lon::float as longitude
, to_timestamp(v:time::integer) as forecast_date
-- new fields after flattening the data
, to_timestamp(v1.value:dt::integer) as forecasted_date
, v1.value:humidity::integer as humidity
, v1.value:temp:min::float - 273.15 AS temp_celsius_min
, v1.value:temp:max::float - 273.15 AS temp_celsius_max
, v1.value:weather as weather_array
from
  demo_db.public.sample_data
, lateral flatten(input => v:data) as v1

We almost have all the information we need now. However, the value in v1.value:weather is stored as another array, so we’ll have to flatten that as well to get at the data in there (even though it only contains one element). To do so, we add another LATERAL FLATTEN where we refer to the first one via its alias v1.

select
  v:city:name::string as city_name
, v:city:country::string as country
, v:city:coord:lat::float as latitude
, v:city:coord:lon::float as longitude
, to_timestamp(v:time::integer) as forecast_date
-- new fields after flattening the data
, to_timestamp(v1.value:dt::integer) as forecasted_date
, v1.value:humidity::integer as humidity
, v1.value:temp:min::float - 273.15 AS temp_celsius_min
, v1.value:temp:max::float - 273.15 AS temp_celsius_max
, v2.value:description::string as weather_desc
from
  demo_db.public.sample_data
, lateral flatten(input => v:data) as v1
, lateral flatten(input => v1.value:weather) as v2

Referencing Aliases

One other thing I’d like to point out, though it’s not specific to querying semi-structured data, is that Snowflake supports referencing column aliases later in the same query, without the need for a derived table. Some other databases also support this (e.g. Teradata, EXASOL), but many others do not, often for historical reasons.

To illustrate how this works, let’s use the query we’ve written, but now against the original table instead of our one-row sample table. We’ll add a WHERE clause with a few conditions that reference the aliases we defined in the SELECT clause.

select
  v:city:name::string as city_name
, v:city:country::string as country
, v:city:coord:lat::float as latitude
, v:city:coord:lon::float as longitude
, to_timestamp(v:time::integer) as forecast_date
-- new fields after flattening the data
, to_timestamp(v1.value:dt::integer) as forecasted_date
, v1.value:humidity::integer as humidity
, v1.value:temp:min::float - 273.15 AS temp_celsius_min
, v1.value:temp:max::float - 273.15 AS temp_celsius_max
, v2.value:description::string as weather_desc

from
  snowflake_sample_data.weather.daily_14_total
, lateral flatten(input => v:data) as v1
, lateral flatten(input => v1.value:weather) as v2

where city_name = 'Amsterdam'
and country = 'NL'
and date_trunc('day', forecast_date) = current_date

This retrieves today’s forecasts for Amsterdam for the next two weeks. These tables are actually kept up to date by Snowflake, so today really means today as of when you’re reading this.


As you can see, Snowflake makes it easy to work with data stored in JSON or other semi-structured formats. This can make life easier, as you can simply load raw JSON data (e.g. from an API) into Snowflake and start analysing it. And if and when you decide on a data structure, you can easily use SQL to transform the data without the need for any complex code.
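
For example, once you have settled on a structure, persisting the flattened forecasts into a regular table is a single CREATE TABLE AS SELECT away. A sketch, assuming a hypothetical target table demo_db.public.weather_forecasts:

-- weather_forecasts is a hypothetical table name used for illustration
create table demo_db.public.weather_forecasts
as
select
  v:city:name::string as city_name
, v:city:country::string as country
, to_timestamp(v1.value:dt::integer) as forecasted_date
, v1.value:humidity::integer as humidity
, v1.value:temp:min::float - 273.15 as temp_celsius_min
, v1.value:temp:max::float - 273.15 as temp_celsius_max
, v2.value:description::string as weather_desc
from
  snowflake_sample_data.weather.daily_14_total
, lateral flatten(input => v:data) as v1
, lateral flatten(input => v1.value:weather) as v2

From there, downstream queries can work with weather_forecasts like any other relational table.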
