
Flattening nested arrays into Snowflake

Jan 14, 2022
4:33

In this video, see how simple it is with Upsolver to flatten nested data and output it to Snowflake. Engineering teams at data-intensive companies choose Upsolver to make their complex data incredibly simple. Discover why. Start your free trial with no credit card required at https://app.upsolver.com/signup, or visit www.upsolver.com to learn more.

TRANSCRIPTION

Welcome to this Upsolver presentation, where we’re going to demonstrate flattening nested data in Upsolver vs. doing it in Snowflake. We’ve already connected to an AWS S3 bucket that’s storing a large number of sales order records in JSON format. Let’s go ahead and take a look at some of the details for these records.

Here are the ITEMS elements in the ORDER object. We only have two items that have been ordered. One has multiple delimited values per category, while the other is blank.

Let’s check out the schema that Upsolver auto-parsed from the data. Here you can see the field names, values, and types that Upsolver generated. In addition to that, Upsolver will by default give you a lot of information about your data and how it breaks down.

Now we’ll look at the data specifically in ITEMS. Here we’ve broken out the schema elements for the ITEMS field on the left into their respective attributes below. As you can see, each one has already been given a type automatically, along with the density of its values, its top values, its distinct values, and various other useful information.

Now we’re going to illustrate how simple it is to flatten that nested data to send it to Snowflake. You have to generate a new output. In our databases section we select Snowflake. We’ve prefilled the data for you, so we’ll just click NEXT. Now we’re going to go ahead and add all of the data fields, so we can see what Upsolver does with the nested data. Upsolver’s going to automatically ask us what we want to do with each of these nested elements.
In this case we’re going to select LAST ELEMENT, which will flatten the entire array, as opposed to taking the first or concatenating values. This looks correct, so let’s add the fields. Here you see the schema information that was generated for the transformation. And here we see the flattened data.

Let’s take a look at the SQL that Upsolver generated to see the details. Here you can see all the data items that are being mapped and the LAST ELEMENT function that flattens those nested arrays. Upsolver did this all for you, without you having to type anything.

We’re going to go ahead and launch this job and check out the output in Snowflake. We’ve set up our connection information, and we’re going to kick it off. There are nearly 7 billion events in this sample set, so we’re going to go ahead and trim it down just a little bit with this slider. We’ve pulled this over. We’ve got 200 million now. That’s fine for this simple task.

Our job’s running now, so let’s check out the Upsolver output that’s in Snowflake. And then we’ll take a look at what it takes in Snowflake to generate the same result. So now that we’re in Snowflake, we see our table nested to Snowflake has arrived. And here are all the data definitions that Upsolver generated automatically, including those nested items as flattened fields. Go ahead and preview that data. And this looks good. So that was how simple it is. Just a few clicks in Upsolver.

Now let’s see what would be involved in doing it in Snowflake. First we have to create a table with a VARIANT-type column that will hold the input JSON file of orders. Then we have to load the JSON data into that intermediary table. Next we create a new table that has all the discrete fields we need to load from our intermediary table. These dozens of statements had to be hand-coded. Now we have to manually map every field from the VARIANT field in the first table into the explicitly named fields in the second table. If you make any mistakes, this will fail.
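The hand-mapping step described above can be sketched in plain Python rather than Snowflake SQL. This is only an illustrative analogy, not the code shown in the video: the sample record, its field names, the `TARGET_COLUMNS` tuple, and the `map_order` and `try_map` helpers are all hypothetical, and emitting one row per item is an assumption made for the sketch (the Upsolver output in the video uses a LAST ELEMENT function instead).

```python
import json

# Hypothetical raw order, standing in for one JSON record landed in the
# intermediary table. Field names are illustrative, not from the video.
RAW_ORDER = json.loads("""
{
  "order": {
    "id": "A-1001",
    "customer": "Ada",
    "items": [
      {"sku": "X1", "category": "tools|garden", "qty": 2},
      {"sku": "Y7", "category": "", "qty": 1}
    ]
  }
}
""")

# The explicitly named columns of the second (flat) table.
TARGET_COLUMNS = ("order_id", "customer", "sku", "category", "qty")


def map_order(raw):
    """Hand-map one nested order into flat rows, one row per item.

    Every field is named explicitly, so a record that lacks any of them
    raises KeyError instead of producing a row.
    """
    order = raw["order"]
    order_id = order["id"]
    customer = order["customer"]
    rows = []
    for item in order["items"]:
        rows.append((order_id, customer, item["sku"], item["category"], item["qty"]))
    return rows


def try_map(raw):
    """Run the mapping and report a missing field instead of crashing."""
    try:
        return map_order(raw)
    except KeyError as exc:
        return f"failed: missing field {exc}"


rows = map_order(RAW_ORDER)
for row in rows:
    print(dict(zip(TARGET_COLUMNS, row)))

# A record without "customer" breaks the hand-written mapping.
print(try_map({"order": {"id": "A-1002", "items": []}}))
```

Each additional source field means another line in `map_order` and another entry in `TARGET_COLUMNS`, which is the kind of per-field upkeep the narration is pointing at.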
If the inbound data is missing a field and you didn’t account for it in your code, this will fail. There’s a lot more error-prone manual work to design this within Snowflake directly. You can imagine how much more difficult this would be on an extremely wide table.

The schema-on-read functionality in Upsolver eliminated all of this manual processing and flattened that data effortlessly, literally in just a few clicks. What’s important to remember is that Upsolver is working WITH Snowflake in this regard. We are simplifying the task. And we’re reducing the cost to run that transformation.

Thank you for watching this nested data example. Come to our website to find out more.

