Select Star Logo
March 20, 2018

SQL Queries to Complex Objects with ARRAY() Function

Generic Placeholder for Profile Picture
March 20, 2018
Jacob Cohen
Director IT & Operations at HarperDB

Table of Contents

Blog Update

The HarperDB SQL ARRAY() function was released in HarperDB 1.1.0 on 4/18/2018 and will continue to be available in all newer releases of HarperDB.


How many times have you run into a situation where you wish you could do a SQL join without getting duplicate rows back? What if we could get a list “column” returned instead? HarperDB’s ARRAY() function enables just that. In this blog post we’re going to take a look at a basic example of people with addresses and phone numbers. 

Use Cases for the ARRAY() Function

Most existing systems have trouble transforming relational data into hierarchical data. Typically large batch processes or ETL jobs exist to perform these data transformations. HarperDB can perform these transformations out-of-the-box with a single SQL query. This query effectively performs the job of an ORM without the need for bloated software. Don’t think this is possible? Keep reading.  

How the ARRAY() Function Works

The HarperDB ARRAY() function, forthcoming in a future release, is an aggregate function, similar to COUNT, SUM, AVG. The difference is that while standard aggregate functions will return computation results, ARRAY() returns a list of data as a field. While this may not be intuitive to those, like myself, who have been using SQL for years, it does enable the developer to create complex JSON objects with a single query. Let’s take a look at an example use case… 

Example Data

We’ll be working with People, Phone Numbers, and Addresses. Each Address and/or Phone Number links back to a single Person. We have 10 person records, each with one or more phone numbers and addresses for a total of 20 addresses and 24 phone numbers.

Array Example ERD
Array Example ERD

Connecting Person and Phone Number

Let’s say I want to get all of the phone numbers for a person with ID 1. That’s fairly simple, I just query the phone number table for that person. But what happens if I also want to get the person data? I have to execute two queries and connect the data in my application.

SELECT * FROM person WHERE person_id = 1
SELECT * FROM phone WHERE person_id = 1

Now what happens if I want to get all people and all of their phone numbers. While I’d like to do a simple join, I can’t, because I’d end up with duplicate person data. 

SELECT * FROM person LEFT JOIN phone ON person.person_id = phone.person_id

So, again, I have to run two queries and aggregate the data together in my application. 

In HarperDB, we have the ARRAY() aggregate function which allows us to return this data, with no duplicates, in a single query. Remember, because ARRAY() is an aggregate function that we need to have a GROUP BY clause specified. In this case, since we are selecting multiple person fields, we need to specify all of them in our GROUP BY clause. Since we included our hash, person_id, we will safely retrieve each person record.

SELECT
  person.person_id,
  person.first_name,
  person.middle_name,
  person.last_name,
  person.saluation,
  person.dob,
  ARRAY({
    type: addr.address_type,
    addressLine1: addr.address_line_1,
    addressLine2: addr.address_line_2,
    city: addr.city,
    state: addr.state,
    zip: addr.zip_code,
  }) as address
FROM
  arr.person AS person
    LEFT JOIN arr.address AS addr
      ON person.person_id = addr.person_id
GROUP BY
  person.person_id,
  person.first_name,
  person.middle_name,
  person.last_name,
  person.saluation,
  person.dob

This returns a list of complex JSON objects where each Person object contains a list of Phone objects. For example, the complex object for person ID 1 would look like this: 

{
  "person_id": 1,
  "first_name": "Doug",
  "middle_name": "James",
  "last_name": "Henley",
  "saluation": "Mr.",
  "dob": "8/15/57",
  "address": [
    {
      "type": "MAILING",
      "addressLine1": "94317 Roxbury Court",
      "addressLine2": "Apt 102",
      "city": "Tampa",
      "state": "FL",
      "zip": 33625
    },
    {
      "type": "MAILING",
      "addressLine1": "35 Elgar Court",
      "city": "Arvada",
      "state": "CO",
      "zip": 80005
    }
  ]
}

Connecting Person, Phone Number, and Address

Now that we’ve shown how to aggregate list data from a single table let’s take a look at how we can retrieve multiple lists within our complex JSON objects. Ordinarily, if I wanted to pull data for person, phone, and address, then I would need three SQL queries.

SELECT * FROM person WHERE person_id = 1
SELECT * FROM phone WHERE person_id = 1
SELECT * FROM address WHERE person_id = 1

Now, if I were to put all three of those tables into a JOIN statement, I would receive a lot of duplicate data across all three tables. Take a look, here, at what is returned by the below SQL statement.

SELECT *
FROM person
  LEFT JOIN phone
    ON person.person_id = phone.person_id
  LEFT JOIN address
    ON person.person_id = address.person_id

Moving back to HarperDB we can query with the ARRAY() function to help us out with this. However, because we are joining across multiple tables we may still see some duplicate data in the phone and address lists. This is the inherent nature of SQL JOINS. In order to solve this problem, HarperDB created the DISTINCT_ARRAY() wrapper function. This function can be placed around a standard ARRAY() function call to ensure a distinct (deduplicated) results set is returned. Now to create our complex Person object with lists of both Phone and Address we can write a SQL statement like this:  

SELECT
  person.person_id,
  person.first_name,
  person.middle_name,
  person.last_name,
  person.saluation,
  person.dob,
  DISTINCT_ARRAY(ARRAY({
    type: addr.address_type,
    addressLine1: addr.address_line_1,
    addressLine2: addr.address_line_2,
    city: addr.city,
    state: addr.state,
    zip: addr.zip_code,
  })) as address,
  DISTINCT_ARRAY(ARRAY({
    type: phone.phone_type,
    num: phone.number,
    primaryFlag: phone.primary_flag,
  })) as phone
FROM arr.person AS person
  LEFT JOIN arr.address AS addr
    ON person.person_id = addr.person_id
  LEFT JOIN arr.phone AS phone
    ON person.person_id = phone.person_id
GROUP BY
  person.person_id,
  person.first_name,
  person.middle_name,
  person.last_name,
  person.saluation,
  person.dob

The complex object for Person ID 1 returned from the above query looks like this

{
  "person_id": 1,
  "first_name": "Doug",
  "middle_name": "James",
  "last_name": "Henley",
  "saluation": "Mr.",
  "dob": "8/15/57",
  "address": [
    {
      "type": "MAILING",
      "line1": "94317 Roxbury Court",
      "line2": "Apt 102",
      "city": "Tampa",
      "state": "FL",
      "zip": 33625
    },
    {
      "type": "MAILING",
      "line1": "35 Elgar Court",
      "city": "Arvada",
      "state": "CO",
      "zip": 80005
    }
  ],
  "phone": [
    {
      "type": "REFERENCE",
      "num": "926-647-6907",
      "primaryFlag": 1
    },
    {
      "type": "HOME",
      "num": "737-377-6038",
      "primaryFlag": 0
    }
  ]
}

With a single query in HarperDB we were able to transform SQL data into a complex JSON object that can be used in your modern application! 

Sample Data

Here are links to CSVs of each table used in the above example. You can also view the data below.
person.csv
address.csv
phone.csv

Person Table

person_idfirst_namemiddle_namelast_namesaluationdob
1 Doug James Henley Mr. 8/15/57
2 Megan Creech 6/29/66
3 Michael Samuel Lang Mr. 9/18/68
4 Charles Jay Cohen Mr. 1/12/76
5 Gabby Sarah Hughes 9/30/82
6 Emily Alexandra Wood Mrs. 1/18/64
7 Samantha Grace Choi Mrs. 5/25/64
8 Hana Smith Ms. 3/12/72
9 Kent Richard Garrett 9/24/79
10 Kara Caitlin May Ms. 9/17/90

Address Table

address_idperson_idaddress_typeaddress_line_1address_line_2citystatezip_code
11MAILING94317 Roxbury CourtApt 102TampaFL33625
22BILLING9 Mayer Plaza#277WashingtonDC20430
33MAILING99 Cascade CrossingHartfordCT6152
44MAILING39094 Hoard Center#418FlushingNY11388
55MAILING6 Waubesa PointAuroraCO80045
66BILLING94209 Kinsman Place#135AtlantaGA30311
77MAILING526 Barnett HillWacoTX76711
88BILLING9 Luster Trail#348NashvilleTN37240
99BILLING33553 Talmadge HillBakersfieldCA93386
1010MAILING21900 Rusk DriveApt 8HarrisburgPA17121
111MAILING35 Elgar CourtArvadaCO80005
123BILLING9 Tennessee StreetTrentonNJ8619
136MAILING0 Old Gate AlleyApt 439Wilkes BarrePA18768
147BILLING3918 Messerschmidt WayApt 234Oklahoma CityOK73173
159BILLING41778 Stephen CircleSalt Lake CityUT84145
1610BILLING50 Tony TerraceSioux FallsSD57198
172MAILING6 Hanson TrailNashvilleTN37240
183BILLING0 Darwin Terrace#144MontpelierVT5609
195MAILING59265 Dakota CenterPittsburghPA15279
209MAILING369 Badeau RoadMiamiFL33283

Phone Table

phone_idperson_idphone_typenumberprimary_flag
1 1 REFERENCE 926-647-6907 1
2 2 REFERENCE 864-324-2292 1
3 3 WORK 540-908-1691 1
4 4 HOME 253-590-9734 1
5 5 CELL 302-785-7313 1
6 6 REFERENCE 670-198-4073 1
7 7 CELL 923-662-5491 1
8 8 REFERENCE 176-225-5902 1
9 9 WORK 228-536-6858 1
10 10 REFERENCE 175-549-9915 1
11 1 HOME 737-377-6038 0
12 2 WORK 603-492-5375 0
13 4 WORK 192-656-9676 0
14 5 REFERENCE 537-446-7971 0
15 7 WORK 627-936-7236 0
16 8 WORK 762-324-7571 0
17 10 WORK 521-906-6326 0
18 2 REFERENCE 390-785-1962 0
19 4 CELL 787-954-6675 0
20 7 WORK 168-382-4627 0
21 8 WORK 199-264-7443 0
22 2 REFERENCE 212-508-4836 0
23 4 WORK 493-724-1771 0
24 8 CELL 156-617-7276 0

While you're here, learn about HarperDB, a breakthrough development platform with a database, applications, and streaming engine in one unified solution.

Check out HarperDB