SUPPORT CALL US AT 650-479-5641 DOWNLOAD

Insights and Updates

Read the HarperDB teams’ thoughts on database technology topics such as SQL, NOSQL, Schema-less architecture, IoT, Big Data, and more

SQL Queries to Complex Objects with the ARRAY() Function

Posted by Jacob Cohen on March 20, 2018
Jacob Cohen
Find me on:

How many times have you run into a situation where you wish you could do a SQL join without getting duplicate rows back? What if we could get a list column” returned instead? HarperDB’s ARRAY() function enables just that. In this blog post we’re going to take a look at a basic example of people with addresses and phone numbers. 

SQl queries blog.png

Use Cases for the ARRAY() Function

Most existing systems have trouble transforming relational data into hierarchical data. Typically large batch processes or ETL jobs exist to perform these data transformations. HarperDB can perform these transformations out-of-the-box with a single SQL query. This query effectively performs the job of an ORM without the need for bloated software. Don’t think this is possible? Keep reading. 
 

How the ARRAY() Function Works

The HarperDB ARRAY() function, forthcoming in a future release, is an aggregate function, similar to COUNT, SUM, AVG. The difference is that while standard aggregate functions will return computation results, ARRAY() returns a list of data as a field. While this may not be intuitive to those, like myself, who have been using SQL for years, it does enable the developer to create complex JSON objects with a single query. Let’s take a look at an example use case…
 

Example Data

We’ll be working with People, Phone Numbers, and Addresses. Each Address and/or Phone Number links back to a single Person. We have 10 person records, each with one or more phone numbers and addresses for a total of 20 addresses and 24 phone numbers.
example data
{
  "person_id": 1,
  "first_name": "Doug",
  "middle_name": "James",
  "last_name": "Henley",
  "saluation": "Mr.",
  "dob": "8/15/57",
  "address": [
    {
      "type": "MAILING",
      "addressLine1": "94317 Roxbury Court",
      "addressLine2": "Apt 102",
      "city": "Tampa",
      "state": "FL",
      "zip": 33625
    },
    {
      "type": "MAILING",
      "addressLine1": "35 Elgar Court",
      "city": "Arvada",
      "state": "CO",
      "zip": 80005
    }
  ]
}

Connecting Person and Phone Number

Let’s say I want to get all of the phone numbers for a person with ID 1. That’s fairly simple, I just query the phone number table for that person. But what happens if I also want to get the person data? I have to execute two queries and connect the data in my application.
 
SELECT * FROM person WHERE person_id = 1
SELECT * FROM phone WHERE person_id = 1
 
Now what happens if I want to get all people and all of their phone numbers. While I’d like to do a simple join, I can’t, because I’d end up with duplicate person data.
 
SELECT * FROM person LEFT JOIN phone ON person.person_id = phone.person_id
So, again, I have to run two queries and aggregate the data together in my application.
 
In HarperDB, we have the ARRAY() aggregate function which allows us to return this data, with no duplicates, in a single query. Remember, because ARRAY() is an aggregate function that we need to have a GROUP BY clause specified. In this case, since we are selecting multiple person fields, we need to specify all of them in our GROUP BY clause. Since we included our hash, person_id, we will safely retrieve each person record.
SELECT
  person.person_id, 
  person.first_name,
  person.middle_name,
  person.last_name,
  person.saluation,
  person.dob, 
  ARRAY({
    type: addr.address_type,
    addressLine1: addr.address_line_1,
    addressLine2: addr.address_line_2,
    city: addr.city,
    state: addr.state,
    zip: addr.zip_code,
  }) as address 
FROM 
  arr.person AS person 
    LEFT JOIN arr.address AS addr 
      ON person.person_id = addr.person_id
GROUP BY 
  person.person_id,
  person.first_name,
  person.middle_name,
  person.last_name,
  person.saluation,
  person.dob
This returns a list of complex JSON objects where each Person object contains a list of Phone objects. For example, the complex object for person ID 1 would look like this: 
SELECT 
  person.person_id,
  person.first_name,
  person.middle_name,
  person.last_name,
  person.saluation,
  person.dob,
  DISTINCT_ARRAY(ARRAY({
    type: addr.address_type,
    addressLine1: addr.address_line_1,
    addressLine2: addr.address_line_2,
    city: addr.city,
    state: addr.state,
    zip: addr.zip_code,
  })) as address,
  DISTINCT_ARRAY(ARRAY({
    type: phone.phone_type,
    num: phone.number,
    primaryFlag: phone.primary_flag,
  })) as phone
FROM arr.person AS person 
  LEFT JOIN arr.address AS addr
    ON person.person_id = addr.person_id
  LEFT JOIN arr.phone AS phone
    ON person.person_id = phone.person_id
GROUP BY
  person.person_id,
  person.first_name,
  person.middle_name,
  person.last_name,
  person.saluation,
  person.dob

Connecting Person, Phone Number, and Address

Now that we’ve shown how to aggregate list data from a single table let’s take a look at how we can retrieve multiple lists within our complex JSON objects. Ordinarily, if I wanted to pull data for person, phone, and address, then I would need three SQL queries.
SELECT * FROM person WHERE person_id = 1
SELECT * FROM phone WHERE person_id = 1
SELECT * FROM address WHERE person_id = 1
Now, if I were to put all three of those tables into a JOIN statement, I would receive a lot of duplicate data across all three tables. Take a look, here, at what is returned by the below SQL statement.
SELECT * 
FROM person 
  LEFT JOIN phone 
    ON person.person_id = phone.person_id 
  LEFT JOIN address
    ON person.person_id = address.person_id 
Moving back to HarperDB we can query with the ARRAY() function to help us out with this. However, because we are joining across multiple tables we may still see some duplicate data in the phone and address lists. This is the inherent nature of SQL JOINS. In order to solve this problem, HarperDB created the DISTINCT_ARRAY() wrapper function. This function can be placed around a standard ARRAY() function call to ensure a distinct (deduplicated) results set is returned. Now to create our complex Person object with lists of both Phone and Address we can write a SQL statement like this:
  
SELECT
  person.person_id,
  person.first_name,
  person.middle_name,
  person.last_name,
  person.saluation,
  person.dob,
  DISTINCT_ARRAY(ARRAY({
    type: addr.address_type,
    addressLine1: addr.address_line_1,
    addressLine2: addr.address_line_2,
    city: addr.city,
    state: addr.state,
    zip: addr.zip_code,
  })) as address,
  DISTINCT_ARRAY(ARRAY({
    type: phone.phone_type,
    num: phone.number,
    primaryFlag: phone.primary_flag,
  })) as phone
FROM arr.person AS person
  LEFT JOIN arr.address AS addr
    ON person.person_id = addr.person_id
  LEFT JOIN arr.phone AS phone
    ON person.person_id = phone.person_id
GROUP BY
  person.person_id,
  person.first_name,
  person.middle_name,
  person.last_name,
  person.saluation,
  person.dob
 
The complex object for Person ID 1 returned from the above query looks like this
 {
  "person_id": 1,
  "first_name": "Doug",
  "middle_name": "James",
  "last_name": "Henley",
  "saluation": "Mr.",
  "dob": "8/15/57",
  "address": [
    {
      "type": "MAILING",
      "line1": "94317 Roxbury Court",
      "line2": "Apt 102",
      "city": "Tampa",
      "state": "FL",
      "zip": 33625
    },
    {
      "type": "MAILING",
      "line1": "35 Elgar Court",
      "city": "Arvada",
      "state": "CO",
      "zip": 80005
    }
  ],
  "phone": [
    {
      "type": "REFERENCE",
      "num": "926-647-6907",
      "primaryFlag": 1
    },
    {
      "type": "HOME",
      "num": "737-377-6038",
      "primaryFlag": 0
    }
  ]
}
With a single query in HarperDB we were able to transform SQL data into a complex JSON object that can be used in your modern application!
 

Sample Data

Person Table

person_id
first_name
middle_name
last_name
saluation
dob
1
Doug
James
Henley
Mr.
8/15/57
2
Megan
 
Creech
 
6/29/66
3
Michael
Samuel
Lang
Mr.
9/18/68
4
Charles
Jay
Cohen
Mr.
1/12/76
5
Gabby
Sarah
Hughes
 
9/30/82
6
Emily
Alexandra
Wood
Mrs.
1/18/64
7
Samantha
Grace
Choi
Mrs.
5/25/64
8
Hana
 
Smith
Ms.
3/12/72
9
Kent
Richard
Garrett
 
9/24/79
10
Kara
Caitlin
May
Ms.
9/17/90

 

Address Table

address_id
person_id
address_type
address_line_1
address_line_2
city
state
zip_code
1
1
MAILING
94317 Roxbury Court
Apt 102
Tampa
FL
33625
2
2
BILLING
9 Mayer Plaza
#277
Washington
DC
20430
3
3
MAILING
99 Cascade Crossing
Hartford
CT
6152
 
4
4
MAILING
39094 Hoard Center
#418
Flushing
NY
11388
5
5
MAILING
6 Waubesa Point
Aurora
CO
80045
 
6
6
BILLING
94209 Kinsman Place
#135
Atlanta
GA
30311
7
7
MAILING
526 Barnett Hill
Waco
TX
76711
 
8
8
BILLING
9 Luster Trail
#348
Nashville
TN
37240
9
9
BILLING
33553 Talmadge Hill
Bakersfield
CA
93386
 
10
10
MAILING
21900 Rusk Drive
Apt 8
Harrisburg
PA
17121
11
1
MAILING
35 Elgar Court
Arvada
CO
80005
 
12
3
BILLING
9 Tennessee Street
Trenton
NJ
8619
 
13
6
MAILING
0 Old Gate Alley
Apt 439
Wilkes Barre
PA
18768
14
7
BILLING
3918 Messerschmidt Way
Apt 234
Oklahoma City
OK
73173
15
9
BILLING
41778 Stephen Circle
Salt Lake City
UT
84145
 
16
10
BILLING
50 Tony Terrace
Sioux Falls
SD
57198
 
17
2
MAILING
6 Hanson Trail
Nashville
TN
37240
 
18
3
BILLING
0 Darwin Terrace
#144
Montpelier
VT
5609
19
5
MAILING
59265 Dakota Center
Pittsburgh
PA
15279
 
20
9
MAILING
369 Badeau Road
Miami
FL
33283
 

 

Phone Table

phone_id
person_id
phone_type
number
primary_flag
1
1
REFERENCE
926-647-6907
1
2
2
REFERENCE
864-324-2292
1
3
3
WORK
540-908-1691
1
4
4
HOME
253-590-9734
1
5
5
CELL
302-785-7313
1
6
6
REFERENCE
670-198-4073
1
7
7
CELL
923-662-5491
1
8
8
REFERENCE
176-225-5902
1
9
9
WORK
228-536-6858
1
10
10
REFERENCE
175-549-9915
1
11
1
HOME
737-377-6038
0
12
2
WORK
603-492-5375
0
13
4
WORK
192-656-9676
0
14
5
REFERENCE
537-446-7971
0
15
7
WORK
627-936-7236
0
16
8
WORK
762-324-7571
0
17
10
WORK
521-906-6326
0
18
2
REFERENCE
390-785-1962
0
19
4
CELL
787-954-6675
0
20
7
WORK
168-382-4627
0
21
8
WORK
199-264-7443
0
22
2
REFERENCE
212-508-4836
0
23
4
WORK
493-724-1771
0
24
8
CELL
156-617-7276
0

Topics: SQL, ORM


Comments: