- Home
- MySQL & MariaDB
- Relational Databases
- Converting comma separated fields to MySQL JSON – a case study
Converting comma separated fields to MySQL JSON – a case study
Feed: Planet MySQL.
Author: Gabriela D’Ávila.
This post is a case study of a job I had to do in a legacy application, it doesn’t mean it will apply to you, but it might.
This is a table of contents:
There are many ways to store a tree in a relational database, this is not by far the best option to do it, however it is still common to see it happen.
One way it is called materialized path, which consist with values separated by a delimiter, in my case a comma ,
.
You would have a tree stored in a manner like this:
id | parent_id | user_id | depth | asc_path | node |
---|---|---|---|---|---|
1 | 0 | 1 | 1 | ||
2 | 1 | 2 | 2 | ,1, | L |
3 | 1 | 13 | 2 | ,1, | R |
4 | 2 | 3 | 3 | ,2,1, | L |
5 | 2 | 61 | 3 | ,2,1, | R |
6 | 13 | 23 | 3 | ,13,1, | L |
7 | 13 | 22 | 3 | ,13,1, | R |
8 | 3 | 4 | 4 | ,3,2,1, | L |
9 | 3 | 156 | 4 | ,3,2,1, | R |
10 | 22 | 1568 | 4 | ,22,13,1, | L |
11 | 22 | 26 | 4 | ,22,13,1, | R |
12 | 23 | 1476 | 4 | ,23,13,1, | L |
13 | 23 | 690716 | 4 | ,23,13,1, | R |
14 | 61 | 1051 | 4 | ,61,2,1, | L |
15 | 61 | 62 | 4 | ,61,2,1, | R |
The column asc_path
stands for ascending path of a three in where which node has two other ones, not necessarily being a binary three, being stored in the database.
This column has commas in the beginning and in the end because how queries are made to search if an element is present in the path or not by using LIKE "%,id,%"
. If someone did a search to know if the number 2
was a node in any of the paths, without the commas, it would also return 23
, 62
and any other number containing 2.
Performance
The only way to make it a bit faster is having a FULLTEXT
index created in asc_path
. Because a BTREE
index starts indexing in the beginning of a string, since the presence of the wildcard %
in the string search it makes it in possible to use said index.
This is the graphical representation of the example above:
Searching
To search an specific element the query would be:
SELECT parent_id, user_id, depth, asc_path, node FROM tree WHERE asc_path LIKE '%,13,%';
Result:
parent_id | user_id | depth | asc_path | node |
---|---|---|---|---|
13 | 23 | 3 | ,13,1, | L |
13 | 22 | 3 | ,13,1, | R |
22 | 1568 | 4 | ,22,13,1, | L |
22 | 26 | 4 | ,22,13,1, | R |
23 | 1476 | 4 | ,23,13,1, | L |
23 | 690716 | 4 | ,23,13,1, | R |
Converting to a JSON array
Some databases, like PostgresSQL (section 9.42) have more modifiers functions to convert strings to JSON, in my case I wanted to store the ascending tree path in a JSON field which would give me the possibility of using JSON_CONTAINS(json_doc, val)
to know the records that have a given node in its path.
To do it, I had to transform the string in a JSON array.
1st step: remove the leading commas
Removing the leading commas, but before any update, lets test what we are doing:
SELECT parent_id, user_id, depth, asc_path, TRIM(BOTH ',' FROM asc_path) AS trimmed_commas FROM tree
Results:
parent_id | user_id | depth | asc_path | trimmed_commas |
---|---|---|---|---|
0 | 1 | 1 | ||
1 | 2 | 2 | ,1, | 1 |
1 | 13 | 2 | ,1, | 1 |
2 | 3 | 3 | ,2,1, | 2,1 |
2 | 61 | 3 | ,2,1, | 2,1 |
13 | 23 | 3 | ,13,1, | 13,1 |
13 | 22 | 3 | ,13,1, | 13,1 |
3 | 4 | 4 | ,3,2,1, | 3,2,1 |
3 | 156 | 4 | ,3,2,1, | 3,2,1 |
22 | 1568 | 4 | ,22,13,1, | 22,13,1 |
2nd step: add brackets to the string
A JSON array is formed around brackets []
, and we need to have it in our string to be a valid JSON document:
SELECT parent_id, user_id, depth, asc_path, TRIM(BOTH ',' FROM asc_path) AS trimmed_commas, CONCAT("[", TRIM(BOTH ',' FROM asc_path), "]") AS added_brackets FROM tree;
Results:
parent_id | user_id | depth | asc_path | trimmed_commas | added_brackets |
---|---|---|---|---|---|
0 | 1 | 1 | |||
1 | 2 | 2 | ,1, | 1 | [1] |
1 | 13 | 2 | ,1, | 1 | [1] |
2 | 3 | 3 | ,2,1, | 2,1 | [2,1] |
2 | 61 | 3 | ,2,1, | 2,1 | [2,1] |
13 | 23 | 3 | ,13,1, | 13,1 | [13,1] |
13 | 22 | 3 | ,13,1, | 13,1 | [13,1] |
3 | 4 | 4 | ,3,2,1, | 3,2,1 | [3,2,1] |
3 | 156 | 4 | ,3,2,1, | 3,2,1 | [3,2,1] |
22 | 1568 | 4 | ,22,13,1, | 22,13,1 | [22,13,1] |
3rd step: validate if the changes works
Let’s use JSON_VALID()
to see if it will accept our new string as a JSON, keep in mind that when the argument is NULL
the return is also NULL
:
SELECT parent_id, user_id, depth, asc_path, TRIM(BOTH ',' FROM asc_path) AS trimmed_commas, CONCAT("[", TRIM(BOTH ',' FROM asc_path), "]") AS added_brackets, JSON_VALID(CONCAT("[", TRIM(BOTH ',' FROM asc_path), "]")) AS json_valid FROM tree;
Results:
parent_id | user_id | depth | asc_path | trimmed_commas | added_brackets | json_valid |
---|---|---|---|---|---|---|
0 | 1 | 1 | ||||
1 | 2 | 2 | ,1, | 1 | [1] | 1 |
1 | 13 | 2 | ,1, | 1 | [1] | 1 |
2 | 3 | 3 | ,2,1, | 2,1 | [2,1] | 1 |
2 | 61 | 3 | ,2,1, | 2,1 | [2,1] | 1 |
13 | 23 | 3 | ,13,1, | 13,1 | [13,1] | 1 |
13 | 22 | 3 | ,13,1, | 13,1 | [13,1] | 1 |
3 | 4 | 4 | ,3,2,1, | 3,2,1 | [3,2,1] | 1 |
3 | 156 | 4 | ,3,2,1, | 3,2,1 | [3,2,1] | 1 |
22 | 1568 | 4 | ,22,13,1, | 22,13,1 | [22,13,1] | 1 |
22 | 26 | 4 | ,22,13,1, | 22,13,1 | [22,13,1] | 1 |
23 | 1476 | 4 | ,23,13,1, | 23,13,1 | [23,13,1] | 1 |
23 | 690716 | 4 | ,23,13,1, | 23,13,1 | [23,13,1] | 1 |
61 | 1051 | 4 | ,61,2,1, | 61,2,1 | [61,2,1] | 1 |
61 | 62 | 4 | ,61,2,1, | 61,2,1 | [61,2,1] | 1 |
Replacing 1st step and 2nd step with a function
So that your query gets easier to use and not messy, you can create a function, I decided to create to_json_array(input_string, delimiter_char)
:
Running the query only with to_json_array
on MySQL:
SELECT parent_id, user_id, depth, asc_path, to_json_array(asc_path, ',') AS to_json_array, JSON_VALID(to_json_array(asc_path, ',')) AS is_to_json_array_valid, node FROM tree;
Result:
parent_id | user_id | depth | asc_path | to_json_array | is_to_json_array_valid | node |
---|---|---|---|---|---|---|
0 | 1 | 1 | ||||
1 | 2 | 2 | ,1, | [1] | 1 | L |
1 | 13 | 2 | ,1, | [1] | 1 | R |
2 | 3 | 3 | ,2,1, | [2, 1] | 1 | L |
2 | 61 | 3 | ,2,1, | [2, 1] | 1 | R |
13 | 23 | 3 | ,13,1, | [13, 1] | 1 | L |
13 | 22 | 3 | ,13,1, | [13, 1] | 1 | R |
3 | 4 | 4 | ,3,2,1, | [3, 2, 1] | 1 | L |
3 | 156 | 4 | ,3,2,1, | [3, 2, 1] | 1 | R |
22 | 1568 | 4 | ,22,13,1, | [22, 13, 1] | 1 | L |
Disclaimer
This function is not native, and its use in production is not guaranteed.
Notice that the database returns the JSON as valid making it possible to convert that TEXT
to a new column asc_path_json
:
ALTER TABLE tree ADD COLUMN asc_path_json JSON AFTER asc_path; UPDATE tree SET asc_path_json = to_json_array(asc_path, ',');
Which gives us the ability to check more quickly if an item is in the path for that node:
SELECT * FROM tree WHERE json_contains(asc_path_json, "13");
Result:
id | parent_id | user_id | depth | asc_path | asc_path_json | node |
---|---|---|---|---|---|---|
6 | 13 | 23 | 3 | ,13,1, | [13, 1] | L |
7 | 13 | 22 | 3 | ,13,1, | [13, 1] | R |
10 | 22 | 1568 | 4 | ,22,13,1, | [22, 13, 1] | L |
11 | 22 | 26 | 4 | ,22,13,1, | [22, 13, 1] | R |
12 | 23 | 1476 | 4 | ,23,13,1, | [23, 13, 1] | L |
13 | 23 | 690716 | 4 | ,23,13,1, | [23, 13, 1] | R |
Recent Articles
- Using explainability methods to understand (some part) of the spread of COVID-19 in a landlocked country
- Stored Procedure Overview
- Into the Light: Data Integration Brings Value to Dark Data
- Amazon Connect now provides an out-of-the-box chat user interface for your website
- AWS Elemental MediaLive launches Workflow Wizard
- Amazon EC2 D3 instances with dense local HDD storage now available in Asia Pacific (Singapore, Sydney and Tokyo), and Europe (Frankfurt) regions
- Urban Institute and IBM Team Up to Reveal Neighborhood Change, Arming Policy Makers with Timely Insights using AI
- Why I choose significance over success – and you should too
- E-Signature Solution signNow Empowers AWS Customers to Send and Sign Documents Anytime, Anywhere
- AWS Proton makes new fields available for Jinja parametrization
- AWS Proton now supports services without pipelines
- AWS Proton now supports tagging capabilities
- Amazon Quicksight Launches Sankey Diagram, Field folders and more
- AWS Shield Advanced now supports resource tagging
- Amazon EKS now supports creation and management of add-ons using AWS CloudFormation
- AWS IoT Device Defender now supports CloudFormation
- Amazon EMR introduces instance fleets to help lower cost and improve cluster provisioning in AWS GovCloud (US) Regions
- AWS Glue DataBrew adds binning, skewness, binarization, and transpose transformations for pre-processing data for machine learning and analytics
- AWS IoT Device SDK for Embedded C now supports over-the-air update, AWS IoT Device Defender custom metrics, and multiple shadows for a single IoT device
- Marco Slot: Citus 10: Columnar for Postgres, rebalancer, single-node, & more
- AWS CloudFormation now supports Amazon S3 on Outposts
- NIMBLE virtual short course, May 26-28
- Using Compound Statements Outside of Stored Programs
- What is Data Governance?
- Developer Velocity: Key lessons from digital leaders on software excellence for 2021 and beyond
- Empowering every developer to innovate with Azure
- Afterthoughts of e-Rum2020 in The R Journal
- 3 techniques to improve AML transaction monitoring strategies
- #GraphCast: Introduction to Graph Theory: What Is a Graph?
- Introduction to MySQL UPDATE Statement
Leave a Reply
You must be logged in to post a comment.