March 12, 2017
For starters, let me say:
- SequoiaDB, the company, is my client.
- SequoiaDB, the product, is the main product of SequoiaDB, the company.
- SequoiaDB, the company, has another product line, SequoiaCM, which subsumes SequoiaDB in content management use cases.
- SequoiaDB, the product, is fundamentally a JSON data store. But it has a relational front end …
- … and is usually sold for RDBMS-like use cases …
- … except when it is sold as part of SequoiaCM, which adds in a large object/block store and a content-management-oriented library.
- SequoiaDB’s products are open source.
- SequoiaDB’s largest installation seems to be 2 PB across 100 nodes; that includes block storage.
- Figures for DBMS-only database sizes aren’t as clear, but the sweet spot of the cluster-size range for such use cases seems to be 6-30 nodes.
- SequoiaDB, the company, was founded in Toronto, by former IBM DB2 folks.
- Even so, it’s fairly accurate to view SequoiaDB as a Chinese company. Specifically:
- SequoiaDB’s founders were Chinese nationals.
- Most of them went back to China.
- Other employees to date have been entirely Chinese.
- Sales to date have been entirely in China, but SequoiaDB has international aspirations.
- SequoiaDB has >100 employees, a large majority of whom are split fairly evenly between “engineering” and “implementation and technical support”.
- SequoiaDB’s marketing (as opposed to sales) department is astonishingly tiny.
- SequoiaDB cites >100 subscription customers, including 10 in the global Fortune 500, a large fraction of which are in the banking sector. (Other sectors mentioned repeatedly are government and telecom.)
Unfortunately, SequoiaDB has not captured a lot of detailed information about unpaid open source production usage.
While I usually think that the advantages of open source are overstated, in SequoiaDB’s case open source will have* an additional benefit when SequoiaDB does go international — it addresses any concerns somebody might have about using Chinese technology.
*Edit: Actually, this claim is overstated based on SequoiaDB’s current open source practices. Please see the comment thread below.
SequoiaDB’s technology story starts:
- SequoiaDB is a layered DBMS.
- It manages JSON via update-in-place. MVCC (Multi-Version Concurrency Control) is on the roadmap.
- Indexes are B-tree.
- Transparent sharding and elasticity happen in what by now is the industry-standard/best-practices way:
- There are many (typically 4096) logical partitions, many of which are assigned to each physical partition.
- If the number of physical partitions changes, logical partitions are reassigned accordingly.
- Relational OLTP (OnLine Transaction Processing) functionality is achieved by using a kind of PostgreSQL front end.
- Relational batch processing is done via SparkSQL.
- There also is a block/LOB (Large OBject) storage engine meant for content management applications.
- SequoiaCM boils down technically to:
- SequoiaDB, which is used to store JSON metadata about the LOBs …
- … and whose generic-DBMS coordination capabilities are also used over the block/LOB engine.
- A Java library focused on content management.
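The logical-to-physical partitioning scheme described above can be sketched roughly as follows. This is only an illustration of the general technique, not SequoiaDB's actual code: the hash function, the round-robin initial placement, and the names `logical_partition`, `initial_assignment` and `add_node` are all assumptions made for the example.

```python
import hashlib
from collections import Counter

NUM_LOGICAL = 4096  # the "typical" logical partition count mentioned above


def logical_partition(key):
    """Map a record key to a logical partition with a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL


def initial_assignment(num_physical):
    """Spread logical partitions round-robin over the physical partitions."""
    return [lp % num_physical for lp in range(NUM_LOGICAL)]


def add_node(assignment):
    """Add one physical partition and move only as many logical partitions as it needs."""
    old_count = max(assignment) + 1
    new_node = old_count
    target = NUM_LOGICAL // (old_count + 1)  # fair share for the new node
    result = list(assignment)
    load = Counter(result)
    moved = 0
    for lp, node in enumerate(result):
        if moved == target:
            break
        if load[node] > target:  # take only from nodes holding more than their share
            result[lp] = new_node
            load[node] -= 1
            moved += 1
    return result


before = initial_assignment(4)
after = add_node(before)
moved = sum(1 for a, b in zip(before, after) if a != b)
print(moved, "of", NUM_LOGICAL, "logical partitions moved")
```

The point of the indirection is that a key's logical partition never changes; only the small logical-to-physical table does. Going from four physical partitions to five in this sketch relocates roughly a fifth of the logical partitions and leaves the rest where they were.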
SequoiaDB’s relationship with PostgreSQL is complicated, but as best I understand SequoiaDB’s relational operations:
- SQL parsing, optimization, and so on rely mainly on PostgreSQL code. (Of course, there are some hacks, such as to the optimizer’s cost functions.)
- Actual data storage is done via SequoiaDB’s JSON store, using PostgreSQL Foreign Data Wrappers. Each record goes in a separate JSON document. Locks, commits and so on — i.e. “write prevention” — are handled by the JSON store.
- PostgreSQL’s own storage engine is actually part of the stack, but only to manage temp space and the like.
PostgreSQL stored procedures are already in the SequoiaDB product. Triggers and referential integrity are not. Neither, so far as I can tell, are PostgreSQL’s datatype extensibility capabilities.
I neglected to ask how much of that remains true when SparkSQL is invoked.
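To make the "each record goes in a separate JSON document" idea concrete, here is a toy sketch of a relational scan served from a document store. It does not use the PostgreSQL Foreign Data Wrapper API or any SequoiaDB interface; `JsonStore`, `row_to_document` and `foreign_scan` are hypothetical names, and the in-memory list stands in for the real storage engine.

```python
import json


class JsonStore:
    """Toy in-memory stand-in for a JSON document store (hypothetical)."""

    def __init__(self):
        self._docs = []

    def insert(self, doc_text):
        self._docs.append(doc_text)

    def scan(self):
        for text in self._docs:
            yield json.loads(text)


def row_to_document(columns, row):
    """Serialize one relational row as one JSON document."""
    return json.dumps(dict(zip(columns, row)))


def foreign_scan(store, columns, predicate=lambda doc: True):
    """Yield tuples for the requested columns; absent fields come back as None."""
    for doc in store.scan():
        if predicate(doc):
            yield tuple(doc.get(c) for c in columns)


store = JsonStore()
cols = ["id", "name", "balance"]
store.insert(row_to_document(cols, (1, "Li", 100.0)))
store.insert(row_to_document(cols, (2, "Wang", 250.0)))
print(list(foreign_scan(store, ["name"], lambda d: d["balance"] > 200)))
```

In this picture the relational layer only sees tuples, while everything about how documents are stored, locked and committed stays on the document-store side.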
SequoiaDB’s use cases to date seem to fall mainly into three groups:
- Content management via SequoiaCM.
- “Operational data lakes”.
- Pretty generic replacement of legacy RDBMS.
Internet back-ends, however — and this is somewhat counter-intuitive for an open-source JSON store — are rare, at least among paying subscription customers. But SequoiaDB did tell me of one classic IoT (Internet of Things) application, with lots of devices “phoning home” and the results immediately feeding a JSON-based dashboard.
To understand SequoiaDB’s “operational data lake” story, it helps to understand the typical state of data warehousing at SequoiaDB’s customers and prospects, which isn’t great:
- 2-3 years of data, and not all the data even from that time period.
- Only enough processing power to support structured business intelligence …
- … and hence little opportunity for ad-hoc query.
SequoiaDB operational data lakes offer multiple improvements over that scenario:
- They hold as much relational data as customers choose to dump there.
- That data can be simply copied from operational stores, with no transformation.
- Or if data arrives via JSON — from external organizations or micro-services as the case may be — the JSON can be stored unmodified as well.
- Queries can be run straight against this data soup.
- Of course, views can also be set up in advance to help with querying.
Views are particularly useful with what might be called slowly changing schemas. (I didn’t check whether what SequoiaDB is talking about matches precisely with the more common term “slowly changing dimensions”.) Each time the schema changes, a new table is created in SequoiaDB to receive copies of the data. If one wants to query against the parts of the database structure that didn’t change, a view can be established to allow for that.
Finally, it seems that SequoiaCM uses are concentrated in what might be called “security and checking-up” areas, such as:
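A minimal sketch of the view-over-schema-versions idea, using SQLite from Python purely as a stand-in for SequoiaDB's PostgreSQL-based front end. The table names, columns and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per schema version (hypothetical example tables).
    CREATE TABLE accounts_v1 (id INTEGER, name TEXT, balance REAL);
    CREATE TABLE accounts_v2 (id INTEGER, name TEXT, balance REAL, branch TEXT);

    INSERT INTO accounts_v1 VALUES (1, 'Li', 100.0);
    INSERT INTO accounts_v2 VALUES (2, 'Wang', 250.0, 'Guangzhou');

    -- The view exposes only the columns that did not change between versions.
    CREATE VIEW accounts_stable AS
        SELECT id, name, balance FROM accounts_v1
        UNION ALL
        SELECT id, name, balance FROM accounts_v2;
    """
)

rows = conn.execute(
    "SELECT id, name, balance FROM accounts_stable ORDER BY id"
).fetchall()
print(rows)
```

Queries that only care about the stable columns can go through the view and ignore how many schema versions exist underneath; queries that need the newer `branch` column would go to the later table directly.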
- Photographs as part of an authentication process.
- Video of in-person banking transactions, both for fraud prevention and for general service quality assurance.
- Storage of security videos (for example from automated teller machines).
SequoiaCM deals seem to be bigger than other SequoiaDB ones, surely in part because the amounts of data managed are larger.