Deepgreen

A Data Warehouse Solution derived from PostgreSQL

What is Deepgreen?

Scalable MPP Data Warehouse.

Better Join and Aggregation algorithms

New subsystem to handle spills

JIT-Compiled

Vectorized Scans

Data-path Optimization.

Data Warehouse System Grand Design

Reliability

Availability

Scalability

a. Load Balancing

b. Data expansion

c. Data redistribution

d. Huge databases (petabyte-ready)
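The load balancing, expansion, and redistribution points above can be illustrated with a toy model of hash distribution. This is only a sketch under stated assumptions: the SHA-256 based `segment_for` function, the segment counts, and the `order-N` keys are all hypothetical and are not Deepgreen's actual hashing or segment-mapping logic.

```python
import hashlib
from collections import Counter


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution key to a segment using a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def distribute(keys, num_segments):
    """Count how many rows land on each segment (load balancing)."""
    return Counter(segment_for(k, num_segments) for k in keys)


def rows_to_move(keys, old_segments, new_segments):
    """Count rows whose home segment changes when the cluster grows (redistribution)."""
    return sum(
        1
        for k in keys
        if segment_for(k, old_segments) != segment_for(k, new_segments)
    )


if __name__ == "__main__":
    keys = [f"order-{i}" for i in range(100_000)]  # hypothetical distribution keys
    print("rows per segment on 4 segments:", dict(distribute(keys, 4)))
    print("rows that move when expanding 4 -> 6:", rows_to_move(keys, 4, 6))
```

With a well-mixed hash the rows spread nearly evenly across segments, and expanding the cluster changes the home segment of a large share of rows, which is why expansion is followed by a redistribution step.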

What is Xdrive?

Xdrive is a Deepgreen DB connectivity service that extends the reach of Deepgreen to external data sources.

Through Xdrive, Deepgreen DB can read from and write to a myriad of data management systems, including Amazon S3, HDFS, Oracle, and Elasticsearch.

Xdrive Characteristics

Using Xdrive, Deepgreen DB is able to scan external tables at tremendous speed due to these underlying architectural choices:

High Bandwidth

Pushed-down Filters

Elasticity
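To make the pushed-down filter idea concrete, here is a minimal sketch that compares how many rows cross the wire with and without pushdown. It does not use any Xdrive API; `scan_source`, the row layout, and the predicate are hypothetical stand-ins for an external source and a WHERE clause.

```python
def scan_source(rows, predicate=None):
    """Simulate an external source; with a pushed-down predicate it yields only matching rows."""
    for row in rows:
        if predicate is None or predicate(row):
            yield row


def query_without_pushdown(rows, predicate):
    """Transfer every row, then filter on the database side."""
    transferred = list(scan_source(rows))
    result = [r for r in transferred if predicate(r)]
    return result, len(transferred)


def query_with_pushdown(rows, predicate):
    """Filter at the source, so only matching rows are transferred."""
    transferred = list(scan_source(rows, predicate))
    return transferred, len(transferred)


if __name__ == "__main__":
    # Hypothetical external table: 1,000 rows, 10% of them in region "ID".
    rows = [{"id": i, "region": "ID" if i % 10 == 0 else "SG"} for i in range(1000)]
    wanted = lambda r: r["region"] == "ID"
    _, moved_plain = query_without_pushdown(rows, wanted)
    _, moved_pushed = query_with_pushdown(rows, wanted)
    print("rows transferred without pushdown:", moved_plain)
    print("rows transferred with pushdown:", moved_pushed)
```

Both variants return the same result set; the difference is only in how much data has to leave the external system.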

TPC-H 10G Results

All 22 TPC-H queries were run on both Greenplum DB and Deepgreen DB. Q1 and Q5 are graphed below for comparison.

All Results

Q1: Scan and aggregate fact table

Q5: 6-way-join

Raw result: Deepgreen DB vs Greenplum DB using Heap Tables

Q1 is a typical aggregate query running against the fact table.

Q5 is an aggregate over a 6-way hash join that joins the fact table lineitem against the orders and supplier tables, and subsequently against other dimension tables.
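The shape of Q5, an aggregate computed over a hash join, can be sketched in a few lines. This is a simplified two-table illustration, not the TPC-H query itself and not Deepgreen's executor; the column names follow TPC-H naming, and the sample rows are made up.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Build a hash table on the smaller input, then probe it with the larger one."""
    table = defaultdict(list)
    for b in build_rows:
        table[b[build_key]].append(b)
    for p in probe_rows:
        for b in table.get(p[probe_key], []):
            yield {**b, **p}


def sum_by(rows, group_key, value_key):
    """Aggregate: sum value_key for each distinct group_key."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_key]] += r[value_key]
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample data; lineitem plays the role of the fact table.
    orders = [
        {"o_orderkey": 1, "o_custkey": 10},
        {"o_orderkey": 2, "o_custkey": 20},
    ]
    lineitem = [
        {"l_orderkey": 1, "l_extendedprice": 100.0},
        {"l_orderkey": 1, "l_extendedprice": 50.0},
        {"l_orderkey": 2, "l_extendedprice": 25.0},
        {"l_orderkey": 3, "l_extendedprice": 999.0},  # no matching order, dropped by the join
    ]
    joined = hash_join(orders, lineitem, "o_orderkey", "l_orderkey")
    print(sum_by(joined, "o_custkey", "l_extendedprice"))
```

A 6-way join repeats the build-and-probe step once per additional table before the final aggregation.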

Comparison Result

Deepgreen Performance on XEON-2643

Reporting Queries Comparison
No	Query Report Name	Oracle (minutes)	Deepgreen (minutes)	Speed Gain	Total Row Count
1	flat_price	85	2.8	21500%	663,034
2	sales_flyer_sli	25	2.4	830%	104
3	profit_mtd_lost	7.5	3.3	2121%	383,359
4	daily_lmi	120	10	1200%	154

Deepgreen Hardware Specification XEON-2643

Deepgreen Server Hardware Specification
Machine Type (HT)	Intel(R) Xeon(R) CPU E5-2643 @ 3.30GHz (16 HT)
RAM	64 GB
Disk	PCI SSD NVMe Samsung Pro 960 2 TB
Kernel Version	Linux 3.16.0-8-amd64
Operating System	Debian GNU/Linux 9 (stretch)

Oracle Data Source Server Hardware Specification
Machine Type (HT)	Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 HT)
RAM	64 GB
Disk	SSD 2.7 TB
Kernel Version	Linux 3.10.0-514.10.2.el7.x86_64
Operating System	CentOS 7.3

Deepgreen Performance on EPYC-7742

Reporting Queries Comparison
No	Query Report Name	Oracle (minutes)	Deepgreen (minutes)	Speed Gain	Total Row Count
1	stok_teknisi	37	4.8	770%	408,122
2	provisioning_v2	33	19.6	173%	233,291
3	kumulatif	2	0.4	500%	24,338
4	out_project	17	0.7	2420%	465,676
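The Speed Gain column in this table appears to be the Oracle time divided by the Deepgreen time, expressed as a percentage. That definition is an assumption; the table does not state it. The sketch below applies it to the rounded timings listed above. The results come close to the published figures (about 771% for stok_teknisi and 500% for kumulatif), and small differences would be expected if the published values were computed from unrounded timings.

```python
def speed_gain_percent(oracle_minutes: float, deepgreen_minutes: float) -> float:
    """Assumed definition: Oracle time / Deepgreen time, as a percentage."""
    return oracle_minutes / deepgreen_minutes * 100.0


# Timings (in minutes) copied from the EPYC-7742 table: (Oracle, Deepgreen).
timings = {
    "stok_teknisi": (37, 4.8),
    "provisioning_v2": (33, 19.6),
    "kumulatif": (2, 0.4),
    "out_project": (17, 0.7),
}

if __name__ == "__main__":
    for name, (oracle, deepgreen) in timings.items():
        print(f"{name}: {speed_gain_percent(oracle, deepgreen):.0f}%")
```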

Deepgreen Hardware Specification EPYC-7742

Deepgreen Server Hardware Specification
Machine Type (HT)	AMD EPYC 7742 64-Core Processor (128 HT)
RAM	128 GB
Disk	PCI SSD NVMe Samsung Pro 960 2 TB
Kernel Version	Linux 4.19.0-10-amd64
Operating System	Debian GNU/Linux 10 (buster)

Deepgreen DB Features

Greenplum SQL	100%
Executor tuned for x86	5X Faster
AI & Machine Learning	TensorFlow, MADlib
Graph Processing	Pregel
Disaster Recovery	Non-stop & incremental
Column Store	PAX, GP-column-store
Compression	lz4, zstd, zlib, quicklz
Load & Connectivity	Xdrive, gpfdist, gpload
In-memory Data Grid	Xdrive-Geode
Stream Interface	Xdrive-Kafka
Fast Numerics	Dec64, Dec128
GUI Monitor	Zabbix, pgBadger
Text Search	Xdrive-Elastic-Search

100% Compatible with Greenplum DB

Deepgreen DB is derived from the open source Greenplum DB project and maintains 100% compatibility with it. From SQL and stored procedure syntax, to on-disk storage formats, to operational utilities such as gpstart and gpfdist, Deepgreen DB ensures full compatibility to minimize redeployment effort. In particular:

No need to reload data.

No changes to SQL code (both DML and DDL).

No changes to stored procedure code.

No changes to user-defined function code.

No changes to connectivity and authentication protocols such as ODBC and JDBC.

No changes to operational scripts such as Bash backup scripts and cron jobs.

Deepgreen DB Application Area

More Speed

For most CPU-bound OLAP workloads, Deepgreen DB runs up to 3X faster than Greenplum DB on average.

More Connected

Using Xdrive, Deepgreen DB can read from and write to many external data sources in a distributed and efficient manner.

More Intelligent

Using the Transducer, Python and Go code fragments can be directly embedded into SQL to group and push data to TensorFlow for machine learning.

Call to action

Interested in our products? Download our brochure here.

Deepgreen Customers & Partners: International and Indonesian

Do you want us to run a proof of concept?

Want us to contact you?

Please send us your details and we will contact you shortly.

Appendix 1 - Data Warehouse Assessment Questionnaire

Common

What is the current data size (in TB) of your production DWH system?
What is the current data size (in GB) of your production application system?
What is the current data size (in TB) of your offline storage?
How much (in GB) does your data source grow per year?
How many years do you plan to retain data in the DWH?
Do you have any ETL tools? If so, please explain how they are used.
How large is your DWH system, if any (CPU, memory, storage capacity, number of servers)?
What BI application do you use? Does it fit your needs?

Size Measurement

Are you migrating from an existing data warehouse solution? If yes, please describe the system (type, version) and how long it has been running.
Which 5 tables have the most rows?
What are the 5 slowest (report) SELECT queries involving one or more of those 5 biggest tables?
How many rows are there in total in the database?
Do any queries return BLOBs (large objects)?
How many stored procedures (functions and procedures) do you use in the DWH environment, if any?
How many dblinks or connections to outside systems are there in total?

Data Source Implementation (If any)

Please describe the implementation details.
Explain the architecture/topology (e.g. clusters, replication, monitors, high availability).
Describe the current environment (e.g. OS, hardware, memory, cores).
Describe the server (e.g. virtual or dedicated).
Describe the file system (e.g. SAN or NFS).
Describe the storage disks (e.g. spindle, SSD, NVMe).
How many tables are related to the data warehouse?
How big is the data source?

Constraints

Which specific technologies or features of the current database do you rely on (encryption, etc.)?
Do you have the source code for your application?
Is your application still maintained by its vendor?
Is any specific method used for backup and restore?

ETL Process

Do you already have an ETL process design or requirements? Please describe.
Do you have any specific requirements for the ETL process?
How do you want to load data into the data warehouse: in real time, in batches, or triggered manually?
Please describe all data sources to be aggregated into the data warehouse system.
Do you plan to create a data mart? Please explain your strategy.
Please describe who will create the consolidated data, such as cubes, data marts, or views.

The Ultimate Data Warehouse System

Deepgreen DB
