Monday, January 17, 2011

Musing on NoSQL: damn, I can't get rid of InfiniDB

NoSQL has frequently been used for building analytic solutions. The big picture is using scalable client code with map/reduce to distribute full data scans.

This approach is nothing new to the RDBMS world and can be seen as an extension of Kimball's ROLAP normalization, simply allowing more CPU power on a single query.

NoSQL or sharding takes advantage of:
  • Distributed processing
NoSQL or sharding loses advantages such as:
  • Fast in-memory communication
  • Per-column data type optimization and deserialization cost (NoSQL)
  • C processing, when the reduce step is written in a slower language (NoSQL)
There are also non-technical advantages to classic ROLAP normalization, such as reusing the same well-known OLTP tools and a battle-tested storage engine, all available under GPL licences to reduce the cost of ownership. In many, many cases this allows building analytic solutions with acceptable response times.

Our existing MySQL tools for vertical scalability include:

  • MyISAM for minimizing query latency
  • MyISAM read-only packed tables for saving space and I/O
  • Deprecated MERGE tables for partitioning
  • Native partitioning
  • InnoDB transparent zlib compression
  • InnoDB read-ahead and multiple I/O threads
  • The Tokutek storage engine, with response times that stay predictable as tables grow
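As an illustration of the native partitioning item above (table and column names are invented for the sketch), a fact table can be range-partitioned by day so that partition pruning limits a single-day query to one partition:

```sql
-- Hypothetical fact table, range-partitioned by day:
-- a query filtering on one event_day scans only that partition.
CREATE TABLE fact (
  event_day   DATE NOT NULL,
  customer_id INT NOT NULL,
  amount      DECIMAL(10,2) NOT NULL
) ENGINE=MyISAM
PARTITION BY RANGE (TO_DAYS(event_day)) (
  PARTITION p20110115 VALUES LESS THAN (TO_DAYS('2011-01-16')),
  PARTITION p20110116 VALUES LESS THAN (TO_DAYS('2011-01-17')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);
```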

On the same front, MariaDB is closing the gap with the best commercial RDBMSs, introducing new query-plan optimizations:
  • Table elimination (MariaDB 5.1)
  • Batched Key Access (MariaDB 5.3)
  • Join buffering with outer joins
  • Index Condition Pushdown
  • Subquery optimizations (MariaDB 5.3)
  • NULL-aware materialization
  • Materialization-scan for grouping queries
  • Predicate caching
  • FROM subquery optimizations   
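As a sketch of the first item, table elimination (schema names are invented here): when a table is outer-joined on a unique key but none of its columns are actually used, MariaDB can drop it from the plan entirely, which matters a lot for wide reporting views:

```sql
-- 'clients' is joined on its primary key but never selected,
-- so MariaDB's table elimination can remove it from the plan.
CREATE TABLE clients (id INT PRIMARY KEY, name VARCHAR(64));
CREATE TABLE orders  (id INT PRIMARY KEY, client_id INT, amount DECIMAL(10,2));

CREATE VIEW order_details AS
  SELECT o.id, o.amount, c.name
  FROM orders o LEFT JOIN clients c ON c.id = o.client_id;

-- Only columns from orders are referenced:
EXPLAIN SELECT id, amount FROM order_details;
```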

ROLAP, despite being the most common approach, faces some well-known limitations that OLAP or NoSQL try to address.

  • A single query is bound to a single server, and for MySQL to a single core. This is good enough in most cases for marketing or management requirements that put limited load on the analytic solution and ask for data with some predefined time delay.
  • It puts the technical team in front of the write-vs-read challenge: adding more indexes will speed up some queries, but is unlikely to cover all of them, and will slow down writes and DDL.
  • Over time the schema becomes more and more tied to specific requirements, bringing in aggregates and datamarts and forcing the team to maintain a bunch of asynchronous ETL code.
NoSQL defenders tend to claim this is not acceptable. Analytic systems today are moving online and may have to ingest events at the rate of Internet sessions, clicks or network packets. That is true, but is NoSQL the best solution?

On the same track as NoSQL, a toolkit is emerging in MariaDB and Drizzle to overcome those limitations:

  • Column-based OLAP storage engines
  • ROLAP sharding solutions, with or without transparent map/reduce

What is best today? Again, there is no rule of thumb, but I can share some findings from my day-to-day job.

OLAP storage engines (InfiniDB, Infobright, LucidDB) provide very good results when processing a lot of records, depending on the underlying hardware and the type of queries. I tend to credit InfiniDB with maximizing resources on a single query, and Infobright with providing better response times on limited hardware.

Based on the top 10 queries of an existing client running an online DWH of 1 billion records on 8-core hardware, RAID 5 with 4×15,000 rpm SAS drives, and 32 GB of RAM:

                                                     InnoDB      InfiniDB     Infobright
Avg logical records/s across all query types         1,598,682   12,098,102   4,527,652
Min response time per million logical records
scanned (best case, 128M-row GROUP BY), seconds      0.6         0.02         0.06
So far so good: OLAP is the big winner here!
Can NoSQL come close to OLAP at a rate of 500,000 gets/s? It would require 128 servers to come close to such response times, and 12 of them just to sustain the average read rate. Fortunately, NoSQL can use bulk data transfers behind a distributed SQL layer.

On the other side, ROLAP sharding can put more memory caching behind a single query, introducing home-made logical partitioning that can follow business rules, like keeping the hottest data on a single node or mixing huge aggregates with small data lookups. This approach already earned very good results for me on a 20-billion-row DWH built last year, for a fraction of the price of a commercial DB.
Can ROLAP come close to OLAP, and how many servers would I need to achieve the same performance? According to the previous bench it is 1 to 1 if we can make all the cores work together, but for the same latency it is more like 1 to 3 or 4. Can we verify this with micro-benchmarking?

Let's assume linear parallel processing on a ROLAP design with some sharding.

With the Spider storage engine, which allows transparent horizontal sharding, we end up splitting the data into per-day shards, giving 2M-row fact table partitions, and distributing the shards across multiple MariaDB instances. Spider makes it possible to do this, and to reattach your data, while reusing the same SQL statements.
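A sketch of what such a per-day Spider layout could look like (server addresses, credentials, and table names are made up; check the Spider documentation for the exact connection options):

```sql
-- Declare the remote MariaDB backends holding the shards.
CREATE SERVER s1 FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '10.0.0.1', DATABASE 'dwh', USER 'spider', PASSWORD 'secret', PORT 3306);
CREATE SERVER s2 FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '10.0.0.2', DATABASE 'dwh', USER 'spider', PASSWORD 'secret', PORT 3306);

-- Spider table: each daily partition lives on a different backend,
-- but clients keep issuing the same SQL against this one table.
CREATE TABLE fact (
  event_day   DATE NOT NULL,
  customer_id INT NOT NULL,
  amount      DECIMAL(10,2) NOT NULL
) ENGINE=SPIDER COMMENT='wrapper "mysql", table "fact"'
PARTITION BY RANGE COLUMNS (event_day) (
  PARTITION p20110115 VALUES LESS THAN ('2011-01-16') COMMENT='srv "s1"',
  PARTITION p20110116 VALUES LESS THAN ('2011-01-17') COMMENT='srv "s2"'
);
```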

Let's now look at some micro-benchmarks that discouraged me from a brute-force-only approach.

Mounting the partitions on a ramfs, let's capture the time (in seconds) for a single 2-million-row partition scan:

InnoDB           1.3
PBXT             0.8
MyISAM packed    1.32
MyISAM           0.46
Aria 32K blocks  0.86

With linear scalability we can get 8 cores to perform a 32M-row scan/s on MyISAM.
Given our previous query doing a 128M-row aggregate in 2 s on InfiniDB, the same result would have taken 4 seconds, 2 times slower than InfiniDB, and that is before accounting for network, disk access and any per-shard consolidation time.

What is the magic behind OLAP? Could it be per-column sharding?

Our Q10 query touches 3 fields out of the 12 in the given table. This opens the possibility of dividing the full-scan time by 4, at the cost of 3 additional primary-key scans and 2 times the CPU to join 2 million rows. This could probably be done transparently with the excellent VP storage engine.
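A sketch of the per-column split idea (column names are invented): keep the 3 queried columns in a narrow table and rejoin the remaining 9 on the primary key, which is roughly what the VP engine would automate:

```sql
-- Narrow table holding only the columns Q10 touches;
-- the other columns live in fact_rest, joined back on the PK.
CREATE TABLE fact_hot  (id INT PRIMARY KEY, event_day DATE, amount DECIMAL(10,2));
CREATE TABLE fact_rest (id INT PRIMARY KEY /* , the 9 other columns */);

-- Q10-style scans read only the narrow table...
SELECT event_day, SUM(amount) FROM fact_hot GROUP BY event_day;

-- ...but any query needing other columns pays the 2M-row PK join:
SELECT h.event_day, r.*
FROM fact_hot h JOIN fact_rest r ON r.id = h.id;
```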

Oops: a join on 2M records adds a minimum of 8 seconds per shard for a full scan. This POC was a big U-turn. Conclusion: per-column sharding plus brute force is not the solution in most cases, given the cost of the algorithms needed to rejoin the multiple pieces.

Fetch 2 INT columns over 2M rows (MyISAM)   Join cache level   Time to scan all records (s)
FULL SCAN, 8-byte record size               1                  1.07
FULL SCAN, 108-byte record size             1                  1.87
OLAP engines take clever advantage of per-shard metadata and pre-aggregated information. It means that any ROLAP sharding or NoSQL solution should come to the same conclusion regarding latency, and add hybrid mechanisms, like additional metadata or pre-calculated tables, to limit the cost of brute forcing.

A great idea would be to build a transparent metadata layer, with MySQL Proxy or Gearman, that would inject pre-calculated results per shard, executed asynchronously at insert time. I have no doubt that a map/reduce solution for building such metadata asynchronously is the way to go for ROLAP sharding with almost real-time statistics.
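A minimal sketch of such per-shard pre-calculated metadata (table and column names are invented; the refresh would be fired asynchronously, e.g. by a Gearman worker, after each bulk load):

```sql
-- Per-shard daily summary, kept in sync after each bulk insert.
CREATE TABLE fact_day_summary (
  event_day DATE PRIMARY KEY,
  row_count BIGINT NOT NULL,
  total     DECIMAL(18,2) NOT NULL
);

-- Upsert the day's aggregates so queries can read the summary
-- instead of brute-forcing the 2M-row shard.
INSERT INTO fact_day_summary (event_day, row_count, total)
SELECT event_day, COUNT(*), SUM(amount)
FROM fact
WHERE event_day = CURRENT_DATE
GROUP BY event_day
ON DUPLICATE KEY UPDATE
  row_count = VALUES(row_count),
  total     = VALUES(total);
```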

For those already convinced, I'd like to point out a not-so-well-known MySQL feature called WITH ROLLUP that helps with aggregating data.
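For example (the sales table is hypothetical), WITH ROLLUP appends super-aggregate rows, marked by NULL in the rolled-up column, for each grouping level:

```sql
-- Produces per (day, product) sums, plus a per-day subtotal row
-- (product IS NULL) and a grand-total row (both columns NULL).
SELECT event_day, product, SUM(amount) AS total
FROM sales
GROUP BY event_day, product WITH ROLLUP;
```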

6 comments:

  1. For building summary tables, look at Flexviews. It supports asynchronous "fast refresh" materialized views for MySQL.

    Shard-Query is a tool of mine that can use MySQL Proxy (or PHP) and Gearman to distribute queries over multiple databases and aggregate the results back together. It does some clever things to get extra parallelism on queries too.

  2. Hi Stephane,

    Thanks for the write-up. Some quick points on InfiniDB:
    Yes, columnar definitely helps I/O by avoiding columns that are not needed. In addition, the columns effectively act as indices, allowing efficient parallel scans of complex predicates. With the Enterprise edition you can also add compression, as well as map/reduce distribution of work across a cluster.

    It may be interesting to talk about integration with SkySQL; there are likely some intriguing possibilities there.

    Thanks - Jim Tommaney
    CTO Calpont

  3. This comment has been removed by the author.

  4. Stephane,

    Shard-Query (by Justin above) is similar to Firewater, the LucidDB MPP project. It allows a scale-out approach for star queries.

    Great write up - love the NoSQL discussion as well.

  5. Hi Stéphane,

    Keep up the good work. On the 15th of September 2011 we are releasing, as open source, a PHP5/Zend library for querying a classic DB in MDX.
    We are based on Saiku.
    It can be a nice tool for people wanting to migrate slowly to a full OLAP system ... because I have experienced the same as you with many columnar DBs!