In the same time of passionate debat of the best distribution MySQL vs MariaDB on a 120K Euro server(here), i wanted to share the story of drop of InnoDB insert performance for one of our client having a tables between 10 to 100 Billions rows.
In most case this size of table is just a no go in relationnel database.
So why it used to work so far ?
c1 int, c2 tinyint, c3 mediumint, c4 binary(1), c5 float .
The issue appear because :
Life in consulting can be hard but now the solution:
In most case this size of table is just a no go in relationnel database.
So why it used to work so far ?
c1 int, c2 tinyint, c3 mediumint, c4 binary(1), c5 float .
c2 avg cardinality 1000
c3 avg cardinality 10000
c3 avg cardinality 10000
Very simple log table with optimized column type :
- Partition by range on c1 where c1 increase with loading date
- InnoDB compression 8K block size
A first good metric when inserting 1 row
- Into a 1G partition we touch about 400 block in memory
- Into a brain fresh partition we touch 7 block
So partition really help by limiting memory required to keep the index hot
The issue appear because :
- The data have been reloaded not on a long period but on a short period of time not enabling the data to be spread into multiple partitions and the index partition was reaching 600G for a much much smaller buffer pool.
- The loading process was slow down from InnoDB compression, 3 times slower then without compression in this case.
- Creation date not a solution: old data could be load after a long time
- Hash of c1 not a solution as load and read are concurrent and touch many blocks in concurrent partitions
- We lose drop partition feature
In a first step we upgrade to MariaDB 5.5
Create a table c1 int, c3 mediumint, c4 blob, c5 blob .
We use c2 as indice of Dynamic Column as it get less then 65K values
https://kb.askmonty.org/en/dynamic-columns
https://mariadb.atlassian.net/browse/MDEV-377
https://kb.askmonty.org/en/dynamic-columns
https://mariadb.atlassian.net/browse/MDEV-377
And insert data like :
INSERT INTO t VALUES (c1, c2 , CREATE_COLUMN(c4,c2(row1) as BINARY(1), 'c4(row1)', ,c2(row2) as BINARY(1), 'c4(row2)' ...... ) , CREATE_COLUMN(c2(row1) as DECIMAL 'c5(row1)', ,c2(row2) as DECIMAL , 'c5(row2)' .........)
With that trick we divide by 1000 the number of rows in the table
What are the consequences and the big surprise here
- Load data is just faster as we remove the compression time
- Innodb table size is smaller then the compressed version
Why the table is smaller ? I can just guess here !
- InnoDB as a fill factor space that may be removed from beeing stored in a blob
- InnoDB have record overhead that is removed by Dynamic Column
In a futur step we could upgrade to MariaDB 10 and store those blob in a cassandra cluster .
Advantages :
- We get free sharding
- Security is fine because meta data stay in the DB and who can figure out what a bunch of float really mean :)



