vendredi 5 juillet 2013

MariaDB Storage Engine for CCM forum

CCM Benchmark is one of the leading forum provider on the web,  ROI is a major concern for them  and historically MyISAM was used on the forum replication cluster.  Reason is that MyISAM gave better ROI/performance on data that is hardly electable to cache mechanism.

This post is for MySQL users at scale,  if the number of servers or datacenter cost is not an issue for you, better get some more memory or flash storage and ou will found Lucifer server to demonstrate that your investment is not a lost of money or just migrate to Mongo.  

Quoting Damien Mangin, CTO at CCM "I like my data to be small, who want's to get to a post where the question is not popular and have no answer. Despite cleaning we still get more data than what commodity hardware memory can offer and storing all post in memory would be a major waste of money".

Like many other big web players at an other scale, Damien need to scale on disk not because it's good, but because you can catch more with less hardware. Doing this you need to control the cache missed at the level that you found acceptable and that give constant response time for your workload.

What data size do we get  retaining the most popular forum posts ?



Data
MyISAM
49G
TokuDB Fast
22G
InnoDB
80G
InnoDB 8K
50G
TokuDB Small
?
InnoDB 4K
?


What hardware do we have ?


PUMA : MariaDB 5.5 InnoDB 32G RAM

|__ LUCIFER : MariaDB 5.5 InnoDB compressed 8K 64G RAM

|__ GERTRUDE : MariaDB 5.5 MyISAM 32G RAM

|__ MYSQL1 : MariaDB 5.5 MyISAM 32G RAM

|__ MYSQL3 : MariaDB 5.5 TokuDB Fast 32G RAM


What are the top 10 queries, response time on each server ?

Q1


SELECT categorie, best_answer_id FROM ccmforum_index WHERE id=169328


No surprise here  that table is small and we notice that that TokuDB and InnoDB compression does not affect the response time of the queries.


Q2


SELECT id,message FROM ccmforum WHERE id IN(?,?,?,?,?)

In range of 1 to 5000 values in the IN clause.
This table is the big baby that generate RND IOps .



Interesting you get the raison here of why MyISAM is better than InnoDB at equal hardware on disk bound workload.

3 times better is something that matter as the second most frequent query.
We get almost equal performance for MyISAM(mysql1) and TokuDB(mysql3) knowing that TokuDB get all data in RAM and MyISAM 75% ; and InnoDB (puma) uncompressed 50%.


 Q3

SELECT parentx FROM uforums WHERE module="download" AND info_id=223




Q4


SELECT i.categorie,c.resume,c.title,count(i.categorie) AS nbFROM ccmforum_index i INNER JOIN ccmforum_cat c ON i.categorie=c.idWHERE i.parentx IN(32932,213290,2937,15002,13612,10016,154379,116397,79497,31886,4235,5038,5222,84819,81100,36025,8274,162824,10620,21731,12130,123360,232454) AND c.visibilite=0 AND c.acces=0
GROUP BY i.categorieORDER BY nb DESC


Q5



SELECT m.id,s.contribs,s.contribs_technique,p.devise,UNIX_TIMESTAMP(m.ts_create) AS date,p.photo,p.photo_etag,m.nick,UNIX_TIMESTAMP(s.ts_last_post) AS ts_last_post,p.siteperso AS website,(m.rang+1) AS level,m.contributeur AS contributor,m.blockedFROM commentcamarche.ccmmembres m INNER JOIN ccmforum_stats s ON m.id=s.id LEFT JOIN commentcamarche.ccmprofils p ON p.id=s.idWHERE m.id IN(1191274)




Q6

SELECT i.id,i.titre,i.auteur,UNIX_TIMESTAMP(i.date) AS date,i.membre,UNIX_TIMESTAMP(i.datex) AS datex,i.etat,i.categorie,i.parentx,i.member_id,i.reponses,i.dernier,i.dernier_membre,i.premier,i.premier_membre,UNIX_TIMESTAMP(i.datex) AS unix_datex,UNIX_TIMESTAMP(i.date) AS unix_date,0 AS view,i.appreciation FROM ccmforum_index i   WHERE i.categorie IN (2,105,10,111,108,106,110,109,107) AND i.etat!=0
ORDER BY i.datex DESC
LIMIT
2350,50


Q7

select sum(count) as cpt from ccmforum_count




Q8

SELECT id,nick FROM commentcamarche.ccmmembres WHERE nick="hyxo"



Q9

SELECT m.nick,m.mail,m.valid,s2.site_id AS id_site_create,
      UNIX_TIMESTAMP
(m.ts_create) AS ts_create,UNIX_TIMESTAMP(s.ts_last_post) AS ts_last_post,UNIX_TIMESTAMP(p.ts_last_edit) AS ts_last_edit,m.rang+1 AS level,m.contributeur AS contributor,m.following,m.followers,
      m
.signature,p.configuration AS config,p.domaines AS interest_areas,p.devise AS quote,p.bio,m.sexe AS gender,m.ville AS city,m.pays AS country,CONCAT(p.anniversaire_annee,'-',p.anniversaire_mois,'-',p.anniversaire_jour) AS birthdate,
      p
.siteperso AS website,m.newsletter AS optin_ccm,m.optin_part,m.`blocked`,m.messagerie AS accept_pm,m.notifications,p.photo AS picture,p.photo_etag AS picture_etag,m.domaine AS registration_domain,
      p
.date AS show_date,p.ville AS show_city, p.pays AS show_country, p.anniversaire AS show_birthdate, p.sexe AS show_gender,
      p
.email AS show_mail, LENGTH(p.siteperso) AS show_website,
      d
.job,d.company,d.biography,d.website AS websiteMD,d.twitter,d.facebook,d.linkedin,d.googleplus,d.firstname,d.lastnameFROM   commentcamarche.ccmmembres mLEFT JOIN commentcamarche.ccmprofils p ON p.id = m.idLEFT JOIN commentcamarche.ccmmembres_data d ON d.id = m.idLEFT JOIN ccmforum_stats s ON s.id = m.idINNER JOIN globals.sites s2 ON s2.domain=m.domaineWHERE  m.id=207360

Take away 

TokuDB proved identical MyISAM response time but being at least 2 time smaller on disk, we did not check InnoDB compression on 32G should be a more fair test but it was not the point as CCM have a server with memory to cover InnoDB fatness.

We notice that TokuDB like InnoDB does not bring count(*) query faster if data stay in the cache but TokuDB compression does not hurt the performance in all major queries.


Aucun commentaire:

Enregistrer un commentaire