vendredi 17 avril 2015

Social Networking Using OQGraph

I was given the chance to experiment typical social networking query on an existing 60 Millions edges dataset

How You're Connected


Such algorithms and others are simply hardcoded into the OQGraph. 

With the upgrade of OQGraph V3 into MariaDB 10 we can proceed directly on top of the exiting tables holding the edges kine of featured VIRTUAL VIEW. 



CREATE OR REPLACE TABLE `relations` (
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `id1` int(10) unsigned NOT NULL,
  `id2` int(10) unsigned NOT NULL,
  `relation_type` tinyint(3) unsigned DEFAULT NULL,
  KEY `id1` (`id1`),
  KEY `id2` (`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

oqgraph=# select count(*) from relations;

+----------+
| count(*) |
+----------+
| 59479722 |
+----------+
1 row in set (23.05 sec)

Very nice integration of table discovery that save me referring to documentation to found out all columns definition.  

CREATE TABLE `oq_graph`
ENGINE=OQGRAPH `data_table`='relations' `origid`='id1' `destid`='id2';

oqgraph=# SELECT * FROM oq_graph WHERE latch='breadth_first' AND origid=175135 AND destid=7;
+---------------+--------+--------+--------+------+--------+
| latch         | origid | destid | weight | seq  | linkid |
+---------------+--------+--------+--------+------+--------+
| breadth_first | 175135 |      7 |   NULL |    0 | 175135 |
| breadth_first | 175135 |      7 |      1 |    1 |      7 |
+---------------+--------+--------+--------+------+--------+
2 rows in set (0.00 sec)


oqgraph=# SELECT * FROM oq_graph WHERE latch='breadth_first' AND origid=175135 AND destid=5615775;
+---------------+--------+---------+--------+------+----------+
| latch         | origid | destid  | weight | seq  | linkid   |
+---------------+--------+---------+--------+------+----------+
| breadth_first | 175135 | 5615775 |   NULL |    0 |   175135 |
| breadth_first | 175135 | 5615775 |      1 |    1 |        7 |
| breadth_first | 175135 | 5615775 |      1 |    2 | 13553091 |
| breadth_first | 175135 | 5615775 |      1 |    3 |  1440976 |
| breadth_first | 175135 | 5615775 |      1 |    4 |  5615775 |
+---------------+--------+---------+--------+------+----------+
5 rows in set (0.44 sec)

What we first highlight is that underlying table indexes KEY `id1` (`id1`), KEY `id2` (`id2`) are used by OQgrah to navigate the vertices via a number of key reads and range scans, such 5 level relation was around 2689 jump and 77526  range access to the table . 

Meaning the death of the graph was around 2500 with an average of 30 edges per vertex 

# MyISAM

oqgraph=# SELECT * FROM oq_graph_myisam WHERE latch='breadth_first' AND origid=175135 AND destid=5615775;
+---------------+--------+---------+--------+------+----------+
| latch         | origid | destid  | weight | seq  | linkid   |
+---------------+--------+---------+--------+------+----------+
| breadth_first | 175135 | 5615775 |   NULL |    0 |   175135 |
| breadth_first | 175135 | 5615775 |      1 |    1 |        7 |
| breadth_first | 175135 | 5615775 |      1 |    2 | 13553091 |
| breadth_first | 175135 | 5615775 |      1 |    3 |  1440976 |
| breadth_first | 175135 | 5615775 |      1 |    4 |  5615775 |
+---------------+--------+---------+--------+------+----------+
5 rows in set (0.11 sec)

Need to investigate more such speed difference using MyISAM. Ideas are welcome ?

jeudi 16 avril 2015

Howto - Move a table to different schema with no outage

I remember a time when it was debate if views can be useful for a web oriented workload ?

This post is about one good use case:

The  story is that some tables have been creating into a schema and used by the application into same connection.

Later on some more schema have been added to separate data for multiple application domain but still using original table, kind of cross domain universal table.

With addition of many new domains, a new global schema was added storing freshly create universal tables.

The question was how to move back the old table in the correct new schema without stopping availability of the service ?

We decided to use a view that point to the physical table. Change the application to use the view and later atomically switch the table and the view.


Here is the test case for doing that :


-- Create schemas
CREATE DATABASE schema1;
CREATE DATABASE schema2;

-- Create table in schema 1
CREATE TABLE schema1.t1 (
  id int
);

-- Create views in schema 2
CREATE VIEW schema2.t1 AS SELECT * FROM schema1.t1;
-- Create dummy view on view in schema 1 
CREATE VIEW schema1.t1_new AS SELECT * FROM schema2.t1;

-- Changing the API 

-- Switch schema 1 table and schema 2 view
RENAME TABLE schema2.t1 TO schema2.t1_old,
  schema1.t1 TO schema2.t1,
  schema1.t1_new TO schema1.t1;

Is there some other path ? Surely some triggers + insert ignore like done in OAK or Pt Online Alter table but i also remember a time when it was debate if triggers can be useful for a web oriented workload :)


Thanks to Nicolas @ccmbenchmark for contributing the test case.