MySQL: Fetching Many-To-Many Relationships

Fetching entries from Many-To-Many relationships is up to ten times faster and much more readable/easier to code using multiple SELECT statements than JOINs.

I’ve been wondering for a few months how I could make complex queries more performant and readable, so today I decided to perform a little experiment.

There are basically three ways to fetch a text and its tags using MySQL.

Using a JOIN takes about 36.7s if I repeat the operation 100,000 times. The code looks like this:

const data = await mysql.query(`
  SELECT tv.title as text_title, tv.id, t.id as text_id, c.uuid, c.title 
  FROM TextVersion tv 
  LEFT JOIN Text t ON t.id = tv.text_fk 
  LEFT JOIN text_category tc ON tc.text_id = t.id 
  LEFT JOIN Category c ON tc.category_id = c.id 
  WHERE tv.id=26
`)

const result = {
  id: data[0].id,
  title: data[0].title,
  text_id: data[0].text_id,
  categories: data.map(c => { return {
    title: c.title, 
    uuid: c.uuid
  }})
}

With a concatenated subquery result, I obtain 45s, which is 18.5% slower than the previous solution. It’s the method I used so far because I couldn’t figure out how to make JOINs work and look good with multiple Many-to-Many relationships:

const data = await mysql.query(`
  SELECT tv.title, tv.id, t.id as text_id, 
    (
    SELECT 
      CONCAT( '[', 
        GROUP_CONCAT(CONCAT(
          '{"title":"', c.title, '"}', ',', '{"uuid":"', c.uuid, '"}' 
         )), 
      ']'
    ) 
    FROM Category c 
    LEFT JOIN text_category tc ON tc.category_id = c.id 
    WHERE tc.text_id = t.id) as c 
  FROM TextVersion tv 
  JOIN Text t ON t.id = tv.text_fk 
  WHERE tv.id=26
`)

const result = {
  id: data[0].id,
  title: data[0].title,
  text_id: data[0].text_id,
  categories: JSON.parse(data[0].c)
}

The last solution I came up with was to use multiple SELECT statements. It took 40.1s (10% slower) to fetch and process the data to the format I needed:

const data = await mysql.query(`
  SELECT tv.title, tv.id, t.id as text_id 
  FROM TextVersion tv 
  JOIN Text t ON t.id = tv.text_fk 
  WHERE tv.id=26; 

  SELECT c.title, c.uuid  
  FROM Category c 
  LEFT JOIN text_category tc ON tc.category_id = c.id 
  LEFT JOIN TextVersion tv ON tv.text_fk=tc.text_id 
  WHERE tv.id=26
`)

const result = {
  id: data[0][0].id,
  title: data[0][0].title,
  text_id: data[0][0].text_id,
  categories: data[1].map(c => { return {
    title: c.title, 
    uuid: c.uuid
  }})
}

The first solution obviously performs better at scale. The literature almost always advise to use JOINs, so no surprise here. 

But when I try decreasing the number of iterations, I found out that multiple SELECT statements are equivalent or outperform JOIN queries. 

They are equally fast at 12,000 iterations, 50% faster at a hundred, 10 times faster with 10 iterations, and 60 times faster when performing a single query!

@craigpetterson mentioned on Twitter there is perhaps something wrong with my indexes, but adding composite indexes on the foreign keys doesn’t seem to change the results.

JOINs do scale better. But according to my benchmark, multiple SELECT statements are easier to read/code, require less data (no data duplication) to send over the wire, and outperform JOINs when you send fewer database requests, which is the case of most apps and static-generated webpages.

In conclusion, I’ll try to prefer using multiple SELECT statements rather than complicated JOIN until I manage to perform further tests and see how the queries behave in production.