Executing complex SQL queries against large datasets often poses a significant challenge in terms of performance, even when they leverage properly constructed indexes. This challenge becomes particularly pronounced when such queries include numerous conditions combined with the OR operator. These conditions, while necessary for the query logic, can drastically slow down execution times.

A minimalistic and professional abstract representation of SQL database optimization using unions.

In MailWizz, segments are a critical feature that often generate complex SQL queries. This complexity arises from the dynamic nature of user-defined conditions on subscribers' custom fields. These segments, while powerful and flexible, can lead to complex and demanding queries, especially when dealing with large datasets and a variety of user-specified conditions.

The example below shows a segment in MailWizz where we select subscribers matching following condition:

Email not ends with "@hotmail.com" OR Age is greater than 29 OR Country is Greece OR subscribers have opened an email in the last 360 days OR subscribers clicked an email in the last 360 days.

We can also see in the bottom-left of the image, the number of subscribers matching this segment.
This is important for later comparison in our SQL queries.

Showing how a complex segment looks like in MailWizz, where the user apply various conditions to match against their subscribers

All SQL queries shown below are generated using Yii's ActiveRecord implementation, which might result a high number of parenthesis. We did our best to clean them up to make the examples more readable.

When a campaign is launched using this segment, it results in the generation of the following SQL query:

SELECT COUNT(DISTINCT t.subscriber_id) as counter
FROM `list_subscriber` `t`
    LEFT JOIN `list_field_value` `fieldValues772` ON (`fieldValues772`.`subscriber_id`=`t`.`subscriber_id`)
    LEFT JOIN `list_field_value` `fieldValues771` ON (`fieldValues771`.`subscriber_id`=`t`.`subscriber_id`)
    LEFT JOIN `campaign_track_open` `segmentCampaignConditionOpen11` ON (`segmentCampaignConditionOpen11`.`subscriber_id`=`t`.`subscriber_id`)
    LEFT JOIN `campaign_track_url` `segmentCampaignConditionClick12` ON (`segmentCampaignConditionClick12`.`subscriber_id`=`t`.`subscriber_id`)
    LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94')
WHERE (
    t.list_id='2151' AND t.status='confirmed'
    AND (
        (t.email NOT LIKE '%@hotmail.com')
        OR (`fieldValues772`.`field_id` = 772 AND CAST(`fieldValues772`.`value`  AS SIGNED) > 29)
        OR (`fieldValues771`.`field_id` = 771 AND `fieldValues771`.`value` = 'greece')
        OR (segmentCampaignConditionOpen11.campaign_id IN (10,62,63,64,65,66,67,68,71,72,88,89,90,91,92,93,94) AND DATE(DATE_SUB(NOW(), INTERVAL 360 day)) <= DATE(segmentCampaignConditionOpen11.date_added))
        OR (segmentCampaignConditionClick12.url_id IN (SELECT url_id FROM campaign_url WHERE campaign_id IN (10,62,63,64,65,66,67,68,71,72,88,89,90,91,92,93,94)) AND DATE(DATE_SUB(NOW(), INTERVAL 360 day)) <= DATE(segmentCampaignConditionClick12.date_added))
    )
) AND deliveryLogs.subscriber_id IS NULL

Analyzing this query with EXPLAIN reveals the following:

+----+-------------+---------------------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+---------+-----------------------------+--------+----------+--------------------------+
| id | select_type | table                           | partitions | type  | possible_keys                                                                                                                  | key                                         | key_len | ref                         | rows   | filtered | Extra                    |
+----+-------------+---------------------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+---------+-----------------------------+--------+----------+--------------------------+
|  1 | PRIMARY     | t                               | NULL       | ref   | fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status                                                     | fk_list_subscriber_list1_idx                | 4       | const                       | 177453 |    50.00 | Using where              |
|  1 | PRIMARY     | fieldValues772                  | NULL       | ref   | fk_list_field_value_list_subscriber1_idx                                                                                       | fk_list_field_value_list_subscriber1_idx    | 4       | mailwizz-v2.t.subscriber_id |      6 |   100.00 | NULL                     |
|  1 | PRIMARY     | fieldValues771                  | NULL       | ref   | fk_list_field_value_list_subscriber1_idx                                                                                       | fk_list_field_value_list_subscriber1_idx    | 4       | mailwizz-v2.t.subscriber_id |      6 |   100.00 | NULL                     |
|  1 | PRIMARY     | segmentCampaignConditionOpen11  | NULL       | ref   | fk_campaign_track_open_list_subscriber1_idx                                                                                    | fk_campaign_track_open_list_subscriber1_idx | 4       | mailwizz-v2.t.subscriber_id |      1 |   100.00 | NULL                     |
|  1 | PRIMARY     | segmentCampaignConditionClick12 | NULL       | ref   | fk_campaign_track_url_list_subscriber1_idx                                                                                     | fk_campaign_track_url_list_subscriber1_idx  | 4       | mailwizz-v2.t.subscriber_id |      1 |   100.00 | Using where              |
|  1 | PRIMARY     | deliveryLogs                    | NULL       | ref   | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx      | 4       | const                       |      1 |    10.00 | Using where; Not exists  |
|  2 | SUBQUERY    | campaign_url                    | NULL       | index | PRIMARY,campaign_hash,fk_campaign_url_campaign1_idx                                                                            | fk_campaign_url_campaign1_idx               | 4       | NULL                        |    105 |   100.00 | Using where; Using index |
+----+-------------+---------------------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+---------+-----------------------------+--------+----------+--------------------------+
7 rows in set, 1 warning (0.01 sec)

Despite having proper indexes, the database engine struggles to utilize them efficiently for all conditions. Consequently, it ends up processing a massive 670 million records. This extensive data traversal translates into a substantial execution time of about 16 seconds for the query:

+---------+
| counter |
+---------+
|   97228 |
+---------+
1 row in set (16.09 sec)

Rewriting the initial query using UNION instead of OR results in a longer, yet structurally different format:

SELECT COUNT(*) as counter FROM (
    (
        SELECT t.subscriber_id as subscriber_id
            FROM `list_subscriber` `t`
            LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94')
        WHERE ((t.list_id='2151') AND (t.status='confirmed') AND ((t.email NOT LIKE '%@hotmail.com'))) AND (deliveryLogs.subscriber_id IS NULL)
    )

    UNION

    (
        SELECT t.subscriber_id as subscriber_id
            FROM `list_subscriber` `t`
            LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94')
            LEFT JOIN `list_field_value` `fieldValues772` ON (`fieldValues772`.`subscriber_id`=`t`.`subscriber_id`)
        WHERE ((t.list_id='2151') AND (t.status='confirmed') AND ((`fieldValues772`.`field_id` = 772 AND (CAST(`fieldValues772`.`value`  AS SIGNED) > 29) ))) AND (deliveryLogs.subscriber_id IS NULL)
    )

    UNION

    (
        SELECT t.subscriber_id as subscriber_id
            FROM `list_subscriber` `t`
            LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94')
            LEFT JOIN `list_field_value` `fieldValues771` ON (`fieldValues771`.`subscriber_id`=`t`.`subscriber_id`)
        WHERE ((t.list_id='2151') AND (t.status='confirmed') AND ((`fieldValues771`.`field_id` = 771 AND (`fieldValues771`.`value` = 'greece') ))) AND (deliveryLogs.subscriber_id IS NULL)
    )

    UNION

    (
        SELECT t.subscriber_id as subscriber_id
            FROM `list_subscriber` `t`
            LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94')
            LEFT JOIN `campaign_track_open` `segmentCampaignConditionOpen11` ON (`segmentCampaignConditionOpen11`.`subscriber_id`=`t`.`subscriber_id`)
        WHERE ((t.list_id='2151') AND (t.status='confirmed') AND ((segmentCampaignConditionOpen11.campaign_id IN (10,62,63,64,65,66,67,68,71,72,88,89,90,91,92,93,94) AND DATE(DATE_SUB(NOW(), INTERVAL 360 day)) <= DATE(segmentCampaignConditionOpen11.date_added)))) AND (deliveryLogs.subscriber_id IS NULL)
    )

    UNION

    (

        SELECT t.subscriber_id as subscriber_id
            FROM `list_subscriber` `t`
            LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94')
            LEFT JOIN `campaign_track_url` `segmentCampaignConditionClick12` ON (`segmentCampaignConditionClick12`.`subscriber_id`=`t`.`subscriber_id`)
        WHERE ((t.list_id='2151') AND (t.status='confirmed') AND ((segmentCampaignConditionClick12.url_id IN (SELECT url_id FROM campaign_url WHERE campaign_id IN (10,62,63,64,65,66,67,68,71,72,88,89,90,91,92,93,94)) AND DATE(DATE_SUB(NOW(), INTERVAL 360 day)) <= DATE(segmentCampaignConditionClick12.date_added)))) AND (deliveryLogs.subscriber_id IS NULL)
    )

) AS combined

Each UNION corresponds to one of the original OR conditions. While the query length increases, analyzing it with EXPLAIN shows:

+----+--------------+---------------------------------+------------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+---------+-----------------------------------------------------------+--------+----------+-------------------------+
| id | select_type  | table                           | partitions | type   | possible_keys                                                                                                                  | key                                    | key_len | ref                                                       | rows   | filtered | Extra                   |
+----+--------------+---------------------------------+------------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+---------+-----------------------------------------------------------+--------+----------+-------------------------+
|  1 | PRIMARY      | derived2                        | NULL       | ALL    | NULL                                                                                                                           | NULL                                   | NULL    | NULL                                                      |  11931 |   100.00 | NULL                    |
|  2 | DERIVED      | t                               | NULL       | ref    | fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status                                                     | fk_list_subscriber_list1_idx           | 4       | const                                                     | 177453 |    44.44 | Using where             |
|  2 | DERIVED      | deliveryLogs                    | NULL       | ref    | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4       | const                                                     |      1 |    10.00 | Using where; Not exists |
|  3 | UNION        | fieldValues772                  | NULL       | ref    | fk_list_field_value_list_field1_idx,fk_list_field_value_list_subscriber1_idx,field_subscriber,field_id_value                   | field_id_value                         | 4       | const                                                     | 161240 |   100.00 | Using where             |
|  3 | UNION        | t                               | NULL       | eq_ref | PRIMARY,fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status                                             | PRIMARY                                | 4       | mailwizz-v2.fieldValues772.subscriber_id                  |      1 |    25.00 | Using where             |
|  3 | UNION        | deliveryLogs                    | NULL       | ref    | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4       | const                                                     |      1 |    10.00 | Using where; Not exists |
|  4 | UNION        | fieldValues771                  | NULL       | ref    | fk_list_field_value_list_field1_idx,fk_list_field_value_list_subscriber1_idx,field_subscriber,field_id_value                   | field_id_value                         | 770     | const,const                                               |    406 |   100.00 | Using where             |
|  4 | UNION        | t                               | NULL       | eq_ref | PRIMARY,fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status                                             | PRIMARY                                | 4       | mailwizz-v2.fieldValues771.subscriber_id                  |      1 |    25.00 | Using where             |
|  4 | UNION        | deliveryLogs                    | NULL       | ref    | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4       | const                                                     |      1 |    10.00 | Using where; Not exists |
|  5 | UNION        | segmentCampaignConditionOpen11  | NULL       | ALL    | fk_campaign_track_open_campaign1_idx,fk_campaign_track_open_list_subscriber1_idx                                               | NULL                                   | NULL    | NULL                                                      |     51 |   100.00 | Using where             |
|  5 | UNION        | t                               | NULL       | eq_ref | PRIMARY,fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status                                             | PRIMARY                                | 4       | mailwizz-v2.segmentCampaignConditionOpen11.subscriber_id  |      1 |    25.00 | Using where             |
|  5 | UNION        | deliveryLogs                    | NULL       | ref    | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4       | const                                                     |      1 |    10.00 | Using where; Not exists |
|  6 | UNION        | segmentCampaignConditionClick12 | NULL       | ALL    | fk_campaign_track_url_list_subscriber1_idx,fk_campaign_track_url_campaign_url1_idx                                             | NULL                                   | NULL    | NULL                                                      |      8 |   100.00 | Using where             |
|  6 | UNION        | t                               | NULL       | eq_ref | PRIMARY,fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status                                             | PRIMARY                                | 4       | mailwizz-v2.segmentCampaignConditionClick12.subscriber_id |      1 |    25.00 | Using where             |
|  6 | UNION        | deliveryLogs                    | NULL       | ref    | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4       | const                                                     |      1 |    10.00 | Using where; Not exists |
|  6 | UNION        | campaign_url                    | NULL       | eq_ref | PRIMARY,campaign_hash,fk_campaign_url_campaign1_idx                                                                            | PRIMARY                                | 8       | mailwizz-v2.segmentCampaignConditionClick12.url_id        |      1 |   100.00 | Using where             |
| NULL | UNION RESULT | union2,3,4,5,6                | NULL       | ALL    | NULL                                                                                                                           | NULL                                   | NULL    | NULL                                                      |   NULL |     NULL | Using temporary         |
+----+--------------+---------------------------------+------------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+---------+-----------------------------------------------------------+--------+----------+-------------------------+
17 rows in set, 1 warning (0.03 sec)

A significant improvement is the effective use of indexes, unlike in the original query. This happens because the database engine processes distinct queries, joining their results afterward. This approach not only accelerates the execution but also ensures optimal index utilization. The revised query delivers the same results as the original but achieves them much faster:

+---------+
| counter |
+---------+
|   97228 |
+---------+
1 row in set (0.80 sec)

Using UNION instead of OR in SQL queries often results in faster performance due to the way the database engine processes these queries:

  • Streamlined Execution Plans: UNION breaks down a complex query into simpler, smaller subqueries. This simplification allows the database engine to create more efficient execution plans for each subquery, reducing the computational overhead compared to a single complex query with multiple OR conditions.
  • Parallel Processing: In some cases, database engines can process each subquery within a UNION in parallel. This parallel processing is typically more efficient compared to the sequential evaluation of OR conditions within a single query.
  • Reduced I/O Operations: UNION can minimize redundant I/O operations by eliminating the need to repeatedly scan the same table for different OR conditions. Each subquery in a UNION can be optimized to access only the necessary data.
  • Index Optimization: Subqueries in a UNION are more likely to effectively utilize indexes, especially when each subquery is tailored to specific conditions. In contrast, complex OR conditions can sometimes lead to inefficient index usage or full table scans.

By breaking down a complex query with OR into simpler parts with UNION, the database engine can optimize and execute each part more efficiently, leading to overall faster query performance.

In conclusion, while it's not always feasible to replace OR conditions with UNION in SQL queries, when the opportunity arises, it's highly advisable to do so. The performance gains observed from using UNION can be substantial, transforming the efficiency of query execution. This case study demonstrates that in certain scenarios, leveraging UNION instead of OR can significantly enhance database performance, making it a valuable strategy in SQL query optimization.

Do you have questions?

We're here to help, please contact us and we will do our best to answer your questions as soon as possible.