Executing complex SQL queries against large datasets often poses a significant challenge in terms of performance,
even when they leverage properly constructed indexes. This challenge becomes particularly pronounced when such queries
include numerous conditions combined with the OR
operator. These conditions, while necessary for the query logic,
can drastically slow down execution times.
In MailWizz, segments are a critical feature that often generate complex SQL queries. This complexity arises from the dynamic nature of user-defined conditions on subscribers' custom fields. These segments, while powerful and flexible, can lead to complex and demanding queries, especially when dealing with large datasets and a variety of user-specified conditions.
The example below shows a segment in MailWizz where we select subscribers matching following condition:
Email not ends with "@hotmail.com" OR Age is greater than 29 OR Country is Greece OR subscribers have opened an email in the last 360 days OR subscribers clicked an email in the last 360 days.
We can also see in the bottom-left of the image, the number of subscribers matching this segment.
This is important for later comparison in our SQL queries.
All SQL queries shown below are generated using Yii's ActiveRecord implementation, which might result a high number of parenthesis. We did our best to clean them up to make the examples more readable.
When a campaign is launched using this segment, it results in the generation of the following SQL query:
SELECT COUNT(DISTINCT t.subscriber_id) as counter FROM `list_subscriber` `t` LEFT JOIN `list_field_value` `fieldValues772` ON (`fieldValues772`.`subscriber_id`=`t`.`subscriber_id`) LEFT JOIN `list_field_value` `fieldValues771` ON (`fieldValues771`.`subscriber_id`=`t`.`subscriber_id`) LEFT JOIN `campaign_track_open` `segmentCampaignConditionOpen11` ON (`segmentCampaignConditionOpen11`.`subscriber_id`=`t`.`subscriber_id`) LEFT JOIN `campaign_track_url` `segmentCampaignConditionClick12` ON (`segmentCampaignConditionClick12`.`subscriber_id`=`t`.`subscriber_id`) LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94') WHERE ( t.list_id='2151' AND t.status='confirmed' AND ( (t.email NOT LIKE '%@hotmail.com') OR (`fieldValues772`.`field_id` = 772 AND CAST(`fieldValues772`.`value` AS SIGNED) > 29) OR (`fieldValues771`.`field_id` = 771 AND `fieldValues771`.`value` = 'greece') OR (segmentCampaignConditionOpen11.campaign_id IN (10,62,63,64,65,66,67,68,71,72,88,89,90,91,92,93,94) AND DATE(DATE_SUB(NOW(), INTERVAL 360 day)) <= DATE(segmentCampaignConditionOpen11.date_added)) OR (segmentCampaignConditionClick12.url_id IN (SELECT url_id FROM campaign_url WHERE campaign_id IN (10,62,63,64,65,66,67,68,71,72,88,89,90,91,92,93,94)) AND DATE(DATE_SUB(NOW(), INTERVAL 360 day)) <= DATE(segmentCampaignConditionClick12.date_added)) ) ) AND deliveryLogs.subscriber_id IS NULL
Analyzing this query with EXPLAIN
reveals the following:
+----+-------------+---------------------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+---------+-----------------------------+--------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+---------+-----------------------------+--------+----------+--------------------------+ | 1 | PRIMARY | t | NULL | ref | fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status | fk_list_subscriber_list1_idx | 4 | const | 177453 | 50.00 | Using where | | 1 | PRIMARY | fieldValues772 | NULL | ref | fk_list_field_value_list_subscriber1_idx | fk_list_field_value_list_subscriber1_idx | 4 | mailwizz-v2.t.subscriber_id | 6 | 100.00 | NULL | | 1 | PRIMARY | fieldValues771 | NULL | ref | fk_list_field_value_list_subscriber1_idx | fk_list_field_value_list_subscriber1_idx | 4 | mailwizz-v2.t.subscriber_id | 6 | 100.00 | NULL | | 1 | PRIMARY | segmentCampaignConditionOpen11 | NULL | ref | fk_campaign_track_open_list_subscriber1_idx | fk_campaign_track_open_list_subscriber1_idx | 4 | mailwizz-v2.t.subscriber_id | 1 | 100.00 | NULL | | 1 | PRIMARY | segmentCampaignConditionClick12 | NULL | ref | fk_campaign_track_url_list_subscriber1_idx | fk_campaign_track_url_list_subscriber1_idx | 4 | mailwizz-v2.t.subscriber_id | 1 | 100.00 | Using where | | 1 | PRIMARY | deliveryLogs | NULL | ref | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4 | const | 1 | 10.00 | Using where; Not exists | | 2 | SUBQUERY | campaign_url | NULL | index | PRIMARY,campaign_hash,fk_campaign_url_campaign1_idx | fk_campaign_url_campaign1_idx | 4 | NULL | 105 | 100.00 | Using where; Using index | +----+-------------+---------------------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+---------+-----------------------------+--------+----------+--------------------------+ 7 rows in set, 1 warning (0.01 sec)
Despite having proper indexes, the database engine struggles to utilize them efficiently for all conditions. Consequently, it ends up processing a massive 670 million records. This extensive data traversal translates into a substantial execution time of about 16 seconds for the query:
+---------+ | counter | +---------+ | 97228 | +---------+ 1 row in set (16.09 sec)
Rewriting the initial query using UNION
instead of OR
results in a longer, yet structurally different format:
SELECT COUNT(*) as counter FROM ( ( SELECT t.subscriber_id as subscriber_id FROM `list_subscriber` `t` LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94') WHERE ((t.list_id='2151') AND (t.status='confirmed') AND ((t.email NOT LIKE '%@hotmail.com'))) AND (deliveryLogs.subscriber_id IS NULL) ) UNION ( SELECT t.subscriber_id as subscriber_id FROM `list_subscriber` `t` LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94') LEFT JOIN `list_field_value` `fieldValues772` ON (`fieldValues772`.`subscriber_id`=`t`.`subscriber_id`) WHERE ((t.list_id='2151') AND (t.status='confirmed') AND ((`fieldValues772`.`field_id` = 772 AND (CAST(`fieldValues772`.`value` AS SIGNED) > 29) ))) AND (deliveryLogs.subscriber_id IS NULL) ) UNION ( SELECT t.subscriber_id as subscriber_id FROM `list_subscriber` `t` LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94') LEFT JOIN `list_field_value` `fieldValues771` ON (`fieldValues771`.`subscriber_id`=`t`.`subscriber_id`) WHERE ((t.list_id='2151') AND (t.status='confirmed') AND ((`fieldValues771`.`field_id` = 771 AND (`fieldValues771`.`value` = 'greece') ))) AND (deliveryLogs.subscriber_id IS NULL) ) UNION ( SELECT t.subscriber_id as subscriber_id FROM `list_subscriber` `t` LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94') LEFT JOIN `campaign_track_open` `segmentCampaignConditionOpen11` ON (`segmentCampaignConditionOpen11`.`subscriber_id`=`t`.`subscriber_id`) WHERE ((t.list_id='2151') AND (t.status='confirmed') AND ((segmentCampaignConditionOpen11.campaign_id IN (10,62,63,64,65,66,67,68,71,72,88,89,90,91,92,93,94) AND DATE(DATE_SUB(NOW(), INTERVAL 360 day)) <= DATE(segmentCampaignConditionOpen11.date_added)))) AND (deliveryLogs.subscriber_id IS NULL) ) UNION ( SELECT t.subscriber_id as subscriber_id FROM `list_subscriber` `t` LEFT OUTER JOIN `campaign_delivery_log` `deliveryLogs` ON (`deliveryLogs`.`subscriber_id`=`t`.`subscriber_id`) AND (deliveryLogs.campaign_id = '94') LEFT JOIN `campaign_track_url` `segmentCampaignConditionClick12` ON (`segmentCampaignConditionClick12`.`subscriber_id`=`t`.`subscriber_id`) WHERE ((t.list_id='2151') AND (t.status='confirmed') AND ((segmentCampaignConditionClick12.url_id IN (SELECT url_id FROM campaign_url WHERE campaign_id IN (10,62,63,64,65,66,67,68,71,72,88,89,90,91,92,93,94)) AND DATE(DATE_SUB(NOW(), INTERVAL 360 day)) <= DATE(segmentCampaignConditionClick12.date_added)))) AND (deliveryLogs.subscriber_id IS NULL) ) ) AS combined
Each UNION
corresponds to one of the original OR
conditions. While the query length increases, analyzing it with EXPLAIN
shows:
+----+--------------+---------------------------------+------------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+---------+-----------------------------------------------------------+--------+----------+-------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------+---------------------------------+------------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+---------+-----------------------------------------------------------+--------+----------+-------------------------+ | 1 | PRIMARY | derived2 | NULL | ALL | NULL | NULL | NULL | NULL | 11931 | 100.00 | NULL | | 2 | DERIVED | t | NULL | ref | fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status | fk_list_subscriber_list1_idx | 4 | const | 177453 | 44.44 | Using where | | 2 | DERIVED | deliveryLogs | NULL | ref | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4 | const | 1 | 10.00 | Using where; Not exists | | 3 | UNION | fieldValues772 | NULL | ref | fk_list_field_value_list_field1_idx,fk_list_field_value_list_subscriber1_idx,field_subscriber,field_id_value | field_id_value | 4 | const | 161240 | 100.00 | Using where | | 3 | UNION | t | NULL | eq_ref | PRIMARY,fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status | PRIMARY | 4 | mailwizz-v2.fieldValues772.subscriber_id | 1 | 25.00 | Using where | | 3 | UNION | deliveryLogs | NULL | ref | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4 | const | 1 | 10.00 | Using where; Not exists | | 4 | UNION | fieldValues771 | NULL | ref | fk_list_field_value_list_field1_idx,fk_list_field_value_list_subscriber1_idx,field_subscriber,field_id_value | field_id_value | 770 | const,const | 406 | 100.00 | Using where | | 4 | UNION | t | NULL | eq_ref | PRIMARY,fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status | PRIMARY | 4 | mailwizz-v2.fieldValues771.subscriber_id | 1 | 25.00 | Using where | | 4 | UNION | deliveryLogs | NULL | ref | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4 | const | 1 | 10.00 | Using where; Not exists | | 5 | UNION | segmentCampaignConditionOpen11 | NULL | ALL | fk_campaign_track_open_campaign1_idx,fk_campaign_track_open_list_subscriber1_idx | NULL | NULL | NULL | 51 | 100.00 | Using where | | 5 | UNION | t | NULL | eq_ref | PRIMARY,fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status | PRIMARY | 4 | mailwizz-v2.segmentCampaignConditionOpen11.subscriber_id | 1 | 25.00 | Using where | | 5 | UNION | deliveryLogs | NULL | ref | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4 | const | 1 | 10.00 | Using where; Not exists | | 6 | UNION | segmentCampaignConditionClick12 | NULL | ALL | fk_campaign_track_url_list_subscriber1_idx,fk_campaign_track_url_campaign_url1_idx | NULL | NULL | NULL | 8 | 100.00 | Using where | | 6 | UNION | t | NULL | eq_ref | PRIMARY,fk_list_subscriber_list1_idx,list_email,status_last_updated,list_id_status | PRIMARY | 4 | mailwizz-v2.segmentCampaignConditionClick12.subscriber_id | 1 | 25.00 | Using where | | 6 | UNION | deliveryLogs | NULL | ref | fk_campaign_delivery_log_list_subscriber1_idx,fk_campaign_delivery_log_campaign1_idx,sub_proc_status,cid_status,cid_date_added | fk_campaign_delivery_log_campaign1_idx | 4 | const | 1 | 10.00 | Using where; Not exists | | 6 | UNION | campaign_url | NULL | eq_ref | PRIMARY,campaign_hash,fk_campaign_url_campaign1_idx | PRIMARY | 8 | mailwizz-v2.segmentCampaignConditionClick12.url_id | 1 | 100.00 | Using where | | NULL | UNION RESULT | union2,3,4,5,6 | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary | +----+--------------+---------------------------------+------------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+---------+-----------------------------------------------------------+--------+----------+-------------------------+ 17 rows in set, 1 warning (0.03 sec)
A significant improvement is the effective use of indexes, unlike in the original query. This happens because the database engine processes distinct queries, joining their results afterward. This approach not only accelerates the execution but also ensures optimal index utilization. The revised query delivers the same results as the original but achieves them much faster:
+---------+ | counter | +---------+ | 97228 | +---------+ 1 row in set (0.80 sec)
Using UNION
instead of OR
in SQL queries often results in faster performance due to the way the database engine processes these queries:
By breaking down a complex query with OR
into simpler parts with UNION
, the database engine can optimize and execute each part more efficiently, leading to overall faster query performance.
In conclusion, while it's not always feasible to replace We're here to help, please contact us and we will do our best to answer your questions as soon as possible.OR
conditions with UNION
in SQL queries, when the opportunity arises,
it's highly advisable to do so. The performance gains observed from using UNION
can be substantial, transforming the efficiency of query execution.
This case study demonstrates that in certain scenarios, leveraging UNION
instead of OR
can significantly enhance database performance,
making it a valuable strategy in SQL query optimization.Do you have questions?