Related question: How to select rows with no matching entry in another table?
I was trying to select rows using this method and couldn't get it to work in SQLite. After a bit of wrangling it occurred to me that the reason might be that there are NULL values in some fields. Sure enough, I was right, and when I changed the = to IS in the query below things started behaving as expected:
CREATE TEMP TABLE newEvent(id INTEGER,t INTEGER,name,extra,extra2,extra3);
INSERT INTO newEvent(id,t,name,extra,extra2,extra3) VALUES
(0, 1376351146, 'TEST', NULL, NULL, NULL),
(0, 1376348867, 'OLD', NULL, NULL,NULL);
SELECT n.id,n.t,n.name,n.extra,n.extra2,n.extra3 FROM newEvent n
LEFT JOIN event E ON n.t = E.t AND n.name IS E.name
AND n.extra IS E.extra;
AND n.extra2 IS E.extra2;
AND n.extra3 IS E.extra3
WHERE E.id IS NULL;
DROP TABLE newEvent;
In the above example, there is an existing record in table event with name='OLD'. The newEvent table is defined the same as the original event table.
However, I noticed a BIG problem: my query was now taking almost 30 seconds to run! If I change only the n.name IS E.name to n.name = E.name but leave all the other ISes as-is, then the query only takes around 400ms. (There are around 3 million records in table event.)
Why the big difference in performance? It turns out I can actually use = instead of IS for the name comparison because it is never null, but if it ever were to be NULL it seems like this would break. Conversely, I am concerned that at some point the query might start running slow, since I don't understand what it is about name that makes the equality query run so much faster. My guess is that maybe SQLite somehow knows that there are nulls in the extra fields and is able to optimize but I would like something a bit more firm than a wild guess.
As far as I can tell, IS is simply = with the additional provisio that it will treat NULL comparisions the same as if they were empty strings (assuming there are no actual empty strings to compare). So why is using = on the name field 75 times faster, but has no effect on performance on the extra fields???