I need to join two tables: our_sample and tls207_pers_appln from PATSTAT.
our_sample' has 4 columns: appln_id, appln_auth, appln_nr, appln_kind.
tls207_pers_appln has 4 columns: appln_id, person_id, applt_seq_nr, invt_seq_nr.
our_sample has 2191 rows and some (60) of them are missing in tls207_pers_appln.
Because I want to join the tables keeping all the appln_id in our_sample (even if they do not have the matching information from tls207_pers_appln) I join the two tables doing a RIGHT JOIN.
However, the resulting view t2_tot_in_patent has only 2096 appln_id.
This is in part due to the restriction I put (35 patents are dropped because I select only those HAVING MAX(invt_seq_nr) > 0, which is fine). But this would yield 2191-35 = 2156 patents.
Instead, I get 2096 of them that is: 2191 (in our_sample) - 60 (appln_ids in our_sample that miss from tls207) - 35 (appln_ids for which invt_seq_nr = 0).
BUT the whole point of using RIGHT JOIN is that I should not loose those 60 patents. Why then?
-- compiling total count of inventors per patent: t2_tot_in_patent
DROP VIEW IF EXISTS t2_tot_in_patent;
CREATE VIEW t2_tot_in_patent AS
SELECT m.appln_id, MAX(invt_seq_nr) AS tot_in_patent
FROM patstat2022a.tls207_pers_appln AS t7
RIGHT OUTER JOIN cecilia.our_sample AS m
ON t7.appln_id = m.appln_id
GROUP BY appln_id
HAVING MAX(invt_seq_nr) > 0