I have two tables and I need to be able to update them going both ways, the first table is a list of items:
ITEMS
Item* | Rev* | RDate | ECO | New
------+------+--------------+------+----
A | 0A | 2019-01-01 | E123 | 1
A | 01 | 2018-01-01 | E456 | 0
B | 0A | 2018-12-31 | E765 | 0
C | 01 | 2018-10-25 | E456 | 0
The second is a parent-child table, with revisions, but I have to fill in the Child Rev from the Item table
Tree
Parent* | ParentRev* | Child* | ChildRev | VDate*
--------+------------+--------+----------+-----------
Y | 0B | C | NULL | 2019-01-01
Y | 0C | D | NULL | 2019-01-13
Z | 01 | A | NULL | 2018-06-25
Z | 02 | A | NULL | 2019-01-11
Z | 0A | B | NULL | 2019-01-01
Notes:
- Primary key columns are marked with *
VDateshouldn't be part of the primary key, but the dataset is bad and has duplicates, so I need to add it
I looked into different questions like Select first row in each GROUP BY group?, but I couldn't find one that used row based conditions on the joining table that returned multiple fields. Anyway, I'm using this to fill in the records where ChildRev is NULL, but it doesn't include the ECO column
UPDATE T
SET [ChildRev] = (SELECT TOP 1 I.[Rev] AS [ChildRev]
FROM [Items] AS I
WHERE (I.[Item] = T.[Child]
AND I.[RDate] <= T.[VDate])
ORDER BY I.[RDate] DESC
)
FROM [Tree] AS T
WHERE T.[ChildRev] IS NULL
And, this is what I get:
Parent | ParentRev | Child | ChildRev | VDate | ECO
-------+-----------+-------+----------+------------+------
Y | 0B | C | 01 | 2019-01-01 | NULL
Y | 0C | D | NULL | 2019-01-13 | NULL
Z | 01 | A | 01 | 2018-06-25 | NULL
Z | 02 | A | 0A | 2019-01-11 | NULL
Z | 0A | B | 0A | 2019-01-01 | NULL
I'm dealing with 4.5M+ records in the Tree table and 1.2M+ in the Item table, growing daily. I have 2 questions:
Is there a better (faster) way to update the
TreeTable? (Bonus if it includes theECO)When I add new
Items, they are flagged with a1in theNewfield (might use trigger)How would I Check/Update the
Treetable with the newItems
Mind you that I have no real control in what order the data will get loaded (table or date).
Update
So, apparently Select first row in each GROUP BY group? was basically the solution, I just didn't realize it. Specifically on how to use a CTE to Update my data tables. Thanks @Xedni for enlightening me; I've only really used CTEs for recursive queries. So, I ended up with 2 similar CTEs,
When I add new records to the
Treetable, I addedAND ChildRev IS NULLto limit the updates:WITH CTE AS ( SELECT ... ) UPDATE CTE SET ChildRev = ItemRev WHERE RID = 1 AND ChildRev IS NULLWhen I add new records to the
Materialstable, I added aWHERE...ANYclause:WITH CTE AS ( SELECT ... RID = ROW_NUMBER() OVER (PARTITION BY t.Parent, t.ParentRev, t.Child ORDER BY i.RDate DESC) FROM #Tree t JOIN #Items i ON t.Child = i.Item AND i.RDate <= t.VDate WHERE I.Process = ANY (SELECT Item FROM #Items WHERE New = 1) ) UPDATE CTE SET ChildRev = ItemRev WHERE RID = 1