I have the following structure for the table DataTable: every column is of the datatype int, RowID is an identity column and the primary key. LinkID is a foreign key and links to rows of an other table.
RowID LinkID Order Data DataSpecifier
1 120 1 1 1
2 120 2 1 3
3 120 3 1 10
4 120 4 1 13
5 120 5 1 10
6 120 6 1 13
7 371 1 6 2
8 371 2 3 5
9 371 3 8 1
10 371 4 10 1
11 371 5 7 2
12 371 6 3 3
13 371 7 7 2
14 371 8 17 4
.................................
.................................
I'm trying to do a query which alters every LinkID batch in the following way:
- Take every row with same
LinkID(e.g. the first batch is the first 6 rows here) - Order them by the
Ordercolumn - Look at
DataandDataSpecifiercolumns as one compare unit (They can be thought as one column, calleddataunit): - Keep as many rows from
Order=1onwards, until adataunitcomes by which appears more than one time in the batch - Keep that final row, but delete rest of the rows with same
LinkIDand greaterOrdervalue
So for the LinkID 120:
- Sort the batch by the
Ordercolumn (already sorted here, but should still do it) - Start looking from the top (So
Order=1here), go as long as you don't see a value which appears more than 1 time in the batch - Stop at the first duplicate
Order=3(dataunit1 10is also onOrder5). - Delete everything which has the
LinkID=120 AND Order>=4
After similar process for LinkID 371 (and every other LinkID in the table), the processed table will look like this:
RowID LinkID Order Data DataSpecifier
1 120 1 1 1
2 120 2 1 3
3 120 3 1 10
7 371 1 6 2
8 371 2 3 5
9 371 3 8 1
10 371 4 10 1
11 371 5 7 2
.................................
.................................
I've never done an SQL query which this complicated. I know the query has to be something like this:
DELETE FROM DataTable
WHERE RowID IN (SELECT RowID
FROM DataTable
WHERE -- ?
GROUP BY LinkID
HAVING COUNT(*) > 1 -- ?
ORDER BY [Order]);
but I just can't seem to wrap my head around this and get the query right. I would preferably do this in pure SQL, with one executable (and reusable) query.
I asked a very similar question here: How to remove rest of the rows with the same ID starting from the first duplicate?
But since I realized that my original filtering logic in the question was not actually what I needed and that question had already been answered correctly, I had to make this new question.