I have two entities, Etf and DataPoint. Each Etf has multiple DataPoints. See structure below
public class Etf
{
public Etf()
{
DataPoints = new HashSet<DataPoint>();
}
public string Id { get; set; }
public ICollection<DataPoint> DataPoints { get; private set; }
}
public class DataPoint {
public string EtfId { get; set; }
public Etf Etf { get; set; }
public DateTime Date { get; set; }
}
It happens that multiple datapoints with the same EtfId and Date are inserted into the database. I would like to remove the duplicates based on these two fields.
In SQL, I have tried this:
WITH CTE AS
(SELECT Id, ROW_NUMBER() OVER
(PARTITION BY EtfId, Date ORDER BY EtfId, Date DESC)
AS ROW_NUMBER FROM DataPoints)
DELETE FROM CTE WHERE ROW_NUMBER > 1;
Which gives me an error that CTE is not an updatable table (perhaps since it is connected to both Etf as well as DataPoint). I then tried, based on this answer, the following in LINQ:
// Get the duplicate indices
var idxs = await _context.DataPoints
.GroupBy(g => new {g.EtfId, g.Date})
.OrderBy(g => g.Key.EtfId)
.ThenByDescending(g => g.Key.Date)
.SelectMany(g =>
g.Select((i, idx) => new { i.Id, Idx = idx }))
.Where(g => g.Idx > 0)
.Select(g => g.Id)
.ToListAsync(cancellationToken);
// Get the duplicate entities from indices
var duplicates = await _context.DataPoints
.Where(x => idxs.Contains(x.Id))
.ToListAsync(cancellationToken);
// Remove them
_context.DataPoints.RemoveRange(duplicates);
However, this approach gives me a System.InvalidOperationException at the Select((i, idx) => ..) statement saying it may be a bug or limitation from EF Core.
Is there any better (or just working) method that I can use?