My data model is
(:Parent {parentId:...})-[:CONTAINS]->(:Child)-[:SUPPORTS]->(:Dummy)
Every Child has only one Parent. The Parent.parentId attribute is unique, i.e. there is a constraint defined as
CREATE CONSTRAINT Parent_parentId_unique IF NOT EXISTS ON (p:Parent) ASSERT p.parentId IS UNIQUE;
I have a user-defined @Procedure which accepts collection of parentIds and I want to remove all :SUPPORTS relationships from all their children.
- When everything is in cypher, the execution of procedure is slow - hundreds milliseconds to seconds.
Result removeRelationshipResult = tx.execute(""
+ "MATCH (p:Parent)-[:CONTAINS]->(c:Child)-[r:SUPPORTS]->(:Dummy)\n"
+ "WHERE p.parentId IN $parentIds\n"
+ "DELETE r",
Map.of("parentIds", parentIds)
);
- When I loop through all relationships programmatically, the execution is fast - under 10 milliseconds (
streamOfis utility method to convert Iterable to Stream).
RelationshipType CONTAINS = RelationshipType.withName("CONTAINS");
RelationshipType SUPPORTS = RelationshipType.withName("SUPPORTS");
for (Long parentId : parentIds) {
Node parentNode = tx.findNode(Parent.LABEL, "parentId", parentId);
streamOf(parentNode.getRelationships(Direction.OUTGOING, CONTAINS))
.map(rel -> rel.getEndNode())
.flatMap(childNode -> streamOf(childNode.getRelationships(SUPPORTS)))
.forEach(Relationship::delete);
}
The difference happens even on first try when there is no :SUPPORTS relationship.
Where can be the cause of such difference and how to spot it?
UPDATE: Reaction to @cybersam's answer (too long and unformattable for comment):
I tested your suggestions on sample with 1737 Parent nodes and 655344 :SUPPORTS relations, splitted into 61 batches (useful for warmup from point #2 though primary purpose of split was different).
Applying point #1 caused huge performance improvement. The time is now comparable to programmatic implementation. I also tried to change programmatic implementation vice versa, i.e. add filtering to node labels but it did not have significant effect. Actually the time comparations differ for first run (when no relations exist) and second run (when relations from first run are actually deleted). Third and next run are similar to second run. The point #1 clearly answers my question and helped me a lot, thanks!
| implementation | first run: deletion time/total procedure time | second run: deletion time/total procedure time [ms] |
|---|---|---|
| cypher-labels | 79366/131261 | 170283/188783 |
| cypher-nolabels | 230/13756 | 1800/17284 |
| program-labels | 155/11731 | 2235/19539 |
| program-nolabels | 174/11805 | 2079/19111 |