Cypher DELETE relationship is much slower than programmatic Relationship::delete

Question

My data model is

(:Parent {parentId:...})-[:CONTAINS]->(:Child)-[:SUPPORTS]->(:Dummy)

Every Child has only one Parent. The Parent.parentId attribute is unique, i.e. there is a constraint defined as

CREATE CONSTRAINT Parent_parentId_unique IF NOT EXISTS ON (p:Parent) ASSERT p.parentId IS UNIQUE;

I have a user-defined @Procedure which accepts collection of parentIds and I want to remove all :SUPPORTS relationships from all their children.

When everything is in cypher, the execution of procedure is slow - hundreds milliseconds to seconds.

Result removeRelationshipResult = tx.execute(""
        + "MATCH (p:Parent)-[:CONTAINS]->(c:Child)-[r:SUPPORTS]->(:Dummy)\n"
        + "WHERE p.parentId IN $parentIds\n"
        + "DELETE r",
        Map.of("parentIds", parentIds)
);

When I loop through all relationships programmatically, the execution is fast - under 10 milliseconds (streamOf is utility method to convert Iterable to Stream).

RelationshipType CONTAINS = RelationshipType.withName("CONTAINS");
RelationshipType SUPPORTS = RelationshipType.withName("SUPPORTS");
for (Long parentId : parentIds) {
    Node parentNode = tx.findNode(Parent.LABEL, "parentId", parentId);
    streamOf(parentNode.getRelationships(Direction.OUTGOING, CONTAINS))
        .map(rel -> rel.getEndNode())
        .flatMap(childNode -> streamOf(childNode.getRelationships(SUPPORTS)))
        .forEach(Relationship::delete);
}

The difference happens even on first try when there is no :SUPPORTS relationship.

Where can be the cause of such difference and how to spot it?

UPDATE: Reaction to @cybersam's answer (too long and unformattable for comment):

I tested your suggestions on sample with 1737 Parent nodes and 655344 :SUPPORTS relations, splitted into 61 batches (useful for warmup from point #2 though primary purpose of split was different).

Applying point #1 caused huge performance improvement. The time is now comparable to programmatic implementation. I also tried to change programmatic implementation vice versa, i.e. add filtering to node labels but it did not have significant effect. Actually the time comparations differ for first run (when no relations exist) and second run (when relations from first run are actually deleted). Third and next run are similar to second run. The point #1 clearly answers my question and helped me a lot, thanks!

implementation	first run: deletion time/total procedure time	second run: deletion time/total procedure time [ms]
cypher-labels	79366/131261	170283/188783
cypher-nolabels	230/13756	1800/17284
program-labels	155/11731	2235/19539
program-nolabels	174/11805	2079/19111

cybersam · Accepted Answer · 2023-05-02T22:49:25.263

There are at least 3 reasons for the speed difference.

Your Cypher query specifies that the SUPPORT relationship's start node must be a Child and its end node must be a Dummy. Your Java code does not care what labels those 2 nodes have, and so has to do less work. This alternate MATCH clause would be more equivalent:
```
MATCH (p:Parent)-[:CONTAINS]->()-[r:SUPPORTS]->()
```
Cypher queries, when first executed (or executed again after being dropped from the Cypher query cache) must be parsed, error checked, and converted into Java operations that are optimized based on the current characteristics of the DB -- all of which takes time. Once the Cypher query's operations are cached, the same Cypher query will run faster. So, a fairer comparison would require that you run the Cypher query more than once and to ignore the timing of the first run.
As is always the case with generated code, you cannot expect that the Java code generated from a Cypher query will always be as efficient as hand-crafted code that tries to do the same thing. Some additional overhead is normal.

Removing label in point #1 did the trick. I appended details to end of question, thanks! — Tomáš Záluský, May 03 '23 at 19:29

Cypher DELETE relationship is much slower than programmatic Relationship::delete

1 Answers1