Don’t forget: Neo4j ‘optional match’ gives cartesian products

Nick Doyle
2 min readJul 24, 2018

--

Originally posted 18th December, 2015

The doco on optional match does point out that ‘optional match’ is the equivalent of ‘outer join’ (which gives the cartesian product) in sql, however I still got tripped me up when using multiple optional joins.

For the example, I use a Matrix database like so:

merge (movie:Movie {title: ‘The Matrix’})
merge (actor1:Actor {name: ‘Carrie-Anne Moss’})
merge (actor2:Actor {name: ‘Keanu Reaves’})
merge (actor3:Actor {name: ‘Laurence Fishburne’})
merge (writer1:Writer {name: ‘Lana Wachowski’})
merge (writer2:Writer {name: ‘Andy Wachowski’})
merge (actor1)-[:ACTED_IN]->(movie)
merge (actor2)-[:ACTED_IN]->(movie)
merge (actor3)-[:ACTED_IN]->(movie)
merge (writer1)-[:WROTE]->(movie)
merge (writer2)-[:WROTE]->(movie)

You can get an interactive instance online here with the useful neo4j console, where you can also run these queries and see for yourself.

I want to get a list of Movies (currently just ‘The Matrix’), and for each Movie a list of Actors and a list of Writers.

So I run

match (movie:Movie {title: ‘The Matrix’})
optional match (movie)<-[:WROTE]-(writer:Writer)
optional match (movie)<-[:ACTED_IN]-(actor:Actor)
return movie, collect(writer) as writers, collect(actor) as actors

However the results weren’t what I was after:

You can see I’m getting duplicates in Writers and Actors; this is indeed the cartesian product Movies x Writers x Actors

So, what’s the solution?
I need to use WITH after each OPTIONAL MATCH

match (movie:Movie {title: ‘The Matrix’})
optional match (movie)<-[:WROTE]-(writer:Writer)
with movie, collect(writer) as writers
optional match (movie)<-[:ACTED_IN]-(actor:Actor)
with movie, writers, collect(actor) as actors
return movie, writers, actors
moviewritersactorstitleThe Matrix[
nameAndy Wachowski,
nameLana Wachowski][
nameCarrie-Anne Moss,
nameLaurence Fishburne,
nameKeanu Reaves]

Giving me the result that I was after, hooray!

Just before I go though — there is also a simpler, but less efficient (and possibly incorrect, depending on your data) solution — to use COLLECT DISTINCT like so:

match (movie:Movie {title: ‘The Matrix’})
optional match (movie)<-[:WROTE]-(writer:Writer)
optional match (movie)<-[:ACTED_IN]-(actor:Actor)
return movie, collect(DISTINCT writer) as writers, collect(DISTINCT actor) as actors

But, as you might guess (and can be seen running with PROFILE before the MATCH), this first fetches the whole cartesian product then gets distinct from them — very inefficient.

--

--

Nick Doyle
Nick Doyle

Written by Nick Doyle

Cloud-Security-Agile, in Melbourne Australia, experience includes writing profanity-laced Perl, surprise Migrations, furious DB Admin and Motorcycle instructing

No responses yet