Temp #444
base: dev
public void setupRepo() throws IOException {
    Path root = tempDir.newFolder().toPath();
    ClassLoader loader = getClass().getClassLoader();
    String filename = "2018_complete.nt";

    Path hdtstore = root.resolve("hdt-store");
    Path locationNative = root.resolve("native");

    Files.createDirectories(hdtstore);
    Files.createDirectories(locationNative);

    String indexName = "index.hdt";

    HDTOptions options = HDTOptions.of(
            // disable the default index (to use the custom indexes)
            HDTOptionsKeys.BITMAPTRIPLES_INDEX_NO_FOQ, true,
            // set the custom indexes we want
            HDTOptionsKeys.BITMAPTRIPLES_INDEX_OTHERS, "sop,ops,osp,pso,pos");

    try (HDT hdt = HDTManager.generateHDT(new Iterator<>() {
        @Override
        public boolean hasNext() {
            return false;
        }

        @Override
        public TripleString next() {
            return null;
        }
    }, Utility.EXAMPLE_NAMESPACE, options, null)) {
        hdt.saveToHDT(hdtstore.resolve(indexName).toAbsolutePath().toString(), null);
    } catch (Error | RuntimeException e) {
        throw e;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }

    repository = CompiledSail.compiler().withEndpointFiles(new EndpointFiles(locationNative, hdtstore, indexName))
            .compileToSparqlRepository();
    try (InputStream is = new BufferedInputStream(Objects.requireNonNull(loader.getResourceAsStream(filename),
            filename + " doesn't exist"))) {
        repository.loadFile(is, filename);
    }
}
Try adding a breakpoint on that line and then debugging the TempTest test() I added.
Don't forget to add the file with the data to this path: qendpoint-store/src/test/resources/2018_complete.nt
It's a race condition because a merge was triggered in the middle of the file loading
Do you know how I can wait for the merge to complete?
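One generic way to wait is to poll until the store reports that no merge is running. The sketch below assumes there is some way to ask the store whether a merge is still in progress; the `stillMerging` supplier is a hypothetical stand-in for that check, not a confirmed qendpoint API, and `MergeWait` is a name made up for illustration.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class MergeWait {
    private MergeWait() {
    }

    /**
     * Polls stillMerging until it returns false or the timeout elapses.
     * stillMerging is a hypothetical hook; wire it to whatever merge-state
     * check the store actually offers.
     *
     * @return true if the merge finished in time, false on timeout or interrupt
     */
    static boolean awaitMergeEnd(BooleanSupplier stillMerging, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (stillMerging.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the merge is still running
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }
}
```

In the test, a call like this would sit right after `repository.loadFile(...)`, so the query only runs once the store has settled.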
Btw. In my PR eclipse-rdf4j/rdf4j#4879 I managed to get the QueryJoinOptimizer to work better when the statistics are inaccurately returning 0.0 by adding 5.0 to the cost of all operations and also penalising cartesian joins more heavily.
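To illustrate why that helps, here is a toy greedy join orderer. It is not the RDF4J implementation: the `JoinOrderSketch` class, the `Pattern` type and the `CARTESIAN_PENALTY` factor are assumptions made up for this sketch, and only the 5.0 offset comes from the comment above. When every estimate is 0.0, all patterns tie and nothing steers the planner away from a cartesian product; with the offset and the penalty, patterns that share a variable with what is already bound win the tie.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class JoinOrderSketch {
    static final double BASE_COST = 5.0;            // offset mentioned in the comment
    static final double CARTESIAN_PENALTY = 1000.0; // illustrative value, an assumption

    /** A statement pattern reduced to a label, an estimate and its variables. */
    static final class Pattern {
        final String label;
        final double cardinality;
        final Set<String> vars;

        Pattern(String label, double cardinality, String... vars) {
            this.label = label;
            this.cardinality = cardinality;
            this.vars = new HashSet<>(Arrays.asList(vars));
        }
    }

    static double cost(Pattern p, Set<String> bound, boolean adjusted) {
        if (!adjusted) {
            return p.cardinality; // raw estimate: 0.0 makes everything look free
        }
        double c = p.cardinality + BASE_COST;
        // A pattern sharing no variable with the bound set forms a cartesian join.
        boolean cartesian = !bound.isEmpty() && p.vars.stream().noneMatch(bound::contains);
        return cartesian ? c * CARTESIAN_PENALTY : c;
    }

    /** Greedy ordering: repeatedly pick the cheapest remaining pattern. */
    static List<String> order(List<Pattern> patterns, boolean adjusted) {
        List<Pattern> remaining = new ArrayList<>(patterns);
        Set<String> bound = new HashSet<>();
        List<String> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Pattern best = remaining.get(0);
            for (Pattern p : remaining) {
                if (cost(p, bound, adjusted) < cost(best, bound, adjusted)) {
                    best = p;
                }
            }
            remaining.remove(best);
            bound.addAll(best.vars);
            result.add(best.label);
        }
        return result;
    }
}
```

With three patterns A(?a ?b), C(?x ?y) and B(?b ?c), all estimated at 0.0 and given in that order, the unadjusted cost keeps the input order A, C, B, which joins A and C as a cartesian product. The adjusted cost yields A, B, C.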
@@ -0,0 +1,123 @@
Distinct (resultSizeActual=12, totalTimeActual=9.8s, selfTimeActual=0.048ms)
I managed to manually optimize the query and it runs in about 10 seconds on my laptop.
what did you change?
Ah, you mean you still have the problem that the query planner is not working, and you only tried reordering the triple patterns by hand?
Yeah. I managed to reorder the query by hand.
# This line should be above the FILTER and BIND, but doing so causes the QueryJoinOptimizer to not optimize the query for some unknown reason
?resultnotice epo:refersToRole ?buyerrole .
This line should be above the FILTER and BIND. When I put it above them, the QueryJoinOptimizer doesn't manage to optimize the query at all. When I move it back down, as I've done now, the QueryJoinOptimizer does seem to be triggered correctly, but it only seems to check the cardinality of this single statement pattern, and when it does it gets 0.0 as the cardinality.
If you run the test you should see the following printed:
Cardinality for StatementPattern Var (name=resultnotice) Var (name=_const_6aa9a9c_uri, value=http://data.europa.eu/a4g/ontology#refersToRole, anonymous) Var (name=buyerrole) is 0.0 (HDT: 0.0, Native: 0.0)
I had a look at the QueryJoinOptimizer code and it doesn't seem to recurse down into the query tree. Essentially it will recurse down to the first Join and then optimize the join arguments that are instances of Join, LeftJoin or StatementPattern. Removing the inner sub-select from the query "fixes" the QueryJoinOptimizer.
I fixed the QueryJoinOptimizer eclipse-rdf4j/rdf4j#4879
I've merged the fix for the QueryJoinOptimizer and it's available from the snapshot repo. The query still needs a slight modification. RDF4J does not optimise the location of the BIND clause, so it is unable to move the

My original theory that RDF4J can't optimise sub-selects was wrong. Queries with a single sub-select can be optimised, but not if the non-sub-select parts contain a BIND clause. They also cannot be optimised if there is more than one sub-select. With the changes to the QueryJoinOptimizer it can now optimise both of these types of queries.

The three comments above this one contain the query plan and the statistics. I managed to get my code to wait for the HDT merging to complete, but as you can see there are still quite a few places where the statistics are returning 0.0. I haven't looked further into why this is the case.

Here is the final query that I ended up with:
@D063520 Would you like me to comment on the original discussion page to outline my findings and the solution?
Yes please!