Temp #444

Open: wants to merge 4 commits into base: dev

Changes from 2 commits
@@ -7,6 +7,7 @@
import org.eclipse.rdf4j.common.concurrent.locks.Lock;
import org.eclipse.rdf4j.common.iteration.CloseableIteration;
import org.eclipse.rdf4j.common.iteration.ExceptionConvertingIteration;
import org.eclipse.rdf4j.common.transaction.IsolationLevels;
import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.Namespace;
import org.eclipse.rdf4j.model.Resource;
@@ -142,7 +143,7 @@ protected void notifyStatementRemoved(Statement st) {

@Override
public void begin() throws SailException {
-logger.info("Begin connection transaction");
+logger.debug("Begin connection transaction");

super.begin();

@@ -0,0 +1,178 @@
package com.the_qa_company.qendpoint;

import com.the_qa_company.qendpoint.compiler.CompiledSail;
import com.the_qa_company.qendpoint.compiler.SparqlRepository;
import com.the_qa_company.qendpoint.core.hdt.HDT;
import com.the_qa_company.qendpoint.core.hdt.HDTManager;
import com.the_qa_company.qendpoint.core.options.HDTOptions;
import com.the_qa_company.qendpoint.core.options.HDTOptionsKeys;
import com.the_qa_company.qendpoint.core.triples.TripleString;
import com.the_qa_company.qendpoint.store.EndpointFiles;
import com.the_qa_company.qendpoint.store.Utility;
import org.apache.commons.lang3.time.StopWatch;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.explanation.Explanation;
import org.eclipse.rdf4j.repository.sail.SailRepositoryConnection;
import org.junit.After;
import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.Objects;
import java.util.stream.Stream;

public class TempTest {

@Rule
public TemporaryFolder tempDir = TemporaryFolder.builder().assureDeletion().build();
private SparqlRepository repository;


@Before
public void setupRepo() throws IOException {
Path root = tempDir.newFolder().toPath();
ClassLoader loader = getClass().getClassLoader();
String filename = "2018_complete.nt";

Path hdtstore = root.resolve("hdt-store");
Path locationNative = root.resolve("native");

Files.createDirectories(hdtstore);
Files.createDirectories(locationNative);

String indexName = "index.hdt";

HDTOptions options = HDTOptions.of(
// disable the default index (to use the custom indexes)
HDTOptionsKeys.BITMAPTRIPLES_INDEX_NO_FOQ, true,
// set the custom indexes we want
HDTOptionsKeys.BITMAPTRIPLES_INDEX_OTHERS, "sop,ops,osp,pso,pos");


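// Build an empty HDT (the iterator yields no triples); the dataset is loaded afterwards via repository.loadFile
try (HDT hdt = HDTManager.generateHDT(new Iterator<>() {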
@Override
public boolean hasNext() {
return false;
}

@Override
public TripleString next() {
return null;
}
}, Utility.EXAMPLE_NAMESPACE, options, null)) {
hdt.saveToHDT(hdtstore.resolve(indexName).toAbsolutePath().toString(), null);
} catch (Error | RuntimeException e) {
throw e;
} catch (Exception e) {
throw new RuntimeException(e);
}

repository = CompiledSail.compiler().withEndpointFiles(new EndpointFiles(locationNative, hdtstore, indexName))
.compileToSparqlRepository();
try (InputStream is = new BufferedInputStream(Objects.requireNonNull(loader.getResourceAsStream(filename),
filename + " doesn't exist"))) {
repository.loadFile(is, filename);
}
}
Comment on lines +40 to +84
Contributor Author

@ate47 I'm using this code to set up a repo with the file from #413.

The query statistics seem to always return 0. Do you know why?

Contributor

Can you give a bit more detail? I loaded the dataset, queried a single triple pattern, and the cardinality is non-zero:
[Screenshot 2024-01-22 at 22 52 51]

Contributor Author

Try adding a breakpoint on that line and then debugging the test() method in the TempTest class I added.

Contributor Author

hmottestad, Jan 22, 2024

Don't forget to add the file with the data to this path: qendpoint-store/src/test/resources/2018_complete.nt

Collaborator

It's a race condition: a merge was triggered in the middle of loading the file.

Contributor Author

Do you know how I can wait for the merge to complete?
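The thread does not say how to wait for the merge. As a minimal sketch, assuming the store exposes some way to ask whether a merge is still running (the `mergeInProgress` supplier in `main` below is a hypothetical stand-in, not a qEndpoint API), the test could poll with a timeout before running the query instead of racing the merge:

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public final class AwaitMerge {

    /**
     * Polls stillRunning until it returns false or the timeout expires.
     * Returns true if the condition cleared in time, false on timeout.
     */
    public static boolean awaitIdle(BooleanSupplier stillRunning, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (stillRunning.getAsBoolean()) {
            // Compare via subtraction so the check stays correct if nanoTime overflows
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for "is a merge in progress?": pretends a merge runs for about 200 ms
        long mergeEnd = System.nanoTime() + Duration.ofMillis(200).toNanos();
        BooleanSupplier mergeInProgress = () -> System.nanoTime() - mergeEnd < 0;

        boolean idle = awaitIdle(mergeInProgress, Duration.ofSeconds(5), Duration.ofMillis(20));
        System.out.println("merge finished: " + idle);
    }
}
```

In `setupRepo()` such a call would sit right after `repository.loadFile(is, filename)`. Returning `false` on timeout lets the test fail loudly rather than hang.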

Contributor Author

By the way, in my PR eclipse-rdf4j/rdf4j#4879 I managed to get the QueryJoinOptimizer to work better when the statistics inaccurately return 0.0, by adding 5.0 to the cost of every operation and by penalising Cartesian joins more heavily.
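The arithmetic behind that tweak can be illustrated with a toy greedy join-order sketch. It is not RDF4J's `QueryJoinOptimizer`: the `Pattern` record, the greedy loop, and the `CARTESIAN_PENALTY` factor of 100 are illustrative assumptions, and only the 5.0 constant comes from the comment. When every estimate is 0.0, a multiplicative Cartesian penalty still yields 0.0, so all candidates tie; adding a constant base cost gives the penalty something to multiply.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class GreedyJoinOrder {

    /** A triple pattern reduced to a label, its variables, and an estimated cardinality. */
    record Pattern(String label, Set<String> vars, double estimate) {}

    // BASE_COST is the constant from the PR comment; CARTESIAN_PENALTY is an illustrative assumption
    static final double BASE_COST = 5.0;
    static final double CARTESIAN_PENALTY = 100.0;

    static double cost(Pattern p, Set<String> bound, boolean addBase) {
        double c = p.estimate() + (addBase ? BASE_COST : 0.0);
        // A pattern sharing no variable with what is already bound would form a Cartesian product
        boolean disconnected = !bound.isEmpty() && p.vars().stream().noneMatch(bound::contains);
        return disconnected ? c * CARTESIAN_PENALTY : c;
    }

    /** Repeatedly picks the cheapest remaining pattern; ties keep the original order. */
    static List<String> order(List<Pattern> patterns, boolean addBase) {
        List<Pattern> remaining = new ArrayList<>(patterns);
        Set<String> bound = new HashSet<>();
        List<String> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Pattern best = remaining.get(0);
            for (Pattern p : remaining) {
                if (cost(p, bound, addBase) < cost(best, bound, addBase)) {
                    best = p;
                }
            }
            remaining.remove(best);
            bound.addAll(best.vars());
            result.add(best.label());
        }
        return result;
    }

    public static void main(String[] args) {
        // Every estimate is 0.0, mimicking statistics that always return 0
        List<Pattern> patterns = List.of(
                new Pattern("?proc a epo:Procedure", Set.of("proc"), 0.0),
                new Pattern("?stat a epo:SubmissionStatisticalInformation", Set.of("stat"), 0.0),
                new Pattern("?proc epo:hasProcurementScopeDividedIntoLot ?lot", Set.of("proc", "lot"), 0.0),
                new Pattern("?stat epo:concernsSubmissionsForLot ?lot", Set.of("stat", "lot"), 0.0));

        System.out.println(order(patterns, false));
        System.out.println(order(patterns, true));
    }
}
```

Without the base cost every candidate costs 0.0, so ties keep the written order and `?stat a epo:SubmissionStatisticalInformation` is joined second, sharing no variable with `?proc`. With the 5.0 base cost it moves last, after `?stat epo:concernsSubmissionsForLot ?lot` has bound `?stat`. The plan in queryplan.txt has a similar shape: the join with that type pattern produced 1.9M intermediate results before the `concernsSubmissionsForLot` join reduced them to 19.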


@After
public void after() {
if (repository != null) {
repository.shutDown();
}
repository = null;
}

@Test
public void test() {
try (SailRepositoryConnection connection = repository.getConnection()) {
System.out.println();
String query = """
PREFIX epo: <http://data.europa.eu/a4g/ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX legal: <https://www.w3.org/ns/legal#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?countryID ?year (COUNT(DISTINCT ?lot) AS ?amountLots) (SUM(if(?bidders = 1, 1, 0)) AS ?numSingleBidders) WHERE {

?proc a epo:Procedure .
?proc epo:hasProcedureType ?p .
?proc epo:hasProcurementScopeDividedIntoLot ?lot .

?stat epo:concernsSubmissionsForLot ?lot .

?stat a epo:SubmissionStatisticalInformation .
?stat epo:hasReceivedTenders ?bidders .

?resultnotice epo:refersToProcedure ?proc .
?resultnotice epo:refersToRole ?buyerrole .
?resultnotice a epo:ResultNotice .
?resultnotice epo:hasDispatchDate ?ddate .

FILTER ( ?p != <http://publications.europa.eu/resource/authority/procurement-procedure-type/neg-wo-call>)
BIND(year(xsd:dateTime(?ddate)) AS ?year) .

{
SELECT DISTINCT ?buyerrole ?countryID WHERE {
?org epo:hasBuyerType ?buytype .
FILTER (?buytype != <http://publications.europa.eu/resource/authority/buyer-legal-type/eu-int-org> )

?buyerrole epo:playedBy ?org .
?org legal:registeredAddress ?orgaddress .
?orgaddress epo:hasCountryCode ?countrycode .
?countrycode dc:identifier ?countryID .

}
}
} GROUP BY ?countryID ?year


""";

System.out.println(query);

Explanation explanation = runQuery(connection, query);
System.out.println();
System.out.println();
System.out.println();
System.out.println();
System.out.println(explanation.toDot());
System.out.println();
System.out.println();
System.out.println();
System.out.println();
System.out.println(explanation);
System.out.println();
System.out.println();
System.out.println();
System.out.println();

}

}




private static Explanation runQuery(SailRepositoryConnection connection, String query) {
StopWatch stopWatch = StopWatch.createStarted();
TupleQuery tupleQuery = connection.prepareTupleQuery(query);
tupleQuery.setMaxExecutionTime(60 * 10); // in seconds, i.e. 10 minutes
Explanation explain = tupleQuery.explain(Explanation.Level.Timed);
// System.out.println(explain);
// System.out.println();
System.out.println("Took: " + stopWatch.formatTime());

return explain;

}
}
qendpoint-store/src/test/resources/queryplan.txt: 123 additions, 0 deletions
@@ -0,0 +1,123 @@
Distinct (resultSizeActual=0, totalTimeActual=5.0s, selfTimeActual=0.001ms)
Projection (resultSizeActual=0, totalTimeActual=5.0s, selfTimeActual=0.0ms)
├── ProjectionElemList
│ ProjectionElem "countryID"
│ ProjectionElem "year"
│ ProjectionElem "amountLots"
│ ProjectionElem "numSingleBidders"
└── Extension (resultSizeActual=0, totalTimeActual=5.0s)
Group (countryID, year) (resultSizeActual=0, totalTimeActual=5.0s, selfTimeActual=2.45ms)
Join (HashJoinIteration) (resultSizeActual=0, totalTimeActual=5.0s, selfTimeActual=0.015ms)
╠══ Extension (resultSizeActual=1, totalTimeActual=5.0s, selfTimeActual=1.47ms) [left]
║ ├── Join (JoinIterator) (resultSizeActual=1, totalTimeActual=5.0s, selfTimeActual=0.041ms)
║ │ ╠══ Join (JoinIterator) (resultSizeActual=1, totalTimeActual=5.0s, selfTimeActual=0.088ms) [left]
║ │ ║ ├── Join (JoinIterator) (resultSizeActual=19, totalTimeActual=5.0s, selfTimeActual=0.117ms) [left]
║ │ ║ │ ╠══ Join (JoinIterator) (resultSizeActual=19, totalTimeActual=5.0s, selfTimeActual=0.096ms) [left]
║ │ ║ │ ║ ├── Join (JoinIterator) (resultSizeActual=19, totalTimeActual=5.0s, selfTimeActual=1.9s) [left]
║ │ ║ │ ║ │ ╠══ Join (JoinIterator) (resultSizeActual=1.9M, totalTimeActual=1.8s, selfTimeActual=144ms) [left]
║ │ ║ │ ║ │ ║ ├── Join (JoinIterator) (resultSizeActual=19, totalTimeActual=2.24ms, selfTimeActual=0.129ms) [left]
║ │ ║ │ ║ │ ║ │ ╠══ Join (JoinIterator) (resultSizeActual=8, totalTimeActual=1.84ms, selfTimeActual=0.473ms) [left]
║ │ ║ │ ║ │ ║ │ ║ ├── Join (JoinIterator) (resultSizeActual=10, totalTimeActual=0.45ms, selfTimeActual=0.203ms) [left]
║ │ ║ │ ║ │ ║ │ ║ │ ╠══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.106ms, selfTimeActual=0.106ms) [left]
║ │ ║ │ ║ │ ║ │ ║ │ ║ s: Var (name=resultnotice)
║ │ ║ │ ║ │ ║ │ ║ │ ║ p: Var (name=_const_6aa9a9c_uri, value=http://data.europa.eu/a4g/ontology#refersToRole, anonymous)
║ │ ║ │ ║ │ ║ │ ║ │ ║ o: Var (name=buyerrole)
║ │ ║ │ ║ │ ║ │ ║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=10, totalTimeActual=0.141ms, selfTimeActual=0.141ms) [right]
║ │ ║ │ ║ │ ║ │ ║ │ s: Var (name=proc)
║ │ ║ │ ║ │ ║ │ ║ │ p: Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
║ │ ║ │ ║ │ ║ │ ║ │ o: Var (name=_const_be18ee7b_uri, value=http://data.europa.eu/a4g/ontology#Procedure, anonymous)
║ │ ║ │ ║ │ ║ │ ║ └── Filter (resultSizeActual=8, totalTimeActual=0.918ms, selfTimeActual=0.727ms) [right]
║ │ ║ │ ║ │ ║ │ ║ ╠══ Compare (!=)
║ │ ║ │ ║ │ ║ │ ║ ║ Var (name=p)
║ │ ║ │ ║ │ ║ │ ║ ║ ValueConstant (value=http://publications.europa.eu/resource/authority/procurement-procedure-type/neg-wo-call)
║ │ ║ │ ║ │ ║ │ ║ ╚══ StatementPattern [index: SPO] (resultSizeActual=9, totalTimeActual=0.191ms, selfTimeActual=0.191ms)
║ │ ║ │ ║ │ ║ │ ║ s: Var (name=proc)
║ │ ║ │ ║ │ ║ │ ║ p: Var (name=_const_9c756f6b_uri, value=http://data.europa.eu/a4g/ontology#hasProcedureType, anonymous)
║ │ ║ │ ║ │ ║ │ ║ o: Var (name=p)
║ │ ║ │ ║ │ ║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=19, totalTimeActual=0.268ms, selfTimeActual=0.268ms) [right]
║ │ ║ │ ║ │ ║ │ s: Var (name=proc)
║ │ ║ │ ║ │ ║ │ p: Var (name=_const_9c3f1eec_uri, value=http://data.europa.eu/a4g/ontology#hasProcurementScopeDividedIntoLot, anonymous)
║ │ ║ │ ║ │ ║ │ o: Var (name=lot)
║ │ ║ │ ║ │ ║ └── StatementPattern [index: SPO] (resultSizeActual=1.9M, totalTimeActual=1.6s, selfTimeActual=1.6s) [right]
║ │ ║ │ ║ │ ║ s: Var (name=stat)
║ │ ║ │ ║ │ ║ p: Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
║ │ ║ │ ║ │ ║ o: Var (name=_const_ea79e75_uri, value=http://data.europa.eu/a4g/ontology#SubmissionStatisticalInformation, anonymous)
║ │ ║ │ ║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=19, totalTimeActual=1.2s, selfTimeActual=1.2s) [right]
║ │ ║ │ ║ │ s: Var (name=stat)
║ │ ║ │ ║ │ p: Var (name=_const_25686184_uri, value=http://data.europa.eu/a4g/ontology#concernsSubmissionsForLot, anonymous)
║ │ ║ │ ║ │ o: Var (name=lot)
║ │ ║ │ ║ └── StatementPattern [index: SPO] (resultSizeActual=19, totalTimeActual=0.242ms, selfTimeActual=0.242ms) [right]
║ │ ║ │ ║ s: Var (name=stat)
║ │ ║ │ ║ p: Var (name=_const_98c73a3c_uri, value=http://data.europa.eu/a4g/ontology#hasReceivedTenders, anonymous)
║ │ ║ │ ║ o: Var (name=bidders)
║ │ ║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=19, totalTimeActual=0.205ms, selfTimeActual=0.205ms) [right]
║ │ ║ │ s: Var (name=resultnotice)
║ │ ║ │ p: Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
║ │ ║ │ o: Var (name=_const_77e914ad_uri, value=http://data.europa.eu/a4g/ontology#ResultNotice, anonymous)
║ │ ║ └── StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.068ms, selfTimeActual=0.068ms) [right]
║ │ ║ s: Var (name=resultnotice)
║ │ ║ p: Var (name=_const_183bd06d_uri, value=http://data.europa.eu/a4g/ontology#refersToProcedure, anonymous)
║ │ ║ o: Var (name=proc)
║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.061ms, selfTimeActual=0.061ms) [right]
║ │ s: Var (name=resultnotice)
║ │ p: Var (name=_const_1b0b00ca_uri, value=http://data.europa.eu/a4g/ontology#hasDispatchDate, anonymous)
║ │ o: Var (name=ddate)
║ └── ExtensionElem (year)
║ FunctionCall (http://www.w3.org/2005/xpath-functions#year-from-dateTime)
║ FunctionCall (http://www.w3.org/2001/XMLSchema#dateTime)
║ Var (name=ddate)
╚══ Distinct (new scope) (resultSizeActual=1, totalTimeActual=1.02ms, selfTimeActual=0.655ms) [right]
Projection (resultSizeActual=1, totalTimeActual=0.365ms, selfTimeActual=0.025ms)
╠══ ProjectionElemList
║ ProjectionElem "buyerrole"
║ ProjectionElem "countryID"
╚══ Join (JoinIterator) (resultSizeActual=1, totalTimeActual=0.34ms, selfTimeActual=0.083ms)
├── Join (JoinIterator) (resultSizeActual=1, totalTimeActual=0.227ms, selfTimeActual=0.026ms) [left]
│ ╠══ Join (JoinIterator) (resultSizeActual=1, totalTimeActual=0.179ms, selfTimeActual=0.064ms) [left]
│ ║ ├── Join (JoinIterator) (resultSizeActual=1, totalTimeActual=0.074ms, selfTimeActual=0.047ms) [left]
│ ║ │ ╠══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.011ms, selfTimeActual=0.011ms) [left]
│ ║ │ ║ s: Var (name=buyerrole)
│ ║ │ ║ p: Var (name=_const_beb855c2_uri, value=http://data.europa.eu/a4g/ontology#playedBy, anonymous)
│ ║ │ ║ o: Var (name=org)
│ ║ │ ╚══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.016ms, selfTimeActual=0.016ms) [right]
│ ║ │ s: Var (name=org)
│ ║ │ p: Var (name=_const_beb18915_uri, value=https://www.w3.org/ns/legal#registeredAddress, anonymous)
│ ║ │ o: Var (name=orgaddress)
│ ║ └── StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.041ms, selfTimeActual=0.041ms) [right]
│ ║ s: Var (name=orgaddress)
│ ║ p: Var (name=_const_2f7de0e1_uri, value=http://data.europa.eu/a4g/ontology#hasCountryCode, anonymous)
│ ║ o: Var (name=countrycode)
│ ╚══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.022ms, selfTimeActual=0.022ms) [right]
│ s: Var (name=countrycode)
│ p: Var (name=_const_a825a5f4_uri, value=http://purl.org/dc/elements/1.1/identifier, anonymous)
│ o: Var (name=countryID)
└── Filter (resultSizeActual=1, totalTimeActual=0.03ms, selfTimeActual=0.009ms) [right]
╠══ Compare (!=)
║ Var (name=buytype)
║ ValueConstant (value=http://publications.europa.eu/resource/authority/buyer-legal-type/eu-int-org)
╚══ StatementPattern [index: SPO] (resultSizeActual=1, totalTimeActual=0.022ms, selfTimeActual=0.022ms)
s: Var (name=org)
p: Var (name=_const_1abd8d4b_uri, value=http://data.europa.eu/a4g/ontology#hasBuyerType, anonymous)
o: Var (name=buytype)
GroupElem (amountLots)
Count (Distinct)
Var (name=lot)
GroupElem (numSingleBidders)
Sum
If
Compare (=)
Var (name=bidders)
ValueConstant (value="1"^^<http://www.w3.org/2001/XMLSchema#integer>)
ValueConstant (value="1"^^<http://www.w3.org/2001/XMLSchema#integer>)
ValueConstant (value="0"^^<http://www.w3.org/2001/XMLSchema#integer>)
ExtensionElem (amountLots)
Count (Distinct)
Var (name=lot)
ExtensionElem (numSingleBidders)
Sum
If
Compare (=)
Var (name=bidders)
ValueConstant (value="1"^^<http://www.w3.org/2001/XMLSchema#integer>)
ValueConstant (value="1"^^<http://www.w3.org/2001/XMLSchema#integer>)
ValueConstant (value="0"^^<http://www.w3.org/2001/XMLSchema#integer>)