error in LongArrayDisk when trying to run qepSearch.sh on large HDT file #471

Open
1 of 5 tasks
balhoff opened this issue Jul 2, 2024 · 2 comments

balhoff commented Jul 2, 2024

Part of the endpoint? (leave empty if you don't know)

  • Backend (qendpoint-backend)
  • Store (qendpoint-backend)
  • Core (qendpoint-core)
  • Frontend (qendpoint-frontend)
  • Other

Description of the issue

I'm trying to create an index for a huge HDT file (29,773,033,292 triples). I'm doing this by trying to start qepSearch.sh.

Expected behavior

I expect a file mytriples.hdt.index.v1-1 to be generated, and then be able to search for triples.

Obtained behavior

After about 20 minutes, I get this output:

10:16:06,369 |-INFO in ch.qos.logback.classic.LoggerContext[default] - This is logback-classic version 1.4.5
10:16:06,441 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
10:16:06,446 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/home/balhoff/qendpoint-cli-1.16.1/lib/qendpoint-1.16.1.jar!/logback.xml]
10:16:06,448 |-WARN in ch.qos.logback.classic.util.DefaultJoranConfigurator@45b9a632 - Resource [logback.xml] occurs multiple times on the classpath.
10:16:06,448 |-WARN in ch.qos.logback.classic.util.DefaultJoranConfigurator@45b9a632 - Resource [logback.xml] occurs at [jar:file:/home/balhoff/qendpoint-cli-1.16.1/lib/qendpoint-1.16.1.jar!/logback.xml]
10:16:06,448 |-WARN in ch.qos.logback.classic.util.DefaultJoranConfigurator@45b9a632 - Resource [logback.xml] occurs at [jar:file:/home/balhoff/qendpoint-cli-1.16.1/lib/qendpoint-backend-1.16.1.jar!/logback.xml]
10:16:06,455 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@25d250c6 - URL [jar:file:/home/balhoff/qendpoint-cli-1.16.1/lib/qendpoint-1.16.1.jar!/logback.xml] is not of type file
10:16:06,610 |-INFO in ch.qos.logback.core.model.processor.AppenderModelHandler - Processing appender named [STDOUT]
10:16:06,611 |-INFO in ch.qos.logback.core.model.processor.AppenderModelHandler - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
10:16:06,620 |-INFO in ch.qos.logback.core.model.processor.ImplicitModelHandler - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
10:16:06,636 |-INFO in ch.qos.logback.classic.model.processor.RootLoggerModelHandler - Setting level of ROOT logger to INFO
10:16:06,636 |-INFO in ch.qos.logback.core.model.processor.AppenderRefModelHandler - Attaching appender named [STDOUT] to Logger[ROOT]
10:16:06,637 |-INFO in ch.qos.logback.core.model.processor.DefaultProcessor@79e2c065 - End of configuration.
10:16:06,639 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@36bc55de - Registering current configuration as safe fallback point
[main][          ] 0.00  reading buffer
10:32:41.515 [main] INFO  c.t.q.c.triples.impl.BitmapTriples - Count Objects in 15 min 54 sec 607 ms 784 us Max was: 2137208329
10:33:28.540 [main] INFO  c.t.q.c.triples.impl.BitmapTriples - Bitmap in 47 sec 16 ms 286 us
Exception in thread "main" java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
	at com.the_qa_company.qendpoint.core.util.disk.LongArrayDisk.set0(LongArrayDisk.java:236)
	at com.the_qa_company.qendpoint.core.util.disk.LongArrayDisk.clear(LongArrayDisk.java:289)
	at com.the_qa_company.qendpoint.core.util.disk.LongArrayDisk.<init>(LongArrayDisk.java:95)
	at com.the_qa_company.qendpoint.core.util.disk.LongArrayDisk.<init>(LongArrayDisk.java:62)
	at com.the_qa_company.qendpoint.core.util.disk.LongArrayDisk.<init>(LongArrayDisk.java:58)
	at com.the_qa_company.qendpoint.core.compact.sequence.SequenceLog64BigDisk.<init>(SequenceLog64BigDisk.java:80)
	at com.the_qa_company.qendpoint.core.compact.sequence.SequenceLog64BigDisk.<init>(SequenceLog64BigDisk.java:72)
	at com.the_qa_company.qendpoint.core.triples.impl.BitmapTriples$1.<init>(BitmapTriples.java:514)
	at com.the_qa_company.qendpoint.core.triples.impl.BitmapTriples.createSequence64(BitmapTriples.java:514)
	at com.the_qa_company.qendpoint.core.triples.impl.BitmapTriples.createIndexObjectMemoryEfficient(BitmapTriples.java:773)
	at com.the_qa_company.qendpoint.core.triples.impl.BitmapTriples.generateIndex(BitmapTriples.java:1005)
	at com.the_qa_company.qendpoint.core.hdt.impl.HDTImpl.loadOrCreateIndex(HDTImpl.java:526)
	at com.the_qa_company.qendpoint.core.hdt.HDTManagerImpl.doMapIndexedHDT(HDTManagerImpl.java:99)
	at com.the_qa_company.qendpoint.core.hdt.HDTManager.mapIndexedHDT(HDTManager.java:448)
	at com.the_qa_company.qendpoint.tools.QEPSearch.executeHDT(QEPSearch.java:361)
	at com.the_qa_company.qendpoint.tools.QEPSearch.execute(QEPSearch.java:934)
	at com.the_qa_company.qendpoint.tools.QEPSearch.main(QEPSearch.java:1322)

How to reproduce

Using JDK 17.0.2, export JAVA_OPTIONS="-Xmx500G -XX:+UseParallelGC". Then:

qepSearch.sh mytriples.hdt

The file mytriples.hdt is 344 GB. I can find a way to share it if that would be helpful.

Endpoint version

1.16.1

Do I want to contribute to fix it?

Maybe

Something else?

No response

@balhoff balhoff added the bug Something isn't working label Jul 2, 2024
Collaborator

ate47 commented Jul 3, 2024

Most of the in-memory implementations are old and not really reliable for large datasets (1B triples or more). I suggest using only the disk implementation for this kind of workload.

To enable disk indexing, you can use these options:

# use disk implementation
bitmaptriples.indexmethod=disk
# directory to compute the index
bitmaptriples.sequence.disk.location=disk-work-dir
# use disk locations and indexes
bitmaptriples.sequence.disk=true
bitmaptriples.sequence.disk.subindex=true

This can be done with the -config or -options parameters.
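
As a rough sketch of the -config route, the four options above can be saved to a file and passed on the command line. The file name disk-index.properties is made up for this example, and the argument form of -config (a file path given before the HDT file) is an assumption that this thread does not confirm, so check the tool's help output before relying on it.

```shell
# Create the work directory named in bitmaptriples.sequence.disk.location.
# Creating it up front is harmless whether or not the tool would create it itself.
mkdir -p disk-work-dir

# Write the disk-indexing options to a file.
# "disk-index.properties" is a hypothetical name chosen for this sketch.
cat > disk-index.properties <<'EOF'
bitmaptriples.indexmethod=disk
bitmaptriples.sequence.disk.location=disk-work-dir
bitmaptriples.sequence.disk=true
bitmaptriples.sequence.disk.subindex=true
EOF

# Assumption: -config takes the path of an options file and comes before the HDT path.
# The command only runs if qepSearch.sh is on PATH and the HDT file is present.
if command -v qepSearch.sh >/dev/null 2>&1 && [ -f mytriples.hdt ]; then
  qepSearch.sh -config disk-index.properties mytriples.hdt
fi
```

The -options parameter is the other route mentioned above. Its value format is not shown in this thread, so it is left out of the sketch.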

Author

balhoff commented Jul 9, 2024

@ate47 thank you! Your suggestion worked perfectly.
