Elasticsearch 2.4.2 and Geoshape indexing improvements #1327

Open
wants to merge 13 commits into
base: titan11

sjudeng commented Jun 20, 2016

This branch continues the work done under #1153 and updates to Elasticsearch 2.4.2 and Lucene 5.5.2. Other associated dependencies were updated as shown below.

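| Dependency    | Previous Version | New Version |
|---------------|------------------|-------------|
| Elasticsearch | 1.5.1            | 2.4.2       |
| Lucene        | 4.10.4           | 5.5.2       |
| Spatial4j     | 0.4.1            | 0.5         |
| Jackson2      | 2.4.4            | 2.6.6       |
| Netty         | 3.6.6            | 3.10.5      |
| Joda          | 1.6.2            | 2.8.2       |
| commons-cli   | 1.2              | 1.3.1       |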

Leveraging the new GeoJSON serializers added in Spatial4j 0.5, this branch also includes a significant refactoring of Titan Geoshapes. It adds support for indexing geo properties with line and polygon types and for querying by point, line and polygon in all index backends (Elasticsearch, Solr and Lucene). The polygon query support continues the work done in #441 by exposing the capability in the Solr and Lucene indexing backends as well.
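For readers unfamiliar with the GeoJSON geometry types mentioned above, here is a minimal sketch of what point, line and polygon geometries look like in GeoJSON. The class `GeoJsonSketch` and its methods are hypothetical helpers written for illustration only; they are not part of Titan's Geoshape API or of Spatial4j, and the exact input accepted by the refactored Geoshape class is not shown in this PR description.

```java
import java.util.StringJoiner;

/** Hypothetical helper (not Titan or Spatial4j code): renders GeoJSON geometry strings. */
final class GeoJsonSketch {

    /** Renders one position; GeoJSON orders coordinates as [longitude, latitude]. */
    static String position(double lon, double lat) {
        return "[" + lon + "," + lat + "]";
    }

    /** Renders a GeoJSON Point. */
    static String point(double lon, double lat) {
        return "{\"type\":\"Point\",\"coordinates\":" + position(lon, lat) + "}";
    }

    /** Renders a GeoJSON LineString from {lon, lat} pairs; needs at least two positions. */
    static String lineString(double[][] lonLat) {
        if (lonLat.length < 2) {
            throw new IllegalArgumentException("a LineString needs at least two positions");
        }
        return "{\"type\":\"LineString\",\"coordinates\":" + positions(lonLat) + "}";
    }

    /** Renders a GeoJSON Polygon with a single exterior ring (no holes). */
    static String polygon(double[][] ring) {
        int n = ring.length;
        // A linear ring has at least four positions, and the first equals the last.
        if (n < 4 || ring[0][0] != ring[n - 1][0] || ring[0][1] != ring[n - 1][1]) {
            throw new IllegalArgumentException("ring must be closed and have at least four positions");
        }
        return "{\"type\":\"Polygon\",\"coordinates\":[" + positions(ring) + "]}";
    }

    private static String positions(double[][] pts) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (double[] p : pts) {
            joiner.add(position(p[0], p[1]));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(point(-122.4, 37.8));
        System.out.println(lineString(new double[][] {{0, 0}, {1, 1}}));
        System.out.println(polygon(new double[][] {{0, 0}, {1, 0}, {1, 1}, {0, 0}}));
    }
}
```

Note the longitude-first ordering and the closed ring for polygons; both are properties of the GeoJSON format itself rather than anything specific to this branch.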

My contributions in this branch are public domain.

Compatibility

These updates are not backwards compatible with Elasticsearch 1.x. Ideally Titan would support both Elasticsearch 1.x and 2.x in the same build, but this was not pursued as part of this effort.

Testing

Tests are skipped in the titan-hadoop-1 module. All other tests are passing, including in all storage (BerkeleyJE, Cassandra, HBase 0.94/0.96/0.98/1.0) and indexing (Elasticsearch, Solr, Lucene) backends. The last full test run (3466 tests) took 10.3 hours on a CentOS 7 x64 instance with 2 vCPU and 7.5 GB memory.

Notes

  • Geoshape Kryo serialization now requires a custom serializer. The serializer, com.thinkaurelius.titan.hadoop.serialize.TitanKryoRegistrator, is defined in the titan-hadoop-core module to avoid adding an (unshaded) Kryo dependency to titan-core. This serializer must be registered with Spark when indexing Geoshape properties:
    spark.kryo.registrator=com.thinkaurelius.titan.hadoop.serialize.TitanKryoRegistrator
  • Elasticsearch 2.0 introduced a JarHell class that checks for duplicate classes across the classpath. Attempting to work through these dependency issues in Titan proved messy and was ultimately unsuccessful because of essential Hadoop jars with class overlap. As a workaround, a JarHell class has been added to titan-es to mask and bypass the checks in the original. Users are responsible for checking their own classpath.
  • A JTS dependency has been added to titan-core to accommodate the new Geoshape implementation. Likewise, the Noggit JSON parsing library was added for GeoJSON parsing.
  • In titan-es/src/test, the bin/elasticsearch, bin/elasticsearch.in.sh and config/elasticsearch.yml files were all updated to the versions from the Elasticsearch 2.3.3 distribution and the two bin files were then updated with Titan-specific changes as annotated in those files.
  • The test class com.thinkaurelius.titan.diskstorage.es.ElasticSearchConfigTest included four tests using an embedded ES instance with various test configuration files. These configuration files differed from the base configuration, elasticsearch.yml, only in the cluster name. Because the current version of Elasticsearch no longer supports overriding the configuration file on the command line, and since the cluster name did not appear to be an essential part of these tests, the custom configuration files were removed and the tests no longer use custom cluster names.
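Since the JarHell note above leaves classpath checking to users, here is a minimal sketch of one way to look for duplicate classes across jars. It is a rough stand-in for the kind of check described, not Elasticsearch's actual JarHell logic; the class name `DuplicateClassScan` and its method are hypothetical, and the sketch only inspects jar files (not class directories) by entry name.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper (not part of Titan or Elasticsearch): finds classes present in several jars. */
final class DuplicateClassScan {

    /** Returns each .class entry that appears in more than one jar, with the jars that contain it. */
    static Map<String, List<Path>> findDuplicates(List<Path> jars) throws IOException {
        Map<String, List<Path>> owners = new TreeMap<>();
        for (Path jar : jars) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                Enumeration<JarEntry> entries = jarFile.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    if (name.endsWith(".class")) {
                        owners.computeIfAbsent(name, k -> new ArrayList<>()).add(jar);
                    }
                }
            }
        }
        // Keep only the entries that are provided by two or more jars.
        owners.values().removeIf(paths -> paths.size() < 2);
        return owners;
    }

    /** Scans the jars on the current JVM classpath and prints any overlapping classes. */
    public static void main(String[] args) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            Path path = Paths.get(entry);
            if (entry.endsWith(".jar") && Files.isRegularFile(path)) {
                jars.add(path);
            }
        }
        findDuplicates(jars).forEach((cls, where) -> System.out.println(cls + " -> " + where));
    }
}
```

Running it with the same classpath as the Titan process gives a quick list of overlaps to review by hand. Some hits may be harmless (for example, entries such as module-info.class in newer jars), so the output is a starting point rather than a verdict.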

kainosnoema and others added 11 commits October 18, 2013 17:06
The current version of ElasticSearch supports the GeoPolygonFilter,
so it's trivial to add support for polygons with an arbitrary number of
boundary points.

Signed-off-by: Evan Owen <kainosnoema@gmail.com>
Conflicts:
	titan-es/src/main/java/com/thinkaurelius/titan/diskstorage/es/ElasticSearchIndex.java
	titan-test/src/main/java/com/thinkaurelius/titan/diskstorage/indexing/IndexProviderTest.java
…ing and querying Geoshape properties with line and polygon type.
@titan-cla

Hi @sjudeng, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

@pluradj
Contributor

pluradj commented Aug 2, 2016

@sjudeng Thanks for the contribution.

I'm starting to take a look at this PR. Curious to hear why you chose Lucene 5.5 instead of 6.0. Are you familiar with the differences (especially any breaking API changes) between the two versions?

Also you stated "My contributions in this branch are public domain." Is that because you do not want to sign the CLA? I'd be interested in your thoughts on Titan's future. https://groups.google.com/d/msg/aureliusgraphs/R0RJnvVbgCs/7H10hVjlBQAJ

@sjudeng
Author

sjudeng commented Aug 2, 2016

Hi @pluradj,

Elasticsearch 2.3.3 depends on Lucene 5.5.0.

https://github.com/thinkaurelius/titan/blob/1.0.0/pom.xml#L83-L86
https://github.com/elastic/elasticsearch/blob/v2.3.3/pom.xml#L55

It looks like Elasticsearch 5.0.0 (still in alpha) will support Lucene 6.1.0.

https://github.com/elastic/elasticsearch/blob/v5.0.0-alpha4/buildSrc/version.properties#L2

My contributions in this branch are public domain, meaning they're not subject to copyright protection. I indicated this since I'm currently unable to sign the CLA but still wanted to make the contribution available to the community.

For now it looks like TinkerPop is where they're able to continue to make open source contributions. It would be great if Titan remained as an active open source implementation, but if that's not possible hopefully we'll see another open source implementation emerge eventually. In the meantime it looks like questions are still being answered on the mailing list and the community is contributing where possible.

Thanks to you and @dylanht for your work in #1312. I did merge that branch with this one and tested without issue, but that was based on tinkerpop-3.2.0, not 3.2.1-SNAPSHOT.

@pluradj
Contributor

pluradj commented Sep 20, 2016

sjudeng changed the title from "Elasticsearch 2.3.3 and Geoshape indexing improvements" to "Elasticsearch 2.4.2 and Geoshape indexing improvements" on Dec 13, 2016