Elasticsearch 2.4.2 and Geoshape indexing improvements #1327
base: titan11
The current version of ElasticSearch supports the GeoPolygonFilter, so it's trivial to add support for polygons with an arbitrary number of boundary points. Signed-off-by: Evan Owen <kainosnoema@gmail.com>
…lly even be needed
… titan09-es2.x
This reverts commit fc11a96.
… titan09_merge
Conflicts: titan-es/src/main/java/com/thinkaurelius/titan/diskstorage/es/ElasticSearchIndex.java titan-test/src/main/java/com/thinkaurelius/titan/diskstorage/indexing/IndexProviderTest.java
…ing and querying Geoshape properties with line and polygon type.
Hi @sjudeng, thanks for your contribution! In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.
@sjudeng Thanks for the contribution. I'm starting to take a look at this PR. Curious to hear why you chose Lucene 5.5 instead of 6.0. Are you familiar with the differences (especially any breaking API changes) between the two versions? Also you stated "My contributions in this branch are public domain." Is that because you do not want to sign the CLA? I'd be interested in your thoughts on Titan's future. https://groups.google.com/d/msg/aureliusgraphs/R0RJnvVbgCs/7H10hVjlBQAJ
Hi @pluradj, Elasticsearch 2.3.3 depends on Lucene 5.5.0. https://github.com/thinkaurelius/titan/blob/1.0.0/pom.xml#L83-L86 It looks like Elasticsearch 5.0.0 (still in alpha) will support Lucene 6.1.0. https://github.com/elastic/elasticsearch/blob/v5.0.0-alpha4/buildSrc/version.properties#L2 My contributions in this branch are public domain, meaning they're not subject to copyright protection. I indicated this since I'm currently unable to sign the CLA but still wanted to make the contribution available to the community. For now it looks like TinkerPop is where they're able to continue to make open source contributions. It would be great if Titan remained as an active open source implementation, but if that's not possible hopefully we'll see another open source implementation emerge eventually. In the meantime it looks like questions are still being answered on the mailing list and the community is contributing where possible. Thanks to you and @dylanht for your work in #1312. I did merge that branch with this one and tested without issue, but that was based on tinkerpop-3.2.0 not 3.2.1-SNAPSHOT.
This branch continues the work done under #1153 and updates to Elasticsearch 2.4.2 and Lucene 5.5.2. Other associated dependencies were updated as shown below.
Leveraging new GeoJSON serializers added in Spatial4j 0.5, this branch also includes a significant refactoring of Titan Geoshapes, adding support for indexing geo properties with line and polygon types and querying by point, line and polygon in all index backends (Elasticsearch, Solr and Lucene). The support for querying by polygon continued the work done in #441 by exposing the capability in the Solr and Lucene indexing backends.
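For readers unfamiliar with the shape types involved, line and polygon Geoshapes correspond to GeoJSON LineString and Polygon geometries. The following is a minimal sketch of what those geometries look like, using only the Java standard library. The class GeoJsonSketch and its methods are hypothetical names for illustration; they are not part of Titan or Spatial4j, and the branch itself relies on Spatial4j for the actual serialization.

```java
public class GeoJsonSketch {

    // Formats positions as a GeoJSON coordinate array.
    // GeoJSON orders each position as [longitude, latitude].
    public static String positions(double[][] lonLat) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < lonLat.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('[').append(lonLat[i][0]).append(',').append(lonLat[i][1]).append(']');
        }
        return sb.append(']').toString();
    }

    // A LineString needs at least two positions.
    public static String lineString(double[][] lonLat) {
        if (lonLat.length < 2) {
            throw new IllegalArgumentException("a line needs at least two positions");
        }
        return "{\"type\":\"LineString\",\"coordinates\":" + positions(lonLat) + "}";
    }

    // A Polygon is a list of linear rings; this sketch handles a single exterior ring.
    // A linear ring has at least four positions and repeats the first position at the end.
    public static String polygon(double[][] ring) {
        int n = ring.length;
        if (n < 4 || ring[0][0] != ring[n - 1][0] || ring[0][1] != ring[n - 1][1]) {
            throw new IllegalArgumentException("a polygon ring must be closed and have at least four positions");
        }
        return "{\"type\":\"Polygon\",\"coordinates\":[" + positions(ring) + "]}";
    }

    public static void main(String[] args) {
        System.out.println(lineString(new double[][] {{-122.4, 37.8}, {-122.3, 37.9}}));
        System.out.println(polygon(new double[][] {
            {-122.4, 37.8}, {-122.3, 37.8}, {-122.3, 37.9}, {-122.4, 37.8}
        }));
    }
}
```

Note that an unclosed ring is rejected rather than closed automatically; whether the Titan Geoshape API does the same is not stated here.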
My contributions in this branch are public domain.
Compatibility
These updates are not backwards compatible with Elasticsearch 1.x. Ideally Titan could support both Elasticsearch 1.x and 2.x in the same build, but this was not pursued as part of this effort.
Testing
Tests are skipped in the titan-hadoop-1 module. All other tests are passing, including in all storage (BerkeleyJE, Cassandra, HBase 0.94/0.96/0.98/1.0) and indexing (Elasticsearch, Solr, Lucene) backends. The last full test run (3466 tests) took 10.3 hours on a CentOS 7 x64 instance with 2 vCPU and 7.5 GB memory.
Notes
- com.thinkaurelius.titan.hadoop.serialize.TitanKryoRegistrator is defined in the titan-hadoop-core module to avoid adding an (unshaded) Kryo dependency to titan-core. This serializer must be registered with Spark when indexing Geoshape properties.
- The bin/elasticsearch, bin/elasticsearch.in.sh and config/elasticsearch.yml files were all updated to the versions from the Elasticsearch 2.3.3 distribution, and the two bin files were then updated with Titan-specific changes as annotated in those files.
- com.thinkaurelius.titan.diskstorage.es.ElasticSearchConfigTest included four tests using an embedded ES instance with various test configuration files. These configuration files differed from the base configuration, elasticsearch.yml, only in the cluster name. Because the current version of Elasticsearch no longer supports overriding the configuration file on the command line, and since it didn't appear that the cluster name was an essential part of these tests, these configuration files were removed and the custom cluster names are no longer used in those tests.
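For the TitanKryoRegistrator note, a sketch of what the registration might look like using Spark's standard Kryo settings. This assumes the registrator is wired in through the usual spark.serializer and spark.kryo.registrator properties in the Spark job configuration; which properties file these belong in, and whether further settings are required, is not covered by this description.

```properties
# Assumption: standard Spark Kryo configuration keys; the file they go in depends on the deployment.
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=com.thinkaurelius.titan.hadoop.serialize.TitanKryoRegistrator
```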