Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SuperSet errors with AWS Elasticsearch [Kibana version 6.8.0] #74

Open
harshgadhia opened this issue Nov 5, 2021 · 9 comments
Open

Comments

@harshgadhia
Copy link

harshgadhia commented Nov 5, 2021

Hello preset-io team,

I am facing some issues with using elasticsearch-dbapi version: 0.2.6.
I have described the issue in detail in the apache superset project too. SuperSet errors with AWS Elasticsearch [Kibana version 6.8.0]

To summarize:

Issue 1

  • elasticsearch-dbapi seems to be not able to parse the index metadata from AWS ES endpoint. Issue happening due to https://github.com/preset-io/elasticsearch-dbapi/blob/master/es/opendistro/api.py#L236

    Sample request made to ES:

    curl --location --request GET 'https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/<INDEX- 
    NAME>/_mapping?format=json'
    

    Here's the sample response coming from ES endpoint:

    {"dummy_app":{"mappings":{"dummy_app":{"properties":{"param1":{"type":"boolean"},"param2": 
    {"type":"float"},"param3":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"param4": 
    {"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"param5":{"type":"text","fields":{"keyword": 
    {"type":"keyword","ignore_above":256}}},"param6":{"type":"date"},"deleted":{"type":"boolean"},"event_time": 
    {"type":"date"},"gwGen":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}}}}

Issue 2

  • elasticsearch-dbapi cannot parse response from ES endpoint for a SQL query, getting error "Missing columns field, maybe it's an elastic sql ep" es.exceptions.DataError: Missing columns field, maybe it's an elastic sql ep.
    Issue happening due to https://github.com/preset-io/elasticsearch-dbapi/blob/master/es/opendistro/api.py#L282

    The sample curl command returns valid response from ES, however, possibly the query is not properly formed (it is missing ?format=jdbc):

    curl --location --request POST 'https://vpc-some-search-domain.us-west-2.es.amazonaws.com:443/_opendistro/_sql' 
    \
    --header 'Content-Type: application/json' \
    --data-raw '{
     "query": "SELECT * FROM dummy_index where some_id = 1 order by id DESC limit 1"
    }'
    
    

    Sample response from ES endpoint for the above rest call:

    {"took":4,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits": 
    {"total":43464,"max_score":null,"hits":[{"_index":"dummy_index","_type":"dummy_index","_id":"dummy_index-app_1636096855000","_score":null,"_source": 
    {"param1":"value1","param2":false,"param3":"1636096855000","param4":"2021-11-05T07:21:01.439Z"},"sort":[1636096861439]}]}}
    

Environment

  • browser type and version: Google Chrome [Version 95.0.4638.69 (Official Build) (x86_64)]
  • superset version: 1.3.1
  • python version: 3.7
  • node.js version: node -v
  • any feature flags active: None
  • pip elasticsearch-dbapi version: 0.2.6
  • pip elasticsearch version: 7.13.4
  • Kibana version: 6.8.0

Any help from you is greatly appreciated.

@harshgadhia
Copy link
Author

harshgadhia commented Nov 5, 2021

Please see here for the response formats for the Elasticsearch Opendistro SQL.

By default the plugin returns original response from Elasticsearch in JSON. Because this is the native response from Elasticsearch, extra efforts are needed to parse and interpret it.

I am pretty certain that the second issue is happening because ?format=jdbc is not set on the es query endpoint.
See here the path is set to _opendistro/_sql instead it should be _opendistro/_sql?format=jdbc

However, the code assumes (See lines 280 and 281) the response in jdbc format. Hence, the error is happening in parsing the response. Can someone please fix this?

@dpgaspar: I am tagging you since, you are an active member of this project. Thanks in advance!

@harshgadhia
Copy link
Author

In case you need the following:

pip freeze

 pip freeze
aiohttp==3.7.2
alembic==1.4.3
amqp==2.6.1
# Editable install with no version control (apache-superset==1.3.1)
-e /app
apispec==3.3.2
async-timeout==3.0.1
attrs==20.2.0
Authlib==0.15.5
Babel==2.8.0
backoff==1.10.0
billiard==3.6.3.0
bleach==3.3.0
boto3==1.16.10
botocore==1.19.10
Brotli==1.0.9
cached-property==1.5.2
cachelib==0.1.1
cachetools==4.2.4
celery==4.4.7
certifi==2020.6.20
cffi==1.14.3
chardet==3.0.4
click==7.1.2
colorama==0.4.4
configparser==5.0.2
convertdate==2.3.0
cron-descriptor==1.2.24
croniter==0.3.36
cryptography==3.3.2
defusedxml==0.6.0
Deprecated==1.2.11
deprecation==2.1.0
dnspython==2.0.0
elasticsearch==7.13.4
elasticsearch-dbapi==0.2.6
email-validator==1.1.1
et-xmlfile==1.0.1
Flask==1.1.2
Flask-AppBuilder==3.3.0
Flask-Babel==1.0.0
Flask-Caching==1.10.1
Flask-Compress==1.8.0
Flask-Cors==3.0.9
Flask-JWT-Extended==3.24.1
Flask-Login==0.4.1
Flask-Migrate==2.5.3
Flask-OpenID==1.3.0
Flask-SQLAlchemy==2.4.4
flask-talisman==0.7.0
Flask-WTF==0.14.3
future==0.18.2
geographiclib==1.50
geopy==2.0.0
google-api-core==2.1.1
google-api-python-client==2.27.0
google-auth==2.3.1
google-auth-httplib2==0.1.0
googleapis-common-protos==1.53.0
graphlib-backport==1.0.3
gunicorn==20.0.4
holidays==0.10.3
httplib2==0.20.1
humanize==3.1.0
idna==2.10
idps-client==3.94.0
idps-mgmt-swagger-client==3.94.0
idps-sdk==3.94.0
idps-swagger-client==3.94.0
ijson==3.1.2.post0
importlib-metadata==4.8.1
isodate==0.6.0
itsdangerous==1.1.0
jdcal==1.4.1
Jinja2==2.11.3
jmespath==0.10.0
jsonlines==1.2.0
jsonschema==3.2.0
kombu==4.6.11
korean-lunar-calendar==0.2.1
linear-tsv==1.1.0
machina-swagger-client==3.94.0
Mako==1.1.3
Markdown==3.3.3
MarkupSafe==1.1.1
marshmallow==3.9.0
marshmallow-enum==1.5.1
marshmallow-sqlalchemy==0.23.1
msgpack==1.0.0
multidict==5.0.0
mysqlclient==1.4.2.post1
natsort==7.0.1
numpy==1.19.4
openpyxl==3.0.5
packaging==21.0
pandas==1.2.2
parsedatetime==2.6
pgsanity==0.2.9
Pillow==7.2.0
pks-swagger-client==3.94.0
polyline==1.4.0
prison==0.1.3
progress==1.5
protobuf==3.19.0
psycopg2==2.8.5
psycopg2-binary==2.8.5
pyarrow==4.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyathena==2.3.0
pycparser==2.20
pycryptodome==3.11.0
pydruid==0.6.1
PyGithub==1.54.1
PyHive==0.6.3
PyJWT==1.7.1
PyMeeus==0.3.7
pyparsing==2.4.7
pyrsistent==0.16.1
python-dateutil==2.8.1
python-dotenv==0.15.0
python-editor==1.0.4
python-geohash==0.8.5
python3-openid==3.2.0
pytz==2020.4
PyYAML==5.4.1
rcc-swagger-client==3.94.0
redis==3.2.1
requests==2.24.0
requests-aws4auth==1.1.1
rfc3986==1.4.0
rsa==4.7.2
s3transfer==0.3.3
sasl==0.2.1
secrecy-swagger-client==3.94.0
selenium==3.141.0
simplejson==3.17.2
six==1.15.0
slackclient==2.5.0
SQLAlchemy==1.3.20
SQLAlchemy-Utils==0.36.8
sqlparse==0.3.0
tableschema==1.20.0
tabulate==0.8.9
tabulator==1.52.5
tenacity==8.0.1
thrift==0.13.0
thrift-sasl==0.4.2
typing-extensions==3.7.4.3
unicodecsv==0.14.1
uritemplate==4.1.1
urllib3==1.25.11
vine==1.3.0
webencodings==0.5.1
Werkzeug==1.0.1
wrapt==1.12.1
WTForms==2.3.3
WTForms-JSON==0.3.3
xlrd==1.2.0
yarl==1.6.2
zipp==3.4.1

harshgadhia added a commit to harshgadhia/elasticsearch-dbapi that referenced this issue Nov 5, 2021
Reference preset-io#74
Adding format=jdbc to the path for the opendistro sql endpoint
@dpgaspar
Copy link
Member

dpgaspar commented Nov 5, 2021

Hi @harshgadhia,

What's your AWS OpenSearch (ElasticSearch) version? 6.8.0?

@harshgadhia
Copy link
Author

@dpgaspar Yes that is correct:

{
    "name": "lagQbOg",
    "cluster_name": "332592635208:metrics-search-dev",
    "cluster_uuid": "2OAhfNl3S7aumuuBxQcxCQ",
    "version": {
        "number": "6.8.0",
        "build_flavor": "oss",
        "build_type": "zip",
        "build_hash": "8169a24",
        "build_date": "2021-04-21T19:26:55.782637Z",
        "build_snapshot": false,
        "lucene_version": "7.7.0",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}

@harshgadhia
Copy link
Author

Here's the latest information I have:

The Opensearch (aka elasticsearch) has support to query SQL via the _opendistro/_sql endpoint since version >=6.5.
However, only after version >=7.4, the endpoint presumable sends response in jdfc format (that is what must be the assumption I guess, when writing this.

However, for older versions <7.4, you have to explicitly pass on a query parameter ?format=jdbc. Looks like doing this will make the library compatible for the older versions.

Looking forward for your help in this matter, thanks in advance.

@harshgadhia
Copy link
Author

@dpgaspar Did you get a chance to look at this? Really appreciate a response on the above. Thank you in advance!

@dpgaspar
Copy link
Member

sounds reasonable to me @harshgadhia are you willing to make a PR?
We can fetch current cluster version, or create a new connection string parameter

@harshgadhia
Copy link
Author

@dpgaspar sure, I can try looking into it, and send a PR your way!

@prijopauly
Copy link

@harshgadhia,
I am new to superset and I am facing same issue with AWS ES. I am not sure how we can pass query parameter ?format=jdbc in _opendistro/_sql in superset

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants