{'errorCode': 410, 'message': 'BrokerResourceMissingError'}
This essentially implies that the Pinot Broker assigned to the table specified in the query was not found. A common root cause for this is a typo in the table name in the query. Another uncommon reason could be if there wasn't actually a broker with required broker tenant tag for the table.
Here's the page explaining the Pinot response format: https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql/response-format
"timestamp" is a reserved keyword in SQL. Escape timestamp with double quotes.
select "timestamp" from myTable
Other commonly encountered reserved keywords are date, time, table.
For filtering on STRING columns, use single quotes
SELECT COUNT(*) from myTable WHERE column = 'foo'
The fields in the ORDER BY
clause must be one of the group by clauses or aggregations, BEFORE applying the alias. Therefore, this will not work
SELECT count(colA) as aliasA, colA from tableA GROUP BY colA ORDER BY aliasA
Instead, this will work
SELECT count(colA) as sumA, colA from tableA GROUP BY colA ORDER BY count(colA)
No. Pagination only works for SELECTION queries
You can add this at the end of your query: option(timeoutMs=X)
. For eg: the following example will use a timeout of 20 seconds for the query:
SELECT COUNT(*) from myTable option(timeoutMs=20000)
Add these two configs for Pinot server and broker to start tracking of running queries. The query tracks are added and cleaned as query starts and ends, so should not consume much resource.
pinot.server.enable.query.cancellation=true // false by default
pinot.broker.enable.query.cancellation=true // false by default
Then use the Rest APIs on Pinot controller to list running queries and cancel them via the query ID and broker ID (as query ID is only local to broker), like below:
GET /queries: to show running queries as tracked by all brokers
Response example: `{
"Broker_192.168.0.105_8000": {
"7": "select G_old from baseballStats limit 10",
"8": "select G_old from baseballStats limit 100"
}
}`
DELETE /query/{brokerId}/{queryId}[?verbose=false/true]: to cancel a running query
with queryId and brokerId. The verbose is false by default, but if set to true,
responses from servers running the query also return.
Response example: `Cancelled query: 8 with responses from servers:
{192.168.0.105:7501=404, 192.168.0.105:7502=200, 192.168.0.105:7500=200}`
In order to speed up aggregations, you can enable metrics aggregation on the required column by adding a metric field in the corresponding schema and setting aggregateMetrics
to true in the table config. You can also use a star-tree index config for such columns (read more about star-tree here)
There are 2 ways to verify this:
- Log in to a server that hosts segments of this table. Inside the data directory, locate the segment directory for this table. In this directory, there is a file named
index_map
which lists all the indexes and other data structures created for each segment. Verify that the requested index is present here. - During query: Use the column in the filter predicate and check the value of
numEntriesScannedInFilter
. If this value is 0, then indexing is working as expected (works for Inverted index)
Yes, Pinot uses a default value of LIMIT 10 in queries. The reason behind this default value is to avoid unintentionally submitting expensive queries that end up fetching or processing a lot of data from Pinot. Users can always overwrite this by explicitly specifying a LIMIT value.
Pinot does not cache query results, each query is computed in its entirety. Note though, running the same or similar query multiple times will naturally pull in segment pages into memory making subsequent calls faster. Also, for realtime systems, the data is changing in realtime, so results cannot be cached. For offline-only systems, caching layer can be built on top of Pinot, with invalidation mechanism built-in to invalidate the cache when data is pushed into Pinot.
Pinot memory maps segments. It warms up during the first query, when segments are pulled into the memory by the OS. Subsequent queries will have the segment already loaded in memory, and hence will be faster. The OS is responsible for bringing the segments into memory, and also removing them in favor of other segments when other segments not already in memory are accessed.
The query execution engine will prefer to use StarTree index for all queries where it can be used. The criteria to determine whether StarTree index can be used is as follows:
- All aggregation function + column pairs in the query must exist in the StarTree index.
- All dimensions that appear in filter predicates and group-by should be StarTree dimensions.
For queries where above is true, StarTree index is used. For other queries, the execution engine will default to using the next best index available.