From 2711f38d36921fee1b1bc09ddf4b80f4468d5772 Mon Sep 17 00:00:00 2001 From: Manfred Moser Date: Fri, 29 Dec 2023 13:38:59 -0800 Subject: [PATCH] Add docs for OPA access control --- docs/src/main/sphinx/security.md | 1 + .../sphinx/security/opa-access-control.md | 270 ++++++++++++++++++ docs/src/main/sphinx/security/overview.md | 2 + plugin/trino-opa/README.md | 247 +--------------- 4 files changed, 276 insertions(+), 244 deletions(-) create mode 100644 docs/src/main/sphinx/security/opa-access-control.md diff --git a/docs/src/main/sphinx/security.md b/docs/src/main/sphinx/security.md index 5b1ecf62fb23..62789611d37c 100644 --- a/docs/src/main/sphinx/security.md +++ b/docs/src/main/sphinx/security.md @@ -51,6 +51,7 @@ security/group-file security/built-in-system-access-control security/file-system-access-control +security/opa-access-control ``` ## Security inside the cluster diff --git a/docs/src/main/sphinx/security/opa-access-control.md b/docs/src/main/sphinx/security/opa-access-control.md new file mode 100644 index 000000000000..a21541e3b544 --- /dev/null +++ b/docs/src/main/sphinx/security/opa-access-control.md @@ -0,0 +1,270 @@ +# Open Policy Agent access control + +The Open Policy Agent access control plugin enables the use of [Open Policy +Agent (OPA)](https://www.openpolicyagent.org/) as an authorization engine for +fine-grained access control to catalogs, schemas, tables, and more in Trino. +Policies are defined in OPA, and Trino checks access control privileges in OPA. + +## Requirements + +* A running [OPA deployment](https://www.openpolicyagent.org/docs/latest/#running-opa) +* Network connectivity from the Trino cluster to the OPA server + +With the requirements fulfilled, you can proceed to set up Trino and OPA with +your desired access control configuration. 
+ +## Trino configuration + +To use only OPA for access control, create the file `etc/access-control.properties` +with the following minimal configuration: + +```properties +access-control.name=opa +opa.policy.uri=https://your-opa-endpoint/v1/data/allow +``` + +To combine OPA access control with file-based or other access control systems, +configure multiple access control configuration file paths in +`etc/config.properties`: + +```properties +access-control.config-files=etc/trino/file-based.properties,etc/trino/opa.properties +``` + +List the configuration files in the desired order of the different systems +for overall access control. Configure each access control system in the +specified files. + +The following table lists the configuration properties for the OPA access control: + +:::{list-table} OPA access control configuration properties +:widths: 40, 60 +:header-rows: 1 + +* - Name + - Description +* - `opa.policy.uri` + - The **required** URI for the OPA endpoint, for example, + `https://opa.example.com/v1/data/allow`. +* - `opa.policy.batched-uri` + - The **optional** URI for activating batch mode for certain authorization + queries where batching is applicable, for example + `https://opa.example.com/v1/data/batch`. Batch mode is described in + [](opa-batch-mode). +* - `opa.log-requests` + - Configure whether request details, including URI, headers and the entire body, are + logged prior to sending them to OPA. Defaults to `false`. +* - `opa.log-responses` + - Configure whether OPA response details, including URI, status code, headers and + the entire body, are logged. Defaults to `false`. +* - `opa.allow-permission-management-operations` + - Configure whether permission management operations are allowed. Find more details in + [](opa-permission-management). Defaults to `false`. +* - `opa.http-client.*` + - Optional HTTP client configurations for the connection from Trino to OPA, + for example `opa.http-client.http-proxy` for configuring the HTTP proxy. 
+ Find more details in [](/admin/properties-http-client). +::: + +### Logging + +When request or response logging is enabled, details are logged at the `DEBUG` +level under the `io.trino.plugin.opa.OpaHttpClient` logger. The Trino logging +configuration must be updated to include this class, to ensure log entries are +created. + +Note that enabling these options produces very large amounts of log data. + +(opa-permission-management)= +### Permission management + +The following operations are allowed or denied based on the setting of +`opa.allow-permission-management-operations`. If set to `true`, these operations are +allowed. If set to `false`, they are denied. In both cases, no request is sent +to OPA. + +- `GrantSchemaPrivilege` +- `DenySchemaPrivilege` +- `RevokeSchemaPrivilege` +- `GrantTablePrivilege` +- `DenyTablePrivilege` +- `RevokeTablePrivilege` +- `CreateRole` +- `DropRole` +- `GrantRoles` +- `RevokeRoles` + +The setting defaults to `false` due to the complexity and potential unexpected +consequences of having SQL-style grants and roles together with OPA. + +You must enable permission management if another custom security system in Trino +is capable of grant management and is used together with OPA access control. + +Additionally, users are always allowed to show information about roles (`SHOW +ROLES`), regardless of this setting. The following operations are _always_ +allowed: + +- `ShowRoles` +- `ShowCurrentRoles` +- `ShowRoleGrants` + +## OPA configuration + +The OPA access control in Trino contacts OPA for each query and issues an +authorization request. OPA must return a response containing a boolean `allow` +field, which determines whether the operation is permitted or not. + +Policies in OPA are defined with the purpose-built policy language Rego. Find +more information in the [detailed +documentation](https://www.openpolicyagent.org/docs/latest/policy-language/). 
+After the initial installation and configuration in Trino, these policies are +the main configuration aspect for your access control setup. + +A query from the OPA access control in Trino to OPA contains a `context` and an +`action` as its top-level fields. + +The `context` object contains all other contextual information about the query: + +- `identity`: The identity of the user performing the operation, containing the + following two fields: + - `user`: username + - `groups`: list of groups this user belongs to +- `softwareStack`: Information about the software stack issuing the request to + OPA. The following information is included: + - `trinoVersion`: Version of Trino used + +The `action` object contains information about what action is performed on what +resources. The following fields are provided: + +- `operation`: the performed operation, for example `SelectFromColumns`. +- `resource`: information about the accessed objects +- `targetResource`: information about any newly created object, if applicable +- `grantee`: grantee of a grant operation. + +Fields that are not applicable for a specific operation are set to null. +For example, `targetResource` is null if no table, schema, or catalog is +modified, and `grantee` is null if no permissions are granted. +Any null field is omitted altogether from the `action` object. + +### Example requests to OPA + +Accessing a table results in a query similar to the following example: + +```json +{ + "context": { + "identity": { + "user": "foo", + "groups": ["some-group"] + }, + "softwareStack": { + "trinoVersion": "434" + } + }, + "action": { + "operation": "SelectFromColumns", + "resource": { + "table": { + "catalogName": "example_catalog", + "schemaName": "example_schema", + "tableName": "example_table", + "columns": [ + "column1", + "column2", + "column3" + ] + } + } + } +} +``` + +The `targetResource` is used in cases where a new resource, distinct from the one in +`resource`, is created. 
This applies, for example, when renaming a table. + +```json +{ + "context": { + "identity": { + "user": "foo", + "groups": ["some-group"] + }, + "softwareStack": { + "trinoVersion": "434" + } + }, + "action": { + "operation": "RenameTable", + "resource": { + "table": { + "catalogName": "example_catalog", + "schemaName": "example_schema", + "tableName": "example_table" + } + }, + "targetResource": { + "table": { + "catalogName": "example_catalog", + "schemaName": "example_schema", + "tableName": "new_table_name" + } + } + } +} +``` + +(opa-batch-mode)= +## Batch mode + +A very powerful feature provided by OPA is its ability to respond to +authorization queries with more complex answers than a `true` or `false` boolean +value. + +Many features in Trino require filtering to determine to which resources a user +is granted access. These resources are catalogs, schemas, queries, views, and +other objects. + +If `opa.policy.batched-uri` is not configured, Trino sends one request to OPA +for each object, and then creates a filtered list of permitted objects. + +Configuring `opa.policy.batched-uri` allows Trino to send a request to +the batch endpoint, with a list of resources in one request under the +`action.filterResources` node. + +All other fields in the request are identical to the non-batch endpoint. + +An OPA policy supporting batch operations must return a list containing the +_indices_ of the items for which authorization is granted. Returning a `null` +value is equivalent to returning an empty list, and both deny all access. + +You can add batching support for policies that do not support it: + +```text +package foo + +import future.keywords.contains + +# ... rest of the policy ... 
+# this assumes the non-batch response field is called "allow" +batch contains i { + some i + raw_resource := input.action.filterResources[i] + allow with input.action.resource as raw_resource +} + +# Corner case: filtering columns is done with a single table item, and many columns inside +# We cannot use our normal logic in other parts of the policy as they are based on sets +# and we need to retain order +batch contains i { + some i + input.action.operation == "FilterColumns" + count(input.action.filterResources) == 1 + raw_resource := input.action.filterResources[0] + count(raw_resource["table"]["columns"]) > 0 + new_resources := [ + object.union(raw_resource, {"table": {"column": column_name}}) + | column_name := raw_resource["table"]["columns"][_] + ] + allow with input.action.resource as new_resources[i] +} +``` diff --git a/docs/src/main/sphinx/security/overview.md b/docs/src/main/sphinx/security/overview.md index 1d82479a81bb..f9d723935ddf 100644 --- a/docs/src/main/sphinx/security/overview.md +++ b/docs/src/main/sphinx/security/overview.md @@ -119,6 +119,8 @@ To implement access control, use: - {doc}`File-based system access control `, where you configure JSON files that specify fine-grained user access restrictions at the catalog, schema, or table level. +- [](opa-access-control), where you use Open Policy Agent to make access control + decisions at a fine-grained level. In addition, Trino {doc}`provides an API ` that allows you to create a custom access control method, or to extend an existing diff --git a/plugin/trino-opa/README.md b/plugin/trino-opa/README.md index 0f363f4d3ddf..166661191670 100644 --- a/plugin/trino-opa/README.md +++ b/plugin/trino-opa/README.md @@ -1,247 +1,6 @@ # trino-opa -This plugin enables Trino to use Open Policy Agent (OPA) as an authorization engine. +This plugin enables Trino to use [Open Policy Agent +(OPA)](https://www.openpolicyagent.org/) as an authorization engine. 
-For more information on OPA, please refer to the Open Policy Agent [documentation](https://www.openpolicyagent.org/). - -> While every attempt will be made to keep backwards compatibility, this plugin is a recent addition -> and as such the API may change. - -## Configuration - -You will need to configure Trino to use the OPA plugin as its access control engine, then configure the -plugin to contact your OPA endpoint. - -`config.properties` - **enabling the plugin**: - -Make sure to enable the plugin by configuring Trino to pull in the relevant config file for the OPA -authorizer, e.g.: - -```properties -access-control.config-files=/etc/trino/access-control-file-based.properties,/etc/trino/access-control-opa.properties -``` - -`access-control-opa.properties` - **configuring the plugin**: - -Set the access control name to `opa` and specify the policy URI, for example: - -```properties -access-control.name=opa -opa.policy.uri=https://your-opa-endpoint/v1/data/allow -``` - -If you also want to enable the _batch_ mode (see [Batch mode](#batch-mode)), you must additionally set up an -`opa.policy.batched-uri` configuration entry. - -> Batch mode is _not_ a replacement for the "main" URI. The batch mode is _only_ -> used for certain authorization queries where batching is applicable. 
Even when using -> `opa.policy.batched-uri`, you _must_ still provide an `opa.policy.uri` - -For instance: - -```properties -access-control.name=opa -opa.policy.uri=https://your-opa-endpoint/v1/data/allow -opa.policy.batched-uri=https://your-opa-endpoint/v1/data/batch -``` - -### All configuration entries - -| Configuration name | Required | Default | Description | -|----------------------------------------------|:--------:|:-------:|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `opa.policy.uri` | Yes | N/A | Endpoint to query OPA | -| `opa.policy.batched-uri` | No | Unset | Endpoint for batch OPA requests | -| `opa.log-requests` | No | `false` | Determines whether requests (URI, headers and entire body) are logged prior to sending them to OPA | -| `opa.log-responses` | No | `false` | Determines whether OPA responses (URI, status code, headers and entire body) are logged | -| `opa.allow-permission-management-operations` | No | `false` | Determines whether permission / role management operations will be allowed. These operations will be allowed or denied based on this setting, no request is sent to OPA | -| `opa.http-client.*` | No | Unset | Additional HTTP client configurations that get passed down. E.g. `opa.http-client.http-proxy` for configuring the HTTP proxy | - -> When request / response logging is enabled, they will be logged at DEBUG level under the `io.trino.plugin.opa.OpaHttpClient` logger, you will need to update -> your log configuration accordingly. -> -> Be aware that enabling these options will produce very large amounts of logs - -##### About permission management operations - -The following operations are controlled by the `opa.allow-permission-management-operations` setting. If this setting is `true`, these -operations will be allowed; they will otherwise be denied. 
No request is sent to OPA either way: - -- `GrantSchemaPrivilege` -- `DenySchemaPrivilege` -- `RevokeSchemaPrivilege` -- `GrantTablePrivilege` -- `DenyTablePrivilege` -- `RevokeTablePrivilege` -- `CreateRole` -- `DropRole` -- `GrantRoles` -- `RevokeRoles` - -This is due to the complexity and potential unexpected consequences of having SQL-style grants / roles together with OPA, as per [discussion](https://github.com/trinodb/trino/pull/19532#discussion_r1380776593) -on the initial PR. - -Additionally, users are always allowed to show information about roles (`SHOW ROLES`), regardless of this setting. The following operations are _always_ allowed: -- `ShowRoles` -- `ShowCurrentRoles` -- `ShowRoleGrants` - -## OPA queries - -The plugin will contact OPA for each authorization request as defined on the SPI. - -OPA must return a response containing a boolean `allow` field, which will determine whether the operation -is permitted or not. - -The plugin will pass as much context as possible within the OPA request. A simple way of checking -what data is passed in from Trino is to run OPA locally in verbose mode. - -### Query structure - -A query will contain a `context` and an `action` as its top level fields. - -#### Query context: - -While the `action` object contains information about _what_ action is being performed, the `context` object -contains all other contextual information about it. 
The `context` object contains the following fields: -- `identity`: The identity of the user performing the operation, containing the following 2 fields: - - `user` (string): username - - `groups` (array of strings): list of groups this user belongs to -- `softwareStack`: Information about the software stack running in the Trino server, more fields may be added later, currently: - - `trinoVersion` (string): Trino version - -#### Query action: - -This determines _what_ action is being performed and upon what resources, the top level fields are as follows: - -- `operation` (string): operation being performed -- `resource` (object, nullable): information about the object being operated upon -- `targetResource` (object, nullable): information about the _new object_ being created, if applicable -- `grantee` (object, nullable): grantee of a grant operation. - -Fields that are not applicable for a specific operation (e.g. `targetResource` if not modifying a table/schema/catalog, or `grantee` if not granting -permissions) will be set to null. Any null field will be omitted altogether from the `action` object. - -#### Examples - -Accessing a table will result in a query like the one below: - -```json -{ - "context": { - "identity": { - "user": "foo", - "groups": ["some-group"] - }, - "softwareStack": { - "trinoVersion": "434" - } - }, - "action": { - "operation": "SelectFromColumns", - "resource": { - "table": { - "catalogName": "my_catalog", - "schemaName": "my_schema", - "tableName": "my_table", - "columns": [ - "column1", - "column2", - "column3" - ] - } - } - } -} -``` - -`targetResource` is used in cases where a new resource, distinct from the one in `resource` is being created. For instance, -when renaming a table. 
- -```json -{ - "context": { - "identity": { - "user": "foo", - "groups": ["some-group"] - }, - "softwareStack": { - "trinoVersion": "434" - } - }, - "action": { - "operation": "RenameTable", - "resource": { - "table": { - "catalogName": "my_catalog", - "schemaName": "my_schema", - "tableName": "my_table" - } - }, - "targetResource": { - "table": { - "catalogName": "my_catalog", - "schemaName": "my_schema", - "tableName": "new_table_name" - } - } - } -} -``` - - -## Batch mode - -A very powerful feature provided by OPA is its ability to respond to authorization queries with -more complex answers than a `true`/`false` boolean value. - -Many features in Trino require _filtering_ to be performed to determine, given a list of resources, -(e.g. tables, queries, views, etc...) which of those a user should be entitled to see/interact with. - -If `opa.policy.batched-uri` is _not_ configured, the plugin will send one request to OPA _per item_ being -filtered, then use the responses from OPA to construct a filtered list containing only those items for which -a `true` response was returned. - -Configuring `opa.policy.batched-uri` will allow the plugin to send a request to that _batch_ endpoint instead, -with a **list** of the resources being filtered under `action.filterResources` (as opposed to `action.resource`). - -> The other fields in the request are identical to the non-batch endpoint. - -An OPA policy supporting batch operations should return a (potentially empty) list containing the _indices_ -of the items for which authorization is granted (if any). Returning a `null` value instead of a list -is equivalent to returning an empty list. - -> We may want to reconsider the choice of using _indices_ in the response as opposed to returning a list -> containing copies of elements from the `filterResources` field in the request for which access should -> be granted. 
Indices were chosen over copying elements as it made validation in the plugin easier, -> and from the few examples we tried, it also made certain policies a bit simpler. Any feedback is appreciated! - -An interesting side effect of this is that we can add batching support for policies that didn't originally -have it quite easily. Consider the following rego: - -```rego -package foo - -# ... rest of the policy ... -# this assumes the non-batch response field is called "allow" -batch contains i { - some i - raw_resource := input.action.filterResources[i] - allow with input.action.resource as raw_resource -} - -# Corner case: filtering columns is done with a single table item, and many columns inside -# We cannot use our normal logic in other parts of the policy as they are based on sets -# and we need to retain order -batch contains i { - some i - input.action.operation == "FilterColumns" - count(input.action.filterResources) == 1 - raw_resource := input.action.filterResources[0] - count(raw_resource["table"]["columns"]) > 0 - new_resources := [ - object.union(raw_resource, {"table": {"column": column_name}}) - | column_name := raw_resource["table"]["columns"][_] - ] - allow with input.action.resource as new_resources[i] -} -``` +Find more information in the documentation.