Ansible role for automating the deployment of metadata using IBM Metadata Asset Manager brokers and bridges.
New to Ansible? This simple introduction might help.
- Ansible v2.4.x
- 'dsadm'-become-able network access to an IBM Information Server environment
- Inventory group names setup the same as
IBM.infosvr
role
See defaults/main.yml
for inline documentation, and the example below for the main variables needed.
Each broker and bridge has its own unique set of parameters. These are documented in more detail in each of the YAML files under vars/
, and the bare minimum requirements for all brokers and bridges are documented in vars/simple_examples.yml
. Over time the intention is to get all brokers and bridges working, but the list will grow gradually -- expect that only those included in vars/
for a given version of this role are working.
The role is primarily inteded to be imported into other playbooks as-needed for the deployment of metadata -- through any of the supported brokers and bridges in the Information Server environment. (Thus the need for Ansible v2.4.x and the import_role
module.)
The following example will:
- Add the DB2 drivers included with Information Server to the JDBC configuration.
- Create a data connection called
AutoJDBC
(if it does not already exist) to connect to a DB2 database calledMYDB
atmyhost.somewhere.com
. (requires Information Server v11.7+) - Create an import area, and run the import, for metadata of any files found in
/data/loadable
on the file system of the Information Server engine tier (and record the hostname on which they are found as 'IS-SERVER.IBM.COM'). - For the
DB2INST1
schema ofMYDB
on theAutoJDBC
connection, automatically import any metadata (ie. for tables, columns), then run column analysis on these, automatically-detect term assignments, run data quality analysis, and once completed publish any results to Information Governance Catalog. (requires Information Server v11.7+)
Note tha the order in which the variables are defined does not matter -- the role automatically takes care of running them in the appropriate order such that dependent objects are run first (eg. JDBC configuration is completed before attempting to do any data connectivity via data connections or import areas).
- import_role: name=IBM.infosvr-metadata-asset-manager
vars:
import_areas:
- name: Simple_LocalFileConnector_ImportArea
type: LocalFileConnector
description: "A simple sample (setting only required fields) for a LocalFileConnector import area"
metadata_interchange_server: myhost.domain.com
dcn:
name: LOCALFS
assets_to_import:
- "folder[/data/loadable]"
hostname: "IS-SERVER.IBM.COM"
jdbc_entries:
classpaths:
- /opt/IBM/InformationServer/ASBNode/lib/java/db2jcc.jar
classnames:
- com.ibm.db2.jcc.DB2Driver
data_connections:
- name: AutoJDBC
type: JDBCConnector
description: Data connection for automatically discovering against a JDBC source
url: jdbc:db2://myhost.somewhere.com:50000/MYDB
username: db2inst1
password: "{{ a_password_from_eg_vault }}"
discover_sources:
- dcn: AutoJDBC
project: UGDefaultWorkspace
target_host: myhost.somewhere.com
steps:
- import
- columnAnalysis
- termAssignment
- dataQualityAnalysis
- publish
parameters:
rootAssets: schema[MYDB|DB2INST1]
Asset_description_already_exists: Replace_existing_description
Use this variable to provide a list (array) of complex structures, each of which defines an import area for Metadata Asset Manager. If the import area does not yet exist, it will be created and then loaded; if an import area by the same name already exists its metadata will be re-imported (the import area will not be replaced).
Example structures, fully documented, can be found under vars/documented_*.yml
. Simple structures can be found under vars/simple_examples.yml
.
Available only for v11.7+, this variable can be used to define just data connections rather than a complete import area. This is useful if, for example, you want to make use of the automated Discovery capability available from v11.7+ onwards (ie. leveraging the Open Discovery Framework's ability to pipeline the harvesting of metadata, followed by automated column analysis, etc).
Use this variable to define any ODBC entries that should be added to the {DSHOME}/.odbc.ini
file. This is necessary in order to ensure appropriate connectivity via ODBC, eg. by the ODBC connections in DataStage and IMAM.
Generally the required keys within each of these objects are:
name
: the (unique) name of the ODBC entrydescription
: a description of the ODBC entry (should not use the character=
anywhere)type
: the type of ODBC entry, one ofdb2
,dbase
,informix
,oracle
,oraclewire
,sqlserver
,sqlservernative
,sybase
,sybaseiq
,salesforce
,text
,teradata
,openedge
,mysql
,postgres
,greenplum
,hive
,impala
database
: the name of the database (for RDBMS entries)host
: the hostname or IP address of the system hosting the data sourceport
: the port number (for RDBMS entries) -- generally this will also be defaulted to the default port for the particular database type
Since each ODBC driver for different platforms supports a variety of platform-specific options, these can also be (optionally) specified. See the templates in templates/odbc/*.j2
for the options that can be additionally provided; any that are not mandatory (listed above) will be automatically set to their default values in the ODBC configuration if you do not specify other values for them.
Finally, if you are aware of additional properties that you want to add to a particular entry, which have no default values (ie. are not already listed in the template mentioned above), add them to an extras
entry as key-value pairs.
Examples:
odbc_entries:
- name: IADB on DB2
description: Connection to IADB on DB2
type: db2
database: IADB
host: infosvr.vagrant.ibm.com
- name: Test database on Oracle
description: Connection to some test data set on Oracle
type: oracle
database: TESTDB
host: infosvr.vagrant.ibm.com
SID: TESTDB
extras:
- QueryTimeout: -1
- ColumnSizeAsCharacter: 1
- ColumnsAsChar: 1
Use this variable to define any JDBC classes that should be included in the {DSHOME}/isjdbc.config
file. This is necessary in order to ensure appropriate connectivity via JDBC, eg. by the JDBC connections in DataStage and IMAM.
There are two sub-keys required:
classpaths
: defines the locations of any Java classes that should be added to the CLASSPATH, ie. which provide JDBC driversclassnames
: defines the names of the Java classes that provide the JDBC drivers within the classpaths above
The role will ensure that any classpaths or classnames not already included in the configuration file are added to it, and will leave any that are already present.
Examples:
jdbc_entries:
classpaths:
- "{{ ibm_infosvr_metadata_asset_mgr_install_location }}/ASBNode/lib/java/db2jcc.jar"
classnames:
- com.ibm.db2.jcc.DB2Driver
Use this variable to define any DDL files (containing CREATE TABLE statements) that should be used to generate OSH schema files for data files that would capture the same content as the database tables.
All parameters are required, except for the tables
parameter -- if not specified, an OSH schema will be generated for every CREATE TABLE statement found in the specified ddl
.
This will create the following:
- one (blank) data file under the specified
dest
directory for each CREATE TABLE statement in the specifiedddl
(limited by the tables defined in the optionaltables
parameter), using the specifiedfileext
as the file extension - one OSH schema file under the same specified
dest
directory for each of the blank data files created, appending.osh
as an additional file extension
Examples:
osh_schemas:
- ddl: /some/location/MYDB.sql
structure: "file_format: 'delimited', header: 'false'"
recordfmt: "delim='|', final_delim=end, null_field='', charset=UTF-8"
dest: /some/target/directory
fileext: csv
tables:
- TABLE1
- TABLE2
The example above will create:
/some/target/directory/TABLE1.csv
/some/target/directory/TABLE1.csv.osh
/some/target/directory/TABLE2.csv
/some/target/directory/TABLE2.csv.osh
Use this variable to define any data sources that should be automatically discovered using the Open Discovery Framework in v11.7+
The following sub-keys are required for each entry:
dcn
: specifies the name of the data connection to use for discoveryproject
: specifies the name of the Information Analyzer project in which to do the discoverytarget_host
: specifies the hostname that should be used for the source that is being discovered
The remainder of the sub-keys are optional:
parameters
: provides a set of sub-keys defining additional behaviour or restrictions to apply to the discovery, eg:rootAssets
: defines the subset of assets for which to run discovery (without which all assets on the target will be discovered -- eg. all files in all directories, all tables across all schemas)Asset_description_already_exists
: defines how to handle the situation where a particular technical asset with the same identity already exists- other parameters are also possible depending on the type of source -- see the Information Server Knowledge Centre for more details
steps
: indicates which steps of the discovery process should be applied, and can include:import
: discovers and ingests the technical metadata for the source to the Information Governance CatalogcolumnAnalysis
: runs column analysis agains the source, determining formats, frequency distributions of values, and detecting data classes (dependent on theimport
being done)termAssignment
: attempts to automatically detect business terms to assign to the technical metadata based on their detected data classes, column names, and previous relationships from which the system has learned (if configured) (dependent on thecolumnAnalysis
being done), including applying any Automation Rules that automatically assign data rules based on the assigned termsdataQualityAnalysis
: runs through a set of standard quality checks, eg. looking for outlier values, and any data rules that have been automatically assignedpublish
: publishes the results of all of the other steps to the Information Governance Catalog
Note that up until the last publish
step, all work has been kept in an unpublished state in the Information Analyzer project / workspace. Once published, all of the various analyses are visible by anyone in the Information Governance Catalog. You may therefore wish to leave off this last publish
step in order to force a review (and manual decision to publish) within Information Analyzer. (By default, if the steps are not specified, the discovery will include all steps except the publish
step.)
Also be aware that the schema[DBNAME|SCHEMA]
format of the rootAssets
, when applied to JDBC connections, may require special values for the DBNAME
portion (eg. for DB2, since the database is inherent in the URL provided as part of the data connection, you always provide db2
as the DBNAME
here, rather than the actual database name).
Examples:
discover_sources:
- dcn: AutoJDBC
project: UGDefaultWorkspace
target_host: myhost.somewhere.com
steps:
- import
- columnAnalysis
- termAssignment
- dataQualityAnalysis
- publish
parameters:
rootAssets: schema[MYDB|DB2INST1]
Asset_description_already_exists: Replace_existing_description
Apache 2.0
Christopher Grote