check_column
Module Contents
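Classes
ColumnCheckOperator | Performs one or more of the templated checks in the column_checks dictionary.
Functions
get_checks_string(check, col)
check_column(dataset, column_mapping, partition_clause=None, task_id=None, **kwargs) | Performs one or more of the templated checks in the column_checks dictionary.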
- class check_column.ColumnCheckOperator(dataset, column_mapping, partition_clause=None, task_id=None, **kwargs)
Bases:
airflow.providers.common.sql.operators.sql.SQLColumnCheckOperator
Performs one or more of the templated checks in the column_checks dictionary. Checks are performed on a per-column basis, as specified by column_mapping. Each check can take one or more of the following options:
  - equal_to: an exact value the result must equal; cannot be used with other comparison options
  - greater_than: value that the result should be strictly greater than
  - less_than: value that the result should be strictly less than
  - geq_to: value that the result should be greater than or equal to
  - leq_to: value that the result should be less than or equal to
  - tolerance: the percentage that the result may be off from the expected value
- Parameters:
dataset (Union[astro.table.BaseTable, pandas.DataFrame]) – the table or dataframe to run checks on
column_mapping (Dict[str, Dict[str, Any]]) – the dictionary of columns and their associated checks, e.g.
    {
        "col_name": {
            "null_check": {"equal_to": 0},
            "min": {"greater_than": 5, "leq_to": 10, "tolerance": 0.2},
            "max": {"less_than": 1000, "geq_to": 10, "tolerance": 0.01},
        }
    }
partition_clause (Optional[str]) –
task_id (Optional[str]) –
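To make the comparison options concrete, the following is a minimal sketch of how a single check's options might be evaluated against a computed result. It is not the operator's actual implementation: `passes_check` is a hypothetical helper, the tolerance is assumed to be a fraction (0.2 meaning 20%), and expected values are assumed to be non-negative so that the tolerance always widens the accepted range.

```python
from typing import Any, Dict


def passes_check(result: float, options: Dict[str, Any]) -> bool:
    """Hypothetical helper: evaluate one check's options against a result."""
    # Assumption: tolerance is a fraction; a missing tolerance means exact bounds.
    tol = options.get("tolerance") or 0.0
    lower = 1 - tol  # factor applied to lower bounds (loosens them downward)
    upper = 1 + tol  # factor applied to upper bounds (loosens them upward)

    # equal_to stands alone, so it is handled first and returns immediately.
    if "equal_to" in options:
        expected = options["equal_to"]
        return expected * lower <= result <= expected * upper

    ok = True
    if "greater_than" in options:
        ok = ok and result > options["greater_than"] * lower
    if "geq_to" in options:
        ok = ok and result >= options["geq_to"] * lower
    if "less_than" in options:
        ok = ok and result < options["less_than"] * upper
    if "leq_to" in options:
        ok = ok and result <= options["leq_to"] * upper
    return ok


min_options = {"greater_than": 5, "leq_to": 10, "tolerance": 0.2}
print(passes_check(4.5, min_options))   # inside the loosened lower bound
print(passes_check(12.5, min_options))  # beyond the loosened upper bound
```

Under these assumptions, a minimum of 4.5 is accepted because the lower bound of 5 is relaxed to 4.0, while 12.5 is rejected because the relaxed upper bound is 12.0.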
- get_db_hook()
Get the database hook for the connection.
- Returns:
the database hook object.
- Return type:
Any
- execute(context)
Override this method when deriving an operator.
The context is the same dictionary used when rendering Jinja templates.
Refer to get_template_context for more details.
- Parameters:
context (astro.utils.compat.typing.Context) –
- get_check_result(check_name, column_name)
Get the result of the check method after validating the dataframe.
- Parameters:
check_name (str) –
column_name (str) –
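As an illustration of looking up a check by name and applying it to one column, here is a small sketch in plain Python. The `CHECKS` table, the `compute_check` function, and the list-based column representation are all hypothetical; the operator itself works on a dataframe or table, and only the check names `null_check`, `min`, and `max` are taken from the example mapping above.

```python
from typing import Callable, Dict, List, Optional

Column = List[Optional[float]]

# Hypothetical registry mapping a check name to the statistic it computes.
CHECKS: Dict[str, Callable[[Column], float]] = {
    "null_check": lambda values: sum(v is None for v in values),
    "min": lambda values: min(v for v in values if v is not None),
    "max": lambda values: max(v for v in values if v is not None),
}


def compute_check(data: Dict[str, Column], check_name: str, column_name: str) -> float:
    """Return the statistic for one check on one column (illustrative only)."""
    if check_name not in CHECKS:
        raise ValueError(f"Unknown check: {check_name}")
    return CHECKS[check_name](data[column_name])


data = {"price": [7, None, 9, 8]}
for name in ("null_check", "min", "max"):
    print(name, compute_check(data, name, "price"))
```

The value returned for each check is what the comparison options (equal_to, greater_than, and so on) would then be applied to.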
- process_checks()
Process all the checks and print the results, or raise an exception if any check failed.
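The report-then-raise behaviour described here can be sketched as follows. This is an assumption-laden illustration rather than the operator's code: `ChecksFailedError`, `report_checks`, and the nested `outcomes` dictionary (column name, then check name, then a pass/fail flag) are hypothetical names chosen for the example.

```python
from typing import Dict, List


class ChecksFailedError(Exception):
    """Hypothetical exception raised when at least one check fails."""


def report_checks(outcomes: Dict[str, Dict[str, bool]]) -> List[str]:
    """Print one line per check, then raise if any check failed."""
    lines: List[str] = []
    failed: List[str] = []
    for column, checks in outcomes.items():
        for check_name, success in checks.items():
            status = "passed" if success else "FAILED"
            lines.append(f"{column}.{check_name}: {status}")
            if not success:
                failed.append(f"{column}.{check_name}")
    # Print the full report first so passing checks stay visible on failure.
    print("\n".join(lines))
    if failed:
        raise ChecksFailedError("Failed checks: " + ", ".join(failed))
    return lines


report_checks({"col_name": {"null_check": True, "min": True}})
```

Printing before raising is a design choice in this sketch: it keeps the complete per-check report in the task log even when the task ultimately fails.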
- check_column.get_checks_string(check, col)
- check_column.check_column(dataset, column_mapping, partition_clause=None, task_id=None, **kwargs)
Performs one or more of the templated checks in the column_checks dictionary. Checks are performed on a per-column basis, as specified by column_mapping. Each check can take one or more of the following options:
  - equal_to: an exact value the result must equal; cannot be used with other comparison options
  - greater_than: value that the result should be strictly greater than
  - less_than: value that the result should be strictly less than
  - geq_to: value that the result should be greater than or equal to
  - leq_to: value that the result should be less than or equal to
  - tolerance: the percentage that the result may be off from the expected value
- Parameters:
dataset (Union[astro.table.BaseTable, pandas.DataFrame]) – the dataframe or BaseTable to validate
column_mapping (Dict[str, Dict[str, Any]]) – the dictionary of columns and their associated checks, e.g.
    {
        "col_name": {
            "null_check": {"equal_to": 0},
            "min": {"greater_than": 5, "leq_to": 10, "tolerance": 0.2},
            "max": {"less_than": 1000, "geq_to": 10, "tolerance": 0.01},
        }
    }
partition_clause (Optional[str]) –
task_id (Optional[str]) –
- Return type:
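Because equal_to cannot be combined with the other comparison options, it can be useful to check a column_mapping before passing it to check_column. The sketch below shows one way to do that in plain Python; `find_mapping_problems` is a hypothetical helper, not part of this module, and it only enforces the one rule stated in the description.

```python
from typing import Any, Dict, List

# The comparison options named in the description; tolerance is a modifier.
COMPARISONS = {"equal_to", "greater_than", "less_than", "geq_to", "leq_to"}


def find_mapping_problems(column_mapping: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a message for every check that mixes equal_to with other comparisons."""
    problems: List[str] = []
    for column, checks in column_mapping.items():
        for check_name, options in checks.items():
            used = sorted(COMPARISONS.intersection(options))
            if "equal_to" in used and len(used) > 1:
                problems.append(
                    f"{column}.{check_name}: equal_to combined with "
                    + ", ".join(o for o in used if o != "equal_to")
                )
    return problems


column_mapping = {
    "col_name": {
        "null_check": {"equal_to": 0},
        "min": {"greater_than": 5, "leq_to": 10, "tolerance": 0.2},
        "max": {"less_than": 1000, "geq_to": 10, "tolerance": 0.01},
    }
}
print(find_mapping_problems(column_mapping))
```

Returning a list of messages instead of raising on the first problem lets a caller report every misconfigured check at once.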