ColumnCheckOperator

Module Contents

Classes

ColumnCheckOperator

Performs one or more of the templated checks in the column_mapping dictionary.

Functions

get_checks_string(check, col)

column_check(dataset, column_mapping[, ...])

Performs one or more of the templated checks in the column_mapping dictionary.

class ColumnCheckOperator.ColumnCheckOperator(dataset, column_mapping, partition_clause=None, task_id=None, **kwargs)

Bases: airflow.providers.common.sql.operators.sql.SQLColumnCheckOperator

Performs one or more of the templated checks in the column_mapping dictionary. Checks are performed on a per-column basis, as specified by column_mapping. Each check can take one or more of the following options:
  • equal_to: an exact value the result must equal; cannot be combined with other comparison options

  • greater_than: value that the result must be strictly greater than

  • less_than: value that the result must be strictly less than

  • geq_to: value that the result must be greater than or equal to

  • leq_to: value that the result must be less than or equal to

  • tolerance: the percentage by which the result may deviate from the expected value
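The comparison options above can be illustrated with a small standalone sketch. It is not the operator's implementation: the function name check_passes is hypothetical, and it assumes that tolerance is a fraction (0.2 meaning 20%) that relaxes each bound in proportion to the expected value.

```python
from typing import Any, Dict


def check_passes(result: float, options: Dict[str, Any]) -> bool:
    """Return True if `result` satisfies every comparison option supplied.

    Assumption (not confirmed by the reference text): `tolerance` is a
    fraction that widens each bound by abs(bound) * tolerance.
    """
    tol = options.get("tolerance") or 0.0

    def slack(bound: float) -> float:
        # Amount by which a bound is relaxed under the assumed semantics.
        return abs(bound) * tol

    if "equal_to" in options:
        expected = options["equal_to"]
        return expected - slack(expected) <= result <= expected + slack(expected)

    ok = True
    if "greater_than" in options:
        bound = options["greater_than"]
        ok = ok and result > bound - slack(bound)
    if "geq_to" in options:
        bound = options["geq_to"]
        ok = ok and result >= bound - slack(bound)
    if "less_than" in options:
        bound = options["less_than"]
        ok = ok and result < bound + slack(bound)
    if "leq_to" in options:
        bound = options["leq_to"]
        ok = ok and result <= bound + slack(bound)
    return ok


# A minimum of 4.5 passes "greater_than 5" only because of the 20% tolerance.
min_options = {"greater_than": 5, "leq_to": 10, "tolerance": 0.2}
print(check_passes(4.5, min_options))
print(check_passes(3.9, min_options))
```

Under these assumptions the first call passes and the second fails, since 3.9 falls below the relaxed lower bound.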

Parameters
  • dataset (Union[astro.table.BaseTable, pandas.DataFrame]) – the table or dataframe to run checks on

  • column_mapping (Dict[str, Dict[str, Any]]) – the dictionary of columns and their associated checks, e.g.

{
    "col_name": {
        "null_check": {
            "equal_to": 0,
        },
        "min": {
            "greater_than": 5,
            "leq_to": 10,
            "tolerance": 0.2,
        },
        "max": {"less_than": 1000, "geq_to": 10, "tolerance": 0.01},
    }
}

  • partition_clause (Optional[str]) –

  • task_id (Optional[str]) –

get_db_hook()

Get the database hook for the connection.

Returns

the database hook object.

Return type

airflow.providers.common.sql.hooks.sql.DbApiHook

execute(context)

This is the main method to override when creating an operator. The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for details.

Parameters

context (astro.utils.typing_compat.Context) –

get_check_result(check_name, column_name)

Get the result of the named check for the given column after the dataframe has been validated.

Parameters
  • check_name (str) –

  • column_name (str) –
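As a rough illustration of looking up a result by check name and column name, the sketch below computes the three checks that appear in the example mapping (null_check, min, max) over plain Python lists. The names CHECKS and compute_check are hypothetical, and the real operator works on a dataframe or table rather than on lists.

```python
from typing import Any, Callable, Dict, List, Optional

Column = List[Optional[Any]]

# Hypothetical registry: check name -> function reducing a column to one value.
CHECKS: Dict[str, Callable[[Column], Any]] = {
    "null_check": lambda values: sum(v is None for v in values),
    "min": lambda values: min(v for v in values if v is not None),
    "max": lambda values: max(v for v in values if v is not None),
}


def compute_check(data: Dict[str, Column], check_name: str, column_name: str) -> Any:
    """Return the value of `check_name` computed over `data[column_name]`."""
    if check_name not in CHECKS:
        raise ValueError(f"Unsupported check: {check_name!r}")
    return CHECKS[check_name](data[column_name])


data = {"col_name": [7, 9, None, 6]}
print(compute_check(data, "null_check", "col_name"))  # 1
print(compute_check(data, "min", "col_name"))         # 6
print(compute_check(data, "max", "col_name"))         # 9
```

Each value returned this way is what the comparison options (equal_to, greater_than, and so on) would then be applied to.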

process_checks()

Process all the checks and print the results, or raise an exception if any check fails.
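The report-then-raise behaviour described here might look like the following sketch. The function name process_results, the shape of the results dictionary, and the use of ValueError are all assumptions made for illustration; the operator may store results differently and raise a different exception type.

```python
from typing import Dict


def process_results(results: Dict[str, Dict[str, bool]]) -> None:
    """Print every check outcome, then raise if any check failed.

    `results` maps column name -> check name -> pass/fail flag
    (a hypothetical structure used only for this sketch).
    """
    failed = []
    for column, checks in results.items():
        for check_name, passed in checks.items():
            status = "passed" if passed else "FAILED"
            print(f"{column}.{check_name}: {status}")
            if not passed:
                failed.append(f"{column}.{check_name}")
    if failed:
        # Report all failures at once instead of stopping at the first one.
        raise ValueError("Failed checks: " + ", ".join(failed))


process_results({"col_name": {"null_check": True, "min": True}})
```

Collecting every failure before raising lets one task run surface all failing checks rather than only the first.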

ColumnCheckOperator.get_checks_string(check, col)

ColumnCheckOperator.column_check(dataset, column_mapping, partition_clause=None, task_id=None, **kwargs)

Performs one or more of the templated checks in the column_mapping dictionary. Checks are performed on a per-column basis, as specified by column_mapping. Each check can take one or more of the following options:
  • equal_to: an exact value the result must equal; cannot be combined with other comparison options

  • greater_than: value that the result must be strictly greater than

  • less_than: value that the result must be strictly less than

  • geq_to: value that the result must be greater than or equal to

  • leq_to: value that the result must be less than or equal to

  • tolerance: the percentage by which the result may deviate from the expected value

Parameters
  • dataset (Union[astro.table.BaseTable, pandas.DataFrame]) – dataframe or BaseTable that has to be validated

  • column_mapping (Dict[str, Dict[str, Any]]) – the dictionary of columns and their associated checks, e.g.

{
    "col_name": {
        "null_check": {
            "equal_to": 0,
        },
        "min": {
            "greater_than": 5,
            "leq_to": 10,
            "tolerance": 0.2,
        },
        "max": {"less_than": 1000, "geq_to": 10, "tolerance": 0.01},
    }
}

  • partition_clause (Optional[str]) –

  • task_id (Optional[str]) –

Return type

ColumnCheckOperator
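Because equal_to cannot be combined with the other comparison options, it can be useful to sanity-check a column_mapping before handing it to column_check. The helper below is a hypothetical sketch (validate_column_mapping is not part of this module) that enforces only that one rule from the description.

```python
from typing import Any, Dict, List

# Comparison options named in the description; tolerance is not a comparison.
COMPARISONS = {"equal_to", "greater_than", "less_than", "geq_to", "leq_to"}


def validate_column_mapping(column_mapping: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for column, checks in column_mapping.items():
        for check_name, options in checks.items():
            used = set(options) & COMPARISONS
            if "equal_to" in used and len(used) > 1:
                others = ", ".join(sorted(used - {"equal_to"}))
                problems.append(
                    f"{column}.{check_name}: equal_to cannot be combined with {others}"
                )
    return problems


mapping = {
    "col_name": {
        "null_check": {"equal_to": 0},
        "min": {"greater_than": 5, "leq_to": 10, "tolerance": 0.2},
        "max": {"less_than": 1000, "geq_to": 10, "tolerance": 0.01},
    }
}
print(validate_column_mapping(mapping))  # []
```

The example mapping passes because its only equal_to entry stands alone; adding, say, less_than next to it would produce one reported problem.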