check_column

Module Contents

Classes

ColumnCheckOperator

Performs one or more of the templated checks in the column_checks dictionary.

Functions

get_checks_string(check, col)

check_column(dataset, column_mapping[, ...])

Performs one or more of the templated checks in the column_checks dictionary.

class check_column.ColumnCheckOperator(dataset, column_mapping, partition_clause=None, task_id=None, **kwargs)

Bases: airflow.providers.common.sql.operators.sql.SQLColumnCheckOperator

Performs one or more of the templated checks in the column_checks dictionary. Checks are performed on a per-column basis, as specified by column_mapping. Each check can take one or more of the following options:

  • equal_to: an exact value the result must equal; cannot be used with other comparison options
  • greater_than: value that the result should be strictly greater than
  • less_than: value that the result should be strictly less than
  • geq_to: value that the result should be greater than or equal to
  • leq_to: value that the result should be less than or equal to
  • tolerance: the percentage by which the result may deviate from the expected value, written as a decimal in the example mapping (0.2, 0.01)
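The options above can be read as a set of threshold comparisons on a single computed result. The following is a minimal sketch of that reading, not the operator's actual code. It assumes that tolerance widens each threshold multiplicatively (lower bounds scaled by 1 - tolerance, upper bounds by 1 + tolerance) and that thresholds are non-negative; the helper name passes is hypothetical.

```python
from typing import Any, Dict


def passes(result: float, options: Dict[str, Any]) -> bool:
    """Return True if `result` satisfies every comparison option present.

    Assumption (not confirmed by this page): tolerance relaxes each
    threshold multiplicatively, and thresholds are non-negative.
    """
    tolerance = options.get("tolerance")
    lower = 1 - tolerance if tolerance is not None else 1  # scales lower bounds
    upper = 1 + tolerance if tolerance is not None else 1  # scales upper bounds

    ok = True
    if "equal_to" in options:
        target = options["equal_to"]
        ok = ok and target * lower <= result <= target * upper
    if "greater_than" in options:
        ok = ok and result > options["greater_than"] * lower
    if "geq_to" in options:
        ok = ok and result >= options["geq_to"] * lower
    if "less_than" in options:
        ok = ok and result < options["less_than"] * upper
    if "leq_to" in options:
        ok = ok and result <= options["leq_to"] * upper
    return ok


if __name__ == "__main__":
    print(passes(0, {"equal_to": 0}))
    print(passes(5, {"greater_than": 5}))
    print(passes(5, {"greater_than": 5, "tolerance": 0.2}))
```

Under these assumptions, a result of 5 fails a strict greater_than of 5, but passes once a tolerance of 0.2 lowers the effective threshold.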

Parameters:
  • dataset (Union[astro.table.BaseTable, pandas.DataFrame]) – the table or dataframe to run checks on

  • column_mapping (Dict[str, Dict[str, Any]]) – the dictionary of columns and their associated checks, e.g.

    {
        "col_name": {
            "null_check": {
                "equal_to": 0,
            },
            "min": {
                "greater_than": 5,
                "leq_to": 10,
                "tolerance": 0.2,
            },
            "max": {"less_than": 1000, "geq_to": 10, "tolerance": 0.01},
        }
    }

  • partition_clause (Optional[str]) –

  • task_id (Optional[str]) –

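Because equal_to cannot be combined with the other comparison options, it can be useful to verify a column_mapping before passing it to the operator. The sketch below is a hypothetical pre-flight helper (find_mapping_errors is not part of this module) that applies only that one documented rule.

```python
from typing import Any, Dict, List

# Comparison options that, per the description above, may not accompany equal_to.
RANGE_OPTIONS = {"greater_than", "less_than", "geq_to", "leq_to"}


def find_mapping_errors(
    column_mapping: Dict[str, Dict[str, Dict[str, Any]]]
) -> List[str]:
    """Return one message per check that mixes equal_to with a range option."""
    errors: List[str] = []
    for column, checks in column_mapping.items():
        for check_name, options in checks.items():
            conflicts = sorted(RANGE_OPTIONS & set(options))
            if "equal_to" in options and conflicts:
                errors.append(
                    f"{column}.{check_name}: equal_to used with {', '.join(conflicts)}"
                )
    return errors


if __name__ == "__main__":
    mapping = {
        "col_name": {
            "null_check": {"equal_to": 0},
            "min": {"greater_than": 5, "leq_to": 10, "tolerance": 0.2},
        }
    }
    print(find_mapping_errors(mapping))
```

An empty list means the mapping does not violate the equal_to rule; the helper does not check anything else about the mapping.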
get_db_hook()

Get the database hook for the connection.

Returns:

the database hook object.

Return type:

Any

execute(context)

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

Parameters:

context (astro.utils.compat.typing.Context) –

get_check_result(check_name, column_name)

Get the result of the named check for the given column after the dataframe has been validated.

Parameters:
  • check_name (str) –

  • column_name (str) –

process_checks()

Process all the checks and print the results, or raise an exception if any check fails.
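The print-or-raise behaviour described here can be sketched as follows. This is an illustration only: the results layout, the report function, and the ColumnCheckFailed exception are all hypothetical, and the operator's real exception type is not stated on this page.

```python
from typing import Dict, List


class ColumnCheckFailed(Exception):
    """Hypothetical exception type used only in this sketch."""


def report(results: Dict[str, Dict[str, bool]]) -> None:
    """Print every check outcome, then raise if at least one check failed.

    `results` maps column name -> check name -> pass/fail flag
    (an assumed layout, chosen for illustration).
    """
    failed: List[str] = []
    for column, checks in results.items():
        for check_name, success in checks.items():
            print(f"{column}.{check_name}: {'passed' if success else 'FAILED'}")
            if not success:
                failed.append(f"{column}.{check_name}")
    if failed:
        raise ColumnCheckFailed("Failed checks: " + ", ".join(failed))


if __name__ == "__main__":
    report({"col_name": {"null_check": True, "min": True}})
```

Collecting all failures before raising means a single run reports every failing check rather than stopping at the first one.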

check_column.get_checks_string(check, col)

check_column.check_column(dataset, column_mapping, partition_clause=None, task_id=None, **kwargs)

Performs one or more of the templated checks in the column_checks dictionary. Checks are performed on a per-column basis, as specified by column_mapping. Each check can take one or more of the following options:

  • equal_to: an exact value the result must equal; cannot be used with other comparison options
  • greater_than: value that the result should be strictly greater than
  • less_than: value that the result should be strictly less than
  • geq_to: value that the result should be greater than or equal to
  • leq_to: value that the result should be less than or equal to
  • tolerance: the percentage by which the result may deviate from the expected value, written as a decimal in the example mapping (0.2, 0.01)

Parameters:
  • dataset (Union[astro.table.BaseTable, pandas.DataFrame]) – dataframe or BaseTable that has to be validated

  • column_mapping (Dict[str, Dict[str, Any]]) – the dictionary of columns and their associated checks, e.g.

    {
        "col_name": {
            "null_check": {
                "equal_to": 0,
            },
            "min": {
                "greater_than": 5,
                "leq_to": 10,
                "tolerance": 0.2,
            },
            "max": {"less_than": 1000, "geq_to": 10, "tolerance": 0.01},
        }
    }

  • partition_clause (Optional[str]) –

  • task_id (Optional[str]) –

Return type:

ColumnCheckOperator
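Each check name in the mapping (null_check, min, max) names a statistic computed over one column; the comparison options are then applied to that statistic. The sketch below shows one plausible reading for a plain list of values. It assumes that null_check counts missing values and that min and max ignore them; the STATISTICS table and column_statistics helper are hypothetical and do not reflect how the operator queries a table or dataframe.

```python
from typing import Callable, Dict, List, Optional

Values = List[Optional[float]]

# Assumed meaning of each check name; None stands in for a missing value.
STATISTICS: Dict[str, Callable[[Values], float]] = {
    "null_check": lambda values: sum(v is None for v in values),
    "min": lambda values: min(v for v in values if v is not None),
    "max": lambda values: max(v for v in values if v is not None),
}


def column_statistics(values: Values, check_names: List[str]) -> Dict[str, float]:
    """Compute the statistic behind each requested check for one column.

    Raises ValueError for min/max if the column holds no non-missing values.
    """
    return {name: STATISTICS[name](values) for name in check_names}


if __name__ == "__main__":
    print(column_statistics([6, None, 9, 12], ["null_check", "min", "max"]))
```

With a column like this, a null_check of equal_to 0 would fail because one value is missing, independently of the min and max checks.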