check_column operator

When to use the check_column operator

The check_column operator lets you add checks on the columns of tables and dataframes. It wraps Airflow’s SQLColumnCheckOperator so that it integrates with SDK-supported datasets such as Astro tables, and extends it to support pandas DataFrames.

Supported Checks

Supported checks are also explained here.

    import pandas as pd
    from astro import sql as aql

    df = pd.DataFrame(
        data={
            "name": ["Dwight Schrute", "Michael Scott", "Jim Halpert"],
            "age": [30, None, None],
            "city": [None, "LA", "California City"],
            "emp_id": [10, 1, 35],
        }
    )
    aql.check_column(
        dataset=df,
        column_mapping={
            "name": {"null_check": {"geq_to": 0, "leq_to": 1}},
            "city": {
                "null_check": {
                    "equal_to": 1,
                },
            },
            "age": {
                "null_check": {
                    "equal_to": 1,
                    "tolerance": 1,  # Tolerance is + and - the value provided. Acceptable values is 0 to 2.
                },
            },
        },
    )
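
To make the semantics of the mapping above easier to follow, here is a minimal pure-Python sketch of how a null_check with bounds and a tolerance could be evaluated. The helpers `count_nulls` and `passes` are hypothetical names used only for illustration; they are not part of the Astro SDK or Airflow. The sketch also assumes that tolerance widens each bound as a fraction of the value, which for `equal_to: 1` and `tolerance: 1` yields the 0 to 2 range mentioned in the comment above.

```python
# Hypothetical helpers for illustration only; not the Astro SDK implementation.


def count_nulls(values):
    """Return the number of None entries in a column."""
    return sum(1 for v in values if v is None)


def passes(observed, check):
    """Return True if `observed` satisfies every bound in `check`.

    Assumption: tolerance is a fraction that widens each bound,
    so equal_to=1 with tolerance=1 accepts values from 0 through 2.
    """
    tol = check.get("tolerance", 0)
    ok = True
    if "geq_to" in check:
        ok = ok and observed >= check["geq_to"] * (1 - tol)
    if "leq_to" in check:
        ok = ok and observed <= check["leq_to"] * (1 + tol)
    if "equal_to" in check:
        target = check["equal_to"]
        ok = ok and target * (1 - tol) <= observed <= target * (1 + tol)
    return ok


# Same columns and checks as the check_column example.
columns = {
    "name": ["Dwight Schrute", "Michael Scott", "Jim Halpert"],
    "age": [30, None, None],
    "city": [None, "LA", "California City"],
}
column_mapping = {
    "name": {"null_check": {"geq_to": 0, "leq_to": 1}},
    "city": {"null_check": {"equal_to": 1}},
    "age": {"null_check": {"equal_to": 1, "tolerance": 1}},
}

# Evaluate each column's null count against its null_check bounds.
results = {
    col: passes(count_nulls(columns[col]), checks["null_check"])
    for col, checks in column_mapping.items()
}
print(results)
```

Under these assumptions all three checks pass: name has no nulls, city has exactly one, and age has two, which falls inside the widened range of 0 to 2.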