check_column operator
When to use the check_column operator
The check_column operator allows you to add checks on columns of tables and dataframes. This operator is a wrapper around Airflow’s SQLColumnCheckOperator that integrates seamlessly with SDK-supported datasets such as Astro tables, and it extends that operator to support Pandas dataframes.
Supported Checks
Supported checks are also explained here.
import pandas as pd
from astro import sql as aql

# Sample dataframe with nulls in the "age" and "city" columns.
df = pd.DataFrame(
    data={
        "name": ["Dwight Schrute", "Michael Scott", "Jim Halpert"],
        "age": [30, None, None],
        "city": [None, "LA", "California City"],
        "emp_id": [10, 1, 35],
    }
)
aql.check_column(
    dataset=df,
    column_mapping={
        "name": {"null_check": {"geq_to": 0, "leq_to": 1}},
        "city": {
            "null_check": {
                "equal_to": 1,
            },
        },
        "age": {
            "null_check": {
                "equal_to": 1,
                "tolerance": 1,  # Tolerance is applied above and below the target value; the accepted range here is 0 to 2.
            },
        },
    },
)
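To see why each null_check in the example above is expected to pass, the null counts can be reproduced with plain Pandas. This is a minimal sketch and not the operator's implementation: `null_count_in_range` is a hypothetical helper written for illustration, and it assumes that `tolerance` is applied as a fraction of the `equal_to` target, which for `equal_to=1` and `tolerance=1` gives the 0 to 2 range noted in the example.

```python
import pandas as pd


def null_count_in_range(series, equal_to=None, geq_to=None, leq_to=None, tolerance=None):
    """Hypothetical helper: count nulls in a column and compare the count to bounds."""
    nulls = int(series.isna().sum())
    passed = True
    if geq_to is not None:
        passed = passed and nulls >= geq_to
    if leq_to is not None:
        passed = passed and nulls <= leq_to
    if equal_to is not None:
        if tolerance is None:
            passed = passed and nulls == equal_to
        else:
            # Assumption: tolerance widens the target by a fraction of its value.
            # This sketch applies it to equal_to only.
            low = equal_to * (1 - tolerance)
            high = equal_to * (1 + tolerance)
            passed = passed and low <= nulls <= high
    return nulls, passed


df = pd.DataFrame(
    data={
        "name": ["Dwight Schrute", "Michael Scott", "Jim Halpert"],
        "age": [30, None, None],
        "city": [None, "LA", "California City"],
        "emp_id": [10, 1, 35],
    }
)

# Same bounds as the column_mapping passed to check_column in the example.
results = {
    "name": null_count_in_range(df["name"], geq_to=0, leq_to=1),
    "city": null_count_in_range(df["city"], equal_to=1),
    "age": null_count_in_range(df["age"], equal_to=1, tolerance=1),
}

for column, (nulls, passed) in results.items():
    print(f"{column}: {nulls} null(s), within bounds: {passed}")
```

In this data, "name" has no nulls, "city" has one, and "age" has two, so "age" falls within bounds only because of the tolerance.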