check_column operator
When to use the check_column operator
The check_column operator allows you to add checks on the columns of tables and dataframes. It is a wrapper around Airflow’s SQLColumnCheckOperator that integrates seamlessly with SDK-supported datasets such as Astro tables, and it extends that operator to support Pandas DataFrames.
Supported Checks
Supported checks are also explained here.
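As a rough illustration, the pandas sketch below shows the value each check is assumed to measure for a single column. The check names (null_check, distinct_check, unique_check, min, max) come from Airflow's SQLColumnCheckOperator, and the metrics mirror the SQL that operator runs. The `CHECK_METRICS` table and `compute_check` helper are hypothetical and are not part of the Astro SDK.

```python
import pandas as pd

# Hypothetical metric table: what each check is assumed to measure on one column.
# Modeled on the SQL used by Airflow's SQLColumnCheckOperator; not Astro SDK code.
CHECK_METRICS = {
    # SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END): number of missing values
    "null_check": lambda col: int(col.isna().sum()),
    # COUNT(DISTINCT col): number of distinct non-null values
    "distinct_check": lambda col: int(col.nunique()),
    # COUNT(col) - COUNT(DISTINCT col): non-null values that repeat an earlier value
    "unique_check": lambda col: int(col.count() - col.nunique()),
    # MIN(col) / MAX(col): nulls are ignored
    "min": lambda col: col.min(),
    "max": lambda col: col.max(),
}


def compute_check(df: pd.DataFrame, column: str, check_name: str):
    """Return the value that `check_name` would be compared against for `column`."""
    return CHECK_METRICS[check_name](df[column])


example = pd.DataFrame(
    data={
        "city": [None, "LA", "California City"],
        "emp_id": [10, 1, 35],
    }
)
print(compute_check(example, "city", "null_check"))  # 1
print(compute_check(example, "city", "distinct_check"))  # 2
print(compute_check(example, "emp_id", "unique_check"))  # 0
print(compute_check(example, "emp_id", "max"))  # 35
```

The comparison options in a column_mapping (for example equal_to or geq_to) are then applied to these computed values.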
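import pandas as pd
from astro import sql as aql

# Sample data: "name" has no nulls, "city" has one null, "age" has two.
df = pd.DataFrame(
    data={
        "name": ["Dwight Schrute", "Michael Scott", "Jim Halpert"],
        "age": [30, None, None],
        "city": [None, "LA", "California City"],
        "emp_id": [10, 1, 35],
    }
)

# Each key of column_mapping is a column name; each value maps a check name
# to the comparison options that the check's result must satisfy.
aql.check_column(
    dataset=df,
    column_mapping={
        "name": {"null_check": {"geq_to": 0, "leq_to": 1}},  # 0 or 1 nulls allowed
        "city": {
            "null_check": {
                "equal_to": 1,  # exactly one null
            },
        },
        "age": {
            "null_check": {
                "equal_to": 1,
                # Tolerance is a fraction of equal_to (1 = 100%), so 0 to 2 nulls pass.
                "tolerance": 1,
            },
        },
    },
)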
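To make the comparison options in the example above concrete, the sketch below evaluates each null_check result against its options. It assumes the semantics of Airflow's SQLColumnCheckOperator as we understand them: every option given must hold, and tolerance scales the expected value (tolerance=1 means plus or minus 100%). The `passes` function is hypothetical and is not the Astro SDK's implementation.

```python
import pandas as pd


def passes(result, condition: dict) -> bool:
    """Hypothetical sketch: check one metric against its comparison options.

    Assumes tolerance is a fraction of the target value, so equal_to=1 with
    tolerance=1 accepts results from 1 * (1 - 1) = 0 to 1 * (1 + 1) = 2.
    """
    tolerance = condition.get("tolerance", 0)
    ok = True
    if "equal_to" in condition:
        target = condition["equal_to"]
        ok = ok and target * (1 - tolerance) <= result <= target * (1 + tolerance)
    if "geq_to" in condition:
        ok = ok and result >= condition["geq_to"] * (1 - tolerance)
    if "greater_than" in condition:
        ok = ok and result > condition["greater_than"] * (1 - tolerance)
    if "leq_to" in condition:
        ok = ok and result <= condition["leq_to"] * (1 + tolerance)
    if "less_than" in condition:
        ok = ok and result < condition["less_than"] * (1 + tolerance)
    return ok


# Same data and null_check options as the check_column example.
df = pd.DataFrame(
    data={
        "name": ["Dwight Schrute", "Michael Scott", "Jim Halpert"],
        "age": [30, None, None],
        "city": [None, "LA", "California City"],
        "emp_id": [10, 1, 35],
    }
)
null_checks = {
    "name": {"geq_to": 0, "leq_to": 1},
    "city": {"equal_to": 1},
    "age": {"equal_to": 1, "tolerance": 1},
}
for column, condition in null_checks.items():
    null_count = int(df[column].isna().sum())
    print(column, null_count, passes(null_count, condition))
# name 0 True
# city 1 True
# age 2 True
```

Under these assumptions all three checks in the example pass; removing the tolerance from "age" would make its check fail, because two nulls is not exactly one.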