Changelog
1.1.0
Features
Add native autodetect schema feature #780
Allow users to disable auto addition of inlets/outlets via airflow.cfg #858
Support for Datasets introduced in Airflow 2.4 #786, #808, #862, #871
`inlets` and `outlets` will be automatically set for all the operators. Users can now schedule DAGs on `File` and `Table` objects. Example:

```python
input_file = File(path="https://raw.githubusercontent.com/astronomer/astro-sdk/main/tests/data/imdb_v2.csv")
imdb_movies_table = Table(name="imdb_movies", conn_id="sqlite_default")
top_animations_table = Table(name="top_animation", conn_id="sqlite_default")
START_DATE = datetime(2022, 9, 1)


@aql.transform()
def get_top_five_animations(input_table: Table):
    return """
        SELECT title, rating
        FROM {{input_table}}
        WHERE genre1='Animation'
        ORDER BY rating desc
        LIMIT 5;
    """


with DAG(
    dag_id="example_dataset_producer",
    schedule=None,
    start_date=START_DATE,
    catchup=False,
) as load_dag:
    imdb_movies = aql.load_file(
        input_file=input_file,
        task_id="load_csv",
        output_table=imdb_movies_table,
    )

with DAG(
    dag_id="example_dataset_consumer",
    schedule=[imdb_movies_table],
    start_date=START_DATE,
    catchup=False,
) as transform_dag:
    top_five_animations = get_top_five_animations(
        input_table=imdb_movies_table,
        output_table=top_animations_table,
    )
```
Dynamic Task Templates: Tasks that can be used with Dynamic Task Mapping (Airflow 2.3+)
Create `upstream_tasks` parameter for dependencies independent of data transfers #585
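The airflow.cfg toggle referenced in #858 would live under the Astro SDK configuration section. A hypothetical sketch only; the exact section and option names are assumptions, not confirmed by this changelog:

```ini
# Hypothetical airflow.cfg entry; check the Astro SDK docs for the exact option name.
[astro_sdk]
auto_add_inlets_outlets = False
```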
Improvements
Bug fixes
Docs
Update quick start example #819
Add links to docs from README #832
Fix Astro CLI doc link #842
Add configuration details from settings.py #861
Add section explaining table metadata #774
Fix docstring for run_raw_sql #817
Add missing docs for Table class #788
Add the README.md example DAG to the example DAGs folder #681
Add a reason for enabling XCom pickling #747
1.0.2
Bug fixes
Skip folders while processing paths in the `load_file` operator when a file pattern is passed. #733
Misc
Limit Google Protobuf for compatibility with the BigQuery client. #742
1.0.1
Bug fixes
Added a check to create the table only when `if_exists` is `replace` in `aql.load_file` for Snowflake. #729
Fix the file type for NDJSON files in the data transfer job from AWS S3 to Google BigQuery. #724
Create a new version of imdb.csv with lowercase column names and update the examples to use it, so this change is backwards-compatible. #721, #727
Skip folders while processing paths in the `load_file` operator when a file pattern is passed. #733
Docs
Misc
1.0.0
Features
Improved the performance of `aql.load_file` by supporting database-specific (native) load methods. This is now the default behaviour. Previously, the Astro SDK Python would always use Pandas to load files to SQL databases, which passed the data through the worker node and slowed the performance. #557, #481
Introduced new arguments to `aql.load_file`:
`use_native_support` for data transfer if available on the destination (defaults to `use_native_support=True`)
`native_support_kwargs` is a keyword argument to be used by the method involved in the native support flow
`enable_native_fallback` can be used to fall back to the default transfer (defaults to `enable_native_fallback=True`).
Now, there are three modes:
`Native`: Default. Uses a BigQuery load job in the case of BigQuery, and Snowflake `COPY INTO` using an external stage in the case of Snowflake.
`Pandas`: This is how datasets were previously loaded. To enable this mode, use the argument `use_native_support=False` in `aql.load_file`.
`Hybrid`: This attempts to use the native strategy to load a file to the database and, if the native strategy fails, falls back to Pandas with relevant log warnings. #557
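The hybrid fallback can be sketched in plain Python. This is a hypothetical helper mirroring the described behaviour, not the SDK's actual implementation:

```python
import logging

logger = logging.getLogger(__name__)


def load_with_fallback(native_load, pandas_load, enable_native_fallback=True):
    """Try the native load path first; optionally fall back to Pandas (sketch)."""
    try:
        return native_load()
    except Exception as exc:
        if not enable_native_fallback:
            raise
        logger.warning("Native load failed (%s); falling back to Pandas.", exc)
        return pandas_load()
```

With `enable_native_fallback=False`, the native error propagates instead of being masked by the Pandas path.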
Allow users to specify the table schema (column types) into which a file is being loaded by using `table.columns`. If this table attribute is not set, the Astro SDK still tries to infer the schema by using Pandas (the previous behaviour). #532
Add an example DAG for dynamic task mapping with the Astro SDK (Airflow 2.3.0+). #377
Breaking Change
The `aql.dataframe` argument `identifiers_as_lower` (which was a `boolean`, with the default set to `False`) was replaced by the argument `columns_names_capitalization` (a `string` with possible values `["upper", "lower", "original"]`; the default is `lower`). #564
Previously, `aql.load_file` changed the capitalization of all column titles to uppercase by default; now it makes them lowercase by default. The old behaviour can be achieved by using the argument `columns_names_capitalization="upper"`. #564
`aql.load_file` attempts to load files to BigQuery and Snowflake by using native methods, which may have pre-requirements to work. To disable this mode, use the argument `use_native_support=False` in `aql.load_file`. #557, #481
`aql.dataframe` will raise an exception if the default Airflow XCom backend is being used. To solve this, either use an external XCom backend, such as S3 or GCS, or set the configuration `AIRFLOW__ASTRO_SDK__DATAFRAME_ALLOW_UNSAFE_STORAGE=True`. #444
Changed the declaration for the default Astro SDK temporary schema from `AIRFLOW__ASTRO__SQL_SCHEMA` to `AIRFLOW__ASTRO_SDK__SQL_SCHEMA`. #503
Renamed `aql.truncate` to `aql.drop_table`. #554
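The semantics of `columns_names_capitalization` can be illustrated with a small stand-alone sketch (a hypothetical helper mirroring the documented behaviour, not the SDK's code):

```python
def apply_columns_names_capitalization(columns, columns_names_capitalization="lower"):
    """Rename columns according to the argument's documented semantics (sketch)."""
    if columns_names_capitalization == "lower":
        return [name.lower() for name in columns]
    if columns_names_capitalization == "upper":
        return [name.upper() for name in columns]
    if columns_names_capitalization == "original":
        return list(columns)
    raise ValueError("expected one of: 'upper', 'lower', 'original'")
```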
Bug fixes
Enhancements
Improved the performance of `aql.load_file`.
Get configurations via the Airflow configuration manager. #503
Changed catching `ValueError` and `AttributeError` to `DatabaseCustomError`. #595
Unpin the pandas upper-bound dependency. #620
Remove markupsafe from dependencies. #623
Added `extend_existing` to the SQLAlchemy `Table` object. #626
Move the config to store dataframes in XCom to the settings file. #537
Make the operator names consistent. #634
Use `exc_info` for exception logging. #643
Update the query for getting the BigQuery table schema. #661
Use lazily evaluated type annotations from PEP 563. #650
Provide the Google Cloud credentials env var for BigQuery. #679
Handle breaking changes for Snowflake provider versions 3.2.0 and 3.1.0. #686
Misc
0.11.0
Feature:
Internals:
Enhancement:
0.10.0
Feature:
Breaking Change:
`aql.merge` interface changed: the argument `merge_table` changed to `target_table`; `target_columns` and `merge_column` were combined into the `columns` argument; `merge_keys` changed to `target_conflict_columns`; and `conflict_strategy` changed to `if_conflicts`. More details can be found at #422, #466.
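The renames can be sketched as a stand-alone migration helper. This is hypothetical illustration only; in particular, collapsing `merge_column` and `target_columns` into a single source-to-target mapping is a simplification of the new `columns` argument:

```python
def migrate_merge_kwargs(old_kwargs):
    """Translate pre-0.10.0 aql.merge keyword arguments to the new names (sketch)."""
    renames = {
        "merge_table": "target_table",
        "merge_keys": "target_conflict_columns",
        "conflict_strategy": "if_conflicts",
    }
    new_kwargs = {}
    for key, value in old_kwargs.items():
        if key in ("merge_column", "target_columns"):
            continue  # handled below: both collapse into `columns`
        new_kwargs[renames.get(key, key)] = value
    if "merge_column" in old_kwargs or "target_columns" in old_kwargs:
        # Source and target column lists become one source -> target mapping.
        new_kwargs["columns"] = dict(
            zip(old_kwargs.get("merge_column", []), old_kwargs.get("target_columns", []))
        )
    return new_kwargs
```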
Enhancement:
0.9.2
Bug fix:
Change `export_file` to return a `File` object. #454
0.9.1
Bug fix:
Table unable to have Airflow templated names #413
0.9.0
Enhancements:
Introduction of the user-facing `Table`, `Metadata` and `File` classes.
Breaking changes:
The operator `save_file` became `export_file`.
The tasks `load_file`, `export_file` (previously `save_file`) and `run_raw_sql` should be used with `Table`, `Metadata` and `File` instances.
The decorators `dataframe`, `run_raw_sql` and `transform` should be used with `Table` and `Metadata` instances.
The operators `aggregate_check`, `boolean_check`, `render` and `stats_check` were temporarily removed.
The class `TempTable` was removed. It is possible to declare temporary tables by using `Table(temp=True)`. All temporary table names are prefixed with `_tmp_`. If the user decides to name a `Table`, it is no longer temporary, unless the user enforces it to be.
The only mandatory property of a `Table` instance is `conn_id`. If no metadata is given, the library will try to extract the schema and other information from the connection object. If that is missing, it will default to the `AIRFLOW__ASTRO__SQL_SCHEMA` environment variable.
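The temporary-table naming and schema fallback can be sketched as follows. These are hypothetical helpers: the `_tmp_` prefix and the `AIRFLOW__ASTRO__SQL_SCHEMA` variable come from the changelog, the rest is illustrative:

```python
import os
import random
import string


def resolve_table_name(name=None):
    """A named table is never temporary; unnamed tables get a `_tmp_` prefix (sketch)."""
    if name:
        return name
    suffix = "".join(random.choices(string.ascii_lowercase, k=8))
    return f"_tmp_{suffix}"


def resolve_schema(metadata_schema=None, connection_schema=None):
    """Fall back from explicit metadata to the connection, then the env var (sketch)."""
    return (
        metadata_schema
        or connection_schema
        or os.environ.get("AIRFLOW__ASTRO__SQL_SCHEMA")
    )
```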
Internals:
Major refactor introducing
Database
,File
,FileType
andFileLocation
concepts.
0.8.4
Enhancements:
Add support for Airflow 2.3 #367.
Breaking change:
We have renamed the released artifacts from `astro-projects` to `astro-sdk-python`. `0.8.4` is the last version for which we have published both `astro-sdk-python` and `astro-projects`.
0.8.3
Bug fix:
Do not attempt to create a schema if it already exists #329.
0.8.2
Bug fix:
Support dataframes from different databases in dataframe operator #325
Enhancements:
Add an integration test case for `SqlDecoratedOperator` to test execution of raw SQL #316
0.8.1
Bug fix:
Snowflake transform without `input_table` #319
0.8.0
Feature:
`load_file` support for nested NDJSON files #257
Breaking change:
`aql.dataframe` switches the capitalization to lowercase by default. This behaviour can be changed by using `identifiers_as_lower` #154
Documentation:
Fix commands in README.md #242
Add scripts to auto-generate Sphinx documentation
Enhancements:
Improve type hints coverage
Improve Amazon S3 example DAG, so it does not rely on pre-populated data #293
Add example DAG to load/export from BigQuery #265
Fix usages of mutable default args #267
Enable DeepSource validation #299
Improve code quality and coverage
Bug fixes:
Support `gcpbigquery` connections #294
Support the `params` argument in `aql.render` to override SQL Jinja template values #254
Fix `aql.dataframe` when the table arg is absent #259
Others:
0.7.0
Feature:
`load_file` to a Pandas dataframe, without SQL database dependencies #77
Documentation:
Simplify README #101
Add Release Guidelines #160
Add Code of Conduct #101
Add Contribution Guidelines #101
Enhancements:
Add SQLite example #149
Allow customization of `task_id` when using `dataframe` #126
Use standard AWS environment variables, as opposed to `AIRFLOW__ASTRO__CONN_AWS_DEFAULT` #175
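Using the standard AWS variables means credentials are picked up the same way as by other AWS tooling. A minimal shell sketch with placeholder values:

```shell
# Standard AWS credential variables, used instead of AIRFLOW__ASTRO__CONN_AWS_DEFAULT
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_DEFAULT_REGION="us-east-1"
```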
Bug fixes:
Fix `merge` `XComArg` support #183
Fixes to `load_file`:
Fixes to `render`:
Fix `transform`, so it works with SQLite #159
Others:
0.6.0
Features:
Support SQLite #86
Support users who can’t create schemas #121
Ability to install optional dependencies (amazon, google, snowflake) #82
Enhancements:
Change `render` so it creates a DAG as opposed to a TaskGroup #143
Allow users to specify a custom version of `snowflake_sqlalchemy` #127
Bug fixes:
Others: