Changelog
1.8.0
Misc
Bug Fixes
1.7.0
Feature
Bug Fixes
Misc
Docs
1.6.2
Bug Fixes
1.6.1
Bug Fixes
Fix AstroCustomXcomBackend circular import issue. #1943
1.6.0
Feature
Add MySQL support #1801
Add support to load from Azure blob storage into Databricks #1561
Add argument skip_on_failure to CleanupOperator #1837 by @scottleechua
Add query_modifier to raw_sql, transform and transform_file, which allows users to define SQL statements to be run before the main query statement #1898. For example, this feature can be used to add Snowflake query tags to a SQL statement:
    from astro import sql as aql
    from astro.query_modifier import QueryModifier

    @aql.run_raw_sql(
        results_format="pandas_dataframe",
        conn_id="sqlite_default",
        query_modifier=QueryModifier(pre_queries=["ALTER team_1", "ALTER team_2"]),
    )
    def dummy_method():
        return "SELECT 1+1"
Upgrade astro-runtime to 7.4.2 #1878
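The idea behind query_modifier in the 1.6.0 features, running extra SQL statements before the main query, can be illustrated without the SDK. The following is a minimal sketch using only the standard-library sqlite3 module; the helper run_with_pre_queries and the table name team are hypothetical and are not part of the Astro SDK API.

```python
import sqlite3
from typing import List, Sequence, Tuple


def run_with_pre_queries(
    conn: sqlite3.Connection, pre_queries: Sequence[str], main_query: str
) -> List[Tuple]:
    """Run each pre-query in order, then the main query, on one connection.

    Hypothetical helper: it only mimics the ordering that pre_queries implies.
    """
    cursor = conn.cursor()
    for statement in pre_queries:
        cursor.execute(statement)
    return cursor.execute(main_query).fetchall()


conn = sqlite3.connect(":memory:")
rows = run_with_pre_queries(
    conn,
    pre_queries=[
        "CREATE TEMP TABLE team (name TEXT)",
        "INSERT INTO team VALUES ('team_1')",
    ],
    main_query="SELECT name FROM team",
)
print(rows)
```

Because all statements share one connection, session-level settings made by the pre-queries (such as a query tag on Snowflake) would still be in effect when the main statement runs.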
Bug fix:
Docs
Update open lineage documentation #1881
Misc
1.5.4
Bug Fixes
Fix AstroCustomXcomBackend circular import issue. #1943
1.5.3
Bug fix:
1.5.2
Improvements
Restore pandas load option classes - PandasCsvLoadOptions, PandasJsonLoadOptions, PandasNdjsonLoadOptions and PandasParquetLoadOptions #1795
1.5.1
Improvements
Add Openlineage facets for Microsoft SQL server. #1752
Bug fixes
1.5.0
Feature:
Add support for Microsoft SQL server. #1538
Add support for DuckDB. #1695
Add result_format and fail_on_empty params to run_raw_sql operator #1584
Add support for validation_mode as part of the COPY INTO command for snowflake. #1689
Add support for native transfers for Azure Blob Storage to Snowflake in LoadFileOperator. #1675
Improvements
Use cache to reduce redundant database calls #1488
Remove default copy_options as part of SnowflakeLoadOptions. All copy_options are now supported as part of SnowflakeLoadOptions as per documentation. #1689
Remove load_options from File object. #1721
Render SQL code with parameters in BaseSQLDecoratedOperator. #897
Bug fixes
Fix handling of multiple dataframes in the run_raw_sql operator. #1700
Docs
Add documentation around Microsoft SQL support with example DAG. #1538
Add documentation around DuckDB support with example DAG. #1695
Add documentation for validation_mode as part of the COPY INTO command for snowflake. #1689
Add documentation and example DAGs for snowflake SnowflakeLoadOptions for various available options around copy_options and file_options. #1689
Fix the documentation to run the quickstart example described in the Python SDK README. #1716
Misc
Breaking Change
Consolidated PandasCsvLoadOptions, PandasJsonLoadOptions, PandasNdjsonLoadOptions and PandasParquetLoadOptions to single PandasLoadOptions. #1722
1.4.1
Feature:
Bug fixes
Support “s3” conn type for S3Location #1647
Docs
Add the documentation and example DAG for Azure blob storage #1598
Fix dead link in documentation #1596
Update README with newly supported location and database #1596
Update configuration reference for XCom #1646
Add step to generate constraints in Python SDK release process #1474
Add document to showcase the use of check_table and check_column operators #1631
Misc
1.4.0
Feature:
Add support for Azure Blob Storage (only non-native implementation) #1275, #1542
Add databricks delta table support docs #1352, #1397, #1452, #1476, #1480, #1555
Add sourceCode facet to aql.dataframe() and aql.transform() as part of OpenLineage integration #1537
Enhance LoadFileOperator so that users can send pandas attributes through PandasLoadOptions (docs) #1466
Enhance LoadFileOperator so that users can send Snowflake specific load attributes through SnowflakeLoadOptions (docs) #1516
Expose get_file_list_func to users so that it returns iterable File list from given destination file storage #1380
Improvements
Deprecate export_table_to_file in favor of export_to_file (ExportTableToFileOperator and export_table_to_file operator would be removed in astro-python-sdk 1.5.0) #1503
Bug fixes
Docs
Misc
Refactor snowflake merge function for easier maintenance #1493
1.3.3
Bug fixes
1.3.2
Bug fixes
Fix the run_raw_sql() operator, whose handler returned None and caused the serialization logic to fail. #1431
Misc
Update the deprecation warning for export_file() operator. #1411
1.3.1
Feature:
Dataframe operator now allows a user to either append to a table or replace a table with the if_exists parameter. #1379
Bug fixes
Fix the aql.cleanup() operator, which was failing because the attribute output was implemented in 2.4.0 #1359
Fix the backward compatibility with apache-airflow-providers-snowflake==4.0.2. #1351
LoadFile operator returns a dataframe if not using XCom backend. #1348, #1337
Fix the functionality to create region specific temporary schemas when they don’t exist in same region. #1369
Docs
Cross-link to API reference page from Operators page. #1383
Misc
1.3.0
Feature:
Remove the need to use a custom Xcom backend for storing dataframes when Xcom pickling is disabled. #1334, #1331, #1319
Add support to Google Drive to be used as FileLocation. Example to load file from Google Drive to Snowflake #1044
    aql.load_file(
        input_file=File(
            path="gdrive://sample-google-drive/sample.csv", conn_id="gdrive_conn"
        ),
        output_table=Table(
            conn_id=SNOWFLAKE_CONN_ID,
            metadata=Metadata(
                database=os.environ["SNOWFLAKE_DATABASE"],
                schema=os.environ["SNOWFLAKE_SCHEMA"],
            ),
        ),
    )
Improvements
Use DefaultExtractor from OpenLineage. Users need not set environment variable OPENLINEAGE_EXTRACTORS to use OpenLineage. #1223, #1292
Generate constraints files for multiple Python and Airflow versions that display the set of “installable” constraints for a particular Python (3.7, 3.8, 3.9) and Airflow version (2.2.5, 2.3.4, 2.4.2) #1226
Improve the logs when a native transfer falls back to Pandas, as well as the fallback indication in LoadFileOperator. #1263
Bug fixes
Docs
Misc
Fix the GCS path in aql.export_file in the example DAGs. #1339
1.2.3
Bug fixes
When if_exists is set to replace in Dataframe operator, replace the table rather than append. This change fixes a regression on the Dataframe operator which caused it to append content to an output table instead of replacing. #1260
Pass the table metadata database value to the underlying airflow PostgresHook instead of schema as schema is renamed to database in airflow as per this PR. #1276
Docs
Include description on pickling and usage of custom Xcom backend in README.md #1203
Misc
Investigate and fix tests that are filling up Snowflake database with tmp tables as part of our CI execution. #738
1.2.2
Bug fixes
1.2.1
Feature:
Bug fixes
Improvement:
Change the namespace for Open Lineage #1179
Add LOAD_FILE_ENABLE_NATIVE_FALLBACK config to globally disable native fallback #1089
Add OPENLINEAGE_EMIT_TEMP_TABLE_EVENT config to emit events for tmp table in Open Lineage. #1121
Fix issue with fetching table row count for snowflake #1145
Generate unique Open Lineage namespace for Sqlite based operations #1141
Docs
Misc
Pin SQLAlchemy version to >=1.3.18,<1.4.42 #1185
1.2.0
Feature:
Remove dependency on AIRFLOW__CORE__ENABLE_XCOM_PICKLING. Users can set new environment variables, namely AIRFLOW__ASTRO_SDK__XCOM_STORAGE_CONN_ID and AIRFLOW__ASTRO_SDK__XCOM_STORAGE_URL, and use a custom XCOM backend, namely AstroCustomXcomBackend, which enables the XCOM data to be saved to an S3 or GCS location. #795, #997
Added OpenLineage support for LoadFileOperator, AppendOperator, TransformOperator and MergeOperator #898, #899, #902, #901 and #900
Add TransformFileOperator that:
parses a SQL file with templating
applies all needed parameters
runs the SQL to return a table object
To keep the aql.transform_file function, the function can return TransformFileOperator().output in a similar fashion to the merge operator. #892
Add the implementation for row count for BaseTable. #1073
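The environment variables named in the 1.2.0 features follow Airflow's AIRFLOW__{SECTION}__{KEY} naming convention for configuration overrides. As a rough illustration, the sketch below splits such a name into its section and key. The function parse_airflow_env_var is a hypothetical helper written for this example and is not part of Airflow or the Astro SDK.

```python
from typing import Tuple


def parse_airflow_env_var(name: str) -> Tuple[str, str]:
    """Split AIRFLOW__{SECTION}__{KEY} into a lowercase (section, key) pair.

    Hypothetical helper for illustration only.
    """
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    # The first double underscore after the prefix separates section from key.
    section, separator, key = name[len(prefix):].partition("__")
    if not separator or not section or not key:
        raise ValueError(f"{name!r} is not of the form AIRFLOW__SECTION__KEY")
    return section.lower(), key.lower()


conn_id_option = parse_airflow_env_var("AIRFLOW__ASTRO_SDK__XCOM_STORAGE_CONN_ID")
url_option = parse_airflow_env_var("AIRFLOW__ASTRO_SDK__XCOM_STORAGE_URL")
pickling_option = parse_airflow_env_var("AIRFLOW__CORE__ENABLE_XCOM_PICKLING")
print(conn_id_option, url_option, pickling_option)
```

Read this way, the two new variables are options in an astro_sdk section, while the pickling flag they replace lives in the core section.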
Improvement:
Improved handling of snowflake identifiers for smooth experience with dataframe and run_raw_sql and load_file operators. #917, #1098
Fix transform_file to not depend on transform decorator #1004
Set the CI to run and publish benchmark reports once a week #443
Fix cyclic dependency and improve import time. Reduces the import time for astro/databases/__init__.py from 23.254 seconds to 0.062 seconds #1013
Docs
Create GETTING_STARTED.md #1036
Document the Open Lineage facets published by Astro Python SDK. #1086
Documentation changes to specify permissions needed for running BigQuery jobs. #896
Document the details on custom XCOM. #1100
Document the benchmarking process. #1017
Include a detailed description on the default Dataset concept in Astro Python SDK. #1092
Misc
NFS volume mount in Kubernetes to test benchmarking from local to databases. #883
1.1.1
Improvements
Add filetype when resolving path in case of loading into dataframe #881
Bug fixes
Fix postgres performance regression (example from one_gb file - 5.56min to 1.84min) #876
1.1.0
Features
Add native autodetect schema feature #780
Allow users to disable auto addition of inlets/outlets via airflow.cfg #858
Support for Datasets introduced in Airflow 2.4 #786, #808, #862, #871
inlets and outlets will be automatically set for all the operators.
Users can now schedule DAGs on File and Table objects. Example:
    input_file = File(
        path="https://raw.githubusercontent.com/astronomer/astro-sdk/main/tests/data/imdb_v2.csv"
    )
    imdb_movies_table = Table(name="imdb_movies", conn_id="sqlite_default")
    top_animations_table = Table(name="top_animation", conn_id="sqlite_default")
    START_DATE = datetime(2022, 9, 1)


    @aql.transform()
    def get_top_five_animations(input_table: Table):
        return """
            SELECT title, rating
            FROM {{input_table}}
            WHERE genre1='Animation'
            ORDER BY rating desc
            LIMIT 5;
        """


    with DAG(
        dag_id="example_dataset_producer",
        schedule=None,
        start_date=START_DATE,
        catchup=False,
    ) as load_dag:
        imdb_movies = aql.load_file(
            input_file=input_file,
            task_id="load_csv",
            output_table=imdb_movies_table,
        )

    with DAG(
        dag_id="example_dataset_consumer",
        schedule=[imdb_movies_table],
        start_date=START_DATE,
        catchup=False,
    ) as transform_dag:
        top_five_animations = get_top_five_animations(
            input_table=imdb_movies_table,
            output_table=top_animations_table,
        )
Dynamic Task Templates: Tasks that can be used with Dynamic Task Mapping (Airflow 2.3+)
Create upstream_tasks parameter for dependencies independent of data transfers #585
Improvements
Bug fixes
Docs
Update quick start example #819
Add links to docs from README #832
Fix Astro CLI doc link #842
Add configuration details from settings.py #861
Add section explaining table metadata #774
Fix docstring for run_raw_sql #817
Add missing docs for Table class #788
Add the readme.md example dag to example dags folder #681
Add reason for enabling XCOM pickling #747
1.0.2
Bug fixes
Skip folders while processing paths in load_file operator when file pattern is passed. #733
Misc
Limit Google Protobuf for compatibility with bigquery client. #742
1.0.1
Bug fixes
Added a check to create table only when if_exists is replace in aql.load_file for snowflake. #729
Fix the file type for NDJSON file in Data transfer job in AWS S3 to Google BigQuery. #724
Create a new version of imdb.csv with lowercase column names and update the examples to use it, so this change is backwards-compatible. #721, #727
Skip folders while processing paths in load_file operator when a file pattern is passed. #733
Docs
Misc
1.0.0
Features
Improved the performance of aql.load_file by supporting database-specific (native) load methods. This is now the default behaviour. Previously, the Astro SDK Python would always use Pandas to load files to SQL databases, which passed the data through the worker node and slowed performance. #557, #481
Introduced new arguments to aql.load_file:
use_native_support for data transfer if available on the destination (defaults to use_native_support=True)
native_support_kwargs is a keyword argument to be used by the method involved in the native support flow.
enable_native_fallback can be used to fall back to default transfer (defaults to enable_native_fallback=True).
Now, there are three modes:
Native: Default, uses Bigquery Load Job in the case of BigQuery and Snowflake COPY INTO using external stage in the case of Snowflake.
Pandas: This is how datasets were previously loaded. To enable this mode, use the argument use_native_support=False in aql.load_file.
Hybrid: This attempts to use the native strategy (i) to load a file to the database and, if the native strategy fails, falls back to Pandas (ii) with relevant log warnings. #557
Allow users to specify the table schema (column types) in which a file is being loaded by using table.columns. If this table attribute is not set, the Astro SDK still tries to infer the schema by using Pandas (which is previous behaviour). #532
Add Example DAG for Dynamic Map Task with Astro-SDK. #377, airflow-2.3.0
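The three load modes described in the 1.0.0 features (native, Pandas, hybrid) can be pictured as a small piece of control flow driven by use_native_support and enable_native_fallback. The sketch below is an illustration of that decision logic only, under the assumption that the two flags combine as described above. The function load_file_sketch and its callable arguments are hypothetical and do not reproduce the SDK's implementation.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def load_file_sketch(
    native_load: Callable[[], str],
    pandas_load: Callable[[], str],
    use_native_support: bool = True,
    enable_native_fallback: bool = True,
) -> str:
    """Pick a load strategy from the two flags (illustrative only)."""
    if not use_native_support:
        # Pandas mode: skip the native path entirely.
        return pandas_load()
    try:
        # Native mode: try the database-specific load first.
        return native_load()
    except Exception as exc:
        if not enable_native_fallback:
            # Strict native: surface the failure to the caller.
            raise
        # Hybrid mode: log a warning and fall back to Pandas.
        logger.warning("Native load failed (%s); falling back to Pandas", exc)
        return pandas_load()


def native_ok() -> str:
    return "native"


def pandas_ok() -> str:
    return "pandas"


def failing_native() -> str:
    raise RuntimeError("external stage not configured")


result_native = load_file_sketch(native_ok, pandas_ok)
result_pandas = load_file_sketch(native_ok, pandas_ok, use_native_support=False)
result_hybrid = load_file_sketch(failing_native, pandas_ok)
print(result_native, result_pandas, result_hybrid)
```

With both defaults left at True, a successful native load never touches Pandas, and the fallback only comes into play when the native path raises.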
Breaking Change
The aql.dataframe argument identifiers_as_lower (which was boolean, with default set to False) was replaced by the argument columns_names_capitalization (string, with possible values ["upper", "lower", "original"], default is lower). #564
Previously, aql.load_file changed the capitalization of all column titles to uppercase by default; now it makes them lowercase by default. The old behaviour can be achieved by using the argument columns_names_capitalization="upper". #564
aql.load_file attempts to load files to BigQuery and Snowflake by using native methods, which may have pre-requirements to work. To disable this mode, use the argument use_native_support=False in aql.load_file. #557, #481
aql.dataframe will raise an exception if the default Airflow XCom backend is being used. To solve this, either use an external XCom backend, such as S3 or GCS, or set the configuration AIRFLOW__ASTRO_SDK__DATAFRAME_ALLOW_UNSAFE_STORAGE=True. #444
Change the declaration for the default Astro SDK temporary schema from using AIRFLOW__ASTRO__SQL_SCHEMA to AIRFLOW__ASTRO_SDK__SQL_SCHEMA #503
Renamed aql.truncate to aql.drop_table #554
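To make the effect of columns_names_capitalization concrete, here is a minimal sketch of what the three documented values would do to a list of column names. The function apply_capitalization is a hypothetical stand-in written for this example; it is not the SDK's code and only mirrors the behaviour described in the breaking changes above.

```python
from typing import List, Sequence


def apply_capitalization(
    columns: Sequence[str], columns_names_capitalization: str = "lower"
) -> List[str]:
    """Return column names adjusted per the chosen capitalization mode."""
    if columns_names_capitalization == "upper":
        return [column.upper() for column in columns]
    if columns_names_capitalization == "lower":
        return [column.lower() for column in columns]
    if columns_names_capitalization == "original":
        return list(columns)
    raise ValueError(
        "columns_names_capitalization must be 'upper', 'lower' or 'original'"
    )


source_columns = ["Title", "RATING", "genre1"]
default_columns = apply_capitalization(source_columns)
upper_columns = apply_capitalization(source_columns, "upper")
original_columns = apply_capitalization(source_columns, "original")
print(default_columns, upper_columns, original_columns)
```

Code that relied on the earlier uppercase column names from aql.load_file would pass "upper" to keep its existing column references working.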
Bug fixes
Enhancements
Improved the performance of aql.load_file for files for below:
Get configurations via Airflow Configuration manager. #503
Change catching ValueError and AttributeError to DatabaseCustomError #595
Unpin pandas upperbound dependency #620
Remove markupsafe from dependencies #623
Added extend_existing to Sqla Table object #626
Move config to store DF in XCom to settings file #537
Make the operator names consistent #634
Use exc_info for exception logging #643
Update query for getting bigquery table schema #661
Use lazy evaluated Type Annotations from PEP 563 #650
Provide Google Cloud Credentials env var for bigquery #679
Handle breaking changes for Snowflake provider versions 3.2.0 and 3.1.0 #686
Misc
0.11.0
Feature:
Internals:
Enhancement:
0.10.0
Feature:
Breaking Change:
aql.merge interface changed. Argument merge_table changed to target_table, target_columns and merge_column combined to column argument, merge_keys is changed to target_conflict_columns, conflict_strategy is changed to if_conflicts. More details can be found at #422, #466
Enhancement:
0.9.2
Bug fix:
Change export_file to return File object #454.
0.9.1
Bug fix:
Table unable to have Airflow templated names #413
0.9.0
Enhancements:
Introduction of the user-facing Table, Metadata and File classes
Breaking changes:
The operator save_file became export_file
The tasks load_file, export_file (previously save_file) and run_raw_sql should be used with Table, Metadata and File instances
The decorators dataframe, run_raw_sql and transform should be used with Table and Metadata instances
The operators aggregate_check, boolean_check, render and stats_check were temporarily removed
The class TempTable was removed. It is possible to declare temporary tables by using Table(temp=True). All the temporary tables names are prefixed with _tmp_. If the user decides to name a Table, it is no longer temporary, unless the user enforces it to be.
The only mandatory property of a Table instance is conn_id. If no metadata is given, the library will try to extract schema and other information from the connection object. If it is missing, it will default to the AIRFLOW__ASTRO__SQL_SCHEMA environment variable.
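The temporary-table rules in the 0.9.0 breaking changes can be summarised in a few lines of code. The sketch below models them with a standalone dataclass; TableSketch is a hypothetical class written for this illustration, and details such as the length of the generated suffix are assumptions rather than the behaviour of the real Table class.

```python
import random
import string
from dataclasses import dataclass


@dataclass
class TableSketch:
    """Illustrative model of the naming rules; not the SDK's Table class."""

    conn_id: str  # the only mandatory property
    name: str = ""
    temp: bool = False

    def __post_init__(self) -> None:
        if not self.name:
            # Unnamed tables are temporary and get a _tmp_ prefixed name.
            # The 10-character random suffix is an assumption of this sketch.
            suffix = "".join(
                random.choices(string.ascii_lowercase + string.digits, k=10)
            )
            self.name = "_tmp_" + suffix
            self.temp = True


unnamed = TableSketch(conn_id="sqlite_default")
named = TableSketch(conn_id="sqlite_default", name="imdb_movies")
forced_temp = TableSketch(conn_id="sqlite_default", name="scratch", temp=True)
print(unnamed.name, named.temp, forced_temp.temp)
```

A named table stays permanent unless temp=True is passed explicitly, which matches the "unless the user enforces it to be" wording above.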
Internals:
Major refactor introducing Database, File, FileType and FileLocation concepts.
0.8.4
Enhancements:
Add support for Airflow 2.3 #367.
Breaking change:
We have renamed the artifacts we released to astro-sdk-python from astro-projects. 0.8.4 is the last version for which we have published both astro-sdk-python and astro-projects.
0.8.3
Bug fix:
Do not attempt to create a schema if it already exists #329.
0.8.2
Bug fix:
Support dataframes from different databases in dataframe operator #325
Enhancements:
Add integration testcase for SqlDecoratedOperator to test execution of Raw SQL #316
0.8.1
Bug fix:
Snowflake transform without input_table #319
0.8.0
Feature:
load_file support for nested NDJSON files #257
Breaking change:
aql.dataframe switches the capitalization to lowercase by default. This behaviour can be changed by using identifiers_as_lower #154
Documentation:
Fix commands in README.md #242
Add scripts to auto-generate Sphinx documentation
Enhancements:
Improve type hints coverage
Improve Amazon S3 example DAG, so it does not rely on pre-populated data #293
Add example DAG to load/export from BigQuery #265
Fix usages of mutable default args #267
Enable DeepSource validation #299
Improve code quality and coverage
Bug fixes:
Support gcpbigquery connections #294
Support params argument in aql.render to override SQL Jinja template values #254
Fix aql.dataframe when table arg is absent #259
Others:
0.7.0
Feature:
load_file to a Pandas dataframe, without SQL database dependencies #77
Documentation:
Simplify README #101
Add Release Guidelines #160
Add Code of Conduct #101
Add Contribution Guidelines #101
Enhancements:
Add SQLite example #149
Allow customization of task_id when using dataframe #126
Use standard AWS environment variables, as opposed to AIRFLOW__ASTRO__CONN_AWS_DEFAULT #175
Bug fixes:
Fix merge XComArg support #183
Fixes to load_file:
Fixes to render:
Fix transform, so it works with SQLite #159
Others:
0.6.0
Features:
Support SQLite #86
Support users who can’t create schemas #121
Ability to install optional dependencies (amazon, google, snowflake) #82
Enhancements:
Change render so it creates a DAG as opposed to a TaskGroup #143
Allow users to specify a custom version of snowflake_sqlalchemy #127
Bug fixes:
Others: