astro.databases.aws.redshift

AWS Redshift table implementation.

Module Contents

Classes

RedshiftDatabase

Handle interactions with Redshift databases.

Attributes

DEFAULT_CONN_ID

NATIVE_PATHS_SUPPORTED_FILE_TYPES

astro.databases.aws.redshift.DEFAULT_CONN_ID
astro.databases.aws.redshift.NATIVE_PATHS_SUPPORTED_FILE_TYPES
class astro.databases.aws.redshift.RedshiftDatabase(conn_id=DEFAULT_CONN_ID, table=None, load_options=None)

Bases: astro.databases.base.BaseDatabase

Handle interactions with Redshift databases.

Parameters:
property sql_type
property default_metadata: astro.table.Metadata

Fill in default metadata values for table objects addressing Redshift databases.

Return type:

astro.table.Metadata

NATIVE_LOAD_EXCEPTIONS: Any = ()
DEFAULT_SCHEMA
NATIVE_PATHS
illegal_column_name_chars: list[str] = ['.']
illegal_column_name_chars_replacement: list[str] = ['_']
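
The two attributes above pair each illegal character with its replacement. A minimal sketch of how such a pairing could be applied to column names follows; the helper `sanitize_column_name` is hypothetical and is not part of the documented class.

```python
from typing import List

# Values copied from the attributes documented above.
ILLEGAL_COLUMN_NAME_CHARS: List[str] = ["."]
ILLEGAL_COLUMN_NAME_CHARS_REPLACEMENT: List[str] = ["_"]


def sanitize_column_name(name: str) -> str:
    """Hypothetical helper: swap each illegal character for its replacement."""
    for bad, good in zip(ILLEGAL_COLUMN_NAME_CHARS, ILLEGAL_COLUMN_NAME_CHARS_REPLACEMENT):
        name = name.replace(bad, good)
    return name


columns = ["user.id", "amount", "address.city.name"]
cleaned = [sanitize_column_name(column) for column in columns]
print(cleaned)
```

Names without an illegal character pass through unchanged.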
hook()

Retrieve Airflow hook to interface with the Redshift database.

Return type:

airflow.providers.amazon.aws.hooks.redshift_sql.RedshiftSQLHook

sqlalchemy_engine()

Return SQLAlchemy engine.

Return type:

sqlalchemy.engine.base.Engine

schema_exists(schema)

Check if a schema exists in Redshift.

Parameters:

schema (str) – Redshift namespace

Return type:

bool

table_exists(table)

Check if a table exists in the database.

Parameters:

table (astro.table.BaseTable) – Details of the table whose existence we want to check

Return type:

bool

load_pandas_dataframe_to_table(source_dataframe, target_table, if_exists='replace', chunk_size=DEFAULT_CHUNK_SIZE)

Create a table with the dataframe’s contents. If the table already exists, append or replace the content, depending on the value of if_exists.

Parameters:
  • source_dataframe (pandas.DataFrame) – Dataframe whose contents will be loaded

  • target_table (astro.table.BaseTable) – Table into which the dataframe will be loaded

  • if_exists (astro.constants.LoadExistStrategy) – Strategy to be used in case the target table already exists.

  • chunk_size (int) – Specify the number of rows in each batch to be written at a time.

Return type:

None
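
The interaction between if_exists and chunk_size can be modelled in memory. The sketch below is an illustration only: it uses plain dicts instead of a pandas.DataFrame and a dict instead of a Redshift connection, and the helpers `iter_chunks` and `load_rows` are hypothetical names.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def iter_chunks(rows: Iterable[Row], chunk_size: int) -> Iterator[List[Row]]:
    """Yield successive batches of at most chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, chunk_size))
        if not batch:
            return
        yield batch


def load_rows(store: Dict[str, List[Row]], table_name: str, rows: Iterable[Row],
              if_exists: str = "replace", chunk_size: int = 2) -> int:
    """Write rows into an in-memory 'table' and return the number of batches written."""
    if if_exists not in ("replace", "append"):
        raise ValueError(f"unsupported if_exists value: {if_exists!r}")
    # 'replace' starts from an empty table; 'append' keeps the existing rows.
    if if_exists == "replace" or table_name not in store:
        store[table_name] = []
    batches = 0
    for batch in iter_chunks(rows, chunk_size):
        store[table_name].extend(batch)
        batches += 1
    return batches


store: Dict[str, List[Row]] = {}
batches_written = load_rows(store, "sales", [{"id": i} for i in range(5)], chunk_size=2)
load_rows(store, "sales", [{"id": 99}], if_exists="append")
```

After these two calls the in-memory table holds six rows, written as three batches and then one.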

merge_table(source_table, target_table, source_to_target_columns_map, target_conflict_columns, if_conflicts='exception')

Merge the source table rows into a destination table. The argument if_conflicts allows the user to define how to handle conflicts.

Parameters:
  • source_table (astro.table.BaseTable) – Contains the rows to be merged to the target_table

  • target_table (astro.table.BaseTable) – Contains the destination table in which the rows will be merged

  • source_to_target_columns_map (dict[str, str]) – Dict mapping source_table column names to target_table column names

  • target_conflict_columns (list[str]) – List of columns where we expect to have a conflict while combining

  • if_conflicts (astro.constants.MergeConflictStrategy) – The strategy to be applied if there are conflicts.

Return type:

None
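
To make the roles of the merge arguments concrete, here is an in-memory model of a merge. It is a sketch, not the SQL the library issues: it assumes the map's keys are source column names and its values are target column names, and it assumes the strategies "exception", "ignore" and "update" for if_conflicts. The function `merge_rows` is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def merge_rows(source_rows: List[Row], target_rows: List[Row],
               source_to_target_columns_map: Dict[str, str],
               target_conflict_columns: List[str],
               if_conflicts: str = "exception") -> List[Row]:
    """Merge source rows into target rows, keyed by the conflict columns."""
    if if_conflicts not in ("exception", "ignore", "update"):
        raise ValueError(f"unsupported if_conflicts value: {if_conflicts!r}")

    def key(row: Row) -> Tuple[object, ...]:
        return tuple(row[column] for column in target_conflict_columns)

    merged = {key(row): dict(row) for row in target_rows}
    for source_row in source_rows:
        # Rename source columns to their target names (assumed map direction).
        renamed = {target: source_row[source]
                   for source, target in source_to_target_columns_map.items()}
        row_key = key(renamed)
        if row_key not in merged:
            merged[row_key] = renamed
        elif if_conflicts == "exception":
            raise ValueError(f"conflict on {target_conflict_columns}: {row_key}")
        elif if_conflicts == "update":
            merged[row_key].update(renamed)
        # 'ignore' keeps the existing target row untouched.
    return list(merged.values())


target = [{"id": 1, "price": 10}]
source = [{"item_id": 1, "cost": 12}, {"item_id": 2, "cost": 7}]
column_map = {"item_id": "id", "cost": "price"}

updated = merge_rows(source, target, column_map, ["id"], if_conflicts="update")
ignored = merge_rows(source, target, column_map, ["id"], if_conflicts="ignore")
```

With "update" the conflicting target row takes the source values; with "ignore" it is left as it was, and in both cases the non-conflicting source row is inserted.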

is_native_load_file_available(source_file, target_table)

Check if there is an optimised path for source to destination.

Parameters:
  • source_file (astro.files.File) – File from which we need to transfer data

  • target_table (astro.table.BaseTable) – Table that needs to be populated with file data

Return type:

bool
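
One way to picture this check is as a lookup against the NATIVE_PATHS and NATIVE_PATHS_SUPPORTED_FILE_TYPES attributes. The sketch below is an assumption about their shape: the string keys, the listed file types and the helper `find_native_loader` are all hypothetical, and only the method name load_s3_file_to_table comes from this page.

```python
from typing import Dict, Optional, Set

# Assumed shapes: file location -> loader method name / supported file types.
NATIVE_PATHS: Dict[str, str] = {"s3": "load_s3_file_to_table"}
NATIVE_PATHS_SUPPORTED_FILE_TYPES: Dict[str, Set[str]] = {"s3": {"csv", "parquet"}}


def find_native_loader(file_location: str, file_type: str) -> Optional[str]:
    """Return the loader method name if an optimised path exists, else None."""
    method = NATIVE_PATHS.get(file_location)
    supported = NATIVE_PATHS_SUPPORTED_FILE_TYPES.get(file_location, set())
    if method is not None and file_type in supported:
        return method
    return None


def is_native_load_available(file_location: str, file_type: str) -> bool:
    """Boolean form of the lookup, mirroring the documented return type."""
    return find_native_loader(file_location, file_type) is not None
```

Under these assumed values a CSV file in S3 has a native path, while a file on local disk does not.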

load_file_to_table_natively(source_file, target_table, if_exists='replace', native_support_kwargs=None, **kwargs)

Check whether an optimised path exists for transferring data from the file location to the database. If it does, perform the transfer and return True; otherwise return False.

Parameters:
  • source_file (astro.files.File) – File from which we need to transfer data

  • target_table (astro.table.BaseTable) – Table that needs to be populated with file data

  • if_exists (astro.constants.LoadExistStrategy) – Strategy to be used in case the target table already exists. Default 'replace'

  • native_support_kwargs (dict | None) – kwargs to be used by method involved in native support flow

load_s3_file_to_table(source_file, target_table, native_support_kwargs=None, **kwargs)

Load the content of multiple files in S3 to target_table in Redshift by:
  • Creating a table
  • Using the COPY command

Parameters:
  • source_file (astro.files.File) – Source file that is used as source of data

  • target_table (astro.table.BaseTable) – Table that will be created in Redshift

  • if_exists – Overwrite table if exists. Default ‘replace’

  • native_support_kwargs (dict | None) – kwargs to be used by method involved in native support flow
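
For orientation, a Redshift COPY statement for an S3 source might look like the one built below. This is a sketch of the general COPY syntax, not the statement this method generates: the function `build_copy_statement`, the bucket path and the IAM role ARN are hypothetical, and values are interpolated without escaping, so it is for illustration only.

```python
def build_copy_statement(table: str, s3_path: str, iam_role: str,
                         file_format: str = "CSV", ignore_header: int = 1) -> str:
    """Assemble a COPY statement that loads an S3 object or prefix into a table."""
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role}'",
        f"FORMAT AS {file_format.upper()}",
    ]
    # Header skipping only makes sense for delimited text input such as CSV.
    if file_format.upper() == "CSV" and ignore_header > 0:
        parts.append(f"IGNOREHEADER {ignore_header}")
    return " ".join(parts) + ";"


statement = build_copy_statement(
    table="analytics.sales",                               # hypothetical table
    s3_path="s3://example-bucket/sales/",                  # hypothetical prefix
    iam_role="arn:aws:iam::123456789012:role/ExampleRole"  # hypothetical role
)
print(statement)
```

Pointing FROM at a prefix rather than a single object is what lets one COPY load multiple files.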

static get_merge_initialization_query(parameters)

Handles database-specific constraint logic for Redshift.

Parameters:

parameters (tuple) –

Return type:

str

openlineage_dataset_name(table)

Returns the OpenLineage dataset name as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Example: schema_name.table_name

Parameters:

table (astro.table.BaseTable) –

Return type:

str

openlineage_dataset_namespace()

Returns the OpenLineage dataset namespace as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Example: redshift://cluster:5439

Return type:

str

openlineage_dataset_uri(table)

Returns the OpenLineage dataset URI as per https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md

Parameters:

table (astro.table.BaseTable) –

Return type:

str
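
The three naming methods above can be related with a small sketch. The name and namespace formats follow the examples given on this page; joining them with "/" to form the URI is an assumption, and the function names and sample values are hypothetical.

```python
def dataset_namespace(host: str, port: int = 5439) -> str:
    """Namespace in the form shown above, e.g. redshift://cluster:5439."""
    return f"redshift://{host}:{port}"


def dataset_name(schema_name: str, table_name: str) -> str:
    """Dataset name in the form shown above: schema_name.table_name."""
    return f"{schema_name}.{table_name}"


def dataset_uri(host: str, schema_name: str, table_name: str, port: int = 5439) -> str:
    """Assumed composition: namespace and name joined by a slash."""
    return f"{dataset_namespace(host, port)}/{dataset_name(schema_name, table_name)}"


uri = dataset_uri("cluster", "public", "sales")
print(uri)
```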