astro.files.base
Module Contents
Classes
File – Handle all file operations, and abstract away the details related to location and file types.
Functions
resolve_file_path_pattern – Get file objects by resolving path_pattern from local/object stores.
- class astro.files.base.File(path, conn_id=None, filetype=None, normalize_config=None)
Handle all file operations, and abstract away the details related to location and file types. Intended for use within the library.
- Parameters
path (str) –
conn_id (str | None) –
filetype (constants.FileType | None) –
normalize_config (dict | None) –
- template_fields = ['location']
- property path
- Return type
str
- property conn_id
- Return type
str | None
- property size
Return the size in bytes of the given file.
- Returns
File size in bytes
- Return type
int
- is_binary()
Return whether the file is in a binary format. Uses a naive strategy, relying on the file type inferred from the file extension.
- Returns
True or False
- Return type
bool
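The extension-based strategy described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the library's implementation: the `FileType` enum here is a hypothetical stand-in for `constants.FileType`, and the choice of which types count as binary (`BINARY_TYPES`) is an assumption made for the example.

```python
from enum import Enum
from pathlib import PurePosixPath


class FileType(Enum):
    # Hypothetical stand-in for constants.FileType; members are illustrative.
    CSV = "csv"
    JSON = "json"
    NDJSON = "ndjson"
    PARQUET = "parquet"


# Assumption for this sketch: only Parquet is treated as a binary format.
BINARY_TYPES = {FileType.PARQUET}


def guess_filetype(path: str) -> FileType:
    """Infer the file type from the extension only; the content is never read."""
    extension = PurePosixPath(path).suffix.lstrip(".").lower()
    try:
        return FileType(extension)
    except ValueError:
        raise ValueError(f"Unsupported file extension: {extension!r}") from None


def is_binary(path: str) -> bool:
    """Return True when the extension maps to a type listed in BINARY_TYPES."""
    return guess_filetype(path) in BINARY_TYPES
```

The strategy is "naive" in the sense that a misnamed file (for example, a Parquet file saved as `.csv`) would be classified by its name rather than by its content.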
- create_from_dataframe(df)
Create a file in the desired location using the values of a dataframe.
- Parameters
df (pandas.DataFrame) – pandas dataframe
- Return type
None
- export_to_dataframe(**kwargs)
Read the file from any supported location and convert it into a dataframe.
Due to noted issues with using smart_open with pandas (like https://github.com/RaRe-Technologies/smart_open/issues/524), we create a BytesIO or StringIO buffer before exporting to a dataframe. We’ve found a sizable speed improvement with this optimization.
- Return type
pandas.DataFrame
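The buffering idea can be illustrated with a small stdlib-only sketch. It is an assumption-laden stand-in, not the library's code: `read_rows_via_buffer` is a hypothetical helper, and `csv.DictReader` takes the place of the pandas reader so the example runs without third-party packages. With pandas, the same in-memory buffer could be handed to a reader such as `pandas.read_csv`, which accepts file-like objects.

```python
import csv
import io
from typing import BinaryIO, Dict, List


def read_rows_via_buffer(stream: BinaryIO, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Copy the whole stream into memory first, then parse from the buffer."""
    # One read() drains the (possibly remote) stream, so the parser below
    # only ever touches an in-memory buffer instead of the original stream.
    buffer = io.BytesIO(stream.read())
    # Wrap the bytes buffer so the CSV parser receives decoded text.
    text = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    return list(csv.DictReader(text))
```

The trade-off is memory: the entire file is held in the buffer before parsing, which suits files that fit comfortably in RAM.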
- exists()
Check whether the file exists.
- Return type
bool
- astro.files.base.resolve_file_path_pattern(path_pattern, conn_id=None, filetype=None, normalize_config=None)
Get file objects by resolving path_pattern from local/object stores. path_pattern can be:
1. local location - glob pattern
2. s3/gcs location - prefix
- Parameters
path_pattern (str) – path or pattern to a file in the filesystem or object stores; supports glob patterns for local locations and prefix patterns for object stores
conn_id (str | None) – Airflow connection ID
filetype (constants.FileType | None) – constant to provide an explicit file type
normalize_config (dict | None) – parameters for the pandas json_normalize() function, in dict format
- Return type
list[File]
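The two resolution modes (glob for local paths, prefix for object stores) can be sketched as below. This is a simplified illustration under stated assumptions, not the library's implementation: `resolve_paths` is a hypothetical helper that returns plain path strings rather than `File` objects, the set of object-store schemes is assumed, and the `listing` argument stands in for whatever an object-store client would return when listing a bucket.

```python
import glob
from typing import Iterable, List
from urllib.parse import urlparse

# Assumption for this sketch: URI schemes treated as object stores.
OBJECT_STORE_SCHEMES = {"s3", "gs"}


def resolve_paths(path_pattern: str, listing: Iterable[str] = ()) -> List[str]:
    """Expand a pattern into concrete paths.

    Local patterns are expanded with glob; object-store patterns are treated
    as prefixes and matched against `listing`, a stand-in for the URIs a
    store client would list.
    """
    scheme = urlparse(path_pattern).scheme
    if scheme in OBJECT_STORE_SCHEMES:
        # Prefix match: keep every listed URI that starts with the pattern.
        return sorted(uri for uri in listing if uri.startswith(path_pattern))
    # Local filesystem: standard glob expansion.
    return sorted(glob.glob(path_pattern))
```

For example, under these assumptions a call with `"s3://bucket/data/2023-"` and a listing of bucket URIs would keep only the URIs that begin with that prefix, while a call with `"/tmp/exports/*.csv"` would expand the wildcard against the local filesystem.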