astro.files.types
Submodules
Package Contents
Functions
create_file_type(path, filetype=None, normalize_config=None) | Factory method that creates a FileType object based on the file extension in path, or on the filetype if one is specified.
get_filetype(filepath) | Return a FileType given the filepath. Uses a naive strategy, using the file extension.
Classes
FileTypeConstants | Generic enumeration.
FileType | Abstract File type class, meant to be the interface to all client code for all supported file types
CSVFileType | Concrete implementation to handle CSV file type
JSONFileType | Concrete implementation to handle JSON file type
NDJSONFileType | Concrete implementation to handle NDJSON file type
ParquetFileType | Concrete implementation to handle Parquet file type
- astro.files.types.create_file_type(path, filetype=None, normalize_config=None)
Factory method that creates a FileType object based on the file extension in path, or on the filetype if one is specified.
- Parameters
path (str) –
filetype (Union[astro.constants.FileType, None]) –
normalize_config (Optional[dict]) –
- Return type
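The factory behaviour described above can be illustrated with a minimal, standalone sketch. It does not import astro and is not the library's implementation: the `Sketch*` classes, the `_HANDLERS` mapping, and `sketch_create_file_type` are hypothetical names, and the rule that an explicit `filetype` takes precedence over the extension is an assumption.

```python
import os
from typing import Optional


class SketchFileType:
    """Hypothetical stand-in for the FileType interface (not astro's class)."""

    def __init__(self, path: str, normalize_config: Optional[dict] = None):
        self.path = path
        self.normalize_config = normalize_config


class SketchCSV(SketchFileType):
    pass


class SketchJSON(SketchFileType):
    pass


class SketchNDJSON(SketchFileType):
    pass


class SketchParquet(SketchFileType):
    pass


# Hypothetical lookup table from a type key to the class that handles it.
_HANDLERS = {
    "csv": SketchCSV,
    "json": SketchJSON,
    "ndjson": SketchNDJSON,
    "parquet": SketchParquet,
}


def sketch_create_file_type(path, filetype=None, normalize_config=None):
    """Pick a handler from an explicit type key, else from the extension."""
    if filetype is not None:
        key = filetype  # assumption: an explicit filetype wins
    else:
        key = os.path.splitext(path)[1].lstrip(".").lower()
    try:
        handler = _HANDLERS[key]
    except KeyError:
        raise ValueError(f"Unsupported file type: {key!r}") from None
    return handler(path, normalize_config)


by_extension = sketch_create_file_type("data/users.csv")
by_override = sketch_create_file_type("exports/blob", filetype="parquet")
```

Passing `filetype` is what makes a path without a usable extension, such as `exports/blob`, resolvable in this sketch.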
- astro.files.types.get_filetype(filepath)
Return a FileType given the filepath. Uses a naive strategy, using the file extension.
- Parameters
filepath (str or pathlib.PosixPath) – URI or path to a file
- Returns
The filetype (e.g. csv, ndjson, json, parquet)
- Return type
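As a rough illustration of the extension-based strategy, the sketch below accepts either a string (including a URI) or a pathlib path and maps the extension to an enum member. It is a standalone approximation: `SketchFileTypes` and `sketch_get_filetype` are hypothetical names, and the lower-casing of the extension and the handling of URI query strings are assumptions rather than documented behaviour.

```python
from enum import Enum
from pathlib import PurePosixPath
from urllib.parse import urlparse


class SketchFileTypes(Enum):
    """Hypothetical enum mirroring the values listed for the supported types."""

    CSV = "csv"
    JSON = "json"
    NDJSON = "ndjson"
    PARQUET = "parquet"


def sketch_get_filetype(filepath):
    """Guess the file type from the extension only; the content is never read."""
    # str() lets pathlib objects and plain strings share one code path;
    # urlparse() drops the scheme, bucket/host and any query string of a URI.
    url_path = urlparse(str(filepath)).path
    extension = PurePosixPath(url_path).suffix.lstrip(".").lower()
    try:
        return SketchFileTypes(extension)
    except ValueError:
        raise ValueError(
            f"Unsupported or missing extension {extension!r} in {str(filepath)!r}"
        ) from None


from_uri = sketch_get_filetype("s3://bucket/raw/events.ndjson")
from_path = sketch_get_filetype(PurePosixPath("/tmp/report.parquet"))
```

The strategy is naive in the sense that a file named `data.csv` is treated as CSV whatever it contains, and a path with no extension cannot be classified at all.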
- class astro.files.types.FileTypeConstants
Bases:
enum.Enum
Generic enumeration.
Derive from this class to define new enumerations.
- CSV = csv
- JSON = json
- NDJSON = ndjson
- PARQUET = parquet
- class astro.files.types.FileType(path, normalize_config=None)
Bases:
abc.ABC
Abstract File type class, meant to be the interface to all client code for all supported file types
- Parameters
path (str) –
normalize_config (Optional[dict]) –
- abstract export_to_dataframe(stream, **kwargs)
Read a file from one of the supported locations and return a dataframe
- Parameters
stream – file stream object
- Return type
pandas.DataFrame
- abstract create_from_dataframe(df, stream)
Write a file to one of the supported locations
- Parameters
df (pandas.DataFrame) – pandas dataframe
stream (io.TextIOWrapper) – file stream object
- Return type
None
- property name
Get the file type
- __str__()
String representation of type
- __repr__()
Return repr(self).
- Return type
str
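The interface above can be sketched as an abstract base class with one concrete subclass. This is a standalone approximation that only assumes pandas is installed: `SketchFileType` and `SketchCSVFileType` are hypothetical names, and delegating to `pandas.read_csv` and `DataFrame.to_csv`, as well as making `name` abstract, are assumptions about how such a handler could be written, not a copy of the library's code.

```python
import io
from abc import ABC, abstractmethod
from typing import Optional

import pandas as pd


class SketchFileType(ABC):
    """Hypothetical mirror of the abstract interface described above."""

    def __init__(self, path: str, normalize_config: Optional[dict] = None):
        self.path = path
        self.normalize_config = normalize_config

    @abstractmethod
    def export_to_dataframe(self, stream, **kwargs) -> pd.DataFrame:
        """Read the stream and return its content as a dataframe."""

    @abstractmethod
    def create_from_dataframe(self, df: pd.DataFrame, stream) -> None:
        """Serialize the dataframe into the stream."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Short identifier of the file type."""

    def __str__(self) -> str:
        return self.name


class SketchCSVFileType(SketchFileType):
    """Hypothetical CSV handler built on pandas."""

    def export_to_dataframe(self, stream, **kwargs) -> pd.DataFrame:
        return pd.read_csv(stream, **kwargs)

    def create_from_dataframe(self, df: pd.DataFrame, stream) -> None:
        df.to_csv(stream, index=False)  # omit the index so a round trip matches

    @property
    def name(self) -> str:
        return "csv"


# Round trip through an in-memory text stream instead of a real location.
handler = SketchCSVFileType("in-memory.csv")
frame = pd.DataFrame({"id": [1, 2], "score": [10, 20]})
buffer = io.StringIO()
handler.create_from_dataframe(frame, buffer)
buffer.seek(0)
round_trip = handler.export_to_dataframe(buffer)
```

Because both abstract methods take an already opened stream, client code in this sketch never needs to know where the file lives; only the serialization format differs between subclasses.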
- class astro.files.types.CSVFileType(path, normalize_config=None)
Bases:
astro.files.types.base.FileType
Concrete implementation to handle CSV file type
- Parameters
path (str) –
normalize_config (Optional[dict]) –
- export_to_dataframe(stream, **kwargs)
Read a CSV file from one of the supported locations and return a dataframe
- Parameters
stream – file stream object
- Return type
pandas.DataFrame
- create_from_dataframe(df, stream)
Write a CSV file to one of the supported locations
- Parameters
df (pandas.DataFrame) – pandas dataframe
stream (io.TextIOWrapper) – file stream object
- Return type
None
- property name
Get the file type
- class astro.files.types.JSONFileType(path, normalize_config=None)
Bases:
astro.files.types.base.FileType
Concrete implementation to handle JSON file type
- Parameters
path (str) –
normalize_config (Optional[dict]) –
- export_to_dataframe(stream, **kwargs)
Read a JSON file from one of the supported locations and return a dataframe
- Parameters
stream (io.TextIOWrapper) – file stream object
- create_from_dataframe(df, stream)
Write a JSON file to one of the supported locations
- Parameters
df (pandas.DataFrame) – pandas dataframe
stream (io.TextIOWrapper) – file stream object
- Return type
None
- property name
Get the file type
- class astro.files.types.NDJSONFileType(path, normalize_config=None)
Bases:
astro.files.types.base.FileType
Concrete implementation to handle NDJSON file type
- Parameters
path (str) –
normalize_config (Optional[dict]) –
- export_to_dataframe(stream, **kwargs)
Read an NDJSON file from one of the supported locations and return a dataframe
- Parameters
stream – file stream object
- create_from_dataframe(df, stream)
Write an NDJSON file to one of the supported locations
- Parameters
df (pandas.DataFrame) – pandas dataframe
stream (io.TextIOWrapper) – file stream object
- Return type
None
- property name
Get the file type
- static flatten(normalize_config, stream)
Flatten nested NDJSON/JSON data.
- Parameters
normalize_config (dict) – keyword arguments for the pandas json_normalize() function, given as a dict. See https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html
stream (io.TextIOWrapper) – io.TextIOWrapper object for the file
- Returns
Dataframe containing the loaded data
- Return type
pandas.DataFrame
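To make the role of normalize_config concrete, here is a minimal sketch of flattening an NDJSON stream with `pandas.json_normalize`. It assumes pandas is installed; `sketch_flatten` is a hypothetical helper, and parsing the stream one line at a time with `json.loads` is an assumption about how the records could be loaded, not the library's code.

```python
import io
import json

import pandas as pd


def sketch_flatten(normalize_config, stream):
    """Parse one JSON object per line, then flatten nested keys into columns."""
    normalize_config = normalize_config or {}
    # Skip blank lines so a trailing newline does not break parsing.
    rows = [json.loads(line) for line in stream if line.strip()]
    # The dict is forwarded as keyword arguments, e.g. sep or max_level.
    return pd.json_normalize(rows, **normalize_config)


ndjson_text = (
    '{"id": 1, "user": {"name": "ada", "geo": {"city": "London"}}}\n'
    '{"id": 2, "user": {"name": "lin", "geo": {"city": "Taipei"}}}\n'
)
flat = sketch_flatten({"sep": "_"}, io.StringIO(ndjson_text))
```

With `sep` set to `"_"`, the nested keys in this example become the columns `user_name` and `user_geo_city` next to `id`; other json_normalize options such as `max_level` can be passed through the same dict.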
- class astro.files.types.ParquetFileType(path, normalize_config=None)
Bases:
astro.files.types.base.FileType
Concrete implementation to handle Parquet file type
- Parameters
path (str) –
normalize_config (Optional[dict]) –
- export_to_dataframe(stream, **kwargs)
Read a Parquet file from one of the supported locations and return a dataframe
- Parameters
stream – file stream object
- create_from_dataframe(df, stream)
Write a Parquet file to one of the supported locations
- Parameters
df (pandas.DataFrame) – pandas dataframe
stream (io.TextIOWrapper) – file stream object
- Return type
None
- property name
Get the file type