datapyground.compute.datasources¶
Query Plan nodes that load data
The datasource nodes are expected to fetch the data from some source, convert it into the format accepted by the compute engine and forward it to the next node in the plan.
They are used to do things like loading data from CSV files or equivalent operations
Classes
- class datapyground.compute.datasources.CSVDataSource(filename: str, block_size: int | None = None)[source]¶
Load data from a CSV file.
Given a local CSV file path, load the content, convert it into Arrow format, and emit it for the next nodes of the query plan to consume.
- Parameters:
filename – The path of the local CSV file.
block_size – How big to make batches of data, Influences how many batches will be produced
- class datapyground.compute.datasources.DataSourceNode[source]¶
Base class for nodes that load data from a source.
- class datapyground.compute.datasources.ParquetDataSource(filename: str, batch_size: int | None = None)[source]¶
Load data from a Parquet file.
Given a local parquet file path, load the content, convert it into Arrow format, and emit it for the next nodes of the query plan to consume.
- Parameters:
filename – The path of the local parquet file.
batch_size – How big to make batches of data, Influences how many batches will be produced
- class datapyground.compute.datasources.PyArrowTableDataSource(table: Table | RecordBatch)[source]¶
Load data from an in-memory pyarrow.Table or pyarrow.RecordBatch.
Given a
pyarrow.Tableor pyarrow.RecordBatch object, allow to use its data in a query plan.- Parameters:
table – The table or recordbatch with the data to read.