Table
- class ducklake.Table
A DuckLake table.
Methods:
add_column – Add a new column to the table.
add_column_tag – Add a new tag to a column.
add_tag – Add a new tag to the table.
delete – Delete the table from the catalog.
merge_adjacent_files – Merge small adjacent data files of this table into larger ones.
read_arrow – Read the full contents of the table as a PyArrow table.
remove_column – Remove a column from the table.
remove_column_tag – Remove an existing tag from a column.
remove_tag – Remove an existing tag from the table.
rename – Rename the table in the catalog.
rename_column – Rename a column in the table.
rewrite_data_files – Rewrite data files of this table that have a high fraction of deleted rows.
scan – Scan the table and return all data files with their associated delete files.
scan_duckdb – Read the full contents of the table as a DuckDB relation.
set_metadata – Set one or more metadata options for this table.
update_column_default – Update the default value of a column.
update_column_dtype – Update the data type of the provided column.
update_column_nullability – Update the nullability of a column.
update_partitioning – Update the partitioning of this table.
update_schema – Update the full schema of the table.
Append the provided data to the table.
Attributes:
The metadata associated with the table.
The fully qualified name of the table.
The partitioning of the table, if any.
The schema of the table.
The tags associated with the table.
- add_column(column: Column) -> None
Add a new column to the table.
- Parameters:
column – The column to add.
- add_column_tag(column: str | Sequence[str], key: str, value: str) -> None
Add a new tag to a column.
- Parameters:
column – The name of the column to add the tag to. This may be provided as a “path” to a nested column.
key – The key of the tag.
value – The value of the tag.
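The `str | Sequence[str]` type of `column` can be illustrated with a small sketch. It assumes that a "path" lists field names from the top-level column down to the nested field. The helper `normalize_column_path` and the column names are hypothetical and not part of ducklake.

```python
from __future__ import annotations

from collections.abc import Sequence


def normalize_column_path(column: str | Sequence[str]) -> tuple[str, ...]:
    """Return the column reference as a tuple of path segments."""
    # A bare string names a top-level column.
    if isinstance(column, str):
        return (column,)
    path = tuple(column)
    if not path:
        raise ValueError("column path must contain at least one name")
    return path


# Hypothetical column names, for illustration only.
print(normalize_column_path("address"))            # ('address',)
print(normalize_column_path(["address", "city"]))  # ('address', 'city')
```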
- add_tag(key: str, value: str) -> None
Add a new tag to the table.
- Parameters:
key – The key of the tag.
value – The value of the tag.
- delete() -> None
Delete the table from the catalog.
After calling this method, the Table object is no longer valid.
- merge_adjacent_files(*, max_compacted_files: int | None = None, min_file_size: int | None = None, max_file_size: int | None = None)
Merge small adjacent data files of this table into larger ones.
Dispatches to ducklake_merge_adjacent_files scoped to this table.
- Parameters:
max_compacted_files – Maximum number of compaction operations produced in a single call.
min_file_size – Excludes files smaller than this many bytes from compaction.
max_file_size – Excludes files at or larger than this many bytes from compaction. Defaults to the target_file_size table option.
- Returns:
A row for each output file created by the operation.
Note
This requires duckdb to be installed.
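The two size bounds can be read as a half-open interval: a file stays eligible when `min_file_size <= size < max_file_size`. The following is a minimal sketch of that reading only, not the actual selection logic of ducklake_merge_adjacent_files. The helper `compaction_candidates` is hypothetical, and here `None` simply means "no bound", whereas the real `max_file_size` falls back to the target_file_size table option.

```python
from __future__ import annotations


def compaction_candidates(
    file_sizes: list[int],
    *,
    min_file_size: int | None = None,
    max_file_size: int | None = None,
) -> list[int]:
    """Return the sizes (in bytes) of files that remain eligible for compaction."""
    eligible = []
    for size in file_sizes:
        # Files smaller than min_file_size are excluded.
        if min_file_size is not None and size < min_file_size:
            continue
        # Files at or above max_file_size are excluded.
        if max_file_size is not None and size >= max_file_size:
            continue
        eligible.append(size)
    return eligible


sizes = [100, 5_000, 20_000, 1_000_000]
print(compaction_candidates(sizes, min_file_size=1_000, max_file_size=1_000_000))
# [5000, 20000]
```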
- property metadata: TableMetadata
The metadata associated with the table.
- property partitioning: Partitioning | None
The partitioning of the table, if any.
- read_arrow() -> pa.Table
Read the full contents of the table as a PyArrow table.
- Returns:
The PyArrow table containing the data.
Note
This requires pyarrow and duckdb to be installed.
- remove_column(column: str) -> None
Remove a column from the table.
- Parameters:
column – The name of the column to remove.
- remove_column_tag(column: str | Sequence[str], key: str) -> None
Remove an existing tag from a column.
- Parameters:
column – The name of the column to remove the tag from. This may be provided as a “path” to a nested column.
key – The key of the tag.
- Raises:
ValueError – If no tag for the provided key exists.
- remove_tag(key: str) -> None
Remove an existing tag from the table.
- Parameters:
key – The key of the tag.
- Raises:
ValueError – If no tag for the provided key exists.
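The error contract shared by remove_tag and remove_column_tag can be modelled with a plain dictionary. This is only a sketch of the documented behaviour, assuming tags behave like a key-to-value mapping. The helper `remove_existing_tag` and the tag names are hypothetical.

```python
def remove_existing_tag(tags: dict[str, str], key: str) -> None:
    """Remove `key` from `tags`, raising ValueError if it is absent."""
    if key not in tags:
        raise ValueError(f"no tag with key {key!r}")
    del tags[key]


# Hypothetical tags, for illustration only.
tags = {"owner": "analytics", "tier": "gold"}
remove_existing_tag(tags, "tier")
print(tags)  # {'owner': 'analytics'}

try:
    remove_existing_tag(tags, "tier")  # the tag is already gone
except ValueError as exc:
    print(f"ValueError: {exc}")
```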
- rename(new_name: str) -> None
Rename the table in the catalog.
- Parameters:
new_name – The new name for the table.
Note
This operation does not affect the schema the table resides in. It is not currently possible to move a table to a different schema.
- rename_column(column: str, new_name: str) -> None
Rename a column in the table.
- Parameters:
column – The current name of the column to rename.
new_name – The new name for the column.
- rewrite_data_files( ) -> list[MaintenanceResult]
Rewrite data files of this table that have a high fraction of deleted rows.
Dispatches to ducklake_rewrite_data_files scoped to this table.
- Parameters:
delete_threshold – Minimum fraction (0-1) of deleted rows required to trigger a rewrite. Defaults to the rewrite_delete_threshold metadata option (0.95).
- Returns:
A row for each output file created by the operation.
Note
This requires duckdb to be installed.
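The meaning of `delete_threshold` can be shown with a short calculation. The sketch below assumes the comparison is inclusive ("minimum fraction"), which the text above suggests but does not state outright. The helper `meets_delete_threshold` is hypothetical and does not reflect how ducklake_rewrite_data_files is implemented.

```python
def meets_delete_threshold(
    total_rows: int,
    deleted_rows: int,
    delete_threshold: float = 0.95,
) -> bool:
    """Return True if the deleted fraction reaches the threshold."""
    if total_rows <= 0:
        # An empty file has no meaningful deleted fraction.
        return False
    return deleted_rows / total_rows >= delete_threshold


print(meets_delete_threshold(1000, 960))       # True: 0.96 >= 0.95
print(meets_delete_threshold(1000, 500))       # False: 0.50 < 0.95
print(meets_delete_threshold(1000, 500, 0.5))  # True: 0.50 >= 0.50
```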
- scan() -> ScanResult
Scan the table and return all data files with their associated delete files.
- scan_duckdb() -> duckdb.DuckDBPyRelation
Read the full contents of the table as a DuckDB relation.
- Returns:
The DuckDB relation containing the data.
- set_metadata(**options: Unpack[TableMetadataUpdate])
Set one or more metadata options for this table.
Provide options as keyword arguments. Pass None as a value to remove the option from the metadata (i.e. revert it to its default).
- Raises:
ValueError – If a key is read-only and cannot be set.
See also
Ducklake.set_metadata() for setting metadata options at the global or schema scope.
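The keyword-argument semantics (a value sets the option, None removes it, a read-only key raises) can be modelled on a plain dictionary. This is a sketch only: `apply_metadata_update`, the read-only key, and the option value formats are assumptions for illustration and say nothing about how ducklake stores metadata.

```python
from __future__ import annotations

# Hypothetical read-only key, used only to demonstrate the error path.
READ_ONLY_KEYS = frozenset({"example_read_only_option"})


def apply_metadata_update(current: dict[str, object], **options: object) -> dict[str, object]:
    """Return a copy of `current` with the keyword options applied."""
    updated = dict(current)
    for key, value in options.items():
        if key in READ_ONLY_KEYS:
            raise ValueError(f"option {key!r} is read-only")
        if value is None:
            # None removes the option, so the default applies again.
            updated.pop(key, None)
        else:
            updated[key] = value
    return updated


# The value format "512MB" is an assumption for illustration.
options = {"target_file_size": "512MB"}
print(apply_metadata_update(options, target_file_size=None, rewrite_delete_threshold=0.9))
# {'rewrite_delete_threshold': 0.9}
```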
- update_column_default( ) -> None
Update the default value of a column.
The default value can be a literal value, or an expression specified as a (dialect, expression) tuple. Pass None to remove the default.
- Parameters:
column – The column for which to change the default value.
default_value – The new default value.
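The three accepted shapes of `default_value` (literal, `(dialect, expression)` tuple, or None) can be told apart as sketched below. The helper `describe_default` is hypothetical, and the dialect name "duckdb" and the expression "now()" are assumed values chosen for illustration.

```python
def describe_default(default_value: object) -> str:
    """Classify a default value as removed, an expression, or a literal."""
    if default_value is None:
        return "no default"
    is_expression = (
        isinstance(default_value, tuple)
        and len(default_value) == 2
        and all(isinstance(part, str) for part in default_value)
    )
    if is_expression:
        dialect, expression = default_value
        return f"expression {expression!r} in dialect {dialect!r}"
    return f"literal {default_value!r}"


print(describe_default(0))                    # literal 0
print(describe_default(("duckdb", "now()")))  # expression 'now()' in dialect 'duckdb'
print(describe_default(None))                 # no default
```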
- update_column_dtype(column: str, new_dtype: DataType) -> None
Update the data type of the provided column.
Generally speaking, data types can only be changed via type promotion. For example, integers can be turned into larger integers.
For struct columns, updating the data type allows adding and dropping fields.
- Parameters:
column – The column for which to change the data type.
new_dtype – The new data type of the column.
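The integer example of type promotion can be sketched as a width comparison. The type names below are placeholder strings rather than ducklake DataType objects, and `is_integer_widening` is a hypothetical helper. The library defines the authoritative promotion rules.

```python
# Placeholder type names mapped to their bit widths.
INTEGER_WIDTHS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def is_integer_widening(old: str, new: str) -> bool:
    """Return True if `new` is a strictly wider integer type than `old`."""
    if old not in INTEGER_WIDTHS or new not in INTEGER_WIDTHS:
        return False
    return INTEGER_WIDTHS[new] > INTEGER_WIDTHS[old]


print(is_integer_widening("int32", "int64"))  # True: widening
print(is_integer_widening("int64", "int32"))  # False: narrowing
```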
- update_column_nullability(column: str, nullable: bool) -> None
Update the nullability of a column.
- Parameters:
column – The column for which to change the nullability.
nullable – Whether the column should allow null values.
- update_partitioning(partitioning: Partitioning | None) -> None
Update the partitioning of this table.
- Parameters:
partitioning – The new partitioning. If None is provided, the partitioning of the table is reset.
Note
This is a metadata-only operation which does not rewrite data files. As a result, queries might not be fully optimized.
- update_schema(schema: Schema | Sequence[Column] | Mapping[str, DataType]) -> None
Update the full schema of the table.
This is a convenience function that makes it easy to add and remove multiple columns and to change the data types of existing columns in a single call.
- Parameters:
schema – The new schema of the table.
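Replacing a full schema amounts to some combination of added, removed, and retyped columns. The sketch below computes that difference for two name-to-type mappings. It illustrates the idea only: how update_schema derives its changes is not specified here, `diff_schema` is a hypothetical helper, and the type names are placeholder strings.

```python
def diff_schema(current: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Return the column names that are added, removed, or retyped."""
    added = [name for name in target if name not in current]
    removed = [name for name in current if name not in target]
    changed = [
        name
        for name in target
        if name in current and current[name] != target[name]
    ]
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical schemas, for illustration only.
current = {"id": "int32", "name": "string", "legacy": "string"}
target = {"id": "int64", "name": "string", "email": "string"}
print(diff_schema(current, target))
# {'added': ['email'], 'removed': ['legacy'], 'changed': ['id']}
```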