Table

class ducklake.Table

A DuckLake table.

Methods:

`add_column`	Add a new column to the table.
`add_column_tag`	Add a new tag to a column.
`add_tag`	Add a new tag to the table.
`delete`	Delete the table from the catalog.
`merge_adjacent_files`	Merge small adjacent data files of this table into larger ones.
`read_arrow`	Read the full contents of the table as a PyArrow table.
`remove_column`	Remove a column from the table.
`remove_column_tag`	Remove an existing tag from a column.
`remove_tag`	Remove an existing tag from the table.
`rename`	Rename the table in the catalog.
`rename_column`	Rename a column in the table.
`rewrite_data_files`	Rewrite data files of this table that have a high fraction of deleted rows.
`scan`	Scan the table and return all data files with their associated delete files.
`scan_duckdb`	Read the full contents of the table as a DuckDB relation.
`set_metadata`	Set one or more metadata options for this table.
`update_column_default`	Update the default value of a column.
`update_column_dtype`	Update the data type of the provided column.
`update_column_nullability`	Update the nullability of a column.
`update_partitioning`	Update the partitioning of this table.
`update_schema`	Update the full schema of the table.
`write_arrow`	Append the provided data to the table.

Attributes:

`metadata`	The metadata associated with the table.
`name`	The fully qualified name of the table.
`partitioning`	The partitioning of the table, if any.
`schema`	The schema of the table.
`tags`	The tags associated with the table.

add_column(column: Column) → None

Add a new column to the table.

Parameters:

column – The column to add.

add_column_tag(column: str | Sequence[str], key: str, value: str) → None

Add a new tag to a column.

Parameters:

column – The name of the column to add the tag to. This may be provided as a “path” to a nested column.
key – The key of the tag.
value – The value of the tag.

add_tag(key: str, value: str) → None

Add a new tag to the table.

Parameters:

key – The key of the tag.
value – The value of the tag.
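Example (a minimal sketch, not part of the ducklake API): the hypothetical helper `tag_table_and_columns` applies table-level tags with `add_tag` and column-level tags with `add_column_tag`. A nested column is addressed by passing its path as a tuple of names. The tag keys and column names in the usage comment are illustrative assumptions.

```python
from typing import TYPE_CHECKING, Mapping, Tuple, Union

if TYPE_CHECKING:
    from ducklake import Table

# A column is either a top-level name or a path of names to a nested column.
ColumnRef = Union[str, Tuple[str, ...]]


def tag_table_and_columns(
    table: "Table",
    table_tags: Mapping[str, str],
    column_tags: Mapping[ColumnRef, Mapping[str, str]],
) -> None:
    """Hypothetical helper: apply table-level tags, then column-level tags."""
    for key, value in table_tags.items():
        table.add_tag(key, value)
    for column, tags in column_tags.items():
        for key, value in tags.items():
            # A tuple such as ("address", "city") is passed through unchanged
            # as the path to a nested column.
            table.add_column_tag(column, key, value)


# Usage (column names and tag keys are illustrative):
# tag_table_and_columns(
#     table,
#     table_tags={"owner": "analytics"},
#     column_tags={"email": {"pii": "true"}, ("address", "city"): {"pii": "true"}},
# )
```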

delete() → None

Delete the table from the catalog.

After calling this method, the Table object is no longer valid.

merge_adjacent_files(*, max_compacted_files: int | None = None, min_file_size: int | None = None, max_file_size: int | None = None) → list[MaintenanceResult]

Merge small adjacent data files of this table into larger ones.

This dispatches to the DuckLake `ducklake_merge_adjacent_files` function, scoped to this table.

Parameters:

max_compacted_files – Maximum number of compaction operations produced in a single call.
min_file_size – Excludes files smaller than this many bytes from compaction.
max_file_size – Excludes files at or larger than this many bytes from compaction. Defaults to the target_file_size table option.

Returns:

A row for each output file created by the operation.

Note

This requires duckdb to be installed.
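Example (a minimal sketch; `compact_table` and `MIB` are hypothetical names, and the 64 MiB bound in the usage comment is an arbitrary illustration). Because files smaller than `min_file_size` and files at or above `max_file_size` are both excluded, the helper rejects bounds that would leave no file eligible before calling `merge_adjacent_files`.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    from ducklake import Table

MIB = 1024 * 1024


def compact_table(
    table: "Table",
    *,
    min_file_size: Optional[int] = None,
    max_file_size: Optional[int] = None,
    max_compacted_files: Optional[int] = None,
) -> int:
    """Hypothetical helper: merge adjacent files, return the output file count."""
    if (
        min_file_size is not None
        and max_file_size is not None
        and min_file_size >= max_file_size
    ):
        # Files below min_file_size and files at or above max_file_size are
        # excluded, so these bounds would leave no file eligible.
        raise ValueError("min_file_size must be smaller than max_file_size")
    results = table.merge_adjacent_files(
        max_compacted_files=max_compacted_files,
        min_file_size=min_file_size,
        # None falls back to the target_file_size table option.
        max_file_size=max_file_size,
    )
    # The method returns one result per output file it created.
    return len(results)


# Usage:
# written = compact_table(table, max_file_size=64 * MIB)
```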

property metadata: TableMetadata

The metadata associated with the table.

property name: TableName

The fully qualified name of the table.

property partitioning: Partitioning | None

The partitioning of the table, if any.

read_arrow() → pa.Table

Read the full contents of the table as a PyArrow table.

Returns:

The PyArrow table containing the data.

Note

This requires pyarrow and duckdb to be installed.
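Example (a minimal sketch; `describe_table` is a hypothetical helper). It reads the table with `read_arrow` and reports the row count and column names through the standard `pyarrow.Table` attributes `num_rows` and `column_names`. Since the whole table is materialized in memory, this suits small tables; for larger ones, `scan_duckdb` may be preferable because DuckDB can filter or aggregate before results reach Python.

```python
from typing import TYPE_CHECKING, List, Tuple

if TYPE_CHECKING:
    from ducklake import Table


def describe_table(table: "Table") -> Tuple[int, List[str]]:
    """Hypothetical helper: return the row count and column names."""
    # read_arrow() loads the full contents of the table into memory.
    data = table.read_arrow()
    return data.num_rows, data.column_names
```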

remove_column(column: str) → None

Remove a column from the table.

Parameters:

column – The name of the column to remove.

remove_column_tag(column: str | Sequence[str], key: str) → None

Remove an existing tag from a column.

Parameters:

column – The name of the column to remove the tag from. This may be provided as a “path” to a nested column.
key – The key of the tag.

Raises:

ValueError – If no tag for the provided key exists.

remove_tag(key: str) → None

Remove an existing tag from the table.

Parameters:

key – The key of the tag.

Raises:

ValueError – If no tag for the provided key exists.
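Example (a minimal sketch; both helper names are hypothetical). Because `remove_tag` and `remove_column_tag` raise `ValueError` when the tag does not exist, an idempotent removal can catch that exception and report whether anything was removed. Note that catching `ValueError` this broadly would also swallow any other `ValueError` raised by the call.

```python
from typing import TYPE_CHECKING, Sequence, Union

if TYPE_CHECKING:
    from ducklake import Table


def remove_tag_if_present(table: "Table", key: str) -> bool:
    """Hypothetical helper: remove a table tag; return False if it was absent."""
    try:
        table.remove_tag(key)
    except ValueError:
        # remove_tag raises ValueError when no tag exists for the key.
        return False
    return True


def remove_column_tag_if_present(
    table: "Table", column: Union[str, Sequence[str]], key: str
) -> bool:
    """Hypothetical helper: the same idempotent removal for a column tag."""
    try:
        table.remove_column_tag(column, key)
    except ValueError:
        return False
    return True
```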

rename(new_name: str) → None

Rename the table in the catalog.

Parameters:

new_name – The new name for the table.

Note

This operation does not affect the schema the table resides in. It is not currently possible to move a table to a different schema.

rename_column(column: str, new_name: str) → None

Rename a column in the table.

Parameters:

column – The current name of the column to rename.
new_name – The new name for the column.
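Example (a minimal sketch; `rename_columns` is a hypothetical helper). It applies several renames with one `rename_column` call per entry. Because the calls run one after another, the helper rejects mappings where two columns would receive the same name, or where a new name is also an old name in the mapping (such as swapping two columns). This reference does not say whether a series of renames is applied atomically.

```python
from typing import TYPE_CHECKING, Mapping

if TYPE_CHECKING:
    from ducklake import Table


def rename_columns(table: "Table", mapping: Mapping[str, str]) -> None:
    """Hypothetical helper: apply several renames, one call per column."""
    new_names = list(mapping.values())
    if len(set(new_names)) != len(new_names):
        raise ValueError("two columns cannot be given the same new name")
    # Renames run sequentially, so a new name that is also an old name in
    # the mapping (e.g. swapping two columns) is not supported here.
    if set(new_names) & set(mapping):
        raise ValueError("chained or swapped renames are not supported")
    for old_name, new_name in mapping.items():
        table.rename_column(old_name, new_name)
```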

rewrite_data_files(*, delete_threshold: float | None = None) → list[MaintenanceResult]

Rewrite data files of this table that have a high fraction of deleted rows.

This dispatches to the DuckLake `ducklake_rewrite_data_files` function, scoped to this table.

Parameters:

delete_threshold – Minimum fraction (between 0 and 1) of deleted rows required to trigger a rewrite. Defaults to the rewrite_delete_threshold metadata option (0.95).

Returns:

A row for each output file created by the operation.

Note

This requires duckdb to be installed.
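Example (a minimal sketch; `run_maintenance` is a hypothetical helper, and the ordering is an assumption rather than a documented recommendation). It validates the threshold, rewrites files with many deleted rows, and then merges adjacent small files, on the assumption that rewriting may leave small files that a subsequent merge can combine.

```python
from typing import TYPE_CHECKING, Dict, Optional

if TYPE_CHECKING:
    from ducklake import Table


def run_maintenance(
    table: "Table", delete_threshold: Optional[float] = None
) -> Dict[str, int]:
    """Hypothetical helper: rewrite delete-heavy files, then merge small ones."""
    if delete_threshold is not None and not 0.0 <= delete_threshold <= 1.0:
        raise ValueError("delete_threshold must be a fraction between 0 and 1")
    # None defers to the rewrite_delete_threshold metadata option (0.95).
    rewritten = table.rewrite_data_files(delete_threshold=delete_threshold)
    # Assumption: rewriting may leave small files; merging afterwards combines them.
    merged = table.merge_adjacent_files()
    return {"rewritten": len(rewritten), "merged": len(merged)}
```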

scan() → ScanResult

Scan the table and return all data files with their associated delete files.

scan_duckdb() → duckdb.DuckDBPyRelation

Read the full contents of the table as a DuckDB relation.

Returns:

The DuckDB relation containing the data.
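Example (a minimal sketch; `count_rows` is a hypothetical helper). It builds on the relation returned by `scan_duckdb` using the DuckDB relational API (`filter`, `aggregate`, `fetchone`), so the count is computed by DuckDB instead of materializing the table in Python. The predicate is passed as SQL text and should come only from trusted input.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    from ducklake import Table


def count_rows(table: "Table", where: Optional[str] = None) -> int:
    """Hypothetical helper: count rows, optionally matching a SQL predicate."""
    relation = table.scan_duckdb()
    if where is not None:
        # The predicate is SQL text, so only pass trusted input here.
        relation = relation.filter(where)
    # aggregate() builds a new relation; fetchone() executes the query.
    (count,) = relation.aggregate("count(*)").fetchone()
    return count
```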

property schema: Schema

The schema of the table.

set_metadata(**options: Unpack[TableMetadataUpdate]) → None

Set one or more metadata options for this table.

Provide options as keyword arguments. Pass None as a value to remove the option from the metadata (i.e. revert it to its default).

Raises:

ValueError – If a key is read-only and cannot be set.
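Example (a minimal sketch; `reset_metadata` is a hypothetical helper). It reverts several options to their defaults in a single `set_metadata` call by passing `None` for each key, as described above; a read-only key would raise `ValueError`. The option name in the usage comment is an assumption, since this reference does not list the keys of `TableMetadataUpdate`.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from ducklake import Table


def reset_metadata(table: "Table", *keys: str) -> None:
    """Hypothetical helper: revert the named options to their defaults."""
    if not keys:
        return
    # Passing None for an option removes it, reverting it to its default.
    table.set_metadata(**{key: None for key in keys})


# Usage (assumes target_file_size is a valid TableMetadataUpdate key):
# reset_metadata(table, "target_file_size")
```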