Table

class ducklake.Table

A DuckLake table.

Methods:

add_column

Add a new column to the table.

add_column_tag

Add a new tag to a column.

add_tag

Add a new tag to the table.

delete

Delete the table from the catalog.

merge_adjacent_files

Merge small adjacent data files of this table into larger ones.

read_arrow

Read the full contents of the table as a PyArrow table.

remove_column

Remove a column from the table.

remove_column_tag

Remove an existing tag from a column.

remove_tag

Remove an existing tag from the table.

rename

Rename the table in the catalog.

rename_column

Rename a column in the table.

rewrite_data_files

Rewrite data files of this table that have a high fraction of deleted rows.

scan

Scan the table and return all data files with their associated delete files.

scan_duckdb

Read the full contents of the table as a DuckDB relation.

set_metadata

Set one or more metadata options for this table.

update_column_default

Update the default value of a column.

update_column_dtype

Update the data type of the provided column.

update_column_nullability

Update the nullability of a column.

update_partitioning

Update the partitioning of this table.

update_schema

Update the full schema of the table.

write_arrow

Append the provided data to the table.

Attributes:

metadata

The metadata associated with the table.

name

The fully qualified name of the table.

partitioning

The partitioning of the table, if any.

schema

The schema of the table.

tags

The tags associated with the table.

add_column(column: Column) → None

Add a new column to the table.

Parameters:

column – The column to add.

add_column_tag(column: str | Sequence[str], key: str, value: str) → None

Add a new tag to a column.

Parameters:
  • column – The name of the column to add the tag to. This may be provided as a “path” to a nested column.

  • key – The key of the tag.

  • value – The value of the tag.
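As a rough illustration of the two accepted forms of column, the sketch below applies a batch of tags to top-level and nested columns. The helper name tag_columns and the convention of writing a nested path as a tuple are assumptions made for this example; only add_column_tag and its (column, key, value) arguments come from the reference above. The table argument is typed loosely so the block runs without a live catalog.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def tag_columns(
    table: Any,
    tags: Mapping[str | tuple[str, ...], Mapping[str, str]],
) -> int:
    """Add every key/value tag to each listed column; return how many tags were added.

    `table` is expected to expose add_column_tag(column, key, value) as documented.
    """
    added = 0
    for column, column_tags in tags.items():
        # Assumption: a tuple key stands for a path to a nested column and is
        # forwarded as a list of names; a plain string is a top-level column.
        target = list(column) if isinstance(column, tuple) else column
        for key, value in column_tags.items():
            table.add_column_tag(target, key, value)
            added += 1
    return added
```

A call such as tag_columns(table, {"id": {"pii": "false"}, ("address", "zip"): {"pii": "true"}}) would then issue one add_column_tag call per tag.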

add_tag(key: str, value: str) → None

Add a new tag to the table.

Parameters:
  • key – The key of the tag.

  • value – The value of the tag.

delete() → None

Delete the table from the catalog.

After calling this method, the Table object is no longer valid.

merge_adjacent_files(
*,
max_compacted_files: int | None = None,
min_file_size: int | None = None,
max_file_size: int | None = None,
) → list[MaintenanceResult]

Merge small adjacent data files of this table into larger ones.

Dispatches to ducklake_merge_adjacent_files scoped to this table.

Parameters:
  • max_compacted_files – Maximum number of compaction operations produced in a single call.

  • min_file_size – Excludes files smaller than this many bytes from compaction.

  • max_file_size – Excludes files at or larger than this many bytes from compaction. Defaults to the target_file_size table option.

Returns:

A row for each output file created by the operation.

Note

This requires duckdb to be installed.
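Because the size limits are given in bytes and every argument is keyword-only, a thin wrapper can make calls easier to read. The following is a minimal sketch: the helper name compact_small_files, the MiB-based parameter, and the 64 MiB figure are illustrative assumptions, not library defaults.

```python
from __future__ import annotations

from typing import Any

MIB = 1024 * 1024  # bytes per mebibyte


def compact_small_files(
    table: Any,
    *,
    max_file_mib: int = 64,  # illustrative value, not a library default
    max_operations: int | None = None,
) -> int:
    """Merge adjacent files below max_file_mib; return the number of result rows.

    `table` is expected to expose merge_adjacent_files(...) as documented.
    """
    results = table.merge_adjacent_files(
        max_compacted_files=max_operations,
        # Files at or above this size are excluded from compaction.
        max_file_size=max_file_mib * MIB,
    )
    # One row is reported per output file created by the operation.
    return len(results)
```

Leaving min_file_size unset keeps the lower bound at whatever the library applies by default.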

property metadata: TableMetadata

The metadata associated with the table.

property name: TableName

The fully qualified name of the table.

property partitioning: Partitioning | None

The partitioning of the table, if any.

read_arrow() → pa.Table

Read the full contents of the table as a PyArrow table.

Returns:

The PyArrow table containing the data.

Note

This requires pyarrow and duckdb to be installed.

remove_column(column: str) → None

Remove a column from the table.

Parameters:

column – The name of the column to remove.

remove_column_tag(column: str | Sequence[str], key: str) → None

Remove an existing tag from a column.

Parameters:
  • column – The name of the column to remove the tag from. This may be provided as a “path” to a nested column.

  • key – The key of the tag.

Raises:

ValueError – If no tag for the provided key exists.

remove_tag(key: str) → None

Remove an existing tag from the table.

Parameters:

key – The key of the tag.

Raises:

ValueError – If no tag for the provided key exists.
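Since a missing key raises ValueError, cleanup code that may run more than once can catch the exception to make the removal idempotent. This is a minimal sketch; the helper name remove_tag_if_present is hypothetical, and only remove_tag and its documented ValueError are taken from the reference above.

```python
from typing import Any


def remove_tag_if_present(table: Any, key: str) -> bool:
    """Remove a table tag; return False instead of raising when it does not exist.

    `table` is expected to expose remove_tag(key) as documented.
    """
    try:
        table.remove_tag(key)
    except ValueError:
        # Documented behaviour: raised when no tag exists for the key.
        return False
    return True
```

Checking key in table.tags before calling remove_tag would be an alternative, at the cost of one extra read of the tags property.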

rename(new_name: str) → None

Rename the table in the catalog.

Parameters:

new_name – The new name for the table.

Note

This operation does not affect the schema the table resides in. It is not currently possible to move a table to a different schema.

rename_column(column: str, new_name: str) → None

Rename a column in the table.

Parameters:
  • column – The current name of the column to rename.

  • new_name – The new name for the column.

rewrite_data_files(
*,
delete_threshold: float | None = None,
) → list[MaintenanceResult]

Rewrite data files of this table that have a high fraction of deleted rows.

Dispatches to ducklake_rewrite_data_files scoped to this table.

Parameters:

delete_threshold – Minimum fraction (0-1) of deleted rows required to trigger a rewrite. Defaults to the rewrite_delete_threshold metadata option (0.95).

Returns:

A row for each output file created by the operation.

Note

This requires duckdb to be installed.
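The threshold is a fraction rather than a percentage, which is easy to get wrong at the call site. The sketch below validates the range before dispatching; the helper name rewrite_mostly_deleted and the client-side check are assumptions for illustration, and the library may perform its own validation.

```python
from __future__ import annotations

from typing import Any


def rewrite_mostly_deleted(table: Any, delete_threshold: float | None = None) -> int:
    """Rewrite files whose deleted fraction meets the threshold; return the row count.

    `table` is expected to expose rewrite_data_files(delete_threshold=...) as documented.
    Passing None defers to the rewrite_delete_threshold metadata option.
    """
    if delete_threshold is not None and not 0.0 <= delete_threshold <= 1.0:
        # Guard against passing a percentage such as 95 instead of 0.95.
        raise ValueError(f"delete_threshold must be within [0, 1], got {delete_threshold}")
    results = table.rewrite_data_files(delete_threshold=delete_threshold)
    return len(results)
```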

scan() → ScanResult

Scan the table and return all data files with their associated delete files.

scan_duckdb() → duckdb.DuckDBPyRelation

Read the full contents of the table as a DuckDB relation.

Returns:

The DuckDB relation containing the data.

property schema: Schema

The schema of the table.

set_metadata(
**options: Unpack[TableMetadataUpdate],
) → None

Set one or more metadata options for this table.

Provide options as keyword arguments. Pass None as a value to remove the option from the metadata (i.e. revert it to its default).

Raises:

ValueError – If a key is read-only and cannot be set.

See also

Ducklake.set_metadata() for setting metadata options at the global or schema scope.
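Passing None is the documented way to revert an option, so resetting several options at once reduces to building a keyword dictionary. The following is a minimal sketch; the helper name reset_options is hypothetical, and whether a given name (for example target_file_size) is an accepted key of TableMetadataUpdate is an assumption to verify against the library.

```python
from typing import Any


def reset_options(table: Any, *names: str) -> None:
    """Revert the named metadata options to their defaults.

    `table` is expected to expose set_metadata(**options) as documented.
    """
    if not names:
        return  # nothing to reset; avoid an empty call
    # Each option is set to None, which removes it from the table's metadata.
    table.set_metadata(**{name: None for name in names})
```

For instance, reset_options(table, "target_file_size") would be equivalent to table.set_metadata(target_file_size=None), assuming that key is supported.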

property tags: dict[str, str]

The tags associated with the table.

update_column_default(
column: str,
default_value: Value | tuple[str, str] | None = None,
) → None

Update the default value of a column.

The default value can be a literal value, or an expression specified as a (dialect, expression) tuple. Pass None to remove the default.

Parameters:
  • column – The column for which to change the default value.

  • default_value – The new default value.
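The three accepted forms of default_value (a literal, a (dialect, expression) tuple, and None) can be sketched side by side as follows. The column names, the "duckdb" dialect string, the now() expression, and the helper apply_defaults are all illustrative assumptions; only update_column_default and the shape of its arguments come from the reference above.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

# Hypothetical columns, one per accepted form of default_value.
EXAMPLE_DEFAULTS: dict[str, Any] = {
    "retries": 0,                       # literal value
    "created_at": ("duckdb", "now()"),  # (dialect, expression) tuple; dialect name assumed
    "note": None,                       # None removes the existing default
}


def apply_defaults(table: Any, defaults: Mapping[str, Any]) -> None:
    """Set or clear the default of each listed column.

    `table` is expected to expose update_column_default(column, default_value).
    """
    for column, default_value in defaults.items():
        table.update_column_default(column, default_value)
```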

update_column_dtype(column: str, new_dtype: DataType) → None

Update the data type of the provided column.

In general, a data type can only be changed via type promotion. For example, an integer column can be changed to a larger integer type.

For struct columns, updating the data type allows adding and dropping fields.

Parameters:
  • column – The column for which to change the data type.

  • new_dtype – The new data type of the column.

update_column_nullability(column: str, nullable: bool) → None

Update the nullability of a column.

Parameters:
  • column – The column for which to change the nullability.

  • nullable – Whether the column should allow null values.

update_partitioning(partitioning: Partitioning | None) → None

Update the partitioning of this table.

Parameters:

partitioning – The new partitioning. If None is provided, the partitioning of the table is reset.

Note

This is a metadata-only operation that does not rewrite existing data files. Because those files are not reorganized to match the new partitioning, queries over them might not be fully optimized.

update_schema(schema: Schema | Sequence[Column] | Mapping[str, DataType]) → None

Update the full schema of the table.

This is a convenience function for adding and removing multiple columns and changing the data types of existing columns in a single call.

Parameters:

schema – The new schema of the table.
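To make the effect of a full-schema update easier to reason about, the sketch below computes which columns a new name-to-type mapping would add, remove, or retype relative to the current one. It is a standalone illustration, not the library's implementation: data types are represented as plain strings in place of DataType, and matching columns by name is an assumption.

```python
from __future__ import annotations

from collections.abc import Mapping


def diff_schema(
    current: Mapping[str, str],
    new: Mapping[str, str],
) -> dict[str, list[str]]:
    """Classify columns as added, removed, or changed between two schemas.

    Both mappings go from column name to a type name (a stand-in for DataType).
    """
    added = [name for name in new if name not in current]
    removed = [name for name in current if name not in new]
    changed = [
        name for name in new if name in current and current[name] != new[name]
    ]
    return {"added": added, "removed": removed, "changed": changed}


current = {"id": "int32", "name": "string", "legacy": "string"}
new = {"id": "int64", "name": "string", "email": "string"}
print(diff_schema(current, new))
# {'added': ['email'], 'removed': ['legacy'], 'changed': ['id']}
```

Reviewing such a diff before calling update_schema can help catch an unintended column removal.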

write_arrow(data: pa.Table) → None

Append the provided data to the table.

Parameters:

data – The PyArrow table containing the data to append. The schema of the data must match the table’s current schema.

Note

This requires pyarrow and duckdb to be installed.