Datasets
A Spicepod can contain one or more datasets referenced by relative path, or defined inline.
datasets
Inline example:
spicepod.yaml
datasets:
  - from: spice.ai/eth/beacon/eigenlayer
    name: strategy_manager_deposits
    params:
      app: goerli-app
    acceleration:
      enabled: true
      mode: inmemory # / file
      engine: arrow # / duckdb
      refresh_interval: 1h
      refresh_mode: full / append # update / incremental
      retention: 30m
spicepod.yaml
datasets:
  - from: databricks.com/spiceai/datasets
    name: uniswap_eth_usd
    params:
      environment: prod
    acceleration:
      enabled: true
      mode: inmemory # / file
      engine: arrow # / duckdb
      refresh_interval: 1h
      refresh_mode: full / append # update / incremental
      retention: 30m
spicepod.yaml
datasets:
  - from: local/Users/phillip/data/test.parquet
    name: test
    acceleration:
      enabled: true
      mode: inmemory # / file
      engine: arrow # / duckdb
      refresh_interval: 1h
      refresh_mode: full / append # update / incremental
      retention: 30m
Relative path example:
spicepod.yaml
datasets:
  - from: datasets/uniswap_v2_eth_usdc
datasets/uniswap_v2_eth_usdc/dataset.yaml
name: spiceai.uniswap_v2_eth_usdc
type: overwrite
source: spice.ai
auth: spice.ai
acceleration:
  enabled: true
  refresh: 1h
name
The name of the dataset. This is used to reference the dataset in the pod manifest, as well as in external data sources.
type
The type of dataset. The following types are supported:
- overwrite - Overwrites the dataset with the contents of the dataset source.
- append - Appends new data from the dataset source to the dataset.
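For example, a relative-path dataset might declare an append type in its dataset.yaml. This is a minimal sketch; the dataset name shown is illustrative:

name: spiceai.eth_blocks # illustrative name
type: append # append new data from the source instead of overwriting
source: spice.ai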
source
The source of the dataset. The following sources are supported:
- spice.ai
- dremio (coming soon)
- databricks (coming soon)
auth
Optional. The authentication profile to use to connect to the dataset source. Use spice login to create a new authentication profile. If not specified, the default profile for the data source is used.
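For instance, to select the spice.ai authentication profile explicitly, as in the relative-path example above:

name: spiceai.uniswap_v2_eth_usdc
source: spice.ai
auth: spice.ai # omit to use the default profile for the data source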
acceleration
Optional. Accelerate queries to the dataset by caching data locally.
acceleration.enabled
Enable or disable acceleration. Defaults to true.
acceleration.engine
The acceleration engine to use. Defaults to arrow. The following engines are supported:
- arrow - Accelerated in-memory, backed by Apache Arrow DataTables.
- duckdb - Accelerated by an embedded DuckDB database.
- postgres - Accelerated by an attached PostgreSQL database.
acceleration.mode
Optional. The mode of acceleration. The following values are supported:
- memory - Store acceleration data in-memory.
- file - Store acceleration data in a file.

mode is currently only supported for the duckdb engine.
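For example, to persist acceleration data to disk instead of memory, combine the duckdb engine with file mode. This is a sketch reusing the inline example above:

datasets:
  - from: spice.ai/eth/beacon/eigenlayer
    name: strategy_manager_deposits
    acceleration:
      enabled: true
      engine: duckdb # mode is currently only honored by the duckdb engine
      mode: file # store acceleration data in a file rather than in-memory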
acceleration.refresh_mode
Optional. How to refresh the dataset. The following values are supported:
- full - Refresh the entire dataset.
- append - Append new data to the dataset.
acceleration.refresh_interval
Optional. How often data should be refreshed. Only supported for full datasets; for append datasets, the refresh interval is not used.
E.g. 1h for 1 hour, 1m for 1 minute, 1s for 1 second, etc.
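For instance, a full-refresh dataset that reloads every 10 minutes could be configured as follows (the interval value is illustrative):

acceleration:
  enabled: true
  refresh_mode: full # re-fetch the entire dataset on each refresh
  refresh_interval: 10m # not used when refresh_mode is append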
acceleration.retention
Optional. Only supported for append datasets. Specifies how long to retain data updates from the data source before they are deleted. If not specified, the default is to keep all data.
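A sketch of an append dataset that keeps only the most recent hour of updates (the retention window is illustrative):

acceleration:
  enabled: true
  refresh_mode: append
  retention: 1h # older updates are deleted; omit to keep all data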
acceleration.params
Optional. Parameters to pass to the acceleration engine. The parameters are specific to the acceleration engine used.
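As an illustration only, engine-specific parameters are passed as a key/value map; the parameter name below is a placeholder, not a documented option:

acceleration:
  enabled: true
  engine: duckdb
  params:
    some_engine_option: value # placeholder; see the engine's documentation for supported parameters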
acceleration.engine_secret
Optional. The secret store key to use for the acceleration engine connection credential. For supported data connectors, use spice login to store the secret.
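A sketch of referencing a stored connection credential for the acceleration engine (the secret store key below is illustrative):

acceleration:
  enabled: true
  engine: postgres
  engine_secret: my_postgres_secret # illustrative secret store key created via spice login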