DBT Transformation & Model Management
Last updated
Last updated
Data transformation ensures your data warehouse is a single, consistent source of centralized data. We recommend using DBT, which is open-source, runs where the data lives, and is easy to use.
Using DBT with Hydra uses DBT's Postgres support.
pip is the easiest way to install the adapter:
Installing dbt-postgres
will also install dbt-core
and any other dependencies.
Other ways to install dbt Core on the command line:
Use Homebrew to install dbt (recommended for MacOS + most popular plugins)
"Unlogged" tables can be considerably faster than ordinary tables, as they are not written to the write-ahead log nor replicated to read replicas. They are also considerably less safe than ordinary tables. See Postgres docs for details.
It's important to create indexes for columns that are commonly used in joins or where clauses.
Table models, incremental models, seeds, and snapshots may have a list of indexes
defined. Each Postgres index can have three components:
columns
(list, required): one or more columns on which the index is defined
unique
(boolean, optional): whether the index should be declared unique
type
(string, optional): a supported index type (B-tree, Hash, GIN, etc)
If one or more indexes are configured on a resource, dbt will run create index
DDL statement(s) as part of that resource's materialization, within the same transaction as its main create
statement. For the index's name, dbt uses a hash of its properties and the current timestamp, in order to guarantee uniqueness and avoid namespace conflict with other indexes.
You can also configure indexes for a number of resources at once:
Postgres targets should be set up using the following configuration in your profiles.yml
file.
The search_path
config controls the Postgres "search path" that dbt configures when opening new connections to the database. By default, the Postgres search path is "$user, public"
, meaning that unqualified table names will be searched for in the public
schema, or a schema with the same name as the logged-in user. Note: Setting the search_path
to a custom value is not necessary or recommended for typical usage of dbt.
The role
config controls the Postgres role that dbt assumes when opening new connections to the database.
The sslmode
config controls how dbt connectes to Postgres databases using SSL. See the Postgres docs on sslmode
for usage information. When unset, dbt will connect to databases using the Postgres default, prefer
, as the sslmode
.
keepalives_idle
If the database closes its connection while dbt is waiting for data, you may see the error SSL SYSCALL error: EOF detected
. Lowering the [keepalives_idle
value](https://www.postgresql.org/docs/9.3/libpq-connect.html) may prevent this, because the server will send a ping to keep the connection active more frequently.
dbt's default setting is 0 (the server's default value), but can be configured lower (perhaps 120 or 60 seconds), at the cost of a chattier network connection.