DBT Transformation & Model Management

PreviousData Hygiene NextUsing ETL Tools

Last updated 10 months ago

DBT Transformation & Model Management

Data transformation ensures your data warehouse is a single, consistent source of centralized data. We recommend using DBT, which is open-source, runs where the data lives, and is easy to use.

Using DBT with dbt-postgres

Using DBT with Hydra uses DBT's Postgres support.

Installing dbt-postgres

pip is the easiest way to install the adapter:

pip install dbt-postgres

Installing dbt-postgres will also install dbt-core and any other dependencies.

Other ways to install dbt Core on the command line:

Configuring dbt-postgres

Unlogged

{{ config(materialized='table', unlogged=True) }}
select ...

models:
  +unlogged: true

Indexes

It's important to create indexes for columns that are commonly used in joins or where clauses.

Table models, incremental models, seeds, and snapshots may have a list of indexes defined. Each Postgres index can have three components:

columns (list, required): one or more columns on which the index is defined

{{ config(
    materialized = 'table',
    indexes=[
      {'columns': ['column_a'], 'type': 'hash'},
      {'columns': ['column_a', 'column_b'], 'unique': True},
    ]
)}}
select ...

create index if not exists
"3695050e025a7173586579da5b27d275"
on "my_target_database"."my_target_schema"."indexed_model"
 using hash
(column_a);

create unique index if not exists
"1bf5f4a6b48d2fd1a9b0470f754c1b0d"
on "my_target_database"."my_target_schema"."indexed_model"
 (column_a, column_b);

You can also configure indexes for a number of resources at once:

models:
  project_name:
    subdirectory:
      +indexes:
        - columns: ['column_a']
          type: hash

Profile Configuration

Postgres targets should be set up using the following configuration in your profiles.yml file.

company-name:
  target: dev
  outputs:
    dev:
      type: postgres
      host: [hostname]
      user: [username]
      password: [password]
      port: [port]
      dbname: [database name]
      schema: [dbt schema]
      threads: [1 or more]
      [keepalives_idle](<https://docs.getdbt.com/reference/warehouse-setups/postgres-setup#keepalives_idle>): 0 # default 0, indicating the system default. See below
      connect_timeout: 10 *# default 10 seconds*
      [retries](<https://docs.getdbt.com/reference/warehouse-setups/postgres-setup#retries>): 1  # default 1 retry on error/timeout when opening connections
      [search_path](<https://docs.getdbt.com/reference/warehouse-setups/postgres-setup#search_path>): [optional, override the default postgres search_path]
      [role](<https://docs.getdbt.com/reference/warehouse-setups/postgres-setup#role>): [optional, set the role dbt assumes when executing queries]
      [sslmode](<https://docs.getdbt.com/reference/warehouse-setups/postgres-setup#sslmode>): [optional, set the sslmode used to connect to the database]

Configurations

search_path

role

The role config controls the Postgres role that dbt assumes when opening new connections to the database.

sslmode

`keepalives_idle`

PreviousData Hygiene NextUsing ETL Tools

Last updated 10 months ago