Hydra Scheduler
Hydra Scheduler schedules tasks to run at regular intervals on your data warehouse.
Hydra Scheduler is a cron-style task scheduler that can run arbitrary SQL inside your data warehouse on any schedule. You can use tasks to move data between data sources (using foreign data wrappers), move data from row-based to columnar tables, perform transformations, refresh materialized views, or handle any other regular maintenance.
Scheduler uses the standard cron syntax:
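A schedule consists of the five standard cron fields, in this order:

```
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6, Sunday = 0)
│ │ │ │ │
* * * * *
```

For example, `30 3 * * 1` runs at 03:30 every Monday.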
In addition, you can use the `@hourly`, `@daily`, `@weekly`, `@monthly`, and `@yearly` shortcuts. These are equivalent to:
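In standard cron syntax, the shortcuts expand as follows:

```
@hourly   0 * * * *   (top of every hour)
@daily    0 0 * * *   (midnight every day)
@weekly   0 0 * * 0   (midnight every Sunday)
@monthly  0 0 1 * *   (midnight on the 1st of the month)
@yearly   0 0 1 1 *   (midnight on January 1)
```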
Note that the timezone used for all tasks is UTC.
Administering tasks
All administration of tasks occurs in the `postgres` database. Your Hydra user has permission to connect to this database for this purpose.
To administer tasks, connect normally, then switch to the `postgres` database using `\c postgres`.
Creating a task
To create a task:
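As a sketch, assuming the pg_cron-style `cron.schedule()` function (the same extension layout as the `cron.job` table referenced below); the job name, schedule, and command here are placeholders:

```sql
-- Hypothetical example: run a maintenance command at 03:00 UTC every day.
-- 'nightly-vacuum' and the SQL command are placeholders for your own task.
SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM ANALYZE');
```

`cron.schedule` returns the numeric ID of the new job.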
Deleting a task
To delete a task:
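A sketch, assuming the pg_cron-style `cron.unschedule()` function, which accepts either a job name or a numeric job ID:

```sql
-- 'nightly-vacuum' is a placeholder job name; a job ID also works.
SELECT cron.unschedule('nightly-vacuum');
```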
Modify, enable, or disable a task
You can alter any parameters about a task, such as its schedule, command, and whether it is currently active.
First you must obtain the job ID:
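Assuming the pg_cron-style `cron.job` catalog table, you can look up the ID by job name:

```sql
-- 'nightly-vacuum' is a placeholder job name.
SELECT jobid FROM cron.job WHERE jobname = 'nightly-vacuum';
```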
You can then alter the task. `NULL` indicates that no change is desired for that parameter. The last parameter controls whether the task is enabled or disabled:
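A sketch, assuming the pg_cron-style `cron.alter_job()` function; the job ID (42) and new schedule are illustrative:

```sql
-- Change job 42's schedule to hourly, leave the command, database, and
-- username unchanged (NULL), and disable the task (active = false).
SELECT cron.alter_job(42, '@hourly', NULL, NULL, NULL, false);
```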
Review tasks
The `cron.job` table contains all current jobs and their parameters.
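For example, assuming the pg_cron column layout (`jobid`, `jobname`, `schedule`, `command`, `active`):

```sql
-- List every scheduled task and whether it is currently active.
SELECT jobid, jobname, schedule, command, active FROM cron.job;
```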
Reviewing task logs
Each time a job runs, a record is placed in `cron.job_run_details`.
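A sketch for checking recent runs; the column names here follow pg_cron's `job_run_details` layout and are assumptions:

```sql
-- Most recent runs first; status and return_message indicate success or failure.
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 20;
```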
Deleting task logs
Cleaning the task logs is at your discretion. You could schedule a task to remove logs after a set time:
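As a sketch, assuming the pg_cron-style `cron.schedule()` function and `cron.job_run_details` table, a weekly task that deletes log records older than 30 days (the job name and retention window are placeholders):

```sql
SELECT cron.schedule('purge-task-logs', '@weekly',
  $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '30 days'$$);
```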
Use Cases
Any SQL you can write can be turned into a scheduled task, but even better is writing a function you then schedule. This allows you to more easily write and test the function before scheduling the task. Functions can perform anything you need to do on your data warehouse.
Update materialized views
Materialized views are a great way to cache results of expensive analytical queries for your dashboards to consume. You can write a function to refresh your materialized views periodically:
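A minimal sketch; the function name and the view names (`daily_sales`, `weekly_signups`) are placeholders for your own materialized views:

```sql
CREATE OR REPLACE FUNCTION refresh_dashboard_views() RETURNS void AS $$
BEGIN
  -- Refresh each dashboard view in sequence.
  REFRESH MATERIALIZED VIEW daily_sales;
  REFRESH MATERIALIZED VIEW weekly_signups;
END;
$$ LANGUAGE plpgsql;
```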
You can then schedule this task:
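For example, assuming the pg_cron-style `cron.schedule()` function and a hypothetical `refresh_dashboard_views()` function:

```sql
SELECT cron.schedule('refresh-dashboards', '@hourly', 'SELECT refresh_dashboard_views()');
```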
⚠️ Refreshing too many views concurrently may cause a very large load spike on your database. You can use `pg_sleep()` or multiple tasks to spread the load over time.
Move data to columnar storage
You can create an hourly or daily process to move new data into columnar. This makes managing the incoming flow of data easier as you can replicate data into row-based tables at any interval, and then archive data in columnar storage once it no longer changes.
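A sketch of such a function, assuming a row-based `events` table and a columnar `events_columnar` table that share a `created_at` timestamp column (all names here are illustrative):

```sql
CREATE OR REPLACE FUNCTION archive_events() RETURNS void AS $$
BEGIN
  -- Copy rows older than one day that are newer than anything already
  -- archived, then remove them from the row-based table.
  INSERT INTO events_columnar
    SELECT * FROM events
    WHERE created_at < now() - interval '1 day'
      AND created_at > (SELECT max(created_at) FROM events_columnar);
  DELETE FROM events
    WHERE created_at < now() - interval '1 day';
END;
$$ LANGUAGE plpgsql;
```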
⚠️ This query will only work as intended if some recent data exists in `events_columnar`. You should do an initial data load before running this function.
You can then schedule the task to move this data every day:
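For example, assuming the pg_cron-style `cron.schedule()` function and a hypothetical `archive_events()` function:

```sql
SELECT cron.schedule('archive-events', '@daily', 'SELECT archive_events()');
```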
As an added bonus, you can create a view to query from both tables:
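A sketch, assuming row-based `events` and columnar `events_columnar` tables with identical columns (names are illustrative):

```sql
CREATE VIEW events_all AS
  SELECT * FROM events
  UNION ALL
  SELECT * FROM events_columnar;
```

Queries against `events_all` then see both recent row-based data and archived columnar data.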