Hydra Scheduler
Hydra Scheduler is used to schedule tasks at a regular interval on your data warehouse.
Hydra Scheduler is a cron-style task scheduler that can run arbitrary SQL inside your data warehouse on any schedule. You can use tasks to move data between data sources (using foreign data wrappers), move data from row-based to columnar storage, perform transformations, refresh materialized views, or handle any other regular maintenance tasks.
Scheduler uses the standard cron syntax:
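A schedule consists of five space-separated fields:

```
 ┌───────────── minute (0-59)
 │ ┌─────────── hour (0-23)
 │ │ ┌───────── day of month (1-31)
 │ │ │ ┌─────── month (1-12)
 │ │ │ │ ┌───── day of week (0-6, Sunday = 0)
 │ │ │ │ │
 * * * * *
```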
In addition, you can use the @hourly, @daily, @weekly, @monthly, and @yearly shortcuts. These are equivalent to:
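In five-field cron notation, those shortcuts expand to:

```
@hourly   0 * * * *
@daily    0 0 * * *
@weekly   0 0 * * 0
@monthly  0 0 1 * *
@yearly   0 0 1 1 *
```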
Note that the timezone used for all tasks is UTC.
All administration of tasks occurs in the postgres database. Your Hydra user has permission to connect to this database for this purpose. To connect, connect to your data warehouse normally, then switch to postgres using \c postgres.
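For example, from psql (the connection details here are placeholders):

```
$ psql "postgres://<user>@<your-hydra-host>:5432/<database>"
yourdb=> \c postgres
postgres=>
```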
To create a task:
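For example, using the pg_cron-style cron.schedule function (the task name and SQL here are illustrative):

```sql
-- Schedule a task named "nightly-vacuum" to run every day at 03:00 UTC.
-- cron.schedule returns the new job's ID.
SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM ANALYZE');
```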
To delete a task:
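Assuming the pg_cron-style interface, pass cron.unschedule either the task's name or the job ID returned by cron.schedule:

```sql
SELECT cron.unschedule('nightly-vacuum');
```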
You can alter any parameters about a task, such as its schedule, command, and whether it is currently active.
First you must obtain the job ID:
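The job ID can be looked up in the cron.job table described below:

```sql
SELECT jobid, jobname, schedule, command FROM cron.job;
```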
You can then alter the task. Passing NULL for a parameter indicates that no change is desired. The last parameter controls whether the task is active:
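A sketch, assuming the pg_cron-style cron.alter_job function (the job ID and schedule are illustrative):

```sql
-- Change job 42's schedule and disable it. NULL leaves a parameter unchanged.
SELECT cron.alter_job(
  job_id   => 42,
  schedule => '30 * * * *',
  command  => NULL,
  active   => false
);
```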
The cron.job
table contains all current jobs and their parameters.
Each time a job runs, a record is placed in cron.job_run_details
.
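For example, to inspect the most recent runs:

```sql
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 20;
```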
Cleaning the task logs is at your discretion. You could schedule a task to remove logs after a set time:
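A hypothetical cleanup task, run every Sunday at midnight UTC, retaining 30 days of logs (the retention window is up to you):

```sql
SELECT cron.schedule('purge-task-logs', '0 0 * * 0',
  $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '30 days'$$);
```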
Any SQL you can write can be turned into a scheduled task, but even better is writing a function you then schedule. This allows you to more easily write and test the function before scheduling the task. Functions can perform anything you need to do on your data warehouse.
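A minimal sketch of such a function; the function name, table names, and body are illustrative:

```sql
CREATE OR REPLACE FUNCTION update_account_totals() RETURNS void AS $$
BEGIN
  -- Recompute a summary table from raw data.
  TRUNCATE account_totals;
  INSERT INTO account_totals
    SELECT account_id, sum(amount) AS total
    FROM transactions
    GROUP BY account_id;
END;
$$ LANGUAGE plpgsql;
```

Because it is an ordinary function, you can test it directly with SELECT update_account_totals(); before putting it on a schedule.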
You can then schedule this task:
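For instance, assuming a maintenance function named update_account_totals() exists, you could run it every 15 minutes:

```sql
SELECT cron.schedule('update-account-totals', '*/15 * * * *',
  'SELECT update_account_totals()');
```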
You can create an hourly or daily process to move new data into columnar. This makes managing the incoming flow of data easier as you can replicate data into row-based tables at any interval, and then archive data in columnar storage once it no longer changes.
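One possible shape for this, assuming a row-based events_row table and a columnar events_columnar table with a created_at timestamp (all names are illustrative):

```sql
CREATE OR REPLACE FUNCTION archive_events() RETURNS void AS $$
DECLARE
  -- Fix the cutoff once so the INSERT and DELETE cover the same rows.
  cutoff timestamptz := now() - interval '1 day';
BEGIN
  INSERT INTO events_columnar
    SELECT * FROM events_row WHERE created_at < cutoff;
  DELETE FROM events_row WHERE created_at < cutoff;
END;
$$ LANGUAGE plpgsql;
```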
You can then schedule the task to move this data every day:
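For example, assuming an archiving function like archive_events(), running at 01:00 UTC daily:

```sql
SELECT cron.schedule('archive-events', '0 1 * * *', 'SELECT archive_events()');
```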
As an added bonus, you can create a view to query from both tables:
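A sketch, again assuming row and columnar tables named events_row and events_columnar with identical columns:

```sql
CREATE VIEW events AS
  SELECT * FROM events_row
  UNION ALL
  SELECT * FROM events_columnar;
```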
Materialized views are a great way to cache the results of expensive analytical queries for your dashboards to consume. You can write a function to refresh them periodically:
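A hypothetical refresh function; the view names are illustrative:

```sql
-- CONCURRENTLY avoids blocking readers, but requires a unique index
-- on each materialized view.
CREATE OR REPLACE FUNCTION refresh_dashboards() RETURNS void AS $$
BEGIN
  REFRESH MATERIALIZED VIEW CONCURRENTLY dashboard_daily;
  REFRESH MATERIALIZED VIEW CONCURRENTLY dashboard_weekly;
END;
$$ LANGUAGE plpgsql;
```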
Refreshing too many views concurrently may cause a very large load spike on your database. You can use staggered schedules or multiple tasks to spread the load over time.
This query will only work as intended if some recent data exists in events_columnar. You should do an initial data load before running this function.