If you’ve managed PostgreSQL databases in the past, you know how important it is to have proper monitoring in place for them. One bad query at the right volume or time of day can cascade into all sorts of issues: crippling CPU utilization on your database server, or exhaustion of the available database connections.
If you’re using Datadog to monitor the rest of your infrastructure, their PostgreSQL monitoring integration can provide some great insight into what’s happening inside your databases.
In this post, I’ll detail some of the useful metrics provided by their PostgreSQL monitoring integration, as well as how to configure it.
Also, just to be clear, I am not partnered with Datadog or any of the vendors mentioned in this post. I’m just here to provide some tips that will help you extend your current implementation to PostgreSQL!
Some of the key metrics I like to keep tabs on in Postgres are connection counts (and how close they sit to the connection limit), replication lag, locks and deadlocks, and query throughput. These are just some of the metrics I tend to monitor, and Datadog’s integration checks all of these boxes. We’ll cover a more comprehensive list of useful metrics later on in this piece.
If you’re hosting your database server on-prem or on virtual machines, you can couple these metrics with the system-level metrics the Datadog agent collects out of the box (CPU, memory, disk, network) to build a solid foundation for database monitoring.
Are you using Amazon RDS or Google Cloud SQL? I’d highly recommend integrating your AWS or GCP account into Datadog by providing it with a read-only account to pull metrics with. Once that data is flowing in, you can combine the CloudWatch or Stackdriver metrics for your infrastructure with the Postgres integration’s metrics to build a holistic monitoring suite for your backend Postgres instances.
As per best practices, you should create a dedicated monitoring database user. This helps with tracking down connection and performance issues should anything pop up.
For Postgres 10 and above:
CREATE USER datadog_monitoring WITH PASSWORD '';
GRANT pg_monitor TO datadog_monitoring;
For Postgres versions < 10:
CREATE USER datadog_monitoring WITH PASSWORD '';
GRANT SELECT ON pg_stat_database TO datadog_monitoring;

-- You also need a special definer, since the pg_monitor role isn't available in PG < 10 :(
CREATE FUNCTION pg_stat_activity() RETURNS SETOF pg_catalog.pg_stat_activity AS
$$ SELECT * FROM pg_catalog.pg_stat_activity; $$
LANGUAGE sql VOLATILE SECURITY DEFINER;

CREATE VIEW pg_stat_activity_dd AS SELECT * FROM pg_stat_activity();
GRANT SELECT ON pg_stat_activity_dd TO datadog_monitoring;
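To sanity-check the grants on the pre-10 setup, connect as the new user and query the view (the database name below is a placeholder). On Postgres 10 and above, the pg_monitor role lets datadog_monitoring read pg_stat_activity directly instead:

psql -h localhost -U datadog_monitoring -d your_database -c "SELECT * FROM pg_stat_activity_dd LIMIT 1;"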
In case you didn’t already know, the Datadog agent automatically loads any YAML files you put under its conf.d/ directory when it starts up.
Here’s the directory I put my Datadog Postgres monitoring YAML files in:
/etc/datadog-agent/conf.d/postgres.d/
Since I use Terraform to manage multiple environments, I opted to dynamically build my YAML config files from the outputs of Terraform modules (i.e. RDS instances); there’s a sketch of that approach below. You can manually configure yours if that suits you better.
Using another configuration management tool like Puppet or Ansible would work great too; that’s another route I’ve taken.
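Here’s a minimal sketch of the Terraform approach using the templatefile function. The module output, variable names, and template path are illustrative assumptions, not a drop-in config:

# Render the agent's Postgres config from a template fed by module outputs.
# local_file requires the hashicorp/local provider; the names below are hypothetical.
resource "local_file" "datadog_postgres_conf" {
  filename = "/etc/datadog-agent/conf.d/postgres.d/conf.yaml"
  content = templatefile("${path.module}/templates/postgres-conf.yaml.tpl", {
    database_username      = var.database_username
    database_user_password = var.database_user_password
    database_name          = module.rds.database_name
    environment            = var.environment
    project                = var.project
  })
}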
init_config:

instances:
  # X Database
  - host: localhost
    port: 5432
    username: ${database_username}
    password: ${database_user_password}
    dbname: ${database_name}
    # This allows Datadog to pull statement/transaction metrics
    collect_activity_metrics: true
    # Tags are a critical concept in Datadog to enable monitor grouping/filtering, etc.
    tags:
      - environment:${environment}
      - project:${project}
      - db:${database_name}
    # Optionally, whitelist only certain relations you want to collect metrics for
    relations:
      - table_name_1
      - table_name_2

  # Y Database
  - host: localhost
    port: 5432
    username: ${database_username}
    password: ${database_user_password}
    dbname: ${database_name_2}
    # This allows Datadog to pull statement/transaction metrics
    collect_activity_metrics: true
    # Tags are a critical concept in Datadog to enable monitor grouping/filtering, etc.
    tags:
      - environment:${environment}
      - project:${project}
      - db:${database_name_2}
    # Optionally, whitelist only certain relations you want to collect metrics for
    relations:
      - table_name_1
      - table_name_2
When I first implemented Postgres monitoring in Datadog, I had to explicitly list every relation I wanted to monitor, but their latest documentation suggests that all relations are included by default. If you find that not to be the case, try specifying the relations list as shown above.
After getting the config files deployed to your environment, restart the Datadog agent.
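On a systemd-based Linux host, that typically looks like:

sudo systemctl restart datadog-agent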
If you’re on Version 6 of the Datadog Agent, you can issue this command to confirm that the integration is working:
sudo datadog-agent status
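Look for the postgres entry under the running checks section of the output. You can also run the Postgres check on its own, which executes it once and prints the metrics it collected:

sudo datadog-agent check postgres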
Here’s a quick list of some of the metrics you get from the Datadog Postgres integration, which I’ve found valuable:

- postgresql.connections and postgresql.percent_usage_connections
- postgresql.locks and postgresql.deadlocks
- postgresql.replication_delay
- postgresql.buffer_hit
- postgresql.rows_inserted, postgresql.rows_updated, postgresql.rows_deleted, and postgresql.rows_returned
- postgresql.dead_rows
Datadog automatically adds tags to some of the data points, so be sure to tinker with averaging by table, database, etc.
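For example, a dashboard or monitor query like this one (the tag value is a placeholder) breaks dead rows out per table:

avg:postgresql.dead_rows{environment:production} by {table}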
I get SO fired up talking about monitoring; it’s a really interesting topic. That being said, we can all go a little overboard with our monitoring if we’re not careful.
One of the things I’ve tried to do over my career is to partition metrics into three categories:

1. Actionable: something is broken or about to break, and a human needs to respond immediately.
2. Passive: worth investigating soon, but nobody needs to be paged for it.
3. Informational: useful for dashboards, capacity planning, and trend analysis, but never alerted on.
I’ve found that taking this approach from the beginning helps you pick the appropriate notification medium for each metric when it exhibits a certain behavior.
Let’s be honest – we don’t like waking up in the middle of the night to answer PagerDuty or VictorOps alerts, but it’s part of the job. What we can do to make it better is to only push the big red alarm button when immediate action is necessary. Passive alerts can go to a dedicated Slack channel for DevOps/SRE to observe and investigate as needed.
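Since the configs above are already Terraform-templated, you can codify this routing as well. Here’s a minimal sketch using the Datadog Terraform provider; the thresholds, tag values, and the @pagerduty/@slack handles are assumptions that depend on which notification integrations you’ve connected:

# A hypothetical monitor: critical breaches page a human, warnings go to Slack.
resource "datadog_monitor" "postgres_connection_usage" {
  name  = "Postgres connection usage high on {{db.name}}"
  type  = "metric alert"
  query = "avg(last_5m):avg:postgresql.percent_usage_connections{environment:production} by {db} > 0.9"

  message = <<-EOT
    {{#is_alert}}Connections nearly exhausted. @pagerduty{{/is_alert}}
    {{#is_warning}}Connection usage is creeping up. @slack-devops-alerts{{/is_warning}}
  EOT

  monitor_thresholds {
    critical = 0.9
    warning  = 0.8
  }

  tags = ["environment:production", "db:postgres"]
}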