As our PostgreSQL database needs have grown, I needed to adjust how the server uses memory to get the most out of what we are paying for. Tuning Your PostgreSQL Server was really helpful in understanding which parameters to adjust, what they affect, and their relative importance. shared_buffers, work_mem, and effective_cache_size are the parameters I was mostly looking at to get memory use right.
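To make that concrete, here is a hypothetical postgresql.conf excerpt showing the kind of values involved. These numbers follow common rules of thumb and are illustrative only, not the settings we actually use:

# Hypothetical postgresql.conf excerpt -- illustrative values, not our real settings
shared_buffers = 4GB            # commonly sized around 25% of system RAM
effective_cache_size = 12GB     # rough estimate of how much OS disk cache is available
work_mem = 32MB                 # per sort/hash operation within a query, not per connection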
In order to get a good picture and know whether my changes were effective, I needed to monitor how we were using memory. I set up our servers to record metrics to Graphite and configured a Grafana dashboard to show usage over time.
Tracking Memory Usage
My first question was, “How much memory is PostgreSQL using?” It’s really hard to know for sure, and this article describes how traditional tools are likely to report it incorrectly (because resident memory, or RSS, counts memory shared between processes once for each process). A better tool to use is smem, which can report proportional memory, or PSS.
I downloaded the latest smem release and copied the executable to /usr/local/bin. Then I used a small script to pull USS (freeable memory), PSS, and RSS out of the smem results for the postgres user and send them to Graphite, with lines like this (the full script is available in this Gist):
/usr/local/bin/smem -u | grep postgres | awk '{print "'$API_KEY'.postgres.'$HOST'.pss " $5 " " strftime("%s")}' | nc $SERVER $PORT
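The Gist has the complete script; a minimal sketch of the same idea, with API_KEY, HOST, SERVER, and PORT standing in as placeholders for the Graphite endpoint details, could look like this:

#!/bin/bash
# Hypothetical sketch of the metrics script -- see the linked Gist for the real one.
# API_KEY, SERVER, and PORT are placeholders for a hosted Graphite endpoint.
API_KEY="your-api-key"
HOST=$(hostname -s)
SERVER="graphite.example.com"
PORT=2003

# smem -u prints per-user totals: User, Count, Swap, USS, PSS, RSS (values in kB).
# Grab the postgres row and send USS ($4), PSS ($5), and RSS ($6) to Graphite.
/usr/local/bin/smem -u | grep postgres | awk '{print "'$API_KEY'.postgres.'$HOST'.uss " $4 " " strftime("%s")}' | nc $SERVER $PORT
/usr/local/bin/smem -u | grep postgres | awk '{print "'$API_KEY'.postgres.'$HOST'.pss " $5 " " strftime("%s")}' | nc $SERVER $PORT
/usr/local/bin/smem -u | grep postgres | awk '{print "'$API_KEY'.postgres.'$HOST'.rss " $6 " " strftime("%s")}' | nc $SERVER $PORT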
Capturing Temporary File Size
Reading about work_mem, I learned that when a query needs more memory than it is allotted, PostgreSQL writes blocks out to disk as temporary files in the pgsql_tmp folder. I want to reduce this spilling as much as possible, so I added a chart to track the amount of disk space consumed in that folder.
du -sk /data/pgsql/9.5/data/base/pgsql_tmp/ | awk '{print "'$API_KEY'.postgres.'$HOST'.tmp " $1 " " strftime("%s")}' | nc $SERVER $PORT
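A complementary check, which the chart above does not use, is PostgreSQL’s own pg_stat_database view: its temp_files and temp_bytes columns count cumulative temporary-file activity since the last statistics reset (in bytes, whereas the du metric above is current usage in kilobytes). A sketch of feeding that counter to Graphite in the same style, assuming the same placeholder variables:

# Hypothetical alternative: cumulative temp-file bytes from pg_stat_database
psql -U postgres -At -c "SELECT coalesce(sum(temp_bytes), 0) FROM pg_stat_database;" | awk '{print "'$API_KEY'.postgres.'$HOST'.temp_bytes " $1 " " strftime("%s")}' | nc $SERVER $PORT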
Configuring the Dashboard
The next step was to set up a Grafana dashboard. This was pretty straightforward. For each metric (tmp, uss, pss, rss) I set up a chart to pull that metric from all sources. There were two tricks:
- Set the left Y axis to use “kilobytes” as the unit (which auto-adjusts to MB, GB, etc.)
- Add the consolidateBy(max) function to each metric so that when you zoom out it shows the largest value for each period (although I think this isn’t quite right, because Graphite is still aggregating by average); an example target is sketched below
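For reference, a Graphite target along these lines in the Grafana query editor applies that consolidation; the metric path follows the naming used in the scripts above and <api-key> is a placeholder:

consolidateBy(<api-key>.postgres.*.pss, 'max')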
This has been a real boon and has helped me better understand how memory is being used. By looking at the proportional memory chart I can see that we’re using about 10% of our memory for postgres tasks on average, peaking at about 20%. The balance of our memory is being consumed by disk cache (which PostgreSQL depends on heavily for caching table and index data). Because we front our database with significant caching, and we tend to run lots of different, large, complex queries, I can accept a smaller disk cache and increase the memory available to worker processes (i.e. work_mem) so that we aren’t writing temporary files to disk as much.
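For illustration only (the value here is made up, not what we settled on), work_mem can be raised in postgresql.conf or at runtime with ALTER SYSTEM plus a configuration reload, since changing it does not require a restart:

# Hypothetical example of raising work_mem -- the value is illustrative only.
# ALTER SYSTEM writes to postgresql.auto.conf; pg_reload_conf() applies it without a restart.
psql -U postgres -c "ALTER SYSTEM SET work_mem = '64MB';"
psql -U postgres -c "SELECT pg_reload_conf();"
psql -U postgres -c "SHOW work_mem;"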
Update
I posted a follow-up to this with additional charts in PostgreSQL Monitoring and Performance Tuning: Phase 2.