
vm.dirty_ratio and vm.dirty_background_ratio

posted Jul 20, 2011, 12:24 PM by Paul Valentino   [ updated Sep 27, 2011, 11:17 PM ]
I've recently confirmed that, as far as kernel parameters go, none are as valuable for performance when set correctly as the two named in the title of this post.  The improvements are especially noticeable on systems with large amounts of memory, but I wouldn't ignore them on systems with ordinary amounts of memory either.  That's not to say that others like kernel.shm??? net.core.* or fs.* are unimportant; I'm simply saying that, all things being equal, I have never seen more profound performance improvements from adjusting a kernel parameter than I have with these two Page Cache settings.  This has held true for me whether on an Oracle Database server or a JBoss application server, a physical machine or a virtual machine.

So that raises the question, "What are dirty_ratio and dirty_background_ratio, and why do they have such a significant impact?"

To answer that question we must first understand that the page cache on a Linux system is the area of memory where filesystem-based I/O is cached, and that these settings affect how the kernel uses that cache.  More specifically, they tune pdflush: how much RAM-based cache may hold dirty pages (data destined for disk) and how soon that cached data is flushed by writing the pages back to disk.  Dirty pages are pages in the page cache that have been updated and therefore differ from what is currently stored on disk.

Before getting into the detail of these settings, I will say that there are other settings related to pdflush and many other posts that go into the gory details.  I will focus on these two settings since they are the only ones I've ever had to change in order to improve performance or eliminate a crippling bottleneck.

The first setting, vm.dirty_background_ratio, is found in /proc/sys/vm/dirty_background_ratio and can be set by echoing a value to that location, as in:
echo 5 > /proc/sys/vm/dirty_background_ratio
or by adding vm.dirty_background_ratio = 5 to /etc/sysctl.conf and then running sysctl -p
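
Since the value can be set in two places, it is worth checking that the persisted value and the running value agree. The following is only a minimal sketch: conf_value is a hypothetical helper of my own naming, not part of sysctl, and it assumes the simple "key = value" layout used in /etc/sysctl.conf.

```shell
#!/bin/sh
# conf_value KEY [FILE]
# Prints the last value assigned to KEY in a sysctl.conf-style file.
# conf_value is a hypothetical helper, not part of the sysctl tool.
conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*[#;]/ { next }                 # skip comment lines
        {
            k = $1; gsub(/[ \t]/, "", k)       # key with whitespace removed
            if (k == key) { v = $2; gsub(/[ \t]/, "", v); last = v }
        }
        END { if (last != "") print last }
    ' "${2:-/etc/sysctl.conf}"
}

# Compare the persisted value with the value the kernel is using right now.
key=vm.dirty_background_ratio
proc_path=/proc/sys/vm/dirty_background_ratio
if [ -r /etc/sysctl.conf ] && [ -r "$proc_path" ]; then
    echo "configured: $(conf_value "$key")"
    echo "running:    $(cat "$proc_path")"
fi
```

If the two lines differ, the file has been edited but sysctl -p has not been run yet (or the value was changed by hand with echo).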

vm.dirty_background_ratio is the maximum percentage of ((Cache + Free) - Mapped) memory that can be dirty before the pdflush process starts writing it to disk.  In other words, if you have 256G of RAM and are using the default value of 10 for dirty_background_ratio, up to roughly 25G of dirty pages must accumulate before this setting triggers a flush, so the system will probably never attempt to flush the cache based upon this setting.  The value of dirty_expire_centisecs (default 3000, meaning dirty data older than 30 seconds becomes eligible for writeback) will hopefully kick in to initiate a pdflush well before the 10% mark is reached.
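
The 256G example can be worked through with a little shell arithmetic. This is a rough sketch only: ratio_to_mib is a hypothetical helper, and it applies the percentage to the full memory size, whereas the kernel works from the smaller ((Cache + Free) - Mapped) figure, so treat the numbers as an upper bound.

```shell
#!/bin/sh
# ratio_to_mib TOTAL_MIB RATIO_PERCENT
# Prints the dirty-page threshold in MiB (integer, rounded down).
# Assumption: the ratio is applied to the whole of TOTAL_MIB, which
# overstates the real threshold somewhat.
ratio_to_mib() {
    echo $(( $1 * $2 / 100 ))
}

# 256G of RAM expressed in MiB
total_mib=$(( 256 * 1024 ))

echo "background threshold at 10%: $(ratio_to_mib "$total_mib" 10) MiB"
echo "background threshold at 5%:  $(ratio_to_mib "$total_mib" 5) MiB"
```

Even at 5% the threshold on a machine of this size is still well over 10G, which is why the expiry timer tends to matter more than the ratio on large-memory systems.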

The second setting, vm.dirty_ratio, is found in /proc/sys/vm/dirty_ratio and can be set by echoing a value to that location or by updating /etc/sysctl.conf with, for example, vm.dirty_ratio = 5 (default value is 40).

vm.dirty_ratio is the percentage of MemTotal that dirty pages may consume before all processes must write dirty buffers back to disk themselves.  When this value is reached, all new writes are blocked until dirty pages have been flushed.  Therefore, with 256G of installed RAM and the default setting of 40% for this value, you will see the pdflush process kick in at approximately 100G of dirty pages, resulting in blocked I/O for as much as several minutes at a time.  Just do a dd to a large file and watch ls -la /largefile to see the lengthy pauses.  Again, dirty_expire_centisecs will kick in before the cache grows this large, but 30 seconds is a very long time for a system under heavy I/O.  You will most likely experience debilitating I/O problems with the default values on systems with large amounts of memory.
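
The dd test mentioned above can be sketched roughly as follows. The function name write_and_watch, the target path and the size are my own placeholders; choose a size comparable to the dirty threshold on the machine being tested, and a filesystem with room for it.

```shell
#!/bin/sh
# write_and_watch TARGET COUNT_MIB
# Writes COUNT_MIB MiB of zeros to TARGET in the background and, once a
# second, times an "ls -la" of that file while the write is still running.
# Hypothetical helper: long "ls took" values are the pauses to look for.
write_and_watch() {
    target=$1
    count=$2
    dd if=/dev/zero of="$target" bs=1M count="$count" 2>/dev/null &
    dd_pid=$!
    while kill -0 "$dd_pid" 2>/dev/null; do
        start=$(date +%s)
        ls -la "$target" >/dev/null 2>&1
        end=$(date +%s)
        echo "ls took $(( end - start ))s"
        sleep 1
    done
    wait "$dd_pid"
}

# Example (not run automatically), writing 100G to /largefile:
#   write_and_watch /largefile 102400
```

Timing is only to the nearest second, which is enough here because the stalls being hunted last tens of seconds or more.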

Before we progress to changing any values, it's important to get a good baseline during periods of peak activity for the targeted workload.  If this is a JBoss server, I want a baseline from a period when users are most actively using it for its primary purpose, not from the backup window.  (Note that changing these settings will most definitely affect the way backups work, so you will also want to verify that impact and adjust accordingly; more info at the end of the post.)

The best way to see the pdflush activity on a system is by watching the relevant vmstats:

watch grep -A 1 dirty /proc/vmstat  # Dirty Pages and writeback to disk activity

You might also like to have atop installed, as it provides very good insight into the status of your page cache; right around the 4th row of output you can see the cache and dirty pages info.  At the same time you will also want to capture some iostats, netstats, vmstats etc. for your baseline.  That way you can see how the pdflush setting changes affect your overall system performance as you make them.

Similarly you can watch the proc filesystem to capture Cache and dirty pages stats if you prefer:

watch grep ^Cached /proc/meminfo   # Page Cache size
watch grep -A 1 dirty /proc/vmstat  # Dirty Pages and writeback to disk activity
watch cat /proc/sys/vm/nr_pdflush_threads  # shows # of active pdflush threads
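
The counters in /proc/vmstat are page counts, which are hard to read at a glance. As a small sketch (dirty_summary is a hypothetical helper, not a standard tool), the two most relevant counters can be converted to MiB for logging alongside the rest of the baseline:

```shell
#!/bin/sh
# dirty_summary [FILE] [PAGE_KIB]
# Prints nr_dirty and nr_writeback from a vmstat-format FILE in MiB.
# FILE defaults to /proc/vmstat; PAGE_KIB defaults to the system page
# size reported by getconf (4 KiB on most x86 systems).
dirty_summary() {
    file=${1:-/proc/vmstat}
    if [ -n "${2:-}" ]; then
        page_kib=$2
    else
        page_kib=$(( $(getconf PAGESIZE) / 1024 ))
    fi
    awk -v page_kib="$page_kib" '
        $1 == "nr_dirty"     { dirty = $2 }
        $1 == "nr_writeback" { wb = $2 }
        END {
            printf "dirty=%dMiB writeback=%dMiB\n",
                   dirty * page_kib / 1024, wb * page_kib / 1024
        }
    ' "$file"
}

# Print one sample if the proc file is available on this system.
if [ -r /proc/vmstat ]; then
    dirty_summary
fi
```

Running it under watch, or in a loop with a timestamp, gives a simple record of how the dirty page count moves before and after each settings change.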

First I add the new values to /etc/sysctl.conf but do not apply them yet:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 5

Then I prefer to clean things up by freeing the pagecache, dentries and inodes, and then apply the new settings, by running the following:

sync; echo 3 > /proc/sys/vm/drop_caches; sysctl -p

On a system that has been crippled by the default settings I will typically start with the settings above and then go through several iterations of capturing metrics while changing only vm.dirty_background_ratio, by 1 each time, until I find a sweet spot.  I seem to always end up staying with a value of 5 for vm.dirty_ratio, but you may also want to go through the same steps to confirm the sweetest spot for that value as well.

As mentioned earlier, changing these settings will most likely have a negative impact on your backup performance unless your system has similar I/O characteristics to the backup solution you are using.  It will be in your best interests to monitor your backup window before and after each change.  For me, I often end up offloading any large file backups to an alternate server and adjusting the backup software's client cache settings in addition to making the changes above.  The end result in my cases has been about 25% average performance improvement;  I've seen as high as a 50% increase in I/O performance on high transaction systems and as much as 40% for batch processing activity by manipulating these values.
