Wednesday, April 11, 2007

Understanding RRDTool


RRDTool is a product that grew out of MRTG. It creates a very compact database structure for storing periodic data, such as the data gathered by OpenNMS. Each RRD file is created at its full size, with room to hold data for a fixed amount of time. This means that from the first data collection these files are as large as they will ever get, but it also means that you will see a large initial drop in free disk space when collection first starts. Once the RRD file is full, the oldest data is discarded.
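The fixed-size, discard-the-oldest behaviour can be pictured with a small Python sketch. This is only an analogy, not how RRDTool lays out its files: `make_archive` is a hypothetical helper, and `None` stands in for slots that have not been filled yet.

```python
from collections import deque

def make_archive(rows):
    """Pre-allocate a fixed number of slots, all empty (None) at first."""
    return deque([None] * rows, maxlen=rows)

archive = make_archive(3)          # full size from the moment it is created
for value in [1, 2, 3, 4, 5]:
    archive.append(value)          # once full, the oldest entry falls off

print(list(archive))               # only the three newest values remain
```

The archive never grows past three slots, which mirrors why an RRD file keeps the same size on disk no matter how long collection runs.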


Each RRD is made up of Round-Robin Archives (RRAs). An RRA covers a certain number of steps. All of the data collected in those steps is consolidated into a single value, which is then stored in the RRA. For instance, if I poll a certain SNMP variable once a minute, I could have an RRA that would collect all samples over a step of five minutes, average the (five) values together, and store the average.
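As a rough illustration of that example, the Python sketch below averages one-minute samples in groups of five. The function name and the sample values are made up for illustration, and the sketch ignores the time-based alignment that RRDTool itself performs.

```python
def consolidate_average(samples, per_group):
    """Average consecutive groups of `per_group` samples; drop a trailing partial group."""
    return [
        sum(samples[i:i + per_group]) / per_group
        for i in range(0, len(samples) - per_group + 1, per_group)
    ]

# Ten hypothetical one-minute samples, consolidated into two five-minute values.
one_minute_samples = [10, 20, 30, 40, 50, 5, 5, 5, 5, 5]
print(consolidate_average(one_minute_samples, 5))
```

Each stored value stands for five minutes of polling, so ten samples shrink to two entries.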


Step: The first line, the rrd step size, determines the granularity of the data. By default this is set to 300 seconds, or five minutes, which means that one value is saved for each five-minute step.


The RRA statements take the form:

RRA:Cf:xff:steps:rows

Where,

  • Cf: the consolidation function. It can take one of four values: AVERAGE, MAX, MIN, or LAST.
    • AVERAGE Average all the values over the number of steps in the RRA.
    • MAX Store the maximum value collected over the number of steps in the RRA.
    • MIN Store the minimum value collected over the number of steps in the RRA.
    • LAST Store the last value collected over the number of steps in the RRA.
  • xff: This is the "x-files factor". If we are trying to consolidate a number of samples into one, there is a chance that there could be gaps where a value wasn't collected (the device was down, etc.). In that case, the value would be UNKNOWN. This factor determines what fraction of the samples can be UNKNOWN before the consolidated sample is itself considered UNKNOWN. By default this is set to 0.5, or 50%.
  • steps: This states the number of "steps" that are consolidated into each value in the RRA. For example, if the step size is 300 seconds (5 minutes) and the number of steps is 12, then each value covers 12 x 5 minutes = 60 minutes = 1 hour, and the RRA will store the consolidated value for that hour.
  • rows: This states how many consolidated values the RRA holds. For example, if each value covers 1 hour and the number of rows is 24, then the RRA holds 24 x 1 hour = 1 day of data before the oldest values are discarded.
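The way Cf and xff work together can be sketched in Python as follows. This is a simplified model based on the descriptions above, not RRDTool's actual code: `consolidate` is a hypothetical function, `None` stands in for UNKNOWN, and LAST is assumed to mean the last known value.

```python
def consolidate(values, cf="AVERAGE", xff=0.5):
    """Reduce one group of samples to a single value, or None (UNKNOWN)."""
    if not values:
        return None
    unknown = sum(1 for v in values if v is None)
    # Too many gaps: the consolidated value is itself UNKNOWN.
    if unknown / len(values) > xff:
        return None
    known = [v for v in values if v is not None]
    if not known:                      # only reachable when xff is 1.0
        return None
    if cf == "AVERAGE":
        return sum(known) / len(known)
    if cf == "MAX":
        return max(known)
    if cf == "MIN":
        return min(known)
    if cf == "LAST":
        return known[-1]               # assumption: last known sample
    raise ValueError(f"unsupported consolidation function: {cf}")

print(consolidate([1, None, 3, None]))      # half unknown: still consolidated
print(consolidate([1, None, None, None]))   # three quarters unknown: UNKNOWN
```

With xff at 0.5, a group that is exactly half UNKNOWN still produces a value, while anything beyond half does not.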

So, we may have the following rrd element in poll-config.xml:

RRA:AVERAGE:0.5:1:8928

RRA:AVERAGE:0.5:12:8784

RRA:MIN:0.5:12:8784

RRA:MAX:0.5:12:8784

For example, consider the following line:

RRA:AVERAGE:0.5:1:8928

This says to create an archive consisting of the AVERAGE value collected over 1 step and store up to 8928 of them. If, for any step, more than 50% of the values are UNKNOWN, then the average value will be UNKNOWN. Since the default step size is 300 seconds, or five minutes, and the default polling cycle (in the collectd configuration) is five minutes, we would expect there to be one value per step, and so the AVERAGE should be the same as the MIN or MAX or LAST. At 12 samples per hour and 24 hours per day, that is 288 samples per day, so 8928 five-minute samples cover 8928 / 288 = 31 days. Thus this RRA will hold five-minute samples for 31 days before discarding data.
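The same arithmetic can be checked with a few lines of Python. The helper name `rra_retention` and its `step_seconds` parameter are made up for this sketch; it simply splits an RRA line of the form shown above and multiplies the fields out.

```python
from datetime import timedelta

def rra_retention(rra, step_seconds=300):
    """Return (cf, time covered by one row, total time covered by the RRA)."""
    _, cf, _xff, steps, rows = rra.split(":")
    resolution = step_seconds * int(steps)          # seconds per stored value
    return (
        cf,
        timedelta(seconds=resolution),
        timedelta(seconds=resolution * int(rows)),
    )

cf, per_row, total = rra_retention("RRA:AVERAGE:0.5:1:8928")
print(cf, per_row, total.days)
```

Passing a different `step_seconds` shows how the retention period changes if the base step is not the default 300 seconds.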

The last 3 lines of our RRD config are:

RRA:AVERAGE:0.5:12:8784

RRA:MIN:0.5:12:8784

RRA:MAX:0.5:12:8784

The only difference between these lines is the consolidation function. We are going to "roll up" the 1-step samples (5 minutes) into 12-step samples (1 hour). We are also going to store three values for each hour: the average of all samples during the hour, the minimum value of those samples, and the maximum value. This data is useful for various reports (the AVERAGE shows throughput whereas MAX and MIN show peaks and valleys). Each of these RRAs stores 8784 one-hour samples, which at 24 per day is 366 days.
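A minimal Python sketch of this roll-up, assuming one five-minute sample per step and no UNKNOWN values, might look like the following. The function name `roll_up` and the sample data are illustrative only.

```python
def roll_up(samples, steps=12):
    """Consolidate each full group of `steps` samples into AVERAGE, MIN and MAX."""
    rows = []
    for i in range(0, len(samples) - steps + 1, steps):
        group = samples[i:i + steps]
        rows.append({
            "AVERAGE": sum(group) / len(group),
            "MIN": min(group),
            "MAX": max(group),
        })
    return rows

# Two hours of hypothetical five-minute samples (24 values).
five_minute_samples = list(range(1, 25))
for hour in roll_up(five_minute_samples):
    print(hour)
```

Each dictionary corresponds to one row in each of the three hourly RRAs.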


So, to summarize, by default an SNMP collector will poll once every five minutes. This value will be stored as collected for 31 days. Also, hourly samples will be stored which include the MIN, MAX and AVERAGE.

7 comments:

Anonymous said...

Great work...
had scanned most of the sites for xff...
finally found it in detail enough to understand the % value ...
thnx!!!
cheers!

TechDood said...

Rajneesh/Rohit,

You have used example related to NMS which I have very little idea. Say if we want to graph number of DB queries over a period of 24 hours..How would we do this?

Your help would help me understand the logic.

TechDood said...

Guys,

I fixed the issue that I had. No need to explain further.

Thanks

Anonymous said...

Thanks.
I couldn't understand how steps were calculated, you cleared this up.

Anonymous said...

Unfortunately, none of the other guides that I've seen have been this informative (and short). Thanks!

Oleksandr Iegorov said...

Thank you for the article. It was very clear and explanative for me.

Anonymous said...

At last, a short / clear / concise article about rrdtool. Thanks!
