Infrastructure Integration

Configuration

  1. Configure the agent by editing /etc/netsil-dd-agent/conf.d/process.yaml in the collectors.

Example:

    init_config:
      # the check will refresh the matching pid list every X seconds
      # except if it detects a change before. You might want to set it
      # low if you want to alert on process service checks.
      # pid_cache_duration: 120
      #
      # used to override the default procfs path, e.g. for docker containers with the outside fs mounted at /host/proc
      # procfs_path: /proc

    instances:
    # The `system.processes.cpu.pct` metric sent by this check is only accurate for processes that live
    # for more than 30 seconds. Do not expect its value to be accurate for shorter-lived processes.
    #
    #  One and only one of search_string, pid or pid_file must be specified
    #  - name: (required) STRING. It will be used to uniquely identify your metrics as they will be tagged with this name
    #    search_string: LIST OF STRINGS. If one of the elements in the list matches,
    #                   return the count of all the processes that contain the string
    #    pid: STRING. A Process id.
    #    pid_file: STRING. A Pid file.
    #    exact_match: (optional) Boolean. Defaults to True. To look for an arbitrary
    #                 string with search_string, set exact_match: False.
    #    ignore_denied_access: (optional) Boolean. Defaults to True. When getting the number of file descriptors,
    #                          the dd-agent user might be denied access. Set this to True to not issue a warning if that happens.
    #    thresholds: (optional) Two ranges: critical and warning
    #         warning: (optional) List of two values: If the number of processes found is below the first value or
    #                  above the second one, the process check will return WARNING.
    #         critical: (optional) List of two values: If the number of processes found is below the first value or
    #                   above the second one, the process check will return CRITICAL.
    #     In the ssh example below, the process check returns OK for 3 to 5 processes, WARNING for 1, 2, 6, or 7 processes, and CRITICAL below 1 or above 7.
    #     CRITICAL is always dominant in case of overlapping.
    #    collect_children: BOOLEAN. If true, the check will also collect metrics from all child processes of a matched process. Defaults to false.
    #                      Please be aware that the collection is recursive, and might take some time depending on the use case.
    #
    # Examples:
    #
    #  - name: ssh
    #    search_string: ['ssh', 'sshd']
    #    tags:
    #      - env:staging
    #      - cluster:big-data
    #    thresholds:
    #      # critical if no sshd or more than 7 sshd are running
    #      critical: [1, 7]
    #      # warning if 1, 2, 6, 7 sshd processes are running
    #      warning: [3, 5]
    #      # ok if 3, 4, 5 processes are running
    #
    #  - name: postgres
    #    search_string: ['postgres']
    #    ignore_denied_access: True
    #
    #  - name: nodeserver
    #    search_string: ['node server.js']
    #
    #  - name: pid_process
    #    pid: 1278
    #    # Do not use search_string when searching by pid, or multiple processes will be grabbed
    #
    #  - name: pid_file
    #    pid_file: /var/run/sshd.pid

  2. Check that all YAML files are valid with the following command:

    /etc/init.d/netsil-collectors configcheck

  3. Restart the Agent using the following command:

    /etc/init.d/netsil-collectors restart

  4. Execute the info command to verify that the integration check has passed:

    /etc/init.d/netsil-collectors info
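
The thresholds semantics described in the configuration comments above (a count below the first value or above the second one triggers the status, and CRITICAL dominates WARNING) can be illustrated with a minimal Python sketch. The function name `process_status` is hypothetical and used only for illustration; this is not the agent's actual implementation, only a model of the documented behavior using the values from the ssh example.

```python
def process_status(count, warning=None, critical=None):
    """Model the documented thresholds rule for a given process count.

    `warning` and `critical` are optional [low, high] pairs, as in the
    `thresholds` section of process.yaml. Hypothetical helper, not agent code.
    """
    def outside(bounds):
        low, high = bounds
        # Below the first value or above the second one triggers the status.
        return count < low or count > high

    # CRITICAL is checked first because it is dominant when ranges overlap.
    if critical is not None and outside(critical):
        return "CRITICAL"
    if warning is not None and outside(warning):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Values from the ssh example: critical: [1, 7], warning: [3, 5]
    for n in (0, 1, 2, 3, 5, 6, 7, 8):
        print(n, process_status(n, warning=[3, 5], critical=[1, 7]))
```

With these values, the sketch reports OK for 3 to 5 processes, WARNING for 1, 2, 6, or 7, and CRITICAL for 0 or for more than 7, which matches the ranges stated in the configuration comments.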
    

Infrastructure Datasources

| Datasource | Available Aggregations | Unit | Description |
| --- | --- | --- | --- |
| system.processes.cpu.pct | avg max min sum | percent | The process CPU utilization. |
| system.processes.involuntary_ctx_switches | avg max min sum | event | The number of involuntary context switches performed by this process. |
| system.processes.ioread_bytes | avg max min sum | byte | The number of bytes read from disk by this process. |
| system.processes.ioread_count | avg max min sum | read | The number of disk reads by this process. |
| system.processes.iowrite_bytes | avg max min sum | byte | The number of bytes written to disk by this process. |
| system.processes.iowrite_count | avg max min sum | write | The number of disk writes by this process. |
| system.processes.mem.page_faults.minor_faults | avg max min sum | occurrence/second | The number of minor page faults per second for this process. |
| system.processes.mem.page_faults.children_minor_faults | avg max min sum | occurrence/second | The number of minor page faults per second for children of this process. |
| system.processes.mem.page_faults.major_faults | avg max min sum | occurrence/second | The number of major page faults per second for this process. |
| system.processes.mem.page_faults.children_major_faults | avg max min sum | occurrence/second | The number of major page faults per second for children of this process. |
| system.processes.mem.pct | avg max min sum | percent | The process memory consumption. |
| system.processes.mem.real | avg max min sum | byte | The non-swapped physical memory a process has used and cannot be shared with another process. |
| system.processes.mem.rss | avg max min sum | byte | The non-swapped physical memory a process has used, aka "Resident Set Size". |
| system.processes.mem.vms | avg max min sum | byte | The total amount of virtual memory used by the process, aka "Virtual Memory Size". |
| system.processes.number | avg max min sum | process | The number of processes. |
| system.processes.open_file_descriptors | avg max min sum | | The number of file descriptors used by this process. |
| system.processes.open_handles | avg max min sum | | The number of handles used by this process. |
| system.processes.threads | avg max min sum | thread | The number of threads used by this process. |
| system.processes.voluntary_ctx_switches | avg max min sum | event | The number of voluntary context switches performed by this process. |