Infrastructure Integration

Configuration

  1. Configure the agent by editing /etc/netsil-dd-agent/conf.d/mapreduce.yaml on the collectors.

Example:

    instances:
      #
      # The MapReduce check retrieves metrics from YARN's ResourceManager. This
      # check must be run from the Master Node and the ResourceManager URI must
      # be specified below. The ResourceManager URI is composed of the
      # ResourceManager's hostname and port.
      #
      # The ResourceManager hostname can be found in the yarn-site.xml conf file
      # under the property yarn.resourcemanager.address
      #
      # The ResourceManager port can be found in the yarn-site.xml conf file under
      # the property yarn.resourcemanager.webapp.address
      #
      - resourcemanager_uri: http://localhost:8088

        # A required friendly name for the cluster.
        # cluster_name: MyMapReduceCluster

        # Set to true to collect histograms on the elapsed time of
        # map and reduce tasks (default: false)
        # collect_task_metrics: false

        # Optional tags to be applied to every emitted metric.
        # tags:
        #   - key:value
        #   - instance:production

    init_config:
      #
      # Optional metrics can be specified for counters. For more information on
      # counters visit the MapReduce documentation page:
      # https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html#Job_Counters_API
      #

      general_counters:
        #
        # general_counters are job agnostic metrics that create a metric for each
        # specified counter
        #
        # - counter_group_name: 'org.apache.hadoop.mapreduce.TaskCounter'
        #   counters:
        #     - counter_name: 'MAP_INPUT_RECORDS'
        #     - counter_name: 'MAP_OUTPUT_RECORDS'
        #     - counter_name: 'REDUCE_INPUT_RECORDS'
        #     - counter_name: 'REDUCE_OUTPUT_RECORDS'
        #
        # Additional counters can be specified as follows
        #

        # - counter_group_name: 'org.apache.hadoop.mapreduce.FileSystemCounter'
        #   counters:
        #     - counter_name: 'HDFS_BYTES_READ'

      job_specific_counters:
        #
        # job_specific_counters are metrics that are specific to a particular job.
        # The following example specifies counters for the jobs 'Foo' and 'Bar'.
        #

        # - job_name: 'Foo'
        #   metrics:
        #     - counter_group_name: 'org.apache.hadoop.mapreduce.FileSystemCounter'
        #       counters:
        #         - counter_name: 'FILE_BYTES_WRITTEN'
        #         - counter_name: 'HDFS_BYTES_WRITTEN'
        #     - counter_group_name: 'org.apache.hadoop.mapreduce.FileSystemCounter'
        #       counters:
        #         - counter_name: 'HDFS_BYTES_READ'
        # - job_name: 'Bar'
        #   metrics:
        #     - counter_group_name: 'org.apache.hadoop.mapreduce.FileSystemCounter'
        #       counters:
        #         - counter_name: 'FILE_BYTES_WRITTEN'

  2. Check that all YAML files are valid with the following command:

    /etc/init.d/netsil-collectors configcheck

  3. Restart the Agent using the following command:

    /etc/init.d/netsil-collectors restart

  4. Execute the info command to verify that the integration check has passed:

    /etc/init.d/netsil-collectors info

The output of the command should contain a section similar to the following:

    Checks
    ======

      [...]

      mapreduce
      ---------
          - instance #0 [OK]
          - Collected 8 metrics & 0 events
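
For comparison with the commented template in the configuration step, a filled-in mapreduce.yaml might look like the sketch below. It uses only keys that appear in that template. The cluster name, tag, job name, and choice of counters are placeholder values, not recommendations, and the ResourceManager URI should be replaced with the one for your cluster.

```yaml
init_config:
  # Job-agnostic counters: one metric is created per listed counter.
  general_counters:
    - counter_group_name: 'org.apache.hadoop.mapreduce.TaskCounter'
      counters:
        - counter_name: 'MAP_INPUT_RECORDS'
        - counter_name: 'REDUCE_OUTPUT_RECORDS'

  # Counters collected only for the named job ('Foo' is a placeholder).
  job_specific_counters:
    - job_name: 'Foo'
      metrics:
        - counter_group_name: 'org.apache.hadoop.mapreduce.FileSystemCounter'
          counters:
            - counter_name: 'HDFS_BYTES_READ'

instances:
  # Placeholder URI: ResourceManager hostname and web application port.
  - resourcemanager_uri: http://localhost:8088
    # Placeholder cluster name.
    cluster_name: MyMapReduceCluster
    # Enables the map and reduce task elapsed-time histograms.
    collect_task_metrics: true
    tags:
      - instance:production
```

After editing the file, rerun the configcheck, restart, and info commands from the steps above to confirm that the check still loads.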

Infrastructure Datasources

| Datasource | Available Aggregations | Unit | Description |
|---|---|---|---|
| mapreduce.job.elapsed_time.max | avg max min sum | millisecond | Max elapsed time since the application started |
| mapreduce.job.elapsed_time.avg | avg max min sum | millisecond | Average elapsed time since the application started |
| mapreduce.job.elapsed_time.median | avg max min sum | millisecond | Median elapsed time since the application started |
| mapreduce.job.elapsed_time.95percentile | avg max min sum | millisecond | 95th percentile elapsed time since the application started |
| mapreduce.job.elapsed_time.count | avg max min sum | | Number of times the elapsed time was sampled |
| mapreduce.job.maps_total | avg max min sum | task/second | Total number of maps |
| mapreduce.job.maps_completed | avg max min sum | task/second | Number of completed maps |
| mapreduce.job.reduces_total | avg max min sum | task/second | Number of reduces |
| mapreduce.job.reduces_completed | avg max min sum | task/second | Number of completed reduces |
| mapreduce.job.maps_pending | avg max min sum | task/second | Number of pending maps |
| mapreduce.job.maps_running | avg max min sum | task/second | Number of running maps |
| mapreduce.job.reduces_pending | avg max min sum | task/second | Number of pending reduces |
| mapreduce.job.reduces_running | avg max min sum | task/second | Number of running reduces |
| mapreduce.job.new_reduce_attempts | avg max min sum | task/second | Number of new reduce attempts |
| mapreduce.job.running_reduce_attempts | avg max min sum | task/second | Number of running reduce attempts |
| mapreduce.job.failed_reduce_attempts | avg max min sum | task/second | Number of failed reduce attempts |
| mapreduce.job.killed_reduce_attempts | avg max min sum | task/second | Number of killed reduce attempts |
| mapreduce.job.successful_reduce_attempts | avg max min sum | task/second | Number of successful reduce attempts |
| mapreduce.job.new_map_attempts | avg max min sum | task/second | Number of new map attempts |
| mapreduce.job.running_map_attempts | avg max min sum | task/second | Number of running map attempts |
| mapreduce.job.failed_map_attempts | avg max min sum | task/second | Number of failed map attempts |
| mapreduce.job.killed_map_attempts | avg max min sum | task/second | Number of killed map attempts |
| mapreduce.job.successful_map_attempts | avg max min sum | task/second | Number of successful map attempts |
| mapreduce.job.counter.reduce_counter_value | avg max min sum | task/second | Counter value of reduce tasks |
| mapreduce.job.counter.map_counter_value | avg max min sum | task/second | Counter value of map tasks |
| mapreduce.job.counter.total_counter_value | avg max min sum | task/second | Counter value of all tasks |
| mapreduce.job.map.task.elapsed_time.max | avg max min sum | millisecond | Max of all map tasks elapsed time |
| mapreduce.job.map.task.elapsed_time.avg | avg max min sum | millisecond | Average of all map tasks elapsed time |
| mapreduce.job.map.task.elapsed_time.median | avg max min sum | millisecond | Median of all map tasks elapsed time |
| mapreduce.job.map.task.elapsed_time.95percentile | avg max min sum | millisecond | 95th percentile of all map tasks elapsed time |
| mapreduce.job.map.task.elapsed_time.count | avg max min sum | | Number of times the map tasks elapsed time was sampled |
| mapreduce.job.reduce.task.elapsed_time.max | avg max min sum | millisecond | Max of all reduce tasks elapsed time |
| mapreduce.job.reduce.task.elapsed_time.avg | avg max min sum | millisecond | Average of all reduce tasks elapsed time |
| mapreduce.job.reduce.task.elapsed_time.median | avg max min sum | millisecond | Median of all reduce tasks elapsed time |
| mapreduce.job.reduce.task.elapsed_time.95percentile | avg max min sum | millisecond | 95th percentile of all reduce tasks elapsed time |
| mapreduce.job.reduce.task.elapsed_time.count | avg max min sum | | Number of times the reduce tasks elapsed time was sampled |