AIX Performance Metrics

Version of up.time affected: All
Affected Platforms: AIX

The up.time AIX agent collects performance metrics in the following categories from the systems on which it is installed: CPU, memory, disk, network, process and workload, and user.

The AIX agent uses a number of utilities to gather these metrics, including:

  • sar: collects information about system activity. This version of sar is bundled with AIX.
  • mpstat: collects processor-related metrics.
  • ifconfig: configures the parameters for network interfaces.
  • ps: reports on the status of processes.

Each set of performance metrics is averaged over the interval at which the up.time monitoring station polls the agent (e.g. every 10 minutes).

Whenever the sar command uses the -f option to specify a file, that file is generated using the sadc 1 1 command. The sadc command polls the system counters at a one-second interval and writes the information that it receives to a file. The sar command then reads this file.
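The two-step sequence described above (sadc writes one sample to a file, then sar reads that file) can be sketched as follows. This is only an illustration, not the agent's actual code: the function name build_sar_commands and the file path are hypothetical, and the sketch assumes that sadc can be invoked by name, which may not be true on a given system. The sketch only builds the argument lists; it does not execute anything.

```python
def build_sar_commands(data_file, sar_option="-u"):
    """Return the two commands: sadc writes one sample, sar reads it back.

    Nothing is executed here; the caller decides how to run the commands.
    """
    # sadc <interval> <count> <file>: one sample over a one-second interval.
    collect = ["sadc", "1", "1", data_file]
    # sar <option> -f <file>: report from the file that sadc just wrote.
    report = ["sar", sar_option, "-f", data_file]
    return collect, report


# Hypothetical file path, used for illustration only.
collect_cmd, report_cmd = build_sar_commands("/tmp/uptime_sa.dat", "-u")
print(" ".join(collect_cmd))
print(" ".join(report_cmd))
```

The same pair of commands applies to the other sar options mentioned in this article (for example -q or -d); only the option passed to sar changes.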

CPU

The up.time agent uses the sar -u -f command to collect CPU metrics from an AIX system. The sar command compares the system counters over a one-second interval. On a system with multiple CPUs, the statistics that the agent returns are an average across all of the CPUs on the server.
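As a rough illustration of what averaging across CPUs means, the sketch below takes one set of percentages per processor and returns the system-wide average. The function name average_cpu, the dictionary keys, and the sample values are assumptions made for this example; they are not taken from the agent.

```python
def average_cpu(per_cpu):
    """Average the usr, sys and wio percentages across all CPUs.

    per_cpu is a list of dicts, one per processor, for example
    {"usr": 40.0, "sys": 10.0, "wio": 2.0}.
    """
    if not per_cpu:
        raise ValueError("need at least one CPU sample")
    count = len(per_cpu)
    return {
        key: sum(cpu[key] for cpu in per_cpu) / count
        for key in ("usr", "sys", "wio")
    }


# Illustrative values for a two-CPU system.
sample = [
    {"usr": 80.0, "sys": 15.0, "wio": 4.0},
    {"usr": 20.0, "sys": 5.0, "wio": 2.0},
]
print(average_cpu(sample))
```

An averaged figure can hide imbalance: one fully busy CPU out of four appears as only 25% when averaged.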

Metric Explanation
% Usr The amount of time that the CPU spends in user mode.
% Sys The amount of time that the kernel spends processing system calls.
% WIO The amount of time that a runnable process waits for a device to perform an I/O operation.
Multi CPU Usage Whether or not a system with multiple CPUs is effectively balancing tasks between CPUs, or if processes are being forced off CPUs in certain circumstances.
Run Queue Length The average number of services or processes that are waiting to be served by the CPU, measured while the queue is occupied.
Run Queue Occupancy The percentage of time that one or more services or processes are waiting to be served by the CPU.
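Run queue length and run queue occupancy measure different things: the first is the average length of the queue while it is occupied, and the second is the percentage of time that the queue is occupied. The sketch below shows one way to derive both figures from a series of queue-length samples. The function name run_queue_stats and the sample values are assumptions for illustration only; sar computes these figures itself.

```python
def run_queue_stats(samples):
    """Return (average length while occupied, percent of samples occupied).

    samples is a list of run-queue lengths, one per sampling instant.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Only instants with at least one waiting process count as occupied.
    occupied = [length for length in samples if length > 0]
    occupancy_pct = 100.0 * len(occupied) / len(samples)
    avg_length = sum(occupied) / len(occupied) if occupied else 0.0
    return avg_length, occupancy_pct


# Illustrative samples: the queue is empty in two of four instants.
print(run_queue_stats([0, 2, 4, 0]))
```

With these samples, the queue is occupied half of the time, and it holds three processes on average when it is occupied.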

Memory

The up.time agent uses the vmstat 1 2 command to average statistics for the entire system. The agent also uses the sar utility with the following options to collect memory metrics from an AIX system:

  • -b -f (cache metrics)
  • -r -f (unused memory pages and disk blocks)
  • -q -f (the average queue length while it is occupied, and the percentage of time the queue is occupied)
  • -c -f (system calls)

The sar commands compare the system counters over a one-second interval.

Metric Explanation
Free Memory The amount of physical memory available to the operating system, system library files, and applications.
Cache Hit Rate How often the system finds the data that it requests in the buffer cache instead of having to read that data from disk.
Page-outs/s The rate at which pages were written to disk.
Page-ins/s The rate at which pages were read from disk.
Page Free/s The number of pages that are freed from memory each second.
Attaches/s The number of pages that get attached to memory each second.
odio/s The number of non-paging disk I/O operations that occur each second.
slots The number of free pages that are available in the paging spaces.
cycle/s The number of page replacement cycles that occur each second.
fault/s The number of page faults that occur each second.
Software Locks/s The number of software locks that are issued each second.
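A cache hit rate is conventionally expressed as the share of read requests that were satisfied from cache without a read from disk. The sketch below shows that arithmetic. The function name cache_hit_rate, its parameter names, and the choice to report 0.0 when there were no reads are assumptions made for this example; the agent takes its figures from sar rather than computing them this way.

```python
def cache_hit_rate(logical_reads, physical_reads):
    """Percentage of logical reads served from cache without a disk read."""
    if logical_reads <= 0:
        # No read requests in the interval; reporting 0.0 is an arbitrary choice.
        return 0.0
    # Guard against physical reads exceeding logical reads in a short sample.
    hits = max(logical_reads - physical_reads, 0)
    return 100.0 * hits / logical_reads


# Illustrative values: 200 read requests, 50 of which went to disk.
print(cache_hit_rate(200, 50))
```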

Disk

The up.time agent uses the following commands to collect disk statistics:

  • df -k to gather capacity statistics for each file system.
  • sar -d -f to output disk statistics (e.g. %busy, Read/Write/s) per disk, and compare those statistics between polling intervals.
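To illustrate how capacity figures can be derived from df -k output, the sketch below parses a small sample and computes the percentage of each file system that is in use. The sample text, the column layout it assumes, and the function name parse_df_k are illustrative assumptions; real output can differ between AIX levels, and the agent's own parsing is not shown in this article.

```python
# Illustrative df -k output; the column layout is an assumption.
SAMPLE_DF_K = """\
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           262144    131072   50%     2340     4% /
/dev/hd2          2097152    524288   75%    40210     9% /usr
/proc                   -         -    -         -     -  /proc
"""


def parse_df_k(text):
    """Return one dict per file system that reports numeric sizes."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        # Skip short lines and pseudo file systems that report "-" for sizes.
        if len(fields) < 7 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        total_kb = int(fields[1])
        free_kb = int(fields[2])
        used_pct = 100.0 * (total_kb - free_kb) / total_kb if total_kb else 0.0
        rows.append({
            "filesystem": fields[0],
            "mount": fields[-1],  # assumes mount points contain no spaces
            "total_kb": total_kb,
            "free_kb": free_kb,
            "used_pct": used_pct,
        })
    return rows


for row in parse_df_k(SAMPLE_DF_K):
    print(row["mount"], row["used_pct"])
```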

By default, the disk statistics are generated for all disks (including disks that are not active). This can be changed within the agent by setting the ACTIVEONLY flag in the perfparse.sh file to 0.

Metric Explanation
Disk (Spindle) Name The name of each disk on the system.
Usage (% Busy) The percentage of time during which the disk drive is handling read or write requests.
Blocks per second The number of blocks that are transferred to or from the disk each second during read or write operations.
Transfers/s The number of read and write operations on the disk that occur each second.
Average Queued Requests The average number of requests that are waiting in the queue to be handled by the disk.
Average Service Time The average amount of time, in milliseconds, that is required for a request to be carried out.
Average Wait Time The average time, in milliseconds, that a transaction is waiting in a queue. The wait time is directly proportional to the length of the queue.

Network

The up.time agent uses the netstat command with the following options to collect network metrics from an AIX system:

  • netstat -s to combine TCP retransmits for all interfaces
  • netstat -I <interface> to average statistics (e.g. kbps, errors and collisions) per interface.

Metric Explanation
Receive Rate The rate, in kilobytes per second, at which data is received over a specific network adapter.
Send Rate The rate, in kilobytes per second, at which data is sent over a specific network adapter.
Packets Inbound Errors The number of inbound packets that contained errors, which prevented those packets from being delivered to a higher-layer protocol.
Packets Outbound Errors The number of outbound packets that could not be transmitted because of errors.
Collisions The number of signals from two separate nodes on the network that have collided.
TCP Retransmits The number of packets that have been re-sent over a network interface.
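A send or receive rate is typically derived by comparing a byte counter at the start and end of an interval. The sketch below shows that calculation. The function name rate_kb_per_sec, the use of 1024 bytes per kilobyte, and the handling of a counter that has wrapped or been reset are assumptions for illustration; they do not describe how the agent processes netstat output.

```python
def rate_kb_per_sec(bytes_before, bytes_after, interval_seconds):
    """Average transfer rate, in kilobytes per second, over one interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    delta = bytes_after - bytes_before
    if delta < 0:
        # The counter wrapped or was reset; report no data for this interval.
        return 0.0
    # Assumes 1 kilobyte = 1024 bytes.
    return delta / 1024.0 / interval_seconds


# Illustrative counters taken 600 seconds (10 minutes) apart.
print(rate_kb_per_sec(1000000, 1614400, 600))
```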

Process and Workload

The up.time agent uses the ps -eo command to collect process metrics from an AIX system. By default, the agent only gathers the top 20 processes and sorts them by the highest CPU usage.

Workload statistics are sorted within up.time's core. However, the core uses the same 20 processes that were gathered from the Process method. The following data are also gathered with the processes: the names of users, groups and processes, along with their individual statistics (e.g. memory and CPU usage). up.time's core then sorts the statistics based on the graph you want to generate (e.g. user, group or process name).
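The selection and grouping described above can be sketched as follows: keep the processes with the highest CPU usage, then total their statistics by user, group or process name. The function names, the record fields and the sample processes are hypothetical and are used only to show the idea; they do not reflect the actual data structures in the agent or in up.time's core.

```python
from collections import defaultdict


def top_processes(processes, limit=20):
    """Return the processes with the highest CPU usage, highest first."""
    return sorted(processes, key=lambda proc: proc["cpu"], reverse=True)[:limit]


def workload_by(processes, key):
    """Total CPU and memory usage per user, group or process name."""
    totals = defaultdict(lambda: {"cpu": 0.0, "mem": 0.0})
    for proc in processes:
        totals[proc[key]]["cpu"] += proc["cpu"]
        totals[proc[key]]["mem"] += proc["mem"]
    return dict(totals)


# Illustrative process records; all names and values are made up.
SAMPLE_PROCESSES = [
    {"user": "oracle", "group": "dba", "name": "oracle", "cpu": 12.5, "mem": 8.0},
    {"user": "oracle", "group": "dba", "name": "tnslsnr", "cpu": 0.5, "mem": 1.0},
    {"user": "root", "group": "system", "name": "syncd", "cpu": 2.0, "mem": 0.5},
]

selected = top_processes(SAMPLE_PROCESSES)
print(workload_by(selected, "user"))
```

Because the grouping works only on the processes that were selected, the load of any process outside the top 20 is not reflected in the workload totals.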

Metric Explanation
Number of Processes The total number of processes that currently exist on a system, regardless of their state.
Process Creation Rate This metric determines whether or not there are runaway processes on a system or if a forking-based process (like a Web server) is spawning too many processes over a specified period of time.
Processes Running The number of processes that are currently running.
Processes Blocked The number of processes that are currently being blocked from running.
Processes Waiting The number of processes that are currently waiting to run.
Workload - User The demand that network and local services are putting on the system, based on the IDs of the users who are logged into a system.
Workload - Group The demand that network and local services are putting on the system, based on the IDs of the user groups that are logged into a system.
Workload - Process Name The demand that network and local services are putting on a system, based on the processes that are running.
Workload Top 10 - User The 10 network and local services that are putting the most load on the system, based on the IDs of the users who are logged into a system.
Workload Top 10 - Group The 10 network and local services that are putting the most load on the system, based on the IDs of the user groups that are logged into a system.
Workload Top 10 - Process Name The 10 network and local services that are putting the most load on the system, based on the processes that are running.

User

The up.time agent uses the following utilities to collect user metrics from an AIX system:

  • ps -eo
  • last | head -10 (login history for the last 10 users on the system)
  • who (lists who is currently logged into the system)

Metric Explanation
Login History The number of times or frequency at which a user has logged into a system during any 30-minute time interval.
Sessions The number of sessions or number of distinct users who are logged into a system during any 30-minute time interval.
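The difference between the number of sessions and the number of distinct users can be shown with a short sketch that counts both from who output. The sample text, the assumption that the user name is the first field on each line, and the function name count_sessions are illustrative only.

```python
# Illustrative who output; user names and addresses are made up.
SAMPLE_WHO = """\
root        pts/0       Jan 17 09:12     (10.1.1.5)
oracle      pts/1       Jan 17 09:30     (10.1.1.8)
root        pts/2       Jan 17 10:02     (10.1.1.5)
"""


def count_sessions(who_output):
    """Return (number of sessions, number of distinct users)."""
    # Assumes that the user name is the first field on each non-empty line.
    users = [line.split()[0] for line in who_output.splitlines() if line.strip()]
    return len(users), len(set(users))


print(count_sessions(SAMPLE_WHO))
```

In this sample, root has two sessions open, so there are three sessions but only two distinct users.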
