Solaris Zone Workload Plug-in Monitor


4 replies to this topic

#1 David Leith

David Leith

    Guru

  • Uptime Software
  • 168 posts
  • Gender:Male
  • Location:Toronto, Ontario

Posted 10 January 2008 - 01:41 PM

Solaris Zone Workload Plug-in Monitor

The Solaris Zone Workload Monitor is now available on the Grid

#2 Towster

Towster

    Member

  • Members
  • 1 posts

Posted 16 March 2011 - 06:55 PM

OK, I just got this set up on my Solaris uptime server.
There were a couple of issues, so I thought I would share them with everyone.

First off, when I tried to test the service monitor, I got an error saying the temp file could not be created.
The solution was to create the directory "/opt/uptime/tmp" on the uptime server.
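
As a minimal sketch of that fix, the helper below creates the directory if it is missing and checks that it is writable. The function name `ensure_tmp_dir` is made up for illustration, and the ownership and permissions the monitor actually needs are an assumption to verify on your own install.

```shell
#!/bin/sh
# Sketch only: make sure the temp directory used by the monitor exists.
# ensure_tmp_dir is a hypothetical helper name, not part of the plug-in.
ensure_tmp_dir() {
  dir="${1:-/opt/uptime/tmp}"          # default path taken from the post above
  [ -d "$dir" ] || mkdir -p "$dir" || return 1
  [ -w "$dir" ]                        # succeed only if the caller can write there
}
```

Run as a user with rights on /opt/uptime, for example `ensure_tmp_dir /opt/uptime/tmp || echo "cannot create temp dir"`.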

The next issue was a little harder. Some of my agents were showing the data properly, others were not showing the memory, and some were only showing a count of the zones.
These turned out to be two separate problems.
The first looks like something the original script tried to handle, but not correctly. Uptime requires persistent data results to be numeric only. Because the output of prstat -Z on some of my servers reports memory in gigabytes, the value passed back contained a "G". I also noticed that if the value contained an "M", the suffix was simply stripped off, so those results were off by a factor of 1024.
The second problem was that the script was designed to run prstat -Z, store the output, and then kill prstat after 2 seconds. Some of my servers are busy enough that prstat did not return anything within that 2-second timeout. I changed the script to run prstat -Z for a single iteration and use that output.

Here is the script with my changes applied.
CODE
#!/bin/ksh
#set -x
# a shell script to display basic workload information on the various zones on this system
# based off of prstat -Z and zoneadm

# ideal output is like so

# zones_running 5
# css1\.cpu 5
# css1\.mem 1000
# css1\.rss 2000
# css2\.cpu 25
# css2\.mem 2000
# css2\.rss 8000
# css3\.cpu 40
# css3\.mem 3000
# css3\.rss 7500

AWKBIN="/usr/bin/nawk"
SEDBIN="/usr/bin/sed"
PRSTAT="/usr/bin/prstat -Z 1 1"
ZONEADM="/usr/sbin/zoneadm list -iv"

# first add up the running zones
ZR=`$ZONEADM | grep running | wc -l`
echo "zones_running $ZR"

# Original design used two temp files and would timeout if the server was busy
# changed to not rely on tempfiles and not timeout
$PRSTAT | grep -v 'ZONEID' | grep -v 'PID' | $AWKBIN '{if (NF=="8") print $0;}' | while read ZONEID NPROC SIZE RSS MEMORY TIME CPU ZONE; do

  CPU=`echo $CPU | $SEDBIN s/%//`
  # normalize SIZE and RSS to KB; substr() is 1-based, and a K suffix is stripped so the value stays numeric
  SIZE=`echo $SIZE | $AWKBIN '{ V=substr($1,1,length($1)-1); M=substr($1,length($1),1); if (M=="G") { print V*1024*1024 } else { if (M=="M") { print V*1024 } else { if (M=="K") { print V } else { print $1 } } } }'`
  RSS=`echo $RSS | $AWKBIN '{ V=substr($1,1,length($1)-1); M=substr($1,length($1),1); if (M=="G") { print V*1024*1024 } else { if (M=="M") { print V*1024 } else { if (M=="K") { print V } else { print $1 } } } }'`

  echo ${ZONE}.cpu $CPU
  echo ${ZONE}.mem $SIZE
  echo ${ZONE}.rss $RSS
  echo ${ZONE}.procs $NPROC
done
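
For readers who want to check the unit handling on its own, here is a rough sketch that pulls the conversion out into a function. The name `to_kb` and the `AWKBIN` default are assumptions for illustration; on Solaris you would likely point `AWKBIN` at /usr/bin/nawk as the script above does.

```shell
#!/bin/sh
# Sketch: normalize a prstat size field (e.g. 512K, 300M, 14G) to kilobytes.
# to_kb is a hypothetical helper, not part of the plug-in.
AWKBIN="${AWKBIN:-awk}"   # assumption: set to /usr/bin/nawk on Solaris

to_kb() {
  echo "$1" | $AWKBIN '{
    n = length($1)
    unit  = substr($1, n, 1)       # last character: K, M, G or a digit
    value = substr($1, 1, n - 1)   # everything before the last character
    if (unit == "G")      printf "%.0f\n", value * 1024 * 1024
    else if (unit == "M") printf "%.0f\n", value * 1024
    else if (unit == "K") printf "%.0f\n", value
    else                  printf "%.0f\n", $1   # no suffix: pass the number through
  }'
}

to_kb 14G    # 14680064
to_kb 300M   # 307200
to_kb 512K   # 512
```

Using printf with %.0f keeps the result a plain integer even for fractional inputs such as 1.5G.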


Hope others find this useful..


#3 Joe Fletcher

Joe Fletcher

    Member

  • Members
  • 8 posts

Posted 22 February 2012 - 02:42 PM

Hi,

I'm having some fun with this plugin on Uptime v6. I like your mods to the original and I'd like to suggest some adjustments.
If we change the definition to PRSTAT="prstat -Z -n1,20 1 1", we get a slightly more useful set of output, especially when there are lots of zones.

My version of the script now looks like this in terms of the grunt work part:

PRSTAT="prstat -Z -n1,20 1 1"
ZONEADM="/usr/sbin/zoneadm list -iv"

# first add up the running zones

ZR=`$ZONEADM | grep running | wc -l`
echo "zones_running $ZR"


# now the tricky part
$PRSTAT | grep -v 'ZONEID' | grep -v 'PID' | grep -v "\/" | grep -v Total | while read ZONEID NPROC SIZE RSS MEMORY TIME CPU ZONE; do

# Dump the last character so we just have numbers and no M characters or % signs.
CPU=`echo $CPU |sed 's/\(.*\)./\1/'`
SIZE=`echo $SIZE |sed 's/\(.*\)./\1/'`
RSS=`echo $RSS |sed 's/\(.*\)./\1/'`
MEM=`echo $MEMORY |sed 's/\(.*\)./\1/'`

echo ${ZONE}.cpu $CPU
echo ${ZONE}.mem $SIZE
echo ${ZONE}.rss $RSS
echo ${ZONE}.memory $MEM
echo ${ZONE}.procs $NPROC
done

I freely admit there is probably a more efficient way of doing this but it produces the numbers in a useful format and can deal with up to 20 zones on a box.
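
To make the last-character strip concrete, here is a small sketch that wraps the same sed expression in a function (`strip_last` is a made-up name). It also shows a side effect worth knowing about: the unit suffix is discarded, so a value in M and a value in G come out looking the same.

```shell
#!/bin/sh
# Sketch: the same sed expression as above, wrapped in a hypothetical helper.
strip_last() {
  # \(.*\) greedily captures everything except the final character
  echo "$1" | sed 's/\(.*\)./\1/'
}

strip_last 2.0%   # 2.0
strip_last 14M    # 14
strip_last 14G    # 14 (the unit is lost, so this is indistinguishable from 14M)
```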



Cheers

Joe








#4 Steve Esso

Steve Esso

    Member

  • Members
  • 1 posts

Posted 06 June 2012 - 07:07 AM

Hi,

I like your modifications, but I've had some issues with the output for zones that use gigabytes of memory.
For example, for a zone named myzone1 that is using 14 GB of memory with 13 GB of RSS, the output looks like this:

myzone1.cpu 2.0
myzone1.mem 14
myzone1.rss 13
myzone1.memory 4.9
myzone1.procs 248

So I've changed the way the memory and RSS values are handled.
Below is the core part of the modified script:

CODE

PRSTAT="prstat -Z -n1,20 1 1"
ZONEADM="/usr/sbin/zoneadm list -iv"

# first add up the running zones

ZR=`$ZONEADM | grep running | wc -l`
echo "zones_running $ZR"


# now the tricky part
$PRSTAT | grep -v 'ZONEID' | grep -v 'PID' | grep -v "\/" | grep -v Total | while read ZONEID NPROC SIZE RSS MEMORY TIME CPU ZONE; do

# Dump the last character so we just have numbers and no M characters or % signs.

CPU=`echo $CPU |sed 's/\(.*\)./\1/'`

# If the memory size is in GB, drop the last character and convert to MB
if echo $SIZE | grep G > /dev/null 2>&1
then
SIZE=`echo $SIZE |sed 's/\(.*\)./\1/'`
SIZE=$(($SIZE * 1024))
else
SIZE=`echo $SIZE |sed 's/\(.*\)./\1/'`
fi

# If the RSS size is in GB, drop the last character and convert to MB
if echo $RSS | grep G > /dev/null 2>&1
then
RSS=`echo $RSS |sed 's/\(.*\)./\1/'`
RSS=$(($RSS * 1024))
else
RSS=`echo $RSS |sed 's/\(.*\)./\1/'`
fi

MEM=`echo $MEMORY |sed 's/\(.*\)./\1/'`


echo ${ZONE}.cpu $CPU
echo ${ZONE}.mem $SIZE
echo ${ZONE}.rss $RSS
echo ${ZONE}.memory $MEM
echo ${ZONE}.procs $NPROC
done

exit 0


With this version, I now get the following output for the same zone:

myzone1.cpu 2.0
myzone1.mem 14336
myzone1.rss 13312
myzone1.memory 4.9
myzone1.procs 248
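
The conversion can be tried without a Solaris box by feeding one canned line through the same read loop. This is only a sketch: the ZONEID and TIME values in the sample are invented, the helper names `to_mb` and `report` are hypothetical, and it uses shell parameter expansion (available in ksh, not in the old Bourne /bin/sh) instead of grep and sed. The integer arithmetic assumes prstat prints whole numbers, as in the myzone1 example.

```shell
#!/bin/sh
# Hypothetical prstat -Z summary line. ZONEID (3) and TIME (1:02:03) are made up;
# the other fields mirror the myzone1 example above.
SAMPLE="3 248 14G 13G 4.9% 1:02:03 2.0% myzone1"

# Convert a size such as 14G or 512M to whole megabytes.
to_mb() {
  case "$1" in
    *G) echo $(( ${1%G} * 1024 )) ;;   # gigabytes to megabytes
    *M) echo "${1%M}" ;;               # already megabytes, drop the suffix
    *)  echo "$1" ;;                   # anything else is passed through untouched
  esac
}

# Read summary lines on stdin and print one metric per line.
report() {
  while read ZONEID NPROC SIZE RSS MEMORY TIME CPU ZONE; do
    echo "${ZONE}.cpu ${CPU%\%}"
    echo "${ZONE}.mem $(to_mb "$SIZE")"
    echo "${ZONE}.rss $(to_mb "$RSS")"
    echo "${ZONE}.memory ${MEMORY%\%}"
    echo "${ZONE}.procs $NPROC"
  done
}

echo "$SAMPLE" | report
```

With the sample line, this prints the same five values as the listing above (2.0, 14336, 13312, 4.9 and 248).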


Hope others find this version useful

#5 Joel Pereira

Joel Pereira

    Guru

  • Members
  • 174 posts
  • Gender:Male
  • Location:Toronto, Canada

Posted 06 June 2012 - 06:52 PM

Hey guys;
Just to let you know, we've made some updates to the Solaris Zone Workload monitor and the agent-side script by default is the one above from Steve Esso. We also made a bunch of updates to all the files, changed the monitoring station scripts to use PHP instead of Perl (so Perl is not a requirement anymore), and made some important updates to the monitor definition (XML).

You can find it here:
http://support.uptim...w.php?mod_id=24
Joel Pereira
Solutions Architect
uptime software ...because downtime is not an option



