Unix - Best Practices for Application Support - Log / Application / System Monitoring

Best Practices for Application Support on Unix / Linux



1. Configure the .profile file within your home directory –


Add aliases for commonly used commands.

alias logs='cd /opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberPsc9800/'
alias tlogs='tail -f /opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberPsc9800/SystemOut.log | grep "ERROR"'

Set the landing directory to the directory you visit most often (in many cases, the log directory).

cd /opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberPsc9800


2. Use !, !! and history for recalling previous commands



3. Use find, grep, sed and awk –


To see only the lines containing an error / exception in the last 5000 lines -

tail -n 5000 /opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberPsc9800/SystemOut.log | grep -i "FileNotFoundException"

To see lines containing any of the multiple errors / exceptions in running logs -

tail -f SystemOut.log | egrep "(WSWS3713E|WSWS3734W|WSVR0605W|javax.net.ssl.SSLHandshakeException|ThreadMonitor)"

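To see error snippets in running logs (each snippet runs from a line containing ERROR to the next line containing EST, the time zone in the log timestamps) -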

tail -f /opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberPsc9800/SystemOut.log | sed -n '/ERROR/,/EST/p'

To save all error / exception snippets to another file -

sed -n '/ERROR/,/EST/p' /opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberPsc9800/SystemOut* >> logAna.txt

To find occurrences of a particular error in the last n days (here, 7) -

find /opt/WebSphere6/AppServer/profiles/Viva/logsOld/VivaWebClusterMemberPsc9800/ -iname "SystemOut*" -mtime -7 -exec zgrep "FileNotFoundException" {} \; >> logAnalysis.txt

To count the number of occurrences of an error / exception in a log file -

sed -n '/ERROR/,/EST/p' /opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberPsc9800/logAnalysis.txt | grep -c "LogicBlockSetupException"

To report the size of all files bigger than 2 MB that have not been accessed in the last 30 days (-size counts 512-byte blocks, so +4096 means more than 2 MB; use -mtime instead of -atime to go by modification time) -

find . -type f -size +4096 -atime +30 -exec \du -sk '{}' \;
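
The counting command above handles one exception at a time. As a rough sketch, the same idea can be wrapped in a small function that prints one count per error code. The function name count_errors is an assumption made for illustration and does not come from any existing tool.

```shell
# count_errors: hypothetical helper, not part of the original post.
# Usage: count_errors <logfile> <pattern> [<pattern> ...]
# Prints "<count> <pattern>" for each pattern.
count_errors() {
    file=$1
    shift
    for pattern in "$@"; do
        # grep -c counts matching lines; -F treats the pattern as plain text.
        # grep exits non-zero when nothing matches, so ignore that status
        # and keep the 0 it printed.
        count=$(grep -c -F -- "$pattern" "$file" || :)
        echo "$count $pattern"
    done
}
```

For example, count_errors SystemOut.log WSWS3713E ThreadMonitor prints two lines, one per code, each starting with the number of matching lines.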


4. Use a shell script to monitor logs for critical errors –


Sample Script –

#!/bin/ksh
# Set the config variables

# *************************************************Configuration********************************************************
logFileName="SystemOut.log"
errorList="WSWS3713E|WSWS3734W|WSVR0605W|javax.net.ssl.SSLHandshakeException|ThreadMonitor"
EMAIL_SUBJECT="Viva - Critical ERROR"
EMAIL_TO="viva@viva.com"
# **********************************************************************************************************************

logFilePath=""

# Set the Log File path

case $(hostname) in
cpc9600) logFilePath="/opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberCpc9600" ;;
cpc9601) logFilePath="/opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberCpc9601" ;;
psc9800) logFilePath="/opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberPsc9800" ;;
psc9801) logFilePath="/opt/WebSphere6/AppServer/profiles/Viva/logs/VivaWebClusterMemberPsc9801" ;;
esac

if [ ! -s "$logFilePath/$logFileName" ]; then
    echo "ERROR- Log File Not Found , Please set the config properly"
    exit 1
fi

# Get the first 30 characters of the first line
linestart=$(awk 'NR>1{exit} ;1' "$logFilePath/$logFileName" | cut -c1-30)

lineend=""

# Never ending loop that will parse the SystemOut.log file every 5 sec

while true ; do

# If the log file is not found, do nothing and wait for the next iteration
if [ ! -s "$logFilePath/$logFileName" ]; then
    echo "Log File Not Found .. Waiting for the next iteration ..."
    sleep 5
    continue
fi

# Get the last line of the file, up to which the log is parsed in this iteration
lineend=$(awk 'END{print}' "$logFilePath/$logFileName" | cut -c1-30)

# Error checking: if linestart is not found, parse the whole file.
# fgrep matches the text literally, so "[" and "/" in the timestamp are safe.
fgrep "$linestart" "$logFilePath/$logFileName" > /dev/null
if [ $? != 0 ]
then
    errors=$(egrep "$errorList" "$logFilePath/$logFileName")
else
    # Parse the log file from linestart to lineend for errors;
    # index() compares literally, unlike a /regex/ range
    errors=$(awk -v s="$linestart" -v e="$lineend" 'index($0, s), index($0, e)' "$logFilePath/$logFileName" | egrep "$errorList")
fi

# Mail the matching lines, if any; mailx -s sets the subject
if [ -n "$errors" ]; then
    echo "Mailing matching lines to $EMAIL_TO"
    echo "$errors" | mailx -s "$EMAIL_SUBJECT" "$EMAIL_TO"
fi

# Set the last line as the first line for the next iteration
linestart=$lineend

sleep 5
done
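
The script above remembers its position by the first 30 characters of a log line, which can be ambiguous when two lines share a timestamp. One alternative, sketched here under assumptions, is to remember how many lines have already been scanned. The function name scan_new_lines and its arguments are hypothetical and are not part of the original script.

```shell
# scan_new_lines: hypothetical helper, not part of the original script.
# Usage: scan_new_lines <logfile> <lines_already_seen> <extended-regex>
# Prints the matching lines that were appended after <lines_already_seen>.
scan_new_lines() {
    file=$1
    seen=$2
    pattern=$3
    # $(( ... )) strips any padding that wc puts around the number
    total=$(( $(wc -l < "$file") ))
    # If the file shrank, it was probably rotated: start from the top.
    if [ "$total" -lt "$seen" ]; then
        seen=0
    fi
    # tail -n +K starts output at line K, so K = seen + 1 skips old lines.
    tail -n "+$((seen + 1))" "$file" | grep -E "$pattern"
}
```

A caller would record the line count (wc -l) after each pass and hand it back as the second argument on the next pass. The function returns a non-zero status when no new line matches, which can be used to skip sending a mail.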


5. Use shell scripts to automate system monitoring tasks.


#!/bin/ksh

errorSnippet=''

# ********************************************************Configuration***************************************************************
homeBench='90'
VivaBench='90'
rootBench='90'
appHomeBench='90'
idleBench='95'
logsOldBench='15'
memUsageBench='2500'
avgLoadBench='5'
EMAIL_SUBJECT="Server Health Check Report for $(hostname)"
EMAIL_TO="test@test.com"
# ************************************************************************************************************************************

dfHome=`df | sed -n '/ \/home$/s/.* \([0-9][0-9]*\)%.*/\1/p'`
dfViva=`df | sed -n '/ \/apphome\/Viva$/s/.* \([0-9][0-9]*\)%.*/\1/p'`
dfRoot=`df | sed -n '/ \/$/s/.* \([0-9][0-9]*\)%.*/\1/p'`
dfApphome=`df | sed -n '/ \/apphome$/s/.* \([0-9][0-9]*\)%.*/\1/p'`
dfLogsOld=`df | sed -n '/ \/localvg-logsOld$/s/.* \([0-9][0-9]*\)%.*/\1/p'`
memUsage=`sar -q 1 | tail -1 | awk '{ print "" $3}' | sed 's/%//g'`
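# 15-minute load average, truncated to an integer for the -gt comparison
avgLoad=`uptime | awk -F 'load average: ' '{ print $2 }' | cut -d, -f3 | cut -d. -f1 | tr -d ' '`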
iostatIdle=`iostat | awk '{print $5}' | awk 'NR==4' | cut -d '.' -f1`

if [[ $dfHome -gt $homeBench ]]; then
    errorSnippet="Disk usage for /home exceeded the benchmark, it is $dfHome now"
fi
if [[ $dfViva -gt $VivaBench ]]; then
    errorSnippet="$errorSnippet \n Disk usage for /apphome/Viva exceeded the benchmark, it is $dfViva now"
fi
if [[ $dfRoot -gt $rootBench ]]; then
    errorSnippet="$errorSnippet \n Disk usage for / (root) exceeded the benchmark, it is $dfRoot now"
fi
if [[ $dfApphome -gt $appHomeBench ]]; then
    errorSnippet="$errorSnippet \n Disk usage for /apphome exceeded the benchmark, it is $dfApphome now"
fi
if [[ $dfLogsOld -gt $logsOldBench ]]; then
    errorSnippet="$errorSnippet \n Disk usage for logsOld exceeded the benchmark, it is $dfLogsOld now"
fi
if [[ $iostatIdle -gt $idleBench ]]; then
    errorSnippet="$errorSnippet \n Iostat idle exceeded the benchmark, it is $iostatIdle now"
fi
if [[ $memUsage -gt $memUsageBench ]]; then
    errorSnippet="$errorSnippet \n Memory usage exceeded the benchmark, it is $memUsage now"
fi
if [[ $avgLoad -gt $avgLoadBench ]]; then
    errorSnippet="$errorSnippet \n 15 minute average load exceeded the benchmark, it is $avgLoad now"
fi

# print expands the \n separators into line breaks
print "$errorSnippet"
if [ "$errorSnippet" != "" ]; then
    print "$errorSnippet" | /bin/mail -s "$EMAIL_SUBJECT" "$EMAIL_TO"
fi
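
The df checks in the script above repeat one sed expression per mount point. As a sketch only, the lookup can be written once as a function that reads the portable df -P layout, in which the fifth column is the capacity and the sixth is the mount point. The function name mount_usage is an assumption for illustration, and the sketch assumes file system names without spaces.

```shell
# mount_usage: hypothetical helper, not part of the original script.
# Reads "df -P" style output on stdin and prints the use percentage
# (without the % sign) for the mount point given as $1.
# Prints nothing if the mount point is not listed.
mount_usage() {
    awk -v mp="$1" 'NR > 1 && $6 == mp { sub(/%/, "", $5); print $5 }'
}
```

With this in place, a value such as dfHome could be obtained with dfHome=$(df -P | mount_usage /home). The exact match on the mount point means /apphome does not accidentally pick up /apphome/Viva.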

6. Use scripts to continuously monitor the state (up/down) of the application.

To send an email if the website is down for maintenance -

while true ; do
    # -s keeps curl from printing its progress meter
    curl -s websiteaddress.com | grep -q "Down for Maintenance"
    if [ $? -eq 0 ] ; then
        echo "Website is Down" | mail -s "Website is down for maintenance" email@address.com
    fi
    sleep 20
done

To send an email if the website is down or its page does not contain the "Normal operation string" -

while true ; do
    /usr/bin/wget "www.example.com" --timeout 30 -O - 2>/dev/null | grep "Normal operation string" > /dev/null || echo "The site is down" | /usr/bin/mail -v -s "Site is down" your@e-mail.address
    sleep 20
done
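
Both loops above send a mail every 20 seconds for as long as the site stays down. A possible refinement, sketched here under assumptions, is to mail only when the state changes. The function names check_page, state_changed and monitor are hypothetical, and the URL and address passed to monitor are placeholders.

```shell
# check_page: hypothetical helper. Reads a page body on stdin and prints
# "up" if it contains the expected text ($1), otherwise "down".
check_page() {
    if grep -F -- "$1" > /dev/null; then
        echo up
    else
        echo down
    fi
}

# state_changed: succeeds only when the two states differ.
state_changed() {
    [ "$1" != "$2" ]
}

# monitor: example loop that never returns.
# $1 is the URL to fetch, $2 is the address to notify.
monitor() {
    last=up
    while true; do
        now=$(curl -s "$1" | check_page "Normal operation string")
        if state_changed "$last" "$now"; then
            echo "Site is $now" | mail -s "Site is $now" "$2"
        fi
        last=$now
        sleep 20
    done
}
```

Starting from an assumed "up" state, this sends one mail when the site goes down and one more when it comes back, rather than one per polling interval.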
