You're working away, and suddenly your edit session goes to hell because /tmp is full.
It's one thing when it happens on your workstation; it's a much bigger deal on a server with actual paying customers. Here are some scripts to make your life easier.
Use something like this for your crontab file if you want to check disk space every 5 minutes around the clock:
# Set these environment variables within cron:
CRON=yes
# Scripts know when they're being run via cron.
# MAILTO=yourname
# Who to send mail to. Leave blank if no mail is to be sent.
#
# Environment variables set by cron:
#   SHELL=/bin/sh
#   USER=yourname
#   PATH=/usr/bin:/bin
#   PWD=/home/yourname
#   SHLVL=1
#   HOME=/home/yourname
#   LOGNAME=yourname
#
# To test, uncomment this line:
## * * * * * /bin/env > /tmp/env$$
#===================================================================
# Everything on a line is separated by blanks or tabs.
#
#+----------------------------- Minute (0-59)
#|     +----------------------- Hour (0-23)
#|     |     +----------------- Day (1-31)
#|     |     |   +------------- Month (1-12)
#|     |     |   |   +--------- Day of week (0-6, 0=Sunday)
#|     |     |   |   |    +---- Command to be run
#|     |     |   |   |    |
#v     v     v   v   v    v
#===================================================================
# Keep an eye on drives and disk space. Run every 5 minutes.
4-54/5 *     *   *   *    $HOME/cron/checkdrives
#!/bin/ksh
#
# $Revision: 1.3 $ $Date: 2010-11-10 13:20:42-05 $
# $UUID: ca583930-f781-3100-b878-5542c05bace9 $
#
#<checkdrives: send mail if a filesystem gets too full
# Try to avoid depending on GNU software being installed.
PATH=/bin:/usr/bin
BLOCKSIZE=1m
BLOCK_SIZE=1048576
export PATH BLOCKSIZE BLOCK_SIZE
tag=${0##*/}
# Portability and configuration stuff here.
subject='drives getting full'
to='admin-urgent'
host=$(hostname | cut -f1 -d.)
work=$HOME/var/drives
max=96 # this percent or more == drive is too full.
# What df should we use?
case "$(uname -s)" in
SunOS) DF='/usr/xpg4/bin/df -F ufs' ;;
FreeBSD) DF='df -t ufs' ;;
*) DF='df' ;;
esac
# Real work starts here. Run df, skip the header, kill %-sign,
# and list filesystems that are too full.
filesys=$($DF | sed -n -e '2,$p' |
tr -d '%' | awk -v max=$max '$5 >= max {print $6}')
case "X$filesys" in
X) exit 0 ;;
*) ;;
esac
# Keep current and previous drive status.
if test ! -d $work; then
mkdir -p $work 2> /dev/null
if test ! -d $work; then
echo "$host: $tag: mkdir $work failed" | mailx $to
exit 1
fi
fi
# Don't send the same message repeatedly.
cd $work
(echo $host; $DF $filesys) > cur
if test -f prev; then
cmp -s cur prev || mailx -s "$host $subject" $to < cur
else
mailx -s "$host $subject" $to < cur
fi
mv cur prev
exit 0
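To see what the filtering step does without waiting for a disk to fill up, you can feed the same pipeline some canned df output. This is only a sketch: the function name too_full and the sample lines are made up for illustration, and real df output varies by platform.

```shell
#!/bin/sh
# too_full MAX: read df-style output on stdin and print the mount point
# of every filesystem whose Use% is at or above MAX.
# Same sed/tr/awk pipeline that checkdrives uses.
too_full () {
    sed -n -e '2,$p' | tr -d '%' | awk -v max="$1" '$5 >= max {print $6}'
}

# Hypothetical sample; the numbers are invented for this example.
sample='Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda1 15873 9821 5234 66% /
/dev/sda6 335728 325000 10728 97% /rd01
tmpfs 950 64 887 7% /tmp'

printf '%s\n' "$sample" | too_full 96
```

With the limit at 96, only /rd01 is listed. If your df wraps long device names onto a second line, the field numbers shift and the awk test looks at the wrong column; "df -P" should keep each filesystem on one line.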
Notice that the script uses mail to tell you about problems; just replace mailx with something to send a popup message if you're running this on the same host that's being checked.
If you have several hosts to keep track of, it's better to set up a mail address that will automatically send you a popup message or alert of some type upon receipt of a message. Procmail will handle that very nicely.
Popup messages can be incredibly annoying, so I don't use them unless there's something requiring immediate attention. If you use the X Window System, have a look at the xalarm package. If not, "write" will do the trick:
#!/bin/ksh
#
# $Revision: 1.3 $ $Date: 2011-09-25 19:51:09-04 $
# $UUID: 97362b8a-57af-3b67-b751-ce8712d62c27 $
#
#<popup: send a quick popup message.
export PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
export USER=yourname
# If the user isn't taking calls, exit.
test -f "$HOME/.nopopup" && exit 0
# If no message, exit.
case "$#" in
0) exit 0 ;;
*) str=${1+"$@"} ;;
esac
# If running under X use xalarm, else use write.
case "$DISPLAY" in
"") set X $(who | grep pts/ | head -1)
tty="$3"
echo "$str" | write $USER $tty
;;
*) set X $(date)
today="$4 $3 $5"
msg=$(echo "$today @ $str" | tr '@' '\012')
export DISPLAY
xalarm -name xmemo -time +0 -geometry +20-40 -nowarn "$msg"
;;
esac
exit 0
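In the xalarm branch, the script treats "@" in the message as a line break and puts a short timestamp on the first line. If you want to check what a message will look like before wiring it into anything, here's a small sketch of just that step. The helper name build_msg is hypothetical, and it takes the date string as an argument (instead of calling date itself) so the result is repeatable.

```shell
#!/bin/sh
# build_msg DATESTR TEXT...: mirror the message-building lines from the
# X branch of popup. DATESTR is expected to look like the output of
# date(1) in the C locale.
build_msg () {
    datestr=$1; shift
    str=$*
    set X $datestr          # same word-splitting trick popup uses on $(date)
    today="$4 $3 $5"        # day of month, month name, time
    echo "$today @ $str" | tr '@' '\012'
}

build_msg "Sat Sep 24 00:01:00 EDT 2011" "/tmp is full @ clean it up"
```

The result is three lines: the timestamp, then one line for each "@"-separated piece of the message.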
If you know your system was fine a few hours ago, it's handy to have a timeline to see where things started going to hell. The examples below are run under Linux, but you only need trivial changes to use them under Solaris or FreeBSD.
Since "adm" is usually responsible for accounting stuff, I run these scripts under that userid. Here's the crontab file:
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=yourname
HOME=/var/log/sa
#
# Need this for performance log archives
PERFLOG=/var/log/perflog
#===================================================================
# Everything on a line is separated by blanks or tabs.
#
#+--------------------------- Minute (0-59)
#|     +----------------------- Hour (0-23)
#|     |     +----------------- Day (1-31)
#|     |     |   +------------- Month (1-12)
#|     |     |   |   +--------- Day of week (0-6, 0=Sunday)
#|     |     |   |   |    +---- Command to be run
#|     |     |   |   |    |
#v     v     v   v   v    v
#===================================================================
# Run performance log every 10 min.
1-51/10 *    *   *   *    /usr/local/cron/perflog $PERFLOG
#-------------------------------------------------------------------
# Summarize files just before midnight.
55     23    *   *   *    run-parts /etc/cron.perflog
My /var/log/perflog directory looks like this:
/var/log/perflog:
drwxr-s--- 3 adm mis 4096 Sep 24 00:01 2011/
drwxrwsr-x 2 adm mis 4096 Sep 23 23:55 2011.n/
/var/log/perflog/2011:
drwxr-s--- 145 adm mis 4096 Sep 24 23:51 0924/
/var/log/perflog/2011/0924:
drwxr-s--- 2 adm mis 4096 Sep 24 00:01 0001/
drwxr-s--- 2 adm mis 4096 Sep 24 00:11 0011/
drwxr-s--- 2 adm mis 4096 Sep 24 00:21 0021/
[...]
drwxr-s--- 2 adm mis 4096 Sep 24 23:51 2351/
/var/log/perflog/2011/0924/0001:
-rw-r----- 1 adm mis 754 Sep 24 00:01 cache
-rw-r----- 1 adm mis 1424 Sep 24 00:01 df
-rw-r----- 1 adm mis 848 Sep 24 00:01 ifconfig
-rw-r----- 1 adm mis 771 Sep 24 00:01 meminfo
-rw-r----- 1 adm mis 171 Sep 24 00:01 netstat
-rw-r----- 1 adm mis 1198 Sep 24 00:01 ping
-rw-r----- 1 adm mis 10724 Sep 24 00:01 ps
-rw-r----- 1 adm mis 3245 Sep 24 00:01 smbstatus
-rw-r----- 1 adm mis 104 Sep 24 00:01 swap
-rw-r----- 1 adm mis 84 Sep 24 00:01 uname
-rw-r----- 1 adm mis 71 Sep 24 00:01 uptime
/var/log/perflog/2011/0924/0011:
-rw-r----- 1 adm mis 754 Sep 24 00:11 cache
-rw-r----- 1 adm mis 1424 Sep 24 00:11 df
-rw-r----- 1 adm mis 848 Sep 24 00:11 ifconfig
-rw-r----- 1 adm mis 771 Sep 24 00:11 meminfo
-rw-r----- 1 adm mis 171 Sep 24 00:11 netstat
-rw-r----- 1 adm mis 1197 Sep 24 00:11 ping
-rw-r----- 1 adm mis 10776 Sep 24 00:11 ps
-rw-r----- 1 adm mis 3568 Sep 24 00:11 smbstatus
-rw-r----- 1 adm mis 104 Sep 24 00:11 swap
-rw-r----- 1 adm mis 84 Sep 24 00:11 uname
-rw-r----- 1 adm mis 71 Sep 24 00:11 uptime
/var/log/perflog/2011/0924/0021:
-rw-r----- 1 adm mis 754 Sep 24 00:21 cache
-rw-r----- 1 adm mis 1424 Sep 24 00:21 df
-rw-r----- 1 adm mis 848 Sep 24 00:21 ifconfig
-rw-r----- 1 adm mis 58362 Sep 24 01:19 iostat
-rw-r----- 1 adm mis 771 Sep 24 00:21 meminfo
-rw-r----- 1 adm mis 10752 Sep 24 01:20 mpstat
-rw-r----- 1 adm mis 171 Sep 24 00:21 netstat
-rw-r----- 1 adm mis 1197 Sep 24 00:21 ping
-rw-r----- 1 adm mis 10580 Sep 24 00:21 ps
-rw-r----- 1 adm mis 3435 Sep 24 00:21 smbstatus
-rw-r----- 1 adm mis 104 Sep 24 00:21 swap
-rw-r----- 1 adm mis 84 Sep 24 00:21 uname
-rw-r----- 1 adm mis 71 Sep 24 00:21 uptime
-rw-r----- 1 adm mis 10654 Sep 24 01:19 vmstat
[...]
/var/log/perflog/2011/0924/2351:
-rw-r----- 1 adm mis 754 Sep 24 23:51 cache
-rw-r----- 1 adm mis 1424 Sep 24 23:51 df
-rw-r----- 1 adm mis 848 Sep 24 23:51 ifconfig
-rw-r----- 1 adm mis 771 Sep 24 23:51 meminfo
-rw-r----- 1 adm mis 171 Sep 24 23:51 netstat
-rw-r----- 1 adm mis 1197 Sep 24 23:51 ping
-rw-r----- 1 adm mis 10281 Sep 24 23:51 ps
-rw-r----- 1 adm mis 2651 Sep 24 23:41 smbstatus
-rw-r----- 1 adm mis 104 Sep 24 23:51 swap
-rw-r----- 1 adm mis 84 Sep 24 23:51 uname
-rw-r----- 1 adm mis 72 Sep 24 23:51 uptime
Each file holds output from one specific command.
For example, the file /var/log/perflog/2011/0924/0001/cache holds output from "vmstat -s" at 12:01am, 9/24/2011:
1943948 total memory
1892424 used memory
49772 active memory
1806208 inactive memory
51524 free memory
6592 buffer memory
1819552 swap cache
2096472 total swap
75892 used swap
2020580 free swap
137464830 non-nice user cpu ticks
45415 nice user cpu ticks
7180476 system cpu ticks
651831801 idle cpu ticks
69403053 IO-wait cpu ticks
66205 IRQ cpu ticks
962645 softirq cpu ticks
0 stolen cpu ticks
1345948021 pages paged in
967598390 pages paged out
4522472 pages swapped in
4535965 pages swapped out
1806195040 interrupts
1877621550 CPU context switches
1312589033 boot time
2675983 forks
Every 20 minutes, output from iostat and mpstat is included:
Linux ... (server.com) 09/24/11 _i686_ (2 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.86    0.01    0.95    8.01    0.00   75.18
Device:    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda      60.90   2.41  323.08  119.35     6.99     0.59    9.36   1.67  10.60
sdb      28.92   0.31  789.33   20.64    27.71     0.88   30.22   3.53  10.31
sdc       1.22   0.17  173.42   60.09   168.97     0.06   45.14   2.78   0.38
sdd      38.05   0.28  914.60   68.24    25.64     0.20    5.32   3.40  13.03
sde       0.13   0.14   22.63   51.03   271.50     0.05  166.80   3.48   0.09
sdf       6.47   0.14  675.25   55.67   110.54     0.10   14.80   2.60   1.72
sdg      17.62   0.22  705.99   71.35    43.56     0.59   33.05   4.55   8.11
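The perflog script itself isn't shown here. A minimal sketch of a collector that would produce this directory layout follows. The function names are hypothetical, and apart from "vmstat -s" for the cache file, the commands are guesses based on the file names in the listing.

```shell
#!/bin/sh
# snap NAME CMD [ARGS...]: save the output of CMD in $dir/NAME.
# Commands that aren't installed are skipped quietly.
snap () {
    name=$1; shift
    command -v "$1" > /dev/null 2>&1 || return 0
    "$@" > "$dir/$name" 2>&1
}

# perflog_snapshot TOP: create TOP/YYYY/MMDD/HHMM, fill it with one file
# per command, and print the directory name.
perflog_snapshot () {
    top=$1
    set X $(date '+%Y %m%d %H%M'); shift
    dir="$top/$1/$2/$3"
    mkdir -p "$dir" || return 1
    snap cache   vmstat -s          # from the text: "cache" holds vmstat -s
    snap df      df -k              # the rest are assumptions
    snap meminfo cat /proc/meminfo
    snap uname   uname -a
    snap uptime  uptime
    echo "$dir"
}

# Example call from cron (path is the one used in the crontab above):
#   perflog_snapshot /var/log/perflog > /dev/null
```

Keeping one small file per command means a broken or missing command only costs you that one file, not the whole snapshot.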
Just before midnight, I like to jam the day's entries into one file to reduce storage space. One easy way is to use "head":
==> 0924/0001/df <==
Filesystem 1M-blocks Used Available Use% Mounted
/dev/sda1 15873 9821 5234 66% /
/dev/sda2 7933 2628 4896 35% /var
/dev/sda5 7933 247 7277 4% /home
/dev/sda6 335728 252096 66303 80% /rd01
tmpfs 950 0 950 0% /dev/shm
tmpfs 950 64 887 7% /tmp
/dev/sdb6 341144 233036 90780 72% /rd02
/dev/sdc6 341144 217470 106345 68% /rd03
/dev/sdd6 341144 225758 98058 70% /rd04
/dev/sdf6 341144 263800 60015 82% /rd07
/dev/sdg6 341144 244507 79308 76% /rd08
/dev/sde6 341144 28520 295295 9% /rd05
Filesystem Inodes IUsed IFree IUse% Mounted
/dev/sda1 4198176 446524 3751652 11% /
/dev/sda2 2097152 5429 2091723 1% /var
/dev/sda5 2097152 1387 2095765 1% /home
/dev/sda6 88735744 232375 88503369 1% /rd01
tmpfs 191235 1 191234 1% /dev/shm
tmpfs 191235 12 191223 1% /tmp
/dev/sdb6 44367872 187957 44179915 1% /rd02
/dev/sdc6 44367872 44983 44322889 1% /rd03
/dev/sdd6 44367872 276423 44091449 1% /rd04
/dev/sdf6 44367872 196609 44171263 1% /rd07
/dev/sdg6 44367872 147284 44220588 1% /rd08
/dev/sde6 44367872 6438 44361434 1% /rd05
==> 0924/0001/ifconfig <==
[...]
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:539988 errors:0 dropped:0 overruns:0 frame:0
TX packets:539988 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:44423496 (42.3 MiB) TX bytes:44423496 (42.3 MiB)
==> 0924/0001/meminfo <==
MemTotal: 1943948 kB
MemFree: 53076 kB
Buffers: 6180 kB
Cached: 1798304 kB
SwapCached: 3256 kB
[...]
These files compress very nicely under a separate directory tree:
/var/log/perflog/2011.n:
-rw-r--r-- 1 adm mis 405844 Jun 1 23:55 0601.xz
-rw-r--r-- 1 adm mis 398520 Jun 2 23:55 0602.xz
[...]
-rw-r--r-- 1 adm mis 394736 Sep 21 23:55 0921.xz
-rw-r--r-- 1 adm mis 429668 Sep 22 23:55 0922.xz
-rw-r--r-- 1 adm mis 4629315 Sep 23 23:55 0923
To create the summaries and do the cleanup, I have three scripts that run under their own directory:
me% cd /etc/cron.perflog
me% ls -lF
-rwxr-xr-x 1 root mis 1506 Mar 12 2011 100.perf-reduce*
-rwxr-xr-x 1 root mis 1462 Mar 12 2011 110.perf-clean*
-rwxr-xr-x 1 root mis 1161 Mar 12 2011 120.perf-compress*
me% grep '#<' * | cut -f2 -d'<'
100.perf-reduce: merges separate perflog files to save space.
110.perf-clean: removes old perflog directory if "reduce" worked.
120.perf-compress: runs xz on yesterday's logfile.
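The three scripts aren't reproduced here, but the idea is simple enough to sketch. What follows is an assumption about how they could work, not the actual scripts: the function names and argument order are made up, and the day to process is passed in rather than computed.

```shell
#!/bin/sh
# perf_reduce TOP YEAR MMDD: merge one day's snapshot files into
# TOP/YEAR.n/MMDD. Given several files, head labels each one with
# "==> name <==". The line count is just a large number chosen so that
# no file gets truncated.
perf_reduce () {
    mkdir -p "$1/$2.n" || return 1
    ( cd "$1/$2" && head -n 1000000 "$3"/*/* ) > "$1/$2.n/$3"
}

# perf_clean TOP YEAR MMDD: remove the per-snapshot tree, but only if the
# merged file exists and is not empty. The ${n:?} expansions abort on an
# empty argument so rm can never be pointed at "/".
perf_clean () {
    test -s "${1:?}/${2:?}.n/${3:?}" && rm -rf "${1:?}/${2:?}/${3:?}"
}

# perf_compress TOP YEAR MMDD: compress a merged file with xz.
perf_compress () {
    xz "$1/$2.n/$3"
}

# Example:
#   perf_reduce /var/log/perflog 2011 0924 &&
#       perf_clean /var/log/perflog 2011 0924
```

Splitting the work this way means a failed merge leaves an empty file behind, the cleanup step notices that, and the original snapshots stay put.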
If you want to keep an eye on who (or what) uses the most space over time, you can put this script under /etc/cron.daily.
This writes du summary output to a file named after the current date.
#!/bin/ksh
#
# $Revision: 1.4 $ $Date: 2010-11-09 15:11:35-05 $
# $UUID: 31a46065-7dd1-3f41-b27c-bc96ce22c12d $
#
#<dirsize: see how big each top-level group directory is.
# usage: dirsize [etc-file [output-file]]
export PATH=/usr/local/bin:/bin:/usr/bin
export BLOCKSIZE=1m
export BLOCK_SIZE=1048576 # BLOCK* sets du output to Mbytes.
umask 022
tag=$(basename $0)
host=$(hostname | cut -f1 -d.)
out='/var/adm/sa/du'
# Format output in consistent-width columns.
# Argument is the number of columns you want.
layout () {
case "$#" in
0) k=1 ;;
*) k=$1 ;;
esac
case "$k" in
[1-9]) ;;
*) echo 'layout botch'; exit 1 ;;
esac
awk '{printf "%6s %s\n", $1, $2}' | pr -o1 -w88 -${k}t | expand
}
say () {
echo; echo "$(date '+%Y-%m-%d %T'): $*"; echo
}
warn () {
echo "WARN: $(date '+%Y-%m-%d %T'): $*"
}
logmsg () {
logger -t $tag -p local1.info "$@"
}
die () {
logmsg "FATAL: $*"; exit 1
}
# Check the input settings file. Set an optional output file.
ofile=
case "$#" in
0) ifile="/usr/local/etc/$tag" ;;
1) ifile="$1" ;;
2) ifile="$1"; ofile="$2" ;;
esac
test -f "$ifile" || die "$ifile not found"
# Figure out the date.
logmsg start
set X $(date "+%Y %m%d"); shift
yr=$1
mday=$2
# Set up the output file.
test -d "$out/$yr" || mkdir -p $out/$yr
test -d "$out/$yr" || die "unable to mkdir $out/$yr"
# Redirect all stdout and stderr output.
case "$ofile" in
"") ofile="$out/$yr/$mday" ;;
*) ;;
esac
exec > $ofile
exec 2>&1
# Real work starts here. Read the directories, columns, etc.
grep '^[1-9]' $ifile | while read depth columns dir
do
if test -d "$dir"
then
say Directory $dir
else
warn "$dir: not a directory"
continue
fi
# NOTE: after awk, we could put "sort -nr" or "cat" depending
# on whether you wanted output sorted by directory size.
# Ignore anything under 10 Mb.
(
cd $dir
find . -mindepth $depth -maxdepth $depth -type d -print |
sort | tr '\012' '\000' | xargs -0 du -s |
awk '{ if ($1 > 9) print }' |
layout $columns |
sed -e 's! \./! !g'
)
done
say done
logmsg done
exit 0
Some sample output from 9/24/2011:
2011-09-24 04:27:56: Directory /fs1b/server5/2008
** 935 0104 495 0325 381 0530 407 0807 296 1015
897 0107 223 0328 230 0602 228 0813 260 1021
502 0110 441 0331 544 0605 789 0819 646 1024
435 0116 480 0403 282 0611 197 0822 387 1027
790 0122 440 0409 245 0617 276 0825 446 1030
561 0125 138 0415 231 0620 286 0828 177 1105
599 0128 277 0418 204 0623 308 0903 263 1114
425 0131 246 0421 131 0626 2602 0909 864 1117
409 0206 660 0424 461 0702 164 0912 576 1120
396 0212 283 0430 264 0708 322 0915 352 1126
556 0215 938 0506 513 0711 435 0918 596 1202
620 0221 358 0509 713 0714 37 0921 746 1205
574 0227 554 0512 338 0717 132 0924 431 1208
503 0304 688 0515 326 0723 625 0930 745 1211
204 0307 11 0518 252 0729 440 1003 288 1217
591 0310 355 0521 376 0801 544 1006 118 1223
307 0313 512 0527 284 0804 1069 1009 126 1229
435 0319
2011-09-24 04:32:28: Directory /fs1b/server5/2009
594 0707 174 0812 121 0921 627 1027 113 1202
536 0715 194 0820 203 0925 186 1104 112 1210
22 0719 345 0824 280 0929 275 1112 315 1214
227 0723 267 0828 252 1007 174 1116 104 1218
311 0727 672 0901 494 1015 322 1120 76 1222
469 0731 136 0909 188 1019 283 1124 38 1230
427 0804 303 0917 840 1023 **
The two entries marked with ** show how to read this: the directory holding backups for server5 on 1/4/2008 takes up 935 MB, and the directory holding backups for server5 on 10/23/2009 takes up 840 MB.
"dirsize" reads directory and layout information from the file /usr/local/etc/dirsize:
# $Revision: 1.1 $ $Date: 2011-08-09 18:52:30-04 $
# $UUID: d9f49e1d-111c-35dd-9265-8d81644455c8 $
#
# Expand this list into additional directories to check.
# Field 1: min/max depth of directories to traverse
# Field 2: number of columns to print
# Field 3: starting directory
#
# EXAMPLE:
# "2 3 /usr" runs "cd /usr; find . -mindepth 2 -maxdepth 2 -type d"
# and prints 3-column output.
1 5 /fs1b/server5/2008
1 5 /fs1b/server5/2009
The lines for "server5" tell the script to descend one level into the given directory and print the results in 5 columns.
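If you want to check a settings file before trusting it to cron, you can run the same grep-and-read loop that dirsize uses and just print what it finds. This is a sketch; the function name read_config is hypothetical.

```shell
#!/bin/sh
# read_config FILE: show how each settings line would be interpreted.
# Lines that don't start with a digit from 1 to 9 (comments, blank
# lines) are skipped, exactly as in dirsize.
read_config () {
    grep '^[1-9]' "$1" | while read depth columns dir
    do
        echo "depth=$depth columns=$columns dir=$dir"
    done
}

# Example:
#   read_config /usr/local/etc/dirsize
```

Since "dir" is the last variable given to read, it receives the rest of the line, so everything after the second field is treated as the directory name.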
If I were doing this over again, I'd divvy up the work a bit differently. Instead of writing the report in one script, I'd store raw du output without any formatting in one directory, and have separate scripts to read that and write something suitable for a webpage display or database import.
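A rough sketch of that split might look like this. Everything here is hypothetical (names, file format, the one-level-deep limit); it only shows the idea of one script that stores raw numbers and another that reads them.

```shell
#!/bin/sh
# save_raw DIR OUTFILE: store "size-in-KB <tab> name" for each top-level
# subdirectory of DIR, with no formatting at all.
save_raw () {
    (
        cd "$1" || exit 1
        for d in */
        do
            test -d "$d" && du -sk "${d%/}"
        done
    ) > "$2"
}

# report_big RAWFILE MIN_MB: one possible reader. Prints directories of
# at least MIN_MB megabytes, largest first. Assumes directory names
# contain no whitespace.
report_big () {
    awk -v min="$2" '{ mb = $1 / 1024
                       if (mb >= min) printf "%6d %s\n", mb, $2 }' "$1" |
    sort -nr
}

# Example:
#   save_raw /fs1b/server5/2008 /var/adm/sa/du/raw.2008
#   report_big /var/adm/sa/du/raw.2008 10
```

Because the raw file is never formatted, a second reader for a webpage or a database import can be added later without running du again.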
Feel free to send comments.
Generated from disk-space.t2t by txt2tags. $Revision: 1.8 $