Thinking Out Loud

December 18, 2012

Shell script – checking for NULL & RMAN backup

Filed under: 11g,RMAN,shell scripting,solaris — mdinh @ 2:13 pm

The existing backup strategy for archivelog is Disk-to-Disk-to-Tape (D2D2T), using RMAN for disk and NetBackup for tape.

Delete existing backup after successfully back up to tape.

Don’t know enough about NetBackup or whether it can do the deletion, and that’s a story for another day.

Here is the scenario: the file exist but contains NULL data, will the RMAN backup be deleted?

File exists:

> ll /home/oracle/check_backup/check_bkup.out
-rw-r--r--   1 mdinh    dba            0 Dec 15 13:15 /home/oracle/check_backup/check_bkup.out

NULL data: the file is  ZERO byte.

Script:

> cat t.sh
#!/bin/sh

DN=`dirname $0`
BN=`basename $0`
SID=${1:?"---> USAGE: $DN/$BN <ORACLE_SID>"}

date
FILE=/home/oracle/check_backup/check_bkup.out
LAST=`head /home/oracle/check_backup/check_bkup.out |grep Archive|head -1|awk '{print $1}'`
# Check for ora_smon process
if [ `pgrep -lf ora_smon_$1|awk -F_ '{print $3}'` != $1 ]; then
exit 1
elif [ `pgrep -lf "$BN $1"|wc -l` -gt 2 ]; then
exit 1
elif [ ! -f $FILE ]; then
mailx -s "Missing $FILE" maile@me.com < /dev/null
exit 1
fi

echo ---------- NO NULL check ----------
echo "delete noprompt backup completed before \"to_date('${LAST}','mm/dd/yyyy')\";"

# if $LAST IS NOT NULL
if [ ! -z "$LAST" ]; then
echo "delete noprompt backup completed before \"to_date('${LAST}','mm/dd/yyyy')\";"
fi
exit

Execution:

> ./t.sh db01
Tue Dec 18 05:31:20 PST 2012
---------- NO NULL check ----------
delete noprompt backup completed before "to_date('','mm/dd/yyyy')";

Curiosity: What will happen when delete noprompt backup is executed? Any guess?

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

RMAN> list backup;
using target database control file instead of recovery catalog
specification does not match any backup in the repository

RMAN> backup archivelog from sequence 157;

Starting backup at 18-DEC-2012 05:55:20
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=102 device type=DISK
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=157 RECID=271 STAMP=802331584
input archived log thread=1 sequence=158 RECID=272 STAMP=802331720
channel ORA_DISK_1: starting piece 1 at 18-DEC-2012 05:55:21
channel ORA_DISK_1: finished piece 1 at 18-DEC-2012 05:55:25
piece handle=/oracle/oradata/flashrecovery/SAN_DBTEST/backupset/2012_12_18/o1_mf_annnn_TAG20121218T055521_8f0xgb07_.bkp tag=TAG20121218T055521 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:04
Finished backup at 18-DEC-2012 05:55:25

RMAN> list backup summary;
List of Backups
===============
Key     TY LV S Device Type Completion Time      #Pieces #Copies Compressed Tag
------- -- -- - ----------- -------------------- ------- ------- ---------- ---
1       B  A  A DISK        18-DEC-2012 05:55:24 1       1       NO         TAG20121218T055521

RMAN> delete noprompt backup completed before "to_date('','mm/dd/yyyy')";
using channel ORA_DISK_1

List of Backup Pieces
BP Key  BS Key  Pc# Cp# Status      Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
1       1       1   1   AVAILABLE   DISK        /oracle/oradata/flashrecovery/SAN_DBTEST/backupset/2012_12_18/o1_mf_annnn_TAG20121218T055521_8f0xgb07_.bkp
deleted backup piece
backup piece handle=/oracle/oradata/flashrecovery/SAN_DBTEST/backupset/2012_12_18/o1_mf_annnn_TAG20121218T055521_8f0xgb07_.bkp RECID=1 STAMP=802331722
Deleted 1 objects

RMAN>

November 15, 2012

SAN Insanity

Filed under: performance,solaris — mdinh @ 5:22 am

There was a migration to a new storage array for the DR site bringing down all the standby databases.

After 3 days of downtime, the DR site was available, media recovery was started and was painfully slow.

At first, I figured it could be slow since so many databases were performing media recovery.

After executing opatch apply for 11.2.3.4 PSU on top of 11.2.0.3, which took 20 minutes, I realized it has to be the I/O subsystem.

I was not able to prove it, but I trusted my gut instinct and nothing from the database has changed.

It seemed pretty obvious.  Of course, the system group denied all this, wanted proof and did not bother to look at the issue.

I googled and found a way to perform simple I/O benchmark.

time sh -c “dd if=/dev/zero of=dd-test-file bs=8k count=1000000 && sync”
1000000+0 records in
1000000+0 records out

real 14m34.625s
user 0m1.555s
sys 0m37.942s

More than 14 minutes to create ~8GB file. The throughput is less that 9MB/s. This time I got the system group’s attention.
Throughout the investigative process, Oracle was getting blamed for the I/O. After several hours, it was determined to be the SAN.

Admittedly, I have become rusty in utilizing system tools and wanted to know how I could have determined this.

I came across the following 2 sites and comment in my search:

http://prefetch.net/blog/index.php/2005/06/26/finding-busy-disks-with-iostat/

http://serverfault.com/questions/26286/what-does-it-mean-if-iostat-shows-100-busy-wait-but-the-cpu-is-completely-idle

IO Stat shouldn’t be 100%. If you have a heavy amount of disk IO, it may be high, above 50%, but exactly 100% usually means something is wrong your IO subsystem. This has happened to me when I’ve had a hard drive in the early stages of failure, when the disk takes longer and longer to respond to requests, but still responds.
Or it could just be a very badly written application. A simple DTrace script should tell you which it is. – Craig Lewis

Here is the output using: iostat -zxnM 5

                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.2    0.0    0.0  0.0  0.0    0.0   15.1   0   0 ssd10
    0.0    0.2    0.0    0.0  0.0  0.0    0.0   10.7   0   0 ssd11
   11.4  145.0    0.3    1.1  0.0  8.8    0.0   56.4   0 100 7/md101
   11.4  145.0    0.3    1.1  0.0  8.8    0.0   56.4   0 100 7/md1
    0.6   75.6    0.0    0.6  0.0  2.3    0.0   29.6   0 100 6/md100
    0.6   75.6    0.0    0.6  0.0  2.3    0.0   29.6   0 100 6/md1
    5.0  143.2    0.0    1.1  0.0  5.8    0.0   38.8   0 100 4/md101
    5.0  143.2    0.0    1.1  0.0  5.8    0.0   38.8   0 100 4/md1
    0.0    0.8    0.0    0.0  0.0  0.0    0.0   53.9   0   2 1/md100
   11.4   75.8    0.1    0.6  0.0  5.2    0.0   60.0   0  99 ssd81
   13.0   69.2    0.1    0.5  0.0  3.8    0.0   46.0   0  92 ssd82
    0.0    0.8    0.0    0.0  0.0  0.0    0.0   53.9   0   2 ssd91
    5.0  143.2    0.0    1.1  0.0  5.8    0.0   38.8   0 100 ssd114
    0.0   67.4    0.0    0.5  0.0  2.0    0.0   29.5   0 100 ssd121
    0.6    8.2    0.0    0.1  0.0  0.3    0.0   29.7   0  19 ssd122
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   63.6   75.0    2.2    0.6  0.0  4.4    0.0   31.9   0 100 7/md101
   63.6   75.0    2.2    0.6  0.0  4.4    0.0   31.9   0 100 7/md1
    1.4   97.6    0.0    0.8  0.0  2.2    0.0   21.9   0 100 6/md100
    1.4   97.6    0.0    0.8  0.0  2.2    0.0   21.9   0 100 6/md1
   12.6  169.0    0.5    1.3  0.0  4.0    0.0   22.2   0 100 4/md101
   12.6  169.0    0.5    1.3  0.0  4.0    0.0   22.3   0 100 4/md1
    0.0    1.6    0.0    0.0  0.0  0.1    0.0   49.0   0   5 1/md100
   85.0   34.6    1.1    0.3  0.0  2.9    0.0   24.1   0  82 ssd81
   89.6   40.4    1.1    0.3  0.0  2.8    0.0   21.6   0  85 ssd82
    0.0    1.6    0.0    0.0  0.0  0.1    0.0   49.0   0   5 ssd91
   12.6  169.0    0.5    1.3  0.0  4.0    0.0   22.2   0 100 ssd114
    0.4   67.2    0.0    0.5  0.0  1.2    0.0   18.0   0  83 ssd121
    1.0   30.4    0.0    0.2  0.0  0.9    0.0   30.1   0  70 ssd122

Noticed all the disk with 100% busy? Are there any storage expert out there who can confirm this?

UPDATE:
As a DBA, I am on a need to know basis; hence, not having all the required information.
From my understanding now, the migration was to a new sever connected to the existing storage.
The system admin believes this could be a cabling issue and the migration to new storage is yet to come.
This is on Solaris 10.

For comparison, this is for a healthy system

                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.2    0.0    0.0  0.0  0.0    0.0   10.1   0   0 md0
    0.0    0.2    0.0    0.0  0.0  0.0    0.0    9.6   0   0 md4
    0.0    0.2    0.0    0.0  0.0  0.0    0.0    0.3   0   0 md5
    0.0    0.2    0.0    0.0  0.0  0.0    0.0    9.5   0   0 md6
    0.0    0.2    0.0    0.0  0.0  0.0    0.0    0.5   0   0 md7
    0.0    0.2    0.0    0.0  0.0  0.0    0.0   10.0   0   0 md8
    0.0    0.8    0.0    0.0  0.0  0.0    0.0    0.3   0   0 sd1
    0.0    0.8    0.0    0.0  0.0  0.0    0.0    8.2   0   1 sd2
    7.7    0.0    5.6    0.0  0.0  0.1    0.0   10.0   0   8 ssd41
    6.5    0.2    1.6    0.0  0.0  0.0    0.0    4.8   0   3 ssd42
    0.0    0.8    0.0    0.0  0.0  0.0    0.0    0.7   0   0 ssd43
    0.2    0.0    0.0    0.0  0.0  0.0    0.0    1.7   0   0 ssd44
   10.7    0.2    6.4    0.0  0.0  0.1    0.0    7.1   0   8 ssd45
   35.1    0.2    0.3    0.0  0.0  0.2    0.0    4.7   0  17 ssd46
   27.2    0.2    0.2    0.0  0.0  0.1    0.0    2.0   0   5 ssd49
    1.4    0.2    0.0    0.0  0.0  0.0    0.0    3.4   0   1 ssd50
    5.7    1.0    0.0    0.0  0.0  0.0    0.0    2.6   0   2 ssd51
    0.0    0.8    0.0    0.0  0.0  0.0    0.0    0.6   0   0 ssd52
    3.0    6.7    0.8    0.2  0.0  0.0    0.0    2.3   0   2 ssd53
    0.6    1.0    0.0    0.0  0.0  0.0    0.0    0.9   0   0 ssd58
    0.8    1.4    0.0    0.0  0.0  0.0    0.0    1.0   0   0 ssd61
    0.0    1.8    0.0    0.0  0.0  0.0    0.0    0.6   0   0 ssd63
    0.0    0.2    0.0    0.0  0.0  0.0    0.0    0.5   0   0 ssd65
    0.4    0.0    0.0    0.0  0.0  0.0    0.0    3.5   0   0 ssd66
    4.7    0.0    4.6    0.0  0.0  0.0    0.0    7.5   0   3 ssd68
    0.6    0.0    0.6    0.0  0.0  0.0    0.0    7.2   0   0 ssd69
    6.1    0.8    4.4    0.0  0.0  0.0    0.0    4.5   0   3 ssd70
    7.5    0.2    6.2    0.0  0.0  0.0    0.0    6.2   0   5 ssd71
    0.2    0.0    0.0    0.0  0.0  0.0    0.0   10.8   0   0 ssd72
    6.3    0.6    6.2    0.0  0.0  0.0    0.0    6.4   0   4 ssd73
    8.9    0.0    5.8    0.0  0.0  0.1    0.0    8.1   0   7 ssd121
    5.9    0.4    1.9    0.0  0.0  0.0    0.0    4.1   0   3 ssd122
    0.0    1.6    0.0    0.0  0.0  0.0    0.0    0.8   0   0 ssd123
   10.1    0.2    6.3    0.0  0.0  0.1    0.0    7.5   0   8 ssd125
   38.5    0.0    0.5    0.0  0.0  0.2    0.0    5.1   0  20 ssd126

June 6, 2012

pldd – list dynamic libraries linked into each process

Filed under: oracle,solaris — mdinh @ 2:49 am

I am using ldd and pldd on Solaris 10.

I was upgrading the database using dbua and was distracted. Later, I saw the listener process was running from 11.2.0.2

> ps -afe|grep tns
 oracle 26639 6520 0 Jun 04 ? 0:01 /u01/app/oracle/product/11.2.0.2/dbhome_1/bin/tnslsnr listener_db02 -inherit
 oracle 6963 7420 0 12:09:09 pts/3 0:00 grep tns

Strange – Hmm?  What happened and what version is oracle process running?

> ps -afe|grep ora_smon
 oracle 4563 6520 0 10:47:37 ? 0:02 ora_smon_db02
 oracle 7012 7420 0 12:09:19 pts/3 0:00 grep ora_smon
> pldd 4563
4563: ora_smon_db02
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libodmd11.so
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libcell11.so
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxp11.so
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libnnz11.so
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libclsra11.so
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libdbcfg11.so
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libhasgen11.so
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libskgxn2.so
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libocr11.so
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libocrb11.so
/u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libocrutl11.so
/lib/sparcv9/libkstat.so.1
/lib/sparcv9/libnsl.so.1
/lib/sparcv9/libsocket.so.1
/lib/sparcv9/libresolv.so.2
/lib/sparcv9/libgen.so.1
/lib/sparcv9/libdl.so.1
/usr/lib/sparcv9/libsched.so.1
/lib/sparcv9/librt.so.1
/lib/sparcv9/libc.so.1
/lib/sparcv9/libaio.so.1
/usr/lib/sparcv9/libpool.so.1
/lib/sparcv9/libm.so.2
/lib/sparcv9/libmd.so.1
/usr/lib/sparcv9/libxml2.so.2
/lib/sparcv9/libscf.so.1
/lib/sparcv9/libnvpair.so.1
/usr/lib/sparcv9/libexacct.so.1
/lib/sparcv9/libpthread.so.1
/usr/lib/sparcv9/libz.so.1
/lib/sparcv9/libdoor.so.1
/lib/sparcv9/libuutil.so.1
/platform/sun4u-us3/lib/sparcv9/libc_psr.so.1

I have must have not shutdown the listener from the previous version.

That was easy.

 

 

 

May 25, 2012

Extract tar files to current directory

Filed under: solaris — mdinh @ 3:34 am

I have a requirement to extract tar files residing in another directory to current directory, instead of copying tar files from the source to current directory, extracting tar files, and deleting tar files.

Here’s a shell script to do this. Did not think about including change directory in the shell script until now.

#!/bin/sh -x
 pax -rv -f /home/mdinh/fbo_ggs_Solaris_sparc_ora11g_64bit.tar -s ',^/,,'
 pax -rv -f /home/mdinh/ggs_FileWriter_Solaris_sparc_v11_1_1_0_0_007.tar -s ',^/,,'
 exit

The tar files reside on NFS which is mounted on every server I want to extract to.

Why do it this way? I am a lazy DBA.

Note: tested on Solaris 10

Reference:

>>directing solaris tar to extract to current directory

May 24, 2012

ldd – print shared library dependencies

Filed under: solaris — mdinh @ 2:43 am

I am sure most of you may already know about the command and just a quick post to demonstrate how ldd is used.

> ldd ggsci
 libm.so.1 => /lib/sparcv9/libm.so.1
 libnsl.so.1 => /lib/sparcv9/libnsl.so.1
 libsocket.so.1 => /lib/sparcv9/libsocket.so.1
 libdl.so.1 => /lib/sparcv9/libdl.so.1
 librt.so.1 => /lib/sparcv9/librt.so.1
 libumem.so.1 => /lib/sparcv9/libumem.so.1
 libgglog.so => /u01/app/ggs/libgglog.so
 libggrepo.so => /u01/app/ggs/libggrepo.so
 libdb-5.2.so => /u01/app/ggs/libdb-5.2.so
 libicui18n.so.38 => /u01/app/ggs/libicui18n.so.38
 libicuuc.so.38 => /u01/app/ggs/libicuuc.so.38
 libicudata.so.38 => /u01/app/ggs/libicudata.so.38
 libpthread.so.1 => /lib/sparcv9/libpthread.so.1
 libxerces-c.so.28 => /u01/app/ggs/libxerces-c.so.28
 libantlr3c.so => /u01/app/ggs/libantlr3c.so
 libclntsh.so.11.1 => /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libclntsh.so.11.1
 libnnz11.so => /u01/app/oracle/product/11.2.0.3/dbhome_1/lib/libnnz11.so
 libkstat.so.1 => /lib/sparcv9/libkstat.so.1
 libgen.so.1 => /lib/sparcv9/libgen.so.1
 libsched.so.1 => /usr/lib/sparcv9/libsched.so.1
 libCstd.so.1 => /usr/lib/sparcv9/libCstd.so.1
 libCrun.so.1 => /usr/lib/sparcv9/libCrun.so.1
 libthread.so.1 => /lib/sparcv9/libthread.so.1
 libc.so.1 => /lib/sparcv9/libc.so.1
 libmp.so.2 => /lib/64/libmp.so.2
 libmd.so.1 => /lib/64/libmd.so.1
 libscf.so.1 => /lib/64/libscf.so.1
 libaio.so.1 => /lib/64/libaio.so.1
 libresolv.so.2 => /lib/64/libresolv.so.2
 libpool.so.1 => /usr/lib/64/libpool.so.1
 libm.so.2 => /lib/64/libm.so.2
 libdoor.so.1 => /lib/64/libdoor.so.1
 libuutil.so.1 => /lib/64/libuutil.so.1
 libxml2.so.2 => /usr/lib/64/libxml2.so.2
 libnvpair.so.1 => /lib/64/libnvpair.so.1
 libexacct.so.1 => /usr/lib/64/libexacct.so.1
 libz.so.1 => /usr/lib/64/libz.so.1
 /platform/SUNW,Sun-Fire-V490/lib/sparcv9/libc_psr.so.1
 /platform/SUNW,Sun-Fire-V490/lib/sparcv9/libmd_psr.so.1

June 25, 2011

fuser – identify users of files and devices

Filed under: solaris — mdinh @ 4:22 am

I learned about fuser when I had deleted 64GB file from /tmp but the space was never reclaimed.

This shows you alert log is being opened in another session using vi.

> ll
total 28992
drwxr-xr-x  25 root     root        8192 Oct 23  2010 ..
-rw-r--r--   1 root     sys           60 Oct 23  2010 csn.6249
-rw-r--r--   1 root     root          76 Oct 23  2010 iconf_entries.6185
drwxrwxrwt   2 root     root         117 Oct 29  2010 .oracle
drwxrwxrwt   4 root     sys          536 Jun 24 21:05 .
-rw-r-----   1 oracle   dba      14785325 Jun 24 21:05 alert_dbtest.log

> ps -afe|grep alert_dbtest.log
  oracle 11589 16520   0 21:05:58 pts/8       0:00 grep alert_dbtest.log
  oracle 10555 19200   0 21:05:47 pts/6       0:01 vi alert_dbtest.log

I am now going to delete alert log file.

> rm alert_dbtest.log

> ll
total 112
drwxr-xr-x  25 root     root        8192 Oct 23  2010 ..
-rw-r--r--   1 root     sys           60 Oct 23  2010 csn.6249
drwxrwxrwt   2 root     root         117 Oct 29  2010 .oracle
drwxr-xr-x   2 oracle   dba          117 Nov  1  2010 hsperfdata_oracle
drwxrwxrwt   4 root     sys          463 Jun 24 21:06 .

Who’s got the file opened?

> ps -o pid,args -p "$(fuser /tmp 2>/dev/null)"
  PID COMMAND
10555 vi alert_dbtest.log
16520 -bash
19200 -bash

> kill -9 10555

Note:  tested on Solaris 10

Great reference here as well Less known Solaris Features: fuser

Create a free website or blog at WordPress.com.