Thinking Out Loud

April 24, 2019

Solving DBFS UnMounting Issue

Filed under: DBFS,GoldenGate,RAC — mdinh @ 11:34 pm

Often, I am quite baffle with Oracle’s implementations and documentations.

RAC GoldenGate DBFS implementation has been a nightmare and here is one example DBFS Nightmare

I am about to show you another.

In general, I find any implementation using ACTION_SCRIPT is good in theory, bad in practice, but I digress.

Getting ready to shutdown CRS for system patching to find out CRS failed to shutdown.

# crsctl stop crs
CRS-2675: Stop of 'dbfs_mount' on 'host02' failed
CRS-2675: Stop of 'dbfs_mount' on 'host02' failed
CRS-2673: Attempting to stop 'dbfs_mount' on 'host02'
CRS-2675: Stop of 'dbfs_mount' on 'host02' failed
CRS-2673: Attempting to stop 'dbfs_mount' on 'host02'
CRS-2675: Stop of 'dbfs_mount' on 'host02' failed
CRS-2799: Failed to shut down resource 'dbfs_mount' on 'host02'
CRS-2799: Failed to shut down resource 'ora.GG_PROD.dg' on 'host02'
CRS-2799: Failed to shut down resource 'ora.asm' on 'host02'
CRS-2799: Failed to shut down resource 'ora.dbfs.db' on 'host02'
CRS-2799: Failed to shut down resource 'ora.host02.ASM2.asm' on 'host02'
CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'host02' has failed
CRS-2675: Stop of 'ora.crsd' on 'host02' failed
CRS-2799: Failed to shut down resource 'ora.crsd' on 'host02'
CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'host02' has failed
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.

Check /var/log/messages to find errors and there’s no clue for resolution.

# grep -i dbfs /var/log/messages
Apr 17 19:42:26 host02 DBFS_/ggdata: unmounting DBFS from /ggdata
Apr 17 19:42:26 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata'
Apr 17 19:42:26 host02 DBFS_/ggdata: Stop - stopped, but still mounted, error

Apr 17 20:45:59 host02 DBFS_/ggdata: mount-dbfs.sh mounting DBFS at /ggdata from database DBFS
Apr 17 20:45:59 host02 DBFS_/ggdata: /ggdata already mounted, use mount-dbfs.sh stop before attempting to start

Apr 17 21:01:29 host02 DBFS_/ggdata: unmounting DBFS from /ggdata
Apr 17 21:01:29 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata'
Apr 17 21:01:29 host02 DBFS_/ggdata: Stop - stopped, but still mounted, error
Apr 17 21:01:36 host02 dbfs_client[71957]: OCI_ERROR 3114 - ORA-03114: not connected to ORACLE
Apr 17 21:01:41 host02 dbfs_client[71957]: /FS1/dirdat/ih000247982 Block error RC:-5

Apr 17 21:03:06 host02 DBFS_/ggdata: unmounting DBFS from /ggdata
Apr 17 21:03:06 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata'
Apr 17 21:03:06 host02 DBFS_/ggdata: Stop - stopped, now not mounted
Apr 17 21:09:19 host02 DBFS_/ggdata: filesystem /ggdata not currently mounted, no need to stop

Apr 17 22:06:16 host02 DBFS_/ggdata: mount-dbfs.sh mounting DBFS at /ggdata from database DBFS
Apr 17 22:06:17 host02 DBFS_/ggdata: ORACLE_SID is DBFS2
Apr 17 22:06:17 host02 DBFS_/ggdata: doing mount /ggdata using SID DBFS2 with wallet now
Apr 17 22:06:18 host02 DBFS_/ggdata: Start -- ONLINE

The messages in log of script agent show below error (MOS documentation).
Anyone know where script agent is located at?

2019-04-17 20:56:02.793903 :    AGFW:3274315520: {1:53477:37077} Agent received the message: AGENT_HB[Engine] ID 12293:16017523
2019-04-17 20:56:19.124667 :CLSDYNAM:3276416768: [dbfs_mount]{1:53477:37077} [check] Executing action script: /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh[check]
2019-04-17 20:56:19.176927 :CLSDYNAM:3276416768: [dbfs_mount]{1:53477:37077} [check] Checking status now
2019-04-17 20:56:19.176973 :CLSDYNAM:3276416768: [dbfs_mount]{1:53477:37077} [check] Check -- ONLINE
2019-04-17 20:56:32.794287 :    AGFW:3274315520: {1:53477:37077} Agent received the message: AGENT_HB[Engine] ID 12293:16017529
2019-04-17 20:56:43.312534 :    AGFW:3274315520: {2:37893:29307} Agent received the message: RESOURCE_STOP[dbfs_mount host02 1] ID 4099:16017535
2019-04-17 20:56:43.312574 :    AGFW:3274315520: {2:37893:29307} Preparing STOP command for: dbfs_mount host02 1
2019-04-17 20:56:43.312584 :    AGFW:3274315520: {2:37893:29307} dbfs_mount host02 1 state changed from: ONLINE to: STOPPING
2019-04-17 20:56:43.313088 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] Executing action script: /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh[stop]
2019-04-17 20:56:43.365201 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] unmounting DBFS from /ggdata
2019-04-17 20:56:43.415516 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] umounting the filesystem using '/bin/fusermount -u /ggdata'
2019-04-17 20:56:43.415541 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] /bin/fusermount: failed to unmount /ggdata: Device or resource busy
2019-04-17 20:56:43.415552 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] Stop - stopped, but still mounted, error
2019-04-17 20:56:43.415611 :    AGFW:3276416768: {2:37893:29307} Command: stop for resource: dbfs_mount host02 1 completed with status: FAIL
2019-04-17 20:56:43.415929 :CLSFRAME:3449863744:  TM [MultiThread] is changing desired thread # to 3. Current # is 2
2019-04-17 20:56:43.415970 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [check] Executing action script: /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh[check]
2019-04-17 20:56:43.416033 :    AGFW:3274315520: {2:37893:29307} Agent sending reply for: RESOURCE_STOP[dbfs_mount host02 1] ID 4099:16017535
2019-04-17 20:56:43.467939 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [check] Checking status now
2019-04-17 20:56:43.467964 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [check] Check -- ONLINE

ACTION_SCRIPT can be found using crsctl as shown if you have not found where log of script agent is at.

oracle@host02 ~ $ $GRID_HOME/bin/crsctl stat res -w "TYPE = local_resource" -p | grep mount-dbfs.sh
ACTION_SCRIPT=/u02/app/12.1.0/grid/crs/script/mount-dbfs.sh
ACTION_SCRIPT=/u02/app/12.1.0/grid/crs/script/mount-dbfs.sh

Here’s a test case to resolve “failed to unmount /ggdata: Device or resource busy”.

First thought was to use fuser and kill the process.

# fuser -vmM /ggdata/
                     USER        PID ACCESS COMMAND
/ggdata:             root     kernel mount /ggdata
                     mdinh     85368 ..c.. bash
                     mdinh     86702 ..c.. vim
# 

Second thought, might not be a good idea and better idea is to let the script handle this if it can.
Let’s see what options are available for mount-dbfs.sh:

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh -h
Usage: mount-dbfs.sh { start | stop | check | status | restart | clean | abort | version }

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh version
20160215
oracle@host02 ~ 

Stop DBFS failed as expected.

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status
Checking status now
Check -- ONLINE

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh stop
unmounting DBFS from /ggdata
umounting the filesystem using '/bin/fusermount -u /ggdata'
/bin/fusermount: failed to unmount /ggdata: Device or resource busy
Stop - stopped, but still mounted, error
oracle@host02 ~ $

Stop DBFS using clean option. Notice PID killed is 40047 and not the same as (mdinh 86702 ..c.. vim)
Note: not all output displayed for brevity.

oracle@host02 ~ $ /bin/bash -x /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh clean
+ msg='cleaning up DBFS nicely using (fusermount -u|umount)'
+ '[' info = info ']'
+ /bin/echo cleaning up DBFS nicely using '(fusermount' '-u|umount)'
cleaning up DBFS nicely using (fusermount -u|umount)
+ /bin/logger -t DBFS_/ggdata -p user.info 'cleaning up DBFS nicely using (fusermount -u|umount)'
+ '[' 1 -eq 1 ']'
+ /bin/fusermount -u /ggdata
/bin/fusermount: failed to unmount /ggdata: Device or resource busy
+ /bin/sleep 1
+ FORCE_CLEANUP=0
+ '[' 0 -gt 1 ']'
+ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status
+ '[' 0 -eq 0 ']'
+ FORCE_CLEANUP=1
+ '[' 1 -eq 1 ']'
+ logit error 'tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill'
+ type=error
+ msg='tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill'
+ '[' error = info ']'
+ '[' error = error ']'
+ /bin/echo tried '(fusermount' '-u|umount),' still mounted, now cleaning with '(fusermount' -u '-z|umount' '-f)' and kill
tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill
+ /bin/logger -t DBFS_/ggdata -p user.error 'tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill'
+ '[' 1 -eq 1 ']'
================================================================================
+ /bin/fusermount -u -z /ggdata
================================================================================
+ '[' 1 -eq 1 ']'
++ /bin/ps -ef
++ /bin/grep -w /ggdata
++ /bin/grep dbfs_client
++ /bin/grep -v grep
++ /bin/awk '{print $2}'
+ PIDS=40047
+ '[' -n 40047 ']'
================================================================================
+ /bin/kill -9 40047
================================================================================
++ /bin/ps -ef
++ /bin/grep -w /ggdata
++ /bin/grep mount.dbfs
++ /bin/grep -v grep
++ /bin/awk '{print $2}'
+ PIDS=
+ '[' -n '' ']'
+ exit 1

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status
Checking status now
Check -- OFFLINE

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status
Checking status now
Check -- OFFLINE

oracle@host02 ~ $ df -h|grep /ggdata
/dev/asm/acfs_vol-177                                  299G  2.8G  297G   1% /ggdata1
dbfs-@DBFS:/                                            60G  1.4G   59G   3% /ggdata

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status
Checking status now
Check -- ONLINE
oracle@host02 ~ $ 

What PID was killed when running mount-dbfs.sh clean?
It’s for dbfs_client.

mdinh@host02 ~ $ ps -ef|grep dbfs
oracle   34865     1  0 Mar29 ?        00:02:43 oracle+ASM1_asmb_dbfs1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   40047     1  0 Apr22 ?        00:00:10 /u01/app/oracle/product/12.1.0/db_1/bin/dbfs_client /@DBFS -o allow_other,direct_io,wallet /ggdata
oracle   40081     1  0 Apr22 ?        00:00:27 oracle+ASM1_user40069_dbfs1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
mdinh    88748 87565  0 13:30 pts/1    00:00:00 grep --color=auto dbfs
mdinh@host02 ~ $ 

It would have been so much better for mount-dbfs.sh to provide the info as part of the kill versus having user go debug script and trace process.

If you have read this far, then it’s only fair to provide log of script agent.

$ grep mount-dbfs.sh $ORACLE_BASE/diag/crs/$(hostname -s)/crs/trace/crsd_scriptagent_oracle.trc | grep "2019-04-17 20:5"
Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Blog at WordPress.com.

%d bloggers like this: