Thinking Out Loud

April 24, 2019

Solving DBFS UnMounting Issue

Filed under: DBFS,GoldenGate,RAC — mdinh @ 11:34 pm

Often, I am quite baffled by Oracle's implementations and documentation.

The RAC GoldenGate DBFS implementation has been a nightmare, and here is one example: DBFS Nightmare

I am about to show you another.

In general, I find any implementation using ACTION_SCRIPT good in theory but bad in practice; but I digress.

While getting ready to shut down CRS for system patching, I found that CRS failed to shut down.

# crsctl stop crs
CRS-2675: Stop of 'dbfs_mount' on 'host02' failed
CRS-2675: Stop of 'dbfs_mount' on 'host02' failed
CRS-2673: Attempting to stop 'dbfs_mount' on 'host02'
CRS-2675: Stop of 'dbfs_mount' on 'host02' failed
CRS-2673: Attempting to stop 'dbfs_mount' on 'host02'
CRS-2675: Stop of 'dbfs_mount' on 'host02' failed
CRS-2799: Failed to shut down resource 'dbfs_mount' on 'host02'
CRS-2799: Failed to shut down resource 'ora.GG_PROD.dg' on 'host02'
CRS-2799: Failed to shut down resource 'ora.asm' on 'host02'
CRS-2799: Failed to shut down resource 'ora.dbfs.db' on 'host02'
CRS-2799: Failed to shut down resource 'ora.host02.ASM2.asm' on 'host02'
CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'host02' has failed
CRS-2675: Stop of 'ora.crsd' on 'host02' failed
CRS-2799: Failed to shut down resource 'ora.crsd' on 'host02'
CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'host02' has failed
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
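
To see which resources are still up at this point, the same crsctl filtering used later in this post works (a sketch; the STATE filter is my assumption, not what I ran):

# $GRID_HOME/bin/crsctl stat res -t -w "STATE != OFFLINE"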

Checking /var/log/messages finds the errors, but no clue for a resolution.

# grep -i dbfs /var/log/messages
Apr 17 19:42:26 host02 DBFS_/ggdata: unmounting DBFS from /ggdata
Apr 17 19:42:26 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata'
Apr 17 19:42:26 host02 DBFS_/ggdata: Stop - stopped, but still mounted, error

Apr 17 20:45:59 host02 DBFS_/ggdata: mount-dbfs.sh mounting DBFS at /ggdata from database DBFS
Apr 17 20:45:59 host02 DBFS_/ggdata: /ggdata already mounted, use mount-dbfs.sh stop before attempting to start

Apr 17 21:01:29 host02 DBFS_/ggdata: unmounting DBFS from /ggdata
Apr 17 21:01:29 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata'
Apr 17 21:01:29 host02 DBFS_/ggdata: Stop - stopped, but still mounted, error
Apr 17 21:01:36 host02 dbfs_client[71957]: OCI_ERROR 3114 - ORA-03114: not connected to ORACLE
Apr 17 21:01:41 host02 dbfs_client[71957]: /FS1/dirdat/ih000247982 Block error RC:-5

Apr 17 21:03:06 host02 DBFS_/ggdata: unmounting DBFS from /ggdata
Apr 17 21:03:06 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata'
Apr 17 21:03:06 host02 DBFS_/ggdata: Stop - stopped, now not mounted
Apr 17 21:09:19 host02 DBFS_/ggdata: filesystem /ggdata not currently mounted, no need to stop

Apr 17 22:06:16 host02 DBFS_/ggdata: mount-dbfs.sh mounting DBFS at /ggdata from database DBFS
Apr 17 22:06:17 host02 DBFS_/ggdata: ORACLE_SID is DBFS2
Apr 17 22:06:17 host02 DBFS_/ggdata: doing mount /ggdata using SID DBFS2 with wallet now
Apr 17 22:06:18 host02 DBFS_/ggdata: Start -- ONLINE

The script agent log shows the errors below (per MOS documentation).
Anyone know where the script agent log is located?

2019-04-17 20:56:02.793903 :    AGFW:3274315520: {1:53477:37077} Agent received the message: AGENT_HB[Engine] ID 12293:16017523
2019-04-17 20:56:19.124667 :CLSDYNAM:3276416768: [dbfs_mount]{1:53477:37077} [check] Executing action script: /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh[check]
2019-04-17 20:56:19.176927 :CLSDYNAM:3276416768: [dbfs_mount]{1:53477:37077} [check] Checking status now
2019-04-17 20:56:19.176973 :CLSDYNAM:3276416768: [dbfs_mount]{1:53477:37077} [check] Check -- ONLINE
2019-04-17 20:56:32.794287 :    AGFW:3274315520: {1:53477:37077} Agent received the message: AGENT_HB[Engine] ID 12293:16017529
2019-04-17 20:56:43.312534 :    AGFW:3274315520: {2:37893:29307} Agent received the message: RESOURCE_STOP[dbfs_mount host02 1] ID 4099:16017535
2019-04-17 20:56:43.312574 :    AGFW:3274315520: {2:37893:29307} Preparing STOP command for: dbfs_mount host02 1
2019-04-17 20:56:43.312584 :    AGFW:3274315520: {2:37893:29307} dbfs_mount host02 1 state changed from: ONLINE to: STOPPING
2019-04-17 20:56:43.313088 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] Executing action script: /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh[stop]
2019-04-17 20:56:43.365201 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] unmounting DBFS from /ggdata
2019-04-17 20:56:43.415516 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] umounting the filesystem using '/bin/fusermount -u /ggdata'
2019-04-17 20:56:43.415541 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] /bin/fusermount: failed to unmount /ggdata: Device or resource busy
2019-04-17 20:56:43.415552 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] Stop - stopped, but still mounted, error
2019-04-17 20:56:43.415611 :    AGFW:3276416768: {2:37893:29307} Command: stop for resource: dbfs_mount host02 1 completed with status: FAIL
2019-04-17 20:56:43.415929 :CLSFRAME:3449863744:  TM [MultiThread] is changing desired thread # to 3. Current # is 2
2019-04-17 20:56:43.415970 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [check] Executing action script: /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh[check]
2019-04-17 20:56:43.416033 :    AGFW:3274315520: {2:37893:29307} Agent sending reply for: RESOURCE_STOP[dbfs_mount host02 1] ID 4099:16017535
2019-04-17 20:56:43.467939 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [check] Checking status now
2019-04-17 20:56:43.467964 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [check] Check -- ONLINE

If you have not yet found the script agent log, the ACTION_SCRIPT location can be found using crsctl, as shown below.

oracle@host02 ~ $ $GRID_HOME/bin/crsctl stat res -w "TYPE = local_resource" -p | grep mount-dbfs.sh
ACTION_SCRIPT=/u02/app/12.1.0/grid/crs/script/mount-dbfs.sh
ACTION_SCRIPT=/u02/app/12.1.0/grid/crs/script/mount-dbfs.sh

Here’s a test case to resolve “failed to unmount /ggdata: Device or resource busy”.

My first thought was to use fuser and kill the processes holding the mount.

# fuser -vmM /ggdata/
                     USER        PID ACCESS COMMAND
/ggdata:             root     kernel mount /ggdata
                     mdinh     85368 ..c.. bash
                     mdinh     86702 ..c.. vim
# 
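
For the record, the kill-based approach I decided against would have looked something like this (hypothetical, not what I ran; fuser -k sends SIGKILL to every process accessing the mount):

# fuser -km /ggdata
# /bin/fusermount -u /ggdata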

On second thought, that might not be a good idea; the better idea is to let the script handle this if it can.
Let’s see what options are available for mount-dbfs.sh:

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh -h
Usage: mount-dbfs.sh { start | stop | check | status | restart | clean | abort | version }

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh version
20160215
oracle@host02 ~ 

Stopping DBFS failed as expected.

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status
Checking status now
Check -- ONLINE

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh stop
unmounting DBFS from /ggdata
umounting the filesystem using '/bin/fusermount -u /ggdata'
/bin/fusermount: failed to unmount /ggdata: Device or resource busy
Stop - stopped, but still mounted, error
oracle@host02 ~ $

Stop DBFS using the clean option. Notice the PID killed is 40047, which is not the same process as the vim session shown earlier (mdinh 86702 ..c.. vim).
Note: not all output is displayed, for brevity.

oracle@host02 ~ $ /bin/bash -x /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh clean
+ msg='cleaning up DBFS nicely using (fusermount -u|umount)'
+ '[' info = info ']'
+ /bin/echo cleaning up DBFS nicely using '(fusermount' '-u|umount)'
cleaning up DBFS nicely using (fusermount -u|umount)
+ /bin/logger -t DBFS_/ggdata -p user.info 'cleaning up DBFS nicely using (fusermount -u|umount)'
+ '[' 1 -eq 1 ']'
+ /bin/fusermount -u /ggdata
/bin/fusermount: failed to unmount /ggdata: Device or resource busy
+ /bin/sleep 1
+ FORCE_CLEANUP=0
+ '[' 0 -gt 1 ']'
+ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status
+ '[' 0 -eq 0 ']'
+ FORCE_CLEANUP=1
+ '[' 1 -eq 1 ']'
+ logit error 'tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill'
+ type=error
+ msg='tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill'
+ '[' error = info ']'
+ '[' error = error ']'
+ /bin/echo tried '(fusermount' '-u|umount),' still mounted, now cleaning with '(fusermount' -u '-z|umount' '-f)' and kill
tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill
+ /bin/logger -t DBFS_/ggdata -p user.error 'tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill'
+ '[' 1 -eq 1 ']'
================================================================================
+ /bin/fusermount -u -z /ggdata
================================================================================
+ '[' 1 -eq 1 ']'
++ /bin/ps -ef
++ /bin/grep -w /ggdata
++ /bin/grep dbfs_client
++ /bin/grep -v grep
++ /bin/awk '{print $2}'
+ PIDS=40047
+ '[' -n 40047 ']'
================================================================================
+ /bin/kill -9 40047
================================================================================
++ /bin/ps -ef
++ /bin/grep -w /ggdata
++ /bin/grep mount.dbfs
++ /bin/grep -v grep
++ /bin/awk '{print $2}'
+ PIDS=
+ '[' -n '' ']'
+ exit 1
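
Distilled from the trace above, the clean logic boils down to roughly this (a simplified sketch, not the actual script):

# simplified sketch of mount-dbfs.sh clean, per the bash -x trace above
MOUNT_POINT=/ggdata
if ! /bin/fusermount -u $MOUNT_POINT; then
    # still busy: lazy-detach the mount, then kill the dbfs_client process
    /bin/fusermount -u -z $MOUNT_POINT
    PIDS=$(/bin/ps -ef | /bin/grep -w $MOUNT_POINT | /bin/grep dbfs_client | /bin/grep -v grep | /bin/awk '{print $2}')
    [ -n "$PIDS" ] && /bin/kill -9 $PIDS
fi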

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status
Checking status now
Check -- OFFLINE

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status
Checking status now
Check -- OFFLINE

oracle@host02 ~ $ df -h|grep /ggdata
/dev/asm/acfs_vol-177                                  299G  2.8G  297G   1% /ggdata1
dbfs-@DBFS:/                                            60G  1.4G   59G   3% /ggdata

oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status
Checking status now
Check -- ONLINE
oracle@host02 ~ $ 

Notice that /ggdata shows as mounted again in the df output above: the dbfs_mount resource is still managed by Clusterware, which presumably remounted it after the clean.

What PID was killed when running mount-dbfs.sh clean?
It's the dbfs_client process.

mdinh@host02 ~ $ ps -ef|grep dbfs
oracle   34865     1  0 Mar29 ?        00:02:43 oracle+ASM1_asmb_dbfs1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   40047     1  0 Apr22 ?        00:00:10 /u01/app/oracle/product/12.1.0/db_1/bin/dbfs_client /@DBFS -o allow_other,direct_io,wallet /ggdata
oracle   40081     1  0 Apr22 ?        00:00:27 oracle+ASM1_user40069_dbfs1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
mdinh    88748 87565  0 13:30 pts/1    00:00:00 grep --color=auto dbfs
mdinh@host02 ~ $ 

It would have been so much better for mount-dbfs.sh to provide this info as part of the kill, versus having the user debug the script and trace the process.
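
Something as small as this in the kill path would have helped (a hypothetical patch, reusing the logger convention the script already uses):

# hypothetical improvement: log the PIDs before killing them
if [ -n "$PIDS" ]; then
    /bin/logger -t DBFS_/ggdata -p user.info "killing dbfs_client pid(s): $PIDS"
    /bin/kill -9 $PIDS
fi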

If you have read this far, then it's only fair to provide the location of the script agent log.

$ grep mount-dbfs.sh $ORACLE_BASE/diag/crs/$(hostname -s)/crs/trace/crsd_scriptagent_oracle.trc | grep "2019-04-17 20:5"

May 5, 2018

DBFS Nightmare

Filed under: 12c,DBFS,GoldenGate — mdinh @ 4:19 pm
====================================================================================================
How to cleanup DBFS tablespace after removing files at DBFS filesystem (Doc ID 2331565.1)	
====================================================================================================
High level overview of the DBMS_DBFS_SFS.REORGANIZEFS procedure:
----------------------------------------------------------------------------------------------------
1) Create a NEW tablespace
2) Create a temporary filesystem with dbms_dbfs_sfs.createFilesystem in the new tablespace.
3) Run dbms_dbfs_sfs.reorganizeFS -->>
   EXEC DBMS_DBFS_SFS.REORGANIZEFS(SRCSTORE=>'FS_FS1', DSTSTORE=>'FS_TMP_FS');
4) The dbfs data is now in the smaller NEW tablespace.
5) Drop the temporary filesystem with dbms_dbfs_sfs.dropFilesystem
6) The OLD original tablespace is empty now.
----------------------------------------------------------------------------------------------------
To reuse the same tablespace again, please follow the steps below
----------------------------------------------------------------------------------------------------
7) Create a temporary filesystem with dbms_dbfs_sfs.createFilesystem in the OLD ORIGINAL tablespace.
8) Run dbms_dbfs_sfs.reorganizeFS
9) The dbfs data is now back in the ORIGINAL tablespace.
10) Drop the temporary filesystem with dbms_dbfs_sfs.dropFilesystem
11) The NEW small tablespace is empty now and can be dropped.

====================================================================================================
DBFS tablespace keep growing not using expired blocks (Doc ID 2327299.1)	
====================================================================================================
----------------------------------------------------------------------------------------------------
First, perform the DBFS re-org to clean up the tablespace after removing the files at the filesystem level, following the document below
----------------------------------------------------------------------------------------------------
How to cleanup DBFS tablespace after removing files at DBFS filesystem (Doc ID 2331565.1)

----------------------------------------------------------------------------------------------------
After the re-org, set the parameter below to reuse the expired blocks
----------------------------------------------------------------------------------------------------
ALTER SYSTEM SET "_ENABLE_SPACE_PREALLOCATION" = 0;
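
To verify the current value of this hidden parameter, the usual x$ query works (my sketch, not from the MOS note; must be run as SYS):

  ex. SQL> select i.ksppinm name, v.ksppstvl value
           from x$ksppi i, x$ksppcv v
           where i.indx = v.indx
           and i.ksppinm = '_enable_space_preallocation';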

====================================================================================================
ORA-1654 - DBFS FREE SPACE NOT RECLAIMED AFTER CLEARING FILES (Doc ID 1948305.1)	
====================================================================================================
----------------------------------------------------------------------------------------------------
a) Check Lob retention setting:
----------------------------------------------------------------------------------------------------
  ex. SQL> select table_name, retention_type, retention_value, retention, securefile
           from dba_lobs
           where table_name = 'T_FS1';

----------------------------------------------------------------------------------------------------  
b) Change LOB retention to none.
----------------------------------------------------------------------------------------------------
  ex. SQL> alter table DBFS_USER.T_FS1 modify lob (FILEDATA) (retention none);

After countless discussions, the team found GOLD.

12.1 Shrinking and Reorganizing DBFS Filesystems

STEPS:

sqlplus / as sysdba
create bigfile tablespace DBFS_NEW datafile size 8G autoextend on next 1G maxsize 70G;
grant dba to dbfs_user;
alter user dbfs_user default role all;
exit

+++ REORGANIZE DBFS FILESYSTEM FS1 IN TABLESPACE DBFS_TS INTO A NEW TABLESPACE DBFS_NEW, 
+++ USING A TEMPORARY FILESYSTEM NAMED TMP_FS, WHERE ALL FILESYSTEMS BELONG TO DATABASE USER DBFS_USER
cd $ORACLE_HOME/rdbms/admin
sqlplus dbfs_user
@dbfs_create_filesystem DBFS_NEW TMP_FS
exec dbms_dbfs_sfs.reorganizefs('FS1','TMP_FS');
@dbfs_drop_filesystem TMP_FS
purge user_recyclebin;
exit

sqlplus / as sysdba
revoke dba from dbfs_user;
select count(*) from dba_extents where tablespace_name='DBFS_TS'; 
select count(*) from dba_extents where tablespace_name='DBFS_NEW'; 
-- DROP TABLESPACE HAVING ZERO EXTENTS
-- BE CAREFUL IN CASE USER AND DBFS ARE USING SAME TABLESPACE
Example:
create user dbfs_user identified by **** default tablespace dbfs_ts quota unlimited on dbfs_ts;
@$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem.sql dbfs_ts FS1
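
Once dba_extents returns zero for the old tablespace, it can be dropped (my example, not from the doc above; be sure nothing else shares the tablespace):

sqlplus / as sysdba
-- DBFS_TS is the now-empty original tablespace in this flow
drop tablespace DBFS_TS including contents and datafiles;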

TRUE CONFESSION:

My first plan was imperfect, but it may well have saved me from a major catastrophe.

Create DBFS_TMP in a separate disk group, since I was concerned the existing one might not have enough storage.
Reorg from DBFS_TS to DBFS_TMP (a bad idea to name the tablespace DBFS_TMP, as the LOB segments are migrated there).
Reorg from DBFS_TMP back to DBFS_TS.
Drop tablespace DBFS_TMP.

There’s a 29G difference between old and new. How much of this space will be reclaimable, given the HWM in the data file?

Performing the reorg twice is really not an option in prod.
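
One way to gauge that up front is the usual HWM query against the datafiles (my sketch, not part of the MOS notes):

-- highest used block per datafile determines how far it can shrink
select a.file_name,
       round(a.bytes/1048576) current_mb,
       ceil((b.hwm * c.value)/1048576) shrink_to_mb
from dba_data_files a,
     (select file_id, max(block_id + blocks - 1) hwm
        from dba_extents group by file_id) b,
     (select value from v$parameter where name = 'db_block_size') c
where a.file_id = b.file_id
and a.tablespace_name = 'DBFS_TS';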

RESULTS:

SEGMENT_NAME                   TABLESPACE_NAME                        MB    EXTENTS
------------------------------ ------------------------------ ---------- ----------
LOB_SFS$_FST_1                 DBFS_TS                        30647.1875       1851
LOB_SFS$_FST_6225924           DBFS_NEW                         1025.125        130

WARNINGS:

This has only been tested in a vacuum (non-prod environment with very low activity). YMMV.
