Thinking Out Loud

October 7, 2016

Troubleshooting Goldengate Memory Usage

Filed under: GoldenGate — mdinh @ 4:35 pm

I was paged for GoldenGate processes memory RSS 4507248 KBytes

Here are a few ways to check memory usage.

Initially, I used ps aux –sort -rss|head -5 and Goldengate process was at the top.

After extract was restarted, I used ps ax -o rss,user,command | grep [e]xtract to determine memory usage for Goldengate.

Before: 4,763,432 
After:     31,528  

++++++++++++++++++++++++++++++

$ ps aux --sort -rss|head -5
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ggsuser  59102 13.2  0.8 4763432 4461212 ?     Ss   Jun19 20905:40 $GG_HOME/extract PARAMFILE $GG_HOME/e_test.prm REPORTFILE $GG_HOME/dirrpt/E_TEST.rpt PROCESSID E_TEST USESUBDIRS

++++++++++++++++++++++++++++++

$ ps ax -o rss,user,command | grep [e]xtract
31528 ggsuser  $GG_HOME/extract PARAMFILE $GG_HOME/dirprm/e_test.prm REPORTFILE $GG_HOME/dirrpt/E_TEST.rpt PROCESSID E_TEST USESUBDIRS

 

Attempting to stop extract during long running transactions resulted in WARNINGS and SEND EXTRACT E_TEST, FORCESTOP

I don’t really like using FORCESTOP or KILL since it will result in recovery not knowing how long this will be.

 

$ ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 10.4.0.19 Build 002
Linux, x64, 64bit (optimized), Oracle 10 on Sep 17 2009 23:52:18

Copyright (C) 1995, 2009, Oracle and/or its affiliates.  All rights reserved.



GGSCI (localhost) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                          
EXTRACT     RUNNING     E_TEST      00:00:00      00:00:04   

GGSCI (localhost) 2> lag e*

Sending GETLAG request to EXTRACT E_TEST ...
Last record lag: 0 seconds.
At EOF, no more records to process.


GGSCI (localhost) 3> send e* status

Sending STATUS request to EXTRACT E_TEST ...


  EXTRACT E_TEST (PID 59102)
  Current status: Recovery complete: At EOF
  Sequence #: 142474
  RBA: 1291502592
  Timestamp: 2016-10-06 15:47:38.000000

  Output trail #1
  Current write position:
  Sequence #: 27484
  RBA: 20092406
  Timestamp: 2016-10-06 15:47:29.717308
  Extract Trail: ./dirdat/aa



GGSCI (localhost) 4> stop e*

Sending STOP request to EXTRACT E_TEST ...

There are open, long-running transactions. 
Before you stop Extract, make the archives containing data for those transactions available for when Extract restarts. 
To force Extract to stop, use the SEND EXTRACT E_TEST, FORCESTOP command.

Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 142472, RBA 542064656.


GGSCI (localhost) 5> send extract e*, showtrans duration 10 MIN

Sending showtrans request to EXTRACT E_TEST ...


Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 142472, RBA 542064656

------------------------------------------------------------
XID:                  57.29.20520692
Items:                122554  
Extract:              E_TEST   
Redo Thread:          1     
Start Time:           2016-10-06:15:32:26 
SCN:                  143.1016911815 (615197235143)    
Redo Seq:             142472
Redo RBA:             542064656          
Status:               Running            


------------------------------------------------------------
XID:                  64.35.9949688
Items:                131751  
Extract:              E_TEST   
Redo Thread:          1     
Start Time:           2016-10-06:15:33:29 
SCN:                  143.1017267252 (615197590580)    
Redo Seq:             142472
Redo RBA:             1450478608         
Status:               Running            


GGSCI (localhost) 6> send extract e*, showtrans duration 20 MIN

Sending showtrans request to EXTRACT E_TEST ...
No transactions found

Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 142472, RBA 542064656.


GGSCI (localhost) 7> send extract e*, showtrans duration 15 MIN

Sending showtrans request to EXTRACT E_TEST ...


Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 142472, RBA 542064656

------------------------------------------------------------
XID:                  57.29.20520692
Items:                123453  
Extract:              E_TEST   
Redo Thread:          1     
Start Time:           2016-10-06:15:32:26 
SCN:                  143.1016911815 (615197235143)    
Redo Seq:             142472
Redo RBA:             542064656          
Status:               Running            


------------------------------------------------------------
XID:                  64.35.9949688
Items:                133438  
Extract:              E_TEST   
Redo Thread:          1     
Start Time:           2016-10-06:15:33:29 
SCN:                  143.1017267252 (615197590580)    
Redo Seq:             142472
Redo RBA:             1450478608         
Status:               Running            


GGSCI (localhost) 8> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                          
EXTRACT     RUNNING     E_TEST      00:00:00      00:00:06   

GGSCI (localhost) 9>

 

Wait a little bit and try to stop extract again and this time succeeded.

 

GGSCI (localhost) 9> send extract e*, showtrans duration 15 MIN

Sending showtrans request to EXTRACT E_TEST ...
No transactions found

Oldest redo log file necessary to restart Extract is:

Redo Log Sequence Number 142472, RBA 542064656.

-- In hindsight, I should have "send extract e*, showtrans duration 1 MIN" before stopping extract !!!
GGSCI (localhost) 10> stop e*

Sending STOP request to EXTRACT E_TEST ...
Request processed.


GGSCI (localhost) 11> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     STOPPED     E_TEST      00:00:00      00:00:03    

GGSCI (localhost) 12> exit

++++++++++++++++++++++++++++++

Top RSS memory consumer is from Agent.

$ ps aux --sort -rss|head -5
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
oracle   89345  0.5  0.0 1055812 441812 ?      Sl   Jun10 854:16 /u01/app/oracle/product/12.1/cc_agent/core/12.1.0.3.0/jdk/bin/java 

++++++++++++++++++++++++++++++

GGSCI (localhost) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     STOPPED     E_TEST      00:00:00      00:00:28    

GGSCI (localhost) 2> start e*

Sending START request to MANAGER ...
EXTRACT E_TEST starting


GGSCI (localhost) 3> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     E_TEST      00:00:35      00:00:01    

GGSCI (localhost) 4> lag e*

Sending GETLAG request to EXTRACT E_TEST ...
No records yet processed.


GGSCI (localhost) 5> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     E_TEST      00:00:45      00:00:01    

GGSCI (localhost) 6> send e* status

Sending STATUS request to EXTRACT E_TEST ...

-- Note Sequence 142475 is much better than 142472. Also, how large is REDO?

  EXTRACT E_TEST (PID 82213)
  Current status: In recovery[1]: Reading from data source
  Sequence #: 142475
  RBA: 594610192
  Timestamp: 2016-10-06 15:53:00.000000

  Output trail #1
  Current write position:
  Sequence #: 27485
  RBA: 1033
  Timestamp: 2016-10-06 15:55:22.043495
  Extract Trail: ./dirdat/aa

GGSCI (localhost) 7> exit

++++++++++++++++++++++++++++++

$ ps ax -o rss,user,command | grep [e]xtract
31220 ggsuser  $GG_HOME/extract PARAMFILE $GG_HOME/dirprm/e_test.prm REPORTFILE $GG_HOME/dirrpt/E_TEST.rpt PROCESSID E_TEST USESUBDIRS

++++++++++++++++++++++++++++++

GGSCI (localhost) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     E_TEST      00:00:00      00:00:03    

GGSCI (localhost) 2> send e* status

Sending STATUS request to EXTRACT E_TEST ...


  EXTRACT E_TEST (PID 82213)
  Current status: Recovery complete: At EOF
  Sequence #: 142475
  RBA: 3804984832
  Timestamp: 2016-10-06 16:00:44.000000

  Output trail #1
  Current write position:
  Sequence #: 27485
  RBA: 163836918
  Timestamp: 2016-10-06 16:00:42.310056
  Extract Trail: ./dirdat/aa

GGSCI (localhost) 3> lag e*

Sending GETLAG request to EXTRACT E_TEST ...
Last record lag: 0 seconds.
At EOF, no more records to process.


GGSCI (localhost) 4> exit

++++++++++++++++++++++++++++++

$ ps ax -o rss,user,command | grep [e]xtract 31528 ggsuser $GG_HOME/extract PARAMFILE $GG_HOME/dirprm/e_test.prm REPORTFILE $GG_HOME/dirrpt/E_TEST.rpt PROCESSID E_TEST USESUBDIRS
Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: