wtorek, 8 marca 2016

Oracle Golden Gate for Big Data - in practice

In this post I will try to show how can we move our activities like inserts, updates or deletes on Oracle Database 12c to Apache HDFS.

All the actions shown in this post, were performed on a single machine.

Oracle Golden Gate 12.1 documentation - link here

If you find any bugs, mistakes or not working code please leave me comment below. I will try to correct it and update as soon as possible.


Software version:

Software required for this tutorial is listed below. I will describe only Oracle Golden Gate installation and configuration process. You can use external links posted in this article to find a way how to install and configure others.



At the beginning

You should be familiar with Oracle Database, Oracle Golden Gate, Apache Hadoop.
If you have questions and you won't find answers in this post, please visit Oracle Golden Gate documentation page for more information.
First of all you should create and grant required privileges to database user. This documentation describe which privileges will you need.
The second very important step is Preparing the Database for Oracle Golden Gate.



Installation

Oracle Golden Gate is available by edelivery.oracle.com
You should download Oracle Golden Gate (core) and Oracle Golden Gate for Big Data.
Remember about versions - I tested software described above.

All steps were performed on the user 'oracle'.

So, lets start:
user@dm:~$ su oracle

Oracle Golden Gate installation

Run installation of Oracle Golden Gate 12.1.2.0.0:
oracle@dm:~$ cd Disk1/
oracle@dm:~$ ./runInstaller

If this message appear:
Some requirement checks failed. You must fulfill these requirements before

continuing with the installation,

Continue? (y/n) [n]
Press 'y' and continue.

Then follow:

Installer should find Oracle Database installation directory, like below:
Manager port is by default 7809.

In next steps follow the installer.
When installation is complete, go to Oracle Golden Gate home directory (in my case: /u01/app/oracle/product/12/oggd/):
oracle@dm:~$ cd /u01/app/oracle/product/12/oggd/

and enter:
oracle@dm:~$ ./ggsci

you should see:
Oracle GoldenGate Command Interpreter for Oracle
Version 12.1.2.0.0 17185003 OGGCORE_12.1.2.0.0_PLATFORMS_130924.1316_FBO
Linux, x64, 64bit (optimized), Oracle 12c on Sep 25 2013 02:33:54
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2013, Oracle and/or its affiliates. All rights reserved.



GGSCI (dm) 1> 
where 'dm' is host name.

At this point we have first step done.
Next we have to install Oracle Golden Gate for Big Data.

Oracle Golden Gate for Big Data installation

In this step we have to unzip package with adapter.
In my case zip file name is 'V100447-01.zip' - unzip it, next untar 'ggs_Adapters_Linux_x64.tar' and copy all files to working directory (in my case /home/oracle/ogg/).

You should be able to run Oracle Golden Gate for Big Data:
oracle@dm:~$ cd /home/oracle/ogg
oracle@dm:~$ ./ggsci

and see:
Oracle GoldenGate Command Interpreter
Version 12.2.0.1.0 OGGCORE_12.2.0.1.0_PLATFORMS_151101.1925.2
Linux, x64, 64bit (optimized), Generic on Nov 10 2015 16:18:12
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.



GGSCI (dm) 1> 

Now installation is finished.


Configuration

Now it's time to configure our Oracle Golden Gate (OGG) and Oracle Golden Gate for Bid Data (OGGBD).

In OGG:

Run Oracle GoldenGate Command Interpreter:
oracle@dm:~$ ./ggsci
GGSCI (dm) 1> edit param mgr

and check or enter:
PORT 7809

then:
GGSCI (dm) 6> start mgr
Manager started.


GGSCI (dm) 7> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
Remember - always check if all yours extract are running by 'info all' command.

You can check manager status:
GGSCI (dm) 1> status mgr

Manager is running (IP port dm.7809, Process ID 11420).

In OGGBD:

Run Oracle GoldenGate Command Interpreter:
oracle@dm:~$ ./ggsci
GGSCI (dm) 1> edit param mgr

and enter:
PORT 7811

then:
GGSCI (dm) 6> start mgr
Manager started.


GGSCI (dm) 7> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           

You can check manager status:
GGSCI (dm) 1> status mgr

Manager is running (IP port dm.7811, Process ID 11528).


Extracting

It's time for main part of this tutorial - configure extracts in OGG and OGGBD.

In OGG:

1. Connect to Oracle Database (you can do this only in OGG)
GGSCI (dm) 1> dblogin userid ogg@db, password ogg
Successfully logged into database.


1. Edit param CUST
GGSCI (dm) 2> edit param cust

extract cust
SETENV (ORACLE_SID=DB)
userid ogg, password ogg
exttrail ./dirdat/se
table ogg.*;

2. Edit param PHDFS - host and port are very important here
GGSCI (dm) 3> edit param phdfs

extract phdfs
USERID ogg, PASSWORD ogg
rmthost 127.0.0.1, mgrport 7811 
rmttrail ./dirdat/tf
table ogg.*;

3. Create cust.def - file required for extract HDFS in OGGBD
GGSCI (dm) 4> edit param defgen

defsfile /u01/app/oracle/product/12/oggd/dirdef/cust.def
USERID ogg, PASSWORD ogg
table ogg.*;

GGSCI (dm) 2> exit

oracle@dm:~$  ./defgen paramfile ./dirprm/defgen.prm

***********************************************************************
        Oracle GoldenGate Table Definition Generator for Oracle
 Version 12.1.2.0.0 17185003 OGGCORE_12.1.2.0.0_PLATFORMS_130924.1316
   Linux, x64, 64bit (optimized), Oracle 12c on Sep 24 2013 16:23:06
 
Copyright (C) 1995, 2013, Oracle and/or its affiliates. All rights reserved.


                    Starting at 2016-03-08 15:02:01
***********************************************************************

Operating System Version:
Linux
Version #36~14.04.1-Ubuntu SMP Fri Feb 26 18:49:23 UTC 2016, Release 4.2.0-30-generic
Node: dm
Machine: x86_64
                         soft limit   hard limit
Address Space Size   :    unlimited    unlimited
Heap Size            :    unlimited    unlimited
File Size            :    unlimited    unlimited
CPU Time             :    unlimited    unlimited

Process id: 12921

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************
defsfile /u01/app/oracle/product/12/oggd/dirdef/cust.def
USERID ogg, PASSWORD ***
table ogg.*;
Expanding wildcard table specification ogg.*:

Retrieving definition for OGG.CUST_INFO.


Definitions generated for 1 table in /u01/app/oracle/product/12/oggd/dirdef/cust.def.

4. Copy file cust.def from directory '/u01/app/oracle/product/12/oggd/dirdef/' to '/home/oracle/ogg/dirdef/'

5. Add extracts CUST and PHDFS
GGSCI (dm) 1> add extract cust, tranlog, begin now
GGSCI (dm) 2> add exttrail ./dirdat/se, extract cust, megabytes 10
GGSCI (dm) 1> add extract phdfs, exttrailsource ./dirdat/se 
GGSCI (dm) 1> add rmttrail ./dirdat/tf, EXTRACT phdfs, megabytes 10

6. Start extracts CUST and PHDFS and check status
GGSCI (dm) 2> start cust

Sending START request to MANAGER ...
EXTRACT CUST starting


GGSCI (dm) 3> start phdfs

Sending START request to MANAGER ...
EXTRACT PHDFS starting


GGSCI (dm) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     CUST        01:54:46      00:00:07    
EXTRACT     RUNNING     PHDFS       00:00:00      01:52:52    

7. If something goes wrong check report:
GGSCI (dm) 6> view report phdfs


In OGGBD:

1. Edit param HDFS - remember about file cust.def from previous step
GGSCI (dm) 1> edit param hdfs

extract hdfs
discardfile ./dirrpt/hdfs.dsc, purge
sourcedefs ./dirdef/cust.def
SETENV (GGS_JAVAUSEREXIT_CONF = "dirprm/hdfs.props")
CUSEREXIT libggjava_ue.so CUSEREXIT PASSTHRU, INCLUDEUPDATEBEFORES, PARAMS "dirprm/hdfs.props"
GETUPDATEBEFORES

table ogg.*;

2. Create configuration file for extract HDFS - hdfs.props (I used the example file prepared by Oracle - /home/oracle/ogg/AdapterExamples/big-data/hdfs/hdfs.props)

oracle@dm:~/ogg/dirprm$ cat hdfs.props 

gg.handlerlist=hdfs

gg.handler.hdfs.type=hdfs
gg.handler.hdfs.includeTokens=false
gg.handler.hdfs.maxFileSize=1g
gg.handler.hdfs.rootFilePath=/ogg1
gg.handler.hdfs.fileRollInterval=0
gg.handler.hdfs.inactivityRollInterval=0
gg.handler.hdfs.fileSuffix=.txt
gg.handler.hdfs.partitionByTable=true
gg.handler.hdfs.rollOnMetadataChange=true
gg.handler.hdfs.authType=none
gg.handler.hdfs.format=delimitedtext
gg.handler.hdfs.format.includeColumnNames=true

gg.handler.hdfs.mode=tx

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

gg.classpath=/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/etc/hadoop/:

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar
You have to change gg.classpath variable - use your directory structure.

3. Add extract HDFS
GGSCI (dm) 2> add extract hdfs, exttrailsource ./dirdat/tf

4. Start extract HDFS and check status
GGSCI (dm) 3> start hdfs

Sending START request to MANAGER ...
EXTRACT HDFS starting


GGSCI (dm) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     HDFS        00:00:00      00:00:01 

5. If something goes wrong check report (in this step I had problem with LD_LIBRARY_PATH variable - look for details in Additional configuration paragraph)
GGSCI (dm) 5> view report hdfs


6. You are done! Everything should work now!


Testing

1. Insert/update/delete row to table (in my case 'OGG.CUST_INFO'), i.e.:
SQL> insert into  ogg.cust_info(id, name, city, phone) values(1, 'Dariusz Mach', 'Warsaw', '+48 22 690 87 00');

2. Check stats in OGG and OGGBD on all extract (you should see only one insert - I performed others operations before).

GGSCI (dm) 1> stats hdfs

Sending STATS request to EXTRACT HDFS ...

Start of Statistics at 2016-03-08 16:07:12.

Output by User Exit:

Extracting from OGG.CUST_INFO to OGG.CUST_INFO:

*** Total statistics since 2016-03-08 16:06:30 ***
 Total inserts                               1.00
 Total updates                               0.00
 Total deletes                              24.00
 Total discards                              0.00
 Total operations                           25.00

*** Daily statistics since 2016-03-08 16:06:30 ***
 Total inserts                               1.00
 Total updates                               0.00
 Total deletes                              24.00
 Total discards                              0.00
 Total operations                           25.00

*** Hourly statistics since 2016-03-08 16:06:30 ***
 Total inserts                               1.00
 Total updates                               0.00
 Total deletes                              24.00
 Total discards                              0.00
 Total operations                           25.00

*** Latest statistics since 2016-03-08 16:06:30 ***
 Total inserts                               1.00
 Total updates                               0.00
 Total deletes                              24.00
 Total discards                              0.00
 Total operations                           25.00

End of Statistics.

All extracts should have similar output after running 'stats <extract name>command.

3. Time to check our HDFS space. In configuration I chose /ogg1 directory for HDFS extract output.

oracle@dm:~$ hadoop fs -ls /ogg1
Found 1 items
drwxr-xr-x   - oracle supergroup          0 2016-03-08 16:06 /ogg1/ogg.cust_info
oracle@dm:~$ hadoop fs -ls /ogg1/ogg.cust_info
Found 1 items
oracle@dm:~$ hadoop fs -cat /ogg1/ogg.cust_info/ogg.custogg.cust_info_2016-03-08_16-06-31.194.txt
I OGG.CUST_INFO 2016-03-08 15:06:25.000133 2016-03-08T16:06:32.249014 00000000040000003982  NAME Dariusz Mach GENDER NULL CITY Warsaw PHONE +48 22 690 87 00 ID 1 ZIP NULL CUST_DATE NULL

4. You are done! Extract working!


Additional configuration:

I had a lot of problems with missing libraries, permissions and others, that's why below you will find some config stuff - it should be helpful ;)

1. Environment variable

.bashrc - edit it by (you can use your favorite text editor: vi, nano, etc.):
oracle@dm:~$ gedit ~/.bashrc

On the bottom of file you can paste it and edit with your directory structure:
# Oracle Settings
ORACLE_HOSTNAME=127.0.0.1 
export ORACLE_HOSTNAME
ORACLE_BASE=/u01/app/oracle 
export ORACLE_BASE
ORACLE_HOME=$ORACLE_BASE/product/12/dbhome_1 
export ORACLE_HOME
ORACLE_SID=DB 
export ORACLE_SID
PATH=$PATH:/usr/sbin 
export PATH
PATH=$PATH:$ORACLE_HOME/bin 
export PATH
LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib:/usr/lib64:/opt/java/jdk1.7.0_79/jre/lib/amd64/server/; 
export LD_LIBRARY_PATH
CLASSPATH=$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib; 
export CLASSPATH
##########
HDFS_ROOT_DIRECTORY=/usr/local/hadoop
export HDFS_ROOT_DIRECTORY
GG_ROOT_DIRECTORY=/u01/app/oracle/product/12/oggd
export GG_ROOT_DIRECTORY



#HADOOP VARIABLES START
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$HADOOP_INSTALL/bin:$PATH
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
#HADOOP VARIABLES END

Be careful with LD_LIBRARY_PATH variable - OGGBD will need it - part with java is required.
Java contain two libraries required by libjjjava_ue.so:
  • libjvm.so
  • libjsig.so
You can find it using i.e.:

oracle@dm:~$ find -name libjsig.so

Please do not choose these from Oracle Database directory - for sure you will get Unsupported major.minor version 51.0 error

If this variable is set properly, you should see something like that:
oracle@dm:~$  ldd libggjava_ue.so 
 linux-vdso.so.1 =>  (0x00007ffedc7c1000)
 libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd07abcc000)
 libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd07a9ad000)
 libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd07a6a7000)
 libjsig.so => /opt/java/jdk1.7.0_79/jre/lib/amd64/server/libjsig.so (0x00007fd07a4a4000)
 libjvm.so => /opt/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so (0x00007fd079629000)
 libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fd079325000)
 libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd07910f000)
 libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd078d49000)
 /lib64/ld-linux-x86-64.so.2 (0x00005563ec5ce000)

2. HDFS privileges 

I had also problem with user privileges to write into HDFS space.
You can try to add missing privileges to user or change owner of directory, using chmod or chown commands, similarly to Ubuntu terminal commands, for example:

oracle@dm:~$ hadoop fs -mkdir /dm 
oracle@dm:~$ hadoop fs -chown oracle:oinstall /dm
oracle@dm:~$ hadoop fs -chmod 754 /dm
oracle@dm:~$ hadoop fs -ls /
Found 5 items
drwxr-xr--   - oracle oinstall            0 2016-03-08 17:04 /dm





Summary