Getting started with AWS Athena – Part 2

Reading Time: 2 minutes

In the previous blog, I talked about how to get going with Athena as a service. In this post, we will create two types of tables:

  • A table on existing text data in an S3 bucket
  • A table on an S3 bucket containing JSON-formatted data

Looking at the structure, AWS is effectively exposing S3 storage like HDFS external storage via HiveServer2 and the Hive metastore. So, suppose you have a table in Hadoop and need to migrate the data to S3, or you are landing the data in S3 directly; features like Hive SerDes can transform your dataset on the fly when it is accessed from the S3 bucket. Interesting feature. Now let's see it in action.

I assume you already have some data in an S3 bucket, or have copied data from a source into one. I might do another post explaining how to extend your HDFS cluster with S3.
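If you still need to land data, the AWS CLI is the quickest route. A minimal sketch (the local path below is a hypothetical placeholder; the bucket prefix matches the table location used later in this post):

# Copy a local directory of CSV files into the S3 prefix the table will point at
aws s3 cp /data/kinesis_stream/text/ s3://myS3bucket/kinesis_stream/text/ --recursive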

The first table I will create is based on a comma-separated text file:

DDL for Text Table

CREATE EXTERNAL TABLE IF NOT EXISTS default.kinesis_stream_test (
 `trans_id` string,
 `trans_num` string,
 `source_ip` string,
 `kinesis_stream_event` string 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
 'serialization.format' = ',',
 'field.delim' = ','
) LOCATION 's3://myS3bucket/kinesis_stream/text/';

The interesting thing to notice here is that I am not using the “create table” wizard; I can simply generate my DDL and create an external table on top of my data. All I need is my S3 endpoint and my table is ready. I used a similar method for both the text dataset and the JSON-formatted table.

(Screenshot: athena-text-table)

Sample dataset staged in the S3 bucket:

ff6d3ee7-cd2b-4d62-8a5c-3e50ff96f120,201701210601,ip-172-31-36-113,AWS Kinesis stream test # 2017012106011484980006
92abd107-333b-4749-8b4c-64fa78a6f8d3,201701210601,ip-172-31-36-113,AWS Kinesis stream test # 2017012106011484980006
a67deb57-36c6-48cd-99b3-0cb89814c1bf,201701210601,ip-172-31-36-113,AWS Kinesis stream test # 2017012106011484980006
96dafd35-b8d1-4334-bb01-73796e823be5,201701210601,ip-172-31-36-113,AWS Kinesis stream test # 2017012106011484980006
2eec7e53-e696-4a93-a541-1f3f2757e804,201701210601,ip-172-31-36-113,AWS Kinesis stream test # 2017012106011484980006
b16e4fa4-8971-4dd9-9701-20a60f603618,201701210601,ip-172-31-36-113,AWS Kinesis stream test # 2017012106011484980006
1e8be437-66d8-4262-9409-5638f7305c33,201701210601,ip-172-31-36-113,AWS Kinesis stream test # 2017012106011484980006
b7345fde-73e5-4fb2-945d-7189af290ea9,201701210601,ip-172-31-36-113,AWS Kinesis stream test # 2017012106011484980006
ac324869-aa5f-4c7b-865d-210ec79a14d8,201701210601,ip-172-31-36-113,AWS Kinesis stream test # 2017012106011484980006
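With the table defined and rows like the above sitting in the bucket, a quick sanity-check query (a hedged example; any SELECT will do) confirms Athena parses the delimited data:

-- Preview a few parsed rows from the text-backed table
SELECT trans_id, source_ip, kinesis_stream_event
FROM default.kinesis_stream_test
LIMIT 10;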

DDL for JSON Table

CREATE EXTERNAL TABLE IF NOT EXISTS default.kinesis_stock_stream (
 `change` double,
 `price` double,
 `sector` string,
 `ticker_symbol` string 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
 'serialization.format' = '1'
) LOCATION 's3://aeg-awsfirehose/2017/01/';
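The JSON-backed table can be queried the same way. A hedged example, using only the four columns defined above:

-- Average price and net change per ticker from the JSON stream
SELECT ticker_symbol, sector,
       avg(price)  AS avg_price,
       sum(change) AS net_change
FROM default.kinesis_stock_stream
GROUP BY ticker_symbol, sector
ORDER BY avg_price DESC
LIMIT 20;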

Getting Started with AWS Athena – Part 1

Reading Time: 3 minutes

At the last AWS re:Invent, AWS announced a new service called “Athena” (named after the Greek goddess of reason). There has been a lot of fuss about the AWS S3 service, and as I am getting more and more comfortable with the AWS platform, I thought I would put Athena to the test.

In this five-part series, we will do a deep dive into how the service works out of the box and with some customizations in the mix.

So, let's get our hands dirty:

  • Log in to your AWS console
  • Search for the Athena service
  • Once on the landing page, you get the option to create an ELB table
  • You will land on the tutorial page; let's use that for our initial table creation.

  • The wizard will ask you to create a new table

  • To keep the structure simple, I created a new metadata database for all my Athena tables:
    • DB Name = my_athena_db
    • Table name = my_first_athena_table
    • S3 example location = s3://athena-examples-us-east-2/elb/plaintext

  • Next, since we are going to mine web logs, AWS provides a RegEx for us

  • Next, pre-populate the columns using the prebuilt method

  • Verify and validate that everything looks good

  • Execute “Run Query”

So, I created an external table over the S3 bucket; the DDL is below:

CREATE EXTERNAL TABLE IF NOT EXISTS my_athena_db.my_first_athena_table (
  `request_timestamp` string,
  `elb_name` string,
  `request_ip` string,
  `request_port` int,
  `backend_ip` string,
  `backend_port` int,
  `request_processing_time` double,
  `backend_processing_time` double,
  `client_response_time` double,
  `elb_response_code` string,
  `backend_response_code` string,
  `received_bytes` bigint,
  `sent_bytes` bigint,
  `request_verb` string,
  `url` string,
  `protocol` string,
  `user_agent` string,
  `ssl_cipher` string,
  `ssl_protocol` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1',
  'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) ([-0-9]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*)$'
) LOCATION 's3://athena-examples-us-west-2/elb/plaintext/';

Notice that the format above looks quite similar to a Hive SerDe table definition.

Now that I have my table created, let's take a look at what we can do with it:

select count(*) from my_athena_db.my_first_athena_table;

Pretty cool! I just queried data sitting in an S3 bucket without standing up any database or Hadoop cluster. I don't think performance will be stellar with Athena, but depending on the use case this service might be a great, cost-effective fit.
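For example, a slightly more useful query than a raw count (hedged, but it only uses columns from the DDL above) breaks the traffic down by ELB response code:

SELECT elb_response_code, count(*) AS requests
FROM my_athena_db.my_first_athena_table
GROUP BY elb_response_code
ORDER BY requests DESC;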

Now that we are on the subject of cost: with Athena, you only pay for the queries you execute. There is no separate compute or storage charge from Athena itself (standard S3 storage costs still apply).

Another thing I noticed is that the Athena front-end/UI looks remarkably similar to Cloudera HUE, but I can't find any documentation referencing that.

But as an offering, I think Athena will do great! Comments or feedback are welcome.

Note: Athena is currently only available in the Northern Virginia, Ohio, and Oregon regions.

Resuming blogging!

Reading Time: < 1 minute

Back in Action!

It has been a while since I posted anything; I have been busy learning new things. I will share my journey in a multi-part series.

I will first start with how the tech industry is changing and how individuals need to evolve and learn new skill sets.

My days have typically been very tedious, running at 100 MPH trying to get things done. Sometimes I feel exhausted and want to just give up, but my love for technology and solving problems always pulls me back.

Last summer, I completed my AWS Solutions Architect certification, and I am hoping to build on that Amazon skill set. I am amazed by the AWS platform and enjoy learning new things every day.

I will start posting what I have learned and the challenges I have gone through. I would also like to start a two-way conversation and understand what readers would like me to focus on a bit more.

Hope to see you soon!

Thanks,

Big data SQL in action

Reading Time: 2 minutes

Recently I have been engaged in implementing the Oracle Big Data SQL connector for a customer we are helping.

Here is a preview of the Big Data SQL connector, which benefits from both Exadata smart scans and Hadoop's massive parallelization. I will publish an article in the future with the steps to implement Big Data SQL with Exadata and the BDA appliance.

BDSQL> select /*+ MONITOR */ /* TESTAHK_YR */ count(*) FROM ORA_FLIGHTS group by YEAR;

COUNT(*)
———-
5411843
5967780
5683047
5270893
5327435
7129270
5180048
5271359
5076925
22
7140596
5070501
7141922
5527884
5384721
1311826
5351983
7453215
5041200
5202096
6488540
5092157
7009728
Elapsed: 00:00:15.34

Execution Plan
———————————————————-
Plan hash value: 3679660899

————————————————————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
————————————————————————————————————————————–
| 0 | SELECT STATEMENT | | 22 | 88 | 204K (2)| 00:00:08 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 22 | 88 | 204K (2)| 00:00:08 | Q1,01 | P->S | QC (RAND) |
| 3 | HASH GROUP BY | | 22 | 88 | 204K (2)| 00:00:08 | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | 22 | 88 | 204K (2)| 00:00:08 | Q1,01 | PCWP | |
| 5 | PX SEND HASH | :TQ10000 | 22 | 88 | 204K (2)| 00:00:08 | Q1,00 | P->P | HASH |
| 6 | HASH GROUP BY | | 22 | 88 | 204K (2)| 00:00:08 | Q1,00 | PCWP | |
| 7 | PX BLOCK ITERATOR | | 123M| 471M| 202K (1)| 00:00:08 | Q1,00 | PCWC | |
| 8 | EXTERNAL TABLE ACCESS STORAGE FULL| ORA_FLIGHTS | 123M| 471M| 202K (1)| 00:00:08 | Q1,00 | PCWP | |
————————————————————————————————————————————–

Note
—–
– Degree of Parallelism is 2 because of table property
Statistics
———————————————————-
293 recursive calls
100 db block gets
302 consistent gets
8 physical reads
0 redo size
995 bytes sent via SQL*Net to client
563 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
23 rows processed

BDSQL>


Oracle RAT Step by Step - Part 3

Reading Time: 3 minutes

Replay workload on target database – Part 2

 

Prepare Capture data

Copy Capture data

Copy the capture files from the source system (production) to the replay directory created earlier.

 

Preprocessing Capture data

 

 

BEGIN

dbms_workload_replay.process_capture ( capture_dir => '<REPLAY_DIR_NAME_WHERE_CAPTURE_FILES_BEEN_STAGED>' );

END;

/

 

Example:

BEGIN

dbms_workload_replay.process_capture ( capture_dir => 'REPLAY_TGT_1' );

END;

/

 

-- Monitor time to process data --

 

The statements below give the estimated time remaining and the size (in GB) of the data set processed so far. They are informational only.

 

 

SELECT dbms_workload_replay.process_capture_remaining_time FROM dual;

SELECT dbms_workload_replay.process_capture_completion FROM dual;

select sum(bytes_done/1024/1024/1024) as GB from wrr$_processing_progress;

 

 

Example output:

 

SQL> SELECT dbms_workload_replay.process_capture_remaining_time FROM dual;

 

PROCESS_CAPTURE_REMAINING_TIME

——————————

5.55503472

 

 

SQL> SELECT dbms_workload_replay.process_capture_completion FROM dual;

 

PROCESS_CAPTURE_COMPLETION

————————–

4.47761194

 

In Oracle 11.2.0.3, the PROCESS_CAPTURE procedure creates a new subdirectory called pp11.2.0.3.0 in the replay directory, containing the following files:

  • wcr_calibrate.html
  • wcr_commits.extb
  • wcr_conn_data.extb
  • wcr_data.extb
  • wcr_dep_graph.extb
  • wcr_login.pp
  • wcr_process.wmd
  • wcr_references.extb
  • wcr_scn_order.extb
  • wcr_seq_data.extb

 

 

Workload Analyzer

 

This is a Java program that analyzes a workload capture directory:

java -classpath $ORACLE_HOME/jdbc/lib/ojdbc5.jar:$ORACLE_HOME/rdbms/jlib/dbrparser.jar:$ORACLE_HOME/rdbms/jlib/dbranalyzer.jar: oracle.dbreplay.workload.checker.CaptureChecker <OS location of Capture Staged files> jdbc:oracle:thin:@<Hostname>:<Listener Port>:<Service_name/SID>

 

 

Example:

java -classpath $ORACLE_HOME/jdbc/lib/ojdbc5.jar:$ORACLE_HOME/rdbms/jlib/dbrparser.jar:$ORACLE_HOME/rdbms/jlib/dbranalyzer.jar: oracle.dbreplay.workload.checker.CaptureChecker /export/home/oracle/rat/target_db jdbc:oracle:thin:@hostname1-vip:1521:TARGET

 

-- Results are stored in the capture directory in the following files:

<REP_DIR>/wcr_cap_analysis.html

<REP_DIR>/wcr_cap_analysis.xml

-- END --

Initialize Replay

 

 

-- Get capture name

 

set lines 150

col NAME for a45

select id, NAME, DBNAME, DBVERSION, STATUS, to_char(START_TIME, 'MM/DD/YYYY HH24:MI:SS') as START_TIME , to_char(END_TIME, 'MM/DD/YYYY HH24:MI:SS') as END_TIME from DBA_WORKLOAD_CAPTURES

;

BEGIN

dbms_workload_replay.initialize_replay ( replay_name => '<REPLAY_NAME>',

replay_dir => '<REPLAY_DIRECTORY_NAME>' );

END;

/

 

Example:

BEGIN

dbms_workload_replay.initialize_replay ( replay_name => 'REPLAY_QECMTRP_1',

replay_dir => 'REPLAY_TGT_1' );

END;

/

 

-- Known to populate the following base tables:

WRR$_REPLAYS

WRR$_CONNECTION_MAP

-- END --

Generate Re-Map Connection

 

-- Get connection info

set pages 100

set lines 150

col CAPTURE_CONN for a50

col REPLAY_CONN for a50

spool save_pre_remap.out

select * from dba_workload_connection_map;

spool off

 

-- The SQL below generates the re-map connection statements, which must be executed so that replay connections point to the replay host --

Warning: This is an important step, and verification must be done before proceeding to the next step.

 

 

 

set lines 189

set echo off

set head off

set term off

set long 200

set feedback off

spool /tmp/remap_replay_connection.sql

SELECT 'EXEC dbms_workload_replay.remap_connection (connection_id=>'||a.conn_id||',replay_connection => ''(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP) (HOST=<TARGET DB LISTENER HOST>)(PORT=<LISTENER-PORT>)) (CONNECT_DATA=(SID=<DB SID/SERVICE NAME>)))'');'

FROM dba_workload_connection_map a, dba_workload_replays b WHERE a.replay_id = b.id AND b.status ='INITIALIZED' ORDER BY a.conn_id;
spool off

 

 

Example:

 

 

sqlplus -s

set lines 250

set echo off

set head off

set term off

set long 2000

set feedback off

spool /tmp/remap_replay_connection.sql

SELECT 'EXEC dbms_workload_replay.remap_connection (connection_id=>'||a.conn_id||',replay_connection => ''(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP) (HOST=Hostname1-vip)(PORT=1521)) (CONNECT_DATA=(SID=TARGET1)))'');'

FROM dba_workload_connection_map a, dba_workload_replays b WHERE a.replay_id = b.id AND b.status ='INITIALIZED' ORDER BY a.conn_id;

spool off

exit;

 

 

 

Execute Re-Map connection script

In this step, execute the script that was generated above and verify the results.

Execute remap script

SQL> @/tmp/remap_replay_connection.sql

 

Validate Replay Connections

  • -- Save post-remap connection info

 

set pages 100

set lines 150

col CAPTURE_CONN for a50

col REPLAY_CONN for a50

spool ${ORACLE_SID}_post_remap_connection.out

select * from dba_workload_connection_map;

spool off

 

  • -- Get count of null connections

 

select count(*) from dba_workload_connection_map where REPLAY_CONN is null;

 

-- If the count is > 0, generate a new connection re-map script and run it again

 

 

  • -- Get count by connection name and status

 

set pages 100

set lines 180

col CAPTURE_CONN for a50

col REPLAY_CONN for a50

SELECT count(*) , a.REPLAY_CONN, a.REPLAY_ID , b.NAME, b.STATUs

FROM dba_workload_connection_map a, dba_workload_replays b WHERE a.replay_id = b.id group by a.REPLAY_CONN, a.REPLAY_ID , b.NAME,b.STATUs;

 

set pages 100

set lines 150

col CAPTURE_CONN for a50

col REPLAY_CONN for a50

select count(*) , REPLAY_CONN, REPLAY_ID from dba_workload_connection_map group by REPLAY_CONN, REPLAY_ID ;

Oracle RAT Step by Step - Part 3

Reading Time: 2 minutes

Replay workload on target database – Part 1

Restore Target database

 

RMAN or export/import can be used to stage the target/replay database. To restore a database using RMAN, please refer to the Oracle RMAN documentation.
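As a rough illustration only (not the exact commands used for this engagement), a backup-based RMAN duplicate of the source into the replay host might look like the sketch below; connect strings, file name conversion, and channel settings all depend on your environment:

# hypothetical sketch -- adjust connect strings and add DB_FILE_NAME_CONVERT/LOG_FILE_NAME_CONVERT as needed
$ rman TARGET sys@PRODDB AUXILIARY sys@TARGETDB

RMAN> RUN {
  ALLOCATE AUXILIARY CHANNEL aux1 DEVICE TYPE DISK;
  DUPLICATE TARGET DATABASE TO TARGETDB NOFILENAMECHECK;
}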

Drop Db_links

 

Drop db_links that may be pointing to external systems, or re-point them to test systems.

set lines 150

col db_link for a40

select OWNER,DB_LINK,USERNAME,to_char(CREATED, 'MM/DD/YYYY HH24:MI:SS') as CREATED from dba_db_links;

 

-- Generate drop statements for public database links

sqlplus -s / as sysdba <<EOF

set feedback 0 pages 0 lines 132 echo on timing off

set echo off

set term off

spool /tmp/drop_db_links.sql

 

select 'DROP '||OWNER||' DATABASE LINK '||DB_LINK||' ; ' from dba_db_links where OWNER='PUBLIC';

 

spool off

EOF

 

Note: If db_links can't be dropped using “drop database link”, use the statements below.

 

Check the sys db_link tables:

select * from sys.link$;

select db_link, username, host from user_db_links;

 

Delete records manually:

delete from sys.link$;

commit;

 

Gather Schema Stats

 

BEGIN

DBMS_STATS.gather_schema_stats (ownname => 'SCHEMA1' ,

cascade =>true,

estimate_percent => dbms_stats.auto_sample_size,

method_opt => 'FOR ALL COLUMNS SIZE AUTO',

degree => 8);

END;

/

 

 

BEGIN

DBMS_STATS.gather_schema_stats (ownname => 'SCHEMA2' ,

cascade =>true,

estimate_percent => dbms_stats.auto_sample_size,

method_opt => 'FOR ALL COLUMNS SIZE AUTO',

degree => 8);

END;

/

 

 

 

 

BEGIN

DBMS_STATS.gather_schema_stats (ownname => 'SCHEMA3',

cascade =>true,

estimate_percent => dbms_stats.auto_sample_size,

method_opt => 'FOR ALL COLUMNS SIZE AUTO',

degree => 8);

END;

/

 

 

Create Flashback Restore Point

 

To replay the capture data multiple times, Oracle Database Flashback allows us to restore the database to a specific SCN without having to use an RMAN restore.

-- Check for existing restore points

 

set linesize 121

col name format a15

col time format a32

 

SELECT name, scn, time, database_incarnation#, guarantee_flashback_database, storage_size

FROM gv$restore_point;

-- CREATE GUARANTEED RESTORE POINT

 

 

CREATE RESTORE POINT <RESTORE POINT NAME> GUARANTEE FLASHBACK DATABASE;

 

Example:

 

CREATE RESTORE POINT PRE_REPLAY_RAT GUARANTEE FLASHBACK DATABASE;
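Once a replay run is finished, this guaranteed restore point is what lets you rewind and replay again. A minimal sketch of the rewind, assuming the restore point name used in the example above:

-- Flash the replay database back to the restore point between replay runs
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
FLASHBACK DATABASE TO RESTORE POINT PRE_REPLAY_RAT;
ALTER DATABASE OPEN RESETLOGS;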

 

 

 

 

 

 

 

Create Directory

 

-- Check the directory

 

set lines 1024;

col DIRECTORY_PATH for a90

 

select * from dba_directories;

 

-- CREATE RAT REPLAY DIRECTORY

 

 

create or replace directory <DIRECTORY NAME> as '<DIRECTORY LOCATION>';

 

Example: create or replace directory REPLAY_TGT_1 as '/export/home/oracle/rat/target_db';

 

 

Oracle RAT Step by Step - Part 2

Reading Time: 3 minutes

Oracle RAT – Step by Step – Part 2

 

Prepare the source database for capture.

 

Steps:

 

Create Directory

 

-- Check for existing directories

 

set lines 1024;
col DIRECTORY_PATH for a90

 

select * from dba_directories;

 

exit;

 

In this step, a directory is created to store the capture transactions that will be written while the capture is running.

 

 

 

create or replace directory <CAPTURE_DIRECTORY> as '<Directory_Filesystem_Location>';

 

 

Example:

 

create or replace directory CAPTURE_DIRECTORY as '/backup/oracle/CAPTURE';

Current Time and SCN #

 

ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH:MI:SS';

select sysdate from dual;

 

Example:

 

SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH:MI:SS';

 

Session altered.

 

SQL> select sysdate from dual;

 

SYSDATE

——————-

2014-03-25 12:44:12

 

SQL> SELECT name, TO_CHAR(CURRENT_SCN) FROM V$DATABASE;

 

NAME     TO_CHAR(CURRENT_SCN)

——— —————————————-

RPSTST   10863666225560

 

Capture SCN and start time can also be gathered after the capture workload report is created.

Start Capture

 

Backup

 

If there is an existing level 0 backup from before the capture starts, we only need to back up the archive logs. Otherwise, a level 0 backup will need to be taken.

Note: If a level 0 backup has not been taken, take one before the capture process starts to eliminate RMAN backup noise; otherwise the capture will track RMAN activity.

 

 

connect target;
connect catalog catalog_user/catalog_password@<Catalog_Database>

run {

backup archivelog all not backed up delete all input;

}

 

Example:

connect target;

connect catalog rcat_user/<password>@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=hostname1)(PORT=1521)))(CONNECT_DATA=(SID=catalog_database_sid)(SERVER=DEDICATED)))

run {

backup archivelog all not backed up delete all input;

}

Capture

 

 

EXEC DBMS_WORKLOAD_CAPTURE.START_CAPTURE(name=>'<CAPTURE_NAME>',dir=>'<CAPTURE_DIRECTORY>', duration=><TIME_IN_SECS>);

Example:

 

EXEC DBMS_WORKLOAD_CAPTURE.START_CAPTURE(name=>'CAPTURE_RPSTST_1',dir=>'CAPTURE_DIRECTORY', duration=>10800);
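The duration parameter ends the capture automatically; if you need to stop it early, FINISH_CAPTURE can be called instead. A hedged example:

-- Stop the capture before the duration elapses
EXEC DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE();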

 

 

 

Monitor Capture

 

Monitor Capture progress 

 

-- Time remaining for capture

select dbms_workload_replay.process_capture_remaining_time from dual;

 

 

-- Capture ID

 

SELECT DBMS_WORKLOAD_CAPTURE.get_capture_info('CAPTURE_DIRECTORY') FROM dual;

 

COLUMN name FORMAT A30

SELECT id, name FROM dba_workload_captures;

 

 

-- Get the status of captures that are not complete:

 

select id, NAME, DBNAME, DBVERSION, STATUS, START_TIME from DBA_WORKLOAD_CAPTURES

where STATUS <> 'COMPLETED';

 

Get the status of all captures:

 

Set lines 150

select NAME, DBNAME, DBVERSION, STATUS, to_char(START_TIME,'YYYY/MM/DD HH24:MI:SS') from DBA_WORKLOAD_CAPTURES

;

 

Connection count:

For RAC:

select count(*) , S.status, S.INST_ID, G.INSTANCE_NAME from gV$SESSION S, gV$instance G where S.INST_ID=G.INST_ID   group by S.status, S.INST_ID, G.INSTANCE_NAME;

For single instance:

select count(*) , status, INST_ID from gV$SESSION group by status, INST_ID;

 

select dbms_workload_replay.process_capture_remaining_time from dual;

Capture Reports

 

Generate HTML Capture report

-- Get Capture ID

 

select id, name from dba_workload_captures where name='<CAPTURE_NAME>';

 

Example:

SQL> select id, name from dba_workload_captures where name='CAPTURE_RPSTST_1';

 

ID NAME

———- ——————————

52 CAPTURE_RPSTST_1

 

 

 

DECLARE

l_report   CLOB;

BEGIN

l_report := DBMS_WORKLOAD_CAPTURE.report(capture_id =><CAPTURE_ID>,

format     => DBMS_WORKLOAD_CAPTURE.TYPE_HTML);

END;

/

 

 

Example:

DECLARE

l_report   CLOB;

BEGIN

l_report := DBMS_WORKLOAD_CAPTURE.report(capture_id => 52,

format     => DBMS_WORKLOAD_CAPTURE.TYPE_HTML);

END;

/

 

 

The step above will generate an HTML report for the capture, from start to end, in the “cap” subdirectory:

 

<CAPTURE_DIR> → cap → wcr_cr.html

 

Sample wcr_capture report for RPSTST

Export AWR snapshot

-- Get Capture ID

 

select id, name from dba_workload_captures where name='<CAPTURE_NAME>';

 

Example:

SQL> select id, name from dba_workload_captures where name='CAPTURE_RPSTST_1';

 

ID NAME

———- ——————————

52 CAPTURE_RPSTST_1

 

 

-- Export AWR data --

 

BEGIN

DBMS_WORKLOAD_CAPTURE.export_awr (capture_id => <CAPTURE_ID>);

END;

/

 

Example:

BEGIN

DBMS_WORKLOAD_CAPTURE.export_awr (capture_id =>52);

END;

/

 

 

The step above will generate an AWR dump file for the capture in the “cap” subdirectory:

 

  • wcr_ca.dmp – Data Pump export
  • wcr_ca.log – Data Pump log file
  • wcr_cap_uc_graph.extb – User calls graph

 

 

 

##—————————————————————–##

##               End of Capture                                                        ##

##—————————————————————–##

Oracle RAT Step by Step

Reading Time: < 1 minute

Oracle RAT – Step by Step – Part 1

 

I am currently working on a gig where Oracle RAT (Real Application Testing) will be the stress-test tool used to simulate production load. Oracle RAT is a multi-step process, and it consists of the following.

Here is a breakdown of Oracle RAT from my experience:

  • Identify system workload – Source
  • Baseline backup of Source environment
  • Staging capture directory and making sure there is plenty of space
  • Starting Capture process and monitoring capture/database while it is running
  • Generate and export capture workload
  • Identify target system
  • Moving capture transactions file to target system
  • Processing capture data
  • Re-Map target connections for replay
  • Start Replay and monitor database load – Active Session History and SQL monitoring reports can be used
  • Generate reports and analysis of replay

SparkR config with Hadoop 2.0

Reading Time: 2 minutes

I am engaged on a gig where SparkR will be used to run R jobs, and currently I am working on the configuration. Once I have troubleshot all the issues, I will post the steps to get the Spark cluster working.

This is a follow-up to my initial frustration with SparkR. I was pretty close to giving up on figuring out why SparkR would not work on the cluster I was working on. After much follow-up with Shivaram (the SparkR package author), we were finally able to get SparkR working as a cluster job.

SparkR can be downloaded from https://github.com/amplab-extras/SparkR-pkg

SparkR configuration

Install R

The instructions below are for Ubuntu:

ALPHA root@host:~$ nano /etc/apt/sources.list
ALPHA root@host:~$ apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
ALPHA root@host:~$ add-apt-repository ppa:marutter/rdev
ALPHA root@host:~$ apt-get install python-software-properties
ALPHA root@host:~$ add-apt-repository ppa:marutter/rdev
ALPHA root@host:~$ apt-get update
ALPHA root@host:~$ apt-get install r-base

Conf Java for R

wget http://cran.cnr.berkeley.edu/src/contrib/rJava_0.9-6.tar.gz

sudo R CMD INSTALL rJava_0.9-6.tar.gz

Modify spark-env.sh
#!/usr/bin/env bash

export STANDALONE_SPARK_MASTER_HOST=hostname.domain.com

export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST

export SPARK_LOCAL_IP=xxx.xxx.xxx.xxx

### Let's run everything with the JVM runtime, instead of Scala
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_PORT=7078
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_DIR=/var/run/spark/work
export SPARK_LOG_DIR=/var/log/spark

if [ -n "$HADOOP_HOME" ]; then
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

### Comment above 2 lines and uncomment the following if
### you want to run with scala version, that is included with the package
#export SCALA_HOME=${SCALA_HOME:-/usr/lib/spark/scala}
#export PATH=$PATH:$SCALA_HOME/bin

Note: This will need to be done on the worker nodes as well.
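A quick way to push the same spark-env.sh to every worker (a hedged sketch; it assumes passwordless SSH as root and the stock /etc/spark/conf path, so adjust the host list and path for your cluster):

# Distribute spark-env.sh to each worker node
for host in worker1 worker2 worker3; do
  scp /etc/spark/conf/spark-env.sh root@${host}:/etc/spark/conf/spark-env.sh
done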

Switch user to HDFS
su - hdfs
Git Clone

git clone https://github.com/amplab-extras/SparkR-pkg

Building SparkR

SPARK_HADOOP_VERSION=2.2.0-cdh5.0.0-beta-2 ./install-dev.sh

Copy SparkR-pkg to worker nodes

Example: scp -r SparkR-pkg hdfs@worker1:

Execute Test Job

cd SparkR-pkg/

export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark

source /etc/spark/conf/spark-env.sh

./sparkR examples/pi.R spark://hostname.domain.com:7077

Sample results
hdfs@xxxx:~/SparkR-pkg$ ./sparkR examples/pi.R spark://xxxx.xxxxx.com:7077
./sparkR: line 13: /tmp/sparkR.profile: Permission denied
Loading required package: SparkR
Loading required package: methods
Loading required package: rJava
[SparkR] Initializing with classpath /var/lib/hadoop-hdfs/SparkR-pkg/lib/SparkR/sparkr-assembly-0.1.jar

14/02/27 16:29:09 INFO Slf4jLogger: Slf4jLogger started
Pi is roughly 3.14018
Num elements in RDD 200000
hdfs@xxxx:~/SparkR-pkg$