DataStage_EndToEnd_Interview_Question & Answers

The document provides a comprehensive overview of DataStage interview questions, covering topics such as SMP and MPP systems, node configuration, parallelism, partition techniques, and various DataStage stages such as Aggregator, Join, and Lookup. It explains key concepts like pipeline parallelism, dataset management, and the differences between sequential files and datasets. It also addresses specific functionality within DataStage, such as handling null values, reading Excel files, and generating mock data.

Complete Data Stage Interview Questions

1) what is SMP system?

- Symmetric multiprocessing (SMP) is a multiprocessor computer hardware and software architecture in which two or more identical processors are connected to a single shared main memory, have full access to all I/O devices, and are controlled by a single OS instance, with all processors treated equally.

2) what is MPP system?

- A massively parallel processing (MPP) system is a distributed-memory parallel computer designed to scale to hundreds, if not thousands, of processors. To better support high scalability, the compute elements (nodes) in an MPP machine are custom-designed for use in a scalable computer.

3) Node Configuration: **
o A node is a software entity defined at the operating-system level.
o Node configuration is the technique of creating logical CPUs.
o A node is a logical CPU, i.e., an instance of a physical CPU.
o Hence, "the process of creating virtual CPUs is called node configuration."
o The node configuration concept is exclusive to DataStage and is one of its best
features compared with other ETL tools.

4) what is the parallelism?

Parallelism is executing your application on multiple CPUs at the same time.

5) what is the partition parallelism?

o In partition parallelism the same job logic runs simultaneously on multiple CPUs.
o Splitting the source data into subsets is known as partitioning; each subset is a partition.
o Partitioning distributes the data across the nodes, based on a partition technique.
o Each partition of the data is processed by its own node.
o Partition parallelism facilitates near-linear scalability.
ex: 8 times faster on 8 processors
24 times faster on 24 processors

6) what are all the partition Techniques? ******

There are 2 types of partition techniques:

 Key based
 Hash – records with the same hash of the key column values go to the same node/partition.
 Modulus – similar to Hash, but it works only on numeric key columns (partition = key value mod number of partitions).
 Range – records whose key values fall in the same range go to the same node; there is some overhead in computing the ranges.
 DB2 – partitions the data the same way DB2 partitions its tables (used with the DB2 connector stage).
 Key less
 Round robin – records are distributed evenly across the nodes in turn. Ex: with 4 nodes, the first record goes to node 1, the second record to node 2, and so on.
 Random – records are also distributed evenly across the nodes, but the target node for each record is chosen at random rather than in sequence.
 Entire – every node receives a complete copy of the input data.
 Same – keeps the existing partitioning; each node passes its data unchanged to the next stage.

7) what is the pipeline parallelism?

"All stages in the pipeline carry data in parallel and process it simultaneously": a downstream stage starts consuming rows as soon as the upstream stage produces them, instead of waiting for it to finish.
In a server environment, the execution process is traditional batch processing, where each stage completes before the next one starts.

8) what is the node APT configuration file / Node configuration file? ***********

Configuration file:
 The configuration file contains information about the processors. Each processor is known as a node.
 A node is a logical representation of a CPU; it is an instance of a physical CPU.
 The configuration file is created and managed from the DataStage Designer client: Tools -> Configurations.
 The configuration file is saved with the extension ".apt" (Advanced Parallel Technology).
 The configuration file is activated by the runtime environment variable $APT_CONFIG_FILE.
 We can determine the parallelism from the configuration file:
No. of Nodes = No. of Partitions
No. of Nodes = Degree of parallelism

Node Components:
There are 4 node components are in Configuration file
1) Node name
2) Fast Name
3) Pool
4) Resource – Resource Disk
-- Resource Scratch Disk.
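
For illustration, a minimal two-node configuration file showing these four components (the host name and resource paths below are example values only; use the ones for your own environment) looks like this:

{
	node "node1"
	{
		fastname "etlhost"
		pools ""
		resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
	}
	node "node2"
	{
		fastname "etlhost"
		pools ""
		resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
	}
}

Running a job with this file gives a degree of parallelism of 2 (two partitions).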

Note: in which path are datasets saved by default?

C:/IBM/InformationServer/Server/Datasets (the resource disk path defined in the configuration file)

9) what is the difference between Sequential File stage and Dataset? *******
Sequential File stage:
Sequential File is a file stage; it reads flat files with different extensions/formats (.csv, .txt, .psv).

By default it reads/writes the data sequentially when it reads/writes a single file.

It can also read/write in parallel when reading from or writing to multiple files.
It supports one input link (or) one output link and one reject link.
The Sequential File stage has some limitations:
 The file size limit is 2 GB (.txt format)
 Data conversion overhead on every read/write,
like: ASCII – NF – ASCII – NF (NF --> Native Format)
To overcome this, DataStage introduced the Dataset.
Notes: I am reading a single file, but how can I run my job in parallel? ***

Set "Number of Readers Per Node" to a value greater than 1 (for example 2), or set "Read From Multiple Nodes" = Yes.

DataSet:
Dataset is a file stage which is used for staging the data when we design
dependent jobs. It allows you to read or write data to a dataset. The stage
can have a single input link or a single output link. It can be configured to
execute in parallel or sequential mode.

 Datasets are operating-system files, each referred to by a descriptor (control)
file, and are stored with a .ds extension.
 We can manage datasets independently by using the dataset
management utility in DataStage Designer: Tools --> Data Set
Management.
 It supports more than 2 GB of data.
 No conversion is required, because dataset data is represented/resides in
native format.
 Dataset extension is .ds

Two types of Dataset’s: They are


 Virtual dataset (temporary)
 Persistent dataset (permanent)

Q: How many files are created internally when we create a dataset?
A dataset is not a single file; multiple files are created internally:
o Descriptor file
o Data file
o Control file
o Header file

Descriptor file: it contains the schema details and the address of the data.

Data file: consists of the data in native format and resides on the
resource disk defined in the configuration file.
Control file / Header file: they reside in the operating system and
act as the interface between the descriptor file and the data files.
Physical file means it is stored on the local drive / local server,
permanently stored under the install path c:\ibm\inser..\server\dataset {"pools"}.

10) how to delete the datasets in unix / from the command line?

orchadmin rm dataset.ds
11) how many ways can we see the datasets?

1) Using the Dataset Management utility (DataStage Designer)

2) Using the command line --> orchadmin dump <dataset.ds> (to view the records) or orchadmin describe <dataset.ds> (to view the schema)

12) I have a source file. It contains header and footer records; I need to remove them before processing
the data in the Sequential File stage.

Ex:
current date : 2022
empno,ename,job,sal
12,abc,abc,1000
13,aadaf,afa,2000
total count: 2

Output:

empno,ename,job,sal
12,abc,abc,1000
13,aadaf,afa,2000

The Sequential File stage has a "Filter" option; using a Unix command there we can remove the header and footer.

Filter = sed '1d;$d'
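
If only one of the two needs to be removed, the same Filter option can be used with a simpler expression:

Filter = sed '1d'    (remove only the first line)
Filter = sed '$d'    (remove only the last line)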

13) I have a file it contains 100 records, in the target I need to generate 100 files, how can you achieve
this?

14) How can you handle null values in the Sequential File stage?

Step 1 – In the Format tab, under Field Defaults, add the "Null field value" property and set the value that should represent NULL.

Step 2 – In the Columns tab, set Nullability = Yes for the relevant columns.

15) How can you read Excel files?

By using the Unstructured Data stage we can read Excel data. This stage is exclusively for reading Excel
data.

We can also read password-protected Excel files.

16) what are all the stages that can generate mock data?

In the Development and Debug palette, the Row Generator stage and the Column Generator stage can
generate mock data.

16) what is the difference between Row Generator and Column Generator Stage?

Row Generator: If we don't have any mock data from the client, we can use the Row Generator
stage directly. It supports only 1 output link.

Note: in the entire DataStage palette only one stage supports a single output link with no input link – the Row Generator stage.

Column Generator Stage: If we have some sample data for a few columns and need to generate
sample data for the other columns, we can use the Column Generator stage.

17) what is the Peek stage?

It is a Development and Debug stage; it writes sample rows of its input data to the job log so they can be inspected. It can also act as a Copy stage.

18) I have data with 15 records and I want to see only the 6th to 9th records. How can this be done in
DataStage?

By using the Head and Tail stages we can achieve this.

Step 1) read the data from the source

Step 2) use a Head stage and set the option to keep the first 9 rows ("head -9")

Step 3) take a Tail stage and set the option to keep the last 4 rows ("tail -4")

This gives records 6 to 9, as expected.

DB2 Connector Stage


19) what is Isolation Level in db2 connector stage /any connector stage?

Isolation level
Specifies the degree to which the data that is being accessed by the DB2 connector
stage is locked or isolated from other concurrently executing transactions, units of
work, or processes.
Cursor stability
This is the default value. Takes exclusive locks on modified data and sharable locks on
all other data. Exclusive locks are held until a commit or rollback is executed.
Uncommitted changes are not readable by other transactions. Sharable locks are
released immediately after the data has been processed, allowing other transactions to
modify it.
Read uncommitted
Takes exclusive locks on modified data. Locks are held until a commit or rollback is
executed. No other locks are taken. However, other transactions can still read but not
modify the uncommitted changes.
Read stability

Takes exclusive locks on modified data and takes sharable locks on all other data. All
locks are held until a commit or rollback is executed, preventing other transactions
from modifying any data that has been referenced during the transaction.
Repeatable read options
Takes exclusive locks on all data. All locks are held until a commit or rollback is
executed, preventing other transactions from modifying any data that has been
referenced during the transaction.

20) what is the difference between “Insert then update” & “Update then insert”?

Insert then update


You can create an upsert statement by creating the insert statement in
the Insert property and then by creating the update statement in
the Update property. The Insert statement is run before the Update statement.
The Update statement is run on only those records that fail to be inserted.
Update then insert
You can create an upsert statement by creating the update statement in
the Update property and then by creating the insert statement in
the Insert property. The Update statement is run before the Insert statement.
The Insert statement is run on only those records that fail to be updated.

21) what is Record count and Array Size?

Record count
Specify the number of records to process before the connector commits the
current transaction or unit of work. You must specify a value that is a multiple of
the value that you set for Array size. The default value is 2000. If you
set Record count to 0, all available records are included in the transaction.
Valid values are integers 0 - 999999999.

Array size
Specifies the number of records or rows to use in each read or write database
operation. The default value is 2000. Valid values are from 1 to a database-
specific maximum.
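
For example, with Array size = 200 and Record count = 2000, the connector sends rows to the database in batches of 200 and commits after every 10 batches, i.e. after every 2000 rows; as stated above, Record count must be a multiple of Array size.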

Processing Stages
1) what is the Aggregator stage?

Using the Aggregator stage we can get aggregated results; it supports 1 input link and 1 output link.

2) what are all the aggregation types?

a) Calculation
b) Count Rows
c) Column for Calculation

3) Can we do a count along with min, max, avg, sum using a single Aggregator stage?

No, we can't perform Count together with min, max and avg.

Reason: Aggregation type = Calculation (min, max, avg/median, sum)

Aggregation type = Count rows (count)

These are two different aggregation types, and a single Aggregator stage can use only one of them.

4) what is the default datatype of the Aggregator stage?

The aggregation results are produced as double by default; if you want decimal output, for example Decimal (8,2), you must specify it explicitly.

5) what is the difference between the Hash & Sort methods?

Hash mode is for a relatively small number of groups: fewer than about 1000 groups per megabyte of
memory.

Sort mode requires the input data set to have been partitioned and sorted with all of the grouping keys
specified as hashing and sorting keys.

6) what is the Copy stage and what is the use of the Force option in the Copy stage?
Copy stage is a processing stage; it supports 1 input link and 'n' output links, and it copies all
input data to the multiple output links.

Force: true/false
Set Force = True to specify that DataStage should not try to optimize the job by removing the Copy operation.

The copy operator is also used internally by default in other stages such as the Sort stage.

In the Copy stage we can rename columns and drop columns.

10
7) what is the difference between the Filter stage and the Switch stage?
Filter:
1. It is a processing stage; it supports 1 input link, N output links and 1 reject link.
2. Using it we can filter the data with a where-clause option, and in addition unmatched records can be
sent to the reject link.

Switch stage:
1) It is a processing stage; it supports 1 input link, up to 128 output links and 1 reject link.
2) Using it we can filter the data with case-statement logic, similar to a C switch statement.
3) It is a native stage in DataStage, hence whenever we want to filter only specific values then instead of
the Filter stage we can use the Switch stage.

4. The main difference: we can't use the 'in' / 'or' operators in the Switch stage, but those
operators can be used in the Filter stage.

8) what is the Funnel stage and what are the types of funnel?

Funnel stage is a processing stage; using it we can combine multiple input datasets into a single
output dataset.
It is similar to the SQL UNION ALL operator.

It has 3 types of funnel:

a) Continuous funnel -- combines records as they arrive (i.e. no particular order);
b) Sequence funnel -- copies all records from the first input data set to the output data set, then all the
records from the second input data set, etc.;
c) Sort funnel -- combines the input records in the order defined by one or more key fields.

Note: all input datasets must have the same number of columns (and compatible column metadata).

9) what is the difference between JOIN, LOOKUP, MERGE stage? ****************************

Join Stage:
a) Join stage is processing stage, it supports Multiple Input links and 1 output link, using this
join stage we can join the multiple inputs and send the data to 1 target.

b) Join stage can perform 4 kinds of joins.


1) Inner Join
2) Left Join
3) Right Join
4) Full join

C) Whenever we are using a Full Join, the stage supports only 2 input links.
For the other join types it supports any number of input links. The Join stage does not support a
reject link.

D) Whenever the reference tables contain a huge volume of data, the Join stage is
appropriate, because it will not create any paging in the databases.

E) While doing the join, make sure the input data is sorted on the join keys to get correct join results.
By default the Join stage uses the Hash partition technique.

Lookup Stage:
A) Lookup stage is processing stage; The Lookup stage is most appropriate when the reference data
for all lookup stages in a job is small enough to fit into available physical memory

B) Each lookup reference requires a contiguous block of shared memory. If the Data Sets are larger
than available memory resources, the JOIN or MERGE stage should be used.

C) The lookup key columns do not have to have the same names in the primary and the reference
links.

D) Lookup stage can support 1 input link, ‘N’ No.of Reference links, 1 output link and 1 reject link.
The optional reject link carries source records that do not have a corresponding entry in the input
lookup tables.

E) Lookup Supports 2 types of Joins


1) Inner Join (Lookup Failure = Drop)

2) Left outer join (Lookup Failure = Continue)

F) Using Lookup, we can perform 3 kinds of lookups.


1) Range lookup
2) Normal lookup
3) Sparse lookup -- the Normal/Sparse lookup type option is available only when the reference link
comes from a connector (database) stage.

Note: whenever the Sparse lookup is used, only 1 reference link is supported.

Normal Lookup:
Normal lookup is available with any connector stage on the reference link. A normal lookup is appropriate
when the reference data is small enough to fit into the available physical memory (RAM).

On the reference link, set Lookup Type = Normal.

Sparse Lookup:

DataStage sparse lookup is considered an expensive operation because of a round-trip


database query for each incoming row. It is appropriate if the following 2 conditions are
met.

1. The size of reference table is huge, i.e., more than millions of rows. If the
reference table is small enough to fit into memory entirely, normal lookup is a
better choice.
2. The number of input rows is less than 1% of the reference table. Otherwise, use a
Join stage.
3. Here each source record directly executes in the reference database level. In the
reference we will write the SQL code like below using ORCHESTRATE operator.

Default Lookup Stage Supports Entire Partition Technique.

Merge:

A) The Merge stage is a processing stage. It can have 1 master input link, 1 output link, N
update input links and the same number of reject links.

B) The Merge stage combines a master dataset with one or more update datasets based on the key
columns. The output record contains all the columns from the master record plus any additional columns
from each update record that are required.

C) The data sets input to the Merge stage must be key-partitioned and sorted. This ensures that
rows with the same key column values are located in the same partition and will be processed by the
same node. It also minimizes memory requirements because fewer rows need to be in memory at any
one time.

D) The Merge stage supports 2 types of joins:

1) Left Join (Unmatched Masters Mode = Keep)
2) Inner Join (Unmatched Masters Mode = Drop)

10) What is a parameter, what is a parameter set, and in how many ways can we create them?

Using parameters we can provide different values at run time.

In DataStage we can create parameters in 2 ways:

A) Project-level parameters (these can be created in 2 ways)
I) Parameter Set
II) Administrator level (environment variables)
These are reusable anywhere in the project.
B) Job-level parameters
These are created within the job and are job specific.

11) what is the Modify stage, and how is it used?

1) The Modify stage is a processing stage that alters the record schema of the input data.
2) The Modify stage can have a single input and a single output link.
3) The Modify stage can also be used to handle NULL values and to perform string, date, time and timestamp
manipulation functions.
4) The Modify stage is a native parallel stage and has performance benefits over the Transformer stage.
It can also be used to keep fields in, or drop fields from, the output.

12) what is pivot Enterprise Stage, what is horizontal/vertical pivot?

Pivot enterprise stage is a processing stage which pivots data vertically and horizontally depending upon
the requirements. There are two types

1. Horizontal
2. Vertical

13) How Many Ways we can Remove the Duplicates in data stage? ***********

Input data:
empno,ename,job,sal
1200, abc123, xyz,2500
1200, abc123, xyz,2500
1200, abc123, xyz,2500
1202, lmn789, aaa,2600
1202, lmn789, aaa,2600
1204, abc, xyz,2700
1206, lmn, aaa,2800
1208, abc, xyz,2900

Output:
empno,ename,job,sal
1200, abc123, xyz,2500
1202, lmn789, aaa,2600
1204, abc, xyz,2700
1206, lmn, aaa,2800
1208, abc, xyz,2900

Case-1:
Using Remove Duplicate stage, we can remove the duplicates.

Case-2:

Using Sort Stage, we can remove the duplicates in 3 types.

Process 1: the Sort stage has an option 'Allow Duplicates = true/false'; if we set it to false,
the stage outputs the data without duplicates.

Process 2: in the Sort stage set 'Create Key Change Column = true'; the stage sorts the data on the sort
key columns and generates an additional column called keyChange. Within each group the first row
gets the value 1 and the subsequent rows get 0.

Then in a Filter stage filter the data on keyChange = 1 to send only the first row of each group to one output.

Process 3: using link sort we can remove the duplicates: on the input link set
Partition = Hash > select the column(s) > Perform sort > Unique.

Way 3:
Using Transformer stage variables we can remove the duplicates, as sketched below:
 Extract the data.
 Sort the data based on the key columns.
 In the Transformer stage, take 2 stage variables and apply the logic shown below.
 Use the stage variable in the constraint to pass only the rows where the flag is 1.
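
A typical sketch of the stage-variable logic (assuming the key column is empno and the input link is named lnk_in; the evaluation order of the stage variables matters):

svIsFirst : If lnk_in.empno = svPrevKey Then 0 Else 1
svPrevKey : lnk_in.empno

Output link constraint: svIsFirst = 1

Because stage variables are evaluated top to bottom, svIsFirst compares the current key with the previous row's key before svPrevKey is overwritten, so only the first row of each sorted group passes the constraint.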

Way4:
Using Aggregator Stage

14) what is the Surrogate Key stage in DataStage?

A surrogate key is a unique identification key. It is an alternative to the natural key;

a natural key may be an alphanumeric or composite key, but a surrogate key is always a single numeric
key.

The Surrogate Key Generator stage generates sequential, incremental and unique integers from a provided start point. It can
have a single input and a single output link.

WHAT IS THE IMPORTANCE OF A SURROGATE KEY?

A surrogate key acts as the primary key of a dimension table (it is an alternative to the natural/primary key). The
main benefit of using a surrogate key is that it is not affected by changes going on in the source database.

With a surrogate key the same natural key value can appear in multiple rows (for example in a Type 2 dimension), each row getting its own surrogate value, which would not be possible if the natural key were used as the primary key.

15) what is the order of Transformer Stage Execution?


 Stage variable
 Loop Condition
 Constraint
 Derivation

Note: the Transformer stage is a processing stage; it supports 1 input link and N output links, and with it we
can perform all kinds of data validation and apply filter conditions.

16) what are all the system variables?


 @TRUE
 @INROWNUM
 @OUTROWNUM
 @ITERATION
 @NUMPARTITIONS
 @PARTITIONNUM
 @FALSE

17) What are stage variables in the Transformer stage?

In the Transformer stage, stage variables are used to store values temporarily; they can be used in any
derivation or constraint within that Transformer.

18) how to generate the counter or sequence number in transformer stage, if a job is running on 2 or
4 nodes?

@PARTITIONNUM + @NUMPARTITIONS * (@INROWNUM-1) +1
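
For example, on a 2-node job (@NUMPARTITIONS = 2) the formula gives, for partition 0, the values 0 + 2*(1-1) + 1 = 1, then 3, 5, ... and, for partition 1, the values 2, 4, 6, ..., so the numbers generated across the partitions are unique and together form a continuous sequence.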

19) what are all the Null Handing functions in Transformer stage?
 IsNotNull
 IsNull
 NullToEmpty
 NullToZero
 NullToValue
 SetNull

20) what is macro in Transformer stage?
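
Macros are built-in, read-only job-level values that can be used directly in Transformer derivations and constraints, for example DSJobName, DSProjectName, DSHostName, DSJobStartDate and DSJobStartTime; they are typically used to stamp job metadata (job name, load date, etc.) onto the output rows.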

21) what is the Resource Estimation in Datastage Designer?

Use the Resource Estimation window to estimate and predict the system resource utilization of
parallel job runs.

22) How to Import/Export the Datastage Job Using Command line/Unix? ****

ISTOOL for EXPORT IMPORT Datastage Components

Location of command:
UNIX: /opt/IBM/InformationServer/Clients/istools/cli
Windows: \IBM\InformationServer\Clients\istools\cli

Complete Project Export : **

cd /opt/IBM/InformationServer/Clients/istools/cli
./istool export -dom XYZ123:9080 -u dsadm -p dsadm -ar /tmp/Test1.isx -ds XYZ123/Test1/*/*.*

JobWise Export :
cd /opt/IBM/InformationServer/Clients/istools/cli
./istool export -dom XYZ123:9080 -u dsadm -p dsadm -ar /tmp/Test1.isx -ds XYZ123/Test1/Jobs/TestJob.dsx

Syntax : ./istool export -dom [domain]:9080 -u [user] -p [password] -ar [Path/ExportFileName.isx] -ds [domain/ProjectName/Jobs/JobName.pjb] [Options]

Import Datastage Components From Command Line

-----------------------------------------------------------------
cd /opt/IBM/InformationServer/Clients/istools/cli
./istool import -dom XYZ123:9080 -u dsadm -p dsadm -ar /tmp/Test1.isx -ds XYZ123/Test1

Syntax : ./istool import -dom [domain]:9080 -u [user] -p [password] -ar [Path/ExportFileName.isx] -ds [domain/ProjectName]
Check List Of Datastage Projects from Command Line:

DSHOME=`cat /.dshome`;
echo $DSHOME

DSHOME=`cat /.dshome`
cd $DSHOME
. ./dsenv
cd ./bin
./dsjob -lprojects => { For Listing the Datastage Projects}
./dsjob -ljobs project => { For Listing the Datastage Jobs in given Project}

23) To get a list of DataStage jobs that are running, use a command similar to one of the following
UNIX commands:

ps -ef | grep DSD.RUN

ps -ef | grep *.RUN

ps -ef | grep userid (userid is your userid)

Then look for any process with dsapi_slave and kill it using this command:

kill -9 pid (the process id is the first numeric column after your id)

24) I have a field in the project repository; how do I identify which jobs that field flows through?

Using the data flow / impact analysis option we can analyze this; it shows, for example, how far the empno column flows through the jobs.

25) What is local Container/shared containers in data stage? **

Containers:
A container is a group of stages and links. Containers enable you to simplify and modularize
your job designs by replacing complex areas of the diagram with a single container stage.

There are 2 types of containers in datastage

Local Containers:

 Local containers are created within a job and are only
accessible by that job. Their main use is to tidy up / simplify a job
design. A local container can also be de-constructed: right click on the
container and select Deconstruct.

Step 1: Select the stages and links on the job canvas.
Step 2: Go to Edit -> Construct Container -> Local.

These local containers can't be re-used; they just simplify the job design.

Shared Containers:

These are created separately and are stored in the repository in the same way that jobs are.
The logic in these containers can be reused anywhere in the project.

If we want to deconstruct such a job, we first need to convert the shared container into a local
container, and then de-construct the local container.

Edit --> Construct Container --> Shared

This portion of the logic is saved in the repository and the same logic can be re-used anywhere at the
project level.

Ref: https://www.ibm.com/docs/en/iis/11.7?topic=reusable-shared-containers

26) how to run a DataStage job from the command line interface / unix?

./dsjob -run <project name> <job name>

Sequencers Questions:

1) what is job activity stage?


A Job Activity invokes a parallel job and runs it.

One Job Activity can invoke a single parallel job at a time; if we have many parallel jobs, that many Job Activities
need to be used.

2) I have 5 Job Activity stages in a sequencer; the first job completed and the 2nd job aborted. Jobs 3, 4 and 5
should still run without stopping. What is the procedure?

In the Job Activity stage triggers, don't mention any condition; just keep it as 'Unconditional'.

3) I have 5 jobs in a sequencer and the 3rd job failed; the sequence should restart from the point where it aborted.
How do we achieve this?
 In each Job Activity set Execution Action = "Reset if required, then run"

 In the sequence job properties enable the "Add checkpoints so sequence is restartable on failure" option.

The sequence records the failure point; once the parallel job issue is fixed and the job is recompiled,
re-run the sequence from DataStage Director and it resumes from the failed activity.
4) how to execute Unix commands in a sequencer?
Using the Execute Command activity stage.

5) how can we pass multiple commands in an Execute Command activity stage?
The commands can be separated by a semicolon (;).

6) what is mean by Any /All in Sequencer?
Any: if any job activity got finished, then it will go to next dependent jobs
ALL: jobs activity will wait until all jobs got completed

7) what are all the types of routines?

DataStage has 2 types of routines. Below are the 2 types:

1. Before/After subroutines.
2. Transformer routines/functions.

Before/After Subroutines:

These are built-in routines which can be called as before-job or after-job subroutines.


Below is the list of the same.

1.DSSendMail :Used to Send mail using Local send mail program.


2.DSWaitForFile: This routine is called to suspend a job until a named file either
exists, or does not exist.
3.DSReport: Used to generate a job execution report.
4.ExecDOS: This routine executes a command via an MS-DOS shell. The command
executed is specified in the routine's input argument.
5.ExecDOSSilent: As ExecDOS, but does not write the command line to the job log.
6.ExecTCL: This routine executes a command via an InfoSphere Information
Server engine shell. The command executed is specified in the routine's input
argument.
7.ExecSH: This routine executes a command via a UNIX Korn shell.
8.ExecSHSilent: As ExecSH, but does not write the command line to the job log.

Transformer Routines:

Transformer routines are custom-developed functions. DataStage has
some limitations in its built-in functions (TRIM, PadString, etc.); for example, in DataStage
version 8.1 there was no function to return the ASCII value of a character, and
from 8.5 the Seq() function was introduced for that scenario.

These custom routines are developed in C++. Writing a routine in C++ and
linking it to a DataStage project is a simple task:

 Write CPP code


 Compiling with the required flags.
 Put the output file in a shared dir.
 Link it in the datastage.
 Use it in a transformer like other functions.

3. Import .dsx file from command line
SOL: DSXImportService -ISFile dataconnection -DSProject dstage -DSXFile
c:\export\oldproject.dsx
4. Generate Surrogate Key without Surrogate Key Stage
SOL: @PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1
Use the above formula in a Transformer stage to generate a surrogate key.

6. The connection was refused or the RPC daemon is not running
(81016)
RC: The dsrpcd process must be running in order to be able to log in to
DataStage.
If you restart DataStage but the socket used by the dsrpcd (default is
31538) was busy, the dsrpcd will fail to start. The socket may be held by
dsapi_slave processes that were still running or recently killed when
DataStage was restarted.
SOL: Run "ps -ef | grep dsrpcd" to confirm whether the dsrpcd process is running.
Run "ps -ef | grep dsapi_slave" to check if any dsapi_slave processes exist. If
so, kill them.
Run "netstat -a | grep dsrpc" to see if any processes have sockets that are
ESTABLISHED, FIN_WAIT, or CLOSE_WAIT. These will prevent the dsrpcd from
starting. The sockets with status FIN_WAIT or CLOSE_WAIT will eventually
time out and disappear, allowing you to restart DataStage.
Then restart the DSEngine. If the above doesn't work, reboot the
system.

9. To stop DataStage jobs at the Linux level

SOL: ps -ef | grep dsadm
to check the process id and phantom jobs, then
kill -9 process_id

10. To run DataStage jobs from the command line

SOL: cd /opt/ibm/InformationServer/server/DSEngine
./dsjob -server $server_nm -user $user_nm -password $pwd -run $project_nm $job_nm

5. To display all the jobs from the command line

SOL:
cd /opt/ibm/InformationServer/Server/DSEngine/bin
./dsjob -ljobs <project_name>

REUSABILITY IN DATASTAGE

Below are some of the ways through which reusability can be achieved in
DataStage.
 Multiple Instance Jobs.
 Parallel Shared Container
 After-job Routines.

what is Multiple-Instance Jobs?


Generally, in data warehousing, there would be scenarios/requirement to
load the maintenance tables in addition to the tables in which we load the
actual data. These maintenance tables typically have data like
1. Count of records loaded in the actual target tables.
2. Batch Load Number/Day on which the load occurred
3. The last processed sequence id for a particular load

Parallel Shared Container:


A shared container allows common logic to be shared across multiple
jobs. Enable RCP at the stage preceding the shared container stage. For the
purpose of sharing across various jobs, we do not propagate all the column
metadata from the job into the stages present in the shared container.
Also, ensure that jobs are re-compiled when the shared container is changed.
In our project, we used a shared container to implement the logic to
generate a unique sequence id for each record that is loaded into the target
table. [We will discuss different ways to generate a sequence id (one-up
number) in a later section.]

After-job Routines:
After/Before job subroutines are types of routines which run after/before the
job to which the routine is attached. We might have a scenario where in we
shouldn’t have any of the input records to be rejected by any of the stages in
the job. So we design a job which have reject links for different stages in the
job and then code a common after-job routine which counts the number of
records in the reject links of the job and aborts the job when the count
exceeds a pre-defined limit.
This routine can be parameterised for stage and link names and can then be
re-used for different jobs

Types SORT OPERATION IN DATASTAGE?


DataStage provides two methods for parallel sorts:
 Link Sort : This is used when we use a keyed partitioning
method when using any other stage (i.e lookup, join, remove duplicate
etc)
 Sort Stage : This is used when we intend to implement a sort
operation using a separate sort stage

What is stable sort?

A stable sort means "if you have two or more records that have exactly the same
keys, keep them in the same order on output as they were on input".

Since keeping track of the relative record location means more work, setting
Stable to "False" will speed up performance.

How to handle NULL values in the Sequential File stage?

Open the Sequential File stage ---> go to Format ---> click on Field Defaults ---> on the bottom
right you will find Available Properties to Add; under that select
"Null field value" and give the value as "0" [zero]. You'll then get the null
records in your output sequential file.

Have you ever used the RCP option in your project?

Yes, to load data from multiple sources to target tables/files without much
transformation, i.e. a straight load.

what is the main difference between the key change column and the cluster key change
column in the Sort stage?
The create key change column is generated while the stage sorts the data: within each group,
the first record gets 1 and the rest of the records get 0.

The create cluster key change column is used on data that is already sorted, i.e. when the Sort Key Mode is "Don't Sort (Previously Sorted)"; it flags the group changes on that pre-sorted data.

PERFORMANCE TUNING IN DATASTAGE?

It is important to do performance tuning in any DataStage job.

If a job is taking too much time to run, we need to modify the job design so that we
get good performance from the job.

For that:

a) Avoid using the Transformer stage where it is not needed. For example, if you are using a Transformer stage
only to rename or drop columns, use a Copy stage instead. It will give better
performance.

b) Take care to choose the correct partitioning technique, according to the job and the requirement.

c) Use user-defined queries for extracting the data from databases.

d) If the data is small, use SQL join statements rather than a Lookup stage.

e) If you have a large number of stages in the job, divide the job into multiple jobs.

Data Profiling:-

Data profiling is performed in 5 steps. Data profiling analyses whether the source data is clean or dirty.
The 5 steps are:

a) Column Analysis
b) Primary Key Analysis
c) Foreign Key Analysis
d) Cross-domain Analysis
e) Baseline Analysis

After completing the analysis, if the data is good there is no problem. If the data is dirty, it is sent for
cleansing. This is done in the second phase.

Data Quality:-

Data Quality, after getting the dirty data it will clean the data by using 5 different ways.

They are

a) Parsing
b) Correcting
c) Standardize
d) Matching
e) Consolidate

Data Transformation:-

After completing the second phase, it gives the golden copy. The golden copy is nothing but the single
version of the truth, meaning the data is now good.

how to move a project from development to UAT?

By using the Information Server Manager we can move the project from Dev to UAT. Through the DataStage
Manager, export the project onto your local machine in .dsx format (project.dsx) from the DEV server, then
import the same .dsx (project.dsx) into the UAT server using the DataStage Manager.

How to do error handling in datastage?

Error handling can be done by using reject links: the errors coming through the job are captured
into a sequential file, and that file is then read by a job which loads these exceptions or errors
into a database table.

How can you call shell scripts / Unix commands in a job sequence?

There are two scenarios where you might want to call a script.

Scenario 1 (a dependency exists between the script and a job): a job has to be executed first, then the
script has to run, and only upon completion of the script execution should the second job be invoked. In this case
develop a sequence job where the first Job Activity invokes the first job, then an Execute Command
activity calls the script you want to invoke by typing "sh <script name>" in the command property of
the activity, and finally another Job Activity calls the second job.

Scenario 2 (the script and job are independent): in this case, in your parallel job, say job1, under job
properties you can find "After-job subroutine", where you select "ExecSH" and pass the script name
you would like to execute. Once the job1 execution completes, the script gets invoked.
The job succeeding job1, say job2, doesn't wait for the execution of the script.

What is Preserve partitioning?

A stage can also request that the next stage in the job preserves whatever partitioning it
has implemented.

It does this by setting the preserve partitioning flag for its output link. Note,
however, that the next stage might ignore this request.

In most cases you are best leaving the preserve partitioning flag in its
default state. The exception to this is where preserving existing partitioning
is important. The flag will not prevent repartitioning, but it will warn you that
it has happened when you run the job. If the Preserve Partitioning flag is
cleared, this means that the current stage doesn’t care what the next stage
in the job does about partitioning. On some stages, the Preserve Partitioning
flag can be set to Propagate. In this case the stage sets the flag on its output
link according to what the previous stage in the job has set. If the previous
job is also set to Propagate, the setting from the stage before is used and so
on until a Set or Clear flag is encountered earlier in the job. If the stage has
multiple inputs and has a flag set to Propagate, its Preserve Partitioning flag
is set if it is set on any of the inputs, or cleared if all the inputs are clear.

Which partition method is faster, Hash or Modulus?

Answer: Hash and Modulus are both key-based partition
techniques. If all the key columns are numeric data types then we use the
Modulus partition technique. If one or more key columns are text then we
use the Hash partition technique. Modulus is faster than Hash
because it uses only numeric fields for the calculation, and the mod
function is computed faster than a hash function.
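
For example, with the Modulus technique on a 4-node job, a row whose numeric key value is 17 goes to partition 17 mod 4 = 1, and a row with key 20 goes to partition 20 mod 4 = 0.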

12. What is a “degenerate dimension”?

According to Ralph Kimball,[1] in a data warehouse, a degenerate dimension is a dimension key in


the fact table that does not have its own dimension table, because all the interesting attributes have
been placed in analytic dimensions.

What are the different options associated with "dsjob" command?

ex: $dsjob -run and also the options like

 stop -To stop the running job

 lprojects - To list the projects

 ljobs - To list the jobs in the project

 lstages - To list the stages present in the job.

 llinks - To list the links.

 projectinfo - returns the project information(hostname and project name)

 jobinfo - returns the job information(Job-status,job runtime,endtime, etc.,)

 stageinfo - returns the stage name, stage type, input rows, etc.,)

 linkinfo - It returns the link information

 lparams - To list the parameters in a job

 paraminfo - returns the parameters info

 log - add a text message to log.

 logsum - To display the log

 logdetail - To display with details like event_id, time, message

 lognewest - To display the newest log id.

 report - displays a report containing the generated time, start time, elapsed time, status, etc.

 jobid - job id information.

What are the collectors available in the collection library?

The collection library contains three collectors:

1. The ordered collector

2. The round-robin collector

3. The sortmerge collector

I have a source file with 14 columns. How do I extract 10 fields out of it without
changing the job?
Answer: awk -F '<delimiter>' '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' <file>

How can you display the duplicate values from unix?

Answer: sort <file> | uniq -d (use uniq -D to print every occurrence of the duplicated lines)

How to keep the last 5 days of log files and remove the rest of the log files from the log directory?
Answer: Unix command: find /home/input/files* -mtime +5 -exec rm {} \;
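
It is safer to preview what will be deleted first by running the same find with -print instead of -exec rm:

find /home/input/files* -mtime +5 -print

-mtime +5 matches files modified more than 5 days ago, so only the older files are listed/removed and the most recent 5 days are kept.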

How to kill DataStage sessions from unix?

Answer: ps -ef | grep dsapi_slave, then find the job pid and kill it with kill -9 <pid>

Interview Questions:
Question)
My source path contains 10 files (csv files) with the same metadata but different data; day by day the number of files may
increase, and I have to write each file into a different file in psv format.

How to achieve this task?

Question) What is complex job design in your project.

Question) I have a sequence which contains 5 Job Activity stages; if the 2nd job gets aborted, the 3rd job
should run without stopping the sequence.

Question) I have to pass 5 different commands using one Execute Command activity stage; how
can we do that?

Question) I have a sequencer job with 10 job activities; the 5th job got aborted. How do I restart the job
from the point of failure?

Question) what is the sequencer and what are all the options it contains. (difference between ANY/ALL)

Question) What is Start loop and end loop, what it will do?

Question) what is Terminator Activity stage and how it works?

1) What is a parameter and in how many ways can we create parameters?
Parameters: we can pass values at run time; for that we have to create parameters so the job can be
configured dynamically.
There are 2 ways we can create parameters:
1) Local parameter / job-level parameter
2) Project-level parameter
a. Parameter set
b. Administrator-level parameters

Job-level parameter: it is specific to that particular job only; it cannot be reused in other jobs in the
project.

Project-level parameter: it can be reused in any job anywhere in the project.

1) Parameter Set: it is a container that can hold a group of parameters such as file path, database
username, password, hostname, etc., and it is saved in the repository.
2) Administrator-level environment variables: in the Administrator client, under Environment
Variables, we can create parameters and call them anywhere in the project.

What is the Funnel stage?

The Funnel stage is a processing stage; it supports multiple input links and 1 output link.
It combines multiple input datasets into a single output link.

It is similar to the SQL UNION ALL operation; make sure all the input datasets have the same number of
columns, in the same order.

There are 3 funnel types:

1) Continuous Funnel
2) Sequence Funnel
3) Sort Funnel

How to remove duplicates without using remove duplicates stage?

Different ways we can remove the duplicates

 Using remove duplicate stage.

Using sort stage


a) In the Sort stage we have an option called "Allow Duplicates = True/False"; if we set it to False,
duplicates are removed.
b) In the Sort stage we can remove duplicates at the link level: in the input tab select 'Hash' partitioning,
then select the 'Perform sort' option, where we can find the 'Unique' option; with this we can remove them.
c) In the Sort stage, enable the create key change column option. It creates an additional key
column called keyChange: within each group of similar records the first record is
flagged '1' and the subsequent records are flagged '0'. After the Sort stage take a Filter stage and use a where-clause
condition to send the '1' records to one target and the '0' records to another target.

 Using the Transformer stage-variable concept, we can generate a flag like the keyChange code and then put
the filter in the constraint. Below is the design and solution.

How to remove duplicates in SQL?

 Select deptno, count(*) from emp group by deptno having count(*) > 1; -- lists only the key values that have duplicates;
 Delete from emp where rowid in (select rid from (select rowid rid, row_number() over (partition by deptno order by deptno) rn from emp) where rn > 1); -- deletes the duplicate rows, keeping one row per key;
 Select * from (select deptno, row_number() over (partition by deptno order by deptno) rn from emp) where rn > 1; -- shows the duplicate rows.

8) Rank and dense_rank? Where and having?

 Rank() – skips values in the sequence after ties.

 Dense_Rank() – does not skip values in the sequence.
 Where – the where clause filters the data before the group by.
 Having – having is also a filter, but it filters after the data is grouped.

9) Difference between delete, truncate and drop? Which gives better performance?

 Truncate is a DDL command; it removes all rows from a table quickly, without logging individual row
deletions, and it cannot be rolled back.
 Delete is a DML command; we can delete a portion of the data using a "where" clause or
delete the entire table data. Here we can roll back the data unless it is committed.
 Drop is a DDL command; it removes the table structure itself along with all its data.
 Truncate gives the best performance for emptying a table, since delete logs every row.

10) Table 1 has the values 1 1 1 2 5 and Table 2 has the values 1 1 2 null null. What is the row count for each of the 4 types of join?

Table-1 (left)   Table-2 (right)
1                1
1                1
1                2
2                null
5                null

Inner join: Matched record from the both the tables

total 7 records (here table1 records every time match with table-2)

Table-1 table-2
1 1
1 1
1 1
1 1
1 1
1 1
2 2
Left Join: Matched records from the Left and right tables and unmatched record from the Left table,
corresponding right table will be null

Total count: 8 records

Table-1 table-2
1 1
1 1
1 1
1 1
1 1
1 1
2 2
5 null left unmatched

Right Join: Matched records from the both the tables and unmatched records from the Right table

Total Count: 9

Table-1   Table-2
1         1
1         1
1         1
1         1
1         1
1         1
2         2
null      null   (right unmatched)
null      null   (right unmatched)

Full join: Matched and unmatched records from both the tables.

Total count: 10

Table-1 table-2
1 1
1 1
1 1
1 1
1 1
1 1
2 2
null null right unmatched
null null
5 null left unmatched
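
Note that the NULL key rows from Table-2 never match: in SQL, NULL = NULL evaluates to unknown, so those rows only appear through a right or full outer join. A quick way to verify the counts (the table and column names here are illustrative only):

select count(*) from tab1 t1 inner join tab2 t2 on t1.c = t2.c;        -- 7
select count(*) from tab1 t1 left  join tab2 t2 on t1.c = t2.c;        -- 8
select count(*) from tab1 t1 right join tab2 t2 on t1.c = t2.c;        -- 9
select count(*) from tab1 t1 full outer join tab2 t2 on t1.c = t2.c;   -- 10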

11) Difference between union and union all?

 Union – removes duplicates across the two input datasets.
 Union All – keeps the duplicates.

12) How do you achieve union and union all in DataStage?

 Union All – we can achieve it using the Funnel stage.

 Union – we don't have direct functionality in DataStage; after the Funnel we can use a Remove
Duplicates stage to eliminate the duplicates.

13) I want the 5th line of a file; how do you do it in unix?

 sed -n '5p' lines.txt

14) how to remove Control-M characters in unix?

 tr -d '\r' < infile.txt > outfile.txt

(or)

 sed -e "s/^M//" filename > newfilename (type ^M as Ctrl-V followed by Ctrl-M)

15) How do you find Control-M characters in a file in unix?

For a single file: $ grep ^M file_name (type ^M as Ctrl-V followed by Ctrl-M)

16) How to find the 4th word (nth word ) from the file:

 echo "This is a temporary change to complete the export"|awk '{ print $4}'

17) how to get the data of 3 files into a single file?

$ ls
file1 file2.txt file3.txt lines.txt

$ cat file1 file2.txt file3.txt > newfile.txt

18) To find ^M (control +M) characters in the file:


For single file: $ grep ^M file_name

19) scenario from file

I have a text file:

1 Q0 1657 1 19.6117 Exp


1 Q0 1410 2 18.8302 Exp
2 Q0 3078 1 18.6695 Exp
2 Q0 2434 2 14.0508 Exp
2 Q0 3129 3 13.5495 Exp
I want to extract these two columns from every line, like this:

1657 19.6117
1410 18.8302
3078 18.6695
2434 14.0508
3129 13.5495

Ans) awk '{print $3, $5}' filename.txt
(for the sample above, the required values are the 3rd and 5th whitespace-separated fields)

cut -d' ' -f3,5 < datafile.txt

(for a comma-separated file, use -F, e.g. awk -F, '{print $4}' myFile.csv prints the 4th comma-separated field)

20) how to find the unique records 4th columns of the file?

So, I have a dataset which is of the format:

BBS1 Bbs1 reg 7 Heart
ASAP2 Asap2 reg 5 Heart
SPATA22 Spata22 reg 1 Heart
MYLK4 Mylk4 reg 1 Heart
ATP8A1 Atp8a1 reg 5 Heart

Now, the organ name (here Heart) can be different; there are several
organs that the data is about. How can I figure out the
names of the unique elements of that column (column 5)? The data file is
huge.

Ans)

awk '{print $5}' inputFile | sort | uniq


awk '{print $5}' inputFile | sort -u
or
awk '{arr[$5] = 1} END {for (key in arr) {print key}}' inputFile

delete both the header & footer record

#To create a new file with header & trailer removed

sed '1d;$d' FF_EMP.txt > FF_EMP_NEW.txt

How to remove blank lines from a Unix file


sed -i '/^$/d' foo

how to identify whether a unix file contains a header and footer?
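
One simple check is to inspect the first and last lines and compare them with the expected record layout, and to compare the line count with the expected record count:

head -1 <file>
tail -1 <file>
wc -l <file>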

How to find 0 KB files in Unix Directory?


find /path/to/dest -type f -empty

#to delete the empty file


$ find . -type f -empty -print -delete

Define repository tables in DataStage?

In DataStage, the repository is another name for a data warehouse. It
can be centralized as well as distributed.

To run a job when the content of a file is 0
Here is a simple scenario:

Run a DataStage job based on the count from file "File1": when File1 contains no rows,
trigger Job 1, else Job 2.

Solution:

Use an Execute Command stage:

Exec command (wc -l File1)

In the Triggers tab of the same stage (Custom):

(Trigger 1 fires Job 1 when the file count is 0)

Job1: Convert(@FM, "", Exec_Command.$CommandOutput) = 0

(Trigger 2 fires Job 2 when the file count is greater than 0)

Job2: Convert(@FM, "", Exec_Command.$CommandOutput) > 0

Error calling DSSetParam, code=-4 [ParamValue/Limitvalue is not appropriate]

Whenever we read a value from a file and try to write it into a User Variables activity (in
order to use it as a parameter), we need to use the conversion below with EReplace:

EReplace(Execute_Command_6.$CommandOutput, @FM, "")

Otherwise we get the error message

"Error calling DSSetParam, code=-4 [ParamValue/Limitvalue is not appropriate]"

Reason:

Whenever we read any content from a file into a user variable, the field marks are also written, so in
order to remove the field marks (@FM) we use the above conversion.

Unix command to bring all records (within a column) into a single row with delimiters:

sed -e 's/$/<CRLF>/' $* | tr -d "\r\n" | sed 's/<CRLF>/,/g' | sed 's/.$//' | sed 's/,/'"','"'/g' | sed 's/$/'"'"'/g' | sed 's/^/'"'"'/'

Example:

Actual File:

[user@123]$ cat NewFile.txt

Rule1
Rule2
Rule3
Rule4
Rule5

After applying the above Unix command:

[user@123]$ cat NewFile.txt | sed -e 's/$/<CRLF>/' $* | tr -d "\r\n" | sed 's/<CRLF>/,/g' | sed 's/.$//' |
sed 's/,/'"','"'/g' | sed 's/$/'"'"'/g' | sed 's/^/'"'"'/'

'Rule1','Rule2','Rule3','Rule4','Rule5'

dsjob utility to run a DataStage Job with Parameters/Parameter sets

cd `cat /.dshome`

. ./dsenv > /dev/null 2>&1

This runs the dsenv file, which contains all the environment variables.
Without doing this, your dsjob commands won't run at the command prompt.

Run the job after completing the above steps.

To run a job:
Using the dsjob command you can start, stop, reset or run the job in
validation mode.

dsjob -run -mode VALIDATE/RESET/RESTART project_name job_name

dsjob -run project_name job_name | job_name.invocationid

Running with the invocation id means that the job is run with that
specific invocation id.

If you have parameter or parameter-set values to set, this can be done
as shown below:

dsjob -run -param variable_name="VALUE" -param psParameterSet="vsValueSet" project_name job_name

To stop a job:

Stopping a job is fairly simple. You might not actually require it, but it is still
worth a look. It acts the same way as stopping a running job from the
DataStage Director.

dsjob -stop project_name job_name|job_name.invocationid

To list projects, jobs, stages in jobs, links in jobs, parameters in jobs and
invocations of jobs:
dsjob can very easily give you all of the above based on different keywords.
It is useful if you want a report of what is being used in which
project, and similar things.

The various commands are shown below:

'dsjob -lprojects' will give you a list of all the projects on the server

'dsjob -ljobs project_name' will give you a list of jobs in a particular project

'dsjob -lstages project_name job_name' will give you a list of all the stages
used in your job. Replacing -lstages with -llinks will give you a list of all the links in
your job. Using -lparams will give you a list of all the parameters used in your job.
Using -linvocations will give you a list of all the invocations of your multiple-instance
job.

To generate reports of a job

You can get the basic information of a job by using the 'jobinfo' option as
shown below:

dsjob -jobinfo project_name job_name

Running this command will give you a short report of your job which includes
the current status of the job, the name of any controlling job, the date
and time when the job started, the wave number of the last or current run (an internal
InfoSphere DataStage reference number) and the user status.

You can get a more detailed report using the command below:

dsjob -report project job_name BASIC|DETAIL|XML

To access logs:
You can use the command below to get the list of the latest 5 fatal errors from
the log of the job that was just run:

dsjob -logsum -type FATAL -max 5 project_name job_name

You can get different types of information based on the keyword you specify
for -type. The full list of allowable types is available in the help guide for reference.

Removing the field marks in a DataStage sequence job

The following expression can be used to remove field marks while reading data from an Execute Command
stage in a DataStage sequence job:

Trim(Convert(@FM, "", ExeCmd.$CommandOutput))

Parameters to sequence jobs using a shell script

Parameters can be passed to a job sequence using a shell script.

The following is the command to run the sequence while passing the parameters:

dsjob -run -mode NORMAL -param param=value projectname SequenceName

Here,

"param" is the name of the parameter defined in the job properties of the sequence.
"value" is the value of the parameter that you want to pass to this sequence.

For More Real time concepts follow the below link:

http://mydatastagesolutions.blogspot.com/2015/04/how-to-test-odbc-connection-from-putty.html

What is a routine in DataStage?
A routine is a collection of functions defined in the DataStage Manager. There are basically three
types of routines in DataStage, namely the job control routine, the before/after subroutine, and
the transform function.

What is QualityStage in DataStage?

QualityStage is used for cleansing data with the DataStage tool. It is a client-server
software tool that is provided as part of IBM Information Server.

How to do DataStage job performance tuning?

First, we have to select the right configuration file. Then, we need to select the right partitioning
and buffer memory. We have to deal with the sorting of data and the handling of null values. We
should try to use Modify, Copy, or Filter instead of the Transformer, and reduce the propagation of
unnecessary metadata between the various stages.

What is a repository table in DataStage?


The term ‘repository’ is another name for a data warehouse. It can be centralized or
distributed. The repository table is used for answering ad-hoc, historical, analytical, or
complex queries.

Describe the DataStage architecture briefly.


IBM DataStage follows a client-server model as its architecture and has different
types of architecture for its various versions. The different components of the
client-server architecture are:

 Client components
 Servers
 Stages
 Table definitions
 Containers
 Projects
 Jobs

Name the command line functions to import and export the DS jobs?
The dsimport.exe function is used to import the DS jobs, and to export the DS jobs,
dsexport.exe is used.

What is Usage Analysis in DataStage?


If we want to check whether a certain job is part of the sequence, then we need to right-click on
the Manager on the job and then choose the Usage Analysis.

How do we clean a DataStage repository?


For cleaning a DataStage repository, we have to go to DataStage Manager > Job in the menu bar >
Clean Up Resources.

If we want to further remove the logs, then we need to go to the respective jobs and clean up the log files.

How do we call a routine in DataStage?

Routines are stored in the Routine branch of the DataStage repository. This is where we
can create, view, or edit all the Routines. The Routines in DataStage could be the
following: Job Control Routine, Before-after Subroutine, and Transform function.

Q #20) How do you import and export DataStage jobs?

Answer: The command-line functions for this are:
 Import: dsimport.exe
 Export: dsexport.exe

Q #26) What is the difference between passive stage and active stage?
Answers: Passive stages are utilized for extraction and loading whereas active stages are
utilized for transformation.

Q #27) What are the various kinds of containers available in Datastage?


Answers: We have below 2 containers in Datastage:
 Local container
 Shared container

Q #30) What is the use of Datastage director?


Answers: Through Datastage director, we can schedule a job, validate the job, execute the
job and monitor the job.

Q #32) What is a quality stage?
Answers: The quality stage (also called as integrity stage) is a stage that aids in combining
the data together coming from different sources.

Ref: https://www.naukri.com/learning/articles/top-datastage-interview-questions-and-answers/
