
Cassandra and Data Handling

The document discusses various concepts related to Apache Cassandra, such as data modeling, compaction strategies, wide rows, real-time ingestion using Spark and Kafka, analytics using Spark, performance tuning and monitoring metrics, and data loading tools like sstableloader and the COPY command. It provides answers to questions on Cassandra features such as secondary indexes, consistency, replication factor, configuration files, APIs, and best practices for modeling Cassandra data.

Uploaded by Lynch George

Copyright © All Rights Reserved

The type of __________ strategy Cassandra performs on your data is configurable and can significantly affect read performance.


compaction
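Size-tiered compaction (STCS), Cassandra's default strategy, groups SSTables of similar size into buckets and compacts a bucket once it holds enough tables. The sketch below is a simplified Python model of that bucketing, assuming the STCS defaults `bucket_low = 0.5`, `bucket_high = 1.5` and `min_threshold = 4`. It ignores `min_sstable_size`, `max_threshold` and tombstone handling, so treat it as an illustration rather than Cassandra's implementation.

```python
def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size (simplified STCS model)."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A table joins a bucket if its size is within [low * avg, high * avg].
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough, so start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Buckets holding at least min_threshold tables would be compacted together."""
    return [b for b in stcs_buckets(sizes) if len(b) >= min_threshold]


if __name__ == "__main__":
    sizes_mb = [10, 11, 12, 10, 100, 105, 500]
    print(stcs_buckets(sizes_mb))           # [[10, 10, 11, 12], [100, 105], [500]]
    print(compaction_candidates(sizes_mb))  # [[10, 10, 11, 12]]
```

Four small tables of similar size form a bucket and become a compaction candidate. The two mid-sized tables wait until more tables of that size accumulate.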

In which of the following scenarios can we use 'Wide rows'?


All the options

What is the key that dictates how the rows are ordered on reads?
Comparator
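Within a partition, Cassandra stores rows sorted by their clustering columns (the column comparator in the older Thrift data model). That ordering is why reads come back sorted and why a single partition can grow into a "wide row". The toy class below is only an illustration: it keeps rows sorted with `bisect` so that range scans over the clustering key return ordered results. The class and method names are hypothetical.

```python
import bisect


class ToyPartition:
    """Hypothetical in-memory model of one partition: rows sorted by clustering key."""

    def __init__(self):
        self._keys = []   # clustering keys, always kept in sorted order
        self._rows = {}   # clustering key -> row values

    def upsert(self, clustering_key, values):
        # Cassandra writes are upserts: an existing key is overwritten in place.
        if clustering_key not in self._rows:
            bisect.insort(self._keys, clustering_key)
        self._rows[clustering_key] = values

    def scan(self, start=None, end=None):
        """Return (key, values) pairs with start <= key <= end, in clustering order."""
        lo = 0 if start is None else bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if end is None else bisect.bisect_right(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    p = ToyPartition()
    for day in ["2018-10-08", "2018-10-01", "2018-10-05"]:
        p.upsert(day, {"temp": 20})
    print([k for k, _ in p.scan()])                          # ['2018-10-01', '2018-10-05', '2018-10-08']
    print([k for k, _ in p.scan("2018-10-02", "2018-10-08")])  # ['2018-10-05', '2018-10-08']
```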

Which among the following is undesirable in a relational data model, but not in
Cassandra?
Denormalization
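Cassandra has no joins, so a common modeling practice is to denormalize: the same data is written into several tables, each laid out for one query. The Python sketch below models this with two dictionaries standing in for two query tables. The `videos_by_user` / `videos_by_tag` schema is a hypothetical example and is not part of the original material.

```python
from collections import defaultdict

# Two hypothetical query tables holding the same data, each keyed for one query.
videos_by_user = defaultdict(list)  # partition key: user
videos_by_tag = defaultdict(list)   # partition key: tag


def add_video(video):
    """Write one video to every table that serves a query (denormalized writes)."""
    videos_by_user[video["user"]].append(video)
    for tag in video["tags"]:
        videos_by_tag[tag].append(video)


if __name__ == "__main__":
    add_video({"id": 1, "user": "alice", "tags": ["cassandra", "nosql"]})
    add_video({"id": 2, "user": "bob", "tags": ["cassandra"]})
    # Each query reads a single partition instead of joining tables.
    print([v["id"] for v in videos_by_tag["cassandra"]])  # [1, 2]
    print([v["id"] for v in videos_by_user["alice"]])     # [1]
```

The price of this approach is extra writes and storage. In return, each read touches a single partition.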

Point out the correct statement:


All of the mentioned

Cassandra searches the __________ to determine the approximate location on disk of the index entry.
All of the mentioned

Cassandra searches the __________ to determine the approximate location on disk of the index entry.
partition summary
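The read path narrows the search in stages. The partition summary is an in-memory sample of the partition index. Cassandra binary-searches the summary to find the slice of the on-disk partition index to scan, and the matching index entry then gives the partition's offset in the data file. The Python sketch below models those two steps with sorted lists. It assumes keys are compared directly (Cassandra actually orders partitions by token), and it skips the Bloom filter and key cache that are consulted earlier. The sampling interval here is a toy value; Cassandra's default minimum interval (`min_index_interval`) is 128 entries.

```python
import bisect


def build_summary(index, interval):
    """Sample every `interval`-th partition index entry as (key, position in index)."""
    return [(index[i][0], i) for i in range(0, len(index), interval)]


def find_data_offset(key, index, summary):
    """Return the data-file offset for `key`, or None if the key is absent."""
    summary_keys = [k for k, _ in summary]
    # 1. Binary-search the summary for the last sampled key <= key.
    pos = bisect.bisect_right(summary_keys, key) - 1
    if pos < 0:
        return None
    # 2. Scan only the slice of the partition index between two samples.
    start = summary[pos][1]
    end = summary[pos + 1][1] if pos + 1 < len(summary) else len(index)
    for k, data_offset in index[start:end]:
        if k == key:
            return data_offset
    return None


if __name__ == "__main__":
    # Partition index: (partition key, offset of the partition in the data file).
    partition_index = [("a", 0), ("c", 100), ("e", 250), ("g", 400), ("i", 520)]
    summary = build_summary(partition_index, interval=2)
    print(summary)                                          # [('a', 0), ('e', 2), ('i', 4)]
    print(find_data_offset("g", partition_index, summary))  # 400
    print(find_data_offset("b", partition_index, summary))  # None
```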

Sqoop works via a JDBC connection


True

___________________ is the tool that works with Cassandra to make data transfer from RDBMS systems possible.
Sqoop

Cassandra has support for which among the following RDBMS systems?


All the options

Cassandra supports data transfer from RDBMS systems into and out of Cassandra
True

________ is used to ingest data into Cassandra in real time while working with Spark.
Spark Streaming

In real-time processing, the data under consideration is data in motion


True

Real-time Analytics Use cases include


All the options

Which among the following is possible in the Cassandra-Spark combination and not delivered as a feature of Cassandra alone?
All the options

Does Cassandra support JOIN operations?


Yes, it supports JOINs when used in conjunction with Spark.
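Cassandra itself has no JOIN. When Cassandra is used with Spark, both tables are read into Spark and the join happens there. A hash join is one of the strategies Spark can use for this. The plain-Python sketch below shows the idea on two small lists of rows (the `users` and `orders` data is made up for illustration). It is not Spark's API and not the Spark Cassandra Connector's API.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of dict rows on `key` using a hash table."""
    # Build phase: index the right-hand rows by join key.
    table = defaultdict(list)
    for row in right:
        table[row[key]].append(row)
    # Probe phase: look up each left-hand row's key and merge matching rows.
    return [{**l_row, **r_row} for l_row in left for r_row in table.get(l_row[key], [])]


if __name__ == "__main__":
    users = [{"user_id": 1, "name": "alice"}, {"user_id": 2, "name": "bob"}]
    orders = [{"user_id": 1, "order_id": 10}, {"user_id": 1, "order_id": 11}]
    for row in hash_join(users, orders, "user_id"):
        print(row)
```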

____________ is used to connect Spark and Cassandra


Spark connector

Which of the Hadoop components enables you to run analytics on your Cassandra data?
All the options

What is the purpose of using Thrift in Cassandra?


facilitate access to the DB

What is SuperColumn in Cassandra?


unique element

Select the best advantage of analytics with Cassandra.


Fraud detection

What is the use of the SOURCE command in Cassandra?


execute a file

ColumnFamily refers to a structure having an infinite number of rows.


True

What is the default log level used by Log4J in Cassandra?


INFO

What is the optimum number of concurrent_reads per processor core?


4

Key Cassandra metrics that are important in performance monitoring


All the options

What is the biggest performance gain in Cassandra write operations?


commit log on a separate disk

Which is the least verbose logging level in Cassandra?


ERROR
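Cassandra's logging levels run from most to least verbose as TRACE, DEBUG, INFO, WARN, ERROR. A logger emits only messages at or above its configured threshold. Recent Cassandra versions configure this through Logback in `logback.xml`, while older versions used Log4J. The small Python sketch below models that filtering rule as an illustration; it is not Cassandra's logging code.

```python
# Levels ordered from most verbose to least verbose.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]


def emitted(threshold, messages):
    """Return the messages a logger set to `threshold` would emit."""
    cutoff = LEVELS.index(threshold)
    return [text for level, text in messages if LEVELS.index(level) >= cutoff]


if __name__ == "__main__":
    msgs = [("TRACE", "t"), ("DEBUG", "d"), ("INFO", "i"), ("WARN", "w"), ("ERROR", "e")]
    print(emitted("INFO", msgs))   # ['i', 'w', 'e']
    print(emitted("ERROR", msgs))  # ['e']
```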

In Cassandra, consistency is achieved through consistency tuning mechanisms


True
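Tunable consistency comes down to how many replicas must acknowledge a read (R) and a write (W) for a given replication factor (RF). A quorum is floor(RF / 2) + 1 replicas. When R + W > RF, every read overlaps at least one replica that acknowledged the latest write. The Python helpers below simply encode that arithmetic.

```python
def quorum(rf):
    """Number of replicas in a quorum: a majority of the replication factor."""
    return rf // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, rf):
    """True when every read replica set must overlap every write replica set."""
    return read_replicas + write_replicas > rf


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)                             # 2
    print(reads_see_latest_write(q, q, rf))    # True  (QUORUM reads + QUORUM writes)
    print(reads_see_latest_write(1, 1, rf))    # False (ONE reads + ONE writes)
```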

Which among the following is not a performance measurement tool


None

Use a secondary index if you want to query a column that is not the primary key or part of a composite key
True

Which among the following is a high-level goal of your data model?


minimizing data duplication -- wrong

_________ is a Cassandra feature that optimizes the cluster consistency process


Hinted handoff
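With hinted handoff, when a replica is down the coordinator stores a hint for the missed write and replays it once the replica comes back. This only happens if the outage is shorter than `max_hint_window_in_ms`, which defaults to 3 hours in `cassandra.yaml`. The Python class below is a toy model of that behaviour with hypothetical names. It leaves out hint TTLs, throttling, and the requirement that the write still meet its consistency level.

```python
from collections import defaultdict

MAX_HINT_WINDOW_S = 3 * 60 * 60  # mirrors the 3-hour default of max_hint_window_in_ms


class ToyCoordinator:
    """Hypothetical model of a coordinator that stores and replays hints."""

    def __init__(self):
        self.hints = defaultdict(list)   # down replica -> mutations it missed
        self.down_since = {}             # replica -> time it was marked down (seconds)
        self.stores = defaultdict(list)  # replica -> mutations it has applied

    def mark_down(self, replica, now):
        self.down_since.setdefault(replica, now)

    def write(self, replicas, mutation, now):
        for r in replicas:
            if r in self.down_since:
                # Store a hint only while the replica has been down for less than the window.
                if now - self.down_since[r] < MAX_HINT_WINDOW_S:
                    self.hints[r].append(mutation)
            else:
                self.stores[r].append(mutation)

    def mark_up(self, replica):
        self.down_since.pop(replica, None)
        # Replay the stored hints to the recovered replica.
        self.stores[replica].extend(self.hints.pop(replica, []))
```

Writes that arrive after the window has passed are not hinted. A replica that was down that long needs a repair to catch up.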

When should you avoid using secondary indexes?


can use it on any column without affecting performance -- wrong

Using sstableloader we can load


both pre-existing sstables and external data

What can also be described as a wide row in Apache Cassandra?


Compound Key

What is the replication factor in Cassandra?
number of data copies existing
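The replication factor is the number of nodes that hold a copy of each row. With SimpleStrategy, the partition key's token selects the first node on the ring whose token is greater than or equal to it. The remaining replicas are the next distinct nodes found by walking the ring clockwise. The Python function below sketches that placement for a small hypothetical ring. NetworkTopologyStrategy (rack and data-center awareness) is not modeled.

```python
import bisect


def replicas_for(token, ring, rf):
    """Pick `rf` distinct nodes for `token` on a ring of (token, node) pairs, SimpleStrategy-style."""
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    distinct_nodes = len({node for _, node in ring})
    # The owner is the first node whose token is >= the key's token (wrapping around).
    i = bisect.bisect_left(tokens, token) % len(ring)
    replicas = []
    while len(replicas) < min(rf, distinct_nodes):
        node = ring[i % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        i += 1
    return replicas


if __name__ == "__main__":
    ring = [(0, "A"), (100, "B"), (200, "C"), (300, "D")]
    print(replicas_for(150, ring, rf=3))  # ['C', 'D', 'A']
    print(replicas_for(350, ring, rf=2))  # ['A', 'B']
```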

Command used to import data from a CSV file into an existing table
COPY FROM

Which directory contains the Cassandra configuration files?


conf

Which among the following can be used as a COPY option?


All the options

While loading external data into a cluster


both the options -- wrong

State whether the statement is true or false: Cassandra runs on Red Hat.


True

COPY command can be used to read data


All the options

Cassandra has API support for which of the following


All the options

sstableloader uses __________ protocol to learn the topology of the cluster.


gossip

Using sstableloader, loading data into a live, active cluster is not allowed.
False

Partition index is list of partition keys and the start position of rows in the
data file (on disk).
True

What is the default partitioner in an Apache Cassandra cluster?


Murmur3Partitioner

What kind of files can be imported or exported using the COPY command?
CSV
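In cqlsh, a command such as `COPY shop.users (id, name) FROM 'users.csv' WITH HEADER = true;` reads a CSV file and inserts one row per line. The keyspace, table and file names here are hypothetical. The Python sketch below mimics only the parsing half of that command using the standard `csv` module, including the `HEADER` and `DELIMITER` options. It does not connect to Cassandra.

```python
import csv
import io


def parse_copy_from(csv_text, columns, header=False, delimiter=","):
    """Turn CSV text into row dicts keyed by `columns`, loosely like COPY ... FROM."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    if header:
        next(reader, None)  # HEADER = true: the first line holds column names, not data
    rows = []
    for fields in reader:
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}: {fields}")
        rows.append(dict(zip(columns, fields)))
    return rows


if __name__ == "__main__":
    text = 'id,name\n1,alice\n2,"bob, jr"\n'
    for row in parse_copy_from(text, ["id", "name"], header=True):
        print(row)
```

Quoted fields that contain the delimiter, such as `"bob, jr"`, are handled by the `csv` module's default quoting rules.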

Is there any relation between the directory that holds the SSTables and the name of the keyspace of the SSTables?
Both names have to be the same
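sstableloader takes the directory that contains the SSTables and infers the target keyspace and table from the last two directory names in that path. This is why the directory has to be named after the keyspace (and table). The Python helper below sketches that naming rule with `pathlib`. The helper name and the example path are hypothetical.

```python
from pathlib import PurePosixPath


def target_of(sstable_dir):
    """Return (keyspace, table) from a path ending in <keyspace>/<table>."""
    parts = [p for p in PurePosixPath(sstable_dir).parts if p != "/"]
    if len(parts) < 2:
        raise ValueError("path must end in <keyspace>/<table>")
    return parts[-2], parts[-1]


if __name__ == "__main__":
    print(target_of("/var/tmp/load/shop/orders/"))  # ('shop', 'orders')
```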

Tool that streams sstables to a live cluster


sstableloader

What is the need for a partition key?


decompression -- wrong

Real-time data ingestion in Cassandra can be done using


both Spark and Kafka

It is wise to use secondary indexes on columns you want to query that have few unique values.
True

Hi Shiva,

Thanks. I completed it yesterday and sent you the dumps.

Thanks,
Hiren Kalavadia
Mob: +91 75670 70987
TCS – Comcast Relationship

From: Gande, Shiva (Contractor)
Sent: Monday, October 8, 2018 4:35 AM
To: Kalavadia, Hiren (Contractor)
Subject: Cassandra Data Modeling - I got 18 out of 25; some may be wrong

Point out the correct statement


All the options

Cassandra searches the __________ to determine the approximate location on disk of the index entry.
partition summary

Which among the following is undesirable in a relational data model, but not in
Cassandra?
Denormalization

What is the key that dictates how the rows are ordered on reads?
Comparator

In which of the following scenarios can we use 'Wide rows'


All the options

Cassandra searches the __________ to determine the approximate location on disk of the index entry.
partition record -- Wrong
partition search -- Wrong

The type of __________ strategy Cassandra performs on your data is configurable and
can significantly affect read performance
compaction

Cassandra has support for which among the following RDBMS systems?


All the options

Cassandra supports data transfer from RDBMS systems into and out of Cassandra
True

What can also be described as a wide row in Apache Cassandra?
Clustering Key

Real-time data ingestion in Cassandra can be done using


both Spark and Kafka

What is the SOURCE command in Cassandra used for?


execute a file

In real-time processing, the data under consideration is data in motion


True

Which among the following is possible in the Cassandra-Spark combination and not delivered as a feature of Cassandra alone?
All the options

Cassandra supports joins while working in conjunction with Spark


True

________ is used to connect Spark and Cassandra


Spark Combiner -- Wrong

Does Cassandra support JOIN operations?


Yes, it supports JOINs when used in conjunction with Spark

________ is used to ingest data into Cassandra in real time while working with Spark

Spark Streaming

Real-time Analytics Use cases include


All the options

Select the best advantage of analytics with Cassandra


Fraud detection

Which of the Hadoop components enables you to run analytics on your Cassandra data?
All the options

ColumnFamily refers to a structure having an infinite number of rows


True

What is SuperColumn in Cassandra


column keys

What is the purpose of using Thrift in Cassandra


access to DB

Which is the least verbose logging level in Cassandra?


error

What is the default log level used by Log4J in Cassandra?


INFO
What is the biggest performance gain in Cassandra write operations?
commit log on a separate disk

Which among the following is not a performance measurement tool


iostat

In Cassandra, consistency is achieved through consistency tuning mechanisms


True

Key Cassandra metrics that are important in Performance monitoring


All the options

What is the optimum number of concurrent_reads per processor core


4

Use a secondary index if you want to query a column that is not the primary key or part of a composite key
True

Which among the following is true


All the options

What is the best method to store row data in sorted order?


use a primary key

What is the need for a partition key?


identify the partition

Which among the following is true about Thrift API


used to read and write to DB

When should you avoid using secondary indexes?


low count of unique values

sstableloader uses __________ protocol to learn the topology of the cluster


all of the mentioned

_________ is a Cassandra feature that optimizes the cluster consistency process


Hinted handoff

What is Replication Factor in Cassandra


number of data copies existing

Which among the following is a high-level goal of your data model?


minimize data duplication

Using sstableloader external data cannot be loaded into the cluster


false

Which among the following is true about the COPY command?


all the above

Cassandra supports which of the below APIs to retrieve and manipulate data?
Thrift API

COPY command can be used to read data


All the options

You can enable or disable hinted handoff in the cassandra.yaml file
true

JMX stands for


Java Management Extensions

Which directory contains the Cassandra configuration files?


conf

While loading external data into a cluster


both the options

Which of the following is used to load the data in batch


all the options

Using sstableloader we can load


external data

Command used to import data from a CSV file into an existing table
COPY FROM

Thanks,
Shiva Gande
Cell : +1 610 998 5523
Desk : +1 856 792 2288
TCS – Comcast Relationship
