6. Week 6 Assignment 06

The document contains solutions to a quiz on Cloud Computing and Distributed Systems, covering topics such as HBase, Apache Cassandra, and the CAP theorem. It includes multiple-choice questions with correct answers and explanations related to distributed databases and their functionalities. Key concepts discussed include data storage structures, scaling strategies, and the role of components like memtables and snitches in Cassandra.

Uploaded by

lokshanadv7

Quiz Assignment-VI Solutions: Cloud Computing and Distributed Systems (Week-6)

___________________________________________________________________________

Q. 1 HBase is a distributed ________ database built on top of the Hadoop file system.

A. Row-oriented
B. Tuple-oriented
C. Column-oriented
D. None of the mentioned

Answer: C) Column-oriented

Q. 2 Apache Cassandra is a massively scalable open source _______ database.

A. SQL
B. NewSQL
C. NoSQL
D. None of the mentioned

Answer: C) NoSQL

Explanation: Apache Cassandra is a free and open-source, distributed, wide column store,
NoSQL database management system designed to handle large amounts of data across many
commodity servers, providing high availability with no single point of failure.

Q. 3 A small chunk of data residing on one machine, which is part of a cluster of machines
holding one HBase table, is known as a __________________

A. Rowarea
B. Tablearea
C. Split
D. Region

Answer: D) Region

Q. 4 A cell in an HBase table is a combination of _________________________

A. Row and column family
B. Row, column family and column qualifier
C. Row, column family, column qualifier and row keys
D. Row, column family, column qualifier and contains a value and a timestamp

Answer: D) Row, column family, column qualifier and contains a value and a timestamp
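
The cell addressing in Q. 4 can be illustrated with a small in-memory model. This is only a sketch of the idea, not the HBase client API: the class `ToyTable` and its `put`/`get` methods are hypothetical names, and the timestamps are plain integers chosen for the example.

```python
import time
from collections import defaultdict


class ToyTable:
    """Toy model of HBase-style cell addressing (hypothetical, not the HBase API)."""

    def __init__(self):
        # (row, column family, column qualifier) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, family, qualifier, value, timestamp=None):
        # Each write creates a new version of the cell, keyed by its timestamp.
        ts = timestamp if timestamp is not None else time.time_ns()
        self._cells[(row, family, qualifier)][ts] = value
        return ts

    def get(self, row, family, qualifier, timestamp=None):
        versions = self._cells.get((row, family, qualifier))
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # default to the newest version
        return versions.get(timestamp)


table = ToyTable()
table.put("user1", "info", "email", "old@example.com", timestamp=1)
table.put("user1", "info", "email", "new@example.com", timestamp=2)
print(table.get("user1", "info", "email"))               # newest version
print(table.get("user1", "info", "email", timestamp=1))  # older version
```

The point of the sketch is that the row, column family, and column qualifier locate the cell, while the timestamp selects one version of its value.
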

Q. 5 A Bloom filter is a bit vector V with m entries and k (ideally independent) hash
functions. To insert a new element i, we set V[h_j(i)] = 1 for each of the k hash functions h_j. To
check whether an element i is in the set, we AND together each value V[h_j(i)] for all k hash
functions.
The optimal number of hash functions is $k = \frac{m}{n} \ln 2$. For that number of hash functions, and
given a choice of false positive rate, the ratio $\frac{m}{n} = -\frac{\ln p}{(\ln 2)^2}$, where n is the number of keys in
the table and p is the false positive rate.

Compute the approximate number of bits per entry and the number of hash functions needed
for a false positive rate of 1%.

A. 6.6 bits per key and 6 hash functions
B. 9.6 bits per key and 7 hash functions
C. 12.6 bits per key and 12 hash functions
D. 9.6 bits per key and 6 hash functions

Answer: B) 9.6 bits per key and 7 hash functions


Explanation: For p = 0.01 (i.e., 1%), we need $\frac{m}{n} = -\frac{\ln 0.01}{(\ln 2)^2} \approx 9.6$. So we need about 9.6
times as many entries as there are keys, which is 9.6 bits per key (about 1.2 billion bytes for a
table of one billion keys). The number of hash functions is $k = \frac{m}{n} \ln 2 \approx 9.6 \ln 2 \approx 6.6$.
Of course, we cannot use 6.6 hash functions. There are a few ways to handle this, but a natural
one is to round up to 7 hash functions.
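
The arithmetic above can be checked with a few lines of Python, together with a minimal Bloom filter sized from the same formulas. This is a sketch under stated assumptions: `bloom_parameters` and `BloomFilter` are illustrative names, the k hash functions are simulated by salting SHA-256 with the index j (one possible choice, not something the quiz prescribes), and each bit is stored in a whole byte for readability.

```python
import hashlib
import math


def bloom_parameters(p):
    """Return (bits per key m/n, ideal number of hash functions k) for rate p."""
    bits_per_key = -math.log(p) / (math.log(2) ** 2)
    k = bits_per_key * math.log(2)
    return bits_per_key, k


class BloomFilter:
    """Minimal Bloom filter sized for n_keys at false positive rate p."""

    def __init__(self, n_keys, p):
        bits_per_key, k = bloom_parameters(p)
        self.m = math.ceil(bits_per_key * n_keys)  # number of entries in V
        self.k = math.ceil(k)                      # round k up, as in the explanation
        self.bits = bytearray(self.m)              # one byte per bit, for clarity

    def _positions(self, item):
        # Assumption: derive h_j(item) by salting SHA-256 with the index j.
        for j in range(self.k):
            digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # AND together V[h_j(item)] over all k hash functions.
        return all(self.bits[pos] for pos in self._positions(item))


bits_per_key, k = bloom_parameters(0.01)
print(round(bits_per_key, 1), round(k, 1), math.ceil(k))  # 9.6 6.6 7

bf = BloomFilter(n_keys=1000, p=0.01)
for word in ("hbase", "cassandra", "gossip"):
    bf.add(word)
print("cassandra" in bf)  # True: inserted keys are always reported as present
```

A lookup for a key that was never inserted usually returns False, but it can return True with probability of roughly p. That is the false positive rate the sizing formula controls.
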

Q. 6 Consider the following statements:

Statement 1: Scale out means growing your cluster capacity by replacing machines with more
powerful ones.

Statement 2: Scale up means incrementally growing your cluster capacity by adding more
COTS (commercial off-the-shelf) machines.

A. Only statement 1 is true
B. Only statement 2 is true
C. Both statements are true
D. Both statements are false

Answer: D) Both statements are false


Explanation: The two definitions are swapped. The correct statements are:

Scale up: grow your cluster capacity by replacing machines with more powerful ones.

Scale out: incrementally grow your cluster capacity by adding more COTS
(commercial off-the-shelf) machines.

Q. 7 Cassandra uses a protocol called _______ to discover location and state information.

A. HBase
B. Gossip
C. Key-value
D. None of the mentioned

Answer: B) Gossip
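
To give a feel for how gossip spreads state, here is a toy simulation. It is a simplified sketch and not Cassandra's actual protocol: the functions `gossip_round` and `run_until_converged`, the node names, and the single integer "heartbeat version" per node are all assumptions made for illustration. In each round every node contacts one random peer and the two merge what they know.

```python
import random


def gossip_round(views, rng):
    """One round: each node contacts a random peer and both merge their views."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        # Keep the highest heartbeat version seen for every known node.
        merged = dict(views[node])
        for name, version in views[peer].items():
            merged[name] = max(merged.get(name, 0), version)
        views[node] = dict(merged)
        views[peer] = dict(merged)


def run_until_converged(node_names, seed=0, max_rounds=1000):
    """Gossip until every node knows about every other node (or give up)."""
    rng = random.Random(seed)
    # Initially each node knows only its own heartbeat version.
    views = {name: {name: 1} for name in node_names}
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == len(node_names) for view in views.values()):
            return round_no, views
    return None, views


names = [f"node{i}" for i in range(8)]
rounds, views = run_until_converged(names)
print("rounds until every node knows every other node:", rounds)
```

No node needs a central directory: information about each node reaches the whole cluster through repeated pairwise exchanges.
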

Q. 8 Fill the correct choices for the given scenarios:

P: _________________ Reads/writes complete reliably and quickly.

Q: __________________ When thousands of customers are looking to book a flight, all
updates from any client (e.g., booking a flight) should be accessible to other clients.

R: _________________ Can happen across datacenters when the Internet gets disconnected.

A. P: Availability, Q: Consistency, R: Partition tolerance
B. P: Consistency, Q: Availability, R: Partition tolerance
C. P: Partition tolerance, Q: Consistency, R: Availability
D. P: Consistency, Q: Partition tolerance, R: Availability

Answer: A) P: Availability, Q: Consistency, R: Partition tolerance


Explanation:

CAP Theorem:

1. Consistency: All nodes see the same data at any time, or reads return the latest value
written by any client.

2. Availability: The system allows operations all the time, and operations return quickly.

3. Partition-tolerance: The system continues to work in spite of network partitions.

Q. 9 ________________ is a memory cache that stores the in-memory copy of the data. It
accumulates writes and serves reads for data that is not yet stored on disk.

A. Distributed hash tables (DHT)
B. Collection
C. SSTable
D. Memtable

Answer: D) Memtable

Explanation: A memtable is a memory cache that stores the in-memory copy of the data. Each
node has a memtable for each CQL table. The memtable accumulates writes and serves
reads for data that is not yet stored on disk.
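
The memtable's role can be sketched with a toy store. This is an illustration under assumptions rather than Cassandra's implementation: `ToyStore`, its `flush_threshold` parameter, and the use of sorted tuples to stand in for on-disk SSTables are all hypothetical.

```python
class ToyStore:
    """Toy write path: writes land in a memtable, which is flushed when full."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memtable = {}   # in-memory writes not yet on "disk"
        self.sstables = []   # flushed, immutable, sorted snapshots (newest last)

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable table and start a fresh one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        # Data not yet flushed is served from the memtable first.
        if key in self.memtable:
            return self.memtable[key]
        # Otherwise search flushed tables from newest to oldest.
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


store = ToyStore(flush_threshold=2)
store.write("a", 1)
store.write("b", 2)     # second write reaches the threshold and triggers a flush
store.write("a", 3)     # newer value for "a" lives only in the memtable
print(store.read("a"))  # 3, served from the memtable
print(store.read("b"))  # 2, served from the flushed table
```

Reading the memtable first is what lets recent writes be visible before they reach disk.
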

Q. 10 In Cassandra, the ______________'s job is to determine which data centers and racks
data should be read from and written to.

A. Client requests
B. Partitioner
C. Snitch
D. None of the mentioned

Answer: C) Snitch

Explanation:

A snitch maps IP addresses to racks and data centers. It is configured in the cassandra.yaml
config file.
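
The mapping idea can be sketched in Python. This is a toy illustration, not Cassandra code: the functions `parse_topology` and `sort_by_proximity`, the IP addresses, and the `ip=DC:RACK` line format (loosely modeled on a property-file style topology listing) are assumptions made for the example.

```python
def parse_topology(text):
    """Parse lines of the form 'ip=DC:RACK' into {ip: (dc, rack)}."""
    topology = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ip, location = line.split("=", 1)
        dc, rack = location.split(":", 1)
        topology[ip.strip()] = (dc.strip(), rack.strip())
    return topology


def sort_by_proximity(local_ip, replica_ips, topology):
    """Order replicas: same rack first, then same data center, then the rest."""
    local_dc, local_rack = topology[local_ip]

    def distance(ip):
        dc, rack = topology[ip]
        if dc != local_dc:
            return 2
        return 0 if rack == local_rack else 1

    return sorted(replica_ips, key=distance)


# Hypothetical addresses and names, for illustration only.
TOPOLOGY = """
10.0.0.1=DC1:RAC1
10.0.0.2=DC1:RAC1
10.0.0.3=DC1:RAC2
10.1.0.1=DC2:RAC1
"""

topology = parse_topology(TOPOLOGY)
print(topology["10.0.0.3"])  # ('DC1', 'RAC2')
print(sort_by_proximity("10.0.0.1", ["10.1.0.1", "10.0.0.3", "10.0.0.2"], topology))
# ['10.0.0.2', '10.0.0.3', '10.1.0.1']
```

Once each address resolves to a data center and rack, a node can prefer nearby replicas for reads and spread writes across racks.
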

__________________________________________________________________________