6. Week 6 Assignment 06
___________________________________________________________________________
Q. 1 HBase is a distributed ________ database built on top of the Hadoop file system.
A. Row-oriented
B. Tuple-oriented
C. Column-oriented
D. None of the mentioned
Answer: C) Column-oriented
A. SQL
B. NewSQL
C. NoSQL
D. None of the mentioned
Answer: C) NoSQL
Explanation: Apache Cassandra is a free and open-source, distributed, wide column store,
NoSQL database management system designed to handle large amounts of data across many
commodity servers, providing high availability with no single point of failure.
Q. 3 A small chunk of data residing in one machine, which is part of a cluster of machines
holding one HBase table, is known as __________________
A. Rowarea
B. Tablearea
C. Split
D. Region
Answer: D) Region
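A region holds a contiguous range of row keys, [start key, next region's start key), so finding the region for a row is a sorted-range lookup. The minimal Python sketch below assumes a hypothetical table split into four regions; the start keys and the server names rs1, rs2, rs3 are invented for illustration and are not HBase's own API.

```python
import bisect

# Hypothetical example: one HBase table split into four regions.
# Each region holds the contiguous row-key range [start_key, next_start_key).
REGION_START_KEYS = ["", "g", "n", "t"]        # sorted; "" means "from the first row"
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]  # hypothetical hosting machines

def locate_region(row_key: str) -> tuple[int, str]:
    """Return (region index, server) for the region whose range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before it is the one that contains row_key.
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return idx, REGION_SERVERS[idx]

print(locate_region("melon"))  # (1, 'rs2')
```

Several regions of the same table can live on one machine (rs1 above hosts two), which is how a table is spread across the cluster.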
Answer: D) Row, column family, column qualifier and contains a value and a timestamp
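A cell is addressed by row, column family and column qualifier, and it stores a value together with a timestamp (version). The hypothetical Python class below models that addressing as nested maps; it is a toy for illustration, not the HBase client API, and it assumes a read without a timestamp returns the newest version.

```python
from collections import defaultdict

class ToyHBaseTable:
    """Hypothetical in-memory model of the HBase data model:
    row -> column family -> column qualifier -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, qualifier, value, ts):
        # A new timestamp adds a new version; older versions are kept.
        self._rows[row][family][qualifier][ts] = value

    def get(self, row, family, qualifier, ts=None):
        # Use .get() so that lookups do not create empty entries.
        versions = self._rows.get(row, {}).get(family, {}).get(qualifier, {})
        if not versions:
            return None
        if ts is None:
            return versions[max(versions)]  # newest version by default
        return versions.get(ts)

table = ToyHBaseTable()
table.put("user1", "info", "name", "Alice", ts=1)
table.put("user1", "info", "name", "Alicia", ts=2)
print(table.get("user1", "info", "name"))        # Alicia
print(table.get("user1", "info", "name", ts=1))  # Alice
```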
Q. 5 A Bloom filter is a bit vector V with m entries and k (ideally independent) hash
functions. To insert a new element i, we set V[h_j(i)] = 1 for each of the k hash functions h_j. To
check whether an element i is in the set, we AND together the values V[h_j(i)] for all k hash
functions.
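The insert and check steps described above can be sketched in Python as follows. One assumption: instead of k truly independent hash functions, the sketch derives the k indexes h_j(i) = (h1 + j * h2) mod m from two slices of a single SHA-256 digest (double hashing), a common practical substitute.

```python
import hashlib

class BloomFilter:
    """Bit vector V of m entries with k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumption: simulate h_1..h_k by double hashing over one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force the step to be odd (non-zero)
        return [(h1 + j * h2) % self.m for j in range(self.k)]

    def add(self, item: str) -> None:
        # Insert: set V[h_j(i)] = 1 for every hash function h_j.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Check: AND together V[h_j(i)] over all k hash functions.
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=1000, k=7)
for key in ["row-1", "row-2", "row-3"]:
    bf.add(key)
print(bf.might_contain("row-2"))  # True: inserted keys are never missed
```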
The optimal number of hash functions is $k = \frac{m}{n}\ln 2$. For that number of hash functions, and
given a choice of false positive rate, the ratio $\frac{m}{n} = -\frac{\ln p}{(\ln 2)^2}$, where n is the number of keys in
the table and p is the false positive rate.
Compute the approximate number of bits per entry and the number of hash functions needed
for a false positive rate of 1%.
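Plugging p = 0.01 into the two formulas gives the requested values. A short Python check of the arithmetic (the function name bloom_parameters is just a label for this sketch):

```python
import math

def bloom_parameters(p: float) -> tuple[float, float]:
    """Return (bits per entry m/n, optimal number of hash functions k) for false positive rate p."""
    bits_per_entry = -math.log(p) / (math.log(2) ** 2)  # m/n = -ln p / (ln 2)^2
    k = bits_per_entry * math.log(2)                     # k = (m/n) ln 2
    return bits_per_entry, k

bits, k = bloom_parameters(0.01)
print(f"{bits:.2f} bits per entry, k = {k:.2f} -> {round(k)} hash functions")
# 9.59 bits per entry, k = 6.64 -> 7 hash functions
```

So roughly 9.6 bits per entry and 7 hash functions (k must be a whole number, so 6.64 is rounded up).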
Statement 1: Scale out means grow your cluster capacity by replacing with more powerful
machines.
Statement 2: Scale up means incrementally grow your cluster capacity by adding more
COTS machines (Components Off the Shelf).
Explanation:
Scale up: grow your cluster capacity by replacing machines with more powerful ones.
Scale out: incrementally grow your cluster capacity by adding more COTS machines
(Components Off the Shelf).
Q. 7 Cassandra uses a protocol called _______ to discover location and state information.
A. HBase
B. Gossip
C. Key-value
D. None of the mentioned
Answer: B) Gossip
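The idea behind gossip is that nodes periodically exchange state with randomly chosen peers, so knowledge of every node spreads through the cluster without a central coordinator. The Python simulation below is a simplified, hypothetical model using heartbeat counters and push-pull exchanges; it is not Cassandra's gossiper, which exchanges versioned endpoint state in a SYN/ACK/ACK2 handshake.

```python
import random

def merge(dst: dict, src: dict) -> None:
    """Copy into dst every entry from src that has a higher heartbeat (newer state wins)."""
    for node, heartbeat in src.items():
        if heartbeat > dst.get(node, -1):
            dst[node] = heartbeat

def gossip_round(tables: dict, rng: random.Random) -> None:
    nodes = list(tables)
    # Every node bumps its own heartbeat to signal that it is alive.
    for node in nodes:
        tables[node][node] += 1
    # Every node exchanges state with one randomly chosen peer (push-pull).
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merge(tables[node], tables[peer])
        merge(tables[peer], tables[node])

def run_until_converged(names, seed=0, max_rounds=1000):
    """Return (rounds, tables) once every node knows about every other node."""
    rng = random.Random(seed)
    tables = {name: {name: 0} for name in names}  # initially each node knows only itself
    for r in range(1, max_rounds + 1):
        gossip_round(tables, rng)
        if all(len(t) == len(names) for t in tables.values()):
            return r, tables
    return None, tables

rounds, tables = run_until_converged(["n1", "n2", "n3", "n4", "n5"])
print(f"all nodes discovered each other after {rounds} round(s)")
print(tables["n1"])
```

A node whose heartbeat stops increasing in everyone's table can later be suspected as failed, which is how gossip also carries state information, not just location.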
CAP Theorem:
1. Consistency: All nodes see the same data at any time, or reads return the latest written value by
any client.
2. Availability: The system allows operations all the time, and operations return quickly.
Answer: D) Memtable
Explanation: A memtable is an in-memory cache that holds the in-memory copy of the data. Each
node has a memtable for each CQL table. The memtable accumulates writes and serves
reads for data that has not yet been flushed to disk.
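The write and read paths around a memtable can be sketched as a toy Python class. Assumptions: a single table, a Python list standing in for the on-disk commit log, plain dicts standing in for immutable sorted SSTables, and a hypothetical flush_threshold (number of keys) that triggers the flush; real Cassandra decides when to flush based on memory and other settings.

```python
class ToyNode:
    """Hypothetical single-table node: commit log + memtable + flushed SSTables."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []   # append-only durability log (a list stands in for a file)
        self.memtable = {}     # recent writes held in memory
        self.sstables = []     # immutable sorted "on-disk" tables, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for crash recovery
        self.memtable[key] = value            # 2. apply it to the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a new sorted, immutable SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed data no longer needs these log entries

    def read(self, key):
        # Unflushed data is served from the memtable first...
        if key in self.memtable:
            return self.memtable[key]
        # ...then SSTables are checked from newest to oldest.
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

node = ToyNode(flush_threshold=2)
node.write("k1", "v1")
node.write("k2", "v2")      # memtable reaches the threshold and is flushed
node.write("k1", "v1-new")
print(node.read("k1"))      # v1-new (served from the memtable)
print(node.read("k2"))      # v2 (served from the SSTable)
```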
A. Client requests
B. Partitioner
C. Snitch
D. None of the mentioned
Answer: C) Snitch
Explanation: The snitch maps IPs to racks and data centers; it is configured in the cassandra.yaml config file.
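Cassandra offers several snitch implementations, selected with the endpoint_snitch setting in cassandra.yaml. The Python sketch below mimics only the rule documented for RackInferringSnitch (data center taken from the second octet and rack from the third octet of a node's IPv4 address); the function names and sample IPs are hypothetical, and this is not Cassandra's code.

```python
import ipaddress
from collections import defaultdict

def infer_location(ip: str) -> tuple[str, str]:
    """Return (data_center, rack) for an IPv4 address:
    the 2nd octet names the data center, the 3rd octet names the rack."""
    octets = ipaddress.IPv4Address(ip).packed  # 4 bytes, one per octet
    return str(octets[1]), str(octets[2])

def build_topology(ips):
    """Group node IPs by (data_center, rack)."""
    topology = defaultdict(list)
    for ip in ips:
        topology[infer_location(ip)].append(ip)
    return dict(topology)

nodes = ["10.1.1.11", "10.1.1.12", "10.1.2.21", "10.2.1.31"]
for (dc, rack), members in build_topology(nodes).items():
    print(f"DC {dc} / rack {rack}: {members}")
```

With this map, a replica placement strategy can, for example, avoid putting two copies of the same data on one rack.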
__________________________________________________________________________