HCIA-Cloud Computing-Chapter3
Data is the most important asset for every user. This chapter describes how and where
data is stored, and introduces the key data storage technologies in cloud computing.
However, other problems remain, such as low storage space utilization, decentralized
data management, and inconvenient data sharing. We will learn how network shared
storage, such as SAN and NAS, solves these pain points.
CIFS is a network file system protocol used for sharing files and printers between
machines on a network. It is mainly used to share network files between hosts running
Windows.
NAS is a file-level storage architecture that meets the requirements of work teams and
departments on quick storage capacity expansion. Currently, NAS is widely used to share
documents, images, and movies. NAS supports multiple protocols (such as NFS and CIFS)
and supports various OSs. Users can conveniently manage NAS devices from a web browser,
such as Internet Explorer or Netscape, on any workstation.
With the development of SAN technologies, three SAN types have become available: FC SAN,
IP SAN, and SAS SAN. The following describes FC SAN and IP SAN.
3.1.4.3.1 Introduction to FC SAN
and Fibre Channel over Ethernet SAN (FCoE SAN). Currently, FC SAN and IP SAN
technologies are mature, and FCoE SAN is still in the early stage of its development.
A disk array combines multiple physical disks into a single logical unit. Each disk array
consists of one controller enclosure and multiple disk enclosures. This architecture
delivers an intelligent storage space featuring high availability, high performance, and
large capacity.
For high data availability and security, a centralized storage system uses RAID
technology. RAID can be implemented in hardware or in software, but either way all disks
must be deployed on the same server (hardware RAID requires a unified RAID card, and
software RAID requires a unified OS). The disks of a distributed storage system are
spread across different servers, so the RAID mechanism cannot be used.
Therefore, a distributed storage system introduces a copying (replication) mechanism to
ensure high data reliability. The mechanism copies data and stores the copies on
different servers, so if one server fails, the data is not lost.
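The copying mechanism can be illustrated with a minimal sketch. The server names, the replica count of 3, and the round-robin placement rule are all assumptions made for illustration; the text does not specify how a particular product places its copies.

```python
def place_replicas(block_id: int, servers: list[str], copies: int = 3) -> list[str]:
    """Pick `copies` distinct servers for a block, round-robin from block_id.

    Hypothetical placement rule, not a specific product's algorithm.
    """
    if copies > len(servers):
        raise ValueError("need at least as many servers as copies")
    start = block_id % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(copies)]


def survives(placement: list[str], failed: set[str]) -> bool:
    """Data survives if at least one copy sits on a healthy server."""
    return any(server not in failed for server in placement)


# Hypothetical four-server cluster.
servers = ["server-a", "server-b", "server-c", "server-d"]
placement = place_replicas(5, servers)
```

With this placement, the block stays readable when any single server fails, because the other two copies are on different servers.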
addition, many remote and disaster recovery solutions are also developed based on the IP
network, allowing users to expand the physical scope of their storage infrastructure.
Internet SCSI (iSCSI), Fibre Channel over IP (FCIP), and Fibre Channel over Ethernet
(FCoE) are the major IP SAN protocols.
⚫ iSCSI encapsulates SCSI I/Os into IP packets and transmits them over TCP/IP. iSCSI is
widely used to connect servers and storage devices because it is cost-effective and
easy to implement, especially in environments without FC SAN.
⚫ FCIP allows FCIP entities, such as FCIP gateways, to implement FC switching over IP
networks. FCIP combines the advantages of FC SAN and the mature, widely-used IP
infrastructure. This gives enterprises a better way to use existing investments and
technologies for data protection, storage, and migration.
⚫ FCoE achieves I/O consolidation. Usually, one server in a data center is equipped with
two to four NICs and HBAs for redundancy. In a data center with hundreds of servers,
the large number of adapters, cables, and switches makes the environment complex and
difficult to manage and expand. FCoE achieves I/O consolidation via FCoE switches and
Converged Network Adapters (CNAs). CNAs replace the NICs and HBAs on the servers and
consolidate IP traffic and FC traffic. Servers then no longer need several kinds of
network adapters and many independent networks, so fewer NICs, cables, and switches
are required. This greatly lowers costs and management overheads.
Block storage is high-performance network storage, but data cannot be shared between
hosts in block storage. Some enterprise workloads require data or file sharing between
different types of clients, which block storage cannot provide.
3.1.5.3.2 File Storage
File storage provides file-based, client-side access over the TCP/IP protocol. In file storage,
data is transferred via file I/Os in the local area network (LAN). A file I/O is a high-level
request for accessing a specific file. For example, a client can access a file by specifying
the file name, location, or other attributes. The NAS system records the locations of files
on disks and converts the client's file I/Os to block I/Os to obtain data.
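The conversion from a file I/O to block I/Os can be sketched as follows. This is a simplified illustration, assuming a 4 KB block size and a file stored in one contiguous run of blocks; the file table and path are hypothetical, and real file systems track file locations with more elaborate metadata.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


def file_io_to_block_io(first_disk_block: int, offset: int, length: int) -> list[int]:
    """Return the disk blocks that hold bytes [offset, offset + length) of a file.

    Assumes the file occupies contiguous blocks starting at first_disk_block.
    """
    if length <= 0:
        return []
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return [first_disk_block + i for i in range(first, last + 1)]


# Hypothetical file table kept by the NAS system: file name -> first disk block.
file_table = {"/share/report.txt": 1000}

# The client asks for 4000 bytes starting at byte 5000 of the named file.
blocks = file_io_to_block_io(file_table["/share/report.txt"], offset=5000, length=4000)
```

The client only names the file and the byte range; the block numbers are worked out on the storage side.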
File storage is a commonly used type of storage for desktop users. When you open and
close a document on your computer, you are using the file system. Clients can access
file systems on the file storage for file upload and download. Protocols used for file
sharing between clients and storage include CIFS (SMB) and NFS. In addition to file
sharing, file storage also provides file management functions, such as reliability
maintenance and file access control. Although managing file storage differs from
managing local files, file storage basically appears to users as a directory, and it can
be used in almost the same way as local files.
Because NAS access requires the conversion of file system format, it is not suitable for
applications using blocks, especially database applications that require raw devices.
File storage has the following advantages:
⚫ Comprehensive information access: Local directories and files can be accessed by
users on other computers over a LAN. Multiple end users can collaborate on the
same files, such as project documents and source code.
⚫ Good flexibility: NAS is compatible with both Linux and Windows clients.
HCIA-Cloud Computing V5.0 Learning Guide
storage cannot do this. In addition, block storage is complex and costly because
additional components, such as FC components and HBAs, need to be purchased.
File systems are deployed on file storage devices, and users access specific files, for
example, opening, reading from, writing to, or closing a file. File storage maps file
operations to disk operations, and users do not need to know the exact disk block where
the file resides. Data is exchanged between users and file storage over the Ethernet in a
LAN. File storage is easy to manage and supports comprehensive information access. One
can share files by simply connecting the file storage devices to a LAN. This makes file
sharing and collaboration more efficient. However, file storage is not suitable for
applications that demand block devices, especially database systems. This is because
file storage requires file system format conversion, and users access specific files
rather than raw data blocks.
Object storage uses a content addressing system to simplify storage management,
ensuring that the stored content is unique. It offers terabyte to petabyte scalability for
static data. When a data object is stored, the system converts the binary content of the
stored data to a unique identifier. The content address is not a simple mapping of the
directory, file name, or data type of the stored data. OBS ensures content reliability with
globally unique, location-independent identifiers and high scalability. It is good at storing
non-transactional data, especially static data, and is applicable to archives, backups,
massive file sharing, scientific and research data, and digital media.
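Content addressing can be sketched in a few lines. The example below assumes SHA-256 as the function that turns binary content into an identifier; the text does not name the algorithm, and the `ContentStore` class is a hypothetical in-memory stand-in for an object store.

```python
import hashlib


class ContentStore:
    """Toy in-memory object store keyed by a hash of the content."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The identifier depends only on the content, not on a name or path.
        object_id = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same identifier, so it is stored once.
        self._objects.setdefault(object_id, data)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]

    def count(self) -> int:
        return len(self._objects)


store = ContentStore()
first_id = store.put(b"backup-2024")
second_id = store.put(b"backup-2024")  # same content, stored again
```

Both `put` calls return the same identifier and the store holds a single object, which is how content addressing keeps stored content unique.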
If a RAID 0 group consists of n disks, the read and write performance of the group is
theoretically n times that of a single disk. Due to bus bandwidth restrictions and other
factors, the actual performance is lower than the theoretical value.
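The n-times figure follows from striping: consecutive blocks land on different disks, so n disks can work in parallel. A minimal sketch, assuming a stripe unit of one block (real arrays use configurable stripe sizes):

```python
def raid0_locate(logical_block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    return logical_block % n_disks, logical_block // n_disks


# Eight consecutive logical blocks on a four-disk RAID 0 group.
layout = [raid0_locate(block, 4) for block in range(8)]
```

Blocks 0 to 3 fall on four different disks, so a sequential read of four blocks can be served by all disks at once.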
RAID 0 features low cost, high read/write performance, and 100% disk usage. However, it
offers no redundancy. In the event of a disk failure, data is lost. Therefore, RAID 0 is
applicable to applications that have high requirements on performance but low
requirements on data security and reliability, such as video/audio storage and temporary
storage space.
3.2.1.5.2 RAID 1
RAID 1, also known as mirror or mirroring, is designed to maximize the availability and
repairability of user data. RAID 1 automatically copies all data written to one disk to the
other disk in a RAID group.
RAID 1 writes the same data to the mirror disk while storing the data on the source disk.
If the source disk fails, the mirror disk takes over services from the source disk. RAID 1
delivers the best data security among all RAID levels because the mirror disk is used for
data backup. However, no matter how many disks are used, the available storage space
is only the capacity of a single disk. Therefore, RAID 1 delivers the lowest disk usage
among all RAID levels.
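The mirroring behavior can be sketched with a toy two-disk model. The `Raid1Pair` class and its dictionary-backed "disks" are illustrative assumptions, not a real RAID implementation.

```python
class Raid1Pair:
    """Toy model of a source disk and its mirror disk."""

    def __init__(self) -> None:
        self.source: dict[int, bytes] = {}
        self.mirror: dict[int, bytes] = {}
        self.source_failed = False

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to both disks while the source disk is healthy.
        if not self.source_failed:
            self.source[block] = data
        self.mirror[block] = data

    def read(self, block: int) -> bytes:
        # The mirror disk takes over when the source disk has failed.
        disk = self.mirror if self.source_failed else self.source
        return disk[block]


pair = Raid1Pair()
pair.write(0, b"payroll")
pair.source_failed = True   # simulate a source disk failure
recovered = pair.read(0)    # served from the mirror disk
```

Two disks hold one disk's worth of data, which is why the usable capacity is that of a single disk.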
3.2.1.5.3 RAID 3
3.2.1.5.4 RAID 5
3.2.1.5.5 RAID 6
RAID 6 breaks through the single-disk redundancy limit: it can tolerate the simultaneous
failure of two disks in a group.
RAID 6 features fast read performance and high fault tolerance. However, the cost of
RAID 6 is much higher than that of RAID 5, the write performance is poor, and the design
and implementation are complicated. Therefore, RAID 6 is seldom used and is mainly
applicable to scenarios that require high data security. It can be used as an economical
alternative to RAID 10.
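The disk usage trade-offs between RAID levels can be compared with a short calculation. The sketch below uses the common capacity rules for groups of n equal-sized disks (RAID 1 is counted as one usable disk regardless of group size, as described above); the function names are illustrative.

```python
def usable_disks(level: str, n: int) -> float:
    """Number of disks' worth of usable capacity in a group of n equal disks."""
    rules = {
        "RAID 0": n,        # striping only, no redundancy
        "RAID 1": 1,        # every disk holds the same data
        "RAID 5": n - 1,    # one disk's worth of parity
        "RAID 6": n - 2,    # two disks' worth of parity
        "RAID 10": n / 2,   # mirrored pairs, then striped
    }
    return rules[level]


def disk_usage(level: str, n: int) -> float:
    """Fraction of raw capacity that is usable."""
    return usable_disks(level, n) / n


levels = ["RAID 0", "RAID 1", "RAID 5", "RAID 6", "RAID 10"]
usage = {level: disk_usage(level, 8) for level in levels}
```

For an eight-disk group, RAID 6 keeps 75% of the raw capacity while RAID 10 keeps 50%, which is the sense in which RAID 6 is the more economical of the two.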
2. Each SSD is then divided into chunks (CKs) of a fixed size (typically 4 MB) for
logical space management.
3. CKs from different SSDs form chunk groups (CKGs) based on the RAID policy
specified on DeviceManager.
4. CKGs are further divided into grains (typically 8 KB). Grains are mapped to LUNs for
refined management of storage resources.
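The granularity in steps 2 and 4 can be worked through with simple arithmetic. The sketch below uses the typical sizes quoted above (4 MB chunks, 8 KB grains) and a hypothetical 960 GiB SSD; it stops at the chunk level because the usable size of a CKG depends on the RAID policy, which is not detailed here.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

CHUNK_SIZE = 4 * MB   # typical CK size from the text
GRAIN_SIZE = 8 * KB   # typical grain size from the text


def chunks_per_ssd(ssd_bytes: int) -> int:
    """Number of whole chunks that fit on one SSD."""
    return ssd_bytes // CHUNK_SIZE


# How many grains one chunk's worth of space corresponds to.
grains_per_chunk = CHUNK_SIZE // GRAIN_SIZE

# Hypothetical 960 GiB SSD.
chunks = chunks_per_ssd(960 * GB)
```

A single SSD therefore contributes hundreds of thousands of chunks, which is what lets CKGs be assembled from many different disks.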
⚫ RAID 2.0+ outperforms traditional RAID in the following aspects:
- Service load balancing to avoid hot spots: Data is evenly distributed to all disks in
the resource pool, so no disk reaches the end of its service life early because of
excessive writes.
- Fast reconstruction to reduce risk window: When a disk fails, the valid data on the
faulty disk is reconstructed to all other functioning disks in the resource pool
(fast many-to-many reconstruction), quickly restoring redundancy protection.
- Reconstruction load balancing among all disks: All member disks in a storage
resource pool participate in reconstruction, and each disk only needs to
reconstruct a small amount of data. Therefore, the reconstruction process does
not affect upper-layer applications.
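The effect of many-to-many reconstruction on each disk can be estimated with a rough calculation. This is an idealized sketch that assumes the rebuilt data is spread perfectly evenly; the pool size and data volume are hypothetical.

```python
def rebuild_share_gb(failed_data_gb: float, pool_disks: int) -> float:
    """Data each surviving disk absorbs when the rebuild is spread evenly."""
    surviving = pool_disks - 1
    return failed_data_gb / surviving


# Hypothetical pool of 25 disks; the failed disk held 960 GB of valid data.
share = rebuild_share_gb(960.0, 25)

# For comparison, a single dedicated spare disk would absorb all of it.
single_spare_share = rebuild_share_gb(960.0, 2)
```

Each of the 24 surviving disks rebuilds only 40 GB, instead of one spare disk taking the full 960 GB.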
universal serial bus (USB) port is used to connect an MP3 player or digital camera to a
computer. A USB port is well suited to data transfer and charging for portable
electronic devices that store pictures and music. However, the USB bus is incapable of
supporting computers, servers, and many other devices.
In this case, SCSI buses are applicable. SCSI, short for Small Computer System Interface, is
an interface that connects hosts to peripheral devices, including disk drives, tape
drives, CD-ROM drives, and scanners. Data operations are implemented by SCSI
controllers. Like a small CPU, a SCSI controller has its own command set and cache.
The special SCSI bus architecture can dynamically allocate resources to tasks run by
multiple devices in a computer, so multiple tasks can be processed at the same time.
SCSI is a vast protocol system that evolved from SCSI-1 to SCSI-2 and then to SCSI-3. It
defines a model and a necessary command set that let different devices (such as disks,
processors, and network devices) exchange information using the framework.
universal network protocol and IP network infrastructure is mature. The two points
provide a solid foundation for iSCSI development.
Prevalent IP networks allow data to be transferred over LANs, WANs, or the Internet
using new IP storage protocols. The iSCSI protocol was developed on this basis. iSCSI
adopts IP technical standards and combines the SCSI and TCP/IP protocols, so Ethernet
users can conveniently transfer and manage data with a small investment.
3.2.2.2.1 iSCSI Initiator and Target
with the target. If the iSCSI names are consistent, the connection is set up. Each iSCSI
node has a unique iSCSI name. One iSCSI name can be used in the connections from one
initiator to multiple targets. Multiple iSCSI names can be used in the connections from
one target to multiple initiators.
The functions of the iSCSI initiator and target are as follows:
⚫ Initiator
- The SCSI layer generates command descriptor blocks (CDBs) and transfers them to the
iSCSI layer.
- The iSCSI layer generates iSCSI protocol data units (PDUs) and sends them to the
target over an IP network.
⚫ Target
- The iSCSI layer receives PDUs and sends CDBs to the SCSI layer.
- The SCSI layer interprets CDBs and gives responses when necessary.
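The layering on both sides can be sketched as a round trip. The CDB below follows the standard 10-byte READ(10) layout, but the PDU wrapper is a deliberately simplified, hypothetical format (a task tag and a length byte) and not the real iSCSI header; the function names are illustrative.

```python
import struct


def build_read10_cdb(lba: int, blocks: int) -> bytes:
    """Initiator SCSI layer: build a 10-byte READ(10) CDB."""
    # opcode 0x28, flags, 4-byte LBA, group number, 2-byte transfer length, control
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)


def initiator_wrap(cdb: bytes, task_tag: int) -> bytes:
    """Initiator iSCSI layer (toy PDU): 4-byte task tag, 1-byte CDB length, CDB."""
    return struct.pack(">IB", task_tag, len(cdb)) + cdb


def target_unwrap(pdu: bytes) -> tuple[int, bytes]:
    """Target iSCSI layer: recover the task tag and the CDB from the toy PDU."""
    task_tag, cdb_len = struct.unpack(">IB", pdu[:5])
    return task_tag, pdu[5:5 + cdb_len]


def target_interpret(cdb: bytes) -> dict:
    """Target SCSI layer: decode the fields of a READ(10) CDB."""
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return {"opcode": opcode, "lba": lba, "blocks": blocks}


cdb = build_read10_cdb(lba=2048, blocks=8)
pdu = initiator_wrap(cdb, task_tag=7)      # would be sent over TCP/IP
tag, received_cdb = target_unwrap(pdu)
command = target_interpret(received_cdb)
```

The SCSI layers on both ends see only CDBs; the iSCSI layers are the only part that knows the command crossed an IP network.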
From the perspective of Fibre Channel, FCoE enables Fibre Channel to be carried by the
Ethernet Layer 2 link. From the perspective of the Ethernet, FCoE is an upper-layer
protocol that the Ethernet carries, like IP or IPX.
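From the Ethernet side, this means FCoE is selected the same way as IP or IPX: by the EtherType field of the frame. A minimal sketch, assuming untagged Ethernet II frames (FCoE deployments commonly use VLAN-tagged frames, which would shift the field) and using the registered EtherType values; the helper names are illustrative.

```python
import struct

# Registered EtherType values for the protocols named in the text.
ETHERTYPES = {0x0800: "IPv4", 0x8137: "IPX", 0x8906: "FCoE"}


def upper_layer(frame: bytes) -> str:
    """Read the EtherType at bytes 12-13 of an untagged Ethernet II frame."""
    (ethertype,) = struct.unpack(">H", frame[12:14])
    return ETHERTYPES.get(ethertype, "unknown")


def make_frame(ethertype: int, payload: bytes = b"") -> bytes:
    """Build a frame with placeholder (all-zero) MAC addresses."""
    destination = bytes(6)
    source = bytes(6)
    return destination + source + struct.pack(">H", ethertype) + payload


fcoe_frame = make_frame(0x8906, b"encapsulated FC frame")
protocol = upper_layer(fcoe_frame)
```

The Ethernet layer does not interpret the Fibre Channel payload; it only hands the frame to whichever upper-layer protocol the EtherType names.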
3.2.2.3.4 FCoE Protocol Encapsulation
3.3 Quiz
What are the relationships between DAS, NAS, SAN, block storage, file storage, and
object storage?