Cloud Computing Unit-2 (A)
Rohit Handa
Lecturer, CSE-IT Department
IBM-ICE Program, BUEST Baddi
Topic: Overview of Grid, Peer-to-Peer, Pervasive and Utility Computing technologies,
their characteristics and comparison with cloud computing
Topic-1: Client-Server Model of Computing
The client-server model of computing is a
distributed application model that partitions the
system into two entities:
o providers of a resource/service, called
servers
o the users of the resource/service, called
clients
A server machine is a host that runs one or more server programs and shares its
resources with clients.
A client machine sends requests to the server over a computer network and may
make use of a client program.
A client does not share any of its resources, but requests a server's content or service
function. Clients therefore initiate communication sessions with servers, which await
incoming requests.
A client-server network involves multiple clients connecting to a single, central server.
Often clients and servers communicate over a computer network on separate hardware,
but both client and server may reside in the same system.
Examples of computer applications that use the client-server model are email, network
printing, and the World Wide Web.
The client-server characteristic describes the relationship of cooperating programs in
an application.
The server component provides a function or service to one or many clients, which
initiate requests for such services.
Servers are classified by the services they provide. A web server serves web pages; a file
server serves computer files.
A shared resource may be any of the server computer's software and electronic
components, from programs and data to processors and storage devices. The sharing of
resources of a server constitutes a service.
Whether a computer is a client, a server, or both, is determined by the nature of the
application that requires the service functions. For example, a single computer can run
web server and file server software at the same time to serve different data to clients
making different kinds of requests. Client software can also communicate with server
software within the same computer.
Communication between servers, such as to synchronize data, is sometimes
called inter-server or server-to-server communication.
In general, a service is an abstraction of computer resources and a client does not have
to be concerned with how the server performs while fulfilling the request and delivering
the response.
The client only has to understand the response based on the well-known application
protocol, i.e. the content and the formatting of the data for the requested service.
Clients and servers exchange messages in a request-response messaging pattern: The
client sends a request, and the server returns a response. This exchange of messages is
an example of inter-process communication.
To communicate, the computers must have a common language, and they must follow
rules so that both the client and the server know what to expect. The language and
rules of communication are defined in a communications protocol. All client-server
protocols operate in the application layer.
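The request-response pattern described above can be made concrete with a small sketch. The following Python example (not part of the original notes) uses an assumed host, port and line-based message format; it shows a server awaiting requests and a client initiating the session over TCP sockets.

    # Minimal sketch of the client-server request-response pattern over TCP.
    # The host, port, and line-based "protocol" are illustrative assumptions.
    import socket
    import threading

    HOST, PORT = "127.0.0.1", 9090      # assumed address for the demo
    ready = threading.Event()           # lets the client wait until the server listens

    def server() -> None:
        """Server: awaits an incoming request and returns a response."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind((HOST, PORT))
            srv.listen()
            ready.set()
            conn, _addr = srv.accept()            # the client initiates the session
            with conn:
                request = conn.recv(1024).decode().strip()
                conn.sendall(f"ECHO:{request}\n".encode())   # the "service" provided

    def client(message: str) -> str:
        """Client: initiates the session, sends a request, reads the response."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
            cli.connect((HOST, PORT))
            cli.sendall((message + "\n").encode())
            return cli.recv(1024).decode().strip()

    if __name__ == "__main__":
        threading.Thread(target=server, daemon=True).start()
        ready.wait()                               # make sure the server is listening
        print(client("hello"))                     # prints: ECHO:hello

The client only needs to understand the response format of this small application protocol; how the server produces it is hidden behind the service.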
Major Characteristics of Cluster Computing:
Tightly coupled systems
Single system image
Centralized Job management & scheduling system
Multiple computing nodes,
o low cost
o a fully functioning computer with its own memory, CPU, possibly storage
o own instance of operating system
computing nodes are connected by interconnects
o typically low cost, high bandwidth and low latency
permanent, high performance data storage
a resource manager to distribute and schedule jobs
the middleware that allows the computers to act as a distributed or parallel system
parallel applications designed to run on it
More towards parallel computing
Makes use of interconnection technologies
Processing elements generally lie in close proximity to each other
Gives the impression of a single powerful computer
Generally cost effective compared to single computers of comparable speed and
availability
Deployed to improve performance and availability over that of a single computer
Attributes of Clusters
Computer clusters may be configured for different purposes ranging from general
purpose business needs such as web-service support, to computation-intensive
scientific calculations. In either case, the cluster may use a high-availability approach.
"Load-balancing" clusters are configurations in which cluster-nodes share
computational workload to provide better overall performance. For example, a web
server cluster may assign different queries to different nodes, so the overall response
time will be optimized. However, approaches to load-balancing may significantly differ
among applications, e.g. a high-performance cluster used for scientific computations
would balance load with different algorithms from a web-server cluster which may just
use a simple round-robin method by assigning each new request to a different node.
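As an illustration of the simple round-robin method mentioned above, the following Python sketch (with placeholder node names) hands each new request to the next node in a fixed rotation:

    # Sketch of round-robin load balancing: each new request is assigned to the
    # next node in a fixed rotation. Node names are placeholders.
    from itertools import cycle

    nodes = ["node-1", "node-2", "node-3"]        # placeholder cluster nodes
    next_node = cycle(nodes)

    def assign(request_id: str) -> str:
        """Return the node that should handle this request.
        Round-robin ignores the request's identity and simply rotates."""
        return next(next_node)

    for req in ["q1", "q2", "q3", "q4", "q5"]:
        print(req, "->", assign(req))             # q1->node-1, q2->node-2, ...

A scientific cluster would instead use scheduling algorithms that take job size and node load into account, as the paragraph above notes.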
"Computer clusters" are used for computation-intensive purposes, rather than
handling IO-oriented operations such as web service or databases. For instance, a
computer cluster might support computational simulations of vehicle crashes or
weather.
Very tightly coupled computer clusters are designed for work that may approach
"supercomputing".
"High-availability clusters" (also known as failover clusters, or HA clusters) improve
the availability of the cluster approach. They operate by having redundant nodes, which
are then used to provide service when system components fail. HA cluster
implementations attempt to use redundancy of cluster components to eliminate single
points of failure.
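The redundancy idea behind HA clusters can be sketched as follows; the node names and the health map (standing in for real heartbeat monitoring) are assumptions made purely for this example.

    # Illustrative failover selection: route the service to the first healthy node,
    # falling back to redundant nodes when components fail.
    # The health map is a stand-in for real heartbeat/monitoring data.

    health = {"primary": False, "standby-1": True, "standby-2": True}

    def select_node(health_map: dict[str, bool]) -> str:
        """Pick the first node reported healthy; raise if every node is down."""
        for node, is_up in health_map.items():
            if is_up:
                return node
        raise RuntimeError("no healthy node available (total cluster failure)")

    print(select_node(health))   # -> "standby-1": service continues despite the failure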
Benefits
Low Cost: Customers can eliminate the cost and complexity of procuring, configuring
and operating HPC clusters with low, pay-as-you-go pricing. Further, you can optimize
costs by leveraging one of several pricing models: On Demand, Reserved or Spot
Instances.
Elasticity: You can add and remove compute resources to meet the size and time
requirements of your workloads.
Run Jobs Anytime, Anywhere: You can launch compute jobs using simple APIs or
management tools and automate workflows for maximum efficiency and scalability. You
can increase your speed of innovation by accessing compute resources in minutes
instead of spending time in queues.
Components of a Cluster
The following are the components of cluster computers:
Multiple computers (computing nodes)
An interconnection network linking the nodes
Permanent, high-performance data storage
A resource manager to distribute and schedule jobs
Middleware that allows the computers to act as a distributed or parallel system
Parallel applications designed to run on the cluster
Classification of clusters
1. According to usage requirements
a. High Performance and High Throughput Clusters: They are used for applications
which require high computing capability.
b. High Availability Clusters: The aim is to keep the overall services of the cluster
available as much as possible, allowing for the possibility that any piece of hardware
or software may fail. They provide redundant services across multiple systems to
overcome loss of service. If a node fails, others pick up the service so that the system
environment remains consistent from the point of view of the user. The switchover
should take a very short time. A subset of this type is the load-balancing cluster,
usually used for business needs. The aim is to share the processing load as evenly as
possible. No single parallel program runs across those nodes; each node is
independent, running separate software, and a central load-balancing server
distributes the incoming work.
Characteristics of Peer-to-Peer (P2P) Computing
A distributed system architecture, i.e., No centralized control
Clients are also servers and routers
Nodes contribute content, storage, memory, CPU
Nodes are autonomous (no administrative authority)
Ad-hoc Nature. Peers join and leave the system without direct control of any entity.
Therefore, the number and location of active peers as well as the network topology
interconnecting them are highly dynamic. The ad-hoc nature requires P2P systems to
be self-organizing. So, network is dynamic: nodes enter and leave the network
frequently
Nodes collaborate directly with each other (not through well-known servers)
Limited Capacity and Reliability of Peers. Measurement studies [33] show that peers
do not have server-like properties: peers have much less capacity and they fail more
often. The unreliability of peers suggests that fault tolerance and adaptation
techniques should be integral parts of the P2P protocols. The limited capacity of peers
demands load-sharing and balancing among all participating peers.
Nodes have widely varying capabilities
Rationality of Peers. Computers participating in a P2P system are typically owned and
operated by autonomous and rational entities (peers). Rational peers make decisions to
maximize their own benefits. Peers may decide, for example, on whether to share data,
leave the system, and forward queries. These decisions are not always in line with the
performance objectives of the system. This conflict of interest may jeopardize the growth
and the performance of the entire system. Therefore, peer rationality should be
considered in designing P2P protocols.
Decentralization: One main goal of decentralization is the emphasis on users'
ownership and control of data and resources. In a fully decentralized system, all peers
assume equal roles. This makes the implementation of the P2P models difficult in
practice because there is no centralized server with a global view of all the peers in the
network or of the files they provide. This is the reason why many P2P file systems are
built as a hybrid approach, as in the case of Napster, where there is a centralized
directory of the files but the nodes download files directly from their peers.
Scalability: An immediate benefit of decentralization is improved scalability. Scalability
is limited by factors such as the amount of centralized operations (e.g., synchronization
and coordination) that needs to be performed, the amount of state that needs to be
maintained, the inherent parallelism an application exhibits, and the programming
model that is used to represent the computation.
Anonymity: An important goal of anonymity is to allow people to use systems without
concern for legal or other ramifications. A further goal is to guarantee that censorship
of digital content is not possible.
Cost of Ownership: One of the promises of P2P network is shared ownership. Shared
ownership reduces the cost of owning the systems and the content, and the cost of
maintaining them. This is applicable to all classes of P2P systems.
Ad-hoc Connectivity: The ad-hoc nature of connectivity has a strong effect on all
classes of P2P systems. In distributed computing, the parallelized applications cannot
be executed on all systems all of the time; some of the systems will be available all of
the time, some will be available part of the time, and some will not be available at all.
P2P systems and applications in distributed computing need to be aware of this ad-hoc
nature and be able to handle systems joining and withdrawing from the pool of
available P2P systems.
Performance: Performance is a significant concern in P2P systems. P2P systems aim to
improve performance by aggregating distributed storage capacity (e.g., Napster,
Gnutella) and computing cycles (e.g., SETI@Home) of devices spread across a network.
Owing to the decentralized nature of these models, performance is influenced by three
types of resources namely: processing, storage, and networking.
Security: P2P systems share most of their security needs with common distributed
systems: trust chains between peers and shared objects, session key exchange
schemes, encryption, digital digests, and signatures. However, new security
requirements appeared with P2P systems. Some of these requirements are multi-key
encryption, sandboxing, digital rights management, reputation and accountability, and
firewalls.
Transparency and Usability: In distributed systems, transparency was traditionally
associated with the ability to transparently connect distributed systems into a
seamlessly local system. The primary form of transparency was location transparency,
but other forms include transparency of access, concurrency, replication, failure,
mobility, scaling, etc. Over time, some of the transparencies were further qualified,
such as transparency for failure, by requiring distributed applications to be aware of
failures, and addressing transparency on the Internet and Web. Another form of
transparency is related to security and mobility.
Fault Tolerance: One of the primary goals of a P2P system is to avoid a central point of
failure. Although most P2P systems (pure P2P) already do this, they nevertheless are
faced with failures commonly associated with systems spanning multiple hosts and
networks: disconnections/unreachability, partitions, and node failures. It would be
desirable to continue active collaboration among the still connected peers in the
presence of such failures.
P2P Applications
File sharing (Napster, Gnutella, Kazaa)
Multiplayer games (Unreal Tournament, DOOM)
Collaborative applications (ICQ, shared whiteboard)
Distributed computation (Seti@home)
Ad-hoc networks
Centralized P2P: Napster
Napster can be classified as a centralized P2P network, where a central entity is
necessary to provide the service.
A central database maintains an index of all files that are shared by the peers currently
logged onto the Napster network.
The database can be queried by all peers to lookup the IP addresses and ports of all
peers sharing the requested file.
File transfer is decentralized, but locating content is centralized.
In 2000 Napster offered the first P2P file-sharing application and with it a real P2P rush
started.
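The centralized directory can be pictured as a simple index mapping file names to the peers that advertise them; the file names and peer addresses below are invented for illustration, and a real client would then download directly from one of the returned peers.

    # Sketch of a Napster-style central index: locating content is centralized,
    # while the actual file transfer happens peer-to-peer.
    # File names and peer addresses are invented for illustration.

    index: dict[str, list[tuple[str, int]]] = {}   # file name -> [(ip, port), ...]

    def register(peer: tuple[str, int], files: list[str]) -> None:
        """Called when a peer logs on and advertises its shared files."""
        for name in files:
            index.setdefault(name, []).append(peer)

    def lookup(name: str) -> list[tuple[str, int]]:
        """Query the central database for peers sharing the requested file."""
        return index.get(name, [])

    register(("10.0.0.5", 6699), ["song.mp3", "notes.txt"])
    register(("10.0.0.7", 6699), ["song.mp3"])
    print(lookup("song.mp3"))   # both peers; the client then downloads directly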
Pure P2P
The main disadvantage of a central architecture is its single point of failure.
For this reason pure P2P networks like Gnutella 0.4 have been developed.
They are established and maintained completely without central entities.
All peers in these overlay networks are homogeneous and provide the same
functionality.
Therefore, they are very fault resistant as any peer can be removed without loss of
functionality.
Because of Gnutella's unstructured network architecture, no guarantee can be given
that content can be found.
Messages are coded in plain text and all queries have to be flooded through the
network.
This results in significant signaling overhead and a comparably high network load.
Gnutella
Searching by flooding:
If you don't have the file you want, query 7 of your neighbors.
If they don't have it, they contact 7 of their neighbors, for a maximum hop count of 10.
Requests are flooded, but there is no tree structure.
No looping but packets may be received twice.
Reverse path forwarding
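A rough sketch of flooding with a hop limit (TTL) and duplicate suppression is shown below; the overlay topology and file placement are invented, and reply routing along the reverse path is omitted.

    # Sketch of flooding search with a hop limit (TTL) and duplicate suppression,
    # in the spirit of Gnutella's query flooding. Topology and file placement are
    # invented; real Gnutella also routes replies along the reverse path.
    from collections import deque

    neighbors = {                       # adjacency list of the overlay network
        "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C", "E"],
        "E": ["D"],
    }
    holds_file = {"E"}                  # peers that actually have the content

    def flood_search(start: str, max_hops: int = 10) -> set[str]:
        """Flood a query outward from `start`; return the peers that answered."""
        seen = {start}                  # suppress duplicate deliveries
        hits = set()
        frontier = deque([(start, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if node in holds_file:
                hits.add(node)
            if hops == max_hops:
                continue                # TTL expired: stop forwarding
            for peer in neighbors[node]:
                if peer not in seen:
                    seen.add(peer)
                    frontier.append((peer, hops + 1))
        return hits

    print(flood_search("A"))            # {'E'} -- found despite no central index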
Hybrid P2P
Hybrid approaches, like later versions of Gnutella (0.6), try to reduce network traffic by
establishing a second routing hierarchy, i.e. the Superpeer layer.
By differentiating the participating nodes into Superpeers and Leaf nodes, a significant
reduction in data-rate consumption can be achieved without losing the network's
complete self-organization.
Context Awareness
A ubiquitous computing system has to be context aware, i.e., aware of the user's state
and surroundings, and must modify its behavior based on this information.
The situational conditions associated with a user include location, surrounding
conditions (light, temperature, humidity, noise level, etc.), social activities, user
intentions, personal information, etc.
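As a toy illustration of context awareness, the sketch below adapts application settings from a sensed context; the field names and thresholds are invented for the example.

    # Tiny sketch of context-aware adaptation: the application changes its
    # behavior based on the sensed state of the user and surroundings.
    # Field names and thresholds are invented for illustration.

    def adapt(context: dict) -> dict:
        settings = {"screen_brightness": "normal", "ringer": "on"}
        if context.get("ambient_light_lux", 500) < 50:
            settings["screen_brightness"] = "low"      # dark room: dim the screen
        if context.get("activity") == "meeting":
            settings["ringer"] = "silent"              # social context: mute
        return settings

    print(adapt({"ambient_light_lux": 20, "activity": "meeting", "location": "office"}))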
Characteristics
Permanency: The information remains unless the learners purposely remove it.
Accessibility: The information is always available whenever the learners need to use it.
Immediacy: The information can be retrieved immediately by the learners.
Interactivity: The learners can interact with peers, teachers, and experts efficiently
and effectively through different media.
Context-awareness: The environment can adapt to the learner's real situation to
provide adequate information for the learners.
Invisibility: invisible intelligent devices and wearable computing devices.
Adaptation: adapting to device type, time, location, temperature, weather, etc.
Task Dynamism: applications need to adapt to the user's environment and its
uncertainties; programs need to adapt to changing goals.
Device heterogeneity and resource constraints: the technological capabilities of the
environment change. One approach to mobility is that the device itself is mobile; the
constraints are physical ones limiting resources (e.g., battery power, network
bandwidth, etc.) and variability in the availability of resources.
Application that follows the user: the constraints are the dynamic adaptation of
applications to changing hardware capabilities and variability in software services.
Computing in a social environment: applications will have a significant impact on
their social environment. A ubiquitous environment is full of sensors. Who should have
access to the sensor data? Who owns the data from a ubiquitous computing system?
Smart Clothing
Fabric-based sensors, e.g., to monitor pulse, blood pressure, body temperature
Invisible collar microphones
Kidswear
o game console on the sleeve
o integrated GPS-driven locators
o integrated small camera
A computer's resources (in the context of grid computing):
Central processing unit (CPU): A CPU is a microprocessor that performs mathematical
operations and directs data to different memory locations. Computers can have more than
one CPU.
Memory: In general, a computer's memory is a kind of temporary electronic storage.
Memory keeps relevant data close at hand for the microprocessor. Without memory, the
microprocessor would have to search and retrieve data from a more permanent storage
device such as a hard disk drive.
Storage: In grid computing terms, storage refers to permanent data storage devices like
hard disk drives or databases.
Normally, a computer can only operate within the limitations of its own resources.
There's an upper limit to how fast it can complete an operation or how much information it
can store. Most computers are upgradeable, which means it's possible to add more power
or capacity to a single computer, but that's still just an incremental increase in
performance.
Grid computing systems link computer resources together in a way that lets someone use
one computer to access and leverage the collected power of all the computers in the
system.
To the individual user, it's as if the user's computer has transformed into a
supercomputer.
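The idea of pooling the power of many machines can be sketched by splitting one large computation into chunks and farming them out to a pool of workers; in this illustrative example local processes stand in for the remote nodes that real grid middleware would schedule work onto.

    # Sketch of the grid idea: split one large computation into pieces and let a
    # pool of workers execute them in parallel. Here local processes stand in for
    # the remote machines a real grid middleware would schedule onto.
    from concurrent.futures import ProcessPoolExecutor

    def partial_sum(bounds: tuple[int, int]) -> int:
        lo, hi = bounds
        return sum(i * i for i in range(lo, hi))       # one chunk of the big job

    if __name__ == "__main__":
        chunks = [(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
        with ProcessPoolExecutor() as pool:             # "nodes" of the mini-grid
            total = sum(pool.map(partial_sum, chunks))
        print(total)                                    # same result, shared work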
Utility computing rates vary depending on the utility computing company and the
requested service.
Usually, companies charge clients based on service usage rather than a flat fee.
The more a client uses services, the more fees it must pay.
Some companies bundle services together at a reduced rate, essentially selling
computer services in bulk.
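A metered, usage-based bill with a simple bulk discount can be sketched as follows; all rates and thresholds are invented numbers, not real pricing.

    # Illustrative metered bill: pay for what you use, with a simple bulk discount
    # once usage passes a threshold. All rates/thresholds are invented numbers.

    RATE_PER_HOUR = 0.10          # assumed price per compute-hour
    BULK_THRESHOLD = 1_000        # hours after which the discount applies
    BULK_DISCOUNT = 0.20          # 20% off the hours beyond the threshold

    def monthly_bill(hours_used: float) -> float:
        base = min(hours_used, BULK_THRESHOLD) * RATE_PER_HOUR
        extra = max(hours_used - BULK_THRESHOLD, 0) * RATE_PER_HOUR * (1 - BULK_DISCOUNT)
        return round(base + extra, 2)

    print(monthly_bill(500))      # 50.0  -- a light user pays only for 500 hours
    print(monthly_bill(2_000))    # 180.0 -- a heavy user, discounted beyond 1000 h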
Utility computing is the packaging of computing resources, such as computation,
storage and services, as a metered service.
Utility computing can support grid computing, which has the characteristic of very large
computations or sudden peaks in demand that are supported via a large number of
computers.
"Utility computing" has usually envisioned some form of virtualization so that the
amount of storage or computing power available is considerably larger than that of a
single time-sharing computer.
Multiple servers are used on the "back end" to make this possible. These might be a
dedicated computer cluster specifically built for the purpose of being rented out, or
even an underutilized supercomputer. The technique of running a single calculation on
multiple computers is known as distributed computing.
The term "grid computing" is often used to describe a particular form of distributed
computing, where the supporting nodes are geographically distributed or
cross administrative domains. To provide utility computing services, a company can
"bundle" the resources of members of the public for sale, who might be paid with a
portion of the revenue from clients.
This model has the advantage of a low or no initial cost to acquire computer
resources; instead, computational resources are essentially rented, turning what was
previously a need to purchase products (hardware, software and network bandwidth)
into a service.
This packaging of computing services became the foundation of the shift to "On
Demand" computing, Software as a Service and Cloud Computing models that further
propagated the idea of computing, application and network as a service.
Utility computing is not a new concept, but rather has quite a long history. Among the
earliest references is John McCarthy's remark from 1961: "If computers of the kind I
have advocated become the computers of the future, then computing may someday be
organized as a public utility just as the telephone system is a public utility... The
computer utility could become the basis of a new and important industry."