CLOUD COMPUTING - MSc CS Notes

UNIT-I

INTRODUCTION TO CLOUD COMPUTING


INTRODUCTION
Cloud computing provides a means of accessing applications as utilities over the
Internet. It allows us to create, configure, and customize applications online.
What is Cloud?
The term Cloud refers to a network or the Internet. In other words, we can say that the cloud is
something that is present at a remote location. The cloud can provide services over public and private
networks, i.e., WAN, LAN or VPN. Applications such as e-mail, web conferencing and customer
relationship management (CRM) run in the cloud.
What is Cloud Computing?
Cloud computing refers to manipulating, configuring, and accessing hardware and
software resources remotely. It offers online data storage, infrastructure, and applications.

Cloud computing offers platform independence, as the software is not required to be installed locally on
the PC. Hence, cloud computing makes our business applications mobile and collaborative.
History of Cloud Computing

The concept of cloud computing came into existence in the 1950s with the implementation of mainframe
computers, accessible via thin/static clients. Since then, cloud computing has evolved from static clients to
dynamic ones and from software to services.

CHARACTERISTICS OF CLOUD COMPUTING


There are five key characteristics of cloud computing:

On Demand Self Service


Cloud computing allows users to use web services and resources on demand. One can log on to a
website at any time and use these services.

Broad Network Access


Since cloud computing is completely web based, it can be accessed from anywhere and at any time.

Resource Pooling
Cloud computing allows multiple tenants to share a pool of resources. One can share a single physical
instance of hardware, databases and basic infrastructure.

Rapid Elasticity
It is very easy to scale the resources vertically or horizontally at any time. Scaling of resources means the
ability of resources to deal with increasing or decreasing demand.
The resources being used by customers at any given point of time are automatically monitored.
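As a minimal illustration (not tied to any particular provider), the following Python sketch shows the kind of horizontal scaling rule an elastic platform evaluates against the monitored load; the thresholds and the instance-count logic are illustrative assumptions, not any vendor's actual policy.

    # Hypothetical sketch of a rapid-elasticity scaling rule.
    # Thresholds and the notion of "instances" are illustrative only.

    def scale(current_instances: int, avg_cpu_utilisation: float) -> int:
        """Return the new instance count for the observed average CPU load."""
        if avg_cpu_utilisation > 0.80:          # demand rising: scale out
            return current_instances + 1
        if avg_cpu_utilisation < 0.20 and current_instances > 1:
            return current_instances - 1        # demand falling: scale in
        return current_instances                # demand steady: no change

    # Example: 4 instances running at 85% average CPU -> scale out to 5.
    print(scale(4, 0.85))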

Measured Service
In this model, the cloud provider controls and monitors all aspects of the cloud service. Resource optimization,
billing, and capacity planning all depend on this monitoring.

CLOUD MODELS
There are certain services and models working behind the scenes that make cloud
computing feasible and accessible to end users. The working models for cloud
computing are:
• Service Models
• Deployment Models

Service models
Cloud computing is a model for enabling ubiquitous, convenient, on-demand
network access to a shared pool of configurable computing resources (e.g., networks,
servers, storage, applications, and services) that can be rapidly provisioned and
released with minimal management effort or service provider interaction. This cloud
model is composed of five essential characteristics, three service models, and four
deployment models.
• Software as a Service (SaaS)
• Platform as a Service (PaaS).
• Infrastructure as a Service (IaaS).
• Communication-as-a-Service (CaaS)

Software as a Service (SaaS).
The traditional model of software distribution, in which software is purchased for and
installed on personal computers, is sometimes referred to as Software-as-a-Product. Software-as-
a-Service is a software distribution model in which applications are hosted by a vendor or service
provider and made available to customers over a network, typically the Internet.
SaaS is becoming an increasingly prevalent delivery model as underlying technologies
that support web services and service-oriented architecture (SOA) mature and new
developmental approaches become popular.
SaaS is also often associated with a pay-as-you-go subscription licensing model. Meanwhile,
broadband service has become increasingly available to support user access from more
areas around the world. Examples are Google’s Gmail and Apps, instant messaging from AOL,
Yahoo and Google.

Platform as a Service (PaaS).


• Cloud computing has evolved to include platforms for building and running custom web-
based applications, a concept known as Platform-as-a-Service. PaaS is an outgrowth of the
SaaS application delivery model.
• The PaaS model makes all of the facilities required to support the complete life cycle of
building and delivering web applications and services entirely available from the Internet, all
with no software downloads or installation for developers, IT managers, or end users.
Examples include Microsoft’s Azure and Salesforce’s Force.com.

Infrastructure as a Service (IaaS).


• The capability provided to the consumer is the provision of grids or clusters or virtualized
servers, processing, storage, networks, and other fundamental computing resources where
the consumer is able to deploy and run arbitrary software, which can include operating
systems.
• The highest profile example is Amazon’s Elastic Compute Cloud (EC2) and Simple Storage
Service, but IBM and other traditional IT vendors are also offering services, as is telecom-
and-more provider Verizon Business.

Communication-as-a-Service (CaaS)
A CaaS model allows a CaaS provider’s business customers to selectively deploy
communications features and services throughout their company on a pay-as-you-go basis for
the service(s) used. CaaS is designed on a utility-like pricing model that provides users with
comprehensive, flexible, and (usually) simple-to-understand service plans.

Deployment models
As cloud technology provides users with so many benefits, these benefits must
be categorized based on user requirements. A cloud deployment model represents the exact
category of cloud environment based on proprietorship, size, and access, and also describes the
nature and purpose of the cloud. Most organizations implement a cloud infrastructure to
minimize capital expenditure and regulate operating costs.
• Public Cloud
• Community Cloud
• Hybrid Cloud
• Private Cloud

Public cloud
• Public cloud or external cloud describes cloud computing in the traditional mainstream
sense, whereby resources are dynamically provisioned on a fine-grained, self-service basis
over the Internet, via web applications/web services, from an off-site third-party provider
who bills on a fine-grained utility computing basis.
• The cloud infrastructure is made available to the general public or a large industry group,
and is owned by an organization selling cloud services. Examples: Amazon Elastic Compute
Cloud, IBM's Blue Cloud, Sun Cloud, Google App Engine.

Community cloud
A community cloud may be established where several organizations have similar
requirements and seek to share infrastructure so as to realize some of the benefits of cloud
computing. With the costs spread over fewer users than a public cloud (but more than a single
tenant) this option is more expensive but may offer a higher level of privacy, security and/or
policy compliance. Examples of community cloud include Google’s "Gov Cloud".

Hybrid cloud
• The term "Hybrid Cloud" has been used to mean either two separate clouds joined
together (public, private, internal or external), or a combination of virtualized cloud
server instances used together with real physical hardware.
• The most correct definition of the term "Hybrid Cloud" is probably the use of physical
hardware and virtualized cloud server instances together to provide a single common
service. Two clouds that have been joined together are more correctly called a "combined
cloud".
• A hybrid storage cloud uses a combination of public and private storage clouds. Hybrid
storage clouds are often useful for archiving and backup functions, allowing local data to
be replicated to a public cloud.

Private cloud
• A private cloud is a particular model of cloud computing that involves a distinct and secure
cloud-based environment in which only the specified client can operate. As with other cloud
models, private clouds provide computing power as a service within a virtualized
environment using an underlying pool of physical computing resources.
• However, under the private cloud model, the cloud (the pool of resources) is accessible only
by a single organization, providing that organization with greater control and privacy.
CLOUD SERVICES EXAMPLES
1. Scalable Usage
Cloud computing offers scalable resources through various subscription models. This
means that you will only need to pay for the computing resources you use. This helps in
managing spikes in demands without the need to permanently invest in computer hardware.
Netflix, for instance, leverages this potential of cloud computing to its advantage. Due to
its on-demand streaming service, it faces large surges in server load at peak times. Migrating
from in-house data centres to the cloud allowed the company to significantly expand its
customer base without having to invest in the setup and maintenance of costly infrastructure.

2. Chatbots
The expanded computing power and capacity of the cloud enables us to store information
about user preferences. This can be used to provide customized solutions, messages and products
based on the behaviour and preferences of users.
Siri, Alexa and Google Assistant - all are cloud-based natural-language intelligent bots.
These chatbots leverage the computing capabilities of the cloud to provide personalized context-
relevant customer experiences. The next time you say, “Hey Siri!” remember that there is a
cloud-based AI solution behind it.

3. Communication
The cloud allows users to enjoy network-based access to communication tools like
emails and calendars. Most of the messaging and calling apps like Skype and WhatsApp are
also based on cloud infrastructure. All your messages and information are stored on the service
provider’s hardware rather than on your personal device. This allows you to access your
information from anywhere via the internet.

4. Productivity

Office tools like Microsoft Office 365 and Google Docs use cloud computing, allowing
you to use your most-productive tools over the internet. You can work on your documents,
presentations and spreadsheets - from anywhere, at any time. With your data stored in the cloud,
you don’t need to worry about data loss if your device is stolen, lost or damaged. The cloud
also helps in sharing documents and enables different individuals to work on the same
document at the same time.

5. Business Process
Many business management applications like customer relationship management (CRM)
and enterprise resource planning (ERP) are also based on cloud services. Software as a
Service (SaaS) has become a popular method for deploying enterprise-level software.
Salesforce, HubSpot, Marketo etc. are popular examples of this model. This method is
cost-effective and efficient for both the service provider and customers. It ensures hassle-free
management, maintenance and security of your organization’s critical business resources and
allows you to access these applications conveniently via a web browser.
6. Backup and recovery
When you choose the cloud for data storage, the responsibility for your information also lies
with your service provider. This saves you from the capital outlay for building and maintaining
infrastructure. Your cloud service provider is responsible for securing data and meeting legal and
compliance requirements.
The cloud also provides more flexibility in the sense that you can enjoy large storage and
on-demand backups. Recovery is also performed faster in the cloud because the data is stored
over a network of physical servers rather than at one on-site data centre. Dropbox, Google Drive
and Amazon S3 are popular examples of cloud backup solutions.
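As a minimal sketch of the backup use case, assuming an AWS account with credentials configured, the boto3 library installed, and a bucket named my-backup-bucket (a hypothetical name), a file can be backed up to and restored from Amazon S3 in a few lines:

    # Minimal backup-to-S3 sketch using boto3 (assumes AWS credentials are
    # configured and that the bucket "my-backup-bucket" already exists).
    import boto3

    s3 = boto3.client("s3")

    # Upload a local file as an object; the provider stores it redundantly
    # across its infrastructure rather than on a single on-site server.
    s3.upload_file("reports/2024-q1.csv", "my-backup-bucket", "backups/2024-q1.csv")

    # Recovery is simply a download of the same object.
    s3.download_file("my-backup-bucket", "backups/2024-q1.csv", "restored-2024-q1.csv")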

7. Application development

Whether you are developing an application for web or mobile, or even games, cloud
platforms prove to be a reliable solution. Using the cloud, you can easily create scalable cross-
platform experiences for your users. These platforms include many pre-coded tools and libraries,
such as directory services, search and security. This can speed up and simplify the development
process. Amazon Lumberyard is a popular game development engine that integrates with AWS cloud services.

8. Test and development


The cloud can provide an environment to cut expenses and launch your apps in the
market faster. Rather than setting up physical environments, developers can use the cloud to set
up and dismantle test and development environments.
This saves the technical team from securing budgets and spending critical project time
and resources. These dev-test environments can also be scaled up or down based on
requirements. LoadStorm and BlazeMeter are popular cloud-based testing tools.

9. Big data analytics

Cloud computing enables data scientists to tap into any organizational data to analyze it
for patterns and insights, find correlations, make predictions, forecast future crises and help in
data-backed decision making. Cloud services make mining massive amounts of data possible by
providing higher processing power and sophisticated tools.
There are many open-source big data tools that are based on the cloud, for instance
Hadoop, Cassandra and HPCC. Without the cloud, it would be very difficult to collect and
analyze data in real time, especially for small companies.

10. Social Networking


• Social Media is the most popular and often overlooked application of cloud computing.
Facebook, LinkedIn, MySpace, Twitter, and many other social networking sites use cloud
computing. Social networking sites are designed to find people you already know or would
like to know.
• In the course of finding people, we end up sharing a lot of personal information. Of course, if
you're sharing information on social media then you are not only sharing it with friends but
also with the makers of the platform.
• This means that the platform requires a powerful hosting solution to manage and store
data in real time, making use of the cloud critical. The cloud has provided sophisticated solutions to
companies as well as individuals.

CLOUD BASED SERVICES AND APPLICATION

A cloud application, or cloud app, is a software program where cloud-based and local
components work together. This model relies on remote servers for processing logic that is
accessed through a web browser with a continual internet connection.

Cloud application servers typically are located in a remote data center operated by a
third-party cloud services infrastructure provider. Cloud-based application tasks may encompass
email, file storage and sharing, order entry, inventory management, word processing, customer
relationship management (CRM), data collection, or financial accounting features.

Benefits of cloud apps


• Fast response to business needs.
Cloud applications can be updated, tested and deployed quickly, providing enterprises
with fast time to market and agility. This speed can lead to culture shifts in business
operations.
• Simplified operation
Infrastructure management can be outsourced to third-party cloud providers.

• Instant scalability
As demand rises or falls, available capacity can be adjusted.
• API use
Third-party data sources and storage services can be accessed with an application
programming interface (API). Cloud applications can be kept smaller by using APIs to hand
data to applications or to API-based back-end services for processing or analytics computations,
with the results handed back to the cloud application (a sketch of this pattern follows at the end
of this list). Vetted APIs impose passive consistency that can speed development and yield predictable results.
• Gradual adoption.
Refactoring legacy, on-premises applications to a cloud architecture in steps allows
components to be implemented on a gradual basis.
• Reduced costs.
The size and scale of data centers run by major cloud infrastructure and service providers,
along with competition among providers, has led to lower prices. Cloud-based applications
can be less expensive to operate and maintain than equivalent on-premises installations.
• Improved data sharing and security.
Data stored on cloud services is instantly available to authorized users. Due to their
massive scale, cloud providers can hire world-class security experts and implement
infrastructure security measures that typically only large enterprises can obtain. Centralized
data managed by IT operations personnel is more easily backed up on a regular schedule and
restored should disaster recovery become necessary.
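To illustrate the API-based pattern described under "API use" above, the sketch below hands data to a third-party analytics API over HTTP and uses the returned result. The endpoint URL, payload fields and response shape are hypothetical placeholders, not a real service.

    # Hypothetical sketch: a cloud app delegating an analytics computation
    # to a third-party, API-based back-end service instead of doing it locally.
    import requests

    def summarise_orders(orders: list[dict]) -> dict:
        # Hand the data to the external service...
        response = requests.post(
            "https://api.example-analytics.com/v1/summarise",  # hypothetical endpoint
            json={"records": orders},
            timeout=10,
        )
        response.raise_for_status()
        # ...and hand the computed result back to the cloud application.
        return response.json()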
How cloud apps work
Data is stored and compute cycles occur in a remote data center typically operated by a
third-party company. A back end ensures uptime, security and integration and supports
multiple access methods.
Cloud applications provide quick responsiveness and don't need to permanently reside on
the local device. They can function offline, but can be updated online.
While under constant control, cloud applications don't always consume storage space on a
computer or communications device. Assuming a reasonably fast internet connection, a well-
written cloud application offers all the interactivity of a desktop application, along with
the portability of a web application.

Cloud apps vs. web apps


• With the advancement of remote computing technology, the clear lines between cloud and web
applications have blurred. The term cloud application has gained great cachet, sometimes
leading vendors of any application with an online aspect to brand it as a cloud application.
• Cloud and web applications access data residing on distant storage. Both use server
processing power that may be located on premises or in a distant data center. A key
difference between cloud and web applications is architecture.
• A web application or web-based application must have a continuous internet connection to
function. Conversely, a cloud application or cloud-based application performs processing
tasks on a local computer or workstation.
• An internet connection is required primarily for downloading or uploading data. A web
application is unusable if the remote server is unavailable. If the remote server becomes
unavailable in a cloud application, the software installed on the local user device can still
operate, although it cannot upload and download data until service at the remote server is
restored.
• The difference between cloud and web applications can be illustrated with two common
productivity tools, email and word processing. Gmail, for example, is a web application that
requires only a browser and internet connection.
• Through the browser, it's possible to open, write and organize messages using search and
sort capabilities. All processing logic occurs on the servers of the service provider (Google,
in this example) via either the internet's HTTP or HTTPS protocols.A CRM application
accessed through a browser under a fee-based software as a service (SaaS) arrangement is a
web application. Online banking and daily crossword puzzles are also considered web
applications that don't install software locally.

An example of a word-processing cloud application that is installed on a workstation
is Microsoft Word in Office 365. The application performs tasks locally on a machine
without an internet connection. The cloud aspect comes into play when users save work to
an Office 365 cloud server.

Cloud apps vs. desktop apps


• Desktop applications are platform-dependent and require a separate version for each
operating system. The need for multiple versions increases development time and cost, and
complicates testing, version control and support.
• Conversely, cloud applications can be accessed through a variety of devices and operating
systems and are platform-independent, which typically leads to significant cost
savings. Every device running a desktop application requires its own installation.

• Because it's not possible to enforce an upgrade whenever a new version is available, it's
tricky to have all users running the same one. The need to provide support for multiple
versions simultaneously can become a burden on tech support.
• Cloud applications don't face version control issues since users can access and run only the
version available on the cloud.

Testing of cloud apps


Testing cloud applications prior to deployment is essential to ensure security and optimal
performance.
A cloud application must consider internet communications with numerous clouds and a
likelihood of accessing data from multiple sources simultaneously. Using API calls, a cloud
application may rely on other cloud services for specialized processing. Automated testing can
help in this multicloud, multisource and multiprovider ecosystem.
The maturation of container and microservices technologies has introduced additional
layers of testing and potential points of failure and communication. While containers can
simplify application development and provide portability, a proliferation of containers introduces
additional complexity.
Containers must be managed, cataloged and secured, with each tested for its own
performance, security and accuracy. Similarly, as legacy monolithic applications that perform
numerous, disparate tasks are refactored into many single-task microservices that must
interoperate seamlessly and efficiently, test scripts and processes grow correspondingly complex
and time-consuming.
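As a small illustration of automated testing against a deployed cloud application, the smoke tests below check a health endpoint and a representative API call; the base URL and endpoints are hypothetical placeholders, and a real suite would cover each microservice and container separately.

    # Hypothetical automated smoke tests for a deployed cloud application,
    # runnable with pytest. The base URL and endpoints are placeholders.
    import requests

    BASE_URL = "https://app.example.com"   # hypothetical deployment URL

    def test_health_endpoint_is_up():
        response = requests.get(f"{BASE_URL}/health", timeout=5)
        assert response.status_code == 200

    def test_order_api_returns_json():
        response = requests.get(f"{BASE_URL}/api/orders", timeout=5)
        assert response.status_code == 200
        assert response.headers["Content-Type"].startswith("application/json")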

Cloud Based Services


Cloud computing is the use of various services, such as software development
platforms, servers, storage and software, over the internet, often referred to as "the cloud".
Companies offering these computing services are called cloud providers and typically charge for
cloud computing services based on usage.

Types of cloud services


Most cloud computing services fall into three broad categories:
1. Software as a service (SaaS)
2. Platform as a service (PaaS)
3. Infrastructure as a service (IaaS)
These are sometimes called the cloud computing stack, because they are built on top of one
another. Knowing what they are and how they differ makes it easier to accomplish your
goals.

1. Software As A Service
Software-as-a-Service (SaaS) is a way of delivering services and applications over the
Internet. Instead of installing and maintaining software, we simply access it via the Internet,
freeing ourselves from complex software and hardware management. It removes the need to
install and run applications on our own computers or in data centers, eliminating the expenses
of hardware as well as software maintenance. SaaS provides a complete software solution which
you purchase on a pay-as-you-go basis from a cloud service provider. Most SaaS applications
can be run directly from a web browser without any downloads or installations required. SaaS
applications are sometimes called web-based software, on-demand software, or hosted
software.

Advantages of SaaS
1. Cost Effective :
Pay only for what you use.
2. Reduced time :
Users can run most SaaS apps directly from their web browser without needing to
download and install any software. This reduces the time spent on installation and configuration,
and can reduce the issues that get in the way of software deployment.
3. Accessibility :
We can Access app data from anywhere.
4. Automatic updates :
Rather than purchasing new software, customers rely on a SaaS provider to automatically
perform the updates.
5. Scalability :
It allows the users to access the services and features on demand.
The various companies providing Software as a Service include Cloud9 Analytics, Salesforce.com,
CloudSwitch, Microsoft Office 365, Eloqua, Dropbox and CloudTran.
2. Platform As A Service

• PaaS is a category of cloud computing that provides a platform and environment to allow
developers to build applications and services over the internet.
• PaaS services are hosted in the cloud and accessed by users simply via their web browser.
A PaaS provider hosts the hardware and software on its own infrastructure.
• As a result, PaaS frees users from having to install in-house hardware and software to
develop or run a new application. Thus, the development and deployment of the
application take place independently of the hardware.
• The consumer does not manage or control the underlying cloud infrastructure including
network, servers, operating systems, or storage, but has control over the deployed
applications and possibly configuration settings for the application-hosting environment.

Advantages of PaaS:
1. Simple and convenient for users :
It provides much of the infrastructure and other IT services, which users can access
anywhere via a web browser.
2. Cost Effective :
It charges for the services provided on a per-use basis thus eliminating the expenses one
may have for on-premises hardware and software.
3. Efficiently managing the lifecycle :
It is designed to support the complete web application lifecycle: building, testing,
deploying, managing and updating.
4. Efficiency :
It allows for higher-level programming with reduced complexity; thus, the overall
development of the application can be more effective.
The various companies providing Platform as a Service include Amazon Web Services, Salesforce,
Windows Azure, Google App Engine, CloudBees and IBM SmartCloud.

3. Infrastructure As A Service
Infrastructure as a service (IaaS) is a service model that delivers computer infrastructure
on an outsourced basis to support various operations.
Typically, IaaS is a service where infrastructure is provided on an outsourced basis to enterprises,
including networking equipment, devices, databases and web servers.

Infrastructure as a service (IaaS) is also known as Hardware as a service (HaaS).
IaaS customers pay on a per-use basis, typically by the hour, week or month. Some
providers also charge customers based on the amount of virtual machine space they use.
It simply provides the underlying operating systems, security, networking, and servers for
developing such applications, services, and for deploying development tools, databases, etc.

Advantages of IaaS :
1. Cost Effective :
Eliminates capital expense and reduces ongoing cost and IaaS customers pay on a per use
basis, typically by the hour, week or month.
2. Website hosting :
Running websites using IaaS can be less expensive than traditional web hosting.
3. Security :
The IaaS Cloud Provider may provide better security than your existing software.
4. Maintenance :
There is no need to manage the underlying data center or the introduction of new releases
of the development or underlying software. This is all handled by the IaaS cloud provider.
The various companies providing Infrastructure as a Service include Amazon Web Services,
Bluestack, IBM, OpenStack, Rackspace and VMware.

CLOUD CONCEPT AND TECHNOLOGIES


• Cloud computing is portrayed as a disruptive change in how computing and communications
are delivered to, and used by, end users.
• As with any such change, there is considerable room for innovation and many significant
opportunities for both providers and customers. Cloud computing is not just a new vision,
and it's not just a new product or a new service.
• It’s actually all of these and more. New market sectors are emerging, and each has to
“cross the chasm” in order to achieve widespread adoption.
• Today’s processes and technologies also need to adapt or they will perish. Even IT
education will need to be modernized to reflect changing roles and processes.
Here are a few of the basic things to keep in mind as you head towards computing in the “cloudy
skies”:

1. Various viewpoints
There are many stakeholders in the world of cloud computing, ranging from the individual
user to the entrepreneur service provider to the service developer to the legal compliance
auditor.
Each has a role to play and activities to perform, and each will see cloud computing
differently. It is important to distinguish among the requirements of each of the stakeholders.

2. Scope and field of application


Determining what is and isn’t a cloud can be an important consideration. It is important to
decide what minimum characteristics are needed for an IT solution to be called a cloud.
Currently, many marketing efforts are aimed at positioning existing products as cloud
compliant.
Eventually the term cloud computing may disappear as the range of cloud-based systems
expands but for now we must understand where it fits and how it differs from today’s
technologies.

3. Defining cloud computing


The National Institute of Standards and Technology (NIST) was one of the first to attempt a
definition for cloud computing.

“paradigm for enabling network access to a scalable and elastic pool of shareable physical
or virtual resources with self-service provisioning and administration on-demand.”
This provides lots of latitude for implementing cloud-based IT solutions and ensures there
will be lots of competition among suppliers for both the underlying resources and the provision
of services.

4. Reference models for cloud computing


Proposed reference models for cloud computing include both the NIST Cloud Computing
Reference Architecture (Special Publication 500-292) and the emerging ISO/IEC/ITU DIS
17789 (Information technology – cloud computing – Reference Architecture).
NIST presents its overall model for cloud computing in a reference diagram in Special Publication 500-292.
The ISO/ITU model is similar and includes concepts such as layering and views, cloud roles,
cloud services, cloud activities and components, and cross-cutting aspects.

5. Cloud-based systems
Fundamental to cloud computing is the idea that what is delivered to the customer is
services, not systems consisting of dedicated hardware and software. Under the cloud computing
“covers” there may be many components that are shared by many customers.
This includes security services, administrative services, ecosystem services and performance
services. The vision is to make cloud IT pervasive and to achieve both the digital economy (for
business) and the digital society (for the public) leading to the “Digitally Interconnected
Society”.

6. Cloud deployment models


Cloud computing includes the deployment of shared resources that can be “rented” for use
as-needed. Cloud deployments can be characterized by ownership, operator, location,
complexity, target customers and services offered.
For example, a public cloud would typically be owned and operated by a service provider
and would be located on their premises (Amazon or Google, for example). A public cloud would
usually target the general public with a specific service. One good example is the Google Search
Service.

7. Cloud services
There is no single definition for a cloud service, although conceptually each service provides
a measured amount of one or more capabilities at specified levels of quality at an agreed upon
price. Three types of service are currently being defined:
• IaaS (Infrastructure as a Service) covers basic computing, storage and networking
capabilities;
• PaaS (Platform as a Service) provides capabilities for developing and deploying customer-
owned applications; and
• SaaS (Software as a Service) provides commonly used applications on a shared basis.
Offering services at one level does not imply that services are also offered for all lower levels,
however.

8. Overarching concerns
There are a number of overarching considerations that are generally applicable to any cloud
computing deployment and which have a major impact on the success of any cloud-based
system. For example:
• Governance and management: auditability, governance, regulatory, privacy, security and
service levels (SLAs);
• Qualities: availability, performance and resiliency;
• Capabilities: interoperability, maintainability, portability and reversibility

9. Cloud-specific risks
As with any new technology, there are business risks associated with cloud computing, both
for providers and customers. While the most visible of these has so far been security, there are
other important things to keep in mind, including:
• Supplier quality, longevity and lock-in
• Available expertise – technical, organizational and operational
• Adaptation of business processes to available services
• Financial management including changes in purchasing and variable bills
• Exploitation and innovation
The sum total is that there are many things to consider as you prepare to include cloud computing in
your IT solutions. These vary according to the role you will play, the services that are being
used and the maturity of your IT organization. As part of developing your policies and roadmaps

for cloud computing, I recommend creating a centre of cloud computing excellence to kick start
your journey.

Cloud technology
• Cloud Computing Technologies (CCT), a cloud systems integrator and cloud service
provider, specializes in cloud systems aggregation, cross-cloud platform integration,
application API integration, software development, and management of your cloud
ecosystem. Our professional cloud services include cloud systems design and
implementation (private cloud, public cloud, or hybrid cloud) and migration to shared services
and on-premises private cloud infrastructures.
• As your single point of contact for cloud integration, we explain third-party cloud service
level agreements, pricing models, and contracts as your trusted adviser.
• For organizations that seek do-it-yourself cloud shared-services solutions, CCT offers secure,
scalable, and on-demand cloud services through our enterprise-level cloud partner, Amazon
Web Services Platform-as-a-Service (PaaS).
• At all Cloud Computing Technologies services levels, we are proud of our track record of
delivering high-impact public cloud service with excellent customer satisfaction.
Our mission is “To provide high-quality Cloud Computing Shared Services Solutions to
accomplish our clients’ business goals and develop long-term relationships.
• Our commitment is to continuous improvement of shared services, deliverables, and
competitive pricing with current and emerging cloud computing technology.”

The Underlying Technology (Virtualization)


Cloud computing technologies could never exist without the use of the underlying
technology known as virtualization. It allows abstraction and isolation of lower-level
functionalities and underlying hardware. This enables portability of higher-level functions and
sharing and/or aggregation of the physical resources.
Cloud computing heavily relies on virtualization, as it virtualizes many aspects of the
computer including software, memory, storage, data and networks. Virtualization enables you
to consolidate your servers and do more with less hardware. It also lets you support
more users per piece of hardware, deliver applications, and run applications faster.

These attributes of virtualization are the core of cloud computing technologies and
are what makes possible cloud computing's key characteristics of multitenancy, massive
scalability, rapid elasticity and measured service.
Three forms of virtualization
There are three types of virtualization, known as full, para and isolation virtualization; each is
explained below.
Full virtualization
This type of virtualization operates at the processor level and supports unmodified guest operating
systems by simulating the hardware and software of the host machine.

Para-virtualization
Para-virtualization uses a virtual machine monitor, which is software that allows a single
physical machine to support multiple virtual machines. It allows multiple virtual machines to
run on one host, and each instance of a guest program is executed independently on its own
virtual machine.

Isolation
Isolation is similar to para-virtualization, although it only allows virtualization of the same
operating system as the host and only supports Linux systems; however, it is considered to
perform the best and operate the most efficiently.
As more businesses start to move to the cloud, they should be aware of the many
challenges that the technology is currently experiencing; it is important that they are prepared to
encounter some of these challenges during their migration towards cloud technologies. Cloud
computing has been around for many years and has always been clouded in ambiguity as to what
the technology is, and many individuals provide their own interpretations and opinions
in defining the various cloud delivery models.
This is very much to do with the lack of standards and a clear definition of each aspect of
what cloud technology is and how it actually functions. Many cloud computing providers
admit that they consider standards to be the first step to commoditization, something they would
rather not see this early in the emerging market. So the lack of standards is partially due to
many cloud providers not wanting them defined yet, which is certainly going to cause more
ambiguity and possibly slow down adoption of cloud technologies.

Cloud consumers do not have control over the underlying computing resources, yet they
need to be assured of the quality, availability, reliability, and performance of these resources once
they have migrated their core business functions onto their entrusted cloud [13]. Cloud
service providers need to be transparent and responsible for the services they provide in order
to create consumer confidence in their services.
Consumer confidence can be achieved through a mutual agreement commonly referred to
as a Service Level Agreement (SLA). Migrating to the cloud service provider's infrastructure
means that the provider has a large responsibility for the consumers' data and services, which must
be maintained and made available according to the specifications outlined in the SLA.

Future of the technology


As cloud computing is a relatively new delivery model, its future is not fully known, but
seeing as the popularity of and excitement around the technology is constantly growing, it's safe to
say that cloud computing is here to stay.
Short-term forecasts predict that in 2012 80% of new commercial enterprise apps will be
deployed on cloud platforms , which illustrates that cloud adoption is set to rise exponentially
this year.
In the long term, technology experts and stakeholders say they expect to 'live
mostly in the cloud' by 2020 and not on the desktop, working mostly through cyberspace-based
applications accessed through networked devices.
The many stakeholders and enthusiasts of the technology see it as the next step in
computing, with many businesses and individual users in the future using cloud technology in
some shape or form (Sean Carlin, Cloud Computing Technologies).

QUESTIONS
5 MARKS
1. Write short notes on cloud computing?
2. Write about cloud models?
3. Explain cloud based services.
4. What is meant by cloud technology? Explain it.

10 MARKS
5. Explain about characteristics of cloud computing.
6. Discuss about cloud based services and application.
7. Explain about cloud concept and technology.
8. What is meant by cloud services? Give examples of cloud services.

UNIT I COMPLETED

UNIT II
CLOUD SERVICES AND PLATFORM
COMPUTE SERVICES
• Compute as a service (CaaS) is the provisioning of computing resources (access to raw
compute or server capacity) on demand.
• This cloud computing layer involves the delivery of virtual or physical resources as a
service, priced via a consumption-based model.
• CaaS was the original cloud offering, emerging in 2006 via the launch of Amazon’s EC2.
While EC2 continues to enjoy a significant market-share lead, more than 50 vendors have
entered the space to challenge the leader for a piece of the pie. With commoditization taking
hold, the sector faces unique challenges but remains a core part of the broader cloud
universe.
• This report examines the revenue being generated by CaaS providers, predicts the growth
trends for the space, and highlights the opportunities and threats facing vendors.

Compute Services – Amazon EC2


• Amazon Elastic Compute Cloud (EC2) is a compute service provided by Amazon.
• Launching EC2 Instances
To launch a new instance, click on the Launch Instance button. This will open a wizard
where you can select the Amazon Machine Image (AMI) with which you want to launch the
instance. You can also create your own AMIs with custom applications, libraries and data.
Instances can be launched with a variety of operating systems.
• Instance Sizes
When you launch an instance you specify the instance type (micro, small, medium, large,
extra-large, etc.), the number of instances to launch based on the selected AMI and availability
zones for the instances.
• Key-pairs

When launching a new instance, the user selects a key-pair from existing keypairs or
creates a new keypair for the instance. Keypairs are used to securely connect to an instance after
it launches.
• Security Groups
The security groups to be associated with the instance can be selected from the instance
launch wizard. Security groups are used to open or block a specific network port for the launched
instances.
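The same launch can be driven programmatically rather than through the console wizard. The sketch below uses the boto3 SDK; the AMI ID, key-pair name and security-group ID are hypothetical placeholders, and AWS credentials are assumed to be configured.

    # Minimal EC2 launch sketch with boto3 (IDs and names are placeholders).
    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")

    instances = ec2.create_instances(
        ImageId="ami-0123456789abcdef0",              # the AMI selected in the wizard
        InstanceType="t2.micro",                      # instance size/type
        MinCount=1, MaxCount=1,                       # number of instances to launch
        KeyName="my-keypair",                         # key pair for secure connection
        SecurityGroupIds=["sg-0123456789abcdef0"],    # ports opened/blocked for the instance
    )
    print("Launched:", instances[0].id)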

Compute Services – Google Compute Engine


• Google Compute Engine is a compute service provided by Google.
• Launching Instances
To create a new instance, the user selects an instance machine type, a zone in which the
instance will be launched, a machine image for the instance and provides an instance name,
instance tags and meta-data.
• Disk Resources
Every instance is launched with a disk resource. Depending on the instance type, the disk
resource can be scratch disk space or persistent disk space. Scratch disk space is deleted
when the instance terminates, whereas persistent disks live beyond the life of an instance.
• Network Options
The network options allow you to control the traffic to and from the instances. By default,
traffic between instances in the same network, over any port and any protocol, and incoming SSH
connections from anywhere are enabled.
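A sketch of the same instance creation through the Google API Python client is shown below; the project, zone, instance name, machine type and source image are placeholders, application-default credentials are assumed, and exact body fields may vary with the API version.

    # Minimal Google Compute Engine launch sketch (values are placeholders).
    from googleapiclient import discovery

    compute = discovery.build("compute", "v1")

    project, zone = "my-project", "us-central1-a"
    config = {
        "name": "demo-instance",                                   # instance name
        "machineType": f"zones/{zone}/machineTypes/e2-micro",      # machine type
        "disks": [{                                                # boot disk resource
            "boot": True,
            "autoDelete": True,   # delete the persistent boot disk with the instance
            "initializeParams": {
                "sourceImage": "projects/debian-cloud/global/images/family/debian-12",
            },
        }],
        "networkInterfaces": [{"network": "global/networks/default"}],  # network options
        "tags": {"items": ["web"]},                                # instance tags
    }

    operation = compute.instances().insert(project=project, zone=zone, body=config).execute()
    print("Operation:", operation["name"])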

Compute Services – Windows Azure VMs


• Windows Azure Virtual Machines is the compute service from Microsoft.
• Launching Instances:
To create a new instance, you select the instance type and the machine image.
• You can either provide a user name and password or upload a certificate file for securely
connecting to the instance.
• Any changes made to the VM are persistently stored and new VMs can be created from the
previously stored machine images.
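For completeness, a comparable sketch with the Azure Python SDK (azure-mgmt-compute) is given below. The resource group, network interface ID, image reference and credentials are placeholders and assumptions; exact parameter names can vary between SDK versions.

    # Minimal Azure VM creation sketch (names, IDs and image values are placeholders).
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    credential = DefaultAzureCredential()
    client = ComputeManagementClient(credential, subscription_id="<subscription-id>")

    poller = client.virtual_machines.begin_create_or_update(
        "demo-resource-group",              # existing resource group (placeholder)
        "demo-vm",                          # VM name
        {
            "location": "eastus",
            "hardware_profile": {"vm_size": "Standard_B1s"},       # instance type
            "storage_profile": {"image_reference": {               # machine image
                "publisher": "Canonical", "offer": "0001-com-ubuntu-server-jammy",
                "sku": "22_04-lts", "version": "latest"}},
            "os_profile": {"computer_name": "demo-vm",             # user name and password
                           "admin_username": "azureuser",
                           "admin_password": "<strong-password>"},
            "network_profile": {"network_interfaces": [
                {"id": "/subscriptions/<id>/.../networkInterfaces/demo-nic"}]},  # placeholder NIC
        },
    )
    vm = poller.result()
    print("Provisioned:", vm.name)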

STORAGE SERVICES
A cloud storage service is a business that maintains and manages its customers' data and
makes that data accessible over a network, usually the internet.
Most of these types of services are based on a utility storage model. They tend to offer
flexible, pay-as-you-go pricing and scalability. Cloud storage providers also provide for
unlimited growth and the ability to increase and decrease storage capacity on demand.
Leading use cases for a cloud storage service include backup, disaster recovery (DR),
collaboration and file sharing, archiving, primary data storage and near-line storage.

Public vs. private vs. hybrid services


Public cloud storage is a service owned and operated by a provider. It is usually suitable
for unstructured data that is not subject to constant change. The infrastructure usually consists of
inexpensive storage nodes attached to commodity drives. Data is stored on multiple nodes for
redundancy and accessed through internet protocols, typically representational state transfer
(REST).
Designed for use by many clients, a public cloud storage service supports massive multi-
tenancy with data isolation, access and security for each customer. It is generally used for
purposes ranging from static noncore application data to archived content that must still be
available for DR and backup.
Some enterprise users opt for a hybrid cloud storage model that stores unstructured data --
for backup and archiving purposes, for example -- and less sensitive data with a public cloud
provider, while a private cloud is used for active, structured and more sensitive data.
When considering any cloud storage service, you need to consider the following:

• Does the service use REST, the most commonly used cloud storage API?

• Are you migrating data from an existing archival storage product?

• Does your data have to be preserved in some specific format to meet compliance
requirements? That capability is not commonly available.

• Can the provider deal with large fluctuations in resource demands?

• Does the provider offer both public and private clouds? This may become important if you
want to migrate data from one type of service to the other.

Cloud storage pros/cons


Advantages of private cloud storage include high reliability and security. But this
approach to cloud storage provides limited scalability and requires on-site resources and
maintenance.
Public cloud storage offers high scalability and a pay-as-you-go model with no need for
an on-premises storage infrastructure. However, performance and security measures can vary by
service provider. In addition, reliability depends on service provider availability and internet
connectivity.

Advantages of a hybrid cloud



Hybrid cloud storage offers the best of the private and public cloud with high scalability
and on-premises integration that adds more layers of security. The result is better performance
and reliability because active content is cached locally. While a hybrid cloud tends to be more
costly than public storage, it is cheaper than private cloud storage. Reliability can be an issue, as
users must depend on service provider availability and internet connectivity.

Select storage type with care.


Choose a storage service that delivers the amount of performance and resilience most
suitable for your workload at the least possible cost.

Gauge availability and performance.


Metrics enable users to monitor and measure when a public cloud storage service
performs as it should or has issues. Having access to these metrics eases troubleshooting and
facilitates improvements to architectures and workload designs.
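A simple way to gather such metrics yourself is to time representative storage requests and record the results. The sketch below measures average download latency for a hypothetical object URL; the URL and sample count are illustrative assumptions.

    # Hypothetical sketch: measuring cloud storage request latency so that slow
    # periods can be spotted and fed back into workload design.
    import time
    import requests

    OBJECT_URL = "https://storage.example.com/bucket/report.csv"  # placeholder URL

    def measure_latency(samples: int = 5) -> float:
        """Return the average download latency in seconds over several samples."""
        timings = []
        for _ in range(samples):
            start = time.perf_counter()
            requests.get(OBJECT_URL, timeout=30)
            timings.append(time.perf_counter() - start)
        return sum(timings) / len(timings)

    print(f"average latency: {measure_latency():.3f} s")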

Adapt to the behaviors of the cloud storage service.
The way a cloud storage provider stores and provides access to data cannot be changed
by customers to address unexpected variations in performance as they share the infrastructure
with many other organizations.
But clients do have the ability to redesign the architecture of their workloads by
duplicating storage resources in more than one public cloud region, for example. This way, cloud
storage customers can redirect storage resources to the replicated region should problems arise.
Caching can also be used to address -- and head off -- potential cloud storage service
performance issues.
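The caching idea can be as simple as keeping recently fetched objects in memory so repeated reads avoid a round trip to the storage service. A minimal, provider-agnostic sketch is shown below; the storage endpoint is a hypothetical placeholder.

    # Minimal read-through cache in front of a cloud storage fetch.
    import requests

    _cache: dict[str, bytes] = {}
    BASE_URL = "https://storage.example.com/bucket"   # hypothetical storage endpoint

    def get_object(key: str) -> bytes:
        """Return object bytes, serving repeat reads from the local cache."""
        if key not in _cache:                         # cache miss: one network round trip
            _cache[key] = requests.get(f"{BASE_URL}/{key}", timeout=30).content
        return _cache[key]                            # cache hit: no network access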

Consider the hybrid approach.


Deploy dedicated tools to accelerate connectivity between your on-premises data center
and cloud storage when local workloads can't surmount the performance limitations of a public
cloud storage service.

Improve connectivity.
Sometimes, performance issues are caused by shortcomings in the internet connection
itself due to unexpected disruption and congestion -- always a risk when using the public
internet.

Cloud storage and data migration


Migrating data from one cloud storage service to another is an often-overlooked area.
Cloud migrations have become more common due to market consolidation and price
competition.
Businesses tend to switch cloud storage providers either because of price -- which must
be substantially cheaper to justify the cost and work of switching -- or when a cloud provider
goes out of business or stops providing storage services. With public cloud providers, it is
usually just as easy to copy data out of the cloud as it was to upload data to it. Available
bandwidth can become a major issue, however. In addition, many providers charge extra to
download data.

CLOUD DATABASE SERVICES


• Many different cloud database service providers offer database as a service, which is
further divided into three major categories: relational databases, non-relational databases,
and operating a virtual machine loaded with local database software such as SQL.
• There are different companies offering database as a service, DBaaS like Amazon RDS,
Microsoft SQL Azure, Google AppEngine Datastore and Amazon SimpleDB (Pizzete and
Cabot 2012). Each service provider is different from the other depending upon the quality
and sort of services being provided.
• There are certain parameters that can be used to select the best service that will suit for your
company. This is not limited to a certain company; these parameters can help in deciding the
best service provider depending upon the requirements of any company.

Choosing best DBaaS


The selection of a DBaaS depends not only on the services provided by the vendor,
but also on the requirements of the company. There are certain parameters that
can be taken as a guide to choose the best DBaaS.

Data Sizing
Every DBaaS provider has a different capacity for storing data in the database. Data
sizing is very important, as the company needs to be sure about the size of data that will be
stored in its database. For example, Amazon RDS allows the user to store up to 1 TB of data
in one database, whereas SQL Azure offers only 50 GB per database.

Portability
The database should be portable, as it should never be out of the user's reach. The
service provider may go out of business, in which case the database and the data stored in it
could be lost. There should be an emergency plan for such situations. This can be addressed by
taking cloud services from other companies as well, so that the database remains accessible even
in an emergency.

Transaction Capabilities
Transaction capabilities are a major feature of a cloud database, as completion
of a transaction is very important to the user. The user must know whether the transaction has
succeeded or not. Companies that mostly transact money need complete read and write
operations to be accomplished. The user needs a guarantee of the transaction he made, and this
sort of transaction is called an ACID transaction (Pizzete and Cabot 2012). If there is no need
for such a guarantee, then non-ACID transactions can be used, which are also faster.
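The sketch below illustrates the ACID pattern: either both writes are committed together or neither takes effect. The standard-library sqlite3 module is used only so the example runs locally; with a DBaaS the connection would point at the provider's endpoint instead, and the table and values are invented for the example.

    # Illustration of an ACID-style transaction: commit both writes or roll back.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()

    try:
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
        conn.commit()          # both writes become visible together
    except sqlite3.Error:
        conn.rollback()        # on failure, neither write is applied

    print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())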

Configurability
Many cloud databases are easily configurable by the user, as most of the
configuration is done by the service provider. This leaves very few options for the
administrator of the database, who can manage the database without much
effort.

Database Accessibility
As there are different kinds of databases, the mechanisms for accessing them differ as
well. The first method is an RDBMS offered through industry-standard drivers such as Java
Database Connectivity (JDBC); such a driver allows an external connection to access the
service through a standard interface. The second way of accessing the database is through
interfaces or protocols such as Service-Oriented Architecture (SOA) and SOAP or REST
(Pizzete and Cabot 2012). These interfaces use HTTP and, in some cases, new API definitions.
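A sketch of the second, HTTP-based access style is given below, as opposed to driver-based access such as JDBC. The endpoint, authentication header and query format are hypothetical placeholders; real services each define their own API.

    # Hypothetical sketch of HTTP/REST-style access to a cloud database.
    import requests

    ENDPOINT = "https://dbaas.example.com/v1/query"     # placeholder service URL

    def run_query(sql: str) -> list[dict]:
        response = requests.post(
            ENDPOINT,
            json={"statement": sql},
            headers={"Authorization": "Bearer <api-token>"},   # placeholder token
            timeout=15,
        )
        response.raise_for_status()
        return response.json()["rows"]

    # Example usage against the hypothetical service:
    # rows = run_query("SELECT name, balance FROM accounts")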

Certification and Accreditation
It is better to get the services of a cloud database provider who has certification
and accreditation. This helps mitigate the risks of the service for the company and avoid any
inconvenience. Companies that hold certifications such as FISMA can be considered more reliable
than other DBaaS providers.

Data Integrity, Security and Storage Location
Security has been the major threat to data stored in cloud storage. Security also depends on the
encryption methods used and the storage locations of the data. The data is stored in different
locations in data centers.

APPLICATIONS SERVICES
Applications as a service refers to the delivery of computer software applications as a
service via the Internet. This type of software is also referred to as SaaS (Software as a Service),
software on demand and on-demand software. On-demand software has been gaining an
increasing share of the software market, due to the cost savings and efficiency gains it can offer
to organizations, regardless of their size. On-demand software provides financial benefits to
organizations, by eliminating the expense of individual user licenses which normally accompany
traditional on-premise software delivery.

Applications as a service can also provide software to enterprise users more efficiently,
because it can be distributed and maintained for all users at a single point – in the public cloud
(Internet). The efficiency gains are facilitated through the use of various automation tools that
can be implemented on the cloud services platform. The automation of functions such as user
provisioning, user account management, user subscription management and application life cycle
management makes on-demand software a highly efficient and cost-effective way to deliver
software to enterprise users.
Companies that provide applications as a service (on-demand software) are known as ASPs
or application service providers. ASPs are divided into 4 major categories, as follows:

• Specialist or Functional ASPs – who provide a single application (e.g. a payroll solution)
• Vertical Market ASPs – who provide a suite of software applications for a particular
market (e.g. software for the food service industry)
• Enterprise ASPs – who provide integrated applications for management of multiple
functional areas within a single organization
• Local ASPs – who provide a variety of applications, primarily to small businesses,
within a specific geographical area

ASPs own the software that they deliver to consumers, as well as the hardware which
supports the software. ASPs bill on a per use basis, on a monthly basis or on an annual basis –
making software on demand a very affordable option for many organizations. On-demand
software provides small and medium size businesses with a method of accessing software that
may have previously been financially out of reach, due to software licensing costs and additional
hardware costs.

CONTENT DELIVERY SERVICES


CDNs have made a significant impact on how content is delivered via the Internet to the
end-users. Traditionally, content providers have relied on third-party CDNs to deliver their
content to end-users. With the ever-changing landscape of content types, e.g. moving from standard-
definition video to high definition to full high definition, it is a challenge for content providers
who either supplement their existing delivery networks with third-party providers or completely
rely on them to understand and monitor the performance of their service. Moreover, the
performance of the CDN is impacted by the geographical availability of the third-party
infrastructure.

A cloud CDN (CCDN) provides a flexible solution allowing content providers to intelligently match and place content on one or more cloud storage servers based on coverage, budget and QoS preferences. The key implication is economies of scale and the benefits delivered by the pay-as-you-go model. Using clouds, content providers have more agility in managing situations such as flash crowds, avoiding the need to invest in infrastructure development.
Pay-as-you-go CCDN model
CCDN allows users to consume the delivered content using a pay-as-you-go model. Hence, it is much more cost-effective than owning the physical infrastructure necessary to be part of a CDN.
Increased point-of-presence
The content is moved closer to users with relative ease in the CCDN system compared with a traditional CDN, due to the omnipresence of the cloud. The cloud-based content delivery network can reduce transmission latency, as it can rent operating resources from the cloud provider to increase the reach and visibility of the CDN on demand.
CCDN Interoperability
CDN interoperability has emerged as a strategically important concept for service providers and content providers. Interoperability of CDNs via the cloud will allow content providers to reach new markets and regions and support nomadic users. For example, instead of setting up infrastructure to serve a small group of customers in Africa, a content provider can take advantage of existing cloud providers in the region to dynamically host surrogate servers.
Support for variety of CCDN application
The cloud can support dynamic changes in load. This will enable CDNs to support different kinds of applications that have unpredictable bursting traffic, predictable bursting traffic, scale-up and scale-down of resources, and the ability to expand and grow fast. However, while cloud-based CDNs have made remarkable progress in the past five years, they are still limited in a number of aspects. For instance, moving into the cloud might carry some marked security and performance challenges that can impact the efficiency and productivity of the CDN, thus affecting the client's business.
Dynamic Content Management
CDNs are designed for streaming staged content but do not perform well in situations where content is produced dynamically. This is typically the case when content is produced, managed and consumed in collaborative activities. For example, an art teacher may find and discuss movies from different film archives; the students may then edit the selected movies. Parts of them may be used in producing new movies that will be sent to the students' friends for comments and suggestions. Current CDNs do not support such collaborative activities that involve dynamic content creation.
CCDN Ownership
Cloud CDN service providers either own all the services they use to run their CDN services or they outsource this to a single cloud provider. A specialized legal and technical relationship is required to make the CDN work in the latter case.
CCDN Personalization
CDNs do not support content personalization. For example, if the subscriber’s behavior
and usage pattern can be observed, a better estimation on the traffic demand can be achieved.
The performance of content delivery is moving from speed and latency to on-demand delivery of
relevant content matching end-user’s interest and context.
Cost Models for Cloud CDNs
The cloud cost model works well as long as the network consumption is predictable for
both service provider and end-user. However, such predictions become very challenging with
distributed cloud CDNs.

Security
CDNs also impose security challenges due to the introduction of public clouds to store, share and route content. The use of multi-vendor public clouds further complicates this problem. Security is the protection of content against unauthorised usage, modification and tampering, and protection against illegal use, hack attacks, viruses and other unwanted intrusions. Further, security also plays an important role while accessing and delivering content to relevant users.

Hybrid Clouds
The integration of cloud and CDN will also allow the development of hybrid CCDNs that can leverage a combination of private and public cloud providers. For example, the content provider can use a combination of cloud service platforms offered by Microsoft Azure and Amazon AWS to host their content. Depending on the pay-as-you-go model, the content provider can also move from one cloud provider to another. However, achieving a hybrid model is very challenging due to various CCDN ownership issues and QoS issues.

ANALYTIC SERVICES
Analytics as a service (AaaS) refers to the provision of analytics software and operations
through web-delivered technologies. These types of solutions offer businesses an alternative to
developing internal hardware setups just to perform business analytics.
To put analytics as a service in context, this type of service is part of a much wider range of
services with similar names and similar ideas, including:

• Software as a service (SaaS)


• Platform as a service (PaaS)
• Infrastructure as a service (IaaS)
What these all have in common is that the service model replaces internal systems with web-
delivered services. In the example of analytics as a service, a provider might offer access to a
remote analytics platform for a monthly fee. This would allow a client to use that particular
analytics software for as long as it is needed, and to stop using it and stop paying for it at a future
time.
Analytics as a service is becoming a valuable option for businesses because setting up
analytics processes can be a work-intensive process. Businesses that need to do more analytics

may need more servers and other kinds of hardware, and they may need more IT staff to
implement and maintain these programs.
Advantages
Agile Computing Resources
Instead of dealing with speed and delivery-time hassles on your on-premise servers, cloud computing resources are high-powered and can deliver your queries and reports in no time.

Ad hoc Deployment of Resources for Better Performance


If you have an in-house analytics team, you have to be concerned about maintaining an efficient warehouse, the latency of your data over the public internet, staying up to date with advanced tools, and the experience needed to handle the high demands of real-time BI or emergency queries.

Match, Consolidate and Clean Data Effortlessly


Real-time cloud analytics with real-time access to your online data keeps your data up to date and organized, helping your Operations and Analytics teams function under the same roof. This avoids mismatches and delays, and helps you predict and implement better decisions.

Accessibility
Cloud services are capable of sharing data and visualizations and performing cross-organizational analysis, making raw data more accessible and understandable to a broader user base.

DEPLOYMENT AND MANAGEMENT SERVICES


Comprehensive cloud management is vital to protect cloud assets against vulnerabilities,
data loss and downtimes. Cloud's true potential cannot be harnessed by managing it like a data
center. End-to-end visibility of infrastructure and applications helps remediate performance and
utilization issues and increases productivity, security and compliance.

Mindtree’s rich infrastructure and application experience enables high availability and
continuous optimization in a hybrid cloud across the business application ecosystem.
Mindtree cloud management services include:

• Risk reports and remediation plans
• Forecast and trends reporting
• Optimization in spending
• Regular operational metrics
• Disaster recovery test reports
• Improvement plans
• Automation and DevOps

Mindtree has developed a distinctive approach to deliver management through its proven
cloud management platform and skilled workforce. Our platform delivers integrated
functionality of ITSM, monitoring, APM and log analytics, and change and issue resolution. We
also enable integration of cloud with legacy IT landscape. Additionally, we provide a single
window to view the health of IT resources in hybrid or multi-cloud environments.
Mindtree addresses the challenges of both application and infrastructure, jointly referred
to as AppliStructure. Our approach is to deliver security-as-hygiene. This means building
security at every step of the delivery process rather than transposing security in isolation.
Needless to say, we deploy the right tools to help in threat prediction, identification and
remediation.

Benefits
Cost savings
By outsourcing your cloud managed services, you control and reduce costly network
maintenance. Staffing a full-time IT department is expensive and often unnecessary for
small to medium-sized businesses with simple networks. Outsourcing to a cloud-first managed
services provider like Agile IT can save you thousands each year on the cost of an in-house IT
department.
Predictable, recurring monthly costs
With the flexibility of cloud managed services, you decide how much you’re willing to
pay for IT services and have a consistent monthly bill.
With a fixed monthly service plan that’s customized to fit your needs or budget, you
optimize the amount you pay for IT support.

Future-proofed technology
Migrating to a cloud environment is the first step in future-proofing your data center.
Next, you’ll need to make the latest technology and services available to your business.
By hiring an in-house IT staff, your IT personnel will have to spend company time
training when a new technology or required upgrade gets released. Cloud technicians are already
prepared to manage the latest technology.
Disaster recovery
Services are the lifeline of a cloud managed service provider. Agile IT has designed
countless networks and data centers with proven redundancy and resiliency to maintain business
continuity.
With cloud managed services, your data will be safe and secured across all cloud services
and applications. In the event of a disaster, your business and operations can continue with
minimal downtime.
Fast response times
Your business can expect quick response times through enterprise-level monitoring and
remote cloud services. Agile IT can access, monitor and repair virtually any network issue
remotely. If you must resolve an issue locally, a technician should be dispatched within the same
business day.

IDENTITY AND ACCESS MANAGEMENT SERVICES


Identity Management
An identity is a set of unique characteristics of an entity: an individual, a subject, or an object. An identity used for identification purposes is called an identifier [1]. Entity identifiers are used for authentication to service providers (SPs). Identifiers provide assurance to an SP about the entity's identity, which helps the SP to decide whether to permit the entity to use a service or not. Entities may have multiple digital identities. An Identity Management System (IDM) supports the management of these multiple digital identities. It also decides how to best disclose PII to obtain a particular service. IDM performs the following tasks:
1) Establish identities: Associate PII with an entity.
2) Describe identities: Assign attributes identifying an entity.
3) Record the use of identity data: Log identity activity in a system and/or provides
access to the logs.
4) Destroy an identity: Assign expiration date to PII. PII become unusable after the
expiration date.

Identity Management can involve three perspectives


1. The pure identity paradigm: creation, management and deletion of identities without
regard to access or entitlements.
2. The user access (log on) paradigm: a traditional method; for example, a user uses a
smart card to log on to a service.
3. The service paradigm: A system that delivers personalized role based, online, on-
demand, presence based services to users and their devices.

The Identity Life Cycle


The Identity Life Cycle is closely related to the concept of digital identities. It comprises three main steps:
• Provisioning
• Maintenance
• Deprovisioning
Provisioning
The term provisioning is often explained with an example of a new employee. When
people join a new company, they often need physical objects like an office, a desk, a phone, a
key card, etc. Likewise, many collections of digital information need to be created for the new
employee describing who s/he is and what his/her roles and entitlements (access rights and
privileges) are within the organization.
The allocation of these digital objects and the creation of the digital identity information
that enables the necessary services for a user is called provisioning [7].
This idea can however be expanded to more than just people joining the company. Many
individuals from outside the company might also need provisioning, for example customers,
vendors and business partners.

Maintenance

• Identity information is prone to changes over time. During the Identity Life Cycle, modifications to it will therefore most likely be necessary. Synchronization plays a substantial role here.
• As the information is updated in one data store, it is often desired that this will be distributed automatically to other data stores using certain synchronization processes that are in place.
• An example of this would be a change in the home address of a certain employee.

Proposed approaches for IAM


• As auditing information can also be provided by applications running on top of an IAM solution, an additional component, Applications, is also shown. This component, however, is strictly speaking not part of IAM.
• In the next chapter of this report, a project will be described in which such an application is developed. A few times it has already been hinted at why a company might choose to develop and implement an IAM solution.
• As already mentioned, however, the impact and complexity of IAM projects is so great that the amount of resources needed and the costs are factors that carry enormous weight. For this reason, it is critical to answer the question of why one would want to start an IAM project in the first place.

IAM Architecture
As we will now talk about technologies, however, this functional approach is not very practical anymore. We will hence look at IAM from a more architectural point of view, starting with the following diagram.
Before we focus on the different components of IAM systems as depicted in this overview, some initial considerations are deemed necessary. In the center of the diagram, we see the Directory Services component, which can be considered the core of IAM, although all IAM components can in fact be deployed independently of any IAM architecture.

OPEN SOURCE PRIVATE CLOUD SOFTWARE


Identity & Access Management (IDAM) services allow managing the authentication and
authorization of users to provide secure access to cloud resources.
• Using IDAM services you can manage user identifiers, user permissions, security
credentials and access keys.
• Amazon Identity & Access Management
• AWS Identity and Access Management (IAM) allows you to manage users and user
permissions for an AWS account.
• With IAM you can manage users, security credentials such as access keys, and
permissions that control which AWS resources users can access
• Using IAM you can control what data users can access and what resources users can
create.
• IAM also allows you to control the creation, rotation, and revocation of users' security credentials (a minimal boto3 sketch of these operations is shown below).
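The IAM operations listed above map directly onto the AWS SDKs. The following is a minimal illustrative sketch (not from the original notes) using the boto3 Python SDK; the user name and policy ARN are hypothetical examples, and credentials are assumed to come from the environment or AWS configuration.

import boto3

iam = boto3.client("iam")

# Create a user and give it programmatic access keys
iam.create_user(UserName="report-service")
keys = iam.create_access_key(UserName="report-service")
print("AccessKeyId:", keys["AccessKey"]["AccessKeyId"])

# Attach a managed policy that controls which AWS resources the user can access
iam.attach_user_policy(
    UserName="report-service",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)

# Rotation/revocation: deactivate any currently active access keys for the user
for key in iam.list_access_keys(UserName="report-service")["AccessKeyMetadata"]:
    if key["Status"] == "Active":
        iam.update_access_key(
            UserName="report-service",
            AccessKeyId=key["AccessKeyId"],
            Status="Inactive",
        )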

• Windows Azure Active Directory


• Windows Azure Active Directory is an Identity & Access Management Service from
Microsoft.
• Azure Active Directory provides a cloud-based identity provider that easily integrates
with your on-premises active directory deployments and also provides support for third party
identity providers.
• With Azure Active Directory you can control access to your applications in Windows
Azure.

Open Source Private Cloud Software - CloudStack


• Apache CloudStack is an open source cloud software that can be used for creating private cloud
offerings.
• CloudStack manages the network, storage, and compute nodes that make up a cloud
infrastructure.
• A CloudStack installation consists of a Management Server and the cloud infrastructure that it
manages.
• Zones
• The Management Server manages one or more zones where each zone is typically a
single datacenter.
• Pods
• Each zone has one or more pods. A pod is a rack of hardware comprising a switch
and one or more clusters.

• Cluster
• A cluster consists of one or more hosts and a primary storage. A host is a compute node
that runs guest virtual machines.
• Primary Storage
• The primary storage of a cluster stores the disk volumes for all the virtual machines
running on the hosts in that cluster.
• Secondary Storage
• Each zone has a secondary storage that stores templates, ISO images, and disk volume
snapshots.
Open Source Private Cloud Software - Eucalyptus
• Eucalyptus is an open source private cloud software for building private and hybrid clouds that
are compatible with Amazon Web Services (AWS) APIs.
• Node Controller
• NC hosts the virtual machine instances and manages the virtual network endpoints.
• The cluster-level (availability-zone) consists of three components
• Cluster Controller - which manages the virtual machines and is the front-end for a
cluster.
• Storage Controller – which manages the Eucalyptus block volumes and snapshots to
the instances within its specific cluster. SC is equivalent to AWS Elastic Block Store (EBS).
• VMWare Broker - which is an optional component that provides an AWS-compatible
interface for VMware environments.

1. OwnCloud

A Dropbox replacement for Linux users, offering many functionalities similar to those of Dropbox, ownCloud is a self-hosted file sync and share server.
Its open source nature provides users with access to an unlimited amount of storage space. The project started in January 2010 with the aim of providing an open source replacement for proprietary cloud storage service providers. It is written in PHP and JavaScript, is available for Windows, Linux and OS X desktops, and also provides mobile clients for Android and iOS.

2. Seafile
Another file hosting software system which uses its open source nature to give users all the advantages they expect from good cloud storage software. It is written in C and Python, with the latest stable release being 4.4.3, released on 15th October 2015.
Seafile provides desktop client for Windows, Linux, and OS X and mobile clients for
Android, iOS and Windows Phone. Along with a community edition released under the General Public License, it also has a professional edition released under a commercial license, which provides extra features not supported in the community edition, such as user logging and text search.
Since it was open sourced in July 2012, it has been gaining international attention. Its main features are syncing and sharing, with a main focus on data safety. Other features of Seafile, which have made it common in many universities (such as the University of Mainz, HU Berlin and the University of Strasbourg) and among thousands of other people worldwide, are: online file editing, differential sync to minimize the bandwidth required, and client-side encryption to secure client data.

3. Pydio
Earlier known by the name AjaXplorer, Pydio is free software aiming to provide file hosting, sharing and syncing. As a project it was initiated in 2009 by Charles du Jeu and, since 2010, it has shipped on all NAS equipment supplied by LaCie.
Pydio is written in PHP and JavaScript and is available for Windows, Mac OS and Linux, and additionally for iOS and Android. With nearly 500,000 downloads on SourceForge, and acceptance by companies like Red Hat and Oracle, Pydio is one of the most popular cloud storage software packages in the market.
In itself, Pydio is just a core which runs on a web server and can be accessed through any browser. Its integrated WebDAV interface makes it ideal for online file management, and SSL/TLS encryption secures the transmission channels, protecting the data and ensuring its privacy. Other features which come with this software are: a text editor with syntax highlighting, audio and video playback, integration of Amazon S3, FTP or MySQL databases, an image editor, and file or folder sharing, even through public URLs.

APACHE HADOOP
Apache Hadoop is an open source software framework for storage and large scale
processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level
project being built and used by a global community of contributors and users. It is licensed under
the Apache License 2.0.
Hadoop was created by Doug Cutting and Mike Cafarella in 2005. It was originally
developed to support distribution for the Nutch search engine project. Doug, who was working at
Yahoo! at the time and is now Chief Architect of Cloudera, named the project after his son's toy
elephant. Cutting's son was 2 years old at the time and just beginning to talk. He called his
beloved stuffed yellow elephant "Hadoop" (with the stress on the first syllable). Now 12, Doug's
son often exclaims, "Why don't you say my name, and why don't I get royalties? I deserve to be
famous for this!"
The Apache Hadoop framework is composed of the following modules
1. Hadoop Common: contains libraries and utilities needed by other Hadoop modules
2. Hadoop Distributed File System (HDFS): a distributed file-system that stores data on the
commodity machines, providing very high aggregate bandwidth across the cluster
3. Hadoop YARN: a resource-management platform responsible for managing compute resources
in clusters and using them for scheduling of users' applications
4. Hadoop MapReduce: a programming model for large scale data processing

For the end-users, though MapReduce Java code is common, any programming language
can be used with "Hadoop Streaming" to implement the "map" and "reduce" parts of the user's
program. Apache Pig and Apache Hive, among other related projects, expose higher level user
interfaces like Pig Latin and a SQL variant respectively. The Hadoop framework itself is mostly
written in the Java programming language, with some native code in C and command line
utilities written as shell-scripts.
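As a concrete illustration of the Hadoop Streaming idea mentioned above, here is a minimal word-count job written in Python (a sketch, not from the original notes); the file names mapper.py and reducer.py are arbitrary, and the example assumes the standard streaming contract of newline-delimited, tab-separated key-value pairs on stdin/stdout.

# mapper.py - reads raw text lines from stdin and emits <word, 1> pairs
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(word + "\t1")

# reducer.py - receives pairs sorted by key and sums the counts per word
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(current_word + "\t" + str(current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print(current_word + "\t" + str(current_count))

Such scripts are typically launched with the hadoop-streaming JAR, passing mapper.py and reducer.py as the -mapper and -reducer arguments along with -input and -output paths.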
HDFS and MapReduce

There are two primary components at the core of Apache Hadoop 1.x: the Hadoop
Distributed File System (HDFS) and the MapReduce parallel processing framework. These are
both open source projects, inspired by technologies created inside Google.

Hadoop distributed file system


The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework. A Hadoop instance typically has a single namenode, and a cluster of datanodes forms the HDFS cluster; not every node in the cluster needs to run a datanode. Each datanode serves up blocks of data over the network using a block protocol specific to HDFS. The file system uses the TCP/IP layer for communication, and clients use remote procedure calls (RPC) to communicate with it.

• HDFS stores large files (typically in the range of gigabytes to terabytes) across multiple
machines.
• It achieves reliability by replicating the data across multiple hosts, and hence does not require RAID storage on hosts. With the default replication value of 3, data is stored on three nodes: two on the same rack, and one on a different rack. Data nodes can talk to each other to rebalance data, to move copies around, and to keep the replication of data high (a small sketch of inspecting and changing the replication factor follows this list).
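As a small illustration of the replication behaviour described above (not part of the original notes), the replication factor of a file can be inspected and changed with the standard hdfs shell commands, here wrapped in Python via subprocess; the file path is a hypothetical example.

import subprocess

# Upload a local file into HDFS
subprocess.run(["hdfs", "dfs", "-put", "weblog.txt", "/data/weblog.txt"], check=True)

# Raise the replication factor of that file to 3 and wait for it to complete
subprocess.run(["hdfs", "dfs", "-setrep", "-w", "3", "/data/weblog.txt"], check=True)

# "hdfs dfs -ls" lists the file; for files, the second column is the replication factor
subprocess.run(["hdfs", "dfs", "-ls", "/data/weblog.txt"], check=True)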

HADOOP MAP REDUCE JOB EXECUTION


In this Hadoop blog, we are going to provide you with an end-to-end MapReduce job execution flow. Here we will describe in detail each component which is part of how MapReduce works. This blog will help you answer how Hadoop MapReduce works, how data flows in MapReduce, and how a MapReduce job is executed in Hadoop.

What is MapReduce?
Hadoop MapReduce is the data processing layer. It processes the huge amount of
structured and unstructured data stored in HDFS. MapReduce processes data in parallel by
dividing the job into a set of independent tasks. This parallel processing improves speed and reliability.
Hadoop MapReduce data processing takes place in 2 phases- Map and Reduce phase.
• Map phase- It is the first phase of data processing. In this phase, we specify all the
complex logic/business rules/costly code.
• Reduce phase- It is the second phase of processing. In this phase, we specify light-
weight processing like aggregation/summation.
Steps of MapReduce Job Execution flow
MapReduce processes the data in various phases with the help of different components.
Let’s discuss the steps of job execution in Hadoop.
Input Files
The data for a MapReduce job is stored in input files, which reside in HDFS. The input file format is arbitrary; line-based log files and binary formats can also be used.
InputFormat
After that, the InputFormat defines how to split and read these input files. It selects the files or other objects for input, and it creates the InputSplits.
InputSplits
An InputSplit represents the data which will be processed by an individual Mapper. For each split, one map task is created; thus the number of map tasks is equal to the number of InputSplits. The framework divides each split into records, which the mapper processes.
RecordReader
The RecordReader communicates with the InputSplit and converts the data into key-value pairs suitable for reading by the Mapper. By default, the TextInputFormat's RecordReader converts data into key-value pairs by assigning the byte offset of each line in the file as the key and the line itself as the value. It communicates with the InputSplit until the file reading is complete. These key-value pairs are then sent to the mapper for further processing.
Mapper
The Mapper processes each input record produced by the RecordReader and generates intermediate key-value pairs. The intermediate output is completely different from the input pair. The output of the mapper is the full collection of key-value pairs. The Hadoop framework doesn't store the output of the mapper on HDFS, as the data is temporary and writing it to HDFS would create unnecessary multiple copies. The Mapper then passes the output to the combiner for further processing.
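To make the data flow described above concrete, the following pure-Python sketch (illustrative only, not the Hadoop API) mimics what happens between the RecordReader, the Mapper, the shuffle and the Reducer for a tiny word-count job.

from itertools import groupby
from operator import itemgetter

lines = ["deer bear river", "car car river", "deer car bear"]

# RecordReader: turn each line into a (byte offset, line) key-value pair
records, offset = [], 0
for line in lines:
    records.append((offset, line))
    offset += len(line) + 1

# Map phase: emit intermediate (word, 1) pairs for every record
intermediate = []
for _, line in records:
    for word in line.split():
        intermediate.append((word, 1))

# Shuffle/sort: order the intermediate pairs by key so equal keys are adjacent
intermediate.sort(key=itemgetter(0))

# Reduce phase: aggregate the values for each key
for word, pairs in groupby(intermediate, key=itemgetter(0)):
    print(word, sum(count for _, count in pairs))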

HADOOP SCHEDULERS
The default scheduling algorithm is based on FIFO, where jobs are executed in the order of their submission. Later on, the ability to set the priority of a job was added. Facebook and Yahoo contributed significant work in developing schedulers, i.e. the Fair Scheduler [7] and the Capacity Scheduler respectively, which were subsequently released to the Hadoop community.

Default FIFO Scheduler


The default Hadoop scheduler operates using a FIFO queue. After a job is partitioned into
individual tasks, they are loaded into the queue and assigned to free slots as they become
available on Task Tracker nodes. Although there is support for assignment of priorities to jobs,
this is not turned on by default. Typically each job would use the whole cluster, so jobs had to
wait for their turn. Even though a shared cluster offers great potential for offering large resources
to many users, the problem of sharing resources fairly between users requires a better scheduler.

Fair Scheduler
• The Fair Scheduler allocates resources evenly between multiple jobs and also provides capacity guarantees.
• The Fair Scheduler assigns resources to jobs such that each job gets an equal share of the available resources on average over time.
• Tasks slots that are free are assigned to the new jobs, so that each job gets roughly the
same amount of CPU time.
• Job Pools
• The Fair Scheduler maintains a set of pools into which jobs are placed. Each pool has a
guaranteed capacity.
• When there is a single job running, all the resources are assigned to that job. When there
are multiple jobs in the pools, each pool gets at least as many task slots as guaranteed.
• Each pool receives at least the minimum share.
• When a pool does not require the guaranteed share the excess capacity is split between
other jobs.
• Fairness
• The scheduler computes periodically the difference between the computing time
received by each job and the time it should have received in ideal scheduling.
• The job which has the highest deficit of the compute time received is scheduled next.

Capacity Scheduler
• The Capacity Scheduler has similar functionality to the Fair Scheduler but adopts a different scheduling philosophy.
• Queues
• In Capacity Scheduler, you define a number of named queues each with a configurable
number of map and reduce slots.
• Each queue is also assigned a guaranteed capacity.
• The Capacity Scheduler gives each queue its capacity when it contains jobs, and shares
any unused capacity between the queues. Within each queue FIFO scheduling with priority is
used.
• Fairness
• For fairness, it is possible to place a limit on the percentage of running tasks per user, so
that users share a cluster equally.
• A wait time for each queue can be configured. When a queue is not scheduled for more
than the wait time, it can preempt tasks of other queues to get its fair share.

SCHEDULER IMPROVEMENTS
Many researchers are working on opportunities for improving the scheduling policies in Hadoop. Recent efforts such as the Delay Scheduler and the Dynamic Proportional Scheduler offer differentiated service for Hadoop jobs, allowing users to adjust the priority levels assigned to their jobs. However, this does not guarantee that a job will be completed by a specific deadline. The Deadline Constraint Scheduler addresses the issue of deadlines but focuses more on increasing system utilization. While the schedulers described above attempt to allocate capacity fairly among users and jobs, they make no attempt to consider resource availability on a more fine-grained basis. The Resource Aware Scheduler considers resource availability when scheduling jobs. In the following sections we compare and contrast the work done by researchers on the various schedulers.

Longest Approximate Time to End (LATE) – Speculative Execution
It is not uncommon for a particular task to progress slowly. This may be due to several reasons, such as high CPU load on the node or slow background processes. All tasks must finish for the entire job to complete. The scheduler tries to detect a slow-running task and launches another equivalent task as a backup; this is termed speculative execution of tasks. If the backup copy completes faster, the overall job performance is improved.

Delay Scheduling
The Fair Scheduler with delay scheduling was developed to allocate a fair share of capacity to all users. Two locality problems identified when fair sharing is followed are head-of-line scheduling and sticky slots. The first locality problem occurs with small jobs (jobs that have small input files and hence a small number of data blocks to read). The problem is that whenever a job reaches the head of the sorted list for scheduling, one of its tasks is launched on the next slot that becomes free, irrespective of which node this slot is on. If the head-of-line job is small, it is unlikely to have data locally on the node that is given to it. The head-of-line scheduling problem was observed at Facebook in a version of HFS without delay scheduling.

Dynamic Priority Scheduling


Thomas Sandholm et al. proposed the Dynamic Priority Scheduler, which supports dynamic capacity distribution among concurrent users based on the priorities of the users. Automated capacity allocation and redistribution is supported in a regulated task slot resource market. This approach allows users to get Map or Reduce slots on a proportional-share basis per time unit. These time slots can be configured, and the period is called the allocation interval; it is typically set to somewhere between 10 seconds and 1 minute. For example, a maximum capacity of 15 Map slots gets allocated proportionally to three users. The central scheduler contains a Dynamic Priority Allocator and a Priority Enforcer component, responsible for accounting and schedule enforcement respectively.

Deadline Constraint Scheduler


The Deadline Constraint Scheduler addresses the issue of deadlines but focuses more on increasing system utilization. Dealing with deadline requirements in Hadoop-based data processing is done by a job execution cost model that considers various parameters like map and reduce runtimes, input data sizes, data distribution, etc.

HADOOP CLUSTER SETUP


There are two ways to install Hadoop, i.e. single node and multi node.
A single node cluster means only one DataNode is running, with the NameNode, DataNode, ResourceManager and NodeManager all set up on a single machine.
This is used for studying and testing purposes. For example, let us consider a sample
data set inside a healthcare industry. So, for testing whether the Oozie jobs have scheduled all the
processes like collecting, aggregating, storing and processing the data in a proper sequence, we
use single node cluster.
It can easily and efficiently test the sequential workflow in a smaller environment as
compared to large environments which contains terabytes of data distributed across hundreds of
machines.
While in a Multi node cluster, there are more than one DataNode running and each
DataNode is running on different machines. The multi node cluster is practically used in
organizations for analyzing Big Data. Considering the above example, in real time when we deal
with petabytes of data, it needs to be distributed across hundreds of machines to be processed.
Thus, here we use multi node cluster.
In this blog, I will show you how to install Hadoop on a single node cluster.

• VIRTUAL BOX: used for installing the operating system.


• OPERATING SYSTEM: You can install Hadoop on Linux based operating systems.
Ubuntu and CentOS are very commonly used. In this tutorial, we are using CentOS.
• JAVA: You need to install the Java 8 package on your system.
• HADOOP: You require Hadoop 2.7.3 package.

Purpose
This document describes how to install and configure Hadoop clusters ranging from a
few nodes to extremely large clusters with thousands of nodes. To play with Hadoop, you may
first want to install it on a single machine (see Single Node Setup).This document does not cover
advanced topics such as Security or High Availability.
Prerequisites

• Install Java. See the Hadoop Wiki for known good versions.
• Download a stable version of Hadoop from Apache mirrors.

Installation
Installing a Hadoop cluster typically involves unpacking the software on all the machines
in the cluster or installing it via a packaging system as appropriate for your operating system. It
is important to divide up the hardware into functions.
Typically one machine in the cluster is designated as the NameNode and another machine
as the ResourceManager, exclusively. These are the masters. Other services (such as Web App
Proxy Server and MapReduce Job History server) are usually run either on dedicated hardware
or on shared infrastructure, depending upon the load. The rest of the machines in the cluster act
as both DataNode and NodeManager. These are the slaves.

QUESTIONS
5 MARKS
1. Write short notes on compute services?
2. Explain about storage services.
3. Explain about analytics services.
4. Write short notes on apache hadoop?
5. Explain about content delivery services

10 MARKS
6. Explain in detail about deployment and management services.
7. Discuss about identity and access management services.
8. Explain in detail about open source private cloud software.
9. Explain in detail about hadoop map reduce job execution.
10. Discuss about hadoop cluster setup.

UNIT II COMPLETED

UNIT III
APPLICATION DESIGN

CLOUD APPLICATION DESIGN


When designing applications for the cloud, irrespective of the chosen platform, I have often found it useful to consider four specific topics during my initial discussions: scalability, availability, manageability and feasibility.
It is important to remember that the items presented under each topic within this article
are not an exhaustive list and are aimed only at presenting a starting point for a series of long and
detailed conversations with the stakeholders of your project, always the most important part of
the design of any application. The aim of these conversations should be to produce an initial
high-level design and architecture.
Scalability
Conversations about scalability should focus on any requirement to add additional
capacity to the application and related services to handle increases in load and demand. It is
particularly important to consider each application tier when designing for scalability, how they
should scale individually and how we can avoid contention issues and bottlenecks. Key areas to
consider include:
Capacity
• Will we need to scale individual application layers and, if so, how can we achieve this without affecting availability?
• How quickly will we need to scale individual services?
• How do we add additional capacity to the application or any part of it?
• Will the application need to run at scale 24x7, or can we scale down outside business hours or at weekends, for example?
Platform / Data
• Can we work within the constraints of our chosen persistence services while working at scale (database size, transaction throughput, etc.)?
• How can we partition our data to aid scalability within persistence platform constraints (e.g. maximum database sizes, concurrent request limits, etc.)?

Load
• How can we improve the design to avoid contention issues and bottlenecks? For example, can we use queues or a service bus between services in a co-operating producer, competing consumer pattern? (A minimal sketch of this pattern follows this list.)
• Which operations could be handled asynchronously to help balance load at peak times?
• How could we use the platform features for rate-leveling (e.g. Azure Queues, Service Bus, etc.)?
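To illustrate the co-operating producer, competing consumer idea mentioned above, here is a minimal sketch using Python's standard-library queue and threads (an illustrative toy example, not a cloud implementation); in a real cloud application the in-memory queue would be replaced by a durable service such as a platform queue or service bus.

import queue
import threading

work = queue.Queue()
NUM_CONSUMERS = 3

def producer(num_items):
    # Front-end tier: accept requests quickly and hand them off to the queue
    for i in range(num_items):
        work.put("order-%d" % i)
    # One sentinel per consumer signals that no more work is coming
    for _ in range(NUM_CONSUMERS):
        work.put(None)

def consumer(worker_id):
    # Back-end tier: competing consumers pull work at their own pace (rate-leveling)
    while True:
        item = work.get()
        if item is None:
            break
        print("worker %d processed %s" % (worker_id, item))

threads = [threading.Thread(target=consumer, args=(i,)) for i in range(NUM_CONSUMERS)]
for t in threads:
    t.start()
producer(10)
for t in threads:
    t.join()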
Availability
• Availability describes the ability of the solution to operate in a manner useful to the
consumer in spite of transient and enduring faults in the application and underlying
operating system, network and hardware dependencies.
• In reality, there is often some crossover between items useful for availability and
scalability.
Conversations should cover at least the following items
Uptime Guarantees
• What Service Level Agreements (SLAs) are the products required to meet?
• Can these SLAs be met? Do the different cloud services we are planning to use all conform to the levels required? Remember that SLAs are composite.

Replication and failover


• Which parts of the application are most at risk from failure?
• In which parts of the system would a failure have the most impact?
• Which parts of the application could benefit from redundancy and failover options?
• Will data replication services be required?
• Are we restricted to specific geopolitical areas? If so, are all the services we are planning to use available in those areas?

Disaster recovery
• In the event of a catastrophic failure, how do we rebuild the system?
• How much data, if any, is it acceptable to lose in a disaster recovery scenario?
• How are we handling backups? Do we have a need for backups in addition to data-
replication?
• How do we handle “in-flight” messages and queues in the event of a failure?
Performance
• What are the acceptable levels of performance? How can we measure that? What happens if we drop below this level?
• Can we make any parts of the system asynchronous as an aid to performance?
• Which parts of the system are the most highly contended, and therefore more likely to cause bottlenecks?
Security
• This is clearly a huge topic in itself, but a few interesting items to explore which relate
directly to cloud-computing include:
• What is the local law and jurisdiction where data is held? Remember to include the countries where failover and metrics data are held too.
• Is there a requirement for federated security (e.g. ADFS with Azure Active Directory)?
• Is this to be a hybrid-cloud application? How are we securing the link between our corporate and cloud networks?
Manageability
This topic of conversation covers our ability to understand the health and performance of the live system and manage site operations. Some useful cloud-specific considerations include:

Monitoring
• How are we planning to monitor the application?
• Are we going to use off-the-shelf monitoring services or write our own?
• Where will the monitoring/metrics data be physically stored? Is this in line with data
protection policies?
Feasibility
• When discussing feasibility we consider the ability to deliver and maintain the system within budgetary and time constraints. Items worth investigating include:
• Can the SLA’s ever be met (i.e. is there a cloud service provider that can give the uptime
guarantees that we need to provide to our customer)?
• Do we have the necessary skills and experience in-house to design and build cloud
applications?

• Can we build the application to the design we have within budgetary constraints and a
timeframe that makes sense to the business?
• How much will we need to spend on operational costs (cloud providers often have very
complex pricing structures)?

REFERENCE ARCHITECTURE FOR CLOUD COMPUTING


The Conceptual Reference Model
This section presents an overview of the NIST cloud computing reference architecture, which identifies the major actors and their activities and functions in cloud computing. The diagram depicts a generic high-level architecture and is intended to facilitate the understanding of the requirements, uses, characteristics and standards of cloud computing.
The NIST cloud computing reference architecture defines five major actors: cloud consumer, cloud provider, cloud carrier, cloud auditor and cloud broker. Each actor is an entity (a person or an organization) that participates in a transaction or process and/or performs tasks in cloud computing. Table 1 briefly lists the actors defined in the NIST cloud computing reference architecture.

Architectural Components

Service Deployment
As identified in the NIST cloud computing definition [1], a cloud infrastructure may be operated in one of the following deployment models:
• public cloud,
• private cloud,
• community cloud, or
• hybrid cloud.
The differences are based on how exclusive the computing resources are made to a Cloud Consumer.

• Public cloud
A public cloud is one in which the cloud infrastructure and computing resources are made available to the general public over a public network. A public cloud is owned by an organization selling cloud services, and serves a diverse pool of clients. Figure 9 presents a simple view of a public cloud and its customers.
• Private cloud
A private cloud gives a single Cloud Consumer's organization exclusive access to and usage of the infrastructure and computational resources. It may be managed by the organization or by a third party, and may be hosted on the organization's premises (i.e. on-site private cloud) or outsourced to a hosting company (i.e. outsourced private cloud).
• Community cloud
A community cloud serves a group of Cloud Consumers which have shared concerns such as mission objectives, security, privacy and compliance policy, rather than serving a single organization as does a private cloud. Similar to private clouds, a community cloud may be managed by the organizations or by a third party, and may be implemented on customer premises (i.e. on-site community cloud) or outsourced to a hosting company (i.e. outsourced community cloud).

• Hybrid cloud
A hybrid cloud is a composition of two or more clouds (on-site private, on-site community, off-
site private, off-site community or public) that remain as distinct entities but are bound together
by standardized or proprietary technology that enables data and application portability.

CLOUD APPLICATION DESIGN METHODOLOGIES
Cloud methodologies

(i) Service Oriented Architecture (SOA):

• Since the paradigm of Cloud computing treats every task accomplished as a "Service" rendered to users, it is said to follow the Service Oriented Architecture.
• This architecture comprises a flexible set of design principles used during the phases of
system development and integration. The deployment of a SOA-based architecture will
provide a loosely-integrated suite of services that can be used within multiple business
domains.
• The enabling technologies in SOA allow services to be discovered, composed, and executed.
For instance, when an end-user wishes to accomplish a certain task, a service can be
employed to discover the required resources for the task. This will be followed by a
composition service which will plan the road-map to provide the desired functionality and
quality of service to the end-user.

(ii) Virtualization
• The concept of virtualization is to relieve the user from the burden of resource purchases and installations.
• The Cloud brings the resources to the users. Virtualization may refer to Hardware (execution of software in an environment separated from the underlying hardware resources), Memory (giving an application program the impression that it has contiguous working memory, isolating it from the underlying physical memory implementation), Storage (the process of completely abstracting logical storage from physical storage), Software (hosting of multiple virtualized environments within a single Operating System (OS) instance) and Data (the presentation of data as an abstract layer). Amongst the other important reasons for which Clouds tend to adopt virtualization are:
(i) Server and application consolidation – as multiple applications can be run on the same server, resources can be utilized more efficiently.
(ii) Configurability – as the resource requirements for various applications can differ significantly (some require large storage, some require higher computation capability), virtualization is the only solution for customized configuration and aggregation of resources, which is not achievable at the hardware level.
(iii) Increased application availability – virtualization allows quick recovery from unplanned outages, as virtual environments can be backed up and migrated with no interruption in services.
(iv) Improved responsiveness – resource provisioning, monitoring and maintenance can be automated, and common resources can be cached and reused.

CLOUD STORAGE APPROACHES


Cloud storage is a service that maintains, manages and backs up data remotely and makes the data available to users over the network (via the internet). There are many cloud storage providers. Most of the providers provide free space up to a certain number of gigabytes.

• For example, DropBox provides free space up to 2GB; Google Drive, Box, Amazon and Apple Cloud provide free space up to 5GB; and Microsoft SkyDrive provides free space up to 7GB [4]. Customers have to pay according to the plan if they cross the free space limit.
• Features like maximum file size, auto backup, bandwidth, and upgrades for limited space differ from one provider to another; for example, the maximum file size in DropBox is 300MB whereas the maximum file size in Google Drive is 1TB.

Cloud Storage Standards

• The Storage Networking Industry Association (SNIA) published CDMI in 2009. It supports both legacy and new applications. Cloud storage standards define roles and responsibilities for archiving, retrieval and data ownership.
• CDMI also provides a standard way of auditing so that calculations are done in a consistent manner. These standards are helpful to cloud storage providers, cloud storage subscribers, cloud storage developers and cloud storage service brokers. By using CDMI, cloud storage subscribers can easily identify the providers according to their requirements.
• CDMI also provides a common interface for providers to advertise their specific capabilities so that subscribers can easily identify suitable providers.

General Cloud Storage Architecture


Cloud storage architecture consists of a front end, middleware and a back end. The front end can be a web service front end, a file-based front end, or even a more traditional front end. The middleware contains the storage logic, which implements features such as replication, data reduction and data placement algorithms. The back end implements the physical storage for the data.
The access methods for the cloud are different from traditional storage, as the cloud holds different types of data belonging to different customers. Most providers implement multiple access methods.

Virtual storage architecture
• An important part of the cloud model is the concept of a pool of resources that is drawn upon on demand in small increments. The recent innovation that has made this possible is virtualization. Cloud storage is simply the delivery of virtualized storage on demand.
• This architecture is based on a storage virtualization model. It consists of three layers, namely: 1. Interface Layer, 2. Rule and Metadata Management, 3. Virtual Storage Management. In the Interface Layer, administrators and users are provided with interface modes that may include icommands and client web browsers.

Advantages of Cloud Storage


There are many benefits to storing data in the cloud over local storage.
• Companies only pay for the storage they use. It creates operating expenses rather than capital
expenses.
• The data is quickly accessible and reliable. The data is located on the web across multiple
storage systems instead of a local site.
• Better protection in case of a disaster. Sometimes, the organization has a local backup and in
cases of fire or natural disaster, the backup will not be available.
• Cloud vendors provide hardware redundancy and automatic storage failover. This helps to avoid
service outages caused by hardware failure. The vendors know how to distribute copies to
mitigate any hardware failure.
• Virtually limitless storage capacities. If the customer does not need extra storage, the costs will decrease.
Disadvantages of Cloud Storage
On the other hand, there are many disadvantages to storing data in the cloud over local
storage.
• Immaturity. Vendors had to rewrite solutions to solve some incompatibilities with storing data online, and this has created difficulty for organizations (Galloway, 2013).
• Price and Reliability. The customer has to calculate the cost-effectiveness of a cloud solution
against hosting and maintaining their data.

DEVELOPMENT IN PYTHON

DESIGN APPROACH

Design Patterns in Python


• Python is a ground-up object oriented language which enables one to do object oriented programming in a very easy manner. While designing solutions in Python, especially ones that are more than the use-and-throw scripts popular in the scripting world, design patterns can be very handy.
• Python is a rapid application development and prototyping language. So, design patterns can
be a very powerful tool in the hands of a Python programmer.
Design Pattern Classifications

• Creational Patterns – They describe how best an object can be created. A simple example of such a design pattern is a singleton class, where only a single instance of a class can be created. This design pattern can be used in situations where we cannot have more than one instance of a logger logging application messages to a file. (A minimal Python singleton sketch is given at the end of this section.)
• Structural Patterns – They describe how objects and classes can work together to achieve larger results. A classic example of this would be the facade pattern, where a class acts as a facade in front of complex classes and objects to present a simple interface to the client.
• Behavioral Patterns – They talk about the interaction between objects. The mediator design pattern, where an object of a mediator class mediates the interaction of objects of different classes to get the desired process working, is a classical example of a behavioral pattern.
This is an initial version of the book covering the following 7 design patterns:

DP#1 – Model-View-Controller Pattern

DP#2 – Command Pattern

DP#3 – Observer Pattern

DP#4 – Facade Pattern

DP#5 – Mediator Pattern

DP#6 - Factory Pattern

DP#7 - Proxy Pattern


o Because the current number of design patterns is a handful, we have not categorized them
based on classifications mentioned earlier.
o By the time we publish next version of this book with a target of 20 patterns, the content
would be organized into classification-wise grouping of chapters.
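As promised above, here is a minimal sketch of the singleton idea used to illustrate the creational category; the Logger class and its message are just illustrative examples.

class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the single shared instance on first use, reuse it afterwards
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

class Logger(Singleton):
    def log(self, message):
        print("[app] " + message)

a = Logger()
b = Logger()
assert a is b  # both names refer to the same logger instance
a.log("application started")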

IMAGE PROCESSING APP

Why Image Processing


An image is nothing but a visual representation of something. It can be a representation of a person, an animal, or any living or non-living thing. Basically, an image is a rectangular grid of pixels with a definite width and height. Each pixel has its own value, so the quality of an image depends on these pixel values; a pixel is the unit of information present in an image. Image processing is the enhancement of images using mathematical operations, for which the input is an image, such as a photograph or video frame, and the output may be either an image or a set of characteristics or parameters related to the image.
Need for image processing:
• Humans are not satisfied with the quality of images and therefore they make use of
image processing.
• Humans rely upon their visual system (eyes and brain) to collect visual information about their surroundings. Visual information refers to images and videos. In the past, we needed visual information mainly for survival. Nowadays, visual information is required for survival as well as for communication and entertainment purposes [15].
• To enhance an image
• To extract some useful information from an image that can be utilised for health sciences, public safety, etc.
So, in short, following steps are involved in image processing :
1) We take input as an image.
2) Analyse and manipulate the image.
3) Output is a processed image.
Why make use of Python for Image Processing?
• Python has multiple libraries for multiple purposes like web development, scientific and
numeric computing, image processing.

• To work on images, Python has a library, the Python Imaging Library (PIL), for image processing operations. The Python Imaging Library provides many functions for image processing. We performed some basic operations using PIL modules.
Python Image Processing Library

• The Python Imaging Library, or PIL for short, is one of the core libraries for image manipulation in the Python programming language and is freely available to download on the internet. Many image processing tasks can be carried out using the PIL library, such as image inversion, binary conversion, cropping, writing text on images, changing intensity and brightness, and image filtering such as blurring, contouring and smoothing. (A minimal sketch of a few of these operations follows below.)
• Its initial release was in 1995, and many versions of PIL are available according to the operating system.
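The following minimal sketch illustrates a few of the PIL operations mentioned above (opening, converting, cropping, filtering and saving an image); it assumes that the Pillow fork of PIL is installed and that an input file named photo.jpg exists.

from PIL import Image, ImageFilter

img = Image.open("photo.jpg")                # load the input image
gray = img.convert("L")                      # convert to grayscale
cropped = gray.crop((0, 0, 200, 200))        # crop the top-left 200x200 region
blurred = cropped.filter(ImageFilter.BLUR)   # apply a simple blur filter
blurred.save("processed.jpg")                # write the processed image back to disk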

DOCUMENT STORAGE APP

OneDrive

• Harness the might of Microsoft via OneDrive’s cloud platform


• Free 5GB storage
• Cross platform
• Occasionally crashes

• OneDrive, previously known as SkyDrive, was rolled out in 2007 as Microsoft’s own
cloud storage platform. It works as part of the Microsoft Office Suite and gives users
5GB of free storage space. Registered students and those working in academia are given
1TB of free storage.
• OneDrive is available for all platforms. You need to have a Hotmail or Microsoft account
but this is very easy to set up. Users can collaborate on, share and store documents.
• OneDrive also gives you offline access to documents so you can always have your most
important documents at your fingertips. It comes pre-installed on all Windows 10
machines and can be easily accessed or downloaded onto other platforms.

Egnyte

• Flexible pricing plus a robust interface makes Egnyte an ideal document storage platform
• 15-day free trial
• Excellent integration
• Some loading issues

• Egnyte was founded in 2007. The company provides software for enterprise file
synchronization and sharing.
• Egnyte allows businesses to store their data locally and in the cloud. All types of data can
be stored in the cloud, whilst data of a more sensitive nature can be stored on servers on-premise. This provides better security.
• Business teams can work how and where they want with an easy to use collaboration
system through their content services platform.
• Egnyte integrates with the more popular industry applications such as Office 365. This
allows remote and internal employees to access all the files they need.

Dropbox

• Simplified cloud storage from a veteran in the field


• 2GB free
• Integrates with most apps
• Relatively expensive

• Dropbox is one of the oldest cloud storage providers. It does offer a rather miniscule 2GB
of storage space for free users but this can be increased by up to 16GB through referrals
as well as by linking your Dropbox account to social media accounts.
• To date it is one of the simplest storage providers to use. Dropbox can be installed on
most computers or devices and syncs easily between apps. The app can store almost any
kind of file with no compatibility issues. You can drag and drop files into the desktop app
with ease.
• You can also share files with other users easily through links, even if they don’t have a
Dropbox account.

SpiderOak
SpiderOak offers military grade encryption but few collaboration options

• Strong security
• Central device management
• Few collaboration tools

• SpiderOak is a collaboration tool, online backup and file hosting service founded in 2007.
The platform allows users to access, synchronize and share data using a cloud-based
server.
• The company places a strong emphasis on data security and privacy. They offer a cloud
storage, online backup and sharing service which uses a ‘zero knowledge’ privacy
environment. This means the client is the only one who can view all stored data.
SpiderOak claim that even they cannot access your files.

MAPREDUCE APP
The MapReduce library is available in two versions: one for Java and one for Python.
The functionality of each is slightly different.
Both libraries are built on top of App Engine services, including Datastore and Task Queues.
You must download the MapReduce library and include it with your application. The library
provides:

• A programming model for large-scale distributed data processing


• Automatic parallelization and distribution within your existing codebase
• Access to Google-scale data storage
• I/O scheduling
• Fault-tolerance, handling of exceptions
• User-tunable settings to optimize for speed/cost
• Tools for monitoring status
There are no usage charges associated with the MapReduce library. As with any App Engine
application, you are charged for any App Engine resources that the library or your MapReduce
code consumes (beyond the free quotas) while running your job. These can include instance
hours, Datastore and Google Cloud Storage usage, network, and other storage.
The Python MapReduce library can be used for complete map-shuffle-reduce pipelines
only. It does not have the ability to run a map-only job.

Features and capabilities

The App Engine adaptation of Google's MapReduce model is optimized for the needs of the
App Engine environment, where resource quota management is a key consideration. This release
of the MapReduce API provides the following features and capabilities:
• Automatic sharding for faster execution, allowing you to use as many workers as you
need to get your results faster
• Standard data input readers for iterating over blob and datastore data.
• Standard output writers
• Status pages to let you see how your jobs are running
• Processing rate limiting to slow down your mapper functions and space out the work,
helping you avoid exceeding your resource quotas

MapReduce Job
A MapReduce job has three stages: map, shuffle, and reduce. Each stage in the sequence
must complete before the next one can run. Intermediate data is stored temporarily between
the stages. The map stage transforms single input items to key-value pairs, the shuffle stage
groups values with the same key together, and the reduce stage processes all the items with
the same key at once.
The map-shuffle-reduce algorithm is very powerful because it allows you to process all
the items (values) that share some common trait (key), even when there is no way to access
those items directly because, for instance, the trait is computed.
The data flow for a MapReduce job proceeds through the map, shuffle, and reduce stages
described below.

Map
The MapReduce library includes a Mapper class that performs the map stage. The map
stage uses an input reader that delivers data one record at a time. The library also contains a
collection of Input classes that implement readers for common types of data. You can also
create your own reader, if needed.

The map stage uses a map() function that you must implement. When the map stage runs,
it repeatedly calls the reader to get one input record at a time and applies the map() function
to the record.
The implementation of the map() function depends on the kind of job you are running.
When used in a Map job, the map() function emits output values. When used in a map reduce
job, the map() function emits key-value pairs for the shuffle stage.
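As a rough illustration, a word-count style map() and reduce() pair for the Python library might
look like the sketch below. The exact record format delivered to map() depends on the input
reader that is configured, so the (entry, text_fn) shape shown here is only an assumption taken
from the common blob/zip reader examples.

    def word_count_map(data):
        # One input record from the reader; assumed here to be (entry, text_fn)
        entry, text_fn = data
        text = text_fn()
        for word in text.split():
            yield (word.lower(), "")      # emit key-value pairs for the shuffle stage

    def word_count_reduce(key, values):
        # Called once per key with all values that share that key
        yield "%s: %d\n" % (key, len(values))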
Being flexible to allow adjustments
With social media, the feedback from the customers can be obtained in a span of very
short time. Any mistakes in the marketing strategy can be identified very quickly based on the
feedback from the customers and also suitable actions will be taken to rectify it. Being flexible to
allow adjustments also helps the entire process to become agile i.e. to address the ever-changing
customers demand very quickly.

Automate the process


When the marketing strategy is determined to be an effective one, the entire process is
automated to post weekly or periodically about the product and company updates.

SOCIAL MEDIA ANALYTIC APP


What is SMAC?
SMAC is something of a buzzword in the IT industry. In the past, all business
information was generally kept on a centralised database and accessed via a client server model.
This meant that employees would have to log in to a SCM, ERP or CRM system. Prior to that,
the employee would usually have to undertake a training programme to help them understand all
of the complicated workflows, processes and interfaces.

Today, this has been addressed by the arrival of data visualisation tools and customised
ISVs that are built with industry specific templates, improving the user’s experience and
allowing executives to quickly gain access to the latest business information. Added to this have
been the integration of analytics tools and a host of social and collaborative procedures: all of
which can be accessed via mobile devices.
This combination of new technologies is known as Social, Mobile, Analytics and Cloud:
or SMAC for short. Social helps people to find their colleagues, who they can then collaborate
with; mobile provides access to other data sources and the cloud;

the cloud contains the information and the applications that people use; and analytics
allows people to make sense of this data. The broad idea of SMAC is that social networks like
Facebook and Twitter can be used for brand building and customer engagement.
Why SMAC is so important for businesses
In the coming years, it is widely expected that there will be three major trends that emerge
and affect not only IT technologies but also the way we do business: all of which will be heavily
impacted by SMAC. These include:

• New working styles: Today in business, both employees and customers expect a style of
content, collaboration and commerce that offers the same “anytime, anywhere” convenience
that they enjoy related to their personal lives with companies such as Facebook and Amazon.
It is expected that there will be an increase in the mobile elite workforce; especially as
wearable devices such as watches and glasses add to users' options. In terms of SMAC,
business applications will be required to embrace this approach in order to maximise
productivity and convenience.

QUESTIONS

5 MARKS
1. Write short notes on cloud application design.
2. Explain about cloud storage approach.
3. Discuss about map reduce app.
4. Explain about image processing app.

10 marks
5. Explain in detail about reference architecture for cloud application
6. Write notes on design approaches.
7. Explain in detail about cloud application design methodology.
8. Explain the following:
a) Document storage app
b) Social media analytic app

UNIT III COMPLETED

UNIT IV
PYTHON FOR CLOUD
PYTHON FOR AMAZON WEB SERVICES
Amazon Web Services
AWS is a cloud platform service from Amazon, used to create and deploy any type of
application in the cloud.
AWS is a cloud platform service offering compute power, data storage, and a wide array
of other IT solutions and utilities for modern organizations. AWS was launched in 2006, and has
since become one of the most popular cloud platforms currently available.
We need an AWS account to use AWS services. It offers many featured services for
compute, storage, networking, analytics, application services, deployment, identity and
access management, directory services, security and many other cloud services.
We can use Boto3 (a Python package) which provides interfaces to Amazon Web Services
and makes it easy to integrate a Python application, library, or script with AWS services.
Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for
Python, which allows Python developers to write software that makes use of services such as
Amazon S3 (Simple Storage Service) and Amazon EC2 (Elastic Compute Cloud).
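A minimal Boto3 sketch is shown below. It assumes Boto3 is installed (pip install boto3) and that
AWS credentials are available in the environment or in the standard AWS configuration files;
the bucket and file names are hypothetical.

    import boto3

    s3 = boto3.client("s3")                  # client for the Simple Storage Service

    # List all buckets in the account
    for bucket in s3.list_buckets()["Buckets"]:
        print(bucket["Name"])

    # Upload a local file to a bucket ("my-bucket" and the paths are placeholders)
    s3.upload_file("report.txt", "my-bucket", "backups/report.txt")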

Amazon Elastic Compute Cloud (EC2)


Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizeable
computing capacity.
We use Amazon EC2 to launch virtual servers, configure security and networking,
and manage storage.
It enables us to scale up or down depending on the requirement. It provides virtual computing
environments called instances.
Various configurations of CPU, memory, storage, and networking capacity are available
for our instances, known as instance types.
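A hedged sketch of launching an instance with Boto3 follows; the AMI ID is a placeholder and
must be replaced with a valid image ID for the chosen region.

    import boto3

    ec2 = boto3.resource("ec2")

    instances = ec2.create_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
        InstanceType="t2.micro",           # instance type selects CPU/memory/network capacity
        MinCount=1,
        MaxCount=1,
    )
    print("Launched:", instances[0].id)

    # The same objects can later be used to scale down:
    # instances[0].stop(); instances[0].terminate()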

Amazon Elastic Beanstalk

AWS Elastic Beanstalk is a service for deploying and scaling web applications and
services. Elastic Beanstalk also runs EC2 instances (computing environments), and adds
components such as an Elastic Load Balancer, an Auto-Scaling Group and a Security Group.

Amazon Lambda
Amazon Lambda is a computing service which automatically manages the servers. AWS
Lambda executes our code only when needed and scales automatically, from a few requests per
day to thousands per second.
We only pay for the compute time we consume, and there is no charge when the code
is not running.
The initial purpose of Lambda is to simplify building on-demand applications that are
responsive to events. AWS starts a Lambda instance within milliseconds of an event.
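A minimal Python handler for Lambda might look like the following sketch; the handler name
and the shape of the incoming event depend on how the function and its trigger are configured,
so the 'name' field used here is purely illustrative.

    def lambda_handler(event, context):
        """Entry point that AWS Lambda invokes once per event."""
        name = event.get("name", "world")        # illustrative event field
        return {
            "statusCode": 200,
            "body": "Hello, %s" % name,
        }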

PYTHON FOR GOOGLE CLOUD PLATFORM


“Google Cloud Platform, offered by Google, is a suite of cloud computing services that
run on the same infrastructure that Google uses internally for its end-user products. Alongside a
set of management tools, it provides a series of modular cloud services including
computing, data storage, data analytics and machine learning.“

Why Google Cloud Platform?


As there are various cloud service providers in the market, what makes Google Cloud
Platform different? Some of the major reasons why one should opt for it are:
• Pricing: GCP leaves all the competition way behind with its highly flexible pricing and
is rightly a leader here
• Scalability: Scaling down can always be an issue with cloud services. GCP lets you
scale up and down with extreme ease
• Custom Machines: With Custom Machine Types you can easily create a machine type
customized to your needs with discount implications up to 50% off
• Integrations: Easily use various API’s, practice Internet of Things and Cloud Artificial
Intelligence
• Big Data Analytics: Use Big Query and Big Data Analytics to carry out plethora of
analytical practices
• Serverless: Serverless is a new paradigm of computing that abstracts away the
complexity associated with managing servers for mobile and API back-ends, ETL, data
processing jobs, databases, and more.
Google offers a wide range of services.

What Is Google Cloud Platform?


You can think of it as a collection of cloud services offered by Google. The platform hosts a
wide range of services comprising:
• Compute
• Storage
• Application development
Who can access these services? They can be accessed by developers, cloud
administrators and other enterprise IT professionals, either through the public internet
or through a dedicated network connection.
Some of the core functionalities and services of GCP are:

• Google Compute Engine: Google Compute Engine lets you run virtual machines (VMs) in
Google's innovative data centers and worldwide fiber network. It lets you scale from
single instances to global, load-balanced cloud computing.
• App Engine: This PaaS offering lets developers access Google's scalable hosting.
Developers are also free to use software SDKs to develop software products that run
on App Engine.
• Cloud Storage: The Google Cloud Storage platform enables you to store large, unstructured
data sets (a short usage sketch follows this list). Google also offers database storage options
such as Cloud Datastore for NoSQL non-relational storage, Cloud SQL for MySQL fully
relational storage and Google's native Cloud Bigtable database.
• Google Container Engine: It is a management and orchestration system
for Docker containers that runs within Google's public cloud. Google Container Engine
is based on the Google Kubernetes container orchestration engine.
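As mentioned in the Cloud Storage bullet above, objects can be managed from Python. The
sketch below uses the google-cloud-storage client library and assumes that credentials are
supplied via the GOOGLE_APPLICATION_CREDENTIALS environment variable; the bucket
and object names are placeholders.

    from google.cloud import storage

    client = storage.Client()                        # credentials come from the environment

    bucket = client.bucket("my-gcp-bucket")          # hypothetical bucket name
    blob = bucket.blob("data/report.txt")            # object path inside the bucket

    blob.upload_from_filename("report.txt")          # upload a local file
    print("Uploaded to", blob.name)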

PYTHON FOR WINDOWS AZURE


If you build technical and scientific applications, you're probably familiar with Python.
What you might not know is that there are now tools available that make it easy for you to put
your Python applications on Microsoft Azure, Microsoft's cloud computing platform. Microsoft
Azure offers a comprehensive set of services that enable you to quickly build, deploy and
manage applications across a global network of Microsoft-managed data centers. Some of the
services that Microsoft Azure offers include blob storage, databases, and messaging mechanisms
such as queues. All Microsoft Azure services are available by using Python.
Here are some types of activities that you can perform by using Python and Microsoft
Azure:
• Remote debugging on Windows, Linux, and Mac OS.
• Cluster debugging.
• Use Python in-line with webpages that you serve.
• Create IPython Notebooks.
• Use the Message Passing Interface (MPI) for High Performance Computing (HPC).
This article introduces you to two Python development tools. One is the IPython
Notebook, which can be deployed to Linux and Windows virtual machines (VMs) on Microsoft
Azure. The other is Python Tools for Visual Studio (PTVS), which is for Windows users. After
familiarizing you with the tools, the article discusses the following topics:
• Tools for developing Python applications on Microsoft Azure
• Deploying an IPython Notebook to Microsoft Azure
• Using Python to create VMs
• Using Python to create an application

Tools for Developing Python Applications on Microsoft Azure
This section gives you an overview of the IPython Notebooks and PTVS development
tools.
Using IPython Notebooks
The IPython project provides a collection of tools for scientific computing that include
interactive shells, high-performance and easy-to-use parallel libraries, and a web-based
development environment called the IPython Notebook.
The notebook provides a working environment for interactive computing that combines
code execution with the creation of a live computational document. These notebook files can
contain text, mathematical formulas, input code, results, graphics, videos and any other kind of
media that a modern web browser is capable of displaying.
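For example, the blob storage service mentioned above can be used from Python. The sketch
below uses the current azure-storage-blob package (older SDK releases expose a different API);
the connection string and container name are placeholders.

    from azure.storage.blob import BlobServiceClient

    conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;"  # placeholder

    service = BlobServiceClient.from_connection_string(conn_str)
    container = service.get_container_client("notebooks")        # hypothetical container

    with open("analysis.ipynb", "rb") as data:
        container.upload_blob(name="analysis.ipynb", data=data, overwrite=True)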

PYTHON FOR MAPREDUCE


• Most MapReduce systems are designed to operate in racks of computers dedicated to
MapReduce processing, but scientific applications use a wide range of computing resources.
A company such as Google or Yahoo with large datacenters might have many thousands of
MapReduce systems and thousands of jobs per day, and dedicated systems in datacenters are
appropriate in this situation.
• However, many different types of clusters are used for scientific computing. Shared clusters,
which are generally large supercomputers, use batch scheduling systems to coordinate jobs
submitted by a wide variety of users.
• Private clusters, consisting of a smaller number of commodity workstations or
temporarily provisioned cloud nodes, are used by a single user at a time and do not require a
scheduler. Shared and private clusters are different from dedicated MapReduce clusters.
• A shared cluster has many users, each of which has unique software requirements.
Supercomputers provide resources that are expected to meet the needs of the majority of
users, and any individual user cannot expect MapReduce infrastructure to be available.
Likewise, most private clusters have no support staff to set up software. In many cases on
both shared and private clusters, an individual MapReduce user must perform installation,
configuration, and maintenance of the infrastructure they require.

The Design And Architecture Of Mrs
Mrs is a lightweight MapReduce implementation that works well for scientific
computing. It is designed to be simple for both programmers and users. The API includes
reasonable but overridable defaults in order to avoid any unnecessary complexity. Likewise, Mrs
makes it easy to run jobs without requiring a large amount of configuration.
It supports both Python 2 and Python 3 and depends only on the standard library for
maximum portability and ease of installation. Furthermore, Mrs is designed to run easily in a
variety of environments and filesystems. Mrs is also compatible with PyPy, a high-performance
Python interpreter with a JIT compiler that accelerates numerically intensive programs
particularly well.

Programming Model
As a programming framework, Mrs controls the execution flow and is invoked by a call
to mrs.main. The execution of Mrs depends on the command-line options and the specified
program class. In its simplest form, a program class has an __init__ method which takes the
arguments opts and args from command-line parsing and a run method that takes a job argument.
In practice, most program classes inherit from mrs.MapReduce, which provides a variety of
reasonable but overridable defaults including __init__ and run methods that are sufficient for
many simple programs. The simplest MapReduce program need only implement a map and a
reduce method (a minimal sketch is shown below).
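A minimal sketch, modelled on the word-count example that ships with Mrs, is shown below;
the method names follow the mrs.MapReduce conventions described above, but details may
differ between installed versions.

    import mrs

    class WordCount(mrs.MapReduce):
        def map(self, key, value):
            # key: position of the line, value: one line of input text
            for word in value.split():
                yield (word, 1)

        def reduce(self, key, values):
            # values: all counts emitted for this word
            yield sum(values)

    if __name__ == "__main__":
        mrs.main(WordCount)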
Architecture

• Mrs owes much of its efficiency to simple design. Many choices are driven by concerns such
as simplicity and ease of maintainability.
• For example, Mrs uses XML-RPC because it is included in the Python standard library even
though other protocols are more efficient. Profiling has helped to identify real bottlenecks
and to avoid worrying about hypothetical ones.
• We include a few details about the architecture of Mrs. Communication between the master
and a slave occurs over a simple HTTP-based remote procedure call API using XMLRPC.
• Intermediate data between slaves uses either direct communication for high performance or
storage on a filesystem for increased fault-tolerance. Mrs can read and write to any
filesystem supported by the Linux kernel or FUSE, including NFS, Lustre, and the Hadoop
Distributed File System (HDFS), and native support for WebHDFS is in progress.

PYTHON PACKAGES OF INTEREST


There are a number of Special Interest Groups (SIGs) for focused collaborative efforts to
develop, improve, or maintain specific Python resources. Each SIG has a charter, a coordinator, a
mailing list, and a directory on the Python website. SIG membership is informal, defined by
subscription to the SIG's mailing list. Anyone can join a SIG, and participate in the development
discussions via the SIG's mailing list.
Below is the list of currently active Python SIGs, with links to their resources. The link in
the first column directs you to the SIG's home page: a page with more information about the
SIG. The links in the "Info" column direct you to the SIG's archives, and to the SIG's Mailman
page, which you can use to subscribe or unsubscribe yourself and to change your subscription
options.
The SIG mailing lists are managed by GNU Mailman, a web-based interface for mailing
lists written in Python.

PYTHON WEB APPLICATION FRAMEWORK


A Web framework is a collection of packages or modules which allow developers to
write Web applications or services without having to handle such low-level details as protocols,
sockets or process/thread management.
The majority of Web frameworks are exclusively server-side technology, although, with
the increased prevalence of AJAX, some Web frameworks are beginning to include AJAX code
that helps developers with the particularly tricky task of programming (client-side) the user's
browser. At the extreme end of the client-side Web Frameworks is technology that can use the
web browser as a full-blown application execution environment (a la gmail for example).
As a developer using a framework, you typically write code which conforms to some
kind of conventions that lets you "plug in" to the framework, delegating responsibility for the
communications, infrastructure and low-level stuff to the framework while concentrating on the
logic of the application in your own code. This "plugging in" aspect of Web development is
often seen as being in opposition to the classical distinction between programs and libraries, and
the notion of a "mainloop" dispatching events to application code is very similar to that found in
GUI programming.

Popular Full-Stack Frameworks


A web application may use a combination of a base HTTP application server, a storage
mechanism such as a database, a template engine, a request dispatcher, an authentication module
and an AJAX toolkit. These can be individual components or be provided together in a high-
level framework.

These are the most popular high-level frameworks. Many of them include components
listed on the WebComponents page.

Django (latest version 2.1, last updated 2018-08-01)
The Web framework for perfectionists (with deadlines). Django makes it easier to build better
Web apps more quickly and with less code. Django is a high-level Python Web framework that
encourages rapid development and clean, pragmatic design. It lets you build high-performing,
elegant Web applications quickly. Django focuses on automating as much as possible and
adhering to the DRY (Don't Repeat Yourself) principle.

TurboGears (latest version 2.3.12, last updated 2018-04-06)
The rapid Web development web framework you've been looking for. Combines SQLAlchemy
(Model) or Ming (MongoDB Model), Genshi (View), Repoze and Tosca Widgets. Create a
database-driven, ready-to-extend application in minutes. All with designer-friendly templates,
easy AJAX on the browser side and on the server side, with an incredibly powerful and flexible
Object Relational Mapper (ORM), and with code that is as natural as writing a function.

web2py (latest version 2.17.1, last updated 2018-08-06)
* Python 2.6 to 2.7 friendly, Python 3.x friendly (compiles but not tested, no support yet)
* All in one package with no further dependencies; development, deployment, debugging,
testing, database administration and maintenance of applications can be done via the provided
web interface, but this is not required
* No configuration files, requires no installation, can be run off a USB drive
* Uses Python for the Model, the View and the Controller
* Built-in ticketing system to manage errors
* Internationalization engine with pluralisation, and a caching system
* Flexible authentication system (LDAP, MySQL, janrain etc.)
* Runs on *NIX (Linux, BSD), Windows and Mac OSX; tested on EC2 and Webfaction
* Works with MySQL, PostgreSQL, SQLite, Firebird, Oracle, MSSQL and the Google App
Engine via an ORM abstraction layer
* Includes libraries to handle HTML/XML, RSS, ATOM, CSV, RTF, JSON, AJAX, XMLRPC
and WIKI markup
* Production ready, capable of upload/download of very large files
* Emphasis on backward compatibility
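To give a flavour of Django's style, a minimal view and URL mapping are sketched below
(Django 2.x syntax); the module and view names are illustrative.

    # views.py
    from django.http import JsonResponse

    def status(request):
        # A trivial view: every request returns a small JSON document
        return JsonResponse({"status": "ok"})

    # urls.py
    from django.urls import path
    from . import views

    urlpatterns = [
        path("status/", views.status),
    ]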

See below for some other arguably less popular full-stack frameworks

DESIGNING A RESTFUL WEB API
So you need to build an API for your website, maybe you need to provide data to a
mobile app you’re working on, or maybe you just want to put the data from your website in a
format that other developers can easily use to build cool things.
But what is an API exactly? An API is just a fancy term for describing a way for
programs (or websites) to exchange data in a format that is easily interpreted by a machine. This
is in contrast to regular websites, which are exchanging data in a format that is easily interpreted
by a human. For a website, we might use HTML and CSS, but for an API we would use JSON or
XML.
In this article I will be focusing on designing an API using the RESTful paradigm. REST
is basically a list of design rules that makes sure that an API is predictable and easy to
understand and use.
Some of these rules include:

• Stateless design: Data will never be stored in a session (each request includes all
information needed by the server and client).
• Self-descriptive messages: Ideally you should be able to understand requests and
responses after spending minimal time reading the documentation.
• Semantics, semantics, semantics: The API should use existing features of the HTTP
protocol to improve the semantic clarity of input and output (e.g. HTTP verbs, HTTP status
codes and HTTP authentication).

Output formats
First, let’s talk about output formats. The most important thing to look at when
determining what format your API should output data in is what users of your API would be
using the data for and with.
Maybe you need to support legacy systems where JSON parsing is not feasible and XML
is more desirable, or maybe it makes more sense for you to output data in the CSV format for
easy import into spreadsheet applications. Whichever you choose, it's important to think about
your users and their use cases.

Versioning
This is a very important aspect that is often overlooked. As an API provider, one of your
most important tasks is to make sure that breaking changes will never occur in your API. Making

breaking changes will make life difficult for the developers who depend on your service and can
easily start causing frustration when things start to break.
But don’t worry! This is where versioning comes in handy. There are a lot of options for
versioning your API. For example, WePay uses an Api-Version header, and Twilio uses a similar
approach putting the version date in the URL.

URL Structure
• The URL structure is one of the most important pieces of the puzzle. Spending some time to
define the right endpoint names can make your API much easier to understand and also help
making the API more predictable.URLs should be short and descriptive and utilize the
natural hierarchy of the path structure.
• It’s also important to be consistent with pluralization.
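The ideas above (versioning in the URL, plural resource names, HTTP verbs and status codes)
can be illustrated with a small sketch; Flask is used here purely as an example framework, and
the resource names and data are hypothetical.

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    BOOKS = {1: {"id": 1, "title": "Cloud Computing"}}     # in-memory store for illustration

    @app.route("/api/v1/books", methods=["GET"])           # version lives in the URL
    def list_books():
        return jsonify(list(BOOKS.values())), 200           # 200 OK carries the semantics

    @app.route("/api/v1/books", methods=["POST"])
    def create_book():
        body = request.get_json()
        new_id = max(BOOKS) + 1
        BOOKS[new_id] = {"id": new_id, "title": body["title"]}
        return jsonify(BOOKS[new_id]), 201                  # 201 Created for a new resource

    if __name__ == "__main__":
        app.run()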

QUESTIONS
5 MARKS
1. Explain python for amazon web services.
2. Write short notes on python for windows azure.
3. Write short notes on python package of interest.
4. Explain about Django.
5. Write short notes on Python for MapReduce.

10 MARKS
6. Explain in detail about Python for cloud.
7. Explain about Python for Google Cloud Platform.
8. Write notes on Python web application frameworks.
9. Illustrate designing a RESTful web API.

UNIT IV COMPLETED

UNIT V
BIG DATA ANALYTICS
CLUSTERING BIG DATA
• Clustering is an essential data mining tool for analyzing big data. There are difficulties in
applying clustering techniques to big data due to the new challenges that are raised with big
data. As big data refers to terabytes and petabytes of data, and clustering algorithms
come with high computational costs, the question is how to cope with this problem and
how to deploy clustering techniques on big data and get the results in a reasonable time.
• Big data clustering techniques can be classified into two major categories:
single-machine clustering techniques and multiple-machine clustering techniques.
Recently, multiple-machine clustering techniques have attracted more attention because they
are more flexible in scalability and offer faster response times to the users. Single-machine
and multiple-machine clustering include different techniques:
• Single-machine clustering
o Sample based techniques
o Dimension reduction techniques

• Multiple-machine clustering
o Parallel clustering
o MapReduce based clustering
In this section, advancements of clustering algorithms for big data analysis in the categories
mentioned above will be reviewed.

Challenges of big data have root in its five important characteristics:

• Volume: The first one is volume; an example is unstructured data streaming in from
social media, which raises questions such as how to determine the relevance within large data
volumes and how to analyze the relevant data to produce valuable information.

• Velocity: Data is flooding at very high speed and it has to be dealt with in reasonable time.
Responding quickly to data velocity is one of the challenges in big data.

• Variety: Another challenging issue is to manage, merge and govern data that comes from
different sources with different specifications such as: email, audio, unstructured data, social
data, video and etc.

• Variability: Inconsistency in data flow is another challenge. For example in social media it
could be daily or seasonal peak data loads which makes it harder to deal and manage the data
specially when the data is unstructured.

• Complexity: Data is coming from different sources and have different structures;
consequently it is necessary to connect and correlate relationships and data linkages or you find
your data to be out of control quickly.
Traditional clustering techniques cannot cope with this huge amount of data because of
their high complexity and computational cost. For instance, traditional K-means clustering
is NP-hard, even when the number of clusters is k=2. Consequently, scalability is the main
challenge for clustering big data. The main target is to scale up and speed up clustering
algorithms with minimum sacrifice to clustering quality. Although scalability and speed of
clustering algorithms were always a target for researchers in this domain, big data challenges
underline these shortcomings and demand more attention and research on this topic. Reviewing
the literature on clustering techniques shows that the advancement of these techniques can be
classified in stages.
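As one concrete example of a single-machine, sample-based approach, scikit-learn's
MiniBatchKMeans clusters small random batches instead of the full dataset, trading a little
quality for a large speed-up; the synthetic data below merely stands in for a large dataset.

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    X = np.random.rand(100000, 10)          # synthetic stand-in for a large dataset

    model = MiniBatchKMeans(n_clusters=8, batch_size=1000, random_state=0)
    labels = model.fit_predict(X)           # fits on mini-batches, then assigns every point

    print(model.cluster_centers_.shape)     # (8, 10)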

CLASSIFICATION OF BIG DATA


• Big data can be applied to real-time fraud detection, complex competitive analysis, call
center optimization, consumer sentiment analysis, intelligent traffic management, and to
manage smart power grids, to name only a few applications.
• Big data is characterized by three primary factors: volume (too much data to handle easily);
velocity (the speed of data flowing in and out makes it difficult to analyze); and variety (the
range and type of data sources are too great to assimilate). With the right analytics, big data
can deliver richer insight since it draws from multiple sources and transactions to uncover
hidden patterns and relationships.
There are four types of big data BI that really aid business:

1. Prescriptive – This type of analysis reveals what actions should be taken. This is the most
valuable kind of analysis and usually results in rules and recommendations for next steps.
2. Predictive – An analysis of likely scenarios of what might happen. The deliverables are
usually a predictive forecast.
3. Diagnostic – A look at past performance to determine what happened and why. The result of
the analysis is often an analytic dashboard.
4. Descriptive – What is happening now based on incoming data. To mine the analytics, you
typically use a real-time dashboard and/or email reports.

Big Data Analytics in Action


• Prescriptive analytics is really valuable, but largely not used. Where big data analytics in
general sheds light on a subject, prescriptive analytics gives you a laser-like focus to answer
specific questions. For example, in the health care industry, you can better manage the
patient population by using prescriptive analytics to measure the number of patients who are
clinically obese, then add filters for factors like diabetes and LDL cholesterol levels to

determine where to focus treatment. The same prescriptive model can be applied to almost
any industry target group or problem.
• Predictive analytics uses big data to identify past patterns and predict the future. For example,
some companies are using predictive analytics for sales lead scoring. Some companies have
gone one step further and use predictive analytics for the entire sales process, analyzing lead
source, number of communications, types of communications, social media, documents,
CRM data, etc. Properly tuned predictive analytics

RECOMMENDATION SYSTEM
o Recommendation systems have impacted or even redefined our lives in many ways. One
example of this impact is how our online shopping experience is being redefined. As we
browse through products, the Recommendation system offers recommendations of products
we might be interested in. Regardless of the perspective, business or consumer,
Recommendation systems have been immensely beneficial. And big data is the driving force
behind Recommendation systems.
• A typical Recommendation system cannot do its job without sufficient data and big data
supplies plenty of user data such as past purchases, browsing history, and feedback for the
Recommendation systems to provide relevant and effective recommendations. In a nutshell,
even the most advanced Recommenders cannot be effective without big data.

How does a Recommendation system work?


A Recommendation system works in well-defined, logical phases which are data
collection, ratings, and filtering. These phases are described below.

Data collection
• Let us assume that a user of Amazon website is browsing books and reading the details.
Each time the reader clicks on a link, an event such as an Ajax event could be fired. The
event type could vary depending on the technology used. The event then could make an
entry into a database which usually is a NoSQL database. The entry is technical in content
but in layman’s language could read something like “User A clicked Product Z details
once”. That is how user details get captured and stored for future recommendations.
• How does the Recommendation system capture the details? If the user has logged in, then
the details are extracted either from an http session or from the system cookies. In case the

Recommendation system depends on system cookies, then the data is available only till the
time the user is using the same terminal. Events are fired almost in every case — a user
liking a Product or adding it to a cart and purchasing it. So that is how user details are
stored. But that is just one part of what Recommenders do.
The following paragraphs show how Amazon offers its product recommendations to a user who
is browsing for books:

• For example, when a user searched for the book Harry Potter and the
Philosopher’s Stone, several recommendations were given.

• In another example, a customer who searched Amazon for Canon EOS 1200D 18MP
Digital SLR Camera (Black) was interestingly given several recommendations on camera
accessories.

Ratings
Ratings are important in the sense that they tell you what a user feels about a product.
User’s feelings about a product can be reflected to an extent in the actions he or she takes such as
likes, adding to shopping cart, purchasing or just clicking. Recommendation systems can assign
implicit ratings based on user actions. The maximum rating is 5. For example, purchasing can be

assigned a rating of 4, likes can get 3, clicking can get 2 and so on. Recommendation systems
can also take into account ratings and feedback users provide.

Filtering
Filtering means filtering products based on ratings and other user data. Recommendation
systems use three types of filtering: collaborative, user-based and a hybrid approach. In
collaborative filtering, users’ choices are compared and recommendations are given. For
example, if user X likes products A, B, C, and D and user Y likes products A, B, C, D and E,
then it is likely that user X will be recommended product E because there are a lot of similarities
between users X and Y as far as choice of products is concerned.
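The X/Y example above can be expressed as a small collaborative-filtering sketch based on
Jaccard similarity between users' liked-item sets; the data and function names are illustrative,
and real systems operate on rating matrices at much larger scale.

    # Users and the sets of products they like (illustrative data)
    likes = {
        "X": {"A", "B", "C", "D"},
        "Y": {"A", "B", "C", "D", "E"},
        "Z": {"A", "F"},
    }

    def recommend(user, likes):
        """Recommend the items liked by the most similar other user."""
        mine = likes[user]
        best, best_sim = None, 0.0
        for other, items in likes.items():
            if other == user:
                continue
            sim = len(mine & items) / len(mine | items)   # Jaccard similarity
            if sim > best_sim:
                best, best_sim = other, sim
        return likes[best] - mine if best else set()

    print(recommend("X", likes))   # {'E'}, matching the example in the text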

MULTIMEDIA CLOUD
• Media Cloud is an open-source content analysis tool that aims to map news media coverage
of current events. It "performs five basic functions -- media definition, crawling, text
extraction, word vectoring, and analysis."
• Media cloud "tracks hundreds of newspapers and thousands of Web sites and blogs, and
archives the information in a searchable form.
• The database ... enable[s] researchers to search for key people, places and events — from
Michael Jackson to the Iranian elections — and find out precisely when, where and how
frequently they are covered.
• " Media Cloud was developed by the Berkman Center for Internet & Society at Harvard
University and launched in March 2009.[

LIVE VIDEO STREAM APP


• The video recording technology has been available for decades. People record videos, create
movies, and publish them online so that the videos and movies can be shared to their online
groups or even to the public. With the innovation of mobile technology, users use mobile to
download videos from online sources, such as YouTube, Vimeo, LiveTV, and PPStream.
Many mobile apps have been constructed to enable mobile users to stream videos online.
When users use an application, they are always allowed to assess and give feedback on the
application so that the application could be enhanced and improved to the next stage in
meeting and satisfying the users’ needs.

• Thus, usability studies have turned to be a very vital element in evaluating the application.
Recently, with the emergence of various mobile apps, the role of usability studies extends
the scope of studies to the evaluation of mobile apps as well [1], including the user interface,
and performance.
• This scenario also goes for mobile video streaming apps as well. Researchers develop
various video streaming apps and perform usability test for different groups of users under
different conditions.
Systematic Review
In this paper, the activities to be performed in the facilitation of the process of the
systematic review are: the elaboration of the definition of a search strategy, the selection of
primary studies, the extraction of data, and the implementation of a synthesis strategy
Search Strategy
• In order to perform the search and selection of the usability test metrics for mobile video
streaming apps, articles and journals from different online databases were searched. Also,
relevant data from the search results were extracted and finally, the collection of studies for
review was listed.

Study Selection
The scope of the review was defined to be the metrics used in usability test in mobile
video streaming apps. Since the scope had been defined clearly before the search process was
carried out, most of the articles and journals found were relevant to the review objective.
However, there were many articles and journals excluded from the search process, based on the
following criteria:
1) The study is only on mobile video apps development,
2) the study presents the usability test on mobile apps without touching on video
streaming apps,
3) the study is not written in English, and
4) the study is a book.

Results And Discussion


The results obtained from the reviewed articles were classified based on the categories of
metrics used in the usability test of mobile video streaming apps, the detailed metrics, number of
studies and the percentage of studies.

STREAMING PROTOCOLS
Basics of streaming protocols
• Streaming of audio and video is a confusing subject. This page is aimed at providing some
of the basic concepts.
• Streaming means sending data, usually audio or video, in a way that allows it to start being
processed before it's completely received. Video clips on Web pages are a familiar example.
• Progressive streaming, aka progressive downloading, means receiving an ordinary file and
starting to process it before it's completely downloaded. It requires no special protocols, but
it requires a format that can be processed based on partial content. This has been around for
a long time; interleaved images, where the odd-numbered pixel rows are received and
displayed before any of the even ones, are a familiar example. They're displayed at half
resolution before the remaining rows fill in the full resolution.
The protocol stack
Streaming involves protocols at several different layers of the OSI Reference Model. The
lower levels (physical, data link, and network) are generally taken as given. Streaming protocols
involve:

• The transport layer, which is responsible for getting data from one end to the other.
• The session layer, which organizes streaming activity into ongoing units such as movies
and broadcasts.
• The presentation layer, which manages the bridge between information as seen by the
application and information as sent over the network.
• The application layer, which is the level at which an application talks to the network.
Most Internet activity takes place using the TCP transport protocol. TCP is designed to
provide reliable transmission.

HTTP Live Streaming


The new trend in streaming is the use of HTTP with protocols that support adaptive
bitrates. This is theoretically a bad fit, as HTTP with TCP/IP is designed for reliable delivery
rather than keeping up a steady flow, but with the prevalence of high-speed connections these
days it doesn't matter so much. Apple's entry is HTTP Live Streaming, aka HLS or Cupertino
streaming. It was developed by Apple for iOS and isn't widely supported outside of Apple's
products. Long Tail Video provides a testing page to determine whether a browser supports HLS.

Its specification is available as an Internet Draft. The draft contains proprietary material, and
publishing derivative works is prohibited.

Adobe HTTP Dynamic Streaming


Adobe HTTP Dynamic Streaming (HDS) is also known as San Jose streaming. Like
Apple's HLS, it operates over HTTP. Like RTMP, it's associated with Flash. HTTP is more
likely to be allowed through than other protocols, and HDS is less of a kludge than RTMP over
HTTP. The technical specs say that Flash is required for playback, so its use is mainly in desktop
environments.
Microsoft Smooth Streaming
Smooth Streaming is Microsoft's piece of the very fragmented world of HTTP streaming.
It's used with Silverlight and IIS.
Dynamic Adaptive Streaming over HTTP
DASH, for Dynamic Adaptive Streaming over HTTP, is MPEG's offering in the HTTP streaming
Babel. DASH's creators insist it's not a protocol but an "enabler," but that claim violates the
"looks like a duck" principle. It's specified by ISO/IEC 23009-1:2012.

VIDEO TRANSCODING APP


• Remote video surveillance relies on capturing video streams from camera(s) mounted on the
surveillance site and transmitting those streams to a remote command and control center for
analysis and inspection. The process of streaming or uploading video to a remote monitoring
location requires large bandwidth since video is streamed continuously.
• Moreover, many sites have large number of cameras which makes the streaming process
very costly and in some cases impractical given the needed bandwidth.
• In order to reduce the streaming bandwidth requirements, each camera output (frame rate,
bit rate, resolution) can be configured manually which is an error prone process and would
not be suitable for large scale deployment.
• In addition, this option can degrade the quality of important details of each video frame and
make it difficult to recognize. In other solutions, video transcoding techniques (bitrate,
frame rate, resolution, or combination of them)[1] are used to modify the stream to be
transmitted and adapt it to the available bandwidth.

AvidBeam Smart Video Transcoding Solution
• AvidBeam has developed a comprehensive and robust solution for optimizing bandwidth for
surveillance systems with limited effect on the video stream quality [2]. The solution is based
on the use of a multistage filter pipeline, where several filters are used to eliminate unnecessary
frames and identify regions of interest before invoking the video transcoder. Consequently, the
transmitted bandwidth can be reduced dramatically without affecting the quality of the
important information in the video frames.
• Clients can enable/disable each filter separately as well as configure each filter according to
their needs.
Those filters are described as follows

1. Frame Filter
The frame filter is used to detect motion in a given frame. The amount of motion to be
detected is configured. The filter passes only frames with motion greater or equal to the
configured motion size. This way, the small variation in each video frame due to external factor
such as wind blow, camera vibration, or small animals or birds moving in front of the
surveillance camera can be eliminated easily.
Experimental results from using the motion detection filter before streaming out frames show
that, when motion detection is enabled, frames with no significant motion are not transmitted.
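AvidBeam's filter implementation itself is proprietary, but the general idea of gating frames on
detected motion can be sketched with OpenCV's background subtraction; the file name and
pixel threshold below are assumptions.

    import cv2

    cap = cv2.VideoCapture("camera_feed.mp4")          # hypothetical camera feed
    subtractor = cv2.createBackgroundSubtractorMOG2()  # models the static background
    MIN_MOTION_PIXELS = 5000                           # configurable motion threshold

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)                 # non-zero pixels indicate motion
        if cv2.countNonZero(mask) >= MIN_MOTION_PIXELS:
            pass  # forward this frame to the next filter / the transcoder
        # frames below the threshold are simply dropped

    cap.release()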

2. Object Specific Filter


The object specific filter is used to identify the presence of object of interest in the video
frame and will pass only the frames that have the object(s) of interest. Figure 4 shows the results
of applying the object specific filter to several use cases such as vehicle license plate recognition
(LPR) or people count.
In each case, a dedicated object detector is applied to the video frame (LPR or people).
The bitrate savings in the two cases are approximately 73.5% and 43.5% respectively. It
should be noted that these results are directly proportional to the percentage appearance of the
license plate or people in each frame.

3. ROI Filter
The purpose of the ROI filter is to identify region of interest in each frame and pass this
information to the transcoder. The ROI information can be used to clip the transmitted frame or
to encode the frame with different quality values for both of the ROI and none-ROI frame
blocks.

4. Video Transcoder
The final stage in the pipeline includes the actual video transcoding. The transcoder
receives the selected ROI together with their proper quality (quantization) settings. Other
transcoding parameters are also selected (resolution, bitrate, frame rate) based on client system
configuration.

CLOUD SECURITY
Data security becomes more important when using cloud computing at all
“levels”: infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-
service (SaaS). This chapter describes several aspects of data security, including:
• Data-in-transit
• Data-at-rest
• Processing of data, including multitenancy
• Data lineage
• Data provenance
• Data remanence

The objective of this chapter is to help users evaluate their data security scenarios and
make informed judgments regarding risk for their organizations. As with other aspects of cloud
computing and security, not all of these data security facets are of equal importance in all

topologies (e.g., the use of a public cloud versus a private cloud, or non-sensitive data versus
sensitive data).

Aspects of Data Security


• With regard to data-in-transit, the primary risk is in not using a vetted encryption algorithm.
Although this is obvious to information security professionals, it is not common for others
to understand this requirement when using a public cloud, regardless of whether it is IaaS,
PaaS, or SaaS.

Data Security Mitigation


• Prospective customers of cloud computing services may expect data security to serve as a
compensating control for possibly weakened infrastructure security, since part of a
customer’s infrastructure security moves beyond its control, and a provider’s infrastructure
security may (for many enterprises) or may not (for small to medium-size businesses, or
SMBs) be less robust than expected. Such expectations will lead to disappointment. Although
data-in-transit can and should be encrypted, any use of that data in the cloud, beyond simple
storage, requires that it be decrypted.
Provider Data and Its Security
• In addition to the security of your own customer data, customers should also be concerned
about what data the provider collects and how the CSP protects that data. Specifically with
regard to your customer data, what metadata does the provider have about your data, how is
it secured, and what access do you, the customer, have to that metadata? As your volume of
data with a particular provider increases, so does the value of that metadata.

CSA CLOUD SECURITY ARCHITECTURE


• In this architecture, all the security-related services for a cloud-based platform are shifted
from the platform level to the application level and are provided as web services by our
security system architecture. One of the advantages of shifting all the security-related services
to the application level is its design modularity and generality.
• This means that our architecture is applicable to any cloud-based platform, regardless of its
delivery and deployment models. The components of our security system are based on
“Service Oriented Architecture” and are responsible for managing and distributing
certificates, identity management (CRUD), identity federation, creating and managing
XACML-based policies, and providing strong authentication mechanisms.
• All the components within the system are interoperable and act as security service
providers in order to assure a secure cloud-based system. Figure 2 shows the logical components
of our central security system.

• PKI server, also known as Local Certification Authority (LCA) in our system is responsible
for issuing and distributing X509 certificates to all components in a domain. This server can
either be configured as single certification authority, by generating self-signed certificates or
may be linked to PKI in order to exchange certificates and establish trust relationship

between various domains. In this case higher level trusted certification authority server
issues certificates to the issuing CA.
Authentication System Security Protection
• SSO service provider interacts with service consumers through request-response message
protocols. All system entities securely store their private keys locally. SAML server issues
tickets according to the decision made by the central authentication server. That is why they
communicate only over trusted internal network.
• At the same time central authentication server communicates with the IDMS and CA servers
over a trusted network. Therefore, the central security system is an isolated secure
environment, where all the system entities trust each other.

AUTHENTICATION
• Authentication System: A single enterprise may provide many application services to
end-users. E-mail servers and web servers are examples of application service providers. As a
company's boundaries broaden, the number of application services grows. Almost all service
providers should authenticate clients before service transactions are executed, because they
are dealing with personal information.
• This means that the client should have security context for each application server and log in
before it can consume any service. The same situation happens when the client accesses
resources in different security domains.
• As mentioned in the second chapter, having many security credentials for authentication
purposes is not an effective solution from security, system coordination, and management
perspectives. While organizations migrate to cloud environments, the same problem still
exists.
• As a solution to this problem, a Single Sign-On (SSO) protocol is proposed, which is part of
the shared security system of a cloud environment. This solution relies on the SAML web
browser SSO profile, whose complete description can be found in the referenced
document.
• The system consists of a SAML server which provides SSO services for application service
providers: the SAML server issues a SAML ticket which contains an assertion about the client’s
identity verification, confirming whether or not it has been properly authenticated. Once the
user is authenticated, he or she can request access to different authorized resources at
different application provider sites without the need to re-authenticate for each domain.
• SAML server resides in the shared security system. Besides SAML assertions issuing
server, there are three other security entities in the central security system, coordinated with
each other, in order to accomplish the desired solution.

AUTHORIZATION
• As already mentioned earlier, different application services may be hosted in a cloud
environment and may use the same physical resources. However, each application service is
logically separated from others.
• Different types of system entities consume those services; therefore, application service
provider should manage a proper mechanism for access control decisions. This means that
various users, after being successfully authenticated, should request and access those
resources and services for which they are authorized in a particular enterprise security
domain.
• As the number of the services and service consumers grow, management of access control
mechanism becomes more complex and expensive: each service provider needs to
implement independent access control mechanism by means of self-governing security
policies and policy enforcement points.
• Decoupling policies from application services and managing them independently from
application services results in a solution which is more effective for an authorization system.
Applications focus only on system functionality and business value.
• Having a single security policy management point makes the entire authorization system
more flexible and secure, meaning that it can be administered, configured and protected
separately from application services. In this way, it is easy to configure and apply common
policies for every application service in a single security domain.
• Besides, changing a policy becomes very simple because of a single location for policy
management. Protection and auditing of the authorization system are managed separately, thus
making it much harder to compromise.
• Role-based authorization system is proposed for a cloud environment which is a component
of the central security system. XACML is the main standard adopted for this authorization
system. The system provides authorization services for cloud-based application services.

• A Policy Decision Point (PDP) server resides in the central security system. It implements a
role-based access control mechanism and provides authorization services to application
service providers within a security domain.

IDENTITY AND ACCESS MANAGEMENT


• Identity Management (IdM) is capable of performing functions like administration,
discovery, maintenance, policy enforcement, management, information exchange and
authentication. Identity and Access Management (IAM) ensures that the same identity is used
and managed for all applications and simultaneously ensures security.
• It is used to authenticate users, devices or services and to grant or deny rights to access data
and other system resources. In the event of the access of any application, the system or
service does not require its own identity store or authentication mechanism to authenticate.
Instead, the process of identity verification can be configured with the trusted identity
provider which indeed reduces the workload of the application.
• Identity and access management simplifies the management of large-scale distributed
systems. Identity and access management are used within an enterprise or outside of an
enterprise in a business-to-business relationship or even between a private enterprise and a
cloud provider.
• IAM has an extensive organizational area that deals with identifying cloud objects, entities
and controlling access to resources based on pre-established policies. There are a number of
operational areas related to identity and access management. The operational areas include
identity management and provisioning, authentication management, federated identity
management, authorization management and compliance management. These operational
areas ensure that the authorized users are securely and effectively incorporated into the
cloud.
• The Service Provisioning Markup Language (SPML) is an XML-based framework that is used for identity management. It exchanges resource, user, and service provisioning information between organizations.
• One of the shortcomings of SPML is that it uses multiple proprietary protocols from various vendors, which leads to a collection of different Application Programming Interfaces (APIs). As the APIs are not from the same vendor, it is difficult to make them interact with each other.
• The second operational area of IAM is authentication management. This ensures that
credentials such as passwords and digital certificates are managed securely.
• The third operational area of IAM is Federated Identity Management. This identity
management authenticates cloud services using the organization’s selected identity provider.
Federated identity management ensures privacy, integrity and non-repudiation. This also
ensures the trust between a web-based application and the identity provider by exchanging
Public Key Infrastructure (PKI) certified public keys.
• The fourth operational area is authorization management. After successful authentication,
authorization management determines whether the authenticated entity is allowed to perform
any function within a given application.
• The last operational area of identity and access management is compliance management.
This ensures that an organization’s resources are secure and accessed in accordance with the
existing policies and regulations.
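
As a minimal illustration of delegating authentication to a trusted identity provider (mentioned in the opening bullets above), the sketch below verifies a signed token issued by an identity provider instead of keeping a local password store. The token format and the shared secret are simplifying assumptions; real deployments would typically rely on SAML assertions or signed JWTs from the provider.

```python
import hmac
import hashlib
import base64

# Shared secret between the application and the trusted identity provider.
# This is an assumption for the sketch; real identity providers typically
# sign SAML assertions or JWTs with their private keys instead.
IDP_SECRET = b"demo-shared-secret"

def idp_issue_token(username: str) -> str:
    """Identity provider side: issue a token of the form 'username.signature'."""
    sig = hmac.new(IDP_SECRET, username.encode(), hashlib.sha256).digest()
    return username + "." + base64.urlsafe_b64encode(sig).decode()

def app_verify_token(token: str) -> bool:
    """Application side: trust the provider's signature, keep no password store."""
    username, _, sig_b64 = token.partition(".")
    try:
        sig = base64.urlsafe_b64decode(sig_b64)
    except Exception:
        return False
    expected = hmac.new(IDP_SECRET, username.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(sig, expected)

token = idp_issue_token("alice")
print(app_verify_token(token))                            # True: issued by the IdP
print(app_verify_token("bob." + token.split(".", 1)[1]))  # False: wrong subject
```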

DATA SECURITY
• Data protection is a crucial security issue for most organizations. Before moving into the cloud, cloud users need to clearly identify the data objects to be protected and classify the data based on its security implications, and then define the security policy for data protection as well as the policy enforcement mechanisms.
• For most applications, data objects would include not only bulky data at rest in cloud servers
(e.g., user database and/or filesystem), but also data in transit between the cloud and the
user(s) which could be transmitted over the Internet or via mobile media (In many
circumstances, it would be more cost-effective and convenient to move large volumes of
data to the cloud by mobile media like archive tapes than transmitting over the Internet.).
• Data objects may also include user identity information created by the user management model, service audit data produced by the auditing model, service profile information used to describe the service instance(s), temporary runtime data generated by the instance(s), and many other kinds of application data.

Security Services:
• The basic security services for information security include assurance of data Confidentiality, Integrity, and Availability (CIA). In Cloud Computing, the issue of data security becomes more complicated because of the intrinsic cloud characteristics. Before potential cloud users are able to safely move their applications/data to the cloud, a suite of security services should be in place, which we can identify as follows (not necessarily all needed in a specific application):

1) Data confidentiality assurance: This service protects data from being disclosed to illegitimate parties. In Cloud Computing, data confidentiality is a basic security service that should be in place. Although different applications may have different requirements in terms of what kind of data needs confidentiality protection, this security service is applicable to all the data objects discussed above. A minimal encryption sketch follows.
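
As an illustration of confidentiality protection for data at rest or in transit, the following sketch encrypts data before it leaves the user's premises, so the cloud provider only ever stores ciphertext. It assumes the third-party cryptography package (pip install cryptography); the key handling is deliberately simplified.

```python
# Minimal confidentiality sketch: encrypt before upload, decrypt after download.
# Assumes the third-party 'cryptography' package; key handling is simplified.
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # kept by the cloud user, never sent to the cloud
cipher = Fernet(key)

plaintext = b"customer database dump"
ciphertext = cipher.encrypt(plaintext)   # this is what gets stored in the cloud

# Later, after retrieving the ciphertext from the cloud:
assert cipher.decrypt(ciphertext) == plaintext
```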

2) Data integrity protection: This service protects data from malicious modification. Having outsourced their data to remote cloud servers, cloud users must have a way to check whether or not their data at rest or in transit are intact. Such a security service is of core value to cloud users. When auditing cloud services, it is also critical to guarantee that all the audit data are authentic, since these data may be of legal concern. This security service is also applicable to the other data objects discussed above. A minimal integrity-check sketch follows.
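
One common way to detect malicious modification is to keep a keyed message authentication code (MAC) for each data object and recompute it on retrieval. The sketch below uses Python's standard hmac module; the key and sample data are illustrative assumptions.

```python
import hmac
import hashlib

INTEGRITY_KEY = b"user-held integrity key"   # never stored alongside the data

def compute_tag(data: bytes) -> str:
    """Compute an HMAC-SHA256 tag before uploading the data object."""
    return hmac.new(INTEGRITY_KEY, data, hashlib.sha256).hexdigest()

def verify(data: bytes, tag: str) -> bool:
    """Recompute the tag on retrieval and compare in constant time."""
    return hmac.compare_digest(compute_tag(data), tag)

original = b"audit log entry 42"
tag = compute_tag(original)

print(verify(original, tag))              # True  -> data intact
print(verify(b"tampered log entry", tag)) # False -> modification detected
```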

3) Guarantee of data availability: This service assures that data stored in the cloud are available on each user retrieval request. This service is particularly important for data at rest in cloud servers and is related to the fulfillment of the Service Level Agreement (SLA). For long-term data storage services, data availability assurance is of even more importance because of the increasing possibility of data damage or loss over time.

4) Secure data access: This security service limits the disclosure of data content to authorized users. In practical applications, disclosing application data to unauthorized users may threaten the cloud user's business goals. In mission-critical applications, inappropriate disclosure of sensitive data can have legal consequences. For better protection of sensitive data, cloud users may need fine-grained data access control, in the sense that different users may have access to different sets of data. This security service is applicable to most of the data objects addressed above.

5) Regulations and compliances: In practical application scenarios, storage and access of sensitive data may have to comply with specific regulations. For example, disclosure of health records may be limited by the Health Insurance Portability and Accountability Act (HIPAA) [12]. In addition, the geographic location of data is frequently of concern due to export-law violation issues. Cloud users should thoroughly review these regulatory and compliance issues before moving their data into the cloud.

KEY MANAGEMENT

• The cloud key management infrastructure consists of a cloud key management client (CKMC) and a cloud key management server (CKMS) [5]. The CKMC exists in cloud applications, serving the three fundamental cloud service models: Software, Platform and Infrastructure (as a Service).

Management Of Key At Client Side
In this approach, data is stored at the cloud service provider side in encrypted form. The client may be thin, e.g. a mobile phone. Keys are maintained at the customer side. This approach is usually taken with homomorphic cryptographic techniques, where operations are performed on the encrypted data at the server side [8, 9]. In this approach, the mobile phone user and the desktop user each maintain the key at their own side [20].

Key Management At Cloud Service Provider Side
In this approach, keys are maintained at the cloud service provider side. If the key is lost, the customer is unable to read the data present in the cloud. Data is stored in encrypted form and decrypted with the key to recover its original form.

Management Of Key At Both Sides
In this technique, the key is divided into two parts. One part is stored at the user side and the other part is stored at the cloud side. Only when both parts are combined is it possible to retrieve the data properly. Thus, the data remains secure and can be controlled by the user, and the solution is also scalable: neither the cloud service provider nor the user needs to maintain the complete key. If either part of the key is lost, the data cannot be recovered. A minimal key-splitting sketch is given below.
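
A simple way to split a symmetric key into two shares, such that both shares are required to reconstruct it, is XOR-based secret sharing. The sketch below uses only the Python standard library; the key length and usage are illustrative assumptions.

```python
import os

def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two shares; each share alone reveals nothing."""
    user_share = os.urandom(len(key))                            # kept by the user
    cloud_share = bytes(a ^ b for a, b in zip(key, user_share))  # kept in the cloud
    return user_share, cloud_share

def combine(user_share: bytes, cloud_share: bytes) -> bytes:
    """Recombine the two shares to recover the original key."""
    return bytes(a ^ b for a, b in zip(user_share, cloud_share))

key = os.urandom(32)                              # 256-bit data-encryption key
user_share, cloud_share = split_key(key)
assert combine(user_share, cloud_share) == key    # both parts are needed
```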

Key Management At Centralized Server
This approach uses an asymmetric key approach. Data is encrypted with a public key stored in the key server, and data on the cloud side is kept in encrypted form. When a user accesses the data, it is decrypted with the private key maintained by that user. The disadvantage of this method is that if the key server crashes, it becomes a single point of failure. Each user generates a public and a private key, and the public keys are stored at the key server. Suppose a mobile phone user wants to share data with a desktop user: he/she encrypts the data with the public key of the desktop user, and the desktop user then accesses the data with his/her private key. A minimal sketch of this asymmetric approach follows.
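
To illustrate the asymmetric approach, the sketch below generates an RSA key pair and encrypts a small message with the public key so that only the holder of the private key can decrypt it. It assumes the third-party cryptography package; in practice, asymmetric keys usually wrap a symmetric data key rather than encrypting the data directly.

```python
# Minimal asymmetric sketch: the 'desktop user' holds the private key,
# the public key would be published via the central key server.
# Assumes the third-party 'cryptography' package.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()     # this part is stored at the key server

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# The mobile phone user encrypts with the desktop user's public key...
ciphertext = public_key.encrypt(b"shared report", oaep)

# ...and only the desktop user's private key can decrypt it.
assert private_key.decrypt(ciphertext, oaep) == b"shared report"
```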

Group Key Management For Cloud Data Storage
Data is shared in the cloud by trusted members of a group. A group key is established for securing the data at the cloud side. The group key is formed from the partial keys maintained by each member. When group members want to access the data, the group key is used to access it. A minimal sketch of deriving such a group key follows.
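
One simple, illustrative way to form a group key from per-member partial keys is to combine the contributions and hash the result, so that the key depends on every member's share. The sketch uses only the Python standard library and is an assumption for illustration, not a full group key agreement protocol.

```python
import os
import hashlib
from functools import reduce

def derive_group_key(partial_keys: list[bytes]) -> bytes:
    """Combine each member's partial key (XOR), then hash to a fixed-size key."""
    combined = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), partial_keys)
    return hashlib.sha256(combined).digest()

# Each trusted group member contributes a partial key of the same length.
members = {"alice": os.urandom(32), "bob": os.urandom(32), "carol": os.urandom(32)}

group_key = derive_group_key(list(members.values()))
print(group_key.hex())   # 256-bit group key used to protect the shared data
```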

QUESTIONS
5 MARKS
1. Write short notes on clustering big data.
2. Explain about recommendation system.
3. Explain about live stream app.
4. Discuss about video transcoding app.
5. Write notes on authentication.
6. Explain about authorization.

10 MARKS
7. Explain in detail about classification of big data.
8. Write notes on streaming protocol.
9. Discuss about CSA cloud security architecture.
10. Explain the following:
a) Data security
b) Key management

UNIT V COMPLETED

