CLOUD COMPUTING msccs notes
Cloud computing offers platform independence, as the software need not be installed locally on
the PC. Cloud computing thus makes our business applications mobile and collaborative.
History of Cloud Computing
The concept of cloud computing came into existence in the 1950s with the implementation of mainframe
computers, accessible via thin/static clients. Since then, cloud computing has evolved from static clients to
dynamic ones, and from software to services.
Resource Pooling
Cloud computing allows multiple tenants to share a pool of resources. Tenants can share a single physical
instance of hardware, database and basic infrastructure.
Rapid Elasticity
It is very easy to scale the resources vertically or horizontally at any time. Scaling of resources means the
ability of resources to deal with increasing or decreasing demand.
Measured Service
In a measured service, the cloud provider controls and monitors all aspects of the cloud service. The
resources being used by customers at any given point of time are automatically monitored, and resource
optimization, billing, and capacity planning depend on this measurement.
CLOUD MODELS
There are certain services and models working behind the scenes that make cloud
computing feasible and accessible to end users. The working models for cloud
computing are:
• Service Models
• Deployment Models
Service models
Cloud computing is a model for enabling ubiquitous, convenient, on-demand
network access to a shared pool of configurable computing resources (e.g., networks,
servers, storage, applications, and services) that can be rapidly provisioned and
released with minimal management effort or service provider interaction. This cloud
model is composed of five essential characteristics, three service models, and four
deployment models. The service models covered in these notes are:
• Software as a Service (SaaS)
• Platform as a Service (PaaS).
• Infrastructure as a Service (IaaS).
• Communication-as-a-Service (CaaS)
Software as a Service (SaaS).
The traditional model of software distribution, in which software is purchased for and
installed on personal computers, is sometimes referred to as Software-as-a-Product. Software-as-
a-Service is a software distribution model in which applications are hosted by a vendor or service
provider and made available to customers over a network, typically the Internet.
SaaS is becoming an increasingly prevalent delivery model as underlying technologies
that support web services and service-oriented architecture (SOA) mature and new
developmental approaches become popular.
SaaS is also often associated with a pay-as-you-go subscription licensing model. Meanwhile,
broadband service has become increasingly available to support user access from more
areas around the world. Examples are Google’s Gmail and Apps, instant messaging from AOL,
Yahoo and Google.
Communication-as-a-Service (CaaS)
A CaaS model allows a CaaS provider's business customers to selectively deploy
communications features and services throughout their company on a pay-as-you-go basis for
the service(s) used. CaaS is designed on a utility-like pricing model that provides users with
comprehensive, flexible, and (usually) simple-to-understand service plans.
Deployment models
As cloud technology provides users with so many benefits, these benefits must
be categorized based on user requirements. A cloud deployment model represents the exact
category of cloud environment based on ownership, size, and access, and also describes the
nature and purpose of the cloud. Most organizations implement cloud infrastructure to
minimize capital expenditure and regulate operating costs.
• Public Cloud
• Community Cloud
• Hybrid Cloud
• Private Cloud
Public cloud
• Public cloud or external cloud describes cloud computing in the traditional mainstream
sense, whereby resources are dynamically provisioned on a fine-grained, self-service basis
over the Internet, via web applications/web services, from an off-site third-party provider
who bills on a fine-grained utility computing basis.
• The cloud infrastructure is made available to the general public or a large industry group,
and is owned by an organization selling cloud services. Examples: Amazon Elastic Compute
Cloud (EC2), IBM's Blue Cloud, Sun Cloud, Google App Engine.
Community cloud
A community cloud may be established where several organizations have similar
requirements and seek to share infrastructure so as to realize some of the benefits of cloud
computing. With the costs spread over fewer users than a public cloud (but more than a single
tenant) this option is more expensive but may offer a higher level of privacy, security and/or
policy compliance. Examples of community cloud include Google’s "Gov Cloud".
Hybrid cloud
• The term "Hybrid Cloud" has been used to mean either two separate clouds joined
together (public, private, internal or external), or a combination of virtualized cloud
server instances used together with real physical hardware.
• The most correct definition of the term "Hybrid Cloud" is probably the use of physical
hardware and virtualized cloud server instances together to provide a single common
service. Two clouds that have been joined together are more correctly called a "combined
cloud".
• A hybrid storage cloud uses a combination of public and private storage clouds. Hybrid
storage clouds are often useful for archiving and backup functions, allowing local data to
be replicated to a public cloud.
Private cloud
• A private cloud is a particular model of cloud computing that involves a distinct and secure
cloud-based environment in which only the specified client can operate. As with other cloud
models, private clouds provide computing power as a service within a virtualized
environment using an underlying pool of physical computing resources.
• However, under the private cloud model, the cloud (the pool of resources) is only accessible
by a single organization, providing that organization with greater control and privacy.
• There are possible dependencies between CaaS, SaaS, PaaS and IaaS, with higher-level
services able to build on lower-level ones.
CLOUD SERVICES EXAMPLES
1. Scalable Usage
Cloud computing offers scalable resources through various subscription models. This
means that you only pay for the computing resources you use, which helps in
managing spikes in demand without the need to permanently invest in computer hardware.
Netflix, for instance, leverages this potential of cloud computing to its advantage. Due to
its on-demand streaming service, it faces large surges in server load at peak times. Migrating
from in-house data centres to the cloud allowed the company to significantly expand its
customer base without having to invest in the setup and maintenance of costly infrastructure.
2. Chatbots
The expanded computing power and capacity of the cloud enables us to store information
about user preferences. This can be used to provide customized solutions, messages and
products based on the behaviour and preferences of users.
Siri, Alexa and Google Assistant - all are cloud-based natural-language intelligent bots.
These chatbots leverage the computing capabilities of the cloud to provide personalized context-
relevant customer experiences. The next time you say, “Hey Siri!” remember that there is a
cloud-based AI solution behind it.
3. Communication
The cloud allows users to enjoy network-based access to communication tools like
emails and calendars. Most of the messaging and calling apps like Skype and WhatsApp are
also based on cloud infrastructure. All your messages and information are stored on the service
provider’s hardware rather than on your personal device. This allows you to access your
information from anywhere via the internet.
4. Productivity
Office tools like Microsoft Office 365 and Google Docs use cloud computing, allowing
you to use your most-productive tools over the internet. You can work on your documents,
presentations and spreadsheets from anywhere, at any time. With your data stored in the cloud,
you don't need to worry about data loss if your device is stolen, lost or damaged. The cloud
also simplifies document sharing and enables different individuals to work on the same
document at the same time.
5. Business Process
Many business management applications like customer relationship management (CRM)
and enterprise resource planning (ERP) are also delivered through cloud service providers.
Software as a Service (SaaS) has become a popular method for deploying enterprise-level software.
Salesforce, Hubspot, Marketo etc. are popular examples of this model. This method is
cost-effective and efficient for both the service provider and customers. It ensures hassle free
management, maintenance and security of your organization’s critical business resources and
allows you to access these applications conveniently via a web browser.
6. Backup and recovery
When you choose the cloud for data storage, the responsibility for your information also lies
with your service provider. This saves you the capital outlay for building and maintaining
infrastructure. Your cloud service provider is responsible for securing data and meeting legal and
compliance requirements.
The cloud also provides more flexibility in the sense that you can enjoy large storage and
on-demand backups. Recovery is also performed faster in the cloud because the data is stored
over a network of physical servers rather than at one on-site data centre. Dropbox, Google Drive
and Amazon S3 are popular examples of cloud backup solutions.
7. Application development
Whether you are developing an application for web or mobile or even games, cloud
platforms prove to be a reliable solution. Using cloud, you can easily create scalable cross-
platform experiences for your users. These platforms include many pre-coded tools and libraries
— like directory services, search and security. This can speed up and simplify the development
process. Amazon Lumberyard, for example, is a game development engine with built-in cloud integration.
8. Big data analytics
Cloud computing enables data scientists to tap into any organizational data and analyze it
for patterns and insights, find correlations, make predictions, forecast future crises and help in
data-backed decision making. Cloud services make mining massive amounts of data possible by
providing higher processing power and sophisticated tools.
There are many open source big data tools based on the cloud, for instance
Hadoop, Cassandra and HPCC. Without the cloud, it would be very difficult to collect and
analyze data in real time, especially for small companies.
A cloud application, or cloud app, is a software program where cloud-based and local
components work together. This model relies on remote servers for processing logic that is
accessed through a web browser with a continual internet connection.
Cloud application servers typically are located in a remote data center operated by a
third-party cloud services infrastructure provider. Cloud-based application tasks may encompass
email, file storage and sharing, order entry, inventory management, word processing, customer
relationship management (CRM), data collection, or financial accounting features.
• Instant scalability
As demand rises or falls, available capacity can be adjusted.
• API use
Third-party data sources and storage services can be accessed with an application
programming interface (API). Cloud applications can be kept smaller by using APIs to hand
data to applications or API-based back-end services for processing or analytics computations,
with the results handed back to the cloud application. Vetted APIs impose passive
consistency that can speed development and yield predictable results.
• Gradual adoption.
Refactoring legacy, on-premises applications to a cloud architecture in steps allows
components to be implemented on a gradual basis.
• Reduced costs.
The size and scale of data centers run by major cloud infrastructure and service providers,
along with competition among providers, has led to lower prices. Cloud-based applications
can be less expensive to operate and maintain than equivalent on-premises installations.
• Improved data sharing and security.
Data stored on cloud services is instantly available to authorized users. Due to their
massive scale, cloud providers can hire world-class security experts and implement
infrastructure security measures that typically only large enterprises can obtain. Centralized
data managed by IT operations personnel is more easily backed up on a regular schedule and
restored should disaster recovery become necessary.
• How cloud apps work
Data is stored and compute cycles occur in a remote data center typically operated by a
third-party company. A back end ensures uptime, security and integration and supports
multiple access methods.
Cloud applications provide quick responsiveness and don't need to permanently reside on
the local device. They can function offline, but can be updated online.
While under constant control, cloud applications don't always consume storage space on a
computer or communications device. Assuming a reasonably fast internet connection, a well-
written cloud application offers all the interactivity of a desktop application, along with
the portability of a web application.
• With traditionally installed software, it's not possible to enforce an upgrade whenever a new
version is available, so it's tricky to have all users running the same one. The need to provide
support for multiple versions simultaneously can become a burden on tech support.
• Cloud applications don't face such version control issues, since users can access and run only the
version available on the cloud.
SaaS, PaaS and IaaS are sometimes called the cloud computing stack, because they build on top of one
another. Knowing what they are and how they differ makes it easier to accomplish your
goals.
1. Software As A Service
Software-as-a-Service (SaaS) is a way of delivering services and applications over the
Internet. Instead of installing and maintaining software, we simply access it via the Internet,
freeing ourselves from complex software and hardware management. It removes the need to
install and run applications on our own computers or in data centers, eliminating the expense
of hardware as well as software maintenance. SaaS provides a complete software solution which
you purchase on a pay-as-you-go basis from a cloud service provider. Most SaaS applications
can be run directly from a web browser without any downloads or installations required. SaaS
applications are sometimes called web-based software, on-demand software, or hosted
software.
Advantages of SaaS
1. Cost Effective :
Pay only for what you use.
2. Reduced time :
Users can run most SaaS apps directly from their web browser without needing to
download and install any software. This reduces the time spent on installation and configuration,
and can reduce the issues that get in the way of software deployment.
3. Accessibility :
We can access app data from anywhere.
4. Automatic updates :
Rather than purchasing new software, customers rely on a SaaS provider to automatically
perform the updates.
5. Scalability :
It allows the users to access the services and features on demand.
The various companies providing Software as a Service include Cloud9 Analytics, Salesforce.com,
CloudSwitch, Microsoft Office 365, Eloqua, Dropbox and CloudTran.
2. Platform As A Service
• PaaS is a category of cloud computing that provides a platform and environment to allow
developers to build applications and services over the internet.
• PaaS services are hosted in the cloud and accessed by users simply via their web browser.
A PaaS provider hosts the hardware and software on its own infrastructure.
• As a result, PaaS frees users from having to install in-house hardware and software to
develop or run a new application. Thus, the development and deployment of the
application take place independent of the hardware.
• The consumer does not manage or control the underlying cloud infrastructure including
network, servers, operating systems, or storage, but has control over the deployed
applications and possibly configuration settings for the application-hosting environment.
Advantages of PaaS:
1. Simple and convenient for users :
It provides much of the infrastructure and other IT services, which users can access
anywhere via a web browser.
2. Cost Effective :
It charges for the services provided on a per-use basis thus eliminating the expenses one
may have for on-premises hardware and software.
3. Efficiently managing the lifecycle :
It is designed to support the complete web application lifecycle: building, testing,
deploying, managing and updating.
4. Efficiency :
It allows for higher-level programming with reduced complexity thus, the overall
development of the application can be more effective
The various companies providing Platform as a Service include Amazon Web Services, Salesforce,
Windows Azure, Google App Engine, CloudBees and IBM SmartCloud.
3. Infrastructure As A Service
Infrastructure as a service (IaaS) is a service model that delivers computer infrastructure
on an outsourced basis to support various operations.
Typically, IaaS is a service where infrastructure is provided as an outsourced offering to enterprises,
covering networking equipment, devices, databases and web servers.
Infrastructure as a service (IaaS) is also known as Hardware as a service (HaaS).
IaaS customers pay on a per-use basis, typically by the hour, week or month. Some
providers also charge customers based on the amount of virtual machine space they use.
It simply provides the underlying operating systems, security, networking, and servers for
developing such applications, services, and for deploying development tools, databases, etc.
Advantages of IaaS :
1. Cost Effective :
Eliminates capital expense and reduces ongoing cost and IaaS customers pay on a per use
basis, typically by the hour, week or month.
2. Website hosting :
Running websites using IaaS can be less expensive than traditional web hosting.
3. Security :
The IaaS Cloud Provider may provide better security than your existing software.
4. Maintenance :
There is no need to manage the underlying data center or the introduction of new releases
of the development or underlying software. This is all handled by the IaaS Cloud Provider.
The various companies providing Infrastructure as a Service include Amazon Web Services,
Bluestack, IBM, OpenStack, Rackspace and VMware.
1. Various viewpoints
There are many stakeholders in the world of cloud computing, ranging from the individual
user to the entrepreneur service provider to the service developer to the legal compliance
auditor.
Each has a role to play and activities to perform, and each will see cloud computing
differently. It is important to distinguish among the requirements of each of the stakeholders.
Cloud computing has been defined as a “paradigm for enabling network access to a scalable
and elastic pool of shareable physical or virtual resources with self-service provisioning and
administration on-demand.”
This provides lots of latitude for implementing cloud-based IT solutions and ensures there
will be lots of competition among suppliers for both the underlying resources and the provision
of services.
5. Cloud-based systems
Fundamental to cloud computing is the idea that what is delivered to the customer is
services, not systems consisting of dedicated hardware and software. Under the cloud computing
“covers” there may be many components that are shared by many customers.
This includes security services, administrative services, ecosystem services and performance
services. The vision is to make cloud IT pervasive and to achieve both the digital economy (for
business) and the digital society (for the public) leading to the “Digitally Interconnected
Society”.
7. Cloud services
There is no single definition for a cloud service, although conceptually each service provides
a measured amount of one or more capabilities at specified levels of quality at an agreed upon
price. Three types of service are currently being defined:
• IaaS (Infrastructure as a Service) covers basic computing, storage and networking
capabilities;
• PaaS (Platform as a Service) provides capabilities for developing and deploying customer-
owned applications; and
• SaaS (Software as a Service) provides commonly used applications on a shared basis.
Offering services at one level does not imply that services are also offered for all lower levels,
however.
8. Overarching concerns
There are a number of overarching considerations that are generally applicable to any cloud
computing deployment and which have a major impact on the success of any cloud-based
system. For example:
• Governance and management: auditability, governance, regulatory, privacy, security and
service levels (SLAs);
• Qualities: availability, performance and resiliency;
• Capabilities: interoperability, maintainability, portability and reversibility
9. Cloud-specific risks
As with any new technology, there are business risks associated with cloud computing, both
for providers and customers. While the most visible of these has so far been security, there are
other important things to keep in mind, including:
• Supplier quality, longevity and lock-in
• Available expertise – technical, organizational and operational
• Adaptation of business processes to available services
• Financial management including changes in purchasing and variable bills
• Exploitation and innovation
The sum total is that there are many things to consider as you prepare to include cloud computing in
your IT solutions. These vary according to the role you will play, the services that are being
used and the maturity of your IT organization. As part of developing your policies and roadmaps
for cloud computing, I recommend creating a centre of cloud computing excellence to kick-start
your journey.
Cloud technology
• Cloud Computing Technologies (CCT) is a cloud systems integrator and cloud service
provider. CCT specializes in cloud systems aggregation, cross-cloud platform integration,
application API integration, software development, and management of your cloud
ecosystem. Our professional cloud services include cloud systems design and
implementation (private cloud, public cloud, or hybrid cloud), and migration to shared
services and on-premises private cloud infrastructures.
• As your single point of contact for cloud integration, we explain third-party cloud service
level agreements, pricing models, and contracts as your trusted adviser.
• For organizations that seek do-it-yourself cloud shared-services solutions, CCT offers secure,
scalable, and on-demand cloud service through our enterprise-level cloud partner, Amazon
Web Services Platform-as-a-Service (PaaS).
• At all Cloud Computing Technologies services levels, we are proud of our track record of
delivering high-impact public cloud service with excellent customer satisfaction.
Our mission is “To provide high-quality Cloud Computing Shared Services Solutions to
accomplish our clients’ business goals and develop long-term relationships. Our commitment
is to continuous improvement of shared services, deliverables, and competitive pricing with
current and emerging cloud computing technology.”
These attributes of virtualization are the core of cloud computing technologies, and they are
what make cloud computing's key characteristics of multitenancy, massive scalability, rapid
elasticity and measured service possible.
Three forms of virtualization
There are three types of virtualization, known as Full, Para and Isolation; each is explained below.
Full Virtualization
This type of virtualization operates at the processor level and supports unmodified guest operating
systems by simulating the hardware and software of the host machine.
Para-virtualization
This type utilizes a virtual machine monitor, which is software that allows a single
physical machine to support multiple virtual machines. It allows multiple virtual machines to
run on one host, and each instance of a guest program executes independently on its own
virtual machine.
Isolation
This type is similar to para-virtualization, although it only allows virtualization of the same
operating system as the host and only supports Linux systems; however, it is considered to
perform the best and operate the most efficiently.
As more businesses move to the cloud, they should be aware of the many
challenges that the technology is currently experiencing; it is important that they are prepared to
encounter some of these challenges during their migration towards cloud technologies. Cloud
computing has been around for many years and has always been clouded in ambiguity as to what
the technology is, and many individuals have provided their own interpretations and opinions
in defining the various cloud delivery models.
This is very much to do with lack of standards and a clear definition of each aspect of
what cloud technology is and how it actually functions. Many cloud computing providers
admitted they consider standards the first step to commoditization, something they would
rather not see this early in the emerging market. So the lack of standards is partially due to
many cloud providers not wanting them defined yet, which is certainly going to cause more
ambiguity and possibly slow down adoption of cloud technologies.
Cloud consumers do not have control over the underlying computing resources, but they do
need to ensure the quality, availability, reliability, and performance of these resources once they
have migrated their core business functions onto their entrusted cloud [13]. Cloud
service providers need to be transparent and responsible for the services they provide for their
consumers to create consumer confidence in their services.
Consumer confidence can be achieved through a mutual agreement commonly referred to
as a Service Level Agreement (SLA). Migrating to the cloud service provider's infrastructure
means that the provider bears a large responsibility for maintaining the consumers' data and
services and making them available to the specification outlined in the SLA.
QUESTIONS
5 MARKS
1. Write short notes on cloud computing.
2. Write about cloud models.
3. Explain cloud-based services.
4. What is meant by cloud technology? Explain it.
10 MARKS
5. Explain about characteristics of cloud computing.
6. Discuss about cloud based services and application.
7. Explain about cloud concept and technology.
8. What is meant by cloud services? Give examples of cloud services.
UNIT I COMPLETED
UNIT II
CLOUD SERVICES AND PLATFORM
COMPUTE SERVICES
• Compute as a service (CaaS) is the provisioning of computing resources (access to raw
compute or server capacity) on demand.
• This cloud computing layer involves the delivery of virtual or physical resources as a
service, priced via a consumption-based model.
• CaaS was the original cloud offering, emerging in 2006 via the launch of Amazon’s EC2.
While EC2 continues to enjoy a significant market-share lead, more than 50 vendors have
entered the space to challenge the leader for a piece of the pie. With commoditization taking
hold, the sector faces unique challenges but remains a core part of the broader cloud
universe.
• This report examines the revenue being generated by CaaS providers, predicts the growth
trends for the space, and highlights the opportunities and threats facing vendors.
When launching a new instance, the user selects a key pair from existing key pairs or
creates a new key pair for the instance. Key pairs are used to securely connect to an instance after
it launches.
• Security Groups
The security groups to be associated with the instance can be selected from the instance
launch wizard. Security groups are used to open or block a specific network port for the launched
instances.
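To make this concrete, here is a minimal sketch in Python using the boto3 AWS SDK (an assumption; the AMI ID, key pair name and security group name below are placeholders) that launches an instance with a key pair and a security group:

import boto3  # AWS SDK for Python; assumes credentials are already configured

ec2 = boto3.resource("ec2", region_name="us-east-1")
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",              # existing key pair, used to connect securely later
    SecurityGroups=["web-server-sg"],  # security group that opens the required ports
)
print(instances[0].id)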
STORAGE SERVICES
A cloud storage service is a business that maintains and manages its customers' data and
makes that data accessible over a network, usually the internet.
Most of these types of services are based on a utility storage model. They tend to offer
flexible, pay-as-you-go pricing and scalability. Cloud storage providers also provide for
unlimited growth and the ability to increase and decrease storage capacity on demand.
Leading use cases for a cloud storage service include backup, disaster recovery (DR),
collaboration and file sharing, archiving, primary data storage and near-line storage.
• Does the service use REST, the most commonly used cloud storage API?
• Does your data have to be preserved in some specific format to meet compliance
requirements? That capability is not commonly available.
• Does the provider offer both public and private clouds? This may become important if you
want to migrate data from one type of service to the other.
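As an illustration of such a REST-backed storage API, a brief sketch using Python's boto3 (an assumption; the bucket and object names are placeholders) uploads and retrieves a file from Amazon S3:

import boto3  # each call below is translated into a REST request to the service

s3 = boto3.client("s3")
# Store a local file as an object, then fetch it back.
s3.upload_file("report.pdf", "my-backup-bucket", "backups/report.pdf")
s3.download_file("my-backup-bucket", "backups/report.pdf", "restored-report.pdf")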
Hybrid cloud storage offers the best of the private and public cloud with high scalability
and on-premises integration that adds more layers of security. The result is better performance
and reliability because active content is cached locally. While a hybrid cloud tends to be more
costly than public storage, it is cheaper than private cloud storage. Reliability can be an issue, as
users must depend on service provider availability and internet connectivity.
Adapt to the behaviors of the cloud storage service.
The way a cloud storage provider stores and provides access to data cannot be changed
by customers to address unexpected variations in performance as they share the infrastructure
with many other organizations.
But clients do have the ability to redesign the architecture of their workloads by
duplicating storage resources in more than one public cloud region, for example. This way, cloud
storage customers can redirect storage resources to the replicated region should problems arise.
Caching can also be used to address -- and head off -- potential cloud storage service
performance issues.
Improve connectivity.
Sometimes, performance issues are caused by shortcomings in the internet connection
itself due to unexpected disruption and congestion -- always a risk when using the public
internet.
relational database and operating a virtual machine loaded with local database software like
SQL.
• There are different companies offering Database as a Service (DBaaS), such as Amazon RDS,
Microsoft SQL Azure, Google App Engine Datastore and Amazon SimpleDB (Pizzete and
Cabot 2012). Each service provider differs from the others depending upon the quality
and sort of services provided.
• There are certain parameters that can be used to select the best service to suit your
company. These are not limited to a certain company; the same parameters can help in deciding the
best service provider depending upon the requirements of any company.
Data Sizing
Every DBaaS provider has a different capacity for storing data in the database. Data
sizing is very important, as the company needs to be sure about the size of data that will be
stored in its database. For example, Amazon RDS allows the user to store up to 1 TB of data
in one database, whereas SQL Azure offers only 50 GB for one database.
Portability
The database should be portable, as it should never be out of the user's reach. The
service provider may go out of business, in which case the database and the data stored in it
could be lost. There should be an emergency plan for such situations. This can be addressed by
taking cloud services from other companies as well, so that the database remains accessible even
in an emergency.
Transaction Capabilities
The transaction capabilities are the major feature of the cloud database as the completion
of the transaction is very important for the user. The user must be aware if the transaction has
been successful or not. For companies that mostly transact money, complete read and write
operations must be accomplished, and the user needs a guarantee of the transaction made; this
sort of transaction is called an ACID transaction (Pizzete and Cabot 2012). If no such guarantee
is needed, then transactions can be made as non-ACID transactions, which will be faster as well.
Configurability
Many databases can be easily configured by the user, as most of the configuration is
done by the service provider. In this way, there are only a few options left to the administrator
of the database, who can then manage the database without much effort.
Database Accessibility
As there are many different databases, the mechanisms for accessing them differ as well.
The first method is an RDBMS offered through industry-standard drivers such as Java Database
Connectivity (JDBC); such a driver allows an external connection to access the services through
a standard interface. The second way of accessing the database is through interfaces or protocols
such as Service-Oriented Architecture (SOA) with SOAP, or REST (Pizzete and Cabot 2012).
These interfaces use HTTP and some newer API definitions.
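A sketch of the first, driver-based access method in Python (assuming the third-party pymysql package; the endpoint and credentials are hypothetical):

import pymysql  # third-party MySQL driver (assumption)

conn = pymysql.connect(
    host="mydb.abc123.us-east-1.rds.amazonaws.com",  # hypothetical DBaaS endpoint
    port=3306,
    user="admin",
    password="secret",
    database="inventory",
)
with conn.cursor() as cur:
    # A cloud database is queried exactly like an on-premises RDBMS.
    cur.execute("SELECT COUNT(*) FROM products")
    print(cur.fetchone())
conn.close()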
Certification and Accreditation
It is better to use the services of a cloud database provider who has obtained certification
and accreditation, as this helps mitigate service risks for the company and avoid any
inconvenience. Providers with certifications such as FISMA can be considered more reliable
than other DBaaS providers.
Data Integrity, Security and Storage Location
Security has been the major threat to data stored in cloud storage. Security also depends on
the encryption methods used and the storage locations of the data. The data is stored in
different locations in data centers.
APPLICATIONS SERVICES
Applications as a service refers to the delivery of computer software applications as a
service via the Internet. This type of software is also referred to as SaaS (Software as a Service),
software on demand and on-demand software. On-demand software has been gaining an
increasing share of the software market, due to the cost savings and efficiency gains it can offer
to organizations, regardless of their size. On-demand software provides financial benefits to
organizations, by eliminating the expense of individual user licenses which normally accompany
traditional on-premise software delivery.
Applications as a service can also provide software to enterprise users more efficiently,
because it can be distributed and maintained for all users at a single point – in the public cloud
(Internet). The efficiency gains are facilitated through the use of various automation tools that
can be implemented on the cloud services platform. The automation of functions such as: user
provisioning, user account management, user subscription management and application life cycle
management make on-demand software a highly efficient and cost-effective way to deliver
software to enterprise users.
Companies that provide applications as a service (on-demand software) are known as ASPs
or application service providers. ASPs are divided into 4 major categories, as follows:
ASPs own the software that they deliver to consumers, as well as the hardware which
supports the software. ASPs bill on a per use basis, on a monthly basis or on an annual basis –
making software on demand a very affordable option for many organizations. On-demand
software provides small and medium size businesses with a method of accessing software that
may have previously been financially out of reach, due to software licensing costs and additional
hardware costs.
A cloud CDN (CCDN) provides a flexible solution allowing content providers to
intelligently match and place content on one or more cloud storage servers based on coverage,
budget and QoS preferences. The key implication is economies of scale and the benefits
delivered by the pay-as-you-go model. Using clouds, content providers have more agility in
managing situations such as flash crowds, avoiding the need to invest in infrastructure
development.
Pay-as-you-go CCDN model
CCDN allows users to consume delivered content using a pay-as-you-go model.
Hence, it can be much more cost-effective than owning the physical infrastructure that is
necessary to be part of a CDN.
Increased point-of-presence
Content is moved closer to users with relative ease in a CCDN system compared to a
traditional CDN, due to the omnipresence of the cloud. A cloud-based content delivery network
can reduce transmission latency, as it can rent operating resources from the cloud provider to
increase the reach and visibility of the CDN on demand.
CCDN Interoperability
CDN interoperability has emerged as a strategically important concept for service providers
and content providers. Interoperability of CDNs via the cloud will allow content providers to
reach new markets and regions and support nomadic users, e.g., instead of setting up an
infrastructure to serve a small group of customers in Africa, taking advantage of current cloud
providers in the region to dynamically host surrogate servers.
Support for variety of CCDN application
The cloud can support dynamic changes in load. This facilitates CDNs supporting
different kinds of applications that have unpredictable bursting traffic, predictable bursting
traffic, scale-up and scale-down of resources, and the ability to expand and grow fast. However,
while cloud-based CDNs have made remarkable progress in the past five years, they are still
limited in a number of aspects. For instance, moving into the cloud might carry some marked
security and performance challenges that can impact the efficiency and productivity of the CDN,
thus affecting the client's business.
Dynamic Content Management
CDNs are designed for streaming staged content but do not perform well in situations
where content is produced dynamically. This is typically the case when content is produced,
managed and consumed in collaborative activities. For example, an art teacher may find
and discuss movies from different film archives; the students may then edit the selected movies.
Parts of them may be used in producing new movies that will be sent to the students' friends for
comments and suggestions. Current CDNs do not support such collaborative activities that
involve dynamic content creation.
CCDN Ownership
Cloud CDN service providers either own all the services they use to run their CDN
services or they outsource this to a single cloud provider. A specialized legal and technical
relationship is required to make the CDN work in the latter case.
CCDN Personalization
CDNs do not support content personalization. For example, if the subscriber’s behavior
and usage pattern can be observed, a better estimation on the traffic demand can be achieved.
The performance of content delivery is moving from speed and latency to on-demand delivery of
relevant content matching end-user’s interest and context.
Cost Models for Cloud CDNs
The cloud cost model works well as long as the network consumption is predictable for
both service provider and end-user. However, such predictions become very challenging with
distributed cloud CDNs.
Security
CDNs also face security challenges due to the introduction of public clouds to store,
share and route content. The use of multi-vendor public clouds further complicates this problem.
Security is the protection of content against unauthorised usage, modification and tampering,
and protection against illegal use, hack attacks, viruses and other unwanted intrusions. Further,
security also plays an important role while accessing and delivering content to the relevant users.
Hybrid Clouds
The integration of cloud and CDN will also allow the development of hybrid CCDNs that
can leverage a combination of private and public cloud providers; e.g., the content provider
can use a combination of cloud service platforms offered by Microsoft Azure and Amazon AWS
to host their content. Depending on the pay-as-you-go model, the content provider can also move
from one cloud provider to another. However, achieving a hybrid model is very challenging due
to various CCDN ownership issues and QoS issues.
ANALYTIC SERVICES
Analytics as a service (AaaS) refers to the provision of analytics software and operations
through web-delivered technologies. These types of solutions offer businesses an alternative to
developing internal hardware setups just to perform business analytics.
To put analytics as a service in context, this type of service is part of a much wider range of
services with similar names and similar ideas, including:
may need more servers and other kinds of hardware, and they may need more IT staff to
implement and maintain these programs.
Advantages
Agile Computing Resources
Instead of handling speed- and delivery-time-related hassles on your on-premises servers,
cloud computing resources are high-powered and can deliver your queries and reports in no time.
Accessibility
Cloud services are capable of sharing data and visualizations and performing cross-
organizational analysis, making raw data more accessible and comprehensible to a broader user
base.
Mindtree’s rich infrastructure and application experience enables high availability and
continuous optimization in a hybrid cloud across the business application ecosystem.
Mindtree cloud management services include:
• Optimization in spending
• Regular operational metrics
Mindtree has developed a distinctive approach to deliver management through its proven
cloud management platform and skilled workforce. Our platform delivers integrated
functionality of ITSM, monitoring, APM and log analytics, and change and issue resolution. We
also enable integration of cloud with legacy IT landscape. Additionally, we provide a single
window to view the health of IT resources in hybrid or multi-cloud environments.
Mindtree addresses the challenges of both application and infrastructure, jointly referred
to as AppliStructure. Our approach is to deliver security-as-hygiene. This means building
security at every step of the delivery process rather than transposing security in isolation.
Needless to say, we deploy the right tools to help in threat prediction, identification and
remediation.
Benefits
Cost savings
By outsourcing your cloud managed services, you control and reduce costly network
maintenance. Staffing a full-time IT department is expensive and often unnecessary for
small to medium-sized businesses with simple networks. Outsourcing to a cloud-first managed
services provider like Agile IT can save you thousands each year on the cost of an in-house IT
department.
Predictable, recurring monthly costs
With the flexibility of cloud managed services, you decide how much you’re willing to
pay for IT services and have a consistent monthly bill.
With a fixed monthly service plan that’s customized to fit your needs or budget, you
optimize the amount you pay for IT support.
Future-proofed technology
Migrating to a cloud environment is the first step in future-proofing your data center.
Next, you'll need to make the latest technology and services available to your business.
By hiring an in-house IT staff, your IT personnel will have to spend company time
training when a new technology or required upgrade gets released. Cloud technicians are already
prepared to manage the latest technology.
Disaster recovery
Services are the lifeline of a cloud managed service provider. Agile IT has designed
countless networks and data centers with proven redundancy and resiliency to maintain business
continuity.
With cloud managed services, your data will be safe and secured across all cloud services
and applications. In the event of a disaster, your business and operations can continue with
minimal downtime.
Fast response times
Your businesses can expect quick response times through enterprise-level monitoring and
remote cloud services. Agile IT can access, monitor and repair virtually any network issue
remotely. If you must resolve an issue locally, a technician should be dispatched within the same
business day.
Maintenance
• Identity information is prone to changes over time. During the Identity Life Cycle,
modifications to it will therefore most likely be necessary. Synchronization plays a
substantial role here.
• As the information is updated in one data store, it is often desired that this will be distributed
automatically to other data stores using certain synchronization processes that are in place.
• An example of this would be a change in the home address of a certain employee.
IAM Architecture
As we now turn to technologies, this functional approach is no longer very
practical. We will hence look at IAM from a more architectural point of view,
starting with the following diagram.
Before we focus on the different components of IAM systems as depicted in this
overview, some initial considerations are necessary. In the center of the diagram, we
see the Directory Services component, which can be considered the core of IAM, although
all IAM components can in fact be deployed independently of any IAM architecture.
• Cluster
• A cluster consists of one or more hosts and a primary storage. A host is a compute node
that runs guest virtual machines.
• Primary Storage
• The primary storage of a cluster stores the disk volumes for all the virtual machines
running on the hosts in that cluster.
• Secondary Storage
• Each zone has a secondary storage that stores templates, ISO images, and disk volume
snapshots.
Open Source Private Cloud Software - Eucalyptus
• Eucalyptus is an open source private cloud software for building private and hybrid clouds that
are compatible with Amazon Web Services (AWS) APIs.
• Node Controller
• NC hosts the virtual machine instances and manages the virtual network endpoints.
• The cluster-level (availability-zone) consists of three components
• Cluster Controller - which manages the virtual machines and is the front-end for a
cluster.
• Storage Controller – which manages the Eucalyptus block volumes and snapshots to
the instances within its specific cluster. SC is equivalent to AWS Elastic Block Store (EBS).
• VMWare Broker - which is an optional component that provides an AWS-compatible
interface for VMware environments.
1. ownCloud
A Dropbox replacement for Linux users, offering many functionalities similar to those of
Dropbox, ownCloud is a self-hosted file sync and share server.
Its open source nature provides users with access to an unlimited amount of storage
space. The project started in January 2010 with the aim of providing an open source replacement
for proprietary cloud storage service providers. It is written in PHP and JavaScript, is available
for Windows, Linux and OS X desktops, and also provides mobile clients for Android and
iOS.
2. Seafile
Another file hosting software system which exploits its open source nature to give users
all the advantages they expect from a good cloud storage software system. It is written in
C and Python, with the latest stable release being 4.4.3, released on 15th October 2015.
Seafile provides desktop clients for Windows, Linux, and OS X and mobile clients for
Android, iOS and Windows Phone. Along with a community edition released under General
Public License, it also has a professional edition released under commercial license which
provides extra features not supported in community edition i.e. user logging and text search.
Since it got open sourced in July 2012, it started gaining international attention. Its main
features are syncing and sharing, with a main focus on data safety. Other features of Seafile,
which have made it common in many universities (such as the University of Mainz, HU Berlin
and the University of Strasbourg) and among thousands of other people worldwide, are: online
file editing, differential sync to minimize the bandwidth required, and client-side encryption to
secure client data.
3. Pydio
Earlier known by the name AjaXplorer, Pydio is free software aiming to provide file
hosting, sharing and syncing. The project was initiated in 2009 by Charles du Jeu and, since
2010, it has shipped on all NAS equipment supplied by LaCie.
Pydio is written in PHP and JavaScript and available for Windows, Mac OS and Linux
and additionally for iOS and Android also. With nearly 500,000 downloads on Sourceforge, and
acceptance by companies like Red Hat and Oracle, Pydio is one of the very popular Cloud
Storage Software in the market.
In itself, Pydio is just a core which runs on a web server and can be accessed through any
browser. Its integrated WebDAV interface makes it ideal for online file management and
SSL/TLS encryption makes transmission channels encrypted securing the data and ensuring its
privacy. Other features which come with this software are: a text editor with syntax highlighting,
audio and video playback, integration of Amazon S3, FTP or MySQL databases, an image editor,
and file or folder sharing, even through public URLs.
APACHE HADOOP
Apache Hadoop is an open source software framework for storage and large scale
processing of data-sets on clusters of commodity hardware. Hadoop is an Apache top-level
project being built and used by a global community of contributors and users. It is licensed under
the Apache License 2.0.
Hadoop was created by Doug Cutting and Mike Cafarella in 2005. It was originally
developed to support distribution for the Nutch search engine project. Doug, who was working at
Yahoo! at the time and is now Chief Architect of Cloudera, named the project after his son's toy
elephant. Cutting's son was 2 years old at the time and just beginning to talk. He called his
beloved stuffed yellow elephant "Hadoop" (with the stress on the first syllable). Now 12, Doug's
son often exclaims, "Why don't you say my name, and why don't I get royalties? I deserve to be
famous for this!"
The Apache Hadoop framework is composed of the following modules
1. Hadoop Common: contains libraries and utilities needed by other Hadoop modules
2. Hadoop Distributed File System (HDFS): a distributed file-system that stores data on the
commodity machines, providing very high aggregate bandwidth across the cluster
3. Hadoop YARN: a resource-management platform responsible for managing compute resources
in clusters and using them for scheduling of users' applications
4. Hadoop MapReduce: a programming model for large scale data processing
For the end-users, though MapReduce Java code is common, any programming language
can be used with "Hadoop Streaming" to implement the "map" and "reduce" parts of the user's
program. Apache Pig and Apache Hive, among other related projects, expose higher level user
interfaces like Pig Latin and a SQL variant respectively. The Hadoop framework itself is mostly
written in the Java programming language, with some native code in C and command line
utilities written as shell-scripts.
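As a sketch of Hadoop Streaming, the classic word count can be implemented as two small Python scripts (the file names and the streaming jar path in the usage line are illustrative):

# mapper.py -- reads raw text on stdin, emits "word<TAB>1" pairs
import sys
for line in sys.stdin:
    for word in line.split():
        print(word + "\t1")

# reducer.py -- receives pairs sorted by key, sums the counts per word
import sys
current, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word == current:
        count += int(value)
    else:
        if current is not None:
            print(current + "\t" + str(count))
        current, count = word, int(value)
if current is not None:
    print(current + "\t" + str(count))

The job would then be submitted with something like: hadoop jar hadoop-streaming.jar -input /data/in -output /data/out -mapper mapper.py -reducer reducer.py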
HDFS and MapReduce
There are two primary components at the core of Apache Hadoop 1.x: the Hadoop
Distributed File System (HDFS) and the MapReduce parallel processing framework. These are
both open source projects, inspired by technologies created inside Google.
• HDFS stores large files (typically in the range of gigabytes to terabytes) across multiple
machines.
• It achieves reliability by replicating the data across multiple hosts, and hence does not
require RAID storage on hosts. With the default replication value, 3, data is stored on
three nodes: two on the same rack, and one on a different rack. Data nodes can talk to
each other to rebalance data, to move copies around, and to keep the replication of data
high.
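A toy Python sketch of the placement rule just described, with the default replication of 3 (rack and node names are invented):

import random

def place_replicas(racks, local_rack):
    # Two replicas on one rack, the third on a different rack,
    # mirroring the default policy described above.
    two_same = random.sample(racks[local_rack], 2)
    other_rack = random.choice([r for r in racks if r != local_rack])
    return two_same + [random.choice(racks[other_rack])]

racks = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5"]}
print(place_replicas(racks, "rack1"))  # e.g. ['dn3', 'dn1', 'dn5']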
What is MapReduce?
Hadoop MapReduce is the data processing layer. It processes the huge amount of
structured and unstructured data stored in HDFS. MapReduce processes data in parallel by
dividing the job into a set of independent tasks, so parallel processing improves speed and
reliability.
Hadoop MapReduce data processing takes place in two phases: the Map phase and the Reduce phase.
• Map phase- It is the first phase of data processing. In this phase, we specify all the
complex logic/business rules/costly code.
• Reduce phase- It is the second phase of processing. In this phase, we specify light-
weight processing like aggregation/summation.
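To make the two phases concrete, here is a self-contained Python sketch (a simulation, not Hadoop itself) of the map phase, the framework's grouping step, and the reduce phase, applied to word count:

from collections import defaultdict

def map_phase(records, map_fn):
    # Map phase: apply the user's map function to every input record.
    intermediate = []
    for record in records:
        intermediate.extend(map_fn(record))
    return intermediate

def shuffle(pairs):
    # Performed by the framework between the phases: group values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    # Reduce phase: light-weight aggregation such as summation.
    return {key: reduce_fn(values) for key, values in groups.items()}

lines = ["the cat", "the dog"]
mapped = map_phase(lines, lambda line: [(w, 1) for w in line.split()])
print(reduce_phase(shuffle(mapped), sum))  # {'the': 2, 'cat': 1, 'dog': 1}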
Steps of MapReduce Job Execution flow
MapReduce processes the data in various phases with the help of different components.
Let’s discuss the steps of job execution in Hadoop.
Input Files
The data for a MapReduce job is stored in input files, which reside in HDFS. The input
file format is arbitrary; line-based log files and binary formats can be used.
InputFormat
After that, InputFormat defines how to split and read these input files. It selects the files
or other objects for input and creates the InputSplits.
InputSplits
It represents the data which will be processed by an individual Mapper. For each split,
one map task is created; thus the number of map tasks equals the number of InputSplits. The
framework divides each split into records, which the mapper processes.
RecordReader
It communicates with the InputSplit and converts the data into key-value pairs
suitable for reading by the Mapper. With the default TextInputFormat, the RecordReader
converts each line of text into a key-value pair, assigning the byte offset of the line as the key
and the line itself as the value. It communicates with the InputSplit until the file reading is
completed, and the key-value pairs are then sent on to the mapper for processing.
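A minimal Python sketch of the behaviour described, producing (byte offset, line) key-value pairs; it imitates a text record reader rather than Hadoop's actual classes, and the input file name is a placeholder:

def text_record_reader(path):
    # Yield (byte offset, line) pairs, the key-value form handed to the mapper.
    offset = 0
    with open(path, "rb") as f:
        for raw in f:
            yield offset, raw.decode("utf-8").rstrip("\n")
            offset += len(raw)

for key, value in text_record_reader("input.txt"):
    print(key, value)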
Mapper
It processes each input record produced by the RecordReader and generates intermediate
key-value pairs. The intermediate output is completely different from the input pair. The output
of the mapper is the full collection of key-value pairs. The Hadoop framework doesn't store the
output of the mapper on HDFS, as the data is temporary and writing it to HDFS would create
unnecessary multiple copies. The Mapper then passes the output to the combiner for further
processing.
HADOOP SCHEDULERS
The default scheduling algorithm is based on FIFO, where jobs are executed in the
order of their submission. Later on, the ability to set the priority of a job was added. Facebook
and Yahoo contributed significant work in developing schedulers, namely the Fair Scheduler [7] and
the Capacity Scheduler respectively, which were subsequently released to the Hadoop community.
Fair Scheduler
• The Fair Scheduler allocates resources evenly between multiple jobs and also provides
capacity guarantees.
• The Fair Scheduler assigns resources to jobs such that each job gets an equal share of the
available resources on average over time.
• Task slots that are free are assigned to new jobs, so that each job gets roughly the
same amount of CPU time.
• Job Pools
• The Fair Scheduler maintains a set of pools into which jobs are placed. Each pool has a
guaranteed capacity.
• When there is a single job running, all the resources are assigned to that job. When there
are multiple jobs in the pools, each pool gets at least as many task slots as guaranteed.
• Each pool receives at least the minimum share.
• When a pool does not require the guaranteed share the excess capacity is split between
other jobs.
• Fairness
• The scheduler periodically computes the difference between the computing time
received by each job and the time it should have received under ideal scheduling.
• The job which has the highest deficit of the compute time received is scheduled next.
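A toy Python sketch of this deficit rule (the job fields and numbers are invented for illustration):

def next_job(jobs, now):
    # Pick the job with the largest deficit: the gap between the CPU time
    # it would have received under ideal fair sharing and the CPU time
    # it actually received.
    def deficit(job):
        ideal = job["fair_share_rate"] * (now - job["start_time"])
        return ideal - job["cpu_time_received"]
    return max(jobs, key=deficit)

jobs = [
    {"name": "etl", "start_time": 0, "fair_share_rate": 0.5, "cpu_time_received": 40.0},
    {"name": "report", "start_time": 20, "fair_share_rate": 0.5, "cpu_time_received": 5.0},
]
print(next_job(jobs, now=100)["name"])  # "report" has the larger deficit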
Capacity Scheduler
• The Capacity Scheduler has similar functionality to the Fair Scheduler but adopts a
different scheduling philosophy.
• Queues
• In Capacity Scheduler, you define a number of named queues each with a configurable
number of map and reduce slots.
• Each queue is also assigned a guaranteed capacity.
• The Capacity Scheduler gives each queue its capacity when it contains jobs, and shares
any unused capacity between the queues. Within each queue FIFO scheduling with priority is
used.
• Fairness
• For fairness, it is possible to place a limit on the percentage of running tasks per user, so
that users share a cluster equally.
• A wait time for each queue can be configured. When a queue is not scheduled for more
than the wait time, it can preempt tasks of other queues to get its fair share.
SCHEDULER IMPROVEMENTS
Many researchers are working on opportunities for improving the scheduling policies in
Hadoop. Recent efforts such as the Delay Scheduler and Dynamic Proportional Scheduler offer
differentiated service for Hadoop jobs, allowing users to adjust the priority levels assigned to
their jobs. However, this does not guarantee that a job will be completed by a specific deadline.
Deadline Constraint Scheduler addresses the issue of deadlines but focuses more on increasing
system utilization. The Schedulers described above attempt to allocate capacity fairly among
users and jobs, they make no attempt to consider resource availability on a more fine-grained
basis. Resource Aware Scheduler considers the resource availability to schedule jobs. In the
following sections we compare and contrast the work done by the researchers on various
Schedulers.
Longest Approximate Time to End (LATE)
Speculative Execution
It is not uncommon for a particular task to continue to progress slowly. This may be due to several reasons, such as high CPU load on the node or slow background processes. All tasks must finish for the entire job to complete. The scheduler tries to detect a slow-running task and launches another equivalent task as a backup, which is termed speculative execution of tasks. If the backup copy completes first, the overall job performance is improved.
processing is done by a job execution cost model that considers various parameters like map and reduce runtimes, input data sizes, and data distribution.
Purpose
This document describes how to install and configure Hadoop clusters ranging from a few nodes to extremely large clusters with thousands of nodes. To play with Hadoop, you may first want to install it on a single machine (see Single Node Setup). This document does not cover advanced topics such as Security or High Availability.
Prerequisites
• Install Java. See the Hadoop Wiki for known good versions.
• Download a stable version of Hadoop from Apache mirrors.
Installation
Installing a Hadoop cluster typically involves unpacking the software on all the machines
in the cluster or installing it via a packaging system as appropriate for your operating system. It
is important to divide up the hardware into functions.
Typically one machine in the cluster is designated as the NameNode and another machine
as the ResourceManager, exclusively. These are the masters. Other services (such as Web App
Proxy Server and MapReduce Job History server) are usually run either on dedicated hardware
or on shared infrastructure, depending upon the load. The rest of the machines in the cluster act
as both DataNode and NodeManager. These are the slaves.
QUESTIONS
5 MARKS
1. Write short notes on compute services?
2. Explain about storage services.
3. Explain about analytics services.
4. Write short notes on apache hadoop?
5. Explain about content delivery services
10 MARKS
6. Explain in detail about deployment and management services.
7. Discuss about identity and access management services.
8. Explain in detail about open source private cloud software.
9. Explain in detail about hadoop map reduce job execution.
10. Discuss about hadoop cluster setup.
UNIT II COMPLETED
UNIT III
APPLICATION DESIGN
Load
• How can we improve the design to avoid contention issues and bottlenecks? For example, can we use queues or a service bus between services in a co-operating producer, competing consumer pattern? (A minimal sketch follows this list.)
• Which operations could be handled asynchronously to help balance load at peak times?
• How could we use the platform features for rate-leveling (e.g. Azure Queues, Service Bus, etc.)?
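The sketch below illustrates the competing-consumer idea using only Python's standard library; in a real cloud application the in-process queue would be replaced by a durable service such as Azure Queues or Service Bus.

    # Minimal competing-consumers sketch (illustrative; a production system
    # would use Azure Queues, Service Bus, etc. instead of an in-process queue).
    import queue
    import threading

    work = queue.Queue()            # the buffer that levels out bursts of load

    def consumer(worker_id):
        while True:
            item = work.get()       # blocks until a message is available
            if item is None:        # sentinel: shut this worker down
                break
            print("worker %d processed %s" % (worker_id, item))
            work.task_done()

    # Two competing consumers drain the same queue.
    workers = [threading.Thread(target=consumer, args=(i,)) for i in (1, 2)]
    for w in workers:
        w.start()
    for msg in ["order-1", "order-2", "order-3"]:   # a producer burst
        work.put(msg)
    work.join()                     # wait until the burst is fully processed
    for w in workers:
        work.put(None)              # stop the workers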
Availability
• Availability describes the ability of the solution to operate in a manner useful to the
consumer in spite of transient and enduring faults in the application and underlying
operating system, network and hardware dependencies.
• In reality, there is often some crossover between items useful for availability and
scalability.
Conversations should cover at least the following items
Uptime Guarantees
• What Service Level Agreements (SLAs) are the products required to meet?
• Can these SLAs be met? Do the different cloud services we are planning to use all conform to the levels required? Remember that SLAs are composite, as the sketch below illustrates.
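Composite here means the guarantees multiply; a short illustration with made-up availability figures:

    # Composite SLAs multiply: the overall guarantee is only as good as the
    # product of every dependency's SLA (the figures below are invented).
    slas = {"compute": 0.9995, "storage": 0.999, "sql_db": 0.9995}

    composite = 1.0
    for availability in slas.values():
        composite *= availability
    print("Composite SLA: %.4f%%" % (composite * 100))   # ~99.80%, lower than any single part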
Disaster recovery
• In the event of a catastrophic failure, how do we rebuild the system?
• How much data, if any, is it acceptable to lose in a disaster recovery scenario?
• How are we handling backups? Do we have a need for backups in addition to data replication?
• How do we handle “in-flight” messages and queues in the event of a failure?
Performance
• What are the acceptable levels of performance? How can we measure that? What happens if we drop below this level?
• Can we make any parts of the system asynchronous as an aid to performance?
• Which parts of the system are the most highly contended, and therefore more likely to cause bottlenecks?
Security
• This is clearly a huge topic in itself, but a few interesting items to explore which relate directly to cloud computing include:
• What is the local law and jurisdiction where data is held? Remember to include the countries where failover and metrics data are held too.
• Is there a requirement for federated security (e.g. ADFS with Azure Active Directory)?
• Is this to be a hybrid-cloud application? How are we securing the link between our corporate and cloud networks?
Manageability
This topic of conversation covers our ability to understand the health and performance of the live system and manage site operations. Some useful cloud-specific considerations include:
Monitoring
• How are we planning to monitor the application?
• Are we going to use off-the-shelf monitoring services or write our own?
• Where will the monitoring/metrics data be physically stored? Is this in line with data
protection policies?
Feasibility
• When discussing feasibility we consider the ability to deliver and maintain the system within budgetary and time constraints. Items worth investigating include:
• Can the SLAs ever be met (i.e. is there a cloud service provider that can give the uptime guarantees that we need to provide to our customer)?
• Do we have the necessary skills and experience in-house to design and build cloud applications?
• Can we build the application to the design we have within budgetary constraints and a
timeframe that makes sense to the business?
• How much will we need to spend on operational costs (cloud providers often have very
complex pricing structures)?
Architectural Components
Service Deployment
As identified in the NIST cloud computing definition [1], a cloud infrastructure may be operated in one of the following deployment models:
• public cloud,
• private cloud,
• community cloud, or
• hybrid cloud.
The differences are based on how exclusive the computing resources are made to a Cloud Consumer.
• Public cloud
A public cloud is one in which the cloud infrastructure and computing resources are made
available to the general public over a public network. A public cloud is owned by an
organization selling cloud services, and serves a diverse pool of clients. Figure 9 presents a
simple view of a public cloud and its customers.
• Private cloud
A private cloud gives a single Cloud Consumer's organization exclusive access to and usage of the infrastructure; it may be managed by the organization or by a third party, and hosted on the organization's premises or outsourced to a hosting company.
• Community cloud
A community cloud serves a group of Cloud Consumers which have shared concerns such as mission objectives, security, privacy and compliance policy, rather than serving a single organization as a private cloud does. Similar to private clouds, a community cloud may be managed by the organizations or by a third party, and may be implemented on customer premises (i.e. on-site community cloud) or outsourced to a hosting company (i.e. outsourced community cloud).
• Hybrid cloud
A hybrid cloud is a composition of two or more clouds (on-site private, on-site community, off-
site private, off-site community or public) that remain as distinct entities but are bound together
by standardized or proprietary technology that enables data and application portability.
CLOUD APPLICATION DESIGN METHODOLOGIES
Cloud methodologies
Virtualization
• The concept of virtualization is to relieve the user from the burden of resource purchases and installations. The Cloud brings the resources to the users.
• Virtualization may refer to:
o Hardware – execution of software in an environment separated from the underlying hardware resources;
o Memory – giving an application program the impression that it has contiguous working memory, isolating it from the underlying physical memory implementation;
o Storage – the process of completely abstracting logical storage from physical storage;
o Software – hosting of multiple virtualized environments within a single Operating System (OS) instance;
o Data – the presentation of data as an abstract layer.
• Amongst the other important reasons for which Clouds tend to adopt virtualization are:
(i) Server and application consolidation – as multiple applications can be run on the same server, resources can be utilized more efficiently.
(ii) Configurability – as the resource requirements for various applications can differ significantly (some require large storage, some require higher computation capability), virtualization is the only solution for customized configuration and aggregation of resources, which are not achievable at the hardware level.
(iii) Increased application availability – virtualization allows quick recovery from unplanned outages, as virtual environments can be backed up and migrated with no interruption in service.
(iv) Improved responsiveness – resource provisioning, monitoring and maintenance can be automated, and common resources can be cached and reused.
• For example, Dropbox provides free space up to 2GB; Google Drive, Box, Amazon and Apple Cloud provide free space up to 5GB; and Microsoft SkyDrive provides free space up to 7GB [4]. Customers have to pay according to a plan if they exceed the free space limit.
• Features like maximum file size, auto backup, bandwidth, and upgrades for limited space differ from one provider to another; for example, the maximum file size in Dropbox is 300MB, whereas the maximum file size in Google Drive is 1TB.
Cloud Storage Standards
• The Storage Networking Industry Association (SNIA) published CDMI in 2009. It supports both legacy and new applications. Cloud storage standards define roles and responsibilities for archiving, retrieval, and data ownership.
• CDMI also provides a standard auditing approach so that calculations are done in a consistent manner. These standards are helpful to cloud storage providers, cloud storage subscribers, cloud storage developers, and cloud storage service brokers. By using CDMI, cloud storage subscribers can easily identify the providers that match their requirements.
• CDMI even provides a common interface for providers to advertise their specific capabilities, so that subscribers can easily identify suitable providers.
Virtual storage architecture
• An important part of the cloud model is the concept of a pool of resources that is drawn upon on demand in small increments. The recent innovation that has made this possible is virtualization. Cloud storage is simply the delivery of virtualized storage on demand.
• This architecture is based on the Storage Virtualization Model. It consists of three layers, namely: 1. Interface Layer, 2. Rule and Metadata Management, 3. Virtual Storage Management. In the Interface Layer, administrators and users are provided with interface modes that may include icommands and client web browsers.
DEVELOPMENT IN PYTHON
DESIGN APPROACH
• Creational Patterns – They describe how best an object can be created. A simple example of such a design pattern is a singleton class, where only a single instance of a class can be created. This design pattern can be used in situations where we cannot have more than one instance of a logger logging application messages to a file.
• Structural Patterns – They describe how objects and classes can work together to achieve larger results. A classic example of this would be the facade pattern, where a class acts as a facade in front of complex classes and objects to present a simple interface to the client.
• Behavioral Patterns – They talk about interaction between objects. The mediator design pattern, where an object of a mediator class mediates the interaction of objects of different classes to get the desired process working, is a classical example of a behavioral pattern.
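As a brief illustration of the creational example above, here is one common way to sketch a singleton logger in Python (the Logger class is invented for this sketch):

    # A minimal singleton sketch (illustrative): only one Logger instance can exist.
    class Logger(object):
        _instance = None

        def __new__(cls):
            if cls._instance is None:
                cls._instance = super(Logger, cls).__new__(cls)
                cls._instance.messages = []
            return cls._instance

        def log(self, msg):
            self.messages.append(msg)

    a = Logger()
    b = Logger()
    a.log("started")
    print(a is b, b.messages)   # True ['started'] - both names share one instance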
• To work on images, Python has a library for image processing operations, i.e. the Python Imaging Library (PIL). The Python Imaging Library provides many functions for image processing, and we can perform basic operations using PIL modules.
Python Image Processing Library
• The Python Imaging Library, or PIL for short, is one of the core libraries for image manipulation in the Python programming language and is freely available on the internet to download. Many image processing tasks can be carried out using the PIL library, such as image inversion, binary conversion, cropping, writing text on images, changing intensity and brightness, and image filtering such as blurring, contouring, and smoothing.
• Its initial release was in the year 1995, and many versions of PIL are available according to the operating system.
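A minimal sketch of such basic operations, assuming Pillow (the maintained fork of PIL, which keeps the same import name) and a hypothetical file input.jpg:

    # Basic image operations with PIL/Pillow (input.jpg is a hypothetical file).
    from PIL import Image, ImageFilter

    img = Image.open("input.jpg")
    gray = img.convert("L")                      # grayscale conversion
    cropped = gray.crop((0, 0, 100, 100))        # crop the top-left 100x100 region
    blurred = cropped.filter(ImageFilter.BLUR)   # one of PIL's image filters
    blurred.save("output.png")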
OneDrive
• OneDrive, previously known as SkyDrive, was rolled out in 2007 as Microsoft’s own
cloud storage platform. It works as part of the Microsoft Office Suite and gives users
5GB of free storage space. Registered students and those working in academia are given
1TB of free storage.
• OneDrive is available for all platforms. You need to have a Hotmail or Microsoft account
but this is very easy to set up. Users can collaborate on, share and store documents.
• OneDrive also gives you offline access to documents so you can always have your most
important documents at your fingertips. It comes pre-installed on all Windows 10
machines and can be easily accessed or downloaded onto other platforms.
Egnyte
• Flexible pricing plus a robust interface makes Egnyte an ideal document storage platform
• 15-day free trial
• Excellent integration
• Some loading issues
• Egnyte was founded in 2007. The company provides software for enterprise file
synchronization and sharing.
• Egnyte allows businesses to store their data locally and in the cloud. All types of data can
be stored in the cloud, whilst data of a more sensitive nature can be stored on servers on-
premise. This provides better security.
• Business teams can work how and where they want with an easy to use collaboration
system through their content services platform.
• Egnyte integrates with the more popular industry applications such as Office 365. This
allows remote and internal employees to access all the files they need.
Dropbox
• Dropbox is one of the oldest cloud storage providers. It offers a rather minuscule 2GB of storage space for free users, but this can be increased by up to 16GB through referrals as well as by linking your Dropbox account to social media accounts.
• To date it is one of the simplest storage providers to use. Dropbox can be installed on
most computers or devices and syncs easily between apps. The app can store almost any
kind of file with no compatibility issues. You can drag and drop files into the desktop app
with ease.
• You can also share files with other users easily through links, even if they don’t have a
Dropbox account.
SpiderOak
SpiderOak offers military grade encryption but few collaboration options
• Strong security
• Central device management
• Few collaboration tools
• SpiderOak is a collaboration tool, online backup and file hosting service founded in 2007.
The platform allows users to access, synchronize and share data using a cloud-based
server.
• The company places a strong emphasis on data security and privacy. They offer a cloud
storage, online backup and sharing service which uses a ‘zero knowledge’ privacy
environment. This means the client is the only one who can view all stored data.
SpiderOak claim that even they cannot access your files.
MAPREDUCE APP
The MapReduce library is available in two versions: one for Java and one for Python.
The functionality of each is slightly different.
Both libraries are built on top of App Engine services, including Datastore and Task Queues.
You must download the MapReduce library and include it with your application. The library provides the features and capabilities described below.
Features and capabilities
The App Engine adaptation of Google's MapReduce model is optimized for the needs of the
App Engine environment, where resource quota management is a key consideration. This release
of the MapReduce API provides the following features and capabilities:
• Automatic sharding for faster execution, allowing you to use as many workers as you
need to get your results faster
• Standard data input readers for iterating over blob and datastore data.
• Standard output writers
• Status pages to let you see how your jobs are running
• Processing rate limiting to slow down your mapper functions and space out the work,
helping you avoid exceeding your resource quotas
MapReduce Job
A MapReduce job has three stages: map, shuffle, and reduce. Each stage in the sequence
must complete before the next one can run. Intermediate data is stored temporarily between
the stages. The map stage transforms single input items to key-value pairs, the shuffle stage
groups values with the same key together, and the reduce stage processes all the items with
the same key at once.
The map-shuffle-reduce algorithm is very powerful because it allows you to process all
the items (values) that share some common trait (key), even when there is no way to access
those items directly because, for instance, the trait is computed.
The data flow for a MapReduce job looks like this:
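A plain-Python sketch of that flow (illustrative only; this shows the algorithm, not the App Engine MapReduce API):

    # Plain-Python illustration of the three stages of a MapReduce job.
    from collections import defaultdict

    def map_stage(records):
        for record in records:
            for word in record.split():
                yield (word, 1)              # map: input item -> key-value pairs

    def shuffle_stage(pairs):
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)        # shuffle: group values by key
        return groups

    def reduce_stage(groups):
        return {key: sum(values) for key, values in groups.items()}  # reduce per key

    data = ["the cat", "the dog"]
    print(reduce_stage(shuffle_stage(map_stage(data))))   # {'the': 2, 'cat': 1, 'dog': 1}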
Map
The MapReduce library includes a Mapper class that performs the map stage. The map
stage uses an input reader that delivers data one record at a time. The library also contains a
collection of Input classes that implement readers for common types of data. You can also
create your own reader, if needed.
The map stage uses a map() function that you must implement. When the map stage runs,
it repeatedly calls the reader to get one input record at a time and applies the map() function
to the record.
The implementation of the map() function depends on the kind of job you are running.
When used in a Map job, the map() function emits output values. When used in a map reduce
job, the map() function emits key-value pairs for the shuffle stage.
Being flexible to allow adjustments
With social media, feedback from customers can be obtained in a very short span of time. Any mistakes in the marketing strategy can be identified very quickly based on customer feedback, and suitable actions can be taken to rectify them. Being flexible to allow adjustments also helps the entire process to become agile, i.e. to address ever-changing customer demands very quickly.
Today, this has been addressed by the arrival of data visualisation tools and customised
ISVs that are built with industry specific templates, improving the user’s experience and
allowing executives to quickly gain access to the latest business information. Added to this have
been the integration of analytics tools and a host of social and collaborative procedures: all of
which can be accessed via mobile devices.
This combination of new technologies is known as Social, Mobile, Analytics and Cloud, or SMAC for short. Social helps people to find their colleagues, who they can then collaborate with; mobile provides access to other data sources and the cloud; the cloud contains the information and the applications that people use; and analytics allows people to make sense of this data. The broad idea of SMAC is that social networks like Facebook and Twitter can be used for brand building and customer engagement.
Why SMAC is so important for businesses
In the coming years, it is widely expected that there will be three major trends that emerge
and affect not only IT technologies but also the way we do business: all of which will be heavily
impacted by SMAC. These include:
• New working styles: Today in business, both employees and customers expect a style of content, collaboration and commerce that offers the same "anytime, anywhere" convenience that they enjoy in their personal lives with companies such as Facebook and Amazon. It is expected that there will be an increase in the mobile elite workforce, especially as wearable devices such as watches and glasses add to users' options. In terms of SMAC, business applications will be required to embrace this approach in order to maximise productivity and convenience.
QUESTIONS
5 MARKS
1. Write short notes on cloud application design.
2. Explain about cloud storage approach.
3. Discuss about map reduce app.
4. Explain about image processing app.
10 MARKS
5. Explain in detail about reference architecture for cloud application
6. Write notes on design approaches.
7. Explain in detail about cloud application design methodology.
8. Explain the following:
a) Document storage app
b) Social media analytic app
UNIT IV
PYTHON FOR CLOUD
PYTHON FOR AMAZON WEB SERVICES
Amazon Web Services
AWS is a cloud platform service from Amazon, used to create and deploy any type of application in the cloud.
AWS is a cloud platform service offering compute power, data storage, and a wide array of other IT solutions and utilities for modern organizations. AWS was launched in 2006, and has since become one of the most popular cloud platforms currently available.
You need an AWS account to use AWS services. AWS offers many featured services for compute, storage, networking, analytics, application services, deployment, identity and access management, directory services, security, and many more cloud services.
We can use Boto3 (a Python package), which provides interfaces to Amazon Web Services and makes it easy to integrate a Python application, library, or script with AWS services.
Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of services like Amazon S3 (Simple Storage Service) and Amazon EC2 (Elastic Compute Cloud), as the sketch below shows.
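A minimal Boto3 sketch; the bucket and file names are hypothetical, and valid AWS credentials are assumed to be configured already (e.g. via environment variables or ~/.aws/credentials):

    # Minimal Boto3 usage (bucket and file names are hypothetical).
    import boto3

    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        print(bucket["Name"])                       # list all S3 buckets

    s3.upload_file("report.csv", "my-bucket", "reports/report.csv")   # upload to S3

    ec2 = boto3.resource("ec2")
    for instance in ec2.instances.all():            # enumerate EC2 instances
        print(instance.id, instance.state["Name"])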
AWS Elastic Beanstalk is a service for deploying and scaling web applications and services. Elastic Beanstalk runs EC2 instances (computing environments) and adds components such as an Elastic Load Balancer, an Auto-Scaling Group, and a Security Group.
Amazon Lambda
Amazon Lambda is a computing service which automatically manages the servers. AWS Lambda executes our code only when needed and scales automatically, from a few requests per day to thousands per second.
We pay only for the compute time we consume; there is no charge when the code is not running.
The initial purpose of Lambda is to simplify building on-demand applications that are responsive to events. AWS starts a Lambda instance within milliseconds of an event.
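In Python, a Lambda function is simply a handler with the signature shown below; the greeting logic here is an invented example:

    # A minimal AWS Lambda handler in Python. Lambda invokes this function with
    # the triggering event and a runtime context object.
    import json

    def lambda_handler(event, context):
        name = event.get("name", "world")
        return {
            "statusCode": 200,
            "body": json.dumps({"message": "Hello, %s!" % name}),
        }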
• Pricing: GCP leaves the competition way behind with its highly flexible pricing and is rightly a leader here.
• Scalability: Scaling down can always be an issue with cloud services. GCP lets you scale up and down with extreme ease.
• Custom Machines: With Custom Machine Types you can easily create a machine type customized to your needs, with discount implications of up to 50% off.
• Integrations: Easily use various APIs, and practice Internet of Things and Cloud Artificial Intelligence.
• Big Data Analytics: Use BigQuery and big data analytics to carry out a plethora of analytical practices.
• Serverless: Serverless is a new paradigm of computing that abstracts away the complexity associated with managing servers for mobile and API back-ends, ETL, data processing jobs, databases, and more.
Google offers a wide range of services, including the following:
• Google Compute Engine: Google Compute Engine helps you deliver VMs that run in Google's innovative data centers and worldwide fiber network. It lets you scale from single instances to global deployments and implement load-balanced cloud computing.
• App Engine: This PaaS offering lets developers access Google’s scalable hosting.
Developers are also free to access software SDK’s to develop software products that run
on App Engine.
• Cloud Storage: The Google Cloud Storage platform enables you to store large, unstructured data sets. Google also offers database storage options such as Cloud Datastore for NoSQL non-relational storage, Cloud SQL for MySQL fully relational storage, and Google's native Cloud Bigtable database.
• Google Container Engine: It is a management and orchestration system
for Docker containers that runs within Google’s public cloud. Google Container Engine
is based on the Google Kubernetes container orchestration engine.
Tools for Developing Python Applications on Microsoft Azure
This section gives you an overview of the IPython Notebooks and PTVS development
tools.
Using IPython Notebooks
The IPython project provides a collection of tools for scientific computing that include interactive shells, high-performance and easy-to-use parallel libraries, and a web-based development environment called the IPython Notebook.
The notebook provides a working environment for interactive computing that combines
code execution with the creation of a live computational document. These notebook files can
contain text, mathematical formulas, input code, results, graphics, videos and any other kind of
media that a modern web browser is capable of displaying.
The Design and Architecture of Mrs
Mrs is a lightweight MapReduce implementation that works well for scientific computing. It is designed to be simple for both programmers and users. The API includes reasonable but overridable defaults in order to avoid any unnecessary complexity. Likewise, Mrs makes it easy to run jobs without requiring a large amount of configuration.
It supports both Python 2 and Python 3 and depends only on the standard library for maximum portability and ease of installation. Furthermore, Mrs is designed to easily run in a variety of environments and filesystems. Mrs is also compatible with PyPy, a high-performance Python interpreter with a JIT compiler that accelerates numerically intensive programs particularly well.
Programming Model
As a programming framework, Mrs controls the execution flow and is invoked by a call to mrs.main. The execution of Mrs depends on the command-line options and the specified program class. In its simplest form, a program class has an __init__ method which takes the arguments opts and args from command-line parsing, and a run method that takes a job argument. In practice, most program classes inherit from mrs.MapReduce, which provides a variety of reasonable but overridable defaults, including __init__ and run methods that are sufficient for many simple programs. The simplest MapReduce program need only implement a map and a reduce method, as in the sketch below.
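A minimal Mrs program, sketched after the word-count example that ships with Mrs (exact details may differ between Mrs versions):

    # Sketch of a minimal Mrs program (modeled on Mrs's word-count example).
    import mrs

    class WordCount(mrs.MapReduce):
        def map(self, key, value):
            for word in value.split():
                yield (word, 1)          # emit intermediate key-value pairs

        def reduce(self, key, values):
            yield sum(values)            # one total per distinct word

    if __name__ == "__main__":
        mrs.main(WordCount)              # mrs.main drives the execution flow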
Architecture
• Mrs owes much of its efficiency to simple design. Many choices are driven by concerns such as simplicity and ease of maintainability.
• For example, Mrs uses XML-RPC because it is included in the Python standard library, even though other protocols are more efficient. Profiling has helped to identify real bottlenecks and to avoid worrying about hypothetical ones.
• We include a few details about the architecture of Mrs. Communication between the master and a slave occurs over a simple HTTP-based remote procedure call API using XML-RPC.
• Intermediate data between slaves uses either direct communication for high performance or storage on a filesystem for increased fault-tolerance. Mrs can read and write to any filesystem supported by the Linux kernel or FUSE, including NFS, Lustre, and the Hadoop Distributed File System (HDFS), and native support for WebHDFS is in progress.
These are the most popular high-level frameworks. Many of them include components
listed on the WebComponents page.
[Table: full-stack framework comparison with the columns Name, Latest version, Latest update date, and Description. Only one description fragment survives: a database-driven, ready-to-extend application in minutes, with designer-friendly templates, easy AJAX on the browser side and on the server side, an incredibly powerful and flexible Object Relational Mapper (ORM), and code that is as natural as writing a function, followed by links to its Documentation and Tutorials.]
DESIGNING A RESTFUL WEB API
So you need to build an API for your website, maybe you need to provide data to a
mobile app you’re working on, or maybe you just want to put the data from your website in a
format that other developers can easily use to build cool things.
But what is an API exactly? An API is just a fancy term for describing a way for
programs (or websites) to exchange data in a format that is easily interpreted by a machine. This
is in contrast to regular websites, which are exchanging data in a format that is easily interpreted
by a human. For a website, we might use HTML and CSS, but for an API we would use JSON or
XML.
In this article I will be focusing on designing an API using the RESTful paradigm. REST
is basically a list of design rules that makes sure that an API is predictable and easy to
understand and use.
Some of these rules include:
• Stateless design: Data will never be stored in a session (each request includes all
information needed by the server and client).
• Self-descriptive messages: Ideally you should be able to understand requests and
responses after spending minimal time reading the documentation.
• Semantics, semantics, semantics: The API should use existing features of the HTTP protocol to improve the semantics of input and output (e.g. HTTP verbs, HTTP status codes and HTTP authentication).
Output formats
First, let’s talk about output formats. The most important thing to look at when
determining what format your API should output data in is what users of your API would be
using the data for and with.
Maybe you need to support legacy systems where JSON parsing is not feasible and XML
is more desirable, or maybe it makes more sense for you to output data in the CSV format for
easy import into spreadsheet applications. Whichever you choose, it's important to think about
your users and their use cases.
Versioning
This is a very important aspect that is often overlooked. As an API provider, one of your
most important tasks is to make sure that breaking changes will never occur in your API. Making
breaking changes will make life difficult for the developers who depend on your service and can
easily start causing frustration when things start to break.
But don’t worry! This is where versioning comes in handy. There are a lot of options for
versioning your API. For example, WePay uses an Api-Version header, and Twilio uses a similar
approach putting the version date in the URL.
URL Structure
• The URL structure is one of the most important pieces of the puzzle. Spending some time to define the right endpoint names can make your API much easier to understand and also helps make the API more predictable. URLs should be short and descriptive and utilize the natural hierarchy of the path structure.
• It's also important to be consistent with pluralization; the sketch below applies these rules.
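A hedged sketch of these rules using Flask (Flask and the articles resource are choices made for this illustration, not part of the original article). Note the version in the URL, the plural resource name, and the semantic use of HTTP verbs and status codes:

    # Illustrative RESTful endpoints using Flask.
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    articles = {1: {"id": 1, "title": "Hello"}}

    @app.route("/api/v1/articles/<int:article_id>", methods=["GET"])
    def get_article(article_id):
        if article_id not in articles:
            return jsonify({"error": "not found"}), 404   # semantic status code
        return jsonify(articles[article_id])              # 200 OK by default

    @app.route("/api/v1/articles", methods=["POST"])
    def create_article():
        new_id = max(articles) + 1
        articles[new_id] = {"id": new_id, "title": request.json["title"]}
        return jsonify(articles[new_id]), 201             # 201 Created

    if __name__ == "__main__":
        app.run()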
QUESTIONS
5 MARKS
1. Explain Python for Amazon Web Services.
2. Write short notes on Python for Windows Azure.
3. Write short notes on Python packages of interest.
4. Explain about Django.
5. Write short notes on Python for MapReduce.
10 MARKS
6. Explain in detail about Python for cloud.
7. Explain about Python for cloud platforms.
8. Write notes on Python web application frameworks.
9. Illustrate designing a RESTful web API.
UNIT IV COMPLETED
UNIT V
BIG DATA ANALYTICS
CLUSTERING BIG DATA
• Clustering is an essential data mining tool for analyzing big data. There are difficulties in applying clustering techniques to big data due to the new challenges that are raised with big data. As big data refers to terabytes and petabytes of data, and clustering algorithms come with high computational costs, the question is how to cope with this problem and how to deploy clustering techniques on big data and get the results in a reasonable time.
• Big data clustering techniques can be classified into two major categories: single-machine clustering techniques and multiple-machine clustering techniques. Recently, multiple-machine clustering techniques have attracted more attention because they are more flexible in scalability and offer faster response times to users. Single-machine and multiple-machine clustering include the following techniques:
• Single-machine clustering
o Sample based techniques
o Dimension reduction techniques
• Multiple-machine clustering
o Parallel clustering
o MapReduce based clustering
In this section, advancements of clustering algorithms for big data analysis in the categories mentioned above are reviewed.
Challenges of big data have their root in its five important characteristics:
• Volume: The first is volume; an example is the unstructured data streaming in from social media. It raises questions such as how to determine the relevance within large data volumes and how to analyze the relevant data to produce valuable information.
• Velocity: Data is flooding in at very high speed and has to be dealt with in reasonable time. Responding quickly to data velocity is one of the challenges in big data.
• Variety: Another challenging issue is to manage, merge and govern data that comes from different sources with different specifications, such as email, audio, unstructured data, social data, video, etc.
• Variability: Inconsistency in data flow is another challenge. For example, in social media there could be daily or seasonal peak data loads, which makes it harder to manage the data, especially when the data is unstructured.
• Complexity: Data comes from different sources and has different structures; consequently, it is necessary to connect and correlate relationships and data linkages, or your data quickly gets out of control.
Traditional clustering techniques cannot cope with this huge amount of data because of their high complexity and computational cost. As an instance, traditional K-means clustering is NP-hard, even when the number of clusters is k=2. Consequently, scalability is the main challenge for clustering big data. The main target is to scale up and speed up clustering algorithms with minimum sacrifice of clustering quality (one practical approach is sketched below). Although scalability and speed of clustering algorithms were always a target for researchers in this domain, big data challenges underline these shortcomings and demand more attention and research on this topic. Reviewing the literature of clustering techniques shows that the advancement of these techniques can be classified in stages.
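As a concrete illustration of one scalable, sample-based variant of K-means, the sketch below uses scikit-learn's MiniBatchKMeans, which fits on small random batches instead of the full dataset; scikit-learn is an assumption here, not something the notes prescribe:

    # Mini-batch K-means: clusters on small random samples of the data.
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    X = np.random.rand(1_000_000, 2)            # stand-in for a large dataset
    model = MiniBatchKMeans(n_clusters=5, batch_size=10_000, n_init=3)
    labels = model.fit_predict(X)               # each point's cluster assignment
    print(model.cluster_centers_)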
1. Prescriptive – This type of analysis reveals what actions should be taken. This is the most
valuable kind of analysis and usually results in rules and recommendations for next steps.
2. Predictive – An analysis of likely scenarios of what might happen. The deliverables are
usually a predictive forecast.
3. Diagnostic – A look at past performance to determine what happened and why. The result of
the analysis is often an analytic dashboard.
4. Descriptive – What is happening now based on incoming data. To mine the analytics, you
typically use a real-time dashboard and/or email reports.
For example, prescriptive analytics can be used to determine where to focus treatment. The same prescriptive model can be applied to almost any industry target group or problem.
• Predictive analytics use big data to identify past patterns to predict the future. For example, some companies are using predictive analytics for sales lead scoring. Some companies have gone one step further and use predictive analytics for the entire sales process, analyzing lead source, number of communications, types of communications, social media, documents, CRM data, etc. with properly tuned predictive analytics.
RECOMMENDATION SYSTEM
• Recommendation systems have impacted or even redefined our lives in many ways. One example of this impact is how our online shopping experience is being redefined. As we browse through products, the recommendation system offers recommendations of products we might be interested in. Regardless of the perspective (business or consumer), recommendation systems have been immensely beneficial. And big data is the driving force behind recommendation systems.
• A typical recommendation system cannot do its job without sufficient data, and big data supplies plenty of user data, such as past purchases, browsing history, and feedback, for recommendation systems to provide relevant and effective recommendations. In a nutshell, even the most advanced recommenders cannot be effective without big data.
Data collection
• Let us assume that a user of the Amazon website is browsing books and reading the details. Each time the reader clicks on a link, an event such as an Ajax event could be fired. The event type could vary depending on the technology used. The event then makes an entry into a database, which usually is a NoSQL database. The entry is technical in content but in layman's language could read something like "User A clicked Product Z details once". That is how user details get captured and stored for future recommendations.
• How does the recommendation system capture the details? If the user has logged in, then the details are extracted either from an HTTP session or from the system cookies. In case the recommendation system depends on system cookies, then the data is available only as long as the user is using the same terminal. Events are fired in almost every case: a user liking a product, adding it to a cart, or purchasing it. That is how user details are stored. But that is just one part of what recommenders do.
The following paragraphs show how Amazon offers its product recommendations to a user who
is browsing for books:
• As shown by the image below, when a user searched for the book Harry Potter and the
Philosopher’s Stone, several recommendations were given.
• In another example, a customer who searched Amazon for Canon EOS 1200D 18MP
Digital SLR Camera (Black) was interestingly given several recommendations on camera
accessories.
Ratings
Ratings are important in the sense that they tell you what a user feels about a product. A user's feelings about a product can be reflected, to an extent, in the actions he or she takes, such as likes, adding to the shopping cart, purchasing, or just clicking. Recommendation systems can assign implicit ratings based on user actions. The maximum rating is 5; for example, purchasing can be assigned a rating of 4, likes can get 3, clicking can get 2, and so on (a small sketch follows). Recommendation systems can also take into account the ratings and feedback users provide.
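An illustrative mapping of user actions to implicit ratings, using the values from the text above (the event format itself is invented):

    # Illustrative implicit ratings derived from user actions.
    ACTION_RATINGS = {"purchase": 4, "like": 3, "click": 2}

    def implicit_rating(events):
        # Rate a product by the strongest action a user took on it, capped at 5.
        return min(5, max(ACTION_RATINGS.get(e, 0) for e in events))

    print(implicit_rating(["click", "like", "purchase"]))   # -> 4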
Filtering
Filtering means filtering products based on ratings and other user data. Recommendation systems use three types of filtering: collaborative, user-based, and a hybrid approach. In collaborative filtering, a comparison of users' choices is made and recommendations are given. For example, if user X likes products A, B, C and D, and user Y likes products A, B, C, D and E, then it is likely that user X will be recommended product E, because there are a lot of similarities between users X and Y as far as choice of products is concerned.
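A toy, set-based version of that example (purely illustrative; production recommenders use far richer similarity measures):

    # Toy collaborative filtering: recommend to X what the most similar user
    # likes but X does not yet.
    likes = {
        "X": {"A", "B", "C", "D"},
        "Y": {"A", "B", "C", "D", "E"},
        "Z": {"Q", "R"},
    }

    def recommend(user):
        others = [u for u in likes if u != user]
        # Similarity = number of products both users like (overlap count).
        nearest = max(others, key=lambda u: len(likes[u] & likes[user]))
        return likes[nearest] - likes[user]

    print(recommend("X"))   # -> {'E'}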
MULTIMEDIA CLOUD
• Media Cloud is an open-source content analysis tool that aims to map news media coverage of current events. It "performs five basic functions -- media definition, crawling, text extraction, word vectoring, and analysis."
• Media Cloud "tracks hundreds of newspapers and thousands of Web sites and blogs, and archives the information in a searchable form. The database ... enable[s] researchers to search for key people, places and events — from Michael Jackson to the Iranian elections — and find out precisely when, where and how frequently they are covered."
• Media Cloud was developed by the Berkman Center for Internet & Society at Harvard University and launched in March 2009.
• Thus, usability studies have become a very vital element in evaluating applications. Recently, with the emergence of various mobile apps, usability studies have extended their scope to the evaluation of mobile apps as well [1], including the user interface and performance.
• This also goes for mobile video streaming apps. Researchers develop various video streaming apps and perform usability tests for different groups of users under different conditions.
Systematic Review
In this paper, the activities performed to facilitate the process of the systematic review are: the definition of a search strategy, the selection of primary studies, the extraction of data, and the implementation of a synthesis strategy.
Search Strategy
• In order to perform the search and selection of usability test metrics for mobile video streaming apps, articles and journals from different online databases were searched. Relevant data from the search results were extracted and, finally, the collection of studies for review was listed.
Study Selection
The scope of the review was defined to be the metrics used in usability test in mobile
video streaming apps. Since the scope had been defined clearly before the search process was
carried out, most of the articles and journals found were relevant to the review objective.
However, there were many articles and journals excluded from the search process, based on the
following criteria:
1) The study is only on mobile video apps development,
2) the study presents the usability test on mobile apps without touching on video
streaming apps,
3) the study is not written in English, and
4) the study is a book.
STREAMING PROTOCOLS
Basics of streaming protocols
• Streaming of audio and video is a confusing subject. This page is aimed at providing some
of the basic concepts.
• Streaming means sending data, usually audio or video, in a way that allows it to start being
processed before it's completely received. Video clips on Web pages are a familiar example.
• Progressive streaming, aka progressive downloading, means receiving an ordinary file and
starting to process it before it's completely downloaded. It requires no special protocols, but
it requires a format that can be processed based on partial content. This has been around for
a long time; interleaved images, where the odd-numbered pixel rows are received and
displayed before any of the even ones, are a familiar example. They're displayed at half
resolution before the remaining rows fill in the full resolution.
The protocol stack
Streaming involves protocols at several different layers of the OSI Reference Model. The
lower levels (physical, data link, and network) are generally taken as given. Streaming protocols
involve:
• The transport layer, which is responsible for getting data from one end to the other.
• The session layer, which organizes streaming activity into ongoing units such as movies
and broadcasts.
• The presentation layer, which manages the bridge between information as seen by the
application and information as sent over the network.
• The application layer, which is the level at which an application talks to the network.
Most Internet activity takes place using the TCP transport protocol. TCP is designed to
provide reliable transmission.
One streaming protocol's specification is available only as an Internet Draft; the draft contains proprietary material, and publishing derivative works is prohibited.
AvidBeam Smart Video Transcoding Solution
• AvidBeam has developed a comprehensive and robust solution for optimizing bandwidth for surveillance systems with limited effect on video stream quality [2]. The solution is based on the use of a multistage filter pipeline, where several filters are used to eliminate unnecessary frames and identify regions of interest before invoking the video transcoder. Consequently, the transmitted bandwidth can be reduced dramatically without affecting the quality of the important information in the video frames.
• Clients can enable/disable each filter separately, as well as configure each filter according to their needs.
Those filters are described as follows:
1. Frame Filter
The frame filter is used to detect motion in a given frame. The amount of motion to be detected is configurable: the filter passes only frames with motion greater than or equal to the configured motion size. This way, small variations in each video frame due to external factors, such as wind, camera vibration, or small animals or birds moving in front of the surveillance camera, can be eliminated easily.
The experimental results from using the motion detection filter before streaming out frames show that, when motion detection is enabled, frames with no significant motion are not transmitted.
3. ROI Filter
The purpose of the ROI filter is to identify the region of interest in each frame and pass this information to the transcoder. The ROI information can be used to clip the transmitted frame or to encode the frame with different quality values for the ROI and non-ROI frame blocks.
4. Video Transcoder
The final stage in the pipeline includes the actual video transcoding. The transcoder
receives the selected ROI together with their proper quality (quantization) settings. Other
transcoding parameters are also selected (resolution, bitrate, frame rate) based on client system
configuration.
CLOUD SECURITY
With regard to security, data security becomes more important when using cloud computing at all "levels": infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS). This chapter describes several aspects of data security, including:
• Data-in-transit
• Data-at-rest
• Processing of data, including multitenancy
• Data lineage
• Data provenance
• Data remanence
The objective of this chapter is to help users evaluate their data security scenarios and
make informed judgments regarding risk for their organizations. As with other aspects of cloud
computing and security, not all of these data security facets are of equal importance in all
topologies (e.g., the use of a public cloud versus a private cloud, or non-sensitive data versus
sensitive data).
The central security system provides services such as issuing certificates, identity management (CRUD), identity federation, creating and managing XACML-based policies, and providing strong authentication mechanisms.
• All the components within the system are interoperable and act as security service providers in order to assure a secure cloud-based system. Figure 2 shows the logical components of our central security system.
• The PKI server, also known as the Local Certification Authority (LCA) in our system, is responsible for issuing and distributing X.509 certificates to all components in a domain. This server can either be configured as a single certification authority, by generating self-signed certificates, or may be linked to a PKI in order to exchange certificates and establish trust relationships between various domains. In this case a higher-level trusted certification authority server issues certificates to the issuing CA.
Authentication System Security Protection
• The SSO service provider interacts with service consumers through request-response message protocols. All system entities securely store their private keys locally. The SAML server issues tickets according to the decision made by the central authentication server; that is why they communicate only over a trusted internal network.
• At the same time, the central authentication server communicates with the IDMS and CA servers over a trusted network. Therefore, the central security system is an isolated, secure environment where all the system entities trust each other.
AUTHENTICATION
• A single enterprise may provide many application services to end-users. E-mail servers and web servers are examples of application service providers. As a company's boundaries broaden, the number of application services grows. Almost all service providers should authenticate clients before service transactions are executed, because they are dealing with personal information.
• This means that the client should have a security context for each application server and log in before it can consume any service. The same situation happens when the client accesses resources in different security domains.
• As mentioned in the second chapter, having many security credentials for authentication purposes is not an effective solution from security, system coordination, and management perspectives. When organizations migrate to cloud environments, the same problem still exists.
• As a solution to this problem, a Single Sign-On (SSO) protocol is proposed, which is part of the shared security system of a cloud environment. This solution relies on the SAML web browser SSO profile, whose complete description can be found in the referenced document.
• The system consists of a SAML server which provides SSO services for application service providers: the SAML server issues a SAML ticket which contains an assertion about the client's identity verification, confirming whether or not the client has been properly authenticated. Once the user is authenticated, he or she can request access to different authorized resources at different application provider sites without the need to re-authenticate for each domain.
• The SAML server resides in the shared security system. Besides the SAML assertion-issuing server, there are three other security entities in the central security system, coordinated with each other in order to accomplish the desired solution.
AUTHORIZATION
• As already mentioned earlier, different application services may be hosted in a cloud
environment and may use the same physical resources. However, each application service is
logically separated from others.
• Different types of system entities consume those services; therefore, application service
provider should manage a proper mechanism for access control decisions. This means that
various users, after being successfully authenticated, should request and access those
resources and services for which they are authorized in a particular enterprise security
domain.
• As the number of the services and service consumers grow, management of access control
mechanism becomes more complex and expensive: each service provider needs to
implement independent access control mechanism by means of self-governing security
policies and policy enforcement points.
• Decoupling policies from application services and managing them independently from
application services results in a solution which is more effective for an authorization system.
Applications focus only on system functionality and business value.
• Having a single security policy management point makes the entire authorization system
more flexible and secure, meaning that it can be administered, configured and protected
separately from application services. In this way, it is easy to configure and apply common
policies for every application service in a single security domain.
• Besides, changing a policy becomes very simple because of the single location for policy management. Protection and auditing of the authorization system are managed separately, thus making it much harder to compromise.
• A role-based authorization system is proposed for the cloud environment as a component of the central security system. XACML is the main standard adopted for this authorization system. The system provides authorization services for cloud-based application services.
• A Policy Decision Point (PDP) server resides in the central security system. It implements the role-based access control mechanism and provides authorization services to application service providers within a security domain; the toy sketch below shows the underlying idea.
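A toy role-to-permission check (conceptual only; in the system described here such rules would be expressed as XACML policies and evaluated by the PDP):

    # Toy role-based access control check (illustrative, not XACML).
    ROLE_PERMISSIONS = {
        "admin":  {"read", "write", "delete"},
        "editor": {"read", "write"},
        "viewer": {"read"},
    }

    def is_authorized(role, action):
        return action in ROLE_PERMISSIONS.get(role, set())

    print(is_authorized("editor", "write"))    # True
    print(is_authorized("viewer", "delete"))   # False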
DATA SECURITY
• Data protection is a crucial security issue for most organizations. Before moving into the
cloud, cloud users need to clearly identify data objects to be protected and classify data
based on their implication on security, and then define the security policy for data protection
as well as the policy enforcement mechanisms.
• For most applications, data objects would include not only bulky data at rest in cloud servers
(e.g., user database and/or filesystem), but also data in transit between the cloud and the
user(s) which could be transmitted over the Internet or via mobile media (In many
circumstances, it would be more cost-effective and convenient to move large volumes of
data to the cloud by mobile media like archive tapes than transmitting over the Internet.).
• Data objects may also include user identity information created by the user management model, service audit data produced by the auditing model, service profile information used to describe the service instance(s), temporary runtime data generated by the instance(s), and many other application data.
Security Services:
• The basic security services for information security include assurance of data Confidentiality, Integrity, and Availability (CIA). In cloud computing, the issue of data security becomes more complicated because of the intrinsic cloud characteristics. Before potential cloud users are able to safely move their applications/data to the cloud, a suite of security services should be in place, which we can identify as follows (not all are necessarily needed in a specific application):
1) Data confidentiality assurance: This service protects data from being disclosed to
illegitimate parties. In Cloud Computing, data confidentiality is a basic security service to be in
place. Although different applications may have different requirements in terms of what kind of
data need confidentiality protection, this security service could be applicable to all the data
objects discussed above.
2) Data integrity protection: This service protects data from malicious modification. Having outsourced their data to remote cloud servers, cloud users must have a way to check whether or not their data at rest or in transit are intact. Such a security service would be of core value to cloud users. When auditing cloud services, it is also critical to guarantee that all the audit data are authentic, since these data would be of legal concern. This security service is also applicable to the other data objects discussed above.
3) Guarantee of data availability: This service assures that data stored in the cloud is available on each user retrieval request. This service is particularly important for data at rest in cloud servers and is related to the fulfillment of the Service Level Agreement. For long-term data storage services, data availability assurance is of even more importance because of the increasing possibility of data damage or loss over time.
4) Secure data access: This security service is to limit the disclosure of data content to authorized users. In practical applications, disclosing application data to unauthorized users may threaten the cloud user's business goals. In mission-critical applications, inappropriate disclosure of sensitive data can have legal consequences. For better protection of sensitive data, cloud users may need fine-grained data access control, in the sense that different users may have access to different sets of data. This security service is applicable to most of the data objects addressed above.
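As a minimal sketch of confidentiality protection applied before data leaves for the cloud, the snippet below uses the third-party cryptography package's Fernet recipe; the package choice and the sample record are assumptions, not part of the notes:

    # Client-side encryption before upload (sketch; requires 'cryptography').
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()        # the key itself must be stored and managed safely
    f = Fernet(key)

    token = f.encrypt(b"sensitive customer record")   # what the cloud would store
    assert f.decrypt(token) == b"sensitive customer record"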
KEY MANAGEMENT
• The cloud key management infrastructure consists of a cloud key management client (CKMC) and a cloud key management server (CKMS) [5]. The CKMC exists in cloud applications, serving the three fundamental cloud service models: Software, Platform and Infrastructure (as a Service).
Key Management at the Cloud Service Provider Side
In this approach, keys are maintained at the cloud service provider's side. If the key is lost, the customer is unable to read the data stored in the cloud. Data is stored in encrypted form and decrypted with the key to recover the original form.
Key Management at Both Sides
In this technique, the key is divided into two parts: one part is stored at the user side and the other part is stored at the cloud side. Only when both parts are combined is it possible to retrieve the data properly. Thus, the data remains secure and can be controlled by the user, and the solution is also scalable: neither the cloud service provider nor the user needs to maintain the complete key. If either part of the key is lost, the data cannot be recovered. A toy sketch of such two-part splitting follows.
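One simple way to realize a two-part key, assuming XOR-based secret splitting (an illustrative choice, not necessarily the exact scheme the notes have in mind):

    # Toy two-part key splitting: neither share alone reveals the key;
    # XOR-combining both shares recovers it.
    import os

    key = os.urandom(32)                                         # the data-encryption key
    user_share = os.urandom(len(key))                            # random part kept by the user
    cloud_share = bytes(a ^ b for a, b in zip(key, user_share))  # part kept in the cloud

    recovered = bytes(a ^ b for a, b in zip(user_share, cloud_share))
    assert recovered == key    # both parts together restore the key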
Key Management at a Centralized Server
This approach uses asymmetric keys. Data is encrypted with a public key stored in the key server, and data at the cloud side is stored in encrypted form. When a user accesses the data, it is decrypted with the private key maintained by each user. The disadvantage of this method is that if the key server crashes, it is a single point of failure. Each user generates a public and a private key, and the public keys are stored at the key server. Suppose a mobile phone user wants to share data with a desktop user: he or she will encrypt the data with the public key of the desktop user, and the desktop user will then access the data with his or her private key.
Group Key Management for Cloud Data Storage
Data is shared in the cloud by trusted members of a group. A group key is established for securing data at the cloud side; the group key is formed from partial keys maintained by each user. When particular group members want to access the data, the group key is used to access it.
QUESTIONS
5 MARKS
1. Write short notes on clustering big data.
2. Explain about recommendation system.
3. Explain about live stream app.
4. Discuss about video transcoding app.
5. Write notes on authentication.
6. Explain about authorization.
10 MARKS
7. Explain in detail about classification of big data.
8. Write notes on streaming protocols.
9. Discuss about CSA cloud security architecture.
10. Explain the following:
a) Data security
b) Key management
UNIT V COMPLETED