Cloud Computing
Data center resources delivered like tap water: always on, and you pay only for what you use.
Microsoft describes Windows Azure as an “operating system for the cloud.”
An OS:
-provides computing power => roles
-information handling => storage
-information management => diagnostics
Windows Azure is an “operating system for the cloud” in a similar sense, because it abstracts away the hardware in the data center. Just as a good desktop operating system abstracts away physical memory and hardware specifics, Windows Azure tries to abstract away the physical aspects of the data center.
Understanding the Characteristics of Cloud Computing
-The illusion of infinite resources
-Scale on demand
-Pay-for-play
-High availability and an SLA
-Geographically distributed data centers
Understanding Cloud Services
-Infrastructure-as-a-Service (IaaS)
-(basic hardware as a service, such as virtual machines, load-balancer settings, and network-attached storage): Amazon Web Services (AWS) and GoGrid
-Platform-as-a-Service (PaaS)
-Windows Azure and Google App Engine (GAE)
-Software-as-a-Service (SaaS)
Understanding Windows Azure Features
-Service hosting
-Service management
-Storage
-Windows Server
-Development tools
Under the Hood
Security (from bottom to top)
-Physical security
-OS/DB security
-Network security
-Application security
-Human security
What Operating System Does Your Code Run Under on Windows Azure?
-Windows Server 2008 x64 Enterprise Edition
The Fabric
-The fabric itself is a massive, distributed application that runs across all of Windows Azure’s machines. Instead of having to deal with thousands of machines individually, other parts of Windows Azure (and the users of Windows Azure) can treat the entire set of machines as one common resource managed by the fabric.
The Fabric Controller
-The fabric controller is often called “the brain” of Windows Azure, and for good reason: it controls the operation of all the other machines, as well as the services running on them.
-To help ensure reliability, the fabric controller is built as a set of five to seven machines, each holding a replica of the controller’s state.
-This service model is just a giant XML file that contains the same elements that your whiteboard diagram does. However, it describes them using well-defined elements such as roles, endpoints, configuration settings, and so on. The service model defines what roles your service contains, what HTTP or HTTPS endpoints they listen on, what specific configuration settings you expect, and so on.
What Does a Service Model File Look Like?
-The .cspkg file generated by the build process is just a fancy .zip file. If you extract the contents of the .zip file, you can find the actual service model for your application in the ServiceDefinition.rd file.
How does traffic directed toward foo.cloudapp.net end up at a web role instance?
-A request to a *.cloudapp.net URL is redirected through DNS toward a virtual IP address (VIP) in a Microsoft data center that your service owns. This VIP is the external face for your service, and the load balancer has the knowledge to route traffic hitting this VIP to the various role instances.
Each of these role instances has its own special IP address that is accessible only inside the data center. This is often called a direct IP address (DIP). Though you can’t get at these DIPs from outside the data center, they are useful when you want roles to communicate with each other.
Service Definition
-The various roles used by your service.
-Options for these roles (virtual machine size, whether native code execution is supported).
-Input endpoints for these roles (what ports the roles can listen on). You’ll see how to use this in detail a bit later.
-Local disk storage that the role will need.
-Configuration settings that the role will use (though not the values themselves, which come in the configuration file).
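To make these elements concrete, here is a minimal sketch of a ServiceDefinition.csdef with a single web role. The role name, setting name, and sizes are placeholders, and exact schema details vary by SDK version:

<!-- Sketch only; names and values are illustrative. -->
<ServiceDefinition name="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="WebRole1" enableNativeCodeExecution="false">
    <InputEndpoints>
      <!-- The port the load balancer exposes to the Internet -->
      <InputEndpoint name="HttpIn" protocol="http" port="80" />
    </InputEndpoints>
    <!-- Local disk scratch space for this role -->
    <LocalStorage name="scratch" sizeInMB="100" />
    <ConfigurationSettings>
      <!-- Declared here; the value comes from the configuration file -->
      <Setting name="DataConnectionString" />
    </ConfigurationSettings>
  </WebRole>
</ServiceDefinition>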
Service Configuration (the configuration file can be updated without having to stop a running service)
-Number of role instances (or virtual machines) used by that particular role
-Values for settings
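A matching ServiceConfiguration.cscfg sketch for the definition above might look like this (again, names and values are placeholders):

<ServiceConfiguration serviceName="MyService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole1">
    <!-- Number of role instances; can be changed without stopping the service -->
    <Instances count="2" />
    <ConfigurationSettings>
      <!-- The value for the setting declared in the service definition -->
      <Setting name="DataConnectionString" value="UseDevelopmentStorage=true" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>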
How Is This Different from the Service Management API?
-The Service Runtime API is meant to be run “inside” the cloud. It gives code already running in Windows Azure the right hooks to interact with the environment around it. The Service Management API is meant to be run “outside” the cloud. You can use it to manipulate your running services from outside.
External endpoint and Internal endpoint
The difference between declaring an endpoint that anyone on the Internet can access and opening up a port for inter-role communication comes down to the load balancer. When declaring an InputEndpoint in your service definition like you did earlier in this chapter, you get to pick a port on the load balancer that maps to an arbitrary port inside the virtual machine. When you declare an InternalEndpoint (which is the XML element used in a service definition for inter-role communication), you don’t get to pick the port. The reason is that, when talking directly between role instances, there is no reason to go through the load balancer. Hence, you directly connect to the randomly assigned port on each role instance.
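In service definition XML, the two declarations might look like the following sketch (element placement and names are illustrative; the schema has varied across SDK versions):

<!-- Reachable from the Internet; you pick the load-balancer port -->
<InputEndpoint name="HttpIn" protocol="http" port="80" />
<!-- Inter-role traffic only; the port is assigned for you -->
<InternalEndpoint name="ChatterIn" protocol="tcp" />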
Subscribing to Changes
-RoleEnvironmentChangedEventArgs
-RoleEnvironment.Changing
-RoleEnvironment.Changed
Let’s walk through the life of a worker role (inheriting from RoleEntryPoint); a minimal skeleton follows this list:
-When Windows Azure launches your worker role code, it calls the OnStart method.
-This is where you get to do any initialization; return control when you’re finished.
-The Run method is where all the work happens.
-This is where the core logic of your worker role goes.
-The RoleEnvironment.Changing event can fire at any point while the role is running.
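Here is a minimal worker role skeleton matching that walkthrough, assuming the ServiceRuntime assembly is referenced:

using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Initialization; subscribing to Changing lets you react to (or veto)
        // configuration changes before they are applied.
        RoleEnvironment.Changing += (sender, e) => { /* set e.Cancel = true to recycle */ };
        return base.OnStart();
    }

    public override void Run()
    {
        // Core logic of the worker role; this method is not expected to return.
        while (true)
        {
            // ... do work, e.g., process queue messages ...
            Thread.Sleep(10000);
        }
    }
}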
Service Management API vs storage APIs
The Service Management API and the storage APIs form the two major REST API families in Windows Azure. However, there’s one major difference between the two. While the storage APIs are targeted at every Windows Azure developer and will show up in almost all Windows Azure applications, the Service Management API is meant primarily for toolmakers.
Authentication in the Service Management API is built on Secure Sockets Layer (SSL) client authentication.
Dealing with Upgrades
-In-Place Upgrade
-Automatic versus manual upgrade
-Upgrading specific roles
-VIP Swap
-The “VIP” in “VIP Swap” refers to “Virtual IP Address,” which is the term Windows Azure uses for your *.cloudapp.net endpoint.
Hypervisor and Standard User Privileges
All user code in Windows Azure runs on the Windows Azure hypervisor. Also, any code that runs in Windows Azure runs under the context of a “standard user” (as opposed to a “user with administrative privileges”).
Windows Azure Storage Characteristics
-Lots and Lots of Space
-Distribution
-Scalability
-In this context, it means your performance should stay the same, regardless of the amount of data you have. More importantly, performance stays the same when load increases.
-Replication
-Consistency
-RESTful HTTP APIs
-Geodistribution
-Pay for Play
Windows Azure Storage Services
-Windows Azure offers four key data services: blobs, tables, queues, and a database.
HTTP Requests and Responses
-Status codes
-2xx (“Everything’s OK”)
-3xx
-304
-The condition specified using HTTP conditional header(s) is not met
-For example, when the content in Blob Storage hasn’t changed since your browser last accessed it (HTTP If-None-Match header)
-4xx (“Bad request”)
-headers are incorrect
-the URL is incorrect
-the account or resource doesn’t exist
-5xx (“Something bad happened on the server”)
-unknown error happened on the server
-server is too busy to handle requests
-Understanding Authentication and Request Signing
-Using the Signing Algorithm
-You must construct an HTTP header of the form Authorization="SharedKey {Account_Name}:{Signature}" and embed the header in the final request that goes over the wire to the storage service.
- StringToSign = VERB + "\n" +
                 Content-MD5 + "\n" +
                 Content-Type + "\n" +
                 Date + "\n" +
                 CanonicalizedHeaders +
                 CanonicalizedResource;
-On the server side, the Windows Azure storage service goes through the exact same process and generates a signature. It checks to see whether the signature you generated matches the one it computed. If it doesn’t, it returns a 403 (Forbidden) error.
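Here is a minimal C# sketch of the client side of this scheme. Building the canonicalized headers and resource is assumed to happen elsewhere; only the hashing step is shown:

using System;
using System.Security.Cryptography;
using System.Text;

static string SignRequest(string verb, string contentMd5, string contentType,
                          string date, string canonicalizedHeaders,
                          string canonicalizedResource,
                          string accountName, string base64AccountKey)
{
    // Assemble the string-to-sign exactly as in the formula above.
    string stringToSign = verb + "\n" +
                          contentMd5 + "\n" +
                          contentType + "\n" +
                          date + "\n" +
                          canonicalizedHeaders +
                          canonicalizedResource;

    // HMAC-SHA256 over the UTF-8 bytes, keyed with the decoded account key.
    using (var hmac = new HMACSHA256(Convert.FromBase64String(base64AccountKey)))
    {
        string signature = Convert.ToBase64String(
            hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));
        // This value goes into the Authorization header on the request.
        return "SharedKey " + accountName + ":" + signature;
    }
}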
Dev Storage vs. Cloud
• The Dev Storage supports only one account and key (devstoreaccount1, found in the samples). You must use your configuration files to change which account name/credentials you use, depending on whether you’re running in the cloud.
• You cannot use the Dev Storage for any sort of performance testing. Since the underlying store and implementation are completely different, performance testing against development storage is next to useless.
• The Dev Storage can be accessed only from the local machine on which it is run. You can work around this by using a port redirector so that multiple developers/computers can get access to a shared instance of the Dev Storage for testing.
• Sizes and date ranges are different, since the underlying data store is SQL Server. For example, blobs can only be up to 2 GB in the Dev Storage.
Using Blobs
-Filesystem replacement
-Heavily accessed data
-Backup server
-File-share in the cloud
Most importantly, blobs are the only service for which anonymous requests over public HTTP are allowed (if you choose to make your container public). Both queues and tables (which you’ll learn more about in the next few chapters) must have requests authenticated at all times.
-Block blobs can be split into chunks known as blocks, which can then be uploaded separately. The typical usage for blocks is for streaming and resuming uploads.
-Page blobs are targeted at random read/write scenarios and provide the backing store for Windows Azure XDrive.
Apart from containing blobs, containers serve one other important task: containers control sharing policy. You can make containers either public or private, and all blobs underneath the container will inherit that setting. When a container is public, anyone can read data from that container over public HTTP. When a container is private, only authenticated API requests can read from the container. Regardless of whether a container is public or private, any creation, modification, or deletion operations must be authenticated.
Using Containers
As you learned previously, containers contain blobs. But containers cannot contain other containers. Those two short statements pretty much tell you everything you need to know about containers.
You often see the word comp in Windows Azure storage’s query strings. The comp stands for “component,” and is typically used on a subresource or a “meta” operation where you’re not directly manipulating the object itself.
Initiating a “Create blob” operation and uploading data this way (as opposed to using blocks) means that you can have only up to 64 MB in the contents of the request. As a rule of thumb, if your blobs go over a few megabytes, think about moving them into blocks.
With Windows Azure, you’re using MD5 only to protect against network corruption. In Windows Azure storage, hashing is done using MD5. You have two ways to use this:
• When uploading a blob, you can add a Content-MD5 header containing the MD5 hash of your blob. The blob server computes an MD5 hash over the data it receives, and returns an error (BadRequest) if they don’t match.
• When you create/overwrite a blob, the server sends down an MD5 hash of the data it has received. You then verify on the client whether that hash matches the hash of the data.
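A quick sketch of the first option over raw HTTP; the URL and payload are placeholders, and request signing (plus any required x-ms-* headers) is omitted:

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography;

static void PutBlobWithMd5(byte[] data)
{
    string md5Base64;
    using (var md5 = MD5.Create())
    {
        md5Base64 = Convert.ToBase64String(md5.ComputeHash(data));
    }

    // The server recomputes the hash over what it receives and returns
    // BadRequest if the two don't match.
    var request = (HttpWebRequest)WebRequest.Create(
        "http://myaccount.blob.core.windows.net/container/payload.bin"); // assumed URL
    request.Method = "PUT";
    request.Headers["Content-MD5"] = md5Base64;
    request.ContentLength = data.Length;
    using (Stream body = request.GetRequestStream())
    {
        body.Write(data, 0, data.Length);
    }
    // ... sign the request and send it as described in the signing section ...
}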
Listing, Filtering, and Searching for Blobs
-Class: CloudBlobDirectory
/testpath/hello.txt
/testpath/level1-1/hello.txt
/testpath/level1-2/hello.txt
/testpath/level1-1/level2/hello.txt
You can retrieve the blobs under the level1-1/level2 prefix with the following code, which translates into a request to /testpath?restype=container&comp=list&prefix=level1-1%2flevel2%2f&delimiter=%2f and returns the sole hello.txt:
CloudBlobDirectory blobDirectory =
    cloudBlobClient.GetBlobDirectoryReference("testpath/level1-1/level2");
var enumerator = blobDirectory.ListBlobs();
EnumerateResults(enumerator);
Using Blocks
Blocks were designed because users often wanted to do things to parts of an individual blob. The most common scenario is to perform partial or parallel uploads.
Understanding Page Blobs
For more than a year, block blobs were the only form of blobs available in Windows Azure storage. One drawback with block blobs is poor support for random access.
-A large part of the motivation for page blobs is Windows Azure XDrive.
Windows Azure XDrive
One of the downsides of using the Windows Azure blob service for existing applications is having to rewrite filesystem access code. Almost all applications access the filesystem in some way, whether via the local disk or shared network storage. Replacing all that code to use Windows Azure blob storage API calls may not be possible.
Understanding the Value of Queues
Think about your last trip to your local, high-volume coffee shop. You walk up to a counter and place an order by paying for your drink. The barista marks a cup with your order and places it in a queue, where someone else picks it up and makes your drink. You head off to wait in a different line (or queue) while your latte is prepared.
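The same producer/consumer shape, sketched with the StorageClient library against development storage (queue name and message are illustrative):

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

static void CoffeeShop()
{
    var account = CloudStorageAccount.DevelopmentStorageAccount;
    CloudQueueClient queueClient = account.CreateCloudQueueClient();
    CloudQueue queue = queueClient.GetQueueReference("orders");
    queue.CreateIfNotExist();

    // The cashier: mark a cup with the order and put it in the queue.
    queue.AddMessage(new CloudQueueMessage("latte, extra shot"));

    // The barista: pick up the next cup, make the drink, discard the ticket.
    CloudQueueMessage msg = queue.GetMessage();
    if (msg != null)
    {
        // ... make the drink ...
        queue.DeleteMessage(msg);
    }
}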
ADO.NET Data Services Primer
-ADO.NET Data Services shipped along with .NET 3.5 Service Pack 1. It enables people to expose data via web services accessed over HTTP. The data is addressed through a RESTful URI.
-DataServiceContext and DataServiceQuery
-DataServiceContext is essentially used for state management. HTTP is stateless, and the server doesn’t “remember” the status of each client between updates. DataServiceContext layers on top of the HTTP stack to support change tracking. When you make changes to data in your client, the changes are accumulated in the DataServiceContext. They’re committed to the server when you call SaveChanges. DataServiceContext also controls conflict resolution and merge strategies.
-DataServiceQuery goes hand in hand with DataServiceContext. In fact, the only way you can get your hands on an instance of DataServiceQuery is through DataServiceContext. If you imagine DataServiceContext as the local representation of a service, DataServiceQuery would be analogous to a single query on that service. The two key methods on DataServiceQuery are CreateQuery<T> and Execute<T>.
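A small sketch of the two classes together; the Customer type and the service URI are illustrative, and authentication setup is omitted:

using System;
using System.Data.Services.Client;
using System.Linq;

public class Customer
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Name { get; set; }
}

public static class TableQueryExample
{
    public static void Run()
    {
        var context = new DataServiceContext(
            new Uri("http://myaccount.table.core.windows.net")); // assumed endpoint

        // CreateQuery<T> hands back a DataServiceQuery; LINQ operators compose the URI.
        DataServiceQuery<Customer> query = context.CreateQuery<Customer>("Customers");
        var customers = query.Where(c => c.PartitionKey == "WA").ToList();
    }
}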
Why use the SharedKeyLite authentication scheme?
Why not use the same authentication scheme as blobs and queues? This stems from the way ADO.NET Data Services is implemented. In blobs and queues, signing takes place as the last step, and has access to all the headers. However, ADO.NET Data Services doesn’t give access to all the headers through a “hook” that could let the same kind of signing happen. Hence, a variant of the standard authentication scheme was devised for use in the Table service.
Why Have Both Partition Keys and Row Keys?
In general, partition keys are considered the unit of distribution/scalability, while row
keys are meant for uniqueness.
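For example, an Orders table might be keyed as in this sketch (the property choices are illustrative):

using System;

public class Order
{
    public string PartitionKey { get; set; } // CustomerID: the unit of distribution
    public string RowKey { get; set; }       // OrderID: unique within the partition
    public DateTime OrderDate { get; set; }
    public decimal Total { get; set; }
}

// All of one customer's orders live in one partition and can be scanned
// together; different customers can be served from different machines.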
Common Storage Tasks
-Exploring Full-Text Search
-Documents and terms
Let’s become familiar with two terms that are common in the FTS world:
-Document: A document is the unit of content that is returned in your search results.
-Term: A document is made up of several terms.
-Case folding and stemming
-You must transform your documents by case-folding into a standard case, either all uppercase or all lowercase.
-The right thing to do is to convert every word into its root form through a process called stemming. During indexing, you convert every word into a stemmed version.
-Inverted indexes (see the sketch after this list)
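Here is a tiny sketch of an inverted index with case folding; a real stemmer (e.g., Porter) is assumed away, with ToLowerInvariant standing in for normalization:

using System;
using System.Collections.Generic;

public class InvertedIndex
{
    // term -> set of document IDs containing that term
    private readonly Dictionary<string, HashSet<string>> index =
        new Dictionary<string, HashSet<string>>();

    public void IndexDocument(string docId, string content)
    {
        foreach (string raw in content.Split(new[] { ' ', '.', ',', ';' },
                                             StringSplitOptions.RemoveEmptyEntries))
        {
            string term = raw.ToLowerInvariant(); // case folding; stemming would go here
            HashSet<string> docs;
            if (!index.TryGetValue(term, out docs))
                index[term] = docs = new HashSet<string>();
            docs.Add(docId);
        }
    }

    public IEnumerable<string> Search(string term)
    {
        HashSet<string> docs;
        return index.TryGetValue(term.ToLowerInvariant(), out docs)
            ? docs
            : (IEnumerable<string>)new string[0];
    }
}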
-Modeling Data
-One-to-Many
-The first way is to store all the OrderIDs for a given Customer as a property of the Customer object as a serialized list.
-A better model is to add a helper method to the Customer entity class to look up all Order entities associated with it.
public IEnumerable<Order> GetOrders()
{
    return from o in new CustomerOrderDataServiceContext().OrderTable
           where o.PartitionKey == this.ID
           select o;
}
-Many-to-Many
-How do you now represent the relationship between Friend and Group? The best way to deal with this is to create a separate “join” table that contains one entity per friend-group relation.
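A sketch of such a join entity (the property choices are illustrative):

public class FriendGroupRelation
{
    public string PartitionKey { get; set; } // e.g., the GroupID
    public string RowKey { get; set; }       // e.g., the FriendID
}

// Listing a group's members is a single-partition query. To find a friend's
// groups just as cheaply, you'd maintain a second table keyed the other way.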
-Making Things Fast
-Secondary indexes
-The key to secondary indexes is to ensure that they’re in sync with the main table at all times.
-Entity Group Transactions
-Utilizing Concurrent Updates
-ETags (see the sketch below)
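A sketch of how ETags drive optimistic concurrency through ADO.NET Data Services, reusing the Customer type from the earlier sketch (endpoint and auth setup omitted):

using System;
using System.Data.Services.Client;
using System.Linq;

static void UpdateWithEtag(DataServiceContext context)
{
    Customer c = context.CreateQuery<Customer>("Customers")
                        .Where(x => x.PartitionKey == "WA" && x.RowKey == "42")
                        .AsEnumerable()
                        .First();
    c.Name = "New Name";
    context.UpdateObject(c);

    try
    {
        // The context sends the entity's tracked ETag as an If-Match header.
        context.SaveChanges();
    }
    catch (DataServiceRequestException)
    {
        // HTTP 412 Precondition Failed: someone updated the entity concurrently.
        // Re-read the entity, reapply the change, and retry.
    }
}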
Protecting Data in Motion
SSL is built on two core concepts: certificates and certification authorities.
A certificate (an X.509v3 certificate, to be specific) is a wrapper around a public key and a private key, installed together on the web server. When the browser contacts the web server, it gets the public key, and uses some standard cryptographic techniques to set up a secure connection with data that only the server can decrypt. That is sufficient to ensure that you are securely talking to the server, but how do you tell that the server itself can be trusted? How do you know that https://www.paypal.com is actually the company PayPal? This is where a certification authority (CA) comes in.
Protecting Data at Rest
Now that the transfer of data over the wire is secure, the next step is to secure the data when it has reached Microsoft’s servers. There are two parts to this. The first is to encrypt data so that only the original user can decrypt it. The second is to digitally sign the encrypted data so that any modification is automatically detected.
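A minimal sketch of that encrypt-then-sign idea, with an HMAC standing in for the digital signature the text mentions (key management is assumed to happen elsewhere):

using System;
using System.IO;
using System.Security.Cryptography;

static byte[] EncryptAndSign(byte[] plaintext, byte[] aesKey, byte[] hmacKey)
{
    using (var aes = Aes.Create())
    {
        aes.Key = aesKey;
        aes.GenerateIV();
        byte[] ciphertext;
        using (var ms = new MemoryStream())
        {
            ms.Write(aes.IV, 0, aes.IV.Length); // prepend IV for later decryption
            using (var cs = new CryptoStream(ms, aes.CreateEncryptor(),
                                             CryptoStreamMode.Write))
            {
                cs.Write(plaintext, 0, plaintext.Length);
            }
            ciphertext = ms.ToArray();
        }

        // Append an HMAC-SHA256 tag so any modification is detected on download.
        using (var hmac = new HMACSHA256(hmacKey))
        {
            byte[] tag = hmac.ComputeHash(ciphertext);
            byte[] result = new byte[ciphertext.Length + tag.Length];
            Buffer.BlockCopy(ciphertext, 0, result, 0, ciphertext.Length);
            Buffer.BlockCopy(tag, 0, result, ciphertext.Length, tag.Length);
            return result;
        }
    }
}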
Uploading Efficiently Using Blocks
The straightforward way to back up encrypted data to the cloud is to initiate a “Create blob” operation and start uploading data.
However, there are two downsides to doing things this way. First, uploads are limited to 64 MB with a single request. Backups of huge directories will often be larger than 64 MB. Second, making one long request means not only that you’re not making use of all the bandwidth available to you, but also that you’ll have to restart from the beginning in case a request fails.
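A sketch of the block-based alternative using the StorageClient library; the blob reference is assumed to be set up already, and parallelizing the PutBlock calls is the natural next step:

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.WindowsAzure.StorageClient;

static void UploadInBlocks(CloudBlockBlob blob, string path)
{
    const int BlockSize = 4 * 1024 * 1024; // 4 MB per block
    var blockIds = new List<string>();
    using (var file = File.OpenRead(path))
    {
        var buffer = new byte[BlockSize];
        int bytesRead, n = 0;
        while ((bytesRead = file.Read(buffer, 0, BlockSize)) > 0)
        {
            // Block IDs must be Base64-encoded and the same length within a blob.
            string id = Convert.ToBase64String(BitConverter.GetBytes(n++));
            blob.PutBlock(id, new MemoryStream(buffer, 0, bytesRead), null);
            blockIds.Add(id);
        }
    }
    // Committing the block list makes the blob visible; a failed block can be
    // re-uploaded without restarting the whole transfer.
    blob.PutBlockList(blockIds);
}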
SQL Azure
A big selling point for SQL Azure is that it is just SQL Server, in the sense that if you have existing code and tools written against SQL Server, it is an easy (and natural) migration path.
Windows Azure Tables Versus SQL Azure
Windows Azure tables are optimized for large-scale, cheap data access. However, Windows Azure tables do not support standard relational database management system (RDBMS) features such as referential integrity, the SQL language, or the ecosystem of tools around SQL Server. SQL Azure, on the other hand, is meant to extend your existing investments in SQL into the cloud.
One good rule of thumb is this: if you absolutely need the relational model, or if you have a lot of legacy code already using SQL, go for SQL Azure. If you can denormalize your data, look hard at Windows Azure tables. Also, look at the price differences between the two, because they have very different pricing models.
Tips and Tricks
Since SQL Azure is a remote service, you might need to tweak the way you call SQL in your code to ensure that you’re not affected by latency or network failures. Also, given the size limitations (1 GB/10 GB), you must partition data differently. Following are some tips and tricks for moving to SQL Azure:
• Connections could drop when talking to SQL Azure more often than when talking to SQL Server. When you detect a closed connection, reconnect immediately. If you still get a connection failure, back off (10–15 seconds would be a good period of time) and then try again. If you continue getting failures, check the portal to see the health of the service and your database. (A retry sketch follows this list.)
• Use pooled connections. By reusing connections as much as possible, you avoid expensive re-login logic.
• Make communication with SQL Azure as “chunky” as possible, as opposed to “chatty.” Since SQL Azure is a remote service, network latency plays a huge part in query performance time. Using stored procedures and limiting round trips to the server can greatly aid performance.
• Partition your data into small pieces. This lets you fit them into SQL Azure’s database sizes easily, and it also lets the service load-balance you effectively. It is easier to do this upfront, rather than having to do it after hitting size limits.
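Here is a minimal sketch of the retry pattern from the first tip; the connection string and command are placeholders:

using System;
using System.Data.SqlClient;
using System.Threading;

static void ExecuteWithRetry(string connectionString, string sql)
{
    int attempt = 0;
    while (true)
    {
        try
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                cmd.ExecuteNonQuery();
                return;
            }
        }
        catch (SqlException)
        {
            attempt++;
            if (attempt == 1)
                continue; // first failure: reconnect immediately
            if (attempt == 2)
            {
                Thread.Sleep(TimeSpan.FromSeconds(15)); // back off, then retry once more
                continue;
            }
            throw; // still failing: check the portal for service/database health
        }
    }
}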