The day 2 keynote was not as exciting as the day 1 keynote: no Satya, no Gu, no Hanselman. But don’t get me wrong, there was still some excitement in the air. We got another glimpse of the HoloLens, but we also got a look at a lot more code. There was an extended session on building universal Windows apps, Microsoft’s version of write once, run everywhere. This is a bold promise, and I don’t feel it’s reality yet.
Lessons from Scale: Building Applications for Azure
Mark Russinovich, CTO for Azure, held a session on developing applications for Azure with scale in mind. Mark shared tips and tricks for designing systems for the cloud. The basics include automation, scaling out, the need for testing in production, and deploying early and often. Mark didn’t dwell on these basics, though; he focused on the architectural traps you can fall into.
The real value of Mark’s session was the real-world customer scenarios, based upon feedback and discussions with users.
The first example was from BlinkBox, a movie streaming service in the UK. They started using PaaS, with WCF binary remoting between the Web Roles and the backend. They also built a custom caching layer for WCF, and that caching layer is where their troubles began. Initially, if cache retrieval took longer than 2 seconds, they would query the movie metadata database directly. What they found is that when maintenance activities occurred, the database was hit so hard it fell over. To balance this risk, they added an additional layer of caching on the front web end.
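Their original fallback logic can be sketched roughly like this. This is a minimal sketch; the function and parameter names are mine, not BlinkBox’s:

```python
CACHE_TIMEOUT_SECONDS = 2.0  # the 2-second cache budget described above


def get_movie_metadata(movie_id, cache_lookup, db_lookup):
    """Try the cache first; on a miss or timeout, fall back to the database.

    cache_lookup and db_lookup are hypothetical stand-ins for the real
    caching layer and metadata database calls.
    """
    try:
        result = cache_lookup(movie_id, timeout=CACHE_TIMEOUT_SECONDS)
        if result is not None:
            return result
    except TimeoutError:
        pass  # cache too slow -- fall through to the database
    # Danger zone: during cache maintenance, every request lands here
    # at once, which is how the database fell over.
    return db_lookup(movie_id)
```

The failure mode is the last line: when maintenance makes every cache lookup slow, every request falls through to the database simultaneously. The extra web-tier cache they added keeps most requests from reaching this code at all.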
The next case was a company that does election tracking in the US. The customer created a service for reporting live tallies of national, state, and local elections. They were able to weather local and state elections, but at the national level, they had concerns their system could not handle the load. Their question was, “would our architecture scale?”
Their architecture started with precincts uploading local results into blobs. A worker role converted the blob data into SQL Azure. Web Roles then hit the SQL Azure database when they needed results. To disaster-proof their deployments, they had duplicate environments, with the worker role pushing to both SQL Azure databases. Traffic Manager was used to balance traffic between the two environments, allowing instant switching as needed.
They found their solution caused 10 SQL queries per request. At peak, they were expecting 16K SQL queries per second. SQL Azure was limited to far fewer requests per second. To solve their issue, they implemented Azure Cache on the Web Roles. Today, in Azure, Redis would be used.
The lesson learned here is to leverage caching heavily at every layer.
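To illustrate the lesson, here is a minimal read-through cache sketch (the class and names are mine, not from the session), showing why a cache layer collapses the 10-queries-per-request pattern into a handful of backend hits:

```python
class ReadThroughCache:
    """Tiny in-process read-through cache: only the first request per key
    reaches the backend loader; everything after is served from memory."""

    def __init__(self, loader):
        self.loader = loader  # e.g., a function that queries SQL Azure
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.loader(key)  # the only path that touches the backend
        self.store[key] = value
        return value
```

Stacking one of these at the web tier in front of another at the service tier mimics the “cache at every layer” advice.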
Lift and Shift
This customer case was a lift and shift, but during the migration, they wanted to leverage all of Azure’s solutions. They started by moving the ASP.NET code to a Web Role, application logic to a Worker Role, and SQL to IaaS SQL. Their initial results were drastically reduced: the solution performed half as well as the on-premises one.
Azure CAT came in to investigate and found several things: when moving to Azure, the separation of web and app server roles caused additional inter-process communication, and the default SQL IaaS configuration put everything on the same storage. After changing these, they were able to exceed their on-premises performance.
Data Upload to Azure DB
This business case leveraged a hybrid PaaS architecture to pull data from client endpoints and feed it into an analytics system that provides targeted ads to users. Their architecture was local on-premises data sources that imported data into Azure Blobs. A worker role ingested the blobs and loaded the data into SQL Azure. They also needed a 7-day trailing aggregate view of the data, for which HDInsight was used. The challenge was their data volume: around 40GB daily. They wrote a custom ETL process to SQL Bulk Copy into SQL Azure. Unfortunately, it took 37 hours to run.
How did they address this? They moved to SQL Azure Premium, parallelized the data upload, scaled out the worker roles by 8x, and created one SQL Azure database per day instead of a single massive database. An aggregator then looked over the entire multi-database architecture to bring all the data together. The result was a 3-hour upload.
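A hedged sketch of the reworked pipeline, assuming a hypothetical `upload_chunk` callback and per-day database naming (neither is from the session):

```python
from concurrent.futures import ThreadPoolExecutor


def database_for_day(day):
    # One SQL Azure database per day instead of one massive database.
    # The naming scheme is illustrative.
    return f"analytics_{day}"


def parallel_load(chunks_by_day, upload_chunk, workers=8):
    """Upload all chunks in parallel (mimicking the 8x worker scale-out),
    routing each day's data to its own database."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(upload_chunk, database_for_day(day), chunk)
            for day, chunks in chunks_by_day.items()
            for chunk in chunks
        ]
        return [f.result() for f in futures]


def aggregate(results):
    # Stand-in for the aggregator that spans the multi-database layout.
    return sum(results)
```

The key moves are the same as in the case study: fan the work out across parallel workers, shard by day so no single database is the bottleneck, then aggregate over the shards.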
Telematics Car Company
This case study was a customer that collected car data and tracked it at large scale. Their solution pumped data to the cloud via the service bus, with worker roles that processed the service bus queues. Notification worker roles sent messages, and database worker roles wrote to the database.
Their fix was to change the service bus queue to read multiple messages at once, and process them in parallel. The lesson here is to process multiple items at once, asynchronously.
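The batching fix can be sketched like this, with `receive_batch` and `handle` as hypothetical stand-ins for the real Service Bus client calls:

```python
import asyncio


async def drain_queue(receive_batch, handle, batch_size=32):
    """Receive messages in batches and process each batch concurrently,
    instead of one message at a time."""
    processed = 0
    while True:
        batch = receive_batch(batch_size)
        if not batch:
            return processed
        # Process the whole batch in parallel rather than serially.
        await asyncio.gather(*(handle(msg) for msg in batch))
        processed += len(batch)
```

The same shape applies whether the concurrency comes from async I/O (as here) or from multiple threads: pull many, process many.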
This case study was about a company whose thermostats report data to Azure. Building managers can look at the data and decide whether they are spending too much on energy based upon current temperatures.
The architecture was thermostats that were processed by a worker role that wrote to a SQL Azure database. The worker role also sent data to several queues for emailing and notifications. Another web role was used for mobile device API access to the database. Initial load tests could handle approximately 1,000 requests per second; the solution did not scale well. The biggest issue was a synchronous HTTP handler. They first changed “interactive” queries to asynchronous. The second change was to batch SQL updates. The final change was to upgrade the SQL Azure database level (only for hot tables). They also moved to multiple Azure queues (which is a best practice: partition them by purpose). These changes allowed for over 2,500 requests per second.
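The SQL-batching change can be illustrated with a small buffer-and-flush sketch; the `execute_many` callback is a stand-in for whatever bulk API the data layer actually exposes:

```python
class BatchedWriter:
    """Buffer rows and flush them as one round trip instead of issuing
    one SQL statement per thermostat reading."""

    def __init__(self, execute_many, batch_size=100):
        self.execute_many = execute_many  # bulk-insert callback
        self.batch_size = batch_size
        self.pending = []
        self.round_trips = 0

    def write(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.execute_many(self.pending)  # one round trip, many rows
            self.round_trips += 1
            self.pending = []
```

The win is purely in round trips: 1,000 readings become 10 database calls at a batch size of 100.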
The lesson here is to leverage cloud services. It’s easy to spin up and down services. Create multiple – you’re only billed on consumption.
Smart Card eCommerce Platform
This vendor authenticated smart cards. Their problem was that their web role crashed regularly, at seemingly random points. The web role had code that responded to a button click by spinning up a new thread, and that thread had no global exception handler. As a result, when the request threw an exception, it crashed the entire web role.
Their solution was to decouple the backend from the front end and then put in exception handling.
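The root cause and fix look roughly like this (a sketch, not their actual code): wrap the thread body so an exception is reported instead of killing the process.

```python
import threading


def safe_worker(task, on_error):
    """Return a thread whose body has the 'global' exception handler the
    original code was missing; without it, an unhandled exception on a
    background thread can take down the whole process."""
    def body():
        try:
            task()
        except Exception as exc:
            on_error(exc)  # log / report instead of crashing the role
    return threading.Thread(target=body)
```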
Their solution architecture was a typical front end with a middle tier that talked to the backend. The problems were that their middle tier was very chatty, their SQL Azure database held XML data whose processing was also chatty, and their OLTP and reporting ran on the same database instance.
Their solution was to take pressure off the database by moving the XML processing to the middle tier. They also leveraged geo-replication and moved reporting to the geo-replicated database, and they consolidated database queries. They saw a 10x performance increase.
A company was doing photo sharing and image processing in the cloud. The first release had a limit of 50 queries per second; the target was 7,000 per second. Their problem was that all processing was done in a single web role. The Azure team separated the application into separate web and worker roles, moved images to blob storage (instead of SQL Azure), and added caching at multiple layers.
This was an amazing session, and really gave me insight into the architectures of several real world Azure-based products.
Building Highly Scalable and Available SaaS Applications with SQL Azure
This session discussed new features and functionalities with Azure SQL. Azure SQL is built for SaaS and enterprise applications. As a service, it has 99.99% service availability, with the ability to be geo-replicated between regions. As a database, it is fully compatible with SQL Server 2014 and built upon industry standards of security, protection, and compliance capabilities, with a goal of self-service management and ease of development.
A key characteristic and concern of SaaS applications is segregation of customer data. Each customer of the application typically wants their data separate from other customers’. You don’t want someone accidentally breaking out of one customer’s area and ultimately gaining access to someone else’s data.
One strategy for multi-customer databases is to share a database, with all queries filtering by a particular customer identifier. This is a pattern that works for some. But why not separate each customer into their own database? That approach poses several issues around database management and aggregate system queries. However, it also has advantages; think of data recovery, where you’re only concerned with a single customer at a time. Some new Azure features allow you to manage these databases, and query them in unison, in a simplified way.
One of the first challenges when constructing a multi-database solution is determining which customer belongs to which database. How do you route the correct customer to the correct database (in code)? The Azure SQL team recommends using a “catalog” database that holds routing information for customers. The team created reference code and documentation on how to do this via the Elastic Database tooling, available as a NuGet package. These tools have 2 major components: multi-database management and data-dependent routing. One potential problem is the “catalog” database becoming a bottleneck, as every request needs to talk to it; the client libraries provided in this NuGet package explicitly use caching to prevent that.
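Conceptually, data-dependent routing with a cached catalog looks like this. This is a sketch of the idea, not the Elastic Database library’s actual API:

```python
class ShardRouter:
    """Map a customer to their database via a catalog, caching lookups
    locally so the catalog database doesn't become a bottleneck."""

    def __init__(self, catalog_lookup):
        self.catalog_lookup = catalog_lookup  # hits the catalog database
        self.cache = {}
        self.catalog_calls = 0

    def database_for(self, customer_id):
        if customer_id not in self.cache:
            self.catalog_calls += 1
            self.cache[customer_id] = self.catalog_lookup(customer_id)
        return self.cache[customer_id]
```

Every request resolves its target database here, but only the first request per customer actually touches the catalog.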
The next challenge for SaaS applications is unpredictable performance needs, based upon unexpected load. Some customers may have busy times on weekends, others on weekdays. They introduced a new concept for Azure SQL: the DTU. It’s like horsepower for a database, a relative measure of performance on 4 dimensions (memory, read capacity, write capacity, and compute). These are relative measures to help us understand the different levels of service. A DTU measures the largest of those 4 metrics at any time, so a database using 10 DTUs of memory and 5 on each of the other metrics is considered a 10 DTU database. With this model, the Azure SQL team can help you determine the right size of database you may need.
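In other words, as described above, the DTU rating is the maximum across the four dimensions, not their sum:

```python
def dtu(memory, reads, writes, compute):
    """DTU rating per the session's description: the largest of the four
    dimensions at any time. Units are arbitrary relative measures."""
    return max(memory, reads, writes, compute)
```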
You can scale databases to different levels, and the databases stay online during scaling periods; however, it may take some time to occur, so you should plan for your particular capacity needs.
Elastic Database Model
The concept is creating a pool of resources that all of your databases can leverage. So, instead of over-provisioning resources for every customer, you can group databases into a pool and have them share resources. Databases can be moved in and out of pools while remaining online. Monitoring and alerting are available on the database pools for common events (over capacity, sudden bursts, etc.). Databases within a pool can individually burst to a certain level, but they’re restricted from consuming the entire DTU capacity available within the pool.
What does this mean? Multiple pools, multiple databases, each with bursting characteristics. More efficient DTU usage and no need to manually swap database tiers to respond to demand. Overall, this is more economical for businesses.
When you add databases to pools, an advisor helps you select similar databases (by usage, DTUs, burst profiles, etc.), and a recommended pool size is selected for you. When creating the pool, you can select the max burst DTUs and minimum allocated DTUs for each database. All of the recommendations are based upon the historical performance of your databases, which is a nice feature to help guide you in pooling databases.
Querying Across Database Sets
The next obvious challenge is querying across these databases in a data tier. How complicated is it to manage the connection strings and do the manual aggregation? To address this challenge, elastic database query was created. You register the underlying databases as part of a set, and elastic database query lets you run a single query that is spread across your databases simultaneously.
Elastic Database Query leverages information within a catalog database to execute the queries. It’s simple to set up, with 2 new TSQL DDL statements. The first statement creates an external data source, which points to a centralized catalog database that provides metadata. The second creates an “external table”, which is not a real table but a virtual representation of all data “unioned” from the contributing databases; its creation statement simply defines the schema. Bottom line: you connect to a single database and run normal TSQL, and behind the scenes Azure manages the brokering of these queries across all databases.
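Conceptually, the brokering amounts to fanning the same query out to every member database and unioning the results. This is a sketch of the idea only, not the actual implementation; `run_query` is a hypothetical per-database executor:

```python
def elastic_query(member_databases, run_query):
    """Run the same query against every member database and union the
    row sets -- what the 'external table' presents as a single table."""
    results = []
    for db in member_databases:  # the real service brokers this for you
        results.extend(run_query(db))
    return results
```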
For management of these pooled databases, you can run elastic database jobs to also perform database maintenance. You can give Azure a SQL script to run against all databases in your pool.
With these new enhancements, the Azure team believes the Azure SQL platform can support global database-as-a-service workloads, predictable and unpredictable data load needs, and provide simple self-service management capabilities. This was an amazing session.
Thinking in Containers: Building a Scalable, Next-Gen Application with Docker on Azure
Let’s start from here: DevOps has been a whale of a problem. Picture Moby Dick: it’s like developers and operations are in a boat that Moby Dick has destroyed. The reality is: VMs go down, networks go down, builds break. It’s a developer’s life. You need to accept this, and we should be building robust systems.
Docker is the newest “whale” on the block, exploding over the past two years as an enabler of micro-service architecture. Docker has been VERY successful, and Microsoft has been very publicly supportive of Docker.
Docker is about thinking in smaller parts. The old way of thinking is that an application is one large box. The problem is that when your big VM application goes down, there’s a lot of hurt. In the past, the decision was to scale UP or scale OUT, and the popular opinion (at this time) is to scale out. The scale-out model takes what you have right now and duplicates it multiple times. Looking further into your application, though, you may have multiple functional units (i.e., web front end, API layer, etc.). The real question is: what if you only need to scale out a portion of your application? Scaling out by VM isn’t necessarily the way to go.
When thinking about Docker, you need to start thinking in images. You may have a web front-end image and an API image. Once you have an image, you can spin up multiple containers from that image. Finally, you deploy the containers to a VM.
This session had some promise, but I really felt I missed the “journey” of why I may containerize my database layer and have several different databases that are copies. This is a fast-moving paradigm, so I’m going to keep my eye on Docker.
Managing Cloud Environment and Application Lifecycle Using Azure Tools and Visual Studio Online
A problem currently exists: in Azure, all of your resources are individuals. A VM, a virtual network, a SQL Azure database. The problem is you don’t know which resources are associated with each other. Enter Azure Resource Manager (ARM).
ARM is a lifecycle container, providing declarative configuration (via code that can be checked in). Support is limited right now, but in Preview there is support for VMs, Networks, and Storage.
ARM uses resource groups: tightly coupled containers of multiple resources of similar or different types. A resource group allows you to provision all the resources in the group via a template, and enforces a “linked” feeling.
How this Integrates with Visual Studio
Within Visual Studio, you can create a Resource Group Project that defines a template for deploying a resource group to Azure. You can store the PowerShell scripts used to provision VMs, Web Sites, etc. to Azure. This can then be customized and stored in source control alongside your code.
Azure Dev/Test Labs
This is a new feature (and a hidden gem of the conference) which allows an organization to place quotas, policies, and parameters around an area of Azure, limiting what your team can do within it. Once a lab has been established, those parameters dictate the behaviors that are allowed.
The current thinking around labs is a bucket of available resources, with active VMs and available (pre-created) VMs. On a nightly basis, several VMs are automatically provisioned with new software versions. You have the ability to “claim” an available VM and keep it for yourself. You can also manually create additional VMs.
For management of the lab environment, you can configure VM quotas (# per user, $ target by month), scheduled shutdowns of all VMs (and idle shutdown), available artifacts (artifacts are pre-installed components, i.e., VS, Office, SQL Server, etc.), allowable VM sizes, source control providers for saving the VM template definitions, roles and users, permissions, and many more settings.
This new functionality is going into private preview soon, with the purpose of self-service while adhering to policies for an organization. This also allows templates of process – create once, use everywhere, by everyone.
This new feature is also fully integrated into the new TFS Release Management changes as an action, listed as the “Azure Dev/Test Lab Provisioning” action. Very exciting.