Cost Management with Cloud Services
Which cost-saving mechanisms take effect in the public cloud?
Successful IT Controlling When Using Public Cloud Services
Many companies are already in the midst of a transformation toward the public cloud because they see major advantages over traditional IT operations. However, a cloud transformation does not lead to advantages automatically: companies face new challenges in the areas of business, people, and technology.
This is particularly true for the management of IT costs: outdated approval processes designed for on-premises data centers or outsourcing contracts can conflict with the speed advantages of the cloud. The deployment of new technologies makes new processes essential. As a result, CIOs and IT organizations need to develop new approaches and processes to manage and optimize their cloud costs.
Cost Management Basics for Public Cloud Services
Source: Own design based on Microsoft's Cloud Adoption Framework.
The IT governance concepts known today often only cover the areas of security and compliance. In the cloud, the term "governance" is taken a step further and, in many cases, also relates to the management of cloud costs.
Tagging
As part of cloud governance, a tagging strategy is set up that allows costs to be allocated. The defined tags assign costs to cost centers, customers, application owners, or departments. The tags and costs can then be evaluated and displayed via APIs in tools such as Power BI or SAP. In addition, the assignment can be used to automate the reallocation of the public cloud provider's consolidated invoice to the originating cost centers.
In this way, the effects of changes in cloud costs can be identified quickly and countermeasures can be taken. This works all the better because tags are not primarily technical components: business people can analyze them in an Excel table. (Further information on tagging.)
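The allocation described above can be sketched in a few lines of Python. This is a minimal illustration, not a provider API: the line-item structure, the tag key `cost_center`, and the sample resource names are assumptions made up for the example.

```python
from collections import defaultdict


def allocate_costs_by_tag(line_items, tag_key="cost_center", untagged="UNTAGGED"):
    """Sum the cost of each invoice line item under the value of one tag.

    Items without the tag are collected under `untagged` so that
    gaps in the tagging strategy become visible.
    """
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, untagged)
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical invoice export (structure assumed for illustration).
invoice = [
    {"resource": "vm-web-01", "cost": 120.0, "tags": {"cost_center": "CC-100"}},
    {"resource": "sql-shop", "cost": 300.0, "tags": {"cost_center": "CC-200"}},
    {"resource": "vm-web-02", "cost": 80.0, "tags": {"cost_center": "CC-100"}},
    {"resource": "disk-orphan", "cost": 15.0, "tags": {}},
]

allocation = allocate_costs_by_tag(invoice)
print(allocation)
```

Reporting the untagged remainder separately is a deliberate choice here: it shows how much of the invoice cannot yet be reposted to a cost center.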
Billing and budget management
In addition to the actual billing process, the "Costs" governance area also contains the management of budgets. While tags and the subscription structure support the billing process, budget management can make a significant contribution to ensuring that budget limits are adhered to: policies can establish mechanisms that, for example, issue warnings when budget limits are reached, or even shut down or delete all public cloud services. (The latter is not recommended for production environments.)
On the one hand, authorization management can be facilitated via a subscription structure. At the same time, in the case of Microsoft Azure, costs are aggregated and allocated at the subscription level. This makes it possible, for example, to establish an allocation and control of costs at the subscription level. (Further information on billing and budget management.)
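The budget mechanism described above (warn near the limit, act when it is reached) could look roughly like the following sketch. The function name, the 80% warning threshold, and the action labels are assumptions for illustration; real budget policies are configured in the provider's cost management tooling.

```python
def budget_action(spend, budget, warn_at=0.8, production=True):
    """Return the action a simple budget policy would trigger.

    spend and budget are in the same currency; warn_at is the share of
    the budget at which a warning is issued (assumed default: 80%).
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    if ratio >= 1.0:
        # Shutting down is not recommended for production, so only alert there.
        return "alert" if production else "shutdown"
    if ratio >= warn_at:
        return "warn"
    return "ok"


print(budget_action(850, 1000))                     # inside the warning band
print(budget_action(1000, 1000, production=False))  # sandbox at its limit
```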
Once cloud governance is structurally aligned to the company's needs in this way, the obvious next question is which cost models the public cloud providers offer.
Cost Models in the Public Cloud
The top three cloud providers list tens of thousands of virtual machine prices: instances can vary significantly in price depending on the region in which they run. Older versions of instance families are often more expensive than the current version. Even instances with similar CPU and memory parameters can vary widely in price based on their add-on features. Things get just as complicated with storage, with its many different tiers and classes. Selecting storage classes that exceed what is actually required can lead to significantly higher costs.
This makes cloud costs, which at first glance seem straightforward - an hourly fee for a cloud instance or a monthly fee per GB of storage, i.e. quantity times price - much more confusing.
But what mechanisms can be used to optimize public cloud costs?
The main factors for cost optimization lie in the application and its operation. If an application is operated in the public cloud in the traditional way, for example with virtual servers and manual patch management, the only options are often reserved instances or regular rightsizing. However, if immutable IaaS, PaaS services, containers, or cloud-native services are used to operate the application, the range of mechanisms is significantly greater. The following diagram is intended to illustrate this.
Source: Own design based on Microsoft Virtual Machines and Azure Functions.
Besides the sheer number of mechanisms, it is also necessary to consider how much influence each mechanism has, how mature the cloud operating model is, and which services are used. As a rule, the greater the influence of a mechanism, the higher the maturity of public cloud use must be, since only then can the most effective mechanisms be implemented to a relevant extent. The following diagram is intended to illustrate an approximation of influence and maturity per mechanism.
Rightsizing
At the beginning of a project, it is often easiest to perform an initial rightsizing. This is to ensure that the selected public cloud instances are suitable for the requirements of the application. Experience shows that both CPU and necessary RAM can be reduced by up to 60% during initial rightsizing.
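A simple way to think about initial rightsizing is to pick the cheapest instance size that still covers the measured peak demand plus some headroom. The sketch below assumes a made-up size catalog and a 20% headroom; neither comes from a real price list.

```python
def rightsize(sizes, peak_vcpus, peak_ram_gb, headroom=0.2):
    """Return the cheapest size covering peak demand plus headroom, or None."""
    need_cpu = peak_vcpus * (1 + headroom)
    need_ram = peak_ram_gb * (1 + headroom)
    fitting = [
        s for s in sizes
        if s["vcpus"] >= need_cpu and s["ram_gb"] >= need_ram
    ]
    return min(fitting, key=lambda s: s["hourly_eur"], default=None)


# Hypothetical catalog; names and prices are illustrative only.
catalog = [
    {"name": "small", "vcpus": 2, "ram_gb": 8, "hourly_eur": 0.10},
    {"name": "medium", "vcpus": 4, "ram_gb": 16, "hourly_eur": 0.20},
    {"name": "large", "vcpus": 8, "ram_gb": 32, "hourly_eur": 0.40},
]

choice = rightsize(catalog, peak_vcpus=2.5, peak_ram_gb=10)
print(choice["name"] if choice else "no size fits")
```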
At the same time, the storage classes can be evaluated: There are not only price differences between different storage classes such as managed disks, file storage, Azure Data Lake storage, and block blobs, but also within these storage classes. For example, the first 50 TB/month of Azure File Storage costs €0.16445 per GB for Premium Storage. For Archive Storage, however, only €0.00152 per GB is charged. Significant cost savings can thus be achieved by choosing the right storage. (More information about rightsizing.)
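To make the price gap concrete, the following sketch multiplies a data volume by the two per-GB prices quoted above. It is a rough comparison only: it assumes a flat price per GB and ignores transaction and retrieval charges as well as the slower access that archive tiers typically imply. The 10,000 GB volume is an arbitrary example.

```python
# Per-GB monthly prices in EUR as quoted in the text above.
PRICE_PER_GB_EUR = {"premium": 0.16445, "archive": 0.00152}


def monthly_storage_cost(gb, storage_class):
    """Quantity times price for one storage class."""
    return gb * PRICE_PER_GB_EUR[storage_class]


volume_gb = 10_000  # arbitrary example volume
premium = monthly_storage_cost(volume_gb, "premium")
archive = monthly_storage_cost(volume_gb, "archive")

print(f"premium: {premium:.2f} EUR/month")
print(f"archive: {archive:.2f} EUR/month")
print(f"factor:  {premium / archive:.0f}x")
```

With these prices the premium class is roughly a hundred times more expensive per GB, which is why data that is rarely read is a natural candidate for a cheaper class.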
Automation
In the next step, companies can save costs through initial automation. This essentially involves automating processes to save process costs.
A first step could be to automate the creation of Azure subscriptions or the provisioning of entire sandbox environments so that test scenarios and ideas can be tried out more quickly. This also includes, for example, automatically deleting unused resources. The simplest way to save costs in the public cloud is to remove unused resources. Experience shows that unused resources often continue to be operated because it is not known whether they are still needed.
In the public cloud, such resources can be easily removed - because if there is a need for them again, they are simply provisioned again automatically. Removing unused instances is also important for security, as unused resources can create vulnerabilities.
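A cleanup job of this kind needs a rule for what counts as "unused". The sketch below assumes each resource record carries a `last_used` timestamp and an optional `keep` flag; both fields and the 30-day threshold are hypothetical and would have to be derived from monitoring data and tags in practice.

```python
from datetime import datetime, timedelta


def find_unused(resources, now, max_idle_days=30):
    """Return names of resources idle longer than max_idle_days.

    Resources explicitly marked with keep=True are never proposed
    for deletion.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return [
        r["name"]
        for r in resources
        if r["last_used"] < cutoff and not r.get("keep", False)
    ]


now = datetime(2024, 6, 1)
inventory = [
    {"name": "vm-test-01", "last_used": datetime(2024, 3, 1)},
    {"name": "vm-web-01", "last_used": datetime(2024, 5, 30)},
    {"name": "disk-archive", "last_used": datetime(2024, 1, 15), "keep": True},
]

print(find_unused(inventory, now))
```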
Reserved Instances
The longer the public cloud is used for different workloads, the more experience can be gained about load peaks or the behavior of the application. Only through longer use and the collection of information on the utilization of individual workloads can initial decisions on reservations be made. Reserved instances offer a significant discount compared to on-demand resource pricing. Rightsizing should always be performed before making a reservation.
In addition to experience with workload utilization, other aspects are also relevant when reserving VMs, for example:
- First of all, consider how long an application will continue to use certain services. For example, it may make sense not to reserve VMs if the application is to be migrated to higher-level services such as containers or serverless in the near future.
- Reservation should also be avoided if the application is to be shut down completely in the near future.
The impact of a reservation on costs can be very high - in some cases, cost savings of up to 80% are possible if reservations are carried out correctly. This value is impressive at first, but it is important to note that canceling or downsizing a reservation can result in cancellation fees. So to properly utilize reservations, it is important to know what is actually needed. (More information about reserved instances.)
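Whether a reservation pays off depends on how many hours the instance actually runs. The sketch below assumes a simplified model: the reservation is paid for every hour of the term at a discounted rate, while on-demand usage is paid only for the hours used. Under that assumption the break-even utilization is simply 1 minus the discount. The 40% discount and the hourly price are example values, not provider quotes.

```python
def reservation_saving(on_demand_hourly, discount, utilization, hours=8760):
    """Saving of a reservation versus on-demand over `hours` (default: one year).

    utilization is the share of hours the instance would actually run.
    A negative result means the reservation costs more than on-demand.
    """
    on_demand_cost = on_demand_hourly * hours * utilization
    reserved_cost = on_demand_hourly * (1 - discount) * hours
    return on_demand_cost - reserved_cost


def break_even_utilization(discount):
    """Utilization above which the reservation is cheaper in this model."""
    return 1 - discount


print(reservation_saving(1.0, 0.4, 1.0))  # always-on workload
print(reservation_saving(1.0, 0.4, 0.3))  # workload running 30% of the time
print(break_even_utilization(0.4))
```

This is also why rightsizing and utilization data come first: a reservation for an oversized or rarely used instance can cost more than paying on demand.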
Deletion and shutdown of services
If the utilization of applications and workloads is known over a longer period of time, planned shutdown and deletion of services can also be considered.
It is important to consider in detail when customers or users actually use the applications. An online store in the B2C environment, for example, should be available on Sundays, while a B2B platform may be shut down or deleted for entire weekends, nights, and even holidays.
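The effect of such a schedule on compute hours is simple arithmetic. The sketch below assumes a B2B platform that runs only on weekdays for twelve hours a day; the schedule is an example, and costs that continue while a VM is stopped (for instance storage) are ignored.

```python
def weekly_running_fraction(days_per_week, hours_per_day):
    """Share of the 168 hours in a week during which the service runs."""
    return (days_per_week * hours_per_day) / (7 * 24)


running = weekly_running_fraction(days_per_week=5, hours_per_day=12)
print(f"running: {running:.1%} of the week")
print(f"compute hours avoided: {1 - running:.1%}")
```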
With a deeper understanding of the public cloud mechanisms and a high level of adoption of the services, more advanced cost management mechanisms become possible. One of them is the use of discounted instances. These usually do not have a defined availability, as is the case with other VMs. Therefore, they are not suitable for mission-critical workloads and are not intended to run for more than 730 hours per month. However, for occasional use, they can lead to significant cost reductions.
Discounted Instances
In Azure, B-Series is among these discounted instances. The background to B-Series is that for many workloads running in Azure - such as web servers, small databases, and development and test environments - CPU load is very erratic. These workloads run for a long time at a small fraction of the possible CPU power and then require the full power of the CPU at peak times due to incoming traffic or required work. The B-Series provides a cost-effective option for these workloads. While a B-Series VM runs in the low phases and does not fully utilize its baseline CPU performance, the VM instance builds up credits. When the VM has accumulated enough credits, usage can "burst" up to 100% of the vCPU for the period of time that the application requires the higher CPU performance. Proper implementation requires great expertise in the application and in the mechanism described above. (More information about discounted instances.)
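The credit mechanism can be illustrated with a toy simulation. This is a deliberately simplified model and not Azure's actual credit formula: the baseline of 20%, the credit cap, and the per-interval accounting are assumptions chosen only to show the principle of banking credits below the baseline and spending them during a burst.

```python
def simulate_credits(usage, baseline=0.2, start=0.0, cap=10.0):
    """Simulate a burstable VM over a series of intervals.

    usage: requested CPU utilization per interval, each in [0, 1].
    Returns a list of (credits_after_interval, cpu_actually_served).
    """
    credits = start
    history = []
    for demand in usage:
        if demand <= baseline:
            # Below baseline: the unused share is banked, up to the cap.
            credits = min(cap, credits + (baseline - demand))
            served = demand
        else:
            # Above baseline: burst only as far as banked credits allow.
            burst = min(demand - baseline, credits)
            credits -= burst
            served = baseline + burst
        history.append((round(credits, 6), round(served, 6)))
    return history


# Three quiet intervals followed by a traffic peak.
for step in simulate_credits([0.1, 0.1, 0.1, 0.5]):
    print(step)
```

In this model a VM that has never been idle has no credits and is held at its baseline during a peak, which is why knowledge of the application's load profile matters.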
Spot Instances
Spot instances in Azure are similar to the B-Series. Azure Spot Virtual Machines (Spot VMs) run on unused Azure compute capacity and can provide discounts of up to 90% compared to pay-as-you-go pricing. A maximum price can optionally be set beforehand, and no more than this price is paid. Spot VMs are ideal for workloads that can be interrupted and offer scalability at a lower cost. However, unlike B-Series, Spot instances are not continuously and unrestrictedly available. In addition, Spot instances do not accumulate credits.
Spot instances are based on the fact that the hyperscaler has immense excess capacity, which it offers to customers very cheaply, but only for a limited time. Workloads are therefore removed when Azure has less excess capacity. Workloads are also removed if the current price exceeds the maximum price that was set before the VMs were allocated. However, if scheduled events have been subscribed to, a notification is generated 30 seconds before the workload is removed.
This instance type is therefore particularly suitable for the following types of workloads: high-performance computing scenarios, batch processing jobs, visual rendering applications, dev/test environments, Big Data applications, analytics applications, container-based applications, and large, stateless applications.
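The two removal conditions described above (capacity is needed elsewhere, or the current price exceeds the optional maximum price) can be written down as a small decision function. This is a sketch of the logic as described in the text, not a call into any Azure API; the function and parameter names are made up for the example.

```python
def should_evict(capacity_available, current_price, max_price=None):
    """Decide whether a spot workload would be removed.

    capacity_available: whether the provider still has excess capacity.
    max_price: optional price ceiling set by the customer; None means
    no ceiling was set, so only capacity can trigger a removal.
    """
    if not capacity_available:
        return True
    if max_price is not None and current_price > max_price:
        return True
    return False


print(should_evict(True, current_price=0.03, max_price=0.05))   # keeps running
print(should_evict(True, current_price=0.06, max_price=0.05))   # price exceeded
print(should_evict(False, current_price=0.03, max_price=0.05))  # capacity reclaimed
```

Because either condition can occur at short notice, workloads on spot capacity should checkpoint their progress or be stateless so that a removal costs little.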
Successful IT Controlling When Using Public Cloud Services
For many companies, cost management of the public cloud is a new discipline that must first be learned. The close integration of controlling with IT is becoming increasingly relevant. Only through close cooperation and interlocking of both the commercial processes and the necessary technical architectures can the large number of mechanisms for influencing public cloud consumption be used to a significant extent for the company.