In this post, I will dig into each common security concern, look at its implications, and suggest a set of practices and techniques to tackle the associated risks. I argue that with proper use of existing security technology, running applications in the public cloud can often be more secure than running them within your own data center.
Legal and Compliance
Country laws, auditing practices, and compliance requirements typically evolve slowly compared to technology. For example, if the law requires you to store your data within a certain geographic location, there is not much that technology can do to help. You need to identify that data, as well as the applications that manipulate it, because you have to run them in a data center residing in that particular location.
Other than acknowledging that legal practices need to be modified before enterprises can legally run their applications in the cloud, I will not go further into this area, because there is not much we can do at the technical level.
Trust the Cloud Provider
First of all, you have to establish some level of trust in the cloud provider. At the very least, you need to believe that your cloud provider has no incentive to steal your information. You also need to have confidence that they have sufficient security expertise to handle different kinds of attacks. Typically, extensive due diligence is necessary to get a deep understanding of the cloud provider's operational procedures, the underlying technologies being used, and their expertise and experience in handling various kinds of security attacks.
Here are some typical questions to ask during that due diligence:
- What level of physical security does their data center run on?
- How much power do their system administrators have? Can they see all your sensitive information or modify your environment? For example, can a system admin modify a machine image after it is created?
- Does the cloud provider subcontract certain functionalities to 3rd-party providers? If so, you may need to expand the evaluation to those 3rd parties.
- Do they face bigger consequences (legal, financial, or reputational) in case a security compromise happens?
- Do they have sufficient security expertise and tooling for detecting and isolating attacks? A good probe is to understand how they tackle DDoS attacks.
- What guarantees do they provide in the SLA, and are they willing to share your loss in case a security breach happens? (Almost no cloud provider today provides risk sharing in its SLA.)
- If the cloud provider goes out of business, is there a process to get your data back? How is your existing data disposed of?
- In case any government agencies or legal entities need to examine the cloud provider, how will your sensitive data be protected?
- How do they control their personnel's access to the data center environment?
- In case of a security attack that has caused damage, how and when will you be notified?
- How do they isolate different tenants running in their shared physical environment?
- What tools do they use to monitor their operating environment? How do they diagnose a security attack incident?
- Do all sensitive operations leave an audit trail?
On the other hand, bear in mind that cloud providers, hosting many tenants under one roof, are also more likely to be the target of attacks.
Trust the Underlying Technologies
Since your application will be running in a shared environment, you depend heavily on the virtualization technology (e.g. the hypervisor) to provide the fundamental isolation from other tenants. Any defect or bug in the virtualization technology can leak your sensitive information to other tenants.
Talk to your cloud provider to get a deep understanding of their selected technology and make sure it is proven and robust.
Computation / Execution environment
Most likely your cloud provider runs a hypervisor on top of physical machines, so the isolation mechanism is especially important. You need to be sure that when your virtual machine instance (guest VM) is started, it is technically impossible even for the cloud provider's administrators to gain access to any information residing in memory or on local disk, and that after you shut down your VM, all data in memory, on the swap device, and on local disk is completely wiped and unrecoverable.
Data Storage / Persistence
Make sure your data is stored in a highly reliable environment, such that the chance of data loss or corruption is practically zero even in the event of a disaster. To achieve high data resilience, cloud providers typically create additional copies of your data and place them in geographically distributed locations, with an auto-sync mechanism to keep the copies up to date. Questions to ask include:
- How many copies of your data are made?
- Are these copies placed in independent availability zones?
- How do they keep these copies in sync? Is there any window during which the data is inconsistent?
- If some copies are lost, what is the mechanism to figure out which is the valid copy? How is the recovery done, and what is the worst possible damage?
- Is my data recoverable after deletion? If so, what are their data retention policies?
Network communication
Cloud providers in general do a pretty good job of isolating network communication between guest VMs. For example, Amazon EC2 allows the setup of security groups, which are basically software firewalls that control who can talk to whom. Nevertheless, a "security group" is not equivalent to a LAN segment. Guest VMs (even within the same security group) can only see network traffic directed at themselves, not communication among other VMs. By disallowing access to the physical network, it becomes much harder for other tenants to sniff your network traffic. On the other hand, running any low-level network intrusion detection software becomes unnecessary or even impossible. Also note that multicast communication is typically not supported in the cloud environment; your application needs some changes if it relies on a multicast-enabled network.
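As a concrete illustration, here is a minimal sketch of creating a security group and opening a single port with boto3 (the AWS SDK for Python). The group name, VPC ID, and CIDR range are hypothetical placeholders, not values from this post.

```python
# Minimal sketch: create a security group that only allows HTTPS from a known
# CIDR range, using boto3. Group name, VPC ID, and CIDR block are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

group = ec2.create_security_group(
    GroupName="app-tier",                      # hypothetical group name
    Description="App tier: HTTPS from office network only",
    VpcId="vpc-0123456789abcdef0",             # hypothetical VPC
)

ec2.authorize_security_group_ingress(
    GroupId=group["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "203.0.113.0/24"}],   # example office range
    }],
)
```

Note that inbound traffic is denied unless an ingress rule like the one above explicitly allows it, which matches the default-closed philosophy discussed later in this post.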
Nevertheless, the cloud provider's network admins can access the physical network and see everything flowing across it. For highly sensitive data, you should encrypt the communication.
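For example, in Python you can wrap a plain TCP connection in TLS with the standard library's ssl module. This is only a sketch; the host name below is hypothetical.

```python
# Minimal sketch: encrypt an application-level connection with TLS using
# Python's standard library. The host name is a placeholder.
import socket
import ssl

context = ssl.create_default_context()   # verifies the server certificate chain

with socket.create_connection(("internal-service.example.com", 443)) as raw_sock:
    with context.wrap_socket(raw_sock,
                             server_hostname="internal-service.example.com") as tls_sock:
        tls_sock.sendall(b"sensitive payload")
        reply = tls_sock.recv(4096)
```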
Best Practices
There are some common practices that you can use to improve security in the cloud.
Encrypt all sensitive information
If your application is running in an environment that you have no control over, cryptography is your best friend. In general, you should encrypt your data whenever there is a possibility of it being exposed to an environment accessible by someone you don't trust.
- Network Communication: For all communication channels that carry sensitive information, encrypt the network traffic.
- Data Storage: For all sensitive data that you plan to store in the cloud, encrypt it first (see the sketch after this list).
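Here is a minimal sketch of the second point: encrypting data client-side before handing it to any cloud storage service. It uses the third-party cryptography package's Fernet recipe (AES-based authenticated encryption); the upload call is left as a hypothetical placeholder because it depends on your provider.

```python
# Minimal sketch: encrypt sensitive data before it ever leaves your environment.
# Uses the `cryptography` package (pip install cryptography). The upload step is
# a placeholder -- swap in your cloud storage SDK of choice.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # keep this key OUT of the cloud that stores the data
cipher = Fernet(key)

plaintext = b"customer SSN: 000-00-0000"
ciphertext = cipher.encrypt(plaintext)

# upload_to_cloud_storage("records/123", ciphertext)   # hypothetical upload call

# Later, after downloading the ciphertext back:
assert cipher.decrypt(ciphertext) == plaintext
```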
The effectiveness of any cryptographic technique depends largely on how well you secure your private key. Make sure you put your private key in a separate, safe environment (e.g. in a different cloud provider's environment). You can also encrypt your private key with another secret key, or split the secret key into multiple pieces and put each piece in a different place. The basic idea is to increase the number of independent systems that a hacker needs to compromise before gaining access to your sensitive information.
On the other hand, you also need to consider how to recover your data if you lose your private key (e.g. the system where your private key is stored has been irrecoverably damaged). One way is to store additional copies of the key, but this increases the chance of it being stolen. Another way is to apply an erasure-coding-style technique to break your private key into n pieces such that the key can be recovered from any m pieces, where m is less than n.
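To illustrate the m-of-n idea, here is a minimal sketch using Shamir's secret sharing, a threshold scheme that gives exactly the "any m of n pieces recover the key" property. The post mentions erasure coding; Shamir's construction is the variant usually used for secrets because fewer than m pieces reveal nothing about the key. The parameters and demo values are illustrative only.

```python
# Minimal sketch of m-of-n key splitting (Shamir's secret sharing).
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this

def split_secret(secret: int, n: int, m: int):
    """Split `secret` into n shares; any m of them can reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):          # Horner's rule: evaluate the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Reconstruct the secret via Lagrange interpolation at x = 0 (Python 3.8+)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# Example: split a key into 5 pieces, any 3 of which recover it.
key = secrets.randbelow(PRIME)
pieces = split_secret(key, n=5, m=3)
assert recover_secret(pieces[:3]) == key
assert recover_secret([pieces[0], pieces[2], pieces[4]]) == key
```

Each of the five pieces would then be stored in an independent location, so losing any two of them still leaves the key recoverable.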
Build your image carefully
If the underlying OS is malicious, then all the cryptographic techniques are useless. Therefore, it is extremely important that your application runs on top of a trustworthy OS image. If you are building your own image, make sure you start from a clean and trustworthy OS source.
The base OS may come with a configuration containing software components that you don't need. You should harden the OS by removing this unused software, as well as the services, ports and user accounts that you don't need. It is good practice to take a default-closed policy: every service is disabled by default, and an explicit configuration change is needed to turn it on.
Afterwards, you need to make sure each piece of software that you install on the clean image is also trustworthy. The effective trustworthiness of your final OS image is its weakest link: the minimum trustworthiness of all the installed software and the base OS.
Never put any sensitive information (e.g. your private key) into your customized image. Even if you intend to use your image only in a private environment, you may share it with the public in the future and forget to remove the sensitive information. If a password or secret key is needed to start your VM, such sensitive information should be pushed in (via startup user parameters or a data upload) by the initiator when it starts the VM, as in the sketch below.
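On EC2, for example, the "push-in" can be done through the instance user data. The sketch below passes a secret at launch time with boto3; the AMI ID and the secret are placeholders, and in practice you would fetch the secret from your own key store rather than hard-code it.

```python
# Minimal sketch: keep secrets out of the image and push them in at launch time
# via EC2 user data (boto3). AMI ID and secret are placeholders.
import boto3

SECRET = "s3cr3t-passed-at-boot"     # in practice, fetch this from your key store

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # your hardened, secret-free image
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    UserData=f"APP_DB_PASSWORD={SECRET}\n",   # delivered to the instance at boot
)

# Inside the guest, the startup script can read it back from the metadata service:
#   curl http://169.254.169.254/latest/user-data
```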
Over the life cycle of your customized image, you need to monitor the relevant security patches and apply them in a timely manner.
Anomaly detection
Continuous health monitoring is important for detecting security attacks or abnormal workloads on your application. One effective way is to take a baseline of your application by observing its traffic patterns over a period of time. Novelty detection techniques can then be used to detect a sudden change in traffic pattern, which usually indicates a DDoS attack.
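As a deliberately simple illustration of novelty detection, the sketch below baselines request counts per minute and flags any interval that deviates from the historical mean by more than a few standard deviations. The numbers are made up, and a real deployment would use richer features and a proper detector.

```python
# Minimal sketch: flag sudden traffic spikes against a learned baseline.
# `baseline` would come from your monitoring system; the values are illustrative.
from statistics import mean, stdev

def is_anomalous(history, current, threshold=4.0):
    """Return True if `current` deviates from the baseline by > threshold sigmas."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(current - mu) > threshold * sigma

baseline = [980, 1010, 995, 1020, 1005, 990, 1015]   # normal requests/minute
print(is_anomalous(baseline, 1030))    # False -- within normal variation
print(is_anomalous(baseline, 25000))   # True  -- looks like a flood / DDoS
```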
Keep sensitive data within your data center
The bottom line is: you don't have to put your sensitive data into the public cloud. For any sensitive information you have concerns about, just keep it and the associated operations within your data center, and carefully select a set of sensitive operations to be exposed as a service. This way, instead of worrying about data storage protection, you concentrate the protection at the service interface level and apply proven authentication/authorization techniques.
For example, you can run your user database within your data center and expose an authentication service to applications running in the public cloud.
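A minimal sketch of that split: the user database and the password check stay in your data center behind a small authentication service, and the cloud-hosted application only ever calls its interface. Flask, the endpoint URL, and the field names are illustrative choices, not part of the original design.

```python
# Minimal sketch (runs inside YOUR data center): expose authentication as a
# service so the user database never leaves your premises.
from flask import Flask, request, jsonify

app = Flask(__name__)

# Stand-in for the on-premise user database; store only salted password hashes in practice.
USERS = {"alice": "correct horse battery staple"}

@app.route("/authenticate", methods=["POST"])
def authenticate():
    body = request.get_json()
    ok = USERS.get(body.get("username")) == body.get("password")
    return jsonify({"authenticated": ok})

# The application running in the public cloud then calls, e.g.:
#   requests.post("https://auth.internal.example.com/authenticate",
#                 json={"username": "alice", "password": "..."})

if __name__ == "__main__":
    app.run(port=8443)
```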
Back up cloud data periodically
To make sure your data is not lost even if the cloud provider goes out of business, you should install a periodic backup process. The backup should be in a format restorable in a different environment, and it should be transferred to and stored in a different location (e.g. with a different cloud provider).
A common approach is to utilize a DB replication mechanism in which data updates are propagated from the master to the slaves asynchronously. One of the slave replicas is taken offline regularly for backup purposes. The backup data is transferred to a separate environment, restored, and tested/validated. The whole process should be automated and executed periodically, based on how much data loss you can tolerate.
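As a rough sketch of such a pipeline, the script below dumps a PostgreSQL replica with pg_dump and hands the archive to a second provider's storage. The host, credentials, and the upload helper are placeholders, and in a real setup a scheduler would run this at the interval your tolerable data loss dictates, followed by a restore-and-validate step.

```python
# Minimal sketch: dump an offline replica and ship the archive to a different
# provider. Host, credentials, and `upload_to_secondary_provider` are placeholders.
import subprocess
from datetime import datetime, timezone

def upload_to_secondary_provider(path):
    raise NotImplementedError("wire this to your secondary provider's storage SDK")

def backup_once():
    archive = f"backup-{datetime.now(timezone.utc):%Y%m%dT%H%M%SZ}.dump"
    # Dump the replica in custom format, which pg_restore can load elsewhere.
    subprocess.run(
        ["pg_dump", "--host", "replica.internal", "--username", "backup_user",
         "--format", "custom", "--file", archive, "appdb"],
        check=True,
    )
    upload_to_secondary_provider(archive)
    # A restore + validation against a scratch database should follow.

if __name__ == "__main__":
    backup_once()
```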
Run on Multiple Clouds simultaneously
An even better approach is to prepare your application to run on multiple clouds simultaneously. This forces you to isolate your application from vendor-specific features up front, and all the vendor lock-in issues disappear.
From a security standpoint, since each cloud provider has its own set of physical resources, it is extremely unlikely that hackers or a DDoS attack can bring all of them down at the same time. Therefore, you gain additional reliability and resiliency when running your application across different cloud providers.
Since your application is already sitting in multiple cloud providers' environments, you no longer need to worry about migrating your application across providers if the primary provider goes out of business. On the other hand, storing encrypted data with one cloud provider and your key with another is a very good idea, since a hacker now needs to compromise two cloud providers' security measures (which is very difficult) before gaining access to your sensitive data.
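A minimal sketch of that separation, with the two providers' storage calls left abstract (the helper names are hypothetical): the ciphertext goes to provider A, the key goes to provider B, and an attacker needs both.

```python
# Minimal sketch: ciphertext lives with provider A, the key with provider B.
# `store_on_provider_a` / `store_on_provider_b` are hypothetical helpers you
# would implement with each provider's storage SDK.
from cryptography.fernet import Fernet

def store_on_provider_a(path, blob):
    raise NotImplementedError("e.g. the object store of provider A")

def store_on_provider_b(path, blob):
    raise NotImplementedError("e.g. the key store of provider B")

def store_split(record_id: str, plaintext: bytes):
    key = Fernet.generate_key()
    ciphertext = Fernet(key).encrypt(plaintext)
    store_on_provider_a(f"data/{record_id}", ciphertext)
    store_on_provider_b(f"keys/{record_id}", key)
```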
You can also implement your data backup strategy by asynchronously replicating data from one cloud provider to another and providing application services using the replicated data on the different cloud providers. In this model, no data restoration is necessary when one cloud provider becomes inaccessible.