AWS launched yet another storage mechanism today: Elastic File System. This adds to an ever-increasing list of storage solutions, so I wanted to write down my notes on when to use what.
Instance Storage on EC2 – Magnetic & SSD
Use for: When you need high IOPS without the cost, and you are willing to write your own backup mechanism
This is what you’d call the “hard drive” on your laptop. It will have the best I/O performance amongst all options. On the flip side, this storage is “ephemeral”: if the machine is stopped or terminated, or the underlying hardware fails, all data is lost.
AWS is moving away from instance storage. Or at least, it wants most of its customers to use EBS. For people who really want a lot of storage and IOPS, there are more expensive machines that are optimized to use instance storage.
There are two flavors of storage: magnetic and SSD. Magnetic is the regular spinning hard drive. It has terrible performance for random I/O, because the disk head has to physically move to each location, but sequential I/O is very fast. So, if you read a file from start to end, or if you keep appending to the end of the file, you’ll get great I/O performance.
SSDs (solid-state drives) are a newer technology with no moving parts. They are more expensive than magnetic storage, but have much better random-access I/O.
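To make the sequential versus random difference concrete, here is a rough back-of-the-envelope model. It treats read time as "one fixed access cost per jump, plus transfer time". Every number in it (10 ms per magnetic seek, 0.1 ms per SSD access, 100 MB/s throughput, 4 KB blocks) is an illustrative assumption, not a measured AWS figure.

```python
def read_time_seconds(num_blocks, block_bytes, jumps,
                      access_seconds, throughput_bytes_per_s=100e6):
    """Estimate read time as (jumps * per-access cost) + raw transfer time."""
    transfer = num_blocks * block_bytes / throughput_bytes_per_s
    return jumps * access_seconds + transfer


BLOCKS = 10_000
BLOCK_BYTES = 4096
HDD_SEEK = 0.010     # assumed: ~10 ms to move the disk head
SSD_ACCESS = 0.0001  # assumed: ~0.1 ms, no moving parts

# Sequential read: one initial seek, then the head just keeps reading.
hdd_sequential = read_time_seconds(BLOCKS, BLOCK_BYTES, 1, HDD_SEEK)
# Random read: every block needs its own seek.
hdd_random = read_time_seconds(BLOCKS, BLOCK_BYTES, BLOCKS, HDD_SEEK)
ssd_random = read_time_seconds(BLOCKS, BLOCK_BYTES, BLOCKS, SSD_ACCESS)

print(f"magnetic, sequential: {hdd_sequential:.2f} s")
print(f"magnetic, random:     {hdd_random:.2f} s")
print(f"SSD, random:          {ssd_random:.2f} s")
```

Under these assumptions the same 40 MB takes well under a second to read sequentially from a magnetic disk, but around 100 seconds when every block needs a seek. The SSD's random read stays in the low seconds.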
Elastic Block Store, or EBS
Use for: General purpose usage, the replacement for a hard drive.
Think of this as an “external hard drive” for extra storage.
This storage is attached to the server and mounted by the operating system, so to the OS it looks like a regular disk. Because it is elastic, you can add more capacity pretty easily. And because it is an “external hard drive”, if your server crashes, your data isn’t lost.
Just like an “external hard drive”, you can attach an EBS volume to only one server at a time. Two servers cannot write to the same EBS volume. However, you can detach it from one server and attach it to another server anytime.
Because an EBS volume sits across the network rather than inside the machine, its I/O performance is not as good as that of instance store. However, AWS has Provisioned IOPS, in which you can pay Amazon to get guaranteed performance.
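As a sketch of what "paying for guaranteed performance" looks like in practice, the snippet below builds the arguments for the EC2 `create_volume` call in boto3, asking for a Provisioned IOPS volume. The helper names, availability zone, size, and IOPS values are made up for illustration. AWS also limits how many IOPS you can request for a given volume size, so check the current limits before using real numbers.

```python
def provisioned_iops_volume_request(availability_zone, size_gib, iops):
    """Build keyword arguments for EC2 create_volume (hypothetical helper)."""
    if size_gib <= 0 or iops <= 0:
        raise ValueError("size_gib and iops must be positive")
    return {
        "AvailabilityZone": availability_zone,
        "Size": size_gib,       # volume size in GiB
        "VolumeType": "io1",    # a Provisioned IOPS SSD volume type
        "Iops": iops,           # the performance level you are paying for
    }


def create_volume(request):
    """Send the request to AWS. Needs boto3 installed and credentials configured."""
    import boto3  # third-party; imported here so the sketch loads without it
    ec2 = boto3.client("ec2")
    return ec2.create_volume(**request)


# Example values are assumptions, not recommendations.
request = provisioned_iops_volume_request("us-east-1a", size_gib=100, iops=1000)
print(request)
```

Calling `create_volume(request)` would actually create (and bill for) the volume, so the example stops at building the request.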
Elastic File System, or EFS
Use for:
- Migrating a legacy application that needs access to a shared drive across app servers.
- New applications – only use it as an immutable, read-only drive. For example, to load builds on all servers.
This is network-attached storage, or NAS. It is a shared hard drive, and more than one server can write to it.
NAS is very common in enterprise software. When migrating existing software to the cloud, Elastic File System becomes an obvious choice.
This was launched very recently, so there is not much information available yet.
S3
S3 is an object store and is accessed via an HTTP-based API. This means you don’t need a server to access S3; you can upload and download files from a browser or a client-side application.
To the operating system, instance store, EBS, and EFS all appear as a regular file system. But in the case of S3, the operating system does not have any knowledge of S3. It is just a remote HTTP server, and the application program is supposed to connect to it. You cannot mount S3 as a local file system (well, you can write wrappers, or use some existing providers, but they are hacks, and best avoided). Put another way, you can host a database on instance store/EBS/EFS, but you cannot instruct a database to use S3 for storage.
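Since S3 is just a remote HTTP server, fetching an object is an ordinary HTTP GET. The sketch below builds a virtual-hosted-style object URL and downloads it with the Python standard library. The bucket and key are hypothetical, and a plain GET like this only works for publicly readable objects; private objects need signed requests, normally via an AWS SDK.

```python
from urllib.parse import quote
from urllib.request import urlopen


def s3_object_url(bucket, key):
    """Return the virtual-hosted-style HTTPS URL for an object."""
    # quote() percent-encodes spaces and the like, but leaves "/" alone.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def download_public_object(bucket, key):
    """Fetch a publicly readable object with a plain HTTP GET."""
    with urlopen(s3_object_url(bucket, key)) as response:
        return response.read()


# Hypothetical bucket and key, for illustration only.
url = s3_object_url("example-bucket", "reports/2015 summary.txt")
print(url)
```

Note that nothing here touches the local file system: the application talks to S3 directly, which is exactly why a database cannot treat it as a disk.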
S3 has two modes. Regular S3 is designed for 99.999999999% durability, but is more expensive. There is also a reduced redundancy option, in which S3 keeps fewer redundant copies of your data, so durability is lower (99.99%); this is cheaper than regular S3.
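Durability percentages are easier to compare as "how many objects would I expect to lose per year". Here is a small sketch of that arithmetic. It assumes the durability figure is an annual, per-object number written as a count of nines, using AWS's published design targets of eleven nines (99.999999999%) for regular S3 and four nines (99.99%) for the reduced redundancy option.

```python
def expected_annual_losses(num_objects, nines):
    """Expected objects lost per year at a durability of `nines` nines.

    Four nines means 99.99% durability, i.e. a 1-in-10**4 chance of
    losing any given object in a year.
    """
    loss_probability = 10.0 ** -nines
    return num_objects * loss_probability


# Regular S3: 10 million objects at eleven nines.
regular = expected_annual_losses(10_000_000, nines=11)
# Reduced redundancy: 10 thousand objects at four nines.
reduced = expected_annual_losses(10_000, nines=4)

print(f"regular S3:         {regular:.6f} objects per year")
print(f"reduced redundancy: {reduced:.6f} objects per year")
```

Read this way, ten million objects in regular S3 work out to roughly one lost object every 10,000 years, while ten thousand objects in the reduced option work out to roughly one lost object per year.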
Glacier
Glacier is long-term archival storage. It is meant for data that needs to be stored for regulatory reasons, and that you do not expect to fetch again quickly.
Access to data in Glacier is SLOW. It typically takes 3 to 5 hours to get the data. Plus, you are not expected to download the data very frequently; if you do, there are extra charges as well.
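To pull these notes together, here is one way to write the "when to use what" decision as code. It is only my reading of the notes above, squeezed into four yes/no questions; the function and parameter names are made up, and real decisions will also weigh cost and performance.

```python
def choose_storage(archival=False, needs_file_system=True,
                   shared_across_servers=False, must_survive_termination=True):
    """Pick a storage option from a few yes/no questions (rough sketch)."""
    if archival:
        return "Glacier"         # rarely read, slow retrieval is acceptable
    if not needs_file_system:
        return "S3"              # objects over HTTP, nothing to mount
    if shared_across_servers:
        return "EFS"             # shared drive that many servers can mount
    if must_survive_termination:
        return "EBS"             # the general-purpose "external hard drive"
    return "Instance Store"      # fastest, but ephemeral


print(choose_storage())                                # general purpose
print(choose_storage(shared_across_servers=True))      # legacy app, shared drive
print(choose_storage(needs_file_system=False))         # files served to browsers
print(choose_storage(must_survive_termination=False))  # scratch space, high IOPS
print(choose_storage(archival=True))                   # regulatory archive
```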