S3
S3 stands for simple storage service. It is one of AWS’s longest serving products and is defined by Amazon as a “secure, durable, highly-scalable object storage” system.
S3 is used for storing flat files e.g. text, video, images - files that do not change periodically and have already been encoded. This is not suitable for database storage.
S3 is an object based storage system:
- It is used for storing files and content and allows you to upload files.
- Files can be between 0 bytes and 5 terabytes in size.
- Storage is unlimited
This is in contrast to a block based storage system which is used for installing an operating systems or databases etc.
S3 is a global service and changes your region to global
when selected.
You can access the S3 service by going to to Services -> Storage -> S3
.
When creating an S3 bucket you must give it a DNS compliant name
which simply means the bucket must be valid as a web url.
You can make your buckets publicly available by unchecking Block all public access
when creating your bucket.
Individual files that you add to your bucket are not public by default, even if the bucket they reside in is public.
You can make a file public by:
- Checking the file in the bucket file list.
- Going to
Actions -> Make Public
You can also make a file public by:
- Checking the file in the bucket file list
- Select
Permissions
from the right side menu for the file - Checking the circle under
Everyone
- Checking
Read Object
and clickingSave
You can change your object’s storage class (i.e. Intelligent-Tiering, Glacier etc.) by:
- Checking the file in the bucket file list
- Select
Properties
from the right side menu for the file - Select
Storage Class
from the available panels.
You can turn on transfer acceleration for your bucket by:
- Select
Properties
from the top of the Bucket menu - Scroll down to the
Advanced Settings
- Select
Transfer Acceleration
from the available panels. You can test the comparative speed of transfer acceleration for your bucket here which will upload files to different regions and then show a side by side of different speeds as percentages
Features
- Tiered storage
- Lifecycle Management
- Versioning
- Encryption
- Secure data using Access Control Lists and Bucket Policies.
Buckets
S3 files are stored in buckets which are like folders in the cloud. Buckets can be viewed globally but bucket storage is localised to individual regions.
When creating a new bucket, the bucket name must be unique globally across the entire AWS S3 storage system they use a universal shared namespace. Buckets are available at URL in the format https://s3-eu-west-1.amazonaws.com/<bucket name>
which adds a unique DNS entry to amazon for access to the bucket. This is why the bucket namespace has to be globally unique.
When you successfully upload a file to S3 you will receive a status code 200
which allows you to test if an upload was successful or not.
You can change object permissions for an entire bucket by going to the Permissions
tab of the bucket and then editing the Bucket Policy
section. These policies are defined in JSON
.
You will need to replace the resource location with whatever ARN
you have on the page which points to the bucket you are editing if you are using a bucket policy template.
If you make everything in a bucket public you should see a warning in the buckets section showing Access
as Public
. You can make a bucket public after you have unchecked the Block All Public Access
options.
Hosting Static Content
You can host the contents of a bucket as a static website by going to Properties -> Static Website Hosting -> Edit
and setting your index
and error
page HTML documents as entry points for your site. This will automatically root requests etc. and serve an error page if an extension is not known.
Static S3 websites are serverless and scale with demand.
Objects
S3 is a key value based object store.
An Object in S3 consists of:
Key
: The name of the object e.g.menu.txt
orfunnycat.gif
Value
: The data that makes up the object (as a sequence of bytes)Version ID
Metadata
They also contain subresources consisting of:
Access Control Lists
Torrent
You can make an object public by selecting the object in your bucket’s Object
section and selecting Make Public
from the Actions
drop down.
You can access an object by clicking into the object in your bucket’s Object
section and clicking the Object URL
.
Data Consistency
There is read after write consistency for PUT
on new objects. This means, as soon as you create a new object you will be able to read that data immediately.
There is eventual consistency for overwrite PUT
and DELETE
on existing objects. This means it can take some time for updates and deletes to propagate across the S3 system. After updating you may still be able to read the old version of a file and after deletion you may still be able to access a file for a period of time (up to a second).
Reliability
S3 platform is built for 99.99% availability, however Amazon only guarantee 99.9% availability.
Amazon guarantee 11 x 9 durability of files which is equivalent to 99.999999999%. Durability describes you certain you can be that a file will not be lost once it is uploaded to S3, thus S3 is very reliable in this capacity.
Security
Access Control Lists
An access control list allows you to create access rules on an individual file basis. For example, only allowing certain users to access a file, this is a type of IAM Policy.
Bucket Policies
Bucket Policies allow you to create access rules for entire buckets to a control the security of large sets of data.
Object Policies
Object Policies allow you to create access rules for individual files in a bucket to a control the security of data.
Cloudfront Access
You can give a cloudfront user access to an S3 using a policy that specifies the ARN of the Cloudfront distribution’s access user (not the distribution’s ARN) as well as an Action
with an access method to allow and the ARN of the bucket.
{
"Version": "2008-10-17",
"Id": "PolicyForCloudFrontPrivateContent",
"Statement": [
{
"Sid": "1",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity <CLOUDFRONT_USER_ID>"
},
"Action": "s3:<S3_ACCESS_METHOD>",
"Resource": "arn:aws:s3:::<BUCKET_NAME>/*"
}
]
}
If you want to allow all actions to the S3 bcuket you can use a *
asterisk wildcard character as the access method.
"Action": "s3:*"
Storage Classes
Amazon offers several storage class options using S3. You find a breakdown of the options here.
You can change the storage class of your objects at any time.
S3 Standard
The original S3 storage service.
- 99.99% Availability and 99x11% durability
- Stored redundantly across multiple devices
- Can survive loss of two facilities concurrently
S3 IA
S3 - IA stands for infrequently accessed data which is data that you might only access every 2 -3 months.
- Designed for data that is not used often
- Allows for rapid access when needed (within milliseconds)
- Lower fee than S3 but charges incurred for retrieval of data.
S3 One Zone IA
S3 One Zone IA is similar to IA but limits data to a single availability zone.
S3 Intelligent Tiering
S3 Intelligent Tiering uses machine learning to automatically optimise storage usage for the lowest cost based on how you use the stored files. It will use some combination of the above options for thee optimisation to this end.
S3 Glacier
- Used for storing files that are used very infrequently
- Costing is competitive with on-premises storage solutions
- Retrieval times are configurable for between minutes and hours for retrieval
S3 Glacier Deep Archive
- Amazon’s lowest cost storage solution.
- Retrieval time takes 12 hours + from when file is requested.
S3 Outposts
Outpost users can now use S3 buckets as well.
S3 RRS
- RRS stands for reduced redundancy storage
- Currently being phased out and will not be tested
Charges
Charging for S3 is based on:
- Storage usage
- Requests made
- Storage Management usage
- Data transfer usage
- Transfer Acceleration usage
- Cross Region Replication usage
Transfer Acceleration
Transfer Acceleration is used for speeding up and securing long distance file transfers between users and an S3 bucket. This is achieved by using Amazon’s CloudFront edge locations which are a network of globally distributed centers that extend Amazon’s internal network.
For example, if a user in Australia is trying to upload a file to an S3 bucket in London. Transfer Acceleration will allow them to upload the file directly to a CloudFront edge location in Australia (or whatever is nearest to them) this will then send that file directly across the world to the London S3 storage via Amazon’s internal network (no longer over the public internet) which supports an optimised network that runs very fast.
Cross Region Replication
Cross Region Replication allows for the replicated storage of files across different regions when your primary S3 bucket changes. Changes will be persisted to different regions allowing for disaster recovery.
Encryption
You can change the encryption of your objects at any time.
SDK
Javascript
Link to the basics of the Javascript S3 SDK.
You can access S3 specific scripting functions using the S3
function on the AWS
object.
var AWS = require('aws-sdk');
var s3 = new AWS.S3();
You can access objects from S3 using the getObjects
method on the S3
object. This is an asynchronous function that takes an object with a Bucket
property that takes the globally unique name of the bucket, and a Key
property which is a path to the object within the bucket that you want to access.
You must convert the result of getObjects
into a promise to be able to handle it with await
or other asynchronous protocols.
const getS3Object = async () {
const params {
Bucket: "my-buckets-name",
Key: "file.txt"
};
const data = s3.getObjects(params).promise();
// destructure data here to get file contents back
}
S3 URI
If see <SR URI>
in the S3
AWS CLI documentation this referring to the S3 URI format that comes in the form:
s3://<BUCKET_NAME>
To get an S3 URI to an object in a bucket simply append the object path to the S3 and bucket name base url shown above. The format for this would be:
s3://<BUCKET_NAME>/<OPTIONAL_PATH>/<OBJECT>
And, for a single object:
s3://<BUCKET_NAME>/<OBJECT>
For example if we wanted to get a path to an file.json
in a folder called stuff
in our bucket called mybucket
then the S3 URI would be.
s3://mybucket/stuff/file.json