S3 - dpwdec

S3

S3 stands for Simple Storage Service. It is one of AWS’s longest-serving products and is defined by Amazon as a “secure, durable, highly-scalable object storage” system.

S3 is used for storing flat files, e.g. text, video, images: files that have already been encoded and do not change frequently. It is not suitable for database storage.

S3 is an object based storage system:

It is used for storing files and content and allows you to upload files.
Files can be between 0 bytes and 5 terabytes in size.
Storage is unlimited

This is in contrast to a block based storage system, which is used for installing operating systems, databases etc.

The S3 console is a global view: the region selector changes to Global when you open S3, although each bucket is still created in a specific region.

You can access the S3 service by going to Services -> Storage -> S3.

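When creating an S3 bucket you must give it a DNS-compliant name, which simply means the name must be valid as part of a web URL: 3 to 63 characters, using only lowercase letters, numbers, hyphens and dots, and beginning and ending with a letter or number.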
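A minimal sketch of checking a candidate bucket name before trying to create it. The function name `isValidBucketName` is hypothetical, and the checks cover only a simplified subset of the naming rules (length, allowed characters, no consecutive dots, not shaped like an IP address), so treat it as a first filter rather than a complete validator.

```javascript
// Simplified check of S3 bucket naming rules (not the complete official list).
function isValidBucketName(name) {
  if (typeof name !== "string") return false;
  // Names must be between 3 and 63 characters long.
  if (name.length < 3 || name.length > 63) return false;
  // Only lowercase letters, digits, dots and hyphens;
  // the first and last character must be a letter or digit.
  if (!/^[a-z0-9][a-z0-9.-]*[a-z0-9]$/.test(name)) return false;
  // Two dots in a row are not allowed.
  if (name.includes("..")) return false;
  // The name must not be formatted like an IPv4 address.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(name)) return false;
  return true;
}

console.log(isValidBucketName("my-buckets-name"));
console.log(isValidBucketName("My_Bucket"));
```

Passing this check does not mean the name is available, since another AWS account may already own it.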

You can make your buckets publicly available by unchecking Block all public access when creating your bucket.

Individual files that you add to your bucket are not public by default, even if the bucket they reside in is public.

You can make a file public by:

Checking the file in the bucket file list.
Going to Actions -> Make Public

You can also make a file public by:

Checking the file in the bucket file list
Selecting Permissions from the right side menu for the file
Checking the circle under Everyone
Checking Read Object and clicking Save

You can change your object’s storage class (i.e. Intelligent-Tiering, Glacier etc.) by:

Checking the file in the bucket file list
Selecting Properties from the right side menu for the file
Selecting Storage Class from the available panels.

You can turn on transfer acceleration for your bucket by:

Selecting Properties from the top of the Bucket menu
Scrolling down to the Advanced Settings
Selecting Transfer Acceleration from the available panels. Here you can also test the comparative speed of transfer acceleration for your bucket: the test uploads files to different regions and then shows a side-by-side comparison of the speeds as percentages.

Features

Tiered storage
Lifecycle Management
Versioning
Encryption
Secure data using Access Control Lists and Bucket Policies.

Buckets

S3 files are stored in buckets which are like folders in the cloud. Buckets can be viewed globally but bucket storage is localised to individual regions.

When creating a new bucket, the bucket name must be globally unique across the entire AWS S3 storage system, because all buckets share a universal namespace. Buckets are available at a URL in the format https://s3-eu-west-1.amazonaws.com/<bucket name>, which adds a unique DNS entry at Amazon for access to the bucket. This is why the bucket namespace has to be globally unique.

When you successfully upload a file to S3 you will receive an HTTP 200 status code, which you can use to check whether an upload was successful or not.

You can change object permissions for an entire bucket by going to the Permissions tab of the bucket and then editing the Bucket Policy section. These policies are defined in JSON.

If you are using a bucket policy template, you will need to replace the resource location in it with the ARN shown on the page, which points to the bucket you are editing.
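As an illustration of what such a JSON policy might look like, the sketch below builds a policy that lets anyone read every object in a bucket. The helper name `publicReadPolicy` and the `Sid` value are hypothetical; the bucket name is whatever you pass in, and the resulting `Resource` is the bucket ARN followed by `/*`.

```javascript
// Build a bucket policy object granting public read access to all objects.
function publicReadPolicy(bucketName) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "PublicReadGetObject",       // free-form label for the statement
        Effect: "Allow",
        Principal: "*",                    // everyone
        Action: "s3:GetObject",            // read objects only
        Resource: `arn:aws:s3:::${bucketName}/*` // every object in the bucket
      }
    ]
  };
}

// Print JSON that could be pasted into the Bucket Policy editor.
console.log(JSON.stringify(publicReadPolicy("my-buckets-name"), null, 2));
```

A policy like this only takes effect once Block All Public Access has been unchecked for the bucket.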

If you make everything in a bucket public you should see a warning in the buckets section showing Access as Public. You can only make a bucket public after you have unchecked the Block All Public Access options.

Hosting Static Content

You can host the contents of a bucket as a static website by going to Properties -> Static Website Hosting -> Edit and setting your index and error page HTML documents as entry points for your site. S3 will then automatically route requests to the index document and serve the error page when a requested path cannot be served.
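The same index and error documents can be expressed as parameters for the SDK rather than set in the console. This is a sketch assuming the AWS SDK for JavaScript v2, whose `putBucketWebsite` call takes a `WebsiteConfiguration` with an `IndexDocument` suffix and an `ErrorDocument` key. The helper name `websiteParams` and the default file names are assumptions, and the block only builds the parameters without calling AWS.

```javascript
// Build the parameter object for enabling static website hosting on a bucket.
function websiteParams(bucket, indexDocument = "index.html", errorDocument = "error.html") {
  return {
    Bucket: bucket,
    WebsiteConfiguration: {
      IndexDocument: { Suffix: indexDocument }, // served for directory-style requests
      ErrorDocument: { Key: errorDocument }     // served when a request fails
    }
  };
}

console.log(JSON.stringify(websiteParams("my-buckets-name"), null, 2));
// With an SDK v2 client this would be sent as (not run here):
//   await s3.putBucketWebsite(websiteParams("my-buckets-name")).promise();
```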

Static S3 websites are serverless and scale with demand.

Objects

S3 is a key value based object store.

An Object in S3 consists of:

Key: The name of the object e.g. menu.txt or funnycat.gif
Value: The data that makes up the object (as a sequence of bytes)
Version ID
Metadata

They also contain subresources consisting of:

Access Control Lists
Torrent
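To make the key, value, version ID and metadata parts concrete, here is a toy in-memory model of a key value object store. It is not how S3 is implemented: the class name `TinyObjectStore` is hypothetical, and the version IDs are simple counters purely for illustration.

```javascript
// Toy in-memory object store: each key maps to a list of versions.
class TinyObjectStore {
  constructor() {
    this.objects = new Map();
  }

  // Store a new version of the object under `key` and return its version ID.
  put(key, value, metadata = {}) {
    const versions = this.objects.get(key) || [];
    const versionId = String(versions.length + 1); // illustrative counter only
    versions.push({ value: Buffer.from(value), versionId, metadata });
    this.objects.set(key, versions);
    return versionId;
  }

  // Return the latest version stored under `key`, or undefined if none exists.
  get(key) {
    const versions = this.objects.get(key);
    return versions ? versions[versions.length - 1] : undefined;
  }
}

const demoStore = new TinyObjectStore();
demoStore.put("menu.txt", "soup of the day", { "content-type": "text/plain" });
console.log(demoStore.get("menu.txt").value.toString());
```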

You can make an object public by selecting the object in your bucket’s Object section and selecting Make Public from the Actions drop down.

You can access an object by clicking into the object in your bucket’s Object section and clicking the Object URL.

Data Consistency

There is read after write consistency for PUT on new objects. This means, as soon as you create a new object you will be able to read that data immediately.

Historically there was only eventual consistency for overwrite PUT and DELETE on existing objects. This meant it could take some time for updates and deletes to propagate across the S3 system: after updating you could still read the old version of a file, and after deletion you could still access a file, for a period of time (up to a second). Since December 2020, S3 provides strong read-after-write consistency for overwrite PUT and DELETE as well.

Reliability

The S3 platform is built for 99.99% availability; however, Amazon only guarantees 99.9% availability.

Amazon designs S3 for 11 x 9 durability of files, which is equivalent to 99.999999999%. Durability describes how certain you can be that a file will not be lost once it is uploaded to S3, so S3 is very reliable in this respect.
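The availability and durability percentages above are easier to compare once converted into concrete quantities. The sketch below is plain arithmetic: it treats availability as a fraction of a 365-day year and durability as an annual per-object loss probability, and both function names are hypothetical.

```javascript
const MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600 minutes in a non-leap year

// Maximum downtime per year implied by an availability percentage.
function maxDowntimeMinutesPerYear(availabilityPercent) {
  return (1 - availabilityPercent / 100) * MINUTES_PER_YEAR;
}

// Average number of objects lost per year for a durability of `nines` nines.
function expectedObjectsLostPerYear(objectCount, nines = 11) {
  return objectCount * Math.pow(10, -nines);
}

console.log(maxDowntimeMinutesPerYear(99.99).toFixed(2)); // roughly 52.6 minutes
console.log(maxDowntimeMinutesPerYear(99.9).toFixed(2));  // roughly 525.6 minutes
console.log(expectedObjectsLostPerYear(10000000));        // about 0.0001 objects per year
```

Put another way, with 10,000,000 stored objects the durability figure works out to an average of one lost object every 10,000 years.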

Security

Access Control Lists

An access control list allows you to create access rules on an individual file basis, for example only allowing certain users to access a file. ACLs are a separate, older access mechanism and are not a type of IAM Policy.

Bucket Policies

Bucket Policies allow you to create access rules for entire buckets to control the security of large sets of data.

Object Policies

Object Policies allow you to create access rules for individual files in a bucket to control the security of data.

Cloudfront Access

You can give a CloudFront user access to an S3 bucket using a policy that specifies the ARN of the CloudFront distribution’s access user (not the distribution’s ARN), an Action naming the access method to allow, and the ARN of the bucket.

{
    "Version": "2008-10-17",
    "Id": "PolicyForCloudFrontPrivateContent",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity <CLOUDFRONT_USER_ID>"
            },
            "Action": "s3:<S3_ACCESS_METHOD>",
            "Resource": "arn:aws:s3:::<BUCKET_NAME>/*"
        }
    ]
}
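The template above can be filled in programmatically instead of by hand. This is a minimal sketch that mirrors the template field for field; the function name `cloudFrontAccessPolicy`, the default access method and the example ID are assumptions made for illustration.

```javascript
// Fill in the CloudFront access policy template for a given access user and bucket.
function cloudFrontAccessPolicy(cloudFrontUserId, bucketName, accessMethod = "GetObject") {
  return {
    Version: "2008-10-17",
    Id: "PolicyForCloudFrontPrivateContent",
    Statement: [
      {
        Sid: "1",
        Effect: "Allow",
        Principal: {
          // ARN of the distribution's access user, not of the distribution itself
          AWS: `arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity ${cloudFrontUserId}`
        },
        Action: `s3:${accessMethod}`,
        Resource: `arn:aws:s3:::${bucketName}/*`
      }
    ]
  };
}

// "E1EXAMPLEID" is a made-up ID used only for this demonstration.
console.log(JSON.stringify(cloudFrontAccessPolicy("E1EXAMPLEID", "my-buckets-name"), null, 2));
```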

If you want to allow all actions on the S3 bucket you can use the * asterisk wildcard character as the access method.

"Action": "s3:*"

Storage Classes

Amazon offers several storage class options for S3. You can find a breakdown of the options here.

You can change the storage class of your objects at any time.
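Outside the console, one common way to change an existing object's storage class is to copy the object onto itself with a new class. The sketch below assumes the AWS SDK for JavaScript v2, where `copyObject` accepts a `StorageClass` parameter; the helper name `storageClassChangeParams` is hypothetical, and the block only builds the parameters without calling AWS.

```javascript
// Storage class identifiers used by the S3 API (subset covered in these notes).
const STORAGE_CLASSES = [
  "STANDARD",
  "STANDARD_IA",
  "ONEZONE_IA",
  "INTELLIGENT_TIERING",
  "GLACIER",
  "DEEP_ARCHIVE"
];

// Build copyObject parameters that rewrite an object in place with a new class.
function storageClassChangeParams(bucket, key, storageClass) {
  if (!STORAGE_CLASSES.includes(storageClass)) {
    throw new Error(`Unknown storage class: ${storageClass}`);
  }
  return {
    Bucket: bucket,                            // destination bucket (same as source)
    Key: key,                                  // destination key (same as source)
    CopySource: encodeURI(`${bucket}/${key}`), // source object, URL-encoded
    StorageClass: storageClass,
    MetadataDirective: "COPY"                  // keep the existing metadata
  };
}

console.log(storageClassChangeParams("my-buckets-name", "stuff/file.json", "STANDARD_IA"));
// With an SDK v2 client this would be sent as (not run here):
//   await s3.copyObject(params).promise();
```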

S3 Standard

The original S3 storage service.

99.99% availability and 99.999999999% (11 x 9) durability
Stored redundantly across multiple devices
Can survive loss of two facilities concurrently

S3 IA

S3 IA stands for Infrequent Access and is intended for data that you might only access every 2-3 months.

Designed for data that is not used often
Allows for rapid access when needed (within milliseconds)
Lower storage fee than S3 Standard, but charges are incurred for retrieval of data.

S3 One Zone IA

S3 One Zone IA is similar to IA but limits data to a single availability zone.

S3 Intelligent Tiering

S3 Intelligent Tiering monitors how you access the stored files and automatically optimises storage usage for the lowest cost. To do this it moves objects between access tiers comparable to the options above.

S3 Glacier

Used for storing files that are used very infrequently
Costing is competitive with on-premises storage solutions
Retrieval times are configurable, ranging from minutes to hours

S3 Glacier Deep Archive

Amazon’s lowest cost storage solution.
Retrieval takes around 12 hours from when a file is requested (bulk retrievals can take longer).

S3 Outposts

Outpost users can now use S3 buckets as well.

S3 RRS

RRS stands for reduced redundancy storage
Currently being phased out and will not be tested

Charges

Charging for S3 is based on:

Storage usage
Requests made
Storage Management usage
Data transfer usage
Transfer Acceleration usage
Cross Region Replication usage

Transfer Acceleration

Transfer Acceleration is used for speeding up and securing long distance file transfers between users and an S3 bucket. This is achieved by using Amazon’s CloudFront edge locations which are a network of globally distributed centers that extend Amazon’s internal network.

For example, suppose a user in Australia is trying to upload a file to an S3 bucket in London. Transfer Acceleration allows them to upload the file to a CloudFront edge location in Australia (or whichever is nearest to them), which then sends the file across the world to the London S3 storage via Amazon’s internal network rather than over the public internet. That internal network is optimised and runs very fast.
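In practice, accelerated transfers go through a dedicated endpoint of the form `<bucket>.s3-accelerate.amazonaws.com` instead of the regional one. The sketch below builds such a URL; the function name `acceleratedObjectUrl` is hypothetical, and it assumes Transfer Acceleration has already been turned on for the bucket.

```javascript
// Build the Transfer Acceleration URL for an object in a bucket.
function acceleratedObjectUrl(bucket, key) {
  // Acceleration does not support bucket names that contain dots.
  if (bucket.includes(".")) {
    throw new Error(`Bucket name must not contain dots: ${bucket}`);
  }
  return `https://${bucket}.s3-accelerate.amazonaws.com/${encodeURI(key)}`;
}

console.log(acceleratedObjectUrl("my-buckets-name", "stuff/file.json"));
// With the AWS SDK for JavaScript v2, the client can be pointed at this
// endpoint with: new AWS.S3({ useAccelerateEndpoint: true })
```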

Cross Region Replication

Cross Region Replication replicates files to buckets in different regions whenever your primary S3 bucket changes. Because changes are persisted to other regions, it supports disaster recovery.

Encryption

You can change the encryption of your objects at any time.

SDK

Javascript

Link to the basics of the Javascript S3 SDK.

You can access S3-specific scripting functions by creating a client with the S3 constructor on the AWS object.

var AWS = require('aws-sdk'); 
var s3 = new AWS.S3();

You can access objects from S3 using the getObject method on the S3 object. This is an asynchronous function that takes an object with a Bucket property, which is the globally unique name of the bucket, and a Key property, which is the path to the object within the bucket that you want to access.

You must convert the result of getObject into a promise (by calling .promise() on it) to be able to handle it with await or other asynchronous protocols.

// Fetch a single object from S3 and return its contents as text.
const getS3Object = async () => {
  const params = {
    Bucket: "my-buckets-name", // globally unique bucket name
    Key: "file.txt"            // path to the object within the bucket
  };
  // .promise() turns the SDK request into a promise that can be awaited
  const data = await s3.getObject(params).promise();
  // data.Body is a Buffer holding the file contents
  return data.Body.toString("utf-8");
};

S3 URI

If you see <S3Uri> in the S3 AWS CLI documentation, this refers to the S3 URI format, which takes the form:

s3://<BUCKET_NAME>

To get an S3 URI for an object in a bucket, simply append the object path to the base S3 URI shown above. The format for this would be:

s3://<BUCKET_NAME>/<OPTIONAL_PATH>/<OBJECT>

And, for a single object:

s3://<BUCKET_NAME>/<OBJECT>

For example, if we wanted a path to a file called file.json in a folder called stuff in our bucket called mybucket, then the S3 URI would be:

s3://mybucket/stuff/file.json
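The URI formats above are simple enough to build and take apart with a couple of small helpers. This is a minimal sketch; the function names `toS3Uri` and `parseS3Uri` are hypothetical and the parser only handles the `s3://<BUCKET_NAME>/<OBJECT>` shapes shown here.

```javascript
// Build an S3 URI from a bucket name and an optional object path.
function toS3Uri(bucket, key = "") {
  return key ? `s3://${bucket}/${key}` : `s3://${bucket}`;
}

// Split an S3 URI into its bucket name and object path (empty if none).
function parseS3Uri(uri) {
  const match = /^s3:\/\/([^/]+)\/?(.*)$/.exec(uri);
  if (!match) {
    throw new Error(`Not an S3 URI: ${uri}`);
  }
  return { bucket: match[1], key: match[2] };
}

console.log(toS3Uri("mybucket", "stuff/file.json"));
console.log(parseS3Uri("s3://mybucket/stuff/file.json"));
```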