The Occasional Masthead

It’s too common these days that I struggle to find solid stretches of time to think about things, so I tend to make progress on tech chores while I’m on holiday. This is an example…

Callistemon growing in a hothouse in Amsterdam (Photo by the author)

One of the things that has been on my mind is following AWS best practices and eliminating IAM access key and secret access key pairs from my laptop. It’s a good rule of thumb from AWS, but how to actually go about it turns out to be frustratingly poorly explained. So, here’s what I worked out for my purposes.

Two big caveats: this solution was for my own personal use cases, and may not match yours. Also, it’s quite Mac-centric, because that’s where I work. Your mileage may vary.

Let’s start with my use case. I have a personal AWS account (well, technically, accounts, so that I can experiment with cross-account configurations) that I manage using Terraform. I do some work directly in the console, and have various command line helper tools and scripts. I don’t have (or need) many IAM principals in the account, and I always keep the root credentials locked in a box marked “break glass in emergency” and otherwise never use them. I’ve been using 1Password for a long time, and am very happy with it as a secure repository for secrets. I’m also pretty happy with the base level of security that MacOS provides.

With all that in mind, I had several specific goals:

make sure that $HOME/.aws/config and $HOME/.aws/credentials do not contain IAM access key / secret access key pairs;
make sure that MFA (multi-factor authentication) is required for (almost) all interactions with the AWS account;
have a solution that works equally well for Terraform, AWS CLI, and console use.

All of which should be straightforward to configure, and clearly documented. Sigh. Of course it’s not. Well, here’s what I worked out. I repeat my caveat: this works for me, but may not work for you.

Important safety tip: if you are doing these sorts of changes, make very sure that you have access to the root account, and try to have a super-user account that is not subject to these constraints. It’s easy to accidentally lock yourself out of your account if you are not careful!

I did not start here, but it’s a useful place to start the explanation. First up, we need a policy that requires (almost) all actions to have MFA active. The logic here is a bit upside down and back to front but is basically “deny everything except the listed actions unless MFA is present”

{
    "Statement": [
        {
            "NotAction": [
                "iam:ResyncMFADevice",
                "iam:ListVirtualMFADevices",
                "iam:ListUsers",
                "iam:ListMFADevices",
                "iam:EnableMFADevice",
                "iam:CreateVirtualMFADevice"
            ],
            "Condition": {
                "BoolIfExists": {
                    "aws:MultiFactorAuthPresent": "false"
                }
            },
            "Effect": "Deny",
            "Resource": "*",
            "Sid": "RequireMFA"
        }
    ],
    "Version": "2012-10-17"
}
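The intended logic (“deny everything except the listed actions unless MFA is present”) can be modelled in a few lines of Python. This is only a toy model of one statement, not the real IAM evaluation engine: the function name deny_applies is made up for illustration, and it assumes the exemption list is expressed with NotAction and the condition uses the BoolIfExists operator, so that a request with no MFA information at all (as with plain long-term access keys) is also denied.

```python
# Toy model of a single "RequireMFA" Deny statement.
# Assumptions: NotAction + BoolIfExists semantics; names are illustrative only.
EXEMPT_ACTIONS = {
    "iam:ResyncMFADevice",
    "iam:ListVirtualMFADevices",
    "iam:ListUsers",
    "iam:ListMFADevices",
    "iam:EnableMFADevice",
    "iam:CreateVirtualMFADevice",
}


def deny_applies(action, mfa_present):
    """Return True if the deny statement blocks this request.

    mfa_present is True, False, or None (the context key is absent,
    which is the case for requests signed with long-term access keys).
    """
    # NotAction: the statement covers every action NOT in the exempt list.
    covered = action not in EXEMPT_ACTIONS
    # BoolIfExists "false": matches when the key is false OR missing.
    condition_matches = mfa_present is not True
    return covered and condition_matches


print(deny_applies("s3:ListAllMyBuckets", None))  # ordinary action, no MFA
print(deny_applies("iam:EnableMFADevice", None))  # exempt action, no MFA
print(deny_applies("s3:ListAllMyBuckets", True))  # ordinary action, MFA present
```

The exempt actions are exactly the ones a principal needs in order to set up MFA in the first place, which is why they have to stay reachable without it.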

This is attached to the IAM groups that contain my IAM principals. (As far as I remember, I got the inspiration for this from Radish Logic.)

I also have a policy that allows IAM principals with console access to self-manage their MFA configuration – this means I don’t need to switch to a super-user principal to manage other principals’ MFA. This one is a bit tricky, and I’m not sure that the resources defined are completely correct, as we need to ensure a principal can only alter its own MFA configuration, and not that of other principals. We also need to make sure that destructive actions require MFA to be present. Again, this is attached to the IAM groups.

{
    "Statement": [
        {
            "Action": [
                "iam:EnableMFADevice",
                "iam:CreateVirtualMFADevice"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:iam::304388919931:user/${aws:username}",
                "arn:aws:iam::304388919931:mfa/${aws:username}/*"
            ],
            "Sid": "ActivateMFA"
        },
        {
            "Action": [
                "iam:ResyncMFADevice",
                "iam:DeleteVirtualMFADevice",
                "iam:DeactivateMFADevice"
            ],
            "Condition": {
                "Bool": {
                    "aws:MultiFactorAuthPresent": "true"
                }
            },
            "Effect": "Allow",
            "Resource": [
                "arn:aws:iam::304388919931:user/${aws:username}",
                "arn:aws:iam::304388919931:mfa/${aws:username}/*"
            ],
            "Sid": "DeactivateMFA"
        },
        {
            "Action": [
                "iam:ListVirtualMFADevices",
                "iam:ListMFADevices",
                "iam:GetMFADevice",
                "iam:GetLoginProfile"
            ],
            "Condition": {
                "Bool": {
                    "aws:MultiFactorAuthPresent": "true"
                }
            },
            "Effect": "Allow",
            "Resource": [
                "arn:aws:iam::304388919931:user/${aws:username}",
                "arn:aws:iam::304388919931:mfa/${aws:username}"
            ],
            "Sid": "ListMFA"
        }
    ],
    "Version": "2012-10-17"
}
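Since the resource patterns are the part I’m least sure of, here is a rough Python sketch for reasoning about them. It substitutes the ${aws:username} policy variable and then does a simplified wildcard match against a target ARN. The helper names are hypothetical and the matching is a simplification of what IAM really does, so treat it as a thinking aid rather than a test of the real policy.

```python
import re

# Resource patterns copied from the ActivateMFA / DeactivateMFA statements.
PATTERNS = [
    "arn:aws:iam::304388919931:user/${aws:username}",
    "arn:aws:iam::304388919931:mfa/${aws:username}/*",
]


def expand(pattern, username):
    """Substitute the ${aws:username} policy variable."""
    return pattern.replace("${aws:username}", username)


def arn_matches(pattern, arn):
    """Simplified wildcard match: '*' is any run of characters, '?' is one."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, arn) is not None


def allowed_resource(username, target_arn):
    """True if any expanded pattern matches the target ARN."""
    return any(arn_matches(expand(p, username), target_arn) for p in PATTERNS)


print(allowed_resource("robert", "arn:aws:iam::304388919931:mfa/robert/phone"))
print(allowed_resource("robert", "arn:aws:iam::304388919931:mfa/alice/phone"))
```

In this model a device ARN only matches if its path starts with the principal’s own user name, so a device named something like mfa/OnePassword would not match; that is the kind of mismatch worth checking for when naming MFA devices.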

Finally, I created an IAM principal that I would use for nothing other than boot-strapping up to an admin role. That principal only has:

the require MFA policy
the self-manage MFA policy
the arn:aws:iam::aws:policy/IAMUserChangePassword policy so it can self-manage its password.

This limited user also has attached:

An access key / secret access key pair;
An MFA device

Last step is I need an “administrator” role that this principal can adopt. I could of course have multiple roles, but for me that’s overkill. The role is very straightforward, with only the inbuilt arn:aws:iam::aws:policy/AdministratorAccess policy attached. The trust relationship is a little bit tricky, just to give me some more definite locking down of who can adopt the role, and when.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::304388919931:user/robert"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "142.85.240.69"
                }
            }
        }
    ]
}
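The IpAddress condition is easy to reason about with Python’s ipaddress module. A minimal sketch, assuming a bare address in aws:SourceIp behaves like a single-host (/32) range; the function name is illustrative:

```python
import ipaddress


def source_ip_allowed(source_ip, allowed):
    """Mimic the IpAddress condition: is source_ip inside the allowed range?"""
    # A bare address such as "142.85.240.69" becomes a /32 network.
    network = ipaddress.ip_network(allowed, strict=False)
    return ipaddress.ip_address(source_ip) in network


print(source_ip_allowed("142.85.240.69", "142.85.240.69"))
print(source_ip_allowed("142.85.240.70", "142.85.240.69"))
```

Passing a CIDR range such as "142.85.240.0/24" as the allowed value would widen the match to the whole block, which is one way to loosen the constraint later without dropping it.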

As you can see, it permits only a precise IAM principal to adopt the role (since I define all this with Terraform, it’s trivial for me to add and maintain a list of principals in the future, and I’m never going to need many). It also locks adoption down to a specific IP address – this works off my laptop, at home, and everywhere else is locked down. This might be a pain in the butt in the future, but I like the belts-and-braces security over the super-user account.

Ok then! Let’s recap. We have:

a user that can adopt an admin role under certain constraints;
the user has almost no other permissions;
users are locked down to require MFA to act against AWS;
users can self-manage their MFA configuration;

Let’s turn to the next pieces. The only way I could get all this to work was to turn to aws-vault. This has become something of a de facto standard, and really (really) should be functionality that AWS provide directly themselves as part of their command line tooling. I won’t go into the operation of aws-vault, beyond noting that the tool is aware of all the profiles listed in $HOME/.aws/config and can store credentials for those profiles in one of a number of back ends. In my case I’ve opted to use the MacOS Keychain (which is pretty bulletproof), so in my shell environment I define

AWS_VAULT_BACKEND=keychain

And following the usage notes from 99Designs I added the key chain database to the MacOS Keychain Access tool for convenience.

First step was to add the access/secret IAM pair for my “bootstrap” user to aws-vault. Note that because the tool is aware of all the profiles, it shows up the other profiles, but I’ve only got credentials for the desired user stored.

% aws-vault list
Profile                  Credentials              Sessions                 
=======                  ===========              ========                 
default                  -                        -                        
robert                   robert                   -                        
admin                    -                        -

There’s nothing particularly magical about how aws-vault stores the credentials – if you look in the keychain using Keychain Access you’ll see it’s just a password with the account ‘robert’ and the password set to a chunk of JSON containing the secret/access pair.

So technically the secret/access pair are still on my laptop, but they are no longer present anywhere in plain text, and they are locked away behind access control managed by MacOS.

Next step: the profiles in $HOME/.aws/config. Please note that you shouldn’t need $HOME/.aws/credentials, and ideally that file can be removed. Here’s what I have (or what I had as the first cut, see below):

[default]
region=eu-west-2
output=json

[profile admin]
role_arn = arn:aws:iam::304388919931:role/admin_role
source_profile = robert
role_session_name = Robert

[profile robert]
mfa_serial = arn:aws:iam::304388919931:mfa/OnePassword
mfa_process = op item get 'AWS - robert' --otp
credential_process = aws-vault export --format=json robert

Going from the top down… the default profile is self explanatory, setting default output to JSON, and using a particular region by default.

The admin profile is the profile that I use for working against AWS from the command line. It specifies the role to adopt, but also specifies the profile used to adopt the role. The session name is just a nice-to-have, as it will show up in various logging contexts.
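A quick way to sanity-check a config laid out like this is to parse it and confirm two things: no profile carries a static key pair, and each role profile points at a source profile. The sketch below uses Python’s configparser on a copy of the config text; the function name and the sample constant are my own, and it assumes the file sticks to simple key = value lines.

```python
import configparser

# Static-credential keys that should not appear in any profile.
FORBIDDEN_KEYS = {"aws_access_key_id", "aws_secret_access_key"}

SAMPLE_CONFIG = """
[default]
region=eu-west-2
output=json

[profile admin]
role_arn = arn:aws:iam::304388919931:role/admin_role
source_profile = robert
role_session_name = Robert

[profile robert]
mfa_serial = arn:aws:iam::304388919931:mfa/OnePassword
mfa_process = op item get 'AWS - robert' --otp
credential_process = aws-vault export --format=json robert
"""


def audit_config(text):
    """Return (sections holding static keys, {role profile: source profile})."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    offenders = [s for s in parser.sections() if FORBIDDEN_KEYS & set(parser[s])]
    chains = {
        s: parser[s]["source_profile"]
        for s in parser.sections()
        if "source_profile" in parser[s]
    }
    return offenders, chains


offenders, chains = audit_config(SAMPLE_CONFIG)
print("profiles with static keys:", offenders)
print("role profile -> source profile:", chains)
```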

The juicy parts are in the source profile. credential_process specifies a command that is executed to obtain a short lived session token (defaulting to 24 hours, but that can be adjusted) from the AWS STS service. It uses the access/secret key pair we stored in aws-vault to call the STS service which means… AWS requires MFA for that account to act!

Woohoo!

Ok, so we’ve marked the profile (mfa_serial) with the ARN of the MFA device associated with the robert IAM principal. In my case here, that is 1Password. How do we get the TOTP one time password out of 1Password then? That’s where mfa_process comes in.

Here I use the 1Password command line tool to pull the TOTP

op item get 'AWS - robert' --otp

This will of course engage with the 1Password app itself, so the app may ask you to unlock your vault. What’s that magic item id? Just the name of the 1Password entry.

So, if I now try to use the admin profile, what happens:

% aws --profile admin sts get-caller-identity
{
    "UserId": "AROA46CDIYCJWSGJG7YCQ:Robert",
    "Account": "304388919931",
    "Arn": "arn:aws:sts::304388919931:assumed-role/admin_role/Robert"
}

The sequence behind the scenes was:

the aws tool sees that the admin profile takes its credentials from the robert bootstrap profile
the credential_process for that profile calls aws-vault to obtain some short lived credentials via STS
the op tool fetches the TOTP from 1Password
aws-vault calls STS with the stored key pair and the TOTP, and hands the resulting short lived session credentials back to the aws tool
the aws tool uses those credentials to assume the admin_role IAM role

Magic!

Some additional notes on this before I move on to talk about how to use this with Terraform.

For one, you can see information in aws-vault about the session we’ve created, including the age of the session.

% aws-vault list
Profile                  Credentials              Sessions                    
=======                  ===========              ========                    
default                  -                        -                           
robert                   robert                   sts.GetSessionToken:56m49s  
admin                    -                        -

% aws-vault export --format=json robert
{
  "Version": 1,
  "AccessKeyId": "ASIA46CDIYCJUHVTRYOE",
  "SecretAccessKey": "gB56vXXnn4EhdsALNqyNtWAeuuKywOIww0JoKMhp",
  "SessionToken": "IQ...NZoS8VgS8=",
  "Expiration": "2023-12-29T17:25:56Z"
}
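The JSON above is the shape the AWS CLI expects from a credential_process command. As a small illustration, the following Python sketch validates that shape and works out how long a session has left. It assumes the Expiration timestamp always uses the UTC “Z” form shown above; the function names are hypothetical.

```python
import json
from datetime import datetime, timezone

# Fields the CLI needs; SessionToken and Expiration are optional extras.
REQUIRED = ("Version", "AccessKeyId", "SecretAccessKey")


def parse_credentials(text):
    """Validate credential_process style JSON and return it as a dict."""
    data = json.loads(text)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["Version"] != 1:
        raise ValueError("unsupported Version")
    return data


def seconds_remaining(data, now=None):
    """Seconds until Expiration, or None if no Expiration is given."""
    if "Expiration" not in data:
        return None
    # Assumes the "YYYY-MM-DDTHH:MM:SSZ" form seen in the aws-vault output.
    expires = datetime.strptime(data["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).total_seconds()
```

Piping the output of aws-vault export --format=json into parse_credentials and then seconds_remaining gives a number that goes negative once the session has expired.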

The session token is also cached by the AWS CLI as a chunk of JSON inside $HOME/.aws/cli/cache but you really should not rely on that, or on the format of the data there remaining unchanged. It does mean though that if you manually remove the session from aws-vault, the AWS CLI may unexpectedly keep working without a change!
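If you want removing a session to really take effect, one option is to clear the CLI cache as well. A minimal Python sketch, assuming the cache entries are plain .json files in that directory (an implementation detail that could change); the function name is my own, and the demonstration runs against a throwaway directory rather than the real cache.

```python
import tempfile
from pathlib import Path


def clear_cli_cache(cache_dir=None):
    """Delete cached credential files and return the names removed."""
    if cache_dir:
        cache_dir = Path(cache_dir)
    else:
        cache_dir = Path.home() / ".aws" / "cli" / "cache"
    removed = []
    if not cache_dir.is_dir():
        return removed
    # Only touch *.json entries; leave anything else alone.
    for path in sorted(cache_dir.glob("*.json")):
        path.unlink()
        removed.append(path.name)
    return removed


# Demonstrate on a temporary directory, not on $HOME/.aws/cli/cache.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example.json").write_text("{}")
    print(clear_cli_cache(tmp))
```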

The final real change I made here though was to switch from using 1Password as the MFA device to using a YubiKey. This is, again, harder than it should be. Assigning a YubiKey as MFA for logging into the console is straightforward through the console. Using a YubiKey for locking the CLI access is annoying, but has been documented by AWS – see “Enhance programmatic access for IAM users using a YubiKey for multi-factor authentication”. It relies on using the ykman tool to set the YubiKey up, and then adjusting the profile to fetch the code back out. So the new profile is:

[profile robert]
mfa_serial = arn:aws:iam::304388919931:mfa/YubiKeyCli
mfa_process = ykman oath accounts code -s arn:aws:iam::304388919931:user/robert
credential_process = aws-vault export --format=json robert

Once set up, everything works the same way, except that you need to authenticate with the YubiKey, rather than 1Password.

The final missing piece is authentication to allow me to run Terraform off my desktop against the AWS account.

Ironically, this was actually the first part of the problem I tried to tackle. My initial thinking was just “oh, require MFA for Terraform”, which rapidly ran into the problem that the terraform tool has no support at all for asking for MFA when executing. HashiCorp’s reasoning is that the terraform tool is used in hands-off automation, but I think in reality they just don’t want to engage with the problem.

Sorting this out took me back to aws-vault, along with some changes in how my Terraform code was written.

Interestingly, 1Password provide a terraform plugin for their CLI, but I found it quite clunky and difficult to get working reliably. The documentation is not great, and it didn’t really help with the AWS CLI use case. So, back to aws-vault.

The solution is still a bit clunky, as it introduces some additional work on the command line to execute the terraform tool. I suppose that this could be worked around by adding shell aliases or some kind of wrapper shell script, but I’d prefer to just get used to the extra step.
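For anyone who would rather have the wrapper, it can be very small. Here is a sketch in Python that builds the aws-vault exec command line for a given set of terraform arguments; the function names and the default profile name are illustrative, and nothing is executed until run_terraform is called.

```python
import subprocess


def build_command(terraform_args, profile="admin"):
    """Build the argv for running terraform under aws-vault."""
    # "--" separates aws-vault's own options from the wrapped command.
    return ["aws-vault", "exec", profile, "--", "terraform", *terraform_args]


def run_terraform(terraform_args, profile="admin"):
    """Run terraform via aws-vault and return its exit code."""
    return subprocess.call(build_command(terraform_args, profile))


print(build_command(["plan"]))
```

Calling run_terraform(["plan"]) would then behave like typing the full aws-vault exec line by hand.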

The magic is to use aws-vault exec to either launch a shell, or to directly execute the terraform command

% aws-vault exec admin terraform plan    
data.aws_caller_identity.euwest1: Reading...
data.aws_region.euwest1: Reading...
.
.
.

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Using

aws-vault exec admin

on its own will launch a sub shell which has the short-lived credentials present as environment variables, after which it’s possible to just execute terraform directly (caveat, see below):

% env | grep AWS
AWS_VAULT_BACKEND=keychain
AWS_VAULT=admin
AWS_REGION=eu-west-2
AWS_DEFAULT_REGION=eu-west-2
AWS_ACCESS_KEY_ID=ASIA46CDIYCJS75PQTHZ
AWS_SECRET_ACCESS_KEY=GnpHxkr0EXyJ1/rra5ME7wKK4AzGVPUgIhUOJL70
AWS_SESSION_TOKEN=IQo...sqxRJa4=
AWS_CREDENTIAL_EXPIRATION=2023-12-30T12:27:08Z
AWS_SDK_LOAD_CONFIG=true
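Because the sub shell can outlive the credentials in it, a small pre-flight check before running terraform can save some confusing errors. A sketch in Python, assuming the variable names shown above and the UTC “Z” timestamp format; the function name is hypothetical.

```python
from datetime import datetime, timezone

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def session_problems(env, now=None):
    """Return a list of problems found in a credential environment mapping."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    expiry = env.get("AWS_CREDENTIAL_EXPIRATION")
    if expiry:
        # Assumes the "YYYY-MM-DDTHH:MM:SSZ" form seen in the listing above.
        expires = datetime.strptime(expiry, "%Y-%m-%dT%H:%M:%SZ")
        expires = expires.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        if expires <= now:
            problems.append("credentials expired at " + expiry)
    return problems
```

In a real shell this would be called as session_problems(os.environ); an empty list means the environment looks usable.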

As an aside, this seems to be a trend in working at the command line for various tools – use some sort of wrapper to drop into a sub shell with configuration suitable for working, e.g. pyenv to make it easier to use a particular Python version and configuration when you’re working off the command line.

I had to make some additional changes to my Terraform code to cater for the fact that the credentials were now in the environment, rather than in $HOME/.aws, but these were pretty simple. I dropped the “profile” specification from my backend definition, limiting it to just identifying the AWS assets involved

terraform {
  backend "s3" {
    bucket         = "terraform-state20230913"
    key            = "aws-personal"
    region         = "eu-west-2"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:eu-west-2:304388919931:key/dcebdb69-dead-4d33-beef-b00aee818b6d"
    dynamodb_table = "terraform-state-lock"
  }
}

And I also dropped it out as a variable I was injecting as it was now obsolete.

What do I get after all of this?

MFA lock down on (almost) every action in the account
No plain text credentials in my development and operating environment
Use of short lived credentials rather than long lived under most circumstances

I’m pretty happy with that. It’s more reliant on tools external to the HashiCorp and AWS tools than I would like, but finally grappling with what are now more or less conventional standard tools like aws-vault has given me a better understanding of those tools, as well as a better understanding of how the AWS authentication dances work.

Somewhat ironically, toward the end of working this out I realised that it might now be redundant. At my workplace I’m used to using SSO and short-lived credentials supported by Okta. On a modern Mac with a fingerprint sensor, this gives a pleasant and easy experience, but it still relies on there being an identity service somewhere in the mix.

I’ve become aware of AWS Identity Center. It appears that this is well integrated now with the AWS CLI, and the tooling to obtain short-lived credentials is pretty simple, but it’s not fully clear to me whether I can use Identity Center to roll my own identity service and provide SSO, or whether it depends on third party services like Okta or Google identities. I guess that will be my next adventure!

(oh yeah, and I must do something to reduce that 24-hour TTL on the short lived credentials)

No more AWS Access Keys?
