ECS Task Role AccessDenied is always fixable if you stop blaming AWS

January 22, 2026

You know the vibe: the container is healthy, the service is green, the app starts… and then your logs say:

```
AccessDeniedException: User is not authorized to perform s3:GetObject
```

At that point, most teams do the classic panic dance. They slap AmazonS3FullAccess on some role, redeploy, and pray. Sometimes it “works” and sometimes it still fails, which is even worse because now it feels random.

It isn’t random. It’s usually one of three things:

- You gave permissions to the wrong role
- The right role exists, but the task isn’t using it
- The role is right, but you’re missing a second permission edge (KMS, resource policies, STS, VPC endpoints)

This post is the production-grade runbook for debugging it without guessing.

## The mental model that stops the pain

In ECS you have two IAM roles that people constantly mix up:

**Task execution role.** This is what ECS needs to start your task: pulling images, writing logs, fetching secrets at startup, etc. AWS calls it the “task execution IAM role.”

**Task role.** This is what your application code uses once it’s running, when it calls AWS APIs via an SDK (S3, DynamoDB, SQS, Secrets Manager, you name it). AWS calls it the “task IAM role.”

If your app is throwing AccessDenied while calling AWS APIs, the permissions almost always belong on the task role, not the execution role.

Why this works cleanly: ECS delivers the role credentials to the container via a standard container credentials flow (the SDK reads an env var like AWS_CONTAINER_CREDENTIALS_RELATIVE_URI and fetches temporary creds).

## The fastest way to see what identity your container is actually using

You want to stop arguing about what role “should” be used and instead prove what role is used.

Inside the container (or via ECS Exec), run:

```bash
aws sts get-caller-identity
```

If you don’t have the AWS CLI in the image, do the same with your SDK (print the caller identity once on startup), or temporarily add a minimal debug endpoint that calls STS and returns the ARN.

If you see an ARN you didn’t expect, the problem is upstream: task definition, role attachment, trust policy, or your SDK credential chain.

## The classic failure mode

You added permissions to the execution role, redeployed, and still got AccessDenied. That is completely consistent with how ECS is designed: the execution role is for ECS “plumbing”; the task role is for your app’s runtime calls.

So the real question becomes: does your task definition actually set a task role?

In the task definition JSON, you often want both:

- executionRoleArn
- taskRoleArn

Example skeleton:

```json
{
  "family": "my-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/myServiceTaskRole",
  "containerDefinitions": [
    {
      "name": "api",
      "image": "123456789012.dkr.ecr.eu-west-2.amazonaws.com/my-api:latest",
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/my-service",
          "awslogs-region": "eu-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
```

If taskRoleArn is missing, your app will not have the intended permissions.

## Trust policy checks that waste hours if you forget them

Even if you set taskRoleArn, ECS can only assume the role if its trust policy allows the ECS tasks service principal.
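You can pull the current trust policy straight from IAM instead of guessing. A minimal sketch, assuming the role is the myServiceTaskRole from the example task definition above:

```bash
# Print the trust (assume-role) policy attached to the task role.
# myServiceTaskRole is the example role name from the task definition above.
aws iam get-role \
  --role-name myServiceTaskRole \
  --query 'Role.AssumeRolePolicyDocument' \
  --output json
```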
Your task role trust policy should look like this (the important part is the principal):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

If this is wrong, you’ll see “access denied” style failures that look like permission problems but are actually “role cannot be assumed.” AWS has a solid write-up on ECS role best practices and why task roles are the right isolation boundary.

## The boring but correct way to write the policy

Let’s say your app needs to read from one S3 bucket prefix, and nothing else. Do this (least privilege), not AmazonS3FullAccess:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadArtifacts",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::my-bucket/private/artifacts/*"]
    },
    {
      "Sid": "ListPrefix",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-bucket"],
      "Condition": {
        "StringLike": {
          "s3:prefix": ["private/artifacts/*"]
        }
      }
    }
  ]
}
```

If you only grant GetObject and forget ListBucket, you’ll get weird “works sometimes” behaviour depending on whether your code lists before reading.

## The hard mode gotchas that make people think ECS is cursed

These are the ones that bite experienced engineers because they are not obvious.

### 1) SSE-KMS encrypted S3 objects

Your role can have S3 permissions and still fail because the object is encrypted with KMS and you don’t have kms:Decrypt for the key.

Symptoms: S3 calls fail even though the policy “looks right.”
Fix: add KMS permissions on the key (and check the key policy too).

### 2) Bucket policy or resource policy overrides you

An IAM allow does not automatically win if the bucket policy denies, or only allows a different principal. The same goes for Secrets Manager resource policies.

Symptoms: you swear the role has permissions, but AccessDenied persists.
Fix: inspect the resource policy and make sure it allows your task role ARN.

### 3) Your SDK is not using ECS container credentials

ECS injects a container credentials endpoint for the SDK to use. If your app is overriding credentials (env vars, shared config, a hardcoded profile), it might ignore the task role entirely. AWS documents how container credential providers work and which variables SDKs use.

Symptoms: get-caller-identity shows an unexpected principal.
Fix: remove overrides, ensure the SDK is allowed to use default credential resolution, and verify the ECS-provided env var exists.

### 4) You’re debugging the execution role instead of the task role

This happens a lot when the logs are fine (execution role works) but runtime calls fail (task role missing or wrong). The roles have different jobs.

## The no-guesswork triage flow I use in real systems

1. Confirm the failing AWS action and resource from the error text
2. From inside the container, run aws sts get-caller-identity (or the SDK equivalent)
3. Confirm the task definition has taskRoleArn set
4. Confirm the task role trust policy allows ecs-tasks.amazonaws.com
5. Confirm the IAM policy includes the exact action and the correct resource ARN (there’s a simulation sketch just below)
6. Check resource policies (S3 bucket policy, KMS key policy, Secrets Manager policy)
7. If still stuck, use AWS’s ECS IAM role configuration troubleshooting guide as a structured checklist
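For step 5, you don’t have to eyeball the policy and hope: IAM can simulate the call for you. A minimal sketch, reusing the example role and bucket from earlier (the object key is a made-up placeholder). Note that this only evaluates the role’s identity-based policies, so a blocking bucket policy or KMS key policy can still fail the real call:

```bash
# Ask IAM whether the task role would be allowed to call s3:GetObject
# on a specific object. Role ARN, bucket, and key are the example values
# from earlier in the post; swap in your own.
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/myServiceTaskRole \
  --action-names s3:GetObject \
  --resource-arns arn:aws:s3:::my-bucket/private/artifacts/app.tar.gz \
  --query 'EvaluationResults[].{action:EvalActionName,decision:EvalDecision}'
```

If the decision comes back implicitDeny, the identity policy is the problem. If it says allowed but the live call still fails, move on to step 6 and look at resource policies.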
This is the difference between a senior engineer and a chaos goblin: you turn “it’s broken” into a deterministic elimination process.

## Make it visual on your blog

If you want this post to slap, add one diagram. Either draw it in Figma, or recreate it cleanly.

### Diagram idea 1

Two lanes:

- ECS agent lane: “Pull image, send logs, fetch secrets at startup” → execution role
- Application lane: “Call S3, DynamoDB, SQS, Secrets at runtime” → task role

Reference the AWS docs in the caption so it looks legit.

### Diagram idea 2

Credentials flow:

Task role → STS temporary creds → ECS injects container credentials endpoint → SDK reads env var → API call succeeds

## Copy-ready ending checklist

If your ECS task is throwing AccessDenied, check this in order:

1. Task definition has taskRoleArn set (not just the execution role)
2. Task role trust policy allows ECS tasks (ecs-tasks.amazonaws.com)
3. Your container is actually using that role (STS caller identity)
4. Policy matches the exact action + resource ARN
5. Resource policies and KMS aren’t silently blocking you
6. SDK is not overridden away from container credentials
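And if you want something you can paste into a terminal, here is a rough sketch of the first few checks as a script. The cluster, service, region, and role names are placeholders, not anything your setup is guaranteed to have:

```bash
#!/usr/bin/env bash
# Rough triage sketch: swap these placeholders for your own values.
set -euo pipefail
CLUSTER=my-cluster
SERVICE=my-service
REGION=eu-west-2

# Which task definition is the service actually running?
TASK_DEF=$(aws ecs describe-services \
  --cluster "$CLUSTER" --services "$SERVICE" --region "$REGION" \
  --query 'services[0].taskDefinition' --output text)
echo "Running task definition: $TASK_DEF"

# Does that task definition set a task role, not just an execution role?
aws ecs describe-task-definition \
  --task-definition "$TASK_DEF" --region "$REGION" \
  --query 'taskDefinition.{taskRole:taskRoleArn,executionRole:executionRoleArn}'

# Does the task role trust ecs-tasks.amazonaws.com?
ROLE_NAME=myServiceTaskRole   # take this from the taskRole output above
aws iam get-role --role-name "$ROLE_NAME" \
  --query 'Role.AssumeRolePolicyDocument.Statement[].Principal.Service'
```

The STS caller-identity check and the resource-policy checks still have to happen from inside the container and against the specific bucket, key, or secret, so treat this as the first half of the checklist, not the whole thing.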
