[Bug]: Task run S3 permission issues when using env file, possible race condition/out-of-order execution #6066

@nagelfargithub

Description:

We have a service that uses an env file and gets deployed quite often. ECS tasks initially run fine. After a random number of deployments, ECS tasks no longer start and we see an S3 permission error.

The env file is located in a bucket created by Copilot. All permissions and policies are set via Copilot. There were no manual changes.

We believe this is a race condition, an eventual-consistency problem, or out-of-order execution in Copilot when deploying a service that uses an env file: the file is uploaded, but the policy that Copilot updates still contains the old hash. See Debugging for details.

This is similar to #5756 which was fixed in 1.33.3 but we believe it is a different issue.

We believe that the S3 permissions are overly specific and version-locked, which doesn't account for Copilot's content-addressed file storage. The IAM role has permissions for a specific hash version of the environment file, but the ECS task tries to access a different (newer) hash version, resulting in the 403 Forbidden error.

Probable Order of Events:

  1. Copilot deployment updates IAM permissions for specific env file hash (but it is the old hash)
  2. Later: Deployment creates a new version of the env file with a different (new) hash
  3. ECS task gets updated with new hash
  4. Result: 403 Forbidden because IAM permissions still reference the old hash
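
To make the suspected ordering concrete, here is a minimal sketch of it in Python. It assumes the key layout seen in the policy (`manual/env-files/<name>/<sha256>.env`) and that the hash is a SHA-256 of the file content, as the analysis below describes; Copilot's real key derivation may differ. The bucket name, file name, and file contents are made up for illustration.

```python
import hashlib

BUCKET = "example-artifact-bucket"  # hypothetical bucket name


def env_file_key(name: str, content: bytes) -> str:
    """Content-addressed key (assumed layout): manual/env-files/<name>/<sha256>.env."""
    digest = hashlib.sha256(content).hexdigest()
    return f"manual/env-files/{name}/{digest}.env"


def object_arn(key: str) -> str:
    """S3 object ARN for a key in the hypothetical bucket."""
    return f"arn:aws:s3:::{BUCKET}/{key}"


def is_allowed(policy_resource: str, requested_arn: str) -> bool:
    """Exact-match check, enough for a policy that names a single object."""
    return policy_resource == requested_arn


# Step 1: the policy is rendered from the content seen at that moment.
old_key = env_file_key("app.env", b"LOG_LEVEL=info\n")
policy_resource = object_arn(old_key)

# Steps 2-3: the file changes and the task definition points at the new key.
new_key = env_file_key("app.env", b"LOG_LEVEL=debug\n")
task_env_file = object_arn(new_key)

# Step 4: the role still names the old object, so the download is denied.
print(is_allowed(policy_resource, object_arn(old_key)))
print(is_allowed(policy_resource, task_env_file))
```

Any change to the file content yields a different key, so a policy pinned to one hash only works if it is updated in the same deployment as the task definition.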

Note: Amazon Q suggested this fix. We can confirm that after applying this change the tasks started again without issues.

# Update the IAM role to allow access to any version
aws iam put-role-policy \
  --role-name pipelines-dev-daemon-ExecutionRole-[redacted] \
  --policy-name S3EnvFileAccess \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "s3:GetObject"
        ],
        "Resource": "arn:aws:s3:::stackset-pipelines-infras-pipelinebuiltartifactbuc-[redacted]/manual/env-files/*"
      },
      {
        "Effect": "Allow",
        "Action": [
          "s3:GetBucketLocation"
        ],
        "Resource": "arn:aws:s3:::stackset-pipelines-infras-pipelinebuiltartifactbuc-[redacted]"
      }
    ]
  }'
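
A small sketch of why the wildcard works, and what it trades away. IAM resource patterns treat `*` as any run of characters and `?` as a single character; the matcher below imitates only that part and ignores everything else IAM evaluates (Deny statements, conditions, bucket policies). The bucket name, file names, and the `per_file` pattern are assumptions for illustration, not something Copilot generates.

```python
import re


def arn_matches(pattern: str, arn: str) -> bool:
    """Match an ARN against an IAM-style pattern: * is any run of characters, ? is one."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, arn) is not None


BUCKET = "example-artifact-bucket"  # hypothetical bucket name
PREFIX = f"arn:aws:s3:::{BUCKET}/manual/env-files"

pinned = f"{PREFIX}/app.env/{'a' * 64}.env"  # one specific hash, like the generated policy
broad = f"{PREFIX}/*"                         # the workaround above
per_file = f"{PREFIX}/app.env/*"              # narrower alternative (assumption)

new_object = f"{PREFIX}/app.env/{'b' * 64}.env"        # same file, new hash
other_service = f"{PREFIX}/other.env/{'c' * 64}.env"   # a different env file

for name, pattern in [("pinned", pinned), ("broad", broad), ("per_file", per_file)]:
    print(name, arn_matches(pattern, new_object), arn_matches(pattern, other_service))
```

The broad pattern matches every env file under the prefix, including those of other services, so it is a workaround rather than a least-privilege fix. A pattern scoped to one file name would still survive hash changes.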

Details:

Copilot v1.34.0, us-east-1

Observed result:

env file can't be downloaded from the Copilot generated S3 bucket

Expected result:

env file can be downloaded from the Copilot generated S3 bucket

Debugging:

Most of the debugging was done via Amazon Q, but the analysis describes exactly what we see and the suggested fix also worked, so I'll leave some important bits here.

Initial Question:
What triggered the ResourceInitializationError: failed to download env files: file download command: non empty error stream: service call has been retried 1 time(s): Forbidden: Forbidden status code: 403

Based on my investigation, here's what triggered the ResourceInitializationError with the 403 Forbidden error:

Root Cause Analysis
The ResourceInitializationError occurs during the ECS task initialization phase when ECS tries to download environment files from S3 before starting your container.
The specific file:
s3://stackset-pipelines-infras-pipelinebuiltartifactbuc-[redacted]/manual/env-files/[redacted].env

What Triggered the Error
The 403 Forbidden error was triggered by one or more of these issues:

A. IAM Permission Issues

The task execution role lacks S3 permissions

Role: arn:aws:iam::[redacted]:role/pipelines-dev-daemon-ExecutionRole-[redacted]

[Verified manually that the file in question exists; it does.]
Had Q analyze the full task startup:

  1. ECS starts task initialization
  2. ECS attempts to download env from S3
  3. S3 returns 403 Forbidden (permission denied)
  4. ECS retries the download (1 retry attempt)
  5. Second attempt also fails with 403
  6. ECS marks task as failed with ResourceInitializationError
  7. Container never starts because initialization failed

Why This Environment File is Required:
This appears to be a Copilot-managed task (based on the task definition name pattern) that requires environment variables from the [redacted].env file to configure the application properly.

[Confirmed that these are resources automatically created and configured by Copilot as part of the pipeline infrastructure.]
The AWS Copilot CLI initiated the infrastructure deployment.
A CloudFormation StackSet managed the actual resource creation.
pipelines-executionrole was the execution role used by CloudFormation.
The bucket is part of the Copilot pipelines application infrastructure.

Purpose:
The bucket serves as artifact storage for the Copilot pipeline, storing:
Build artifacts
Environment files
Pipeline assets

Question: Why are the S3 permissions set up in such a way that the permission error occurred when accessing the [redacted].env file?

Let me analyze the S3 permissions configuration to understand why the 403 Forbidden error occurred when accessing the [redacted].env file.

Based on my analysis, here's exactly why the S3 permissions are set up in a way that caused the 403 Forbidden error:

Root Cause: Path Mismatch in Permissions
The permission error occurred due to a specific path mismatch between what the ECS task was trying to access and what permissions were granted.

What the ECS Task Tried to Access:
s3://stackset-pipelines-infras-pipelinebuiltartifactbuc-[redacted]/manual/env-files/[redacted].env

What the IAM Role Has Permission For:
s3://stackset-pipelines-infras-pipelinebuiltartifactbuc-[redacted]/manual/env-files/[redacted].env/80ce15b57b22bb6accb8f26c659e7ace7fdb3e56ce08f05ca99296e396540d07.env

The Permission Configuration Problem:

  1. Overly Specific IAM Permissions
    The ECS task execution role pipelines-dev-daemon-ExecutionRole-[redacted] has permissions for:
    Specific file with hash: [redacted].env/80ce15b57b22bb6accb8f26c659e7ace7fdb3e56ce08f05ca99296e396540d07.env
    Not the directory: [redacted].env (without the hash suffix)

  2. Copilot's Versioned File System
    Copilot uses content-addressed storage where:
    Environment files are stored with SHA256 hashes as filenames
    The IAM role gets permissions for the specific hash version
    But the task definition references the generic path without the hash

  3. S3 Bucket Policy vs IAM Role Mismatch
    S3 Bucket Policy (allows broad access):
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::[redacted]:root"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::stackset-pipelines-infras-pipelinebuiltartifactbuc-[redacted]/*",
        "arn:aws:s3:::stackset-pipelines-infras-pipelinebuiltartifactbuc-[redacted]"
      ]
    }
    IAM Role Policy (too specific):
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::stackset-pipelines-infras-pipelinebuiltartifactbuc-[redacted]/manual/env-files/[redacted].env/80ce15b57b22bb6accb8f26c659e7ace7fdb3e56ce08f05ca99296e396540d07.env"
    }

Why This Design Causes Problems:

  1. Version Drift
    New deployment creates a new hash for [redacted].env
    IAM permissions still reference the old hash
    ECS task tries to access the new hash → 403 Forbidden

  2. Principle of Least Privilege Gone Wrong
    Copilot applied too granular permissions
    Should allow access to any version of the environment file
    Instead, it locked permissions to a specific version

  3. Deployment Race Condition
    New environment file uploaded with new hash
    ECS task definition updated to reference new file
    IAM permissions not updated to match new hash
    Task fails with 403 error

Summary:
The S3 permissions were set up with overly specific, version-locked access that doesn't account for Copilot's content-addressed file storage system. The IAM role has permissions for a specific hash version of the environment file, but the ECS task is trying to access a different (newer) hash version, resulting in the 403 Forbidden error.

This is a common issue with Copilot's security model where the principle of least privilege is applied too granularly, causing deployment failures when environment files are updated.

Question: But these permissions were set up by Copilot automatically, not by me, so why are they not working now although they had been working for some time?

Excellent point! You're absolutely right - if Copilot set these up automatically and they were working before, something changed to break the previously working configuration. Based on my investigation, here's what happened:

What Changed: The Smoking Gun
IAM Policy Update on August 27, 2025
When: August 27, 2025 at 10:11:38 UTC (about 1 day before the failure)

What: CloudFormation updated the inline policy pipelines-dev-daemonGetEnvFilePolicy

Role: pipelines-dev-daemon-ExecutionRole-[redacted]

The Problem: Stale File Hash in Permissions
The IAM policy was updated to point to a specific version of the environment file:

arn:aws:s3:::stackset-pipelines-infras-pipelinebuiltartifactbuc-[redacted]/manual/env-files/[redacted].env/80ce15b57b22bb6accb8f26c659e7ace7fdb3e56ce08f05ca99296e396540d07.env
But the ECS task is trying to access a different version (newer hash) of the same file.

What Likely Happened:
Timeline of Events:
August 27, 10:11 UTC: Copilot deployment updated IAM permissions for specific env file hash
Later: New deployment created a new version of [redacted].env with a different hash
August 28, 16:06 UTC: ECS task tries to access the new hash version
Result: 403 Forbidden because IAM permissions still reference the old hash

Root Cause: Copilot's Content-Addressed Storage Issue
How Copilot Environment Files Work:
Content Hashing: Copilot stores env files with SHA256 hashes as filenames
IAM Permissions: Generated for the specific hash at deployment time
File Updates: New content = new hash = new filename
Permission Lag: IAM permissions not updated when file content changes
Why This Breaks:

Old permission (August 27): s3:GetObject on [redacted].env/80ce15b57b22bb6accb8f26c659e7ace7fdb3e56ce08f05ca99296e396540d07.env
New file (August 28): [redacted].env/NEW_HASH_HERE.env
Result: Permission denied because hashes don't match
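
For anyone who wants to check for this drift on their own stack, here is a rough sketch of the comparison. It works on plain dicts shaped like the `taskDefinition` object from `aws ecs describe-task-definition` and the `PolicyDocument` from `aws iam get-role-policy`; fetching them is left out. The helper names and sample values are hypothetical, and the check only looks at Allow statements with simple wildcards, not at Deny statements, conditions, or bucket policies.

```python
import re


def wildcard_match(pattern, value):
    """IAM-style match: * is any run of characters, ? is exactly one."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, value) is not None


def as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def env_file_arns(task_definition):
    """Collect the S3 env file ARNs referenced by all container definitions."""
    arns = []
    for container in task_definition.get("containerDefinitions", []):
        for env_file in container.get("environmentFiles", []):
            if env_file.get("type") == "s3":
                arns.append(env_file["value"])
    return arns


def readable_patterns(policy_document):
    """Resource patterns of Allow statements whose actions cover s3:GetObject."""
    patterns = []
    for statement in as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = as_list(statement.get("Action", []))
        if any(wildcard_match(action, "s3:GetObject") for action in actions):
            patterns.extend(as_list(statement.get("Resource", [])))
    return patterns


def uncovered_env_files(task_definition, policy_document):
    """Env file ARNs in the task definition that no Allow statement matches."""
    patterns = readable_patterns(policy_document)
    return [
        arn
        for arn in env_file_arns(task_definition)
        if not any(wildcard_match(pattern, arn) for pattern in patterns)
    ]


# Hypothetical sample data: the policy names the old hash, the task the new one.
PREFIX = "arn:aws:s3:::example-artifact-bucket/manual/env-files/app.env"
OLD = f"{PREFIX}/{'a' * 64}.env"
NEW = f"{PREFIX}/{'b' * 64}.env"

task_definition = {
    "containerDefinitions": [
        {"name": "daemon", "environmentFiles": [{"type": "s3", "value": NEW}]}
    ]
}
policy_document = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:GetObject", "Resource": OLD}],
}

print(uncovered_env_files(task_definition, policy_document))
```

A non-empty result means the task definition references an env file that the execution role's policy does not cover, which is the state we believe our stack ended up in.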

Labels: type/bug (Issues that are bugs.)
