EPIC: Production-level documentation #2268

@randomvariable

/kind documentation
/help

I've been going through the documents for AWS Technical Baseline Reviews and, based on their checklist, have drawn up this list of documentation that we should have to help end users.

  • Typical deployment with list of all resources
  • List all deployment options (single-AZ, multi-AZ, multi-region)
  • Expected time to complete deployment
  • List skills / knowledge needed to complete deployment (familiarity with AWS, specific services, etc.)
  • Supported environment configurations (networking, DNS, etc.)
  • Architecture diagram using AWS simple icons, labelling where user data is stored
  • Network diagram showing VPCs, subnets, security groups, NACLs, and ingress/egress mappings
  • Integration points showing third-party assets (e.g. Kubernetes OCI registries)
  • Links to IAM and IAM best practice documentation
  • How to deploy without root privileges
  • Prescriptive guidance on least privilege policies
  • Clearly highlight public resources (like AMIs, clusterctl GitHub repos)
  • Describe purpose and location of each key (EBS root volume encryption, etc.)
  • Document maintenance of AWS Secrets Manager
  • Highlight where sensitive data is stored (PVCs and etcd root volumes)
  • List of all billable services, showing which are mandatory or optional
  • Guidance for EC2 instance type and size selection
  • Guidance for EBS volume type and size selection
  • Step-by-step instructions for typical deployment architecture
  • Step-by-step deployment guide for maximising uptime and reliability
  • Prescriptive guidance for testing and troubleshooting
  • Step-by-step instructions on how to assess and monitor the health of the cluster and Cluster API
  • Step-by-step instructions for restoring data from a backup
  • Step-by-step instructions for recovery from instance failure
  • Step-by-step instructions for recovery from AZ failure
  • Documentation on managing AWS & K8s service limits to allow for disaster recovery
  • Documented RTO and RPOs for deployments
  • Step-by-step instructions for rotating credentials and cryptographic keys
  • Prescriptive guidance for software patches and upgrades
  • Prescriptive guidance for managing AWS service limits
  • Step-by-step instructions on handling fault conditions
  • Step-by-step instructions for recovery
  • How to use externally provisioned ASGs via third-party services for both unmanaged and EKS
  • How to run "airgapped"
  • How to bootstrap with temporary credentials
  • Diagnosing CloudFormation errors

Metadata

Assignees

No one assigned

    Labels

    help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
    kind/documentation: Categorizes issue or PR as related to documentation.
    lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
    priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
