Add folder to test dir for custom clusterloader2 configs #5539

Open · wants to merge 7 commits into base: main
4 changes: 4 additions & 0 deletions test/clusterloader2/overrides/scheduler_throughput.yaml
@@ -0,0 +1,4 @@
CL2_DEFAULT_QPS: 10000 # Default 500
CL2_DEFAULT_BURST: 20000 # Default 1000
CL2_UNIFORM_QPS: 10000 # Default 500
CL2_SCHEDULER_THROUGHPUT_PODS_PER_DEPLOYMENT: 50000
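Note: override files like this one are layered on top of a test config at runtime (clusterloader2 is typically invoked with one or more --testoverrides flags); each key replaces the DefaultParam fallback of the same name in the test template. A scaled-down variant might look like this (hypothetical values, same keys):

CL2_DEFAULT_QPS: 1000 # 2x the built-in default of 500
CL2_DEFAULT_BURST: 2000
CL2_UNIFORM_QPS: 1000
CL2_SCHEDULER_THROUGHPUT_PODS_PER_DEPLOYMENT: 5000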
181 changes: 181 additions & 0 deletions test/clusterloader2/testing/access-tokens/config.yaml
@@ -0,0 +1,181 @@
# Stress testing access token validation
#
# Targeting 2 000 tokens with 5 000 total QPS for a 5k-node cluster, i.e. 2.5
# QPS per token.
#
# For this test the number of tokens does not change with the number of nodes.
# By default, those 2 000 tokens are assigned to 80 service accounts, with
# 25 tokens each. There is a 1:1 mapping between deployments and service
# accounts, so 80 deployments are generated, each with one pod.
#
# For smaller clusters, the QPS per token is scaled down linearly to
# 2.5 * (Number of nodes)/(5 000). This results in 1 QPS per node if there are
# 2 000 tokens.
#
# Structure and mapping:
# * For each namespace (by default 1), we generate service accounts and
#   deployments (by default 80).
# * For each service account we generate tokens (by default 25).
# * For each deployment we create pods (by default 1), and those pods
#   mount all tokens generated from the linked service account.
# * Each pod runs a number of clients equal to the number of assigned tokens.
#
# When defining your own parameters:
# Number of tokens = ${namespaces} * ${serviceAccounts} * ${tokensPerServiceAccount}
# Total QPS = Number of tokens * ${replicas} * ${qpsPerWorker}
#
# For the default values in a 5k cluster this means:
# Number of tokens = 1 * 80 * 25 = 2000
# Total QPS = 2000 * 1 * 2.5 = 5000
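# To make the scaling rule concrete, the same arithmetic worked for a
# 1 000-node cluster with the defaults otherwise unchanged:
# qpsPerWorker = 2.5 * 1000 / 5000 = 0.5
# Number of tokens = 1 * 80 * 25 = 2000
# Total QPS = 2000 * 1 * 0.5 = 1000, i.e. 1 QPS per node, as stated above.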

# Size of test variables
{{$namespaces := DefaultParam .CL2_ACCESS_TOKENS_NAMESPACES 1}}
{{$serviceAccounts := DefaultParam .CL2_ACCESS_TOKENS_SERVICE_ACCOUNTS 80}}
{{$tokensPerServiceAccount := DefaultParam .CL2_ACCESS_TOKENS_TOKENS_PER_SERVICE_ACCOUNT 25}}
{{$replicas := DefaultParam .CL2_ACCESS_TOKENS_REPLICAS 1}}
{{$qpsPerWorker := DefaultParam .CL2_ACCESS_TOKENS_QPS (MultiplyFloat 2.5 (DivideFloat .Nodes 5000))}}

# TestMetrics measurement variables
{{$ENABLE_SYSTEM_POD_METRICS := DefaultParam .ENABLE_SYSTEM_POD_METRICS true}}
{{$ENABLE_RESTART_COUNT_CHECK := DefaultParam .ENABLE_RESTART_COUNT_CHECK true}}
{{$RESTART_COUNT_THRESHOLD_OVERRIDES := DefaultParam .RESTART_COUNT_THRESHOLD_OVERRIDES ""}}

# Configs
{{$ALLOWED_SLOW_API_CALLS := DefaultParam .CL2_ALLOWED_SLOW_API_CALLS 0}}

name: access-tokens
namespace:
  number: {{$namespaces}}
tuningSets:
- name: Sequence
  parallelismLimitedLoad:
    parallelismLimit: 1
steps:
- name: Starting measurements
  measurements:
  - Identifier: APIResponsivenessPrometheus
    Method: APIResponsivenessPrometheus
    Params:
      action: start
  - Identifier: TestMetrics
    Method: TestMetrics
    Params:
      action: start
      systemPodMetricsEnabled: {{$ENABLE_SYSTEM_POD_METRICS}}
      restartCountThresholdOverrides: {{YamlQuote $RESTART_COUNT_THRESHOLD_OVERRIDES 4}}
      enableRestartCountCheck: {{$ENABLE_RESTART_COUNT_CHECK}}
      allowedSlowCalls: {{$ALLOWED_SLOW_API_CALLS}}

- name: Creating ServiceAccounts
  phases:
  - namespaceRange:
      min: 1
      max: {{$namespaces}}
    replicasPerNamespace: 1
    tuningSet: Sequence
    objectBundle:
    - basename: service-account-getter
      objectTemplatePath: role.yaml
  - namespaceRange:
      min: 1
      max: {{$namespaces}}
    replicasPerNamespace: {{$serviceAccounts}}
    tuningSet: Sequence
    objectBundle:
    - basename: account
      objectTemplatePath: serviceAccount.yaml
    - basename: account
      objectTemplatePath: roleBinding.yaml
      templateFillMap:
        RoleName: service-account-getter

- name: Creating Tokens
  phases:
{{range $i := Loop $serviceAccounts}}
  - namespaceRange:
      min: 1
      max: {{$namespaces}}
    replicasPerNamespace: {{$tokensPerServiceAccount}}
    tuningSet: Sequence
    objectBundle:
    - basename: account-{{$i}}
      objectTemplatePath: token.yaml
{{end}}
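# For the defaults, this loop expands to 80 phases (one per service account),
# each creating 25 Secrets, i.e. the 2 000 tokens described at the top.
# Assuming clusterloader2's usual basename-index object naming, they render as:
# account-0-0 ... account-0-24     (tokens for service account account-0)
# ...
# account-79-0 ... account-79-24   (tokens for service account account-79)
# These names are exactly what the deployment template mounts as volumes.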


- name: Starting measurement for waiting for pods
  measurements:
  - Identifier: WaitForRunningPods
    Method: WaitForControlledPodsRunning
    Params:
      action: start
      apiVersion: apps/v1
      kind: Deployment
      labelSelector: group = access-tokens
      operationTimeout: 15m

- name: Creating pods
  phases:
  - namespaceRange:
      min: 1
      max: {{$namespaces}}
    replicasPerNamespace: {{$serviceAccounts}}
    tuningSet: Sequence
    objectBundle:
    - basename: account
      objectTemplatePath: deployment.yaml
      templateFillMap:
        QpsPerWorker: {{$qpsPerWorker}}
        Replicas: {{$replicas}}
        Tokens: {{$tokensPerServiceAccount}}

- name: Waiting for pods to be running
  measurements:
  - Identifier: WaitForRunningPods
    Method: WaitForControlledPodsRunning
    Params:
      action: gather

- name: Wait 5min
  measurements:
  - Identifier: Wait
    Method: Sleep
    Params:
      duration: 5m

- name: Deleting pods
  phases:
  - namespaceRange:
      min: 1
      max: {{$namespaces}}
    replicasPerNamespace: 0
    tuningSet: Sequence
    objectBundle:
    - basename: account
      objectTemplatePath: deployment.yaml
      templateFillMap:
        QpsPerWorker: {{$qpsPerWorker}}
        Replicas: {{$replicas}}
        Tokens: {{$tokensPerServiceAccount}}

- name: Waiting for pods to be deleted
  measurements:
  - Identifier: WaitForRunningPods
    Method: WaitForControlledPodsRunning
    Params:
      action: gather

- name: Collecting measurements
  measurements:
  - Identifier: APIResponsivenessPrometheus
    Method: APIResponsivenessPrometheus
    Params:
      action: gather
      enableViolations: true
  - Identifier: TestMetrics
    Method: TestMetrics
    Params:
      action: gather
      systemPodMetricsEnabled: {{$ENABLE_SYSTEM_POD_METRICS}}
      restartCountThresholdOverrides: {{YamlQuote $RESTART_COUNT_THRESHOLD_OVERRIDES 4}}
      enableRestartCountCheck: {{$ENABLE_RESTART_COUNT_CHECK}}
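All sizing knobs of this test are exposed as CL2_ACCESS_TOKENS_* parameters, so a hypothetical overrides file halving the token count could look like:

CL2_ACCESS_TOKENS_SERVICE_ACCOUNTS: 40 # 40 * 25 = 1000 tokens
CL2_ACCESS_TOKENS_TOKENS_PER_SERVICE_ACCOUNT: 25
CL2_ACCESS_TOKENS_QPS: 2.5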
45 changes: 45 additions & 0 deletions test/clusterloader2/testing/access-tokens/deployment.yaml
@@ -0,0 +1,45 @@
{{$name := .Name}}

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{.Name}}
  labels:
    group: access-tokens
spec:
  selector:
    matchLabels:
      group: access-tokens
      name: {{.Name}}
  replicas: {{.Replicas}}
  template:
    metadata:
      labels:
        group: access-tokens
        name: {{.Name}}
    spec:
      containers:
      - name: access-tokens
        image: gcr.io/k8s-testimages/perf-tests-util/access-tokens:v0.0.6
        imagePullPolicy: Always
        args:
        {{range $tokenId := Loop .Tokens}}
        - --access-token-dirs=/var/tokens/{{$name}}-{{$tokenId}}
        {{end}}
        - --namespace={{.Namespace}}
        - --qps-per-worker={{.QpsPerWorker}}
        resources:
          requests:
            cpu: {{AddInt 10 (MultiplyFloat .Tokens .QpsPerWorker)}}m # 10m base + 1m per token per QPS
            memory: {{AddInt 50 (MultiplyInt .Tokens 5)}}Mi
        volumeMounts:
        {{range $j := Loop .Tokens}}
        - name: {{$name}}-{{$j}}
          mountPath: /var/tokens/{{$name}}-{{$j}}
        {{end}}
      volumes:
      {{range $j := Loop .Tokens}}
      - name: {{$name}}-{{$j}}
        secret:
          secretName: {{$name}}-{{$j}}
      {{end}}
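# With the defaults (Tokens = 25, QpsPerWorker = 2.5 on a 5k-node cluster),
# the request formulas above evaluate to roughly:
# cpu    = 10 + 25 * 2.5 = 72.5 (m)
# memory = 50 + 25 * 5   = 175 (Mi)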
11 changes: 11 additions & 0 deletions test/clusterloader2/testing/access-tokens/role.yaml
@@ -0,0 +1,11 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{.Name}}
rules:
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - get
12 changes: 12 additions & 0 deletions test/clusterloader2/testing/access-tokens/roleBinding.yaml
@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{.Name}}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: {{.RoleName}}-0
subjects:
- kind: ServiceAccount
  name: {{.Name}}
  namespace: {{.Namespace}}
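The "-0" suffix in the roleRef is deliberate: the Role is created with replicasPerNamespace: 1, and clusterloader2 appends the replica index to the basename, so the live object is named service-account-getter-0. Rendered for service account account-12, the binding would read (a sketch, assuming that naming scheme):

roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: service-account-getter-0
subjects:
- kind: ServiceAccount
  name: account-12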
4 changes: 4 additions & 0 deletions test/clusterloader2/testing/access-tokens/serviceAccount.yaml
@@ -0,0 +1,4 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{.Name}}
7 changes: 7 additions & 0 deletions test/clusterloader2/testing/access-tokens/token.yaml
@@ -0,0 +1,7 @@
apiVersion: v1
kind: Secret
metadata:
  name: {{.Name}}
  annotations:
    kubernetes.io/service-account.name: {{.BaseName}}
type: kubernetes.io/service-account-token
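Each Secret is annotated with its parent service account (here .BaseName, which, assuming clusterloader2's template mapping, is .Name without the trailing replica index), so the control plane populates the Secret with a token. Rendered for $i = 3, replica 7, this would give (a sketch):

apiVersion: v1
kind: Secret
metadata:
  name: account-3-7
  annotations:
    kubernetes.io/service-account.name: account-3
type: kubernetes.io/service-account-token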
94 changes: 94 additions & 0 deletions test/clusterloader2/testing/batch/config.yaml
@@ -0,0 +1,94 @@
{{$MODE := DefaultParam .MODE "Indexed"}}
{{$NODES_PER_NAMESPACE := MinInt .Nodes (DefaultParam .NODES_PER_NAMESPACE 100)}}
{{$PODS_PER_NODE := DefaultParam .PODS_PER_NODE 30}}
{{$LOAD_TEST_THROUGHPUT := DefaultParam .CL2_LOAD_TEST_THROUGHPUT 10}}

{{$totalPods := MultiplyInt $PODS_PER_NODE .Nodes}}
{{$namespaces := DivideInt .Nodes $NODES_PER_NAMESPACE}}
{{$podsPerNamespace := DivideInt $totalPods $namespaces}}

# small_job: 1/2 of namespace pods should be in small Jobs.
{{$smallJobSize := 5}}
{{$smallJobsPerNamespace := DivideInt $podsPerNamespace (MultiplyInt 2 $smallJobSize)}}
# medium_job: 1/4 of namespace pods should be in medium Jobs.
{{$mediumJobSize := 20}}
{{$mediumJobsPerNamespace := DivideInt $podsPerNamespace (MultiplyInt 4 $mediumJobSize)}}
# large_job: 1/4 of namespace pods should be in large Jobs.
{{$largeJobSize := 400}}
{{$largeJobsPerNamespace := DivideInt $podsPerNamespace (MultiplyInt 4 $largeJobSize)}}
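# A worked example of this split for a 100-node cluster with the defaults
# above (DivideInt rounds down):
# totalPods = 30 * 100 = 3000; namespaces = 100/100 = 1; podsPerNamespace = 3000
# small:  3000 / (2*5)   = 300 Jobs x 5 pods   -> 1500 pods (1/2)
# medium: 3000 / (4*20)  = 37 Jobs x 20 pods   -> 740 pods (~1/4)
# large:  3000 / (4*400) = 1 Job x 400 pods    -> 400 pods (~1/4, after rounding)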

{{$jobRunningTime := DefaultParam .CL2_JOB_RUNNING_TIME "30s"}}

name: batch

namespace:
  number: {{$namespaces}}

tuningSets:
- name: UniformQPS
  qpsLoad:
    qps: {{$LOAD_TEST_THROUGHPUT}}

steps:
- name: Start measurements
  measurements:
  - Identifier: WaitForFinishedJobs
    Method: WaitForFinishedJobs
    Params:
      action: start
      labelSelector: group = test-job
  - Identifier: JobLifecycleLatency
    Method: JobLifecycleLatency
    Params:
      action: start
      labelSelector: group = test-job
- name: Create {{$MODE}} jobs
  phases:
  - namespaceRange:
      min: 1
      max: {{$namespaces}}
    replicasPerNamespace: {{$smallJobsPerNamespace}}
    tuningSet: UniformQPS
    objectBundle:
    - basename: small
      objectTemplatePath: "job.yaml"
      templateFillMap:
        Replicas: {{$smallJobSize}}
        Mode: {{$MODE}}
        Sleep: {{$jobRunningTime}}
  - namespaceRange:
      min: 1
      max: {{$namespaces}}
    replicasPerNamespace: {{$mediumJobsPerNamespace}}
    tuningSet: UniformQPS
    objectBundle:
    - basename: medium
      objectTemplatePath: "job.yaml"
      templateFillMap:
        Replicas: {{$mediumJobSize}}
        Mode: {{$MODE}}
        Sleep: {{$jobRunningTime}}
  - namespaceRange:
      min: 1
      max: {{$namespaces}}
    replicasPerNamespace: {{$largeJobsPerNamespace}}
    tuningSet: UniformQPS
    objectBundle:
    - basename: large
      objectTemplatePath: "job.yaml"
      templateFillMap:
        Replicas: {{$largeJobSize}}
        Mode: {{$MODE}}
        Sleep: {{$jobRunningTime}}
- name: Wait for {{$MODE}} jobs to finish
  measurements:
  - Identifier: JobLifecycleLatency
    Method: JobLifecycleLatency
    Params:
      action: gather
      timeout: 10m
  - Identifier: WaitForFinishedJobs
    Method: WaitForFinishedJobs
    Params:
      action: gather
      timeout: 10m
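As with the other configs, these parameters can be supplied through an overrides file; a hypothetical example using the keys read at the top of this template:

MODE: NonIndexed
PODS_PER_NODE: 10
CL2_LOAD_TEST_THROUGHPUT: 20
CL2_JOB_RUNNING_TIME: 60s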
21 changes: 21 additions & 0 deletions test/clusterloader2/testing/batch/job.yaml
@@ -0,0 +1,21 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: {{.Name}}
  labels:
    group: test-job
spec:
  parallelism: {{.Replicas}}
  completions: {{.Replicas}}
  completionMode: {{.Mode}}
  template:
    metadata:
      labels:
        group: test-pod
    spec:
      containers:
      - name: {{.Name}}
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.0.3
        args:
        - {{.Sleep}}
      restartPolicy: Never
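With completionMode: Indexed, each pod gets a distinct completion index. Rendered with the small-job parameters (Replicas = 5, Mode = Indexed, Sleep = 30s), the relevant spec fields become (a sketch):

spec:
  parallelism: 5
  completions: 5
  completionMode: Indexed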
@@ -0,0 +1,10 @@
RESTART_COUNT_THRESHOLD_OVERRIDES: |
  # The main purpose of this check is detecting crashlooping pods.
  # With the node killer enabled, pods running on a killed node crash, which is expected.
  coredns: 1
  fluentd-gcp: 1
  kube-proxy: 1
  konnectivity-agent: 1
  metadata-proxy: 1
  prometheus-to-sd-exporter: 1
  volume-snapshot-controller: 1
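These thresholds feed the TestMetrics measurement shown in the access-tokens config above, which receives them as (copied from that config):

restartCountThresholdOverrides: {{YamlQuote $RESTART_COUNT_THRESHOLD_OVERRIDES 4}}
enableRestartCountCheck: {{$ENABLE_RESTART_COUNT_CHECK}}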