Add OTel kuttl tests. #4152

Merged
merged 1 commit into from
Apr 4, 2025
5 changes: 3 additions & 2 deletions .github/workflows/test.yaml
Original file line number Diff line number Diff line change
@@ -111,6 +111,7 @@ jobs:
registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-16.8-3.4-0
registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-17.4-0
registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-17.4-3.4-0
+ registry.developers.crunchydata.com/crunchydata/postgres-operator:latest
- run: go mod download
- name: Build executable
run: PGO_VERSION='${{ github.sha }}' make build-postgres-operator
@@ -143,8 +144,8 @@ jobs:
--env 'RELATED_IMAGE_POSTGRES_17=registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-17.4-0' \
--env 'RELATED_IMAGE_POSTGRES_17_GIS_3.4=registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-17.4-3.4-0' \
--env 'RELATED_IMAGE_STANDALONE_PGADMIN=registry.developers.crunchydata.com/crunchydata/crunchy-pgadmin4:ubi8-8.14-2' \
- --env 'RELATED_IMAGE_COLLECTOR=ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.119.0' \
- --env 'PGO_FEATURE_GATES=TablespaceVolumes=true' \
+ --env 'RELATED_IMAGE_COLLECTOR=registry.developers.crunchydata.com/crunchydata/postgres-operator:latest' \
+ --env 'PGO_FEATURE_GATES=TablespaceVolumes=true,OpenTelemetryLogs=true,OpenTelemetryMetrics=true' \
--name 'postgres-operator' ubuntu \
postgres-operator
- name: Install kuttl
6 changes: 6 additions & 0 deletions testing/kuttl/e2e/otel-logging-and-metrics/00--cluster.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
apply:
- files/00--create-cluster.yaml
assert:
- files/00-cluster-created.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
apply:
- files/01--add-instrumentation.yaml
assert:
- files/01-instrumentation-added.yaml
63 changes: 63 additions & 0 deletions testing/kuttl/e2e/otel-logging-and-metrics/02-assert-instance.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
# First, check that all containers in the instance pod are ready.
# Then, grab the collector metrics output and check that a metric from both 5m
# and 5s queries are present, as well as patroni metrics.
# Then, check the collector logs for patroni, pgbackrest, and postgres logs.
# Finally, ensure the monitoring user exists and is configured.
- script: |
    retry() { bash -ceu 'printf "$1\nSleeping...\n" && sleep 5' - "$@"; }
    check_containers_ready() { bash -ceu 'echo "$1" | jq -e ".[] | select(.type==\"ContainersReady\") | .status==\"True\""' - "$@"; }
    contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

    pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
      -l postgres-operator.crunchydata.com/cluster=otel-cluster,postgres-operator.crunchydata.com/data=postgres)
    [ "$pod" = "" ] && retry "Pod not found" && exit 1

    condition_json=$(kubectl get "${pod}" -n "${NAMESPACE}" -o jsonpath="{.status.conditions}")
    [ "$condition_json" = "" ] && retry "conditions not found" && exit 1
    { check_containers_ready "$condition_json"; } || {
      retry "containers not ready"
      exit 1
    }

    scrape_metrics=$(kubectl exec "${pod}" -c collector -n "${NAMESPACE}" -- \
      curl --insecure --silent http://localhost:9187/metrics)
    { contains "${scrape_metrics}" 'ccp_connection_stats_active'; } || {
      retry "5 second metric not found"
      exit 1
    }
    { contains "${scrape_metrics}" 'ccp_database_size_bytes'; } || {
      retry "5 minute metric not found"
      exit 1
    }
    { contains "${scrape_metrics}" 'patroni_postgres_running'; } || {
      retry "patroni metric not found"
      exit 1
    }

    logs=$(kubectl logs "${pod}" --namespace "${NAMESPACE}" -c collector | grep InstrumentationScope)
    { contains "${logs}" 'InstrumentationScope patroni'; } || {
      retry "patroni logs not found"
      exit 1
    }
    { contains "${logs}" 'InstrumentationScope pgbackrest'; } || {
      retry "pgbackrest logs not found"
      exit 1
    }
    { contains "${logs}" 'InstrumentationScope postgres'; } || {
      retry "postgres logs not found"
      exit 1
    }

    kubectl exec --stdin "${pod}" --namespace "${NAMESPACE}" -c database \
      -- psql -qb --set ON_ERROR_STOP=1 --file=- <<'SQL'
      DO $$
      DECLARE
        result record;
      BEGIN
        SELECT * INTO result FROM pg_catalog.pg_roles WHERE rolname = 'ccp_monitoring';
        ASSERT FOUND, 'user not found';
      END $$
    SQL
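Each check in the script above follows the same shape: run a test, and on failure print a message, sleep, and exit non-zero. This only makes sense if the harness runs the script again, which is assumed here to be how kuttl treats TestAssert commands until the step times out. A minimal sketch of one such attempt follows; the `run_check` wrapper, the `RETRY_SLEEP` variable, and the sample scrape text are illustrative and not part of the PR.

```shell
#!/usr/bin/env bash
# Same helpers as the test script, except that the sleep length can be
# overridden through the hypothetical RETRY_SLEEP variable.
retry() { bash -ceu 'printf "$1\nSleeping...\n" && sleep "${RETRY_SLEEP:-5}"' - "$@"; }
contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

# One attempt at a single check. The body is a subshell, so `exit 1`
# ends only this attempt and not the calling shell.
run_check() (
  scrape_metrics="$1"
  { contains "${scrape_metrics}" 'patroni_postgres_running'; } || {
    retry "patroni metric not found"
    exit 1
  }
  echo "patroni metric found"
)

export RETRY_SLEEP=0   # keep the demonstration fast

run_check 'ccp_connection_stats_active 1'; first=$?
run_check 'patroni_postgres_running 1';    second=$?
echo "first attempt: ${first}, second attempt: ${second}"
```

The sleep inside `retry` paces the attempts; the non-zero exit is what tells the harness the assertion has not passed yet.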
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
# First, check that all containers in the pgbouncer pod are ready.
# Then, scrape the collector metrics and check that pgbouncer metrics are present.
# Then, check the collector logs for pgbouncer logs.
- script: |
    retry() { bash -ceu 'printf "$1\nSleeping...\n" && sleep 5' - "$@"; }
    check_containers_ready() { bash -ceu 'echo "$1" | jq -e ".[] | select(.type==\"ContainersReady\") | .status==\"True\""' - "$@"; }
    contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

    pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
      -l postgres-operator.crunchydata.com/cluster=otel-cluster,postgres-operator.crunchydata.com/role=pgbouncer)
    [ "$pod" = "" ] && retry "Pod not found" && exit 1

    condition_json=$(kubectl get "${pod}" -n "${NAMESPACE}" -o jsonpath="{.status.conditions}")
    [ "$condition_json" = "" ] && retry "conditions not found" && exit 1
    { check_containers_ready "$condition_json"; } || {
      retry "containers not ready"
      exit 1
    }

    scrape_metrics=$(kubectl exec "${pod}" -c collector -n "${NAMESPACE}" -- \
      curl --insecure --silent http://localhost:9187/metrics)
    { contains "${scrape_metrics}" 'ccp_pgbouncer_clients_wait_seconds'; } || {
      retry "pgbouncer metric not found"
      exit 1
    }

    logs=$(kubectl logs "${pod}" --namespace "${NAMESPACE}" -c collector | grep InstrumentationScope)
    { contains "${logs}" 'InstrumentationScope pgbouncer'; } || {
      retry "pgbouncer logs not found"
      exit 1
    }
30 changes: 30 additions & 0 deletions testing/kuttl/e2e/otel-logging-and-metrics/04-assert-pgadmin.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
# First, check that all containers in the pgadmin pod are ready.
# Then, check the collector logs for pgadmin and gunicorn logs.
- script: |
    retry() { bash -ceu 'printf "$1\nSleeping...\n" && sleep 5' - "$@"; }
    check_containers_ready() { bash -ceu 'echo "$1" | jq -e ".[] | select(.type==\"ContainersReady\") | .status==\"True\""' - "$@"; }
    contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

    pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
      -l postgres-operator.crunchydata.com/pgadmin=otel-pgadmin)
    [ "$pod" = "" ] && retry "Pod not found" && exit 1

    condition_json=$(kubectl get "${pod}" -n "${NAMESPACE}" -o jsonpath="{.status.conditions}")
    [ "$condition_json" = "" ] && retry "conditions not found" && exit 1
    { check_containers_ready "$condition_json"; } || {
      retry "containers not ready"
      exit 1
    }

    logs=$(kubectl logs "${pod}" --namespace "${NAMESPACE}" -c collector | grep InstrumentationScope)
    { contains "${logs}" 'InstrumentationScope pgadmin'; } || {
      retry "pgadmin logs not found"
      exit 1
    }
    { contains "${logs}" 'InstrumentationScope gunicorn.access'; } || {
      retry "gunicorn logs not found"
      exit 1
    }
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
# First, check that all containers in the repo host pod are ready.
# Then, ensure that the collector logs for the repo-host do not contain any
# pgbackrest logs as the backup completed before the collector started up and we
# have the collector configured to only ingest new log records on start up.
- script: |
    retry() { bash -ceu 'printf "$1\nSleeping...\n" && sleep 5' - "$@"; }
    check_containers_ready() { bash -ceu 'echo "$1" | jq -e ".[] | select(.type==\"ContainersReady\") | .status==\"True\""' - "$@"; }
    contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

    pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
      -l postgres-operator.crunchydata.com/cluster=otel-cluster,postgres-operator.crunchydata.com/data=pgbackrest)
    [ "$pod" = "" ] && retry "Pod not found" && exit 1

    condition_json=$(kubectl get "${pod}" -n "${NAMESPACE}" -o jsonpath="{.status.conditions}")
    [ "$condition_json" = "" ] && retry "conditions not found" && exit 1
    { check_containers_ready "$condition_json"; } || {
      retry "containers not ready"
      exit 1
    }

    logs=$(kubectl logs "${pod}" --namespace "${NAMESPACE}" -c collector | grep InstrumentationScope)
    { ! contains "${logs}" 'InstrumentationScope pgbackrest'; } || {
      retry "pgbackrest logs were found when we did not expect any"
      exit 1
    }
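The negated check above passes only while no pgbackrest scope has shown up in the collector output. A minimal sketch of how the negation behaves before and after a backup follows; the `assert_absent` wrapper and both sample log strings are hypothetical.

```shell
#!/usr/bin/env bash
# Same helper as the test script: succeeds when $2 is a substring of $1.
contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

# Hypothetical collector output before and after a manual backup.
logs_before='InstrumentationScope postgres'
logs_after='InstrumentationScope postgres
InstrumentationScope pgbackrest'

# `! cmd` succeeds exactly when cmd fails, so this wrapper succeeds
# only when the needle is absent from the text.
assert_absent() { ! contains "$1" "$2"; }

assert_absent "${logs_before}" 'InstrumentationScope pgbackrest' && before=absent || before=present
assert_absent "${logs_after}"  'InstrumentationScope pgbackrest' && after=absent  || after=present
echo "before=${before} after=${after}"
```

One consequence of this shape is that an empty `logs` string also satisfies the check, so this step on its own cannot tell "no pgbackrest logs" apart from "the collector printed nothing at all".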
6 changes: 6 additions & 0 deletions testing/kuttl/e2e/otel-logging-and-metrics/06--backup.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
apply:
- files/06--annotate-cluster.yaml
assert:
- files/06-backup-completed.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
# First, check that all containers in the repo host pod are ready.
# Then, ensure that the repo-host collector logs have pgbackrest logs.
- script: |
    retry() { bash -ceu 'printf "$1\nSleeping...\n" && sleep 5' - "$@"; }
    check_containers_ready() { bash -ceu 'echo "$1" | jq -e ".[] | select(.type==\"ContainersReady\") | .status==\"True\""' - "$@"; }
    contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

    pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
      -l postgres-operator.crunchydata.com/cluster=otel-cluster,postgres-operator.crunchydata.com/data=pgbackrest)
    [ "$pod" = "" ] && retry "Pod not found" && exit 1

    condition_json=$(kubectl get "${pod}" -n "${NAMESPACE}" -o jsonpath="{.status.conditions}")
    [ "$condition_json" = "" ] && retry "conditions not found" && exit 1
    { check_containers_ready "$condition_json"; } || {
      retry "containers not ready"
      exit 1
    }

    logs=$(kubectl logs "${pod}" --namespace "${NAMESPACE}" -c collector | grep InstrumentationScope)
    { contains "${logs}" 'InstrumentationScope pgbackrest'; } || {
      retry "pgbackrest logs were not found"
      exit 1
    }
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
apply:
- files/08--add-custom-queries.yaml
assert:
- files/08-custom-queries-added.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
# First, check that all containers in the instance pod are ready.
# Then, grab the collector metrics output and check that the two metrics that we
# checked for earlier are no longer there.
# Then, check that the two custom metrics that we added are present.
- script: |
    retry() { bash -ceu 'printf "$1\nSleeping...\n" && sleep 5' - "$@"; }
    check_containers_ready() { bash -ceu 'echo "$1" | jq -e ".[] | select(.type==\"ContainersReady\") | .status==\"True\""' - "$@"; }
    contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

    pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
      -l postgres-operator.crunchydata.com/cluster=otel-cluster,postgres-operator.crunchydata.com/data=postgres)
    [ "$pod" = "" ] && retry "Pod not found" && exit 1

    condition_json=$(kubectl get "${pod}" -n "${NAMESPACE}" -o jsonpath="{.status.conditions}")
    [ "$condition_json" = "" ] && retry "conditions not found" && exit 1
    { check_containers_ready "$condition_json"; } || {
      retry "containers not ready"
      exit 1
    }

    scrape_metrics=$(kubectl exec "${pod}" -c collector -n "${NAMESPACE}" -- \
      curl --insecure --silent http://localhost:9187/metrics)
    { ! contains "${scrape_metrics}" 'ccp_connection_stats_active'; } || {
      retry "5 second metric still present"
      exit 1
    }
    { ! contains "${scrape_metrics}" 'ccp_database_size_bytes'; } || {
      retry "5 minute metric still present"
      exit 1
    }
    { contains "${scrape_metrics}" 'custom_table_count'; } || {
      retry "fast custom metric not found"
      exit 1
    }
    { contains "${scrape_metrics}" 'custom_pg_stat_statements_row_count'; } || {
      retry "slow custom metric not found"
      exit 1
    }
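Because `contains` is a substring test, a name such as `custom_table_count` would also match any longer metric that shares that prefix. If that ever becomes a problem, a stricter check could anchor on the start of a sample line. The sketch below assumes the endpoint serves the Prometheus text exposition format, where each sample line begins with the metric name followed by `{` or a space; the `has_metric` helper and the sample scrape are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds only when some line of $1 starts with exactly the metric name $2.
# $2 is assumed to contain no regular-expression metacharacters.
has_metric() {
  printf '%s\n' "$1" | grep -Eq "^$2(\{| )"
}

# Hypothetical scrape: note the look-alike metric with a longer name.
scrape_metrics='# HELP custom_table_count_total example
custom_table_count_total 7
custom_pg_stat_statements_row_count{dbname="postgres"} 42'

has_metric "${scrape_metrics}" 'custom_table_count' && exact=found || exact=absent
has_metric "${scrape_metrics}" 'custom_pg_stat_statements_row_count' && slow=found || slow=absent
echo "custom_table_count: ${exact}, custom_pg_stat_statements_row_count: ${slow}"
```

A plain substring test would report `custom_table_count` as present here; the anchored version does not.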
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
apiVersion: kuttl.dev/v1beta1
kind: TestStep
apply:
- files/10--add-logs-exporter.yaml
assert:
- files/10-logs-exporter-added.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
# First, check that the standalone otel-collector container is ready.
# Then, check the standalone collector logs for logs from all six potential
# sources: patroni, pgbackrest, postgres, pgbouncer, pgadmin, and gunicorn.
- script: |
    retry() { bash -ceu 'printf "$1\nSleeping...\n" && sleep 5' - "$@"; }
    check_containers_ready() { bash -ceu 'echo "$1" | jq -e ".[] | select(.type==\"ContainersReady\") | .status==\"True\""' - "$@"; }
    contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

    pod=$(kubectl get pods -o name -n "${NAMESPACE}" -l app=opentelemetry)
    [ "$pod" = "" ] && retry "Pod not found" && exit 1

    condition_json=$(kubectl get "${pod}" -n "${NAMESPACE}" -o jsonpath="{.status.conditions}")
    [ "$condition_json" = "" ] && retry "conditions not found" && exit 1
    { check_containers_ready "$condition_json"; } || {
      retry "containers not ready"
      exit 1
    }

    logs=$(kubectl logs "${pod}" --namespace "${NAMESPACE}" -c otel-collector | grep InstrumentationScope)
    { contains "${logs}" 'InstrumentationScope patroni'; } || {
      retry "patroni logs not found"
      exit 1
    }
    { contains "${logs}" 'InstrumentationScope pgbackrest'; } || {
      retry "pgbackrest logs not found"
      exit 1
    }
    { contains "${logs}" 'InstrumentationScope postgres'; } || {
      retry "postgres logs not found"
      exit 1
    }
    { contains "${logs}" 'InstrumentationScope pgbouncer'; } || {
      retry "pgbouncer logs not found"
      exit 1
    }
    { contains "${logs}" 'InstrumentationScope pgadmin'; } || {
      retry "pgadmin logs not found"
      exit 1
    }
    { contains "${logs}" 'InstrumentationScope gunicorn.access'; } || {
      retry "gunicorn logs not found"
      exit 1
    }
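The six checks above stop at the first missing source. As an alternative shape, and not what the PR does, the same scope names could be walked in a loop that reports every missing source in one pass. The sample log text and the `missing` variable are illustrative.

```shell
#!/usr/bin/env bash
# Same helper as the test script: succeeds when $2 is a substring of $1.
contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

# Hypothetical standalone-collector output with two sources still silent.
logs='InstrumentationScope patroni
InstrumentationScope postgres
InstrumentationScope pgbouncer
InstrumentationScope pgadmin'

# Collect the names of all scopes that have not appeared yet.
missing=""
for scope in patroni pgbackrest postgres pgbouncer pgadmin gunicorn.access; do
  contains "${logs}" "InstrumentationScope ${scope}" || missing="${missing:+${missing} }${scope}"
done
echo "missing scopes: ${missing:-none}"
```

Reporting the full list in a single message can shorten debugging when several sidecars are slow to export at once.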
29 changes: 29 additions & 0 deletions testing/kuttl/e2e/otel-logging-and-metrics/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Test OTel Logging and Metrics

## Assumptions

This test assumes that the operator has both the `OpenTelemetryLogs` and `OpenTelemetryMetrics` feature gates enabled and that the operator version is 5.8 or later.
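
As a rough pre-flight check, and assuming the gates are supplied through the `PGO_FEATURE_GATES` environment variable as in the CI workflow for this PR, the comma-separated value could be inspected like this. The `gate_enabled` helper and the `gates_ok` variable are hypothetical and not part of the operator.

```shell
#!/usr/bin/env bash
# Succeeds when gate $2 is set to "true" in the comma-separated list $1.
# Wrapping both sides in commas avoids matching a gate whose name merely
# ends with $2.
gate_enabled() {
  case ",$1," in
    *",$2=true,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Value taken from the CI workflow change in this PR.
PGO_FEATURE_GATES='TablespaceVolumes=true,OpenTelemetryLogs=true,OpenTelemetryMetrics=true'

gates_ok=yes
for gate in OpenTelemetryLogs OpenTelemetryMetrics; do
  gate_enabled "${PGO_FEATURE_GATES}" "${gate}" || gates_ok=no
done
echo "OTel feature gates enabled: ${gates_ok}"
```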

## Process

1. Create a basic cluster with pgbouncer and pgadmin in place.
    1. Ensure cluster comes up, that all containers are running and ready, and that the initial backup is complete.
2. Add the `instrumentation` spec to both PostgresCluster and PGAdmin manifests.
    1. Ensure that OTel collector containers and `crunchy-otel-collector` labels are added to the four pods (postgres instance, repo-host, pgbouncer, & pgadmin) and that the collector containers are running and ready.
    2. Assert that the instance pod collector is getting postgres and patroni metrics and postgres, patroni, and pgbackrest logs.
    3. Assert that the pgbouncer pod collector is getting pgbouncer metrics and logs.
    4. Assert that the pgAdmin pod collector is getting pgAdmin and gunicorn logs.
    5. Assert that the repo-host pod collector is NOT getting pgbackrest logs. We do not expect logs yet as the initial backup completed and created a log file; however, we configure the collector to only ingest new logs after it has started up.
    6. Create a manual backup and ensure that it completes successfully.
    7. Ensure that the repo-host pod collector is now getting pgbackrest logs.
3. Add both "add" and "remove" custom queries to the PostgresCluster `instrumentation` spec and create a ConfigMap that holds the custom queries to add.
    1. Ensure that the ConfigMap is created.
    2. Assert that the metrics that were removed (which we checked for earlier) are in fact no longer present in the collector metrics.
    3. Assert that the custom metrics that were added are present in the collector metrics.
4. Add an `otlp` exporter to both PostgresCluster and PGAdmin `instrumentation` specs and create a standalone OTel collector to receive data from our sidecar collectors.
    1. Ensure that the ConfigMap, Service, and Deployment for the standalone OTel collector come up and that the collector container is running and ready.
    2. Assert that the standalone collector is receiving logs from all of our components (i.e. the standalone collector is getting logs for postgres, patroni, pgbackrest, pgbouncer, pgadmin, and gunicorn).

### NOTES

This test could flake if a component happens not to produce any logs. If we start to see this happen, we could either add test steps that perform actions that should trigger logs, or turn up the log levels (although the latter option could create more problems, as we have seen issues with the collector when the stream of logs is too voluminous).
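
If such a flake does show up, a first diagnostic step could be to list which scopes a collector has actually emitted, which points at the quiet component. The sketch below assumes the collector output contains lines of the form `InstrumentationScope <name>`, as the `grep` patterns in the assert scripts suggest; the `list_scopes` helper and the sample text are hypothetical.

```shell
#!/usr/bin/env bash
# Prints each distinct scope name found in the collector output, one per line.
# awk splits on whitespace, so indented lines are handled as well.
list_scopes() {
  printf '%s\n' "$1" | awk '$1 == "InstrumentationScope" { print $2 }' | sort -u
}

# Hypothetical excerpt of collector output.
logs='ResourceLog #0
InstrumentationScope postgres
LogRecord #0
InstrumentationScope patroni
InstrumentationScope postgres'

scopes=$(list_scopes "${logs}")
printf 'scopes seen:\n%s\n' "${scopes}"
```

In a real run the input would come from `kubectl logs` on the relevant collector container rather than from a literal string.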