Skip to content

Forward port external query files from stable-2.6#16312

Open
fosterseth wants to merge 13 commits intoansible:develfrom
fosterseth:forward-port/external-query-files
Open

Forward port external query files from stable-2.6#16312
fosterseth wants to merge 13 commits intoansible:develfrom
fosterseth:forward-port/external-query-files

Conversation

@fosterseth
Copy link
Member

@fosterseth fosterseth commented Feb 27, 2026

Summary

Forward port of all external query file commits from tower stable-2.6 to AWX devel.

Reverts the out-of-order #16309 and re-applies all commits in correct order.

Cherry-picked commits

Tower PR Jira Description
#7208 AAP-58441 Add runtime integration for external query collection
#7223 AAP-58451 Add callback plugin discovery for external query files
#7254 AAP-58452 Add version fallback for external query files
#7283 AAP-58456 Unit test suite for external query handling
#7278 AAP-58470 Integration tests (core) for external queries
#7336 Fix: remove unused EXTERNAL_QUERY_PATH constant
#7337 Fix: avoid global sys.modules mutation
#7338 Fix: add queries_dir guard
#7345 Fix: attempt exact query file match before Version parsing

Additional fixes

  • Mock flag_enabled in unit tests to avoid DB access from BaseTask.build_private_data_files

Supersedes #16308 and #16309.


Note

Medium Risk
Changes job execution setup (private data dir contents and ANSIBLE_COLLECTIONS_PATH) when FEATURE_INDIRECT_NODE_COUNTING_ENABLED is on, which could affect ansible collection resolution and job runtime behavior. Risk is mitigated by being feature-flagged and backed by new unit/integration tests.

Overview
When FEATURE_INDIRECT_NODE_COUNTING_ENABLED is enabled, AWX now copies /var/lib/awx/vendor_collections into the job private_data_dir and prepends it to ANSIBLE_COLLECTIONS_PATH, making vendor-provided external query files available inside Execution Environments.

The indirect_instance_count callback plugin is updated to prefer embedded event_query.yml, otherwise discover external query files from the vendor collection and fall back to the nearest compatible (same-major, <= installed) version, with improved guarding around missing vendor collections.

Adds substantial unit coverage for external query discovery/fallback and vendor collection copying, plus live integration tests and test collections/projects that validate precedence, fallback behavior, logging, and no-cross-major fallback.

Written by Cursor Bugbot for commit da723ac. This will update automatically on new commits. Configure here.

Summary by CodeRabbit

  • New Features

    • External query plugins from vendor collections are now discoverable and available during job execution.
    • Jobs expose vendor collection paths to Ansible so external collection content can be used.
    • Implemented version-aware fallback for external queries, choosing the closest compatible version when exact matches are missing.
  • Tests

    • Added extensive unit and integration tests covering discovery, fallback behavior, counting rules, and related scenarios.

@coderabbitai
Copy link

coderabbitai bot commented Feb 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds external-query discovery with semantic-version fallback and embedded-query precedence for indirect node counting; exposes vendor_collections to Ansible by copying and prepending vendor paths when the feature flag is enabled; and adds test collections, playbooks, and extensive unit and live tests.

Changes

Cohort / File(s) Summary
Job env & vendor collections
awx/main/tasks/jobs.py
When the indirect-counting feature flag is enabled, copy /var/lib/awx/vendor_collections into private_data_dir/vendor_collections during private data assembly and prepend that path to ANSIBLE_COLLECTIONS_PATH; log successes/warnings.
Callback & query resolution
awx/playbooks/library/indirect_instance_count.py
Added _get_query_file_dir(); updated list_external_queries and find_external_query_with_fallback to use filesystem lookup and semantic-version fallback; v2_playbook_on_stats now prefers embedded event_query.yml and logs fallback/external usage; removed display param from fallback function.
Collection listing helper
awx/playbooks/library/indirect_instance_count.py
Added list_collections(artifacts_manager=None) decorated with collection artifacts manager to enumerate installed collections for callback logic.
Test collections & modules
awx/main/tests/data/collections/host_query_external_v1_0_0/..., .../host_query_external_v1_5_0/..., .../host_query_external_v3_0_0/...
Added demo/test collections (v1.0.0, v1.5.0, v3.0.0) with galaxy.yml manifests and plugins/modules/example.py example modules used by tests.
Test project requirements & playbooks
awx/main/tests/data/projects/test_host_query_external_v*/collections/requirements.yml, .../run_task.yml
Added requirements pointing to local file:// collection repositories and simple playbooks invoking the demo module, registering and printing results.
Live test infra
awx/main/tests/live/tests/conftest.py
Added COLL_DATA constant and copy of collection test data into live test temp folders via _copy_folders in fixture.
Integration tests (live)
awx/main/tests/live/tests/test_indirect_counting_external_queries.py
New end-to-end tests covering exact-match, embedded precedence, version fallback behavior, fallback counting correctness, fallback logging, and no-compatibility scenarios; includes fixtures to set up vendor_collections and helpers for running/cleaning jobs.
Unit tests (discovery & tasks)
awx/main/tests/unit/test_indirect_query_discovery.py, awx/main/tests/unit/test_tasks.py
New unit tests for external-query discovery, version parsing/fallback selection, embedded-vs-external precedence, fallback logging, and vendor copying into private_data_dir; added autouse fixtures patching flag_enabled in two test classes.
Misc test data
awx/main/tests/data/...
Added various test data files (collection manifests, module examples, requirements, playbooks) used by unit and live tests.

Sequence Diagram(s)

sequenceDiagram
    actor Job as Job Execution
    participant Env as Env Builder
    participant FS as Vendor Collections FS
    participant Ansible as Ansible Runtime
    participant Callback as CallbackModule
    participant Resolver as Query Resolver

    Job->>Env: build_env (feature flag enabled)
    Env->>FS: copy /var/lib/awx/vendor_collections -> private_data_dir/vendor_collections
    Env->>Env: prepend vendor path to ANSIBLE_COLLECTIONS_PATH
    Env->>Ansible: export updated env vars
    Ansible->>Ansible: run playbook/tasks
    Ansible->>Callback: v2_playbook_on_stats
    Callback->>Resolver: request query for collection (namespace:name, installed_version)
    Resolver->>FS: resolve embedded event_query.yml in collection artifacts
    alt embedded present
        Resolver-->>Callback: return embedded query (fallback_used=false)
    else embedded missing
        Resolver->>FS: list external_queries files
        Resolver->>Resolver: select exact or nearest lower same-major fallback
        Resolver-->>Callback: return external query + fallback_used flag + used_version
    end
    Callback->>Callback: process query and produce counts
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.93% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: forward-porting external query files from stable-2.6 branch.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
awx/playbooks/library/indirect_instance_count.py (1)

65-83: ⚠️ Potential issue | 🟠 Major

Add is_dir() guard before calling iterdir() on the external queries directory.

At line 73, queries_dir.iterdir() lacks a safety check. If the collection package exists but the extensions/audit/external_queries directory is missing from the distribution, iterdir() will raise FileNotFoundError and break callback execution. The files() API constructs Traversable objects for paths even if they don't exist; you must check with is_dir() before iterating.

Suggested fix
     try:
         queries_dir = files(EXTERNAL_QUERY_COLLECTION) / 'extensions' / 'audit' / 'external_queries'
     except ModuleNotFoundError:
         return versions
 
+    if not queries_dir.is_dir():
+        return versions
+
     for query_file in queries_dir.iterdir():
         match = pattern.match(query_file.name)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/playbooks/library/indirect_instance_count.py` around lines 65 - 83, The
code calls queries_dir.iterdir() without checking existence; change the logic
that builds queries_dir (from files(EXTERNAL_QUERY_COLLECTION) / 'extensions' /
'audit' / 'external_queries') to first check queries_dir.is_dir() and if it is
not a directory return the current versions (or skip iteration), then only call
queries_dir.iterdir() when is_dir() is True so FileNotFoundError is avoided;
update the block handling EXTERNAL_QUERY_COLLECTION, queries_dir, and the for
loop accordingly.
🧹 Nitpick comments (1)
awx/main/tests/live/tests/test_indirect_counting_external_queries.py (1)

134-156: Minor: Unused timeout parameter in positive case.

The timeout parameter is only used when expect_records=False (line 154). The positive case has a hardcoded 5-second timeout (20 iterations × 0.25s). Consider using the parameter consistently for both branches.

♻️ Suggested improvement
 def wait_for_indirect_processing(job, expect_records=True, timeout=5):
     ...
     if expect_records:
         # Poll for audit records to appear
-        for _ in range(20):
+        start = time.time()
+        while time.time() - start < timeout:
             if IndirectManagedNodeAudit.objects.filter(job=job).exists():
                 break
             time.sleep(0.25)
         else:
             raise RuntimeError(f'No IndirectManagedNodeAudit records populated for job_id={job.id}')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/tests/live/tests/test_indirect_counting_external_queries.py` around
lines 134 - 156, The positive branch of wait_for_indirect_processing uses a
hardcoded 20×0.25s loop instead of the provided timeout param; update the
function to honor the timeout argument when polling for IndirectManagedNodeAudit
records (e.g., compute iterations from timeout/interval or use a deadline-based
while loop) so the same timeout controls both expect_records=True and False
branches; keep the call to save_indirect_host_entries.delay(job.id,
wait_for_events=False) and the final RuntimeError if no records appear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@awx/main/tests/data/projects/test_host_query_external_v1_0_0/run_task.yml`:
- Around line 6-7: The YAML shows `register: result` placed at the module level
under the task `demo.external.example`, causing a parsing error; move `register:
result` to be a child of the task (indent it under the task entry so it is part
of the task dict for `demo.external.example`) so the `register` directive is
associated with that task rather than the module-level key.

In `@awx/main/tests/unit/test_indirect_query_discovery.py`:
- Around line 49-60: The test currently mutates sys.modules directly (e.g.,
assignments like sys.modules['ansible'] = mock.MagicMock(),
sys.modules['ansible.plugins.callback'] = _mock_callback_module, and setting
__version__ on sys.modules['ansible.release']), which affects global import
state; replace these assignments with a pytest fixture that uses
mock.patch.dict(sys.modules, {...}) to temporarily inject the same keys and
values (including the _mock_callback_module and the mock with
__version__='2.16.0') and yield so the mocks are scoped to each test (or module)
instead of persisting for the whole test process, then add that fixture to tests
in test_indirect_query_discovery.py that need the fake ansible modules.

---

Outside diff comments:
In `@awx/playbooks/library/indirect_instance_count.py`:
- Around line 65-83: The code calls queries_dir.iterdir() without checking
existence; change the logic that builds queries_dir (from
files(EXTERNAL_QUERY_COLLECTION) / 'extensions' / 'audit' / 'external_queries')
to first check queries_dir.is_dir() and if it is not a directory return the
current versions (or skip iteration), then only call queries_dir.iterdir() when
is_dir() is True so FileNotFoundError is avoided; update the block handling
EXTERNAL_QUERY_COLLECTION, queries_dir, and the for loop accordingly.

---

Nitpick comments:
In `@awx/main/tests/live/tests/test_indirect_counting_external_queries.py`:
- Around line 134-156: The positive branch of wait_for_indirect_processing uses
a hardcoded 20×0.25s loop instead of the provided timeout param; update the
function to honor the timeout argument when polling for IndirectManagedNodeAudit
records (e.g., compute iterations from timeout/interval or use a deadline-based
while loop) so the same timeout controls both expect_records=True and False
branches; keep the call to save_indirect_host_entries.delay(job.id,
wait_for_events=False) and the final RuntimeError if no records appear.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0f2692b and 51e06b5.

📒 Files selected for processing (17)
  • awx/main/tasks/jobs.py
  • awx/main/tests/data/collections/host_query_external_v1_0_0/galaxy.yml
  • awx/main/tests/data/collections/host_query_external_v1_0_0/plugins/modules/example.py
  • awx/main/tests/data/collections/host_query_external_v1_5_0/galaxy.yml
  • awx/main/tests/data/collections/host_query_external_v1_5_0/plugins/modules/example.py
  • awx/main/tests/data/collections/host_query_external_v3_0_0/galaxy.yml
  • awx/main/tests/data/collections/host_query_external_v3_0_0/plugins/modules/example.py
  • awx/main/tests/data/projects/test_host_query_external_v1_0_0/collections/requirements.yml
  • awx/main/tests/data/projects/test_host_query_external_v1_0_0/run_task.yml
  • awx/main/tests/data/projects/test_host_query_external_v1_5_0/collections/requirements.yml
  • awx/main/tests/data/projects/test_host_query_external_v1_5_0/run_task.yml
  • awx/main/tests/data/projects/test_host_query_external_v3_0_0/collections/requirements.yml
  • awx/main/tests/data/projects/test_host_query_external_v3_0_0/run_task.yml
  • awx/main/tests/live/tests/conftest.py
  • awx/main/tests/live/tests/test_indirect_counting_external_queries.py
  • awx/main/tests/unit/test_indirect_query_discovery.py
  • awx/playbooks/library/indirect_instance_count.py

@djulich
Copy link
Contributor

djulich commented Feb 28, 2026

@fosterseth, I forgot to mention that this feature is generally targeted at downstream aap, it is not supposed to be available in awx (@djyasin, can you confirm?) In that case we should move your PR to tower, right?

@tvo318
Copy link
Member

tvo318 commented Mar 2, 2026

@djulich - my understanding is 2.7 needs to be done in devel on awx... Anything already released needs to be opened in tower.

@djulich
Copy link
Contributor

djulich commented Mar 3, 2026

@fosterseth, the following PRs contain fixes for some of the PRs you used as a basis for this PR:

https://github.com/ansible/tower/pull/7336
https://github.com/ansible/tower/pull/7337
https://github.com/ansible/tower/pull/7338
https://github.com/ansible/tower/pull/7345

Can we incorporate these into this PR as well? Or should we forward port them seperately after this one here has been merged?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
awx/playbooks/library/indirect_instance_count.py (1)

178-194: ⚠️ Potential issue | 🟠 Major

Handle ModuleNotFoundError when accessing embedded query file.

files() can raise ModuleNotFoundError if a collection exists on disk but isn't importable as a Python package (e.g., malformed structure). Unlike _get_query_file_dir() which handles this exception, line 179 does not—causing the entire callback to crash and preventing ansible_data.json from being written for all collections.

🛡️ Proposed fix to wrap the embedded query check
             # 1. Check for embedded query file (takes precedence)
+            try:
+                embedded_query_file = files(f'ansible_collections.{candidate.namespace}.{candidate.name}') / 'extensions' / 'audit' / 'event_query.yml'
+            except ModuleNotFoundError:
+                embedded_query_file = None
+
+            if embedded_query_file is not None and embedded_query_file.exists():
-            embedded_query_file = files(f'ansible_collections.{candidate.namespace}.{candidate.name}') / 'extensions' / 'audit' / 'event_query.yml'
-            if embedded_query_file.exists():
                 with embedded_query_file.open('r') as f:
                     collection_print['host_query'] = f.read()
                 self._display.vv(f"Using embedded query for {candidate.fqcn} v{candidate.ver}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/playbooks/library/indirect_instance_count.py` around lines 178 - 194,
Wrap the embedded collection lookup that calls files(...) and subsequent
embedded_query_file access in a try/except that catches ModuleNotFoundError; on
catching it, log a verbose message via self._display (e.g., vv or v) noting the
collection is not importable and then proceed to the existing external-query
path (i.e., call find_external_query_with_fallback with candidate.namespace,
candidate.name, candidate.ver and populate collection_print['host_query'] as
before). Ensure the exception handling surrounds the creation of
embedded_query_file and its .exists()/open() usage so the callback continues and
ansible_data.json still gets written for other collections.
🧹 Nitpick comments (2)
awx/main/tests/unit/test_indirect_query_discovery.py (2)

15-25: Harden the chainable path mock to validate path segments.

Line 23 only checks traversal depth, so incorrect segment names would still pass. That weakens these tests against regressions in query path construction.

Proposed refactor
-def create_chainable_path_mock(final_mock, depth=3):
-    """Mock that supports chained / operations: mock / 'a' / 'b' / 'c' -> final_mock"""
+def create_chainable_path_mock(final_mock, expected_parts):
+    """Mock that supports chained / operations and validates each segment."""

     class ChainableMock:
-        def __init__(self, d=0):
-            self.d = d
+        def __init__(self, idx=0):
+            self.idx = idx

         def __truediv__(self, other):
-            return final_mock if self.d >= depth - 1 else ChainableMock(self.d + 1)
+            assert self.idx < len(expected_parts)
+            assert other == expected_parts[self.idx]
+            return final_mock if self.idx >= len(expected_parts) - 1 else ChainableMock(self.idx + 1)

     return ChainableMock()

Example call sites:

mock_files.return_value = create_chainable_path_mock(
    mock_queries_dir,
    expected_parts=('extensions', 'audit', 'queries'),
)

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/tests/unit/test_indirect_query_discovery.py` around lines 15 - 25,
The create_chainable_path_mock currently only enforces traversal depth; change
it to accept an optional expected_parts tuple and have ChainableMock.__truediv__
validate each segment name against expected_parts[self.d] as it traverses,
raising an AssertionError (or ValueError) if a segment mismatches, and still
returning final_mock when depth is reached; update defaults so behavior is
unchanged when expected_parts is None and preserve depth and final_mock
semantics.

414-436: Tighten missing-source assertions to avoid false positives.

mock_exists.return_value = False affects every path check, and assert result is not None is a weak proxy for “handled gracefully.” This can pass even if the wrong path is being checked.

Proposed test tightening
 def test_missing_source_handled_gracefully(self, mock_exists, mock_copytree, mock_logger, mock_flag):
@@
-    mock_exists.return_value = False
+    mock_exists.side_effect = lambda p: p != '/var/lib/awx/vendor_collections'
@@
-    result = task.build_private_data_files(task.instance, private_data_dir)
+    task.build_private_data_files(task.instance, private_data_dir)

     # copytree should not be called when source doesn't exist
     mock_copytree.assert_not_called()
-    # Function should complete without raising an exception
-    assert result is not None
+    mock_logger.warning.assert_called()

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/tests/unit/test_indirect_query_discovery.py` around lines 414 - 436,
The test_missing_source_handled_gracefully currently uses
mock_exists.return_value = False which can mask which path was checked; update
it to assert the specific source path was queried and only that call returned
False, e.g., set mock_exists.side_effect or inspect mock_exists.call_args_list
to confirm os.path.exists was called for the expected private-data collection
source path used by BaseTask.build_private_data_files, then keep
mock_copytree.assert_not_called() and also assert mock_logger.error or
mock_logger.exception was not called (or that a specific logger.warning was
emitted) to prove the missing-source was handled gracefully; reference
BaseTask.build_private_data_files, mock_exists, mock_copytree and mock_logger to
locate and tighten the assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@awx/main/tests/unit/test_indirect_query_discovery.py`:
- Around line 130-147: Add a regression test to cover non-PEP440 exact-version
filenames: extend the existing test_exact_match_preferred to call
find_external_query_with_fallback with a non-PEP440 version string (e.g.
'v2.5.0' or '2.5.0+build') and set up mock_exact_file (via
create_queries_dir_mock and create_chainable_path_mock) to return an existing
file whose open() yields the expected content; assert content, fallback_used is
False, and version equals the non-PEP440 string so exact-file lookup is
preserved for non-PEP440 versions.

---

Outside diff comments:
In `@awx/playbooks/library/indirect_instance_count.py`:
- Around line 178-194: Wrap the embedded collection lookup that calls files(...)
and subsequent embedded_query_file access in a try/except that catches
ModuleNotFoundError; on catching it, log a verbose message via self._display
(e.g., vv or v) noting the collection is not importable and then proceed to the
existing external-query path (i.e., call find_external_query_with_fallback with
candidate.namespace, candidate.name, candidate.ver and populate
collection_print['host_query'] as before). Ensure the exception handling
surrounds the creation of embedded_query_file and its .exists()/open() usage so
the callback continues and ansible_data.json still gets written for other
collections.

---

Nitpick comments:
In `@awx/main/tests/unit/test_indirect_query_discovery.py`:
- Around line 15-25: The create_chainable_path_mock currently only enforces
traversal depth; change it to accept an optional expected_parts tuple and have
ChainableMock.__truediv__ validate each segment name against
expected_parts[self.d] as it traverses, raising an AssertionError (or
ValueError) if a segment mismatches, and still returning final_mock when depth
is reached; update defaults so behavior is unchanged when expected_parts is None
and preserve depth and final_mock semantics.
- Around line 414-436: The test_missing_source_handled_gracefully currently uses
mock_exists.return_value = False which can mask which path was checked; update
it to assert the specific source path was queried and only that call returned
False, e.g., set mock_exists.side_effect or inspect mock_exists.call_args_list
to confirm os.path.exists was called for the expected private-data collection
source path used by BaseTask.build_private_data_files, then keep
mock_copytree.assert_not_called() and also assert mock_logger.error or
mock_logger.exception was not called (or that a specific logger.warning was
emitted) to prove the missing-source was handled gracefully; reference
BaseTask.build_private_data_files, mock_exists, mock_copytree and mock_logger to
locate and tighten the assertions.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 000ecde and af057f2.

📒 Files selected for processing (2)
  • awx/main/tests/unit/test_indirect_query_discovery.py
  • awx/playbooks/library/indirect_instance_count.py

Comment on lines +130 to +147
def test_exact_match_preferred(self, mock_files):
"""AC7.4: Exact version match is preferred over fallback version."""
from awx.playbooks.library.indirect_instance_count import find_external_query_with_fallback

mock_exact_file = mock.Mock()
mock_exact_file.exists.return_value = True
mock_exact_file.open.return_value.__enter__ = mock.Mock(return_value=StringIO('exact_version_query'))
mock_exact_file.open.return_value.__exit__ = mock.Mock(return_value=False)

queries_dir = create_queries_dir_mock(lambda f: mock_exact_file)
mock_files.return_value = create_chainable_path_mock(queries_dir)

content, fallback_used, version = find_external_query_with_fallback('demo', 'external', '2.5.0')

assert content == 'exact_version_query'
assert fallback_used is False
assert version == '2.5.0'

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add a regression test for non-PEP440 exact-version filenames.

This exact-match test only uses 2.5.0 (PEP440-valid). The forward-ported fix explicitly preserves exact-file lookup for non-PEP440 version strings; that case should be covered to prevent regressions.

Proposed additional test
 `@mock.patch`('awx.playbooks.library.indirect_instance_count.files')
 def test_exact_match_preferred(self, mock_files):
@@
     assert fallback_used is False
     assert version == '2.5.0'
+
+@mock.patch('awx.playbooks.library.indirect_instance_count.files')
+def test_exact_match_with_non_pep440_version(self, mock_files):
+    from awx.playbooks.library.indirect_instance_count import find_external_query_with_fallback
+
+    mock_exact_file = mock.Mock()
+    mock_exact_file.exists.return_value = True
+    mock_exact_file.open.return_value.__enter__ = mock.Mock(return_value=StringIO('non_pep440_exact'))
+    mock_exact_file.open.return_value.__exit__ = mock.Mock(return_value=False)
+
+    queries_dir = create_queries_dir_mock(lambda _filename: mock_exact_file)
+    mock_files.return_value = create_chainable_path_mock(queries_dir, ('extensions', 'audit', 'queries'))
+
+    content, fallback_used, version = find_external_query_with_fallback('demo', 'external', '7.4.z')
+
+    assert content == 'non_pep440_exact'
+    assert fallback_used is False
+    assert version == '7.4.z'

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 139-139: Unused lambda argument: f

(ARG005)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/tests/unit/test_indirect_query_discovery.py` around lines 130 - 147,
Add a regression test to cover non-PEP440 exact-version filenames: extend the
existing test_exact_match_preferred to call find_external_query_with_fallback
with a non-PEP440 version string (e.g. 'v2.5.0' or '2.5.0+build') and set up
mock_exact_file (via create_queries_dir_mock and create_chainable_path_mock) to
return an existing file whose open() yields the expected content; assert
content, fallback_used is False, and version equals the non-PEP440 string so
exact-file lookup is preserved for non-PEP440 versions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
awx/main/tests/unit/test_indirect_query_discovery.py (1)

133-150: ⚠️ Potential issue | 🟠 Major

Add a regression test for exact-match lookup with non-PEP440 versions.

test_exact_match_preferred only validates a PEP440 version (2.5.0). The forward-ported behavior specifically protects exact filename matching before semantic version parsing; without a non-PEP440 case, that fix can regress silently.

Proposed test addition
 `@mock.patch`('awx.playbooks.library.indirect_instance_count.files')
 def test_exact_match_preferred(self, mock_files):
@@
     assert content == 'exact_version_query'
     assert fallback_used is False
     assert version == '2.5.0'
+
+@mock.patch('awx.playbooks.library.indirect_instance_count.files')
+def test_exact_match_with_non_pep440_version(self, mock_files):
+    from awx.playbooks.library.indirect_instance_count import find_external_query_with_fallback
+
+    mock_exact_file = mock.Mock()
+    mock_exact_file.exists.return_value = True
+    mock_exact_file.open.return_value.__enter__ = mock.Mock(return_value=StringIO('non_pep440_exact_query'))
+    mock_exact_file.open.return_value.__exit__ = mock.Mock(return_value=False)
+
+    queries_dir = create_queries_dir_mock(lambda _filename: mock_exact_file)
+    mock_files.return_value = create_chainable_path_mock(queries_dir)
+
+    content, fallback_used, version = find_external_query_with_fallback('demo', 'external', '7.4.z')
+
+    assert content == 'non_pep440_exact_query'
+    assert fallback_used is False
+    assert version == '7.4.z'

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/tests/unit/test_indirect_query_discovery.py` around lines 133 - 150,
Add a regression test mirroring test_exact_match_preferred that uses a
non-PEP440 version string (e.g., 'v2.5' or '2.5-fix') to ensure
find_external_query_with_fallback still prefers an exact filename match before
any semantic version parsing; reuse the same mocks (mock_exact_file,
create_queries_dir_mock, mock_files) and assert content equals the exact file
contents, fallback_used is False, and version equals the non-PEP440 string
passed in, referencing the find_external_query_with_fallback function and
test_exact_match_preferred as the locations to model the new test after.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@awx/main/tests/unit/test_indirect_query_discovery.py`:
- Around line 133-150: Add a regression test mirroring
test_exact_match_preferred that uses a non-PEP440 version string (e.g., 'v2.5'
or '2.5-fix') to ensure find_external_query_with_fallback still prefers an exact
filename match before any semantic version parsing; reuse the same mocks
(mock_exact_file, create_queries_dir_mock, mock_files) and assert content equals
the exact file contents, fallback_used is False, and version equals the
non-PEP440 string passed in, referencing the find_external_query_with_fallback
function and test_exact_match_preferred as the locations to model the new test
after.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between af057f2 and 35e75a3.

📒 Files selected for processing (1)
  • awx/main/tests/unit/test_indirect_query_discovery.py

shutil.copytree(vendor_src, vendor_dest)
logger.debug(f"Copied vendor collections from {vendor_src} to {vendor_dest}")
except Exception as e:
logger.warning(f"Failed to copy vendor collections: {e}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vendor collection copy runs for all task types unnecessarily

Low Severity

The vendor collection shutil.copytree is placed in BaseTask.build_private_data_files, which runs for every task type (RunJob, RunProjectUpdate, RunInventoryUpdate, RunAdHocCommand, RunSystemJob). However, only RunJob.build_env adds the vendor collections path to ANSIBLE_COLLECTIONS_PATH — the other task types never reference the copied data. This means every project sync, inventory update, and ad-hoc command performs a needless recursive directory copy when the feature flag is enabled.

Additional Locations (1)

Fix in Cursor Fix in Web

fosterseth and others added 12 commits March 3, 2026 18:06
…ible#7208)

Extend build_private_data_files() to copy vendor collections from
/var/lib/awx/vendor_collections/ to the job's private_data_dir,
making external query files available to the indirect node counting
callback plugin in execution environments.

Changes:
- Copy vendor_collections to private_data_dir during job preparation
- Add vendor_collections path to ANSIBLE_COLLECTIONS_PATH in build_env()
- Gracefully handle missing source directory with warning log
- Feature gated by FEATURE_INDIRECT_NODE_COUNTING_ENABLED flag

This enables external query file discovery for indirect node counting
across all deployment types (RPM, Podman, OpenShift, Kubernetes) using
the existing private_data_dir mechanism.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…ery files (ansible#7223)

* AAP-58451: Add callback plugin discovery for external query files

Extend the indirect_instance_count callback plugin to discover and load
external query files from the bundled redhat.indirect_accounting collection
when embedded queries are not present in the target collection.

Changes:
- Add external query discovery with precedence (embedded queries first)
- External query path: redhat.indirect_accounting/extensions/audit/
  external_queries/{namespace}.{name}.{version}.yml
- Use self._display.v() for external query messages (visible with -v)
- Use self._display.vv() for embedded query messages (visible with -vv)
- Fix: Change .exists() to .is_file() per Traversable ABC
- Handle missing external query collection gracefully (ModuleNotFoundError)

Note: This implements exact version match only. Version fallback logic
is covered in AAP-58452.

* fix CI error when using Traversable.is_file

* Add minimal implementation for AAP-58451

* Fix formatting

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Add unit tests for external query handling

* Refactor unit tests for external query handling

* Refactor indirect node counting callback code to improve testing code

* Refactor unit tests for external query handling for improved callback code

* Fix test for majore version boundary check

* Fix weaknesses in some unit tests

* Make callback plugin module self contained, independent from awx
* Add collection for testing external queries

* Add query files for testing external query file runtime integration

* Add live tests for external query file runtime integration

* Remove redundant wait for events and refactor test data folders
The AAP-58441 cherry-pick added a flag_enabled() call in
BaseTask.build_private_data_files(), which is called by all task types.
Tests for RunInventoryUpdate and RunJob credentials now hit this code
path and need the flag mocked to avoid database access in unit tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The constant was defined but never referenced — the path is constructed
inline via Traversable's `/` operator which requires individual segments,
not a slash-separated string.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
)

The exact-version filename check does not require PEP440 parsing, but
Version() was called first, causing early return on non-PEP440 version
strings even when an exact file exists on disk. Move the exact file
check before Version parsing so fallback logic only parses when needed.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The _get_query_file_dir() guard from ansible#7338 calls is_dir() on the
queries directory, but QueriesDirMock didn't implement it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fosterseth fosterseth force-pushed the forward-port/external-query-files branch from 35e75a3 to da723ac Compare March 3, 2026 23:06
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

installed_version_object = Version(installed_version)
except InvalidVersion:
# Can't do version comparison for fallback
return None, False, None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant _get_query_file_dir call in fallback path

Low Severity

find_external_query_with_fallback calls _get_query_file_dir() at line 108, obtains queries_dir, and uses it for exact-match lookup. It then calls list_external_queries() at line 123, which internally calls _get_query_file_dir() again — performing a second files(EXTERNAL_QUERY_COLLECTION) import resolution and is_dir() filesystem check for the same directory. list_external_queries could accept an optional queries_dir parameter to avoid the redundant lookup.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
awx/main/tests/unit/test_indirect_query_discovery.py (1)

133-150: ⚠️ Potential issue | 🟠 Major

Add non-PEP440 exact-match regression coverage.

test_exact_match_preferred currently validates only 2.5.0. The exact-file-before-parse behavior can regress for non-PEP440 version strings (for example 7.4.z) without a dedicated test.

Proposed test addition
 `@mock.patch`('awx.playbooks.library.indirect_instance_count.files')
 def test_exact_match_preferred(self, mock_files):
@@
     assert content == 'exact_version_query'
     assert fallback_used is False
     assert version == '2.5.0'
+
+@mock.patch('awx.playbooks.library.indirect_instance_count.files')
+def test_exact_match_with_non_pep440_version(self, mock_files):
+    from awx.playbooks.library.indirect_instance_count import find_external_query_with_fallback
+
+    mock_exact_file = mock.Mock()
+    mock_exact_file.exists.return_value = True
+    mock_exact_file.open.return_value.__enter__ = mock.Mock(return_value=StringIO('non_pep440_exact'))
+    mock_exact_file.open.return_value.__exit__ = mock.Mock(return_value=False)
+
+    queries_dir = create_queries_dir_mock(lambda _filename: mock_exact_file)
+    mock_files.return_value = create_chainable_path_mock(queries_dir)
+
+    content, fallback_used, version = find_external_query_with_fallback('demo', 'external', '7.4.z')
+
+    assert content == 'non_pep440_exact'
+    assert fallback_used is False
+    assert version == '7.4.z'

Verification (expect at least one non-PEP440-like literal in find_external_query_with_fallback(...) test calls):

#!/bin/bash
set -euo pipefail

python - <<'PY'
import ast, pathlib, re

path = pathlib.Path("awx/main/tests/unit/test_indirect_query_discovery.py")
tree = ast.parse(path.read_text())

calls = []
for node in ast.walk(tree):
    if isinstance(node, ast.Call):
        fn = node.func
        name = fn.id if isinstance(fn, ast.Name) else (fn.attr if isinstance(fn, ast.Attribute) else None)
        if name == "find_external_query_with_fallback" and len(node.args) >= 3:
            arg = node.args[2]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                v = arg.value
                non_pep440_like = bool(re.search(r"[A-Za-z+]", v))
                calls.append((node.lineno, v, non_pep440_like))

print("find_external_query_with_fallback version literals:")
for line, version, non_pep440_like in sorted(calls):
    marker = " <- non-PEP440-like" if non_pep440_like else ""
    print(f"Line {line}: {version}{marker}")
PY

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/tests/unit/test_indirect_query_discovery.py` around lines 133 - 150,
Add a regression test to ensure find_external_query_with_fallback prefers an
exact file for non-PEP440 version strings: create a new test (or extend
test_exact_match_preferred) that mirrors the mock setup using mock_exact_file
and create_queries_dir_mock but calls find_external_query_with_fallback with a
non-PEP440 literal (e.g. "7.4.z") as the version argument, then assert content
equals the exact file contents, fallback_used is False, and version equals the
literal; reference the existing test name test_exact_match_preferred and the
function find_external_query_with_fallback to locate where to add this case.
🧹 Nitpick comments (1)
awx/main/tasks/jobs.py (1)

380-384: Avoid blind exception handling in vendor collection copy path.

Catching Exception here can hide non-I/O bugs and makes failures hard to diagnose for a feature-gated path. Narrow this to filesystem errors and log traceback.

Suggested change
-                except Exception as e:
-                    logger.warning(f"Failed to copy vendor collections: {e}")
+                except OSError:
+                    logger.exception("Failed to copy vendor collections from %s to %s", vendor_src, vendor_dest)

As per coding guidelines, "-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/tasks/jobs.py` around lines 380 - 384, The current broad except
around shutil.copytree(vendor_src, vendor_dest) swallows non-filesystem errors;
narrow it to filesystem-related exceptions (e.g., catch OSError and
shutil.Error) and log the full traceback via logger.exception or
logger.warning(..., exc_info=True) so failures are diagnosable; update the
try/except block that wraps shutil.copytree and the logger call referencing
vendor_src and vendor_dest accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@awx/main/tests/live/tests/test_indirect_counting_external_queries.py`:
- Around line 46-47: The teardown currently removes the global
VENDOR_COLLECTIONS_BASE directory wholesale; change the teardown to only remove
paths created by the fixture under VENDOR_COLLECTIONS_BASE (e.g., specific
subdirectories or files tracked by the fixture) instead of deleting
VENDOR_COLLECTIONS_BASE itself. Locate the teardown logic that references
VENDOR_COLLECTIONS_BASE (in the test or fixture cleanup between lines ~62-108)
and modify it to iterate over and remove only the fixture-owned subpaths you
created (or keep a list of created paths and remove those), leaving the global
VENDOR_COLLECTIONS_BASE root intact.

In `@awx/playbooks/library/indirect_instance_count.py`:
- Around line 179-183: The embedded query lookup using
files(f'ansible_collections.{candidate.namespace}.{candidate.name}') can raise
ModuleNotFoundError and abort the whole callback; wrap the files(...) call and
subsequent .exists()/.open() usage in a try/except that catches
ModuleNotFoundError (and optionally ImportError) and skips setting
collection_print['host_query'] for that candidate so the fallback query logic
remains reachable; keep references to embedded_query_file, candidate,
collection_print, and self._display.vv so the exception handler logs a verbose
message and continues processing other candidates.
- Around line 147-149: The code uses set() to combine C.COLLECTIONS_PATHS and
AnsibleCollectionConfig.collection_paths which loses insertion order and causes
nondeterministic iteration; change to an order-preserving merge (e.g., build
collections_search_paths by concatenating the two lists and removing duplicates
while preserving first-seen order using dict.fromkeys or an OrderedDict) so
precedence of C.COLLECTIONS_PATHS over AnsibleCollectionConfig.collection_paths
is preserved, then call
find_existing_collections(list(collections_search_paths), artifacts_manager,
dedupe=False) as before; update default_collections_path to be a list (or
preserve it only for display) and reference collections_search_paths in that
order-preserving form.

---

Duplicate comments:
In `@awx/main/tests/unit/test_indirect_query_discovery.py`:
- Around line 133-150: Add a regression test to ensure
find_external_query_with_fallback prefers an exact file for non-PEP440 version
strings: create a new test (or extend test_exact_match_preferred) that mirrors
the mock setup using mock_exact_file and create_queries_dir_mock but calls
find_external_query_with_fallback with a non-PEP440 literal (e.g. "7.4.z") as
the version argument, then assert content equals the exact file contents,
fallback_used is False, and version equals the literal; reference the existing
test name test_exact_match_preferred and the function
find_external_query_with_fallback to locate where to add this case.

---

Nitpick comments:
In `@awx/main/tasks/jobs.py`:
- Around line 380-384: The current broad except around
shutil.copytree(vendor_src, vendor_dest) swallows non-filesystem errors; narrow
it to filesystem-related exceptions (e.g., catch OSError and shutil.Error) and
log the full traceback via logger.exception or logger.warning(...,
exc_info=True) so failures are diagnosable; update the try/except block that
wraps shutil.copytree and the logger call referencing vendor_src and vendor_dest
accordingly.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 35e75a3 and da723ac.

📒 Files selected for processing (18)
  • awx/main/tasks/jobs.py
  • awx/main/tests/data/collections/host_query_external_v1_0_0/galaxy.yml
  • awx/main/tests/data/collections/host_query_external_v1_0_0/plugins/modules/example.py
  • awx/main/tests/data/collections/host_query_external_v1_5_0/galaxy.yml
  • awx/main/tests/data/collections/host_query_external_v1_5_0/plugins/modules/example.py
  • awx/main/tests/data/collections/host_query_external_v3_0_0/galaxy.yml
  • awx/main/tests/data/collections/host_query_external_v3_0_0/plugins/modules/example.py
  • awx/main/tests/data/projects/test_host_query_external_v1_0_0/collections/requirements.yml
  • awx/main/tests/data/projects/test_host_query_external_v1_0_0/run_task.yml
  • awx/main/tests/data/projects/test_host_query_external_v1_5_0/collections/requirements.yml
  • awx/main/tests/data/projects/test_host_query_external_v1_5_0/run_task.yml
  • awx/main/tests/data/projects/test_host_query_external_v3_0_0/collections/requirements.yml
  • awx/main/tests/data/projects/test_host_query_external_v3_0_0/run_task.yml
  • awx/main/tests/live/tests/conftest.py
  • awx/main/tests/live/tests/test_indirect_counting_external_queries.py
  • awx/main/tests/unit/test_indirect_query_discovery.py
  • awx/main/tests/unit/test_tasks.py
  • awx/playbooks/library/indirect_instance_count.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • awx/main/tests/data/projects/test_host_query_external_v1_0_0/run_task.yml
  • awx/main/tests/data/projects/test_host_query_external_v3_0_0/collections/requirements.yml
  • awx/main/tests/data/projects/test_host_query_external_v1_0_0/collections/requirements.yml
  • awx/main/tests/data/collections/host_query_external_v1_0_0/plugins/modules/example.py
  • awx/main/tests/data/projects/test_host_query_external_v1_5_0/collections/requirements.yml
  • awx/main/tests/live/tests/conftest.py
  • awx/main/tests/data/collections/host_query_external_v1_0_0/galaxy.yml
  • awx/main/tests/data/projects/test_host_query_external_v3_0_0/run_task.yml

Comment on lines +46 to +47
VENDOR_COLLECTIONS_BASE = '/var/lib/awx/vendor_collections'

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Limit teardown to fixture-owned paths; don’t delete the global root.

Line 107 currently removes /var/lib/awx/vendor_collections wholesale. That can erase unrelated collections/fixtures and create cross-test interference in shared environments.

Safer teardown scope
-    # Cleanup
-    shutil.rmtree(VENDOR_COLLECTIONS_BASE, ignore_errors=True)
+    # Cleanup only what this fixture created
+    shutil.rmtree(base, ignore_errors=True)

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 62-108

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/tests/live/tests/test_indirect_counting_external_queries.py` around
lines 46 - 47, The teardown currently removes the global VENDOR_COLLECTIONS_BASE
directory wholesale; change the teardown to only remove paths created by the
fixture under VENDOR_COLLECTIONS_BASE (e.g., specific subdirectories or files
tracked by the fixture) instead of deleting VENDOR_COLLECTIONS_BASE itself.
Locate the teardown logic that references VENDOR_COLLECTIONS_BASE (in the test
or fixture cleanup between lines ~62-108) and modify it to iterate over and
remove only the fixture-owned subpaths you created (or keep a list of created
paths and remove those), leaving the global VENDOR_COLLECTIONS_BASE root intact.

Comment on lines +147 to +149
default_collections_path = set(C.COLLECTIONS_PATHS)
collections_search_paths = default_collections_path | set(AnsibleCollectionConfig.collection_paths)
collections = list(find_existing_collections(list(collections_search_paths), artifacts_manager, dedupe=False))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /repo && git ls-files | grep -E 'indirect_instance_count|collections' | head -20

Repository: ansible/awx

Length of output: 110


🏁 Script executed:

cat -n awx/playbooks/library/indirect_instance_count.py | head -160

Repository: ansible/awx

Length of output: 6755


🏁 Script executed:

# Search for find_existing_collections function definition
rg "def find_existing_collections" --type py

Repository: ansible/awx

Length of output: 37


🏁 Script executed:

# Check imports and understand where find_existing_collections comes from
head -50 awx/playbooks/library/indirect_instance_count.py

Repository: ansible/awx

Length of output: 1630


🏁 Script executed:

rg "list_collections" awx/playbooks/library/indirect_instance_count.py

Repository: ansible/awx

Length of output: 145


🏁 Script executed:

# Search for other collection discovery patterns in the codebase
rg "find_existing_collections|COLLECTIONS_PATHS|collection_paths" awx/ --type py -A 3 -B 1 | head -100

Repository: ansible/awx

Length of output: 1110


🏁 Script executed:

# Check if there are tests for the list_collections function
fd "test.*indirect_instance_count|indirect_instance_count.*test" --type f

Repository: ansible/awx

Length of output: 37


🏁 Script executed:

# Search for documentation or comments about collection path ordering in AWX
rg "path.*order|dedupe|collection.*discovery" awx/ --type py -i | head -30

Repository: ansible/awx

Length of output: 317


🏁 Script executed:

# Find where list_collections is called and how it's used
rg "list_collections" awx/playbooks/library/indirect_instance_count.py -A 5 -B 2

Repository: ansible/awx

Length of output: 779


🏁 Script executed:

# Check if there's any precedence or priority logic based on path order
rg "collection.*priority|precedence|first.*collection|collection.*first" awx/playbooks/library/ --type py -i

Repository: ansible/awx

Length of output: 169


🏁 Script executed:

# Search the Ansible documentation or comments in the repo about collection path precedence
rg "collection.*path.*order|path.*collection.*precedence" --type py -i | head -20

Repository: ansible/awx

Length of output: 37


🏁 Script executed:

# Look for actual usage context in larger section around list_collections
cat -n awx/playbooks/library/indirect_instance_count.py | sed -n '143,200p'

Repository: ansible/awx

Length of output: 3151


🏁 Script executed:

# Look for any comments or documentation about collection discovery order
rg "dedupe|find_existing_collections" awx/ --type py -B 2 -A 2

Repository: ansible/awx

Length of output: 1049


🏁 Script executed:

# Search for examples of how C.COLLECTIONS_PATHS is used elsewhere in AWX
rg "C\.COLLECTIONS_PATHS|AnsibleCollectionConfig\.collection_paths" awx/ --type py -B 2 -A 2

Repository: ansible/awx

Length of output: 700


🏁 Script executed:

# Check if there's any handling for duplicate collections based on path precedence
rg "candidate\.fqcn|collections_print\[" awx/playbooks/library/indirect_instance_count.py -B 1 -A 1

Repository: ansible/awx

Length of output: 607


🏁 Script executed:

# Search online documentation or test cases about Ansible collection path precedence
web_search Ansible find_existing_collections path order precedence

Repository: ansible/awx

Length of output: 105


🌐 Web query:

Ansible find_existing_collections function path order precedence dedupe parameter

💡 Result:

  • Where find_existing_collections() lives / signature: ansible.galaxy.collection.find_existing_collections(path_filter, artifacts_manager, namespace_filter=None, collection_filter=None, dedupe=True); dedupe defaults to True. [1]

  • What path_filter actually does (and doesn’t do):

    • If path_filter is a scalar, it’s coerced to a list. Same for namespace_filter / collection_filter. [1]
    • The function scans all discovered collection directories and keeps only those whose path is under any of the path_filter roots (using Path.relative_to(...)). [1]
    • It collects matches into a set() of paths, which means the original “path order” is not preserved, so there is no guaranteed left-to-right precedence between multiple path_filter entries (or between multiple configured collection roots) once they’re in that set. [1]
  • dedupe parameter behavior (high level):

    • With dedupe=True (default), it tracks a seen set and will skip additional instances of the “same” collection (same namespace/name) after the first one encountered. [1]
    • Because iteration is over a set() of paths, which copy “wins” when deduping is not reliably tied to collection search-path precedence. [1]
  • How Ansible CLI works around this when it must show duplicates:

    • ansible-galaxy collection list explicitly calls find_existing_collections(..., dedupe=False) so it can show all instances across paths. [2]
  • Related precedence expectation / known pain point:

    • Docs for ansible-galaxy collection list -p say extra -p paths are added “to the beginning” of the configured search paths (i.e., they should have priority). [3]
    • There’s been a reported bug/behavior mismatch where collections in configured collections_paths did not take priority over site-packages/ansible_collections in some contexts. [4]

Sources: [1] [2] [3] [4]


🏁 Script executed:

# Check the logic: collections_print uses fqcn as key, so if there are duplicate fqcn entries,
# the last one would overwrite. Let's see if this is the actual behavior
rg "collections_print\[candidate\.fqcn\]" awx/playbooks/library/indirect_instance_count.py -B 15 -A 2

Repository: ansible/awx

Length of output: 1165


🏁 Script executed:

# Verify Python dict.fromkeys preserves order (should be 3.7+ standard)
python3 << 'EOF'
# Check that dict.fromkeys preserves insertion order
paths = ["/a", "/b", "/c", "/b", "/d"]
result = list(dict.fromkeys(paths))
print(f"Original: {paths}")
print(f"Deduplicated: {result}")
print(f"Order preserved: {result == ['/a', '/b', '/c', '/d']}")
EOF

Repository: ansible/awx

Length of output: 156


🏁 Script executed:

# Verify the suggested fix compiles/is syntactically correct
python3 << 'EOF'
# Simulate the suggested change
C_COLLECTIONS_PATHS = ["/path1", "/path2"]
AnsibleCollectionConfig_collection_paths = ["/path3", "/path2"]

# Original approach (loses order)
default_collections_path = set(C_COLLECTIONS_PATHS)
collections_search_paths_old = default_collections_path | set(AnsibleCollectionConfig_collection_paths)
print(f"Original (set union): {list(collections_search_paths_old)}")

# Suggested approach (preserves order and dedupes)
collections_search_paths_new = list(dict.fromkeys([*C_COLLECTIONS_PATHS, *AnsibleCollectionConfig_collection_paths]))
print(f"Suggested (dict.fromkeys): {collections_search_paths_new}")
EOF

Repository: ansible/awx

Length of output: 165


Preserve collection path precedence with insertion order.

Using set() to combine paths loses ordering, causing nondeterministic selection when duplicate collections exist in multiple search paths. With dedupe=False, all instances are returned but iteration order becomes unpredictable.

Replace with order-preserving deduplication:

Suggested change
-    default_collections_path = set(C.COLLECTIONS_PATHS)
-    collections_search_paths = default_collections_path | set(AnsibleCollectionConfig.collection_paths)
-    collections = list(find_existing_collections(list(collections_search_paths), artifacts_manager, dedupe=False))
+    collections_search_paths = list(dict.fromkeys([*C.COLLECTIONS_PATHS, *AnsibleCollectionConfig.collection_paths]))
+    collections = list(find_existing_collections(collections_search_paths, artifacts_manager, dedupe=False))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/playbooks/library/indirect_instance_count.py` around lines 147 - 149, The
code uses set() to combine C.COLLECTIONS_PATHS and
AnsibleCollectionConfig.collection_paths which loses insertion order and causes
nondeterministic iteration; change to an order-preserving merge (e.g., build
collections_search_paths by concatenating the two lists and removing duplicates
while preserving first-seen order using dict.fromkeys or an OrderedDict) so
precedence of C.COLLECTIONS_PATHS over AnsibleCollectionConfig.collection_paths
is preserved, then call
find_existing_collections(list(collections_search_paths), artifacts_manager,
dedupe=False) as before; update default_collections_path to be a list (or
preserve it only for display) and reference collections_search_paths in that
order-preserving form.

Comment on lines +179 to 183
embedded_query_file = files(f'ansible_collections.{candidate.namespace}.{candidate.name}') / 'extensions' / 'audit' / 'event_query.yml'
if embedded_query_file.exists():
with embedded_query_file.open('r') as f:
collection_print['host_query'] = f.read()
self._display.vv(f"Using embedded query for {candidate.fqcn} v{candidate.ver}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

git ls-files | grep -E "indirect_instance_count"

Repository: ansible/awx

Length of output: 103


🏁 Script executed:

head -n 200 awx/playbooks/library/indirect_instance_count.py | tail -n 30

Repository: ansible/awx

Length of output: 1736


🌐 Web query:

importlib.resources.files ModuleNotFoundError behavior Python documentation

💡 Result:

importlib.resources.files(anchor) accepts an anchor that is either a module object or a module name string (str) and returns a Traversable for that package/module’s resources. If anchor is omitted, it defaults to the caller’s module. [1]

Where ModuleNotFoundError comes from

  • If you pass a string anchor (e.g. files("my_pkg.data")), that string must be an importable module name and is imported before any resource access. If it can’t be imported, the import step will raise ModuleNotFoundError (just like import my_pkg.data would). [2]
  • Separately, if your Python is old enough that the stdlib importlib.resources isn’t available, then import importlib.resources itself can fail; in that case you typically use the backport package importlib_resources. [2]

Practical fixes

  • Prefer passing the module object you already imported: import my_pkg; files(my_pkg) ... (avoids string-name mistakes). [2]
  • If calling from inside the package, use files(__package__) (string) or import your package and pass the module object. [1][2]

Sources: Python importlib.resources docs (files/anchor semantics) [1]; importlib_resources “Using” docs (string anchors are imported first) [2].

[1] (docs.python.org)
[2] (importlib-resources.readthedocs.io)

Citations:


🏁 Script executed:

head -n 50 awx/playbooks/library/indirect_instance_count.py

Repository: ansible/awx

Length of output: 1630


Guard embedded query lookup so one bad import does not fail the callback.

The files() function with a string anchor imports the module as a prerequisite. If a discovered collection cannot be imported (e.g., partially installed or missing dependencies), ModuleNotFoundError is raised unhandled, aborting callback processing and preventing ansible_data output. The fallback query logic exists but becomes unreachable.

Suggested change
-            embedded_query_file = files(f'ansible_collections.{candidate.namespace}.{candidate.name}') / 'extensions' / 'audit' / 'event_query.yml'
-            if embedded_query_file.exists():
+            embedded_query_file = None
+            try:
+                embedded_query_file = files(f'ansible_collections.{candidate.namespace}.{candidate.name}') / 'extensions' / 'audit' / 'event_query.yml'
+            except ModuleNotFoundError:
+                pass
+            if embedded_query_file and embedded_query_file.exists():
                 with embedded_query_file.open('r') as f:
                     collection_print['host_query'] = f.read()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
embedded_query_file = files(f'ansible_collections.{candidate.namespace}.{candidate.name}') / 'extensions' / 'audit' / 'event_query.yml'
if embedded_query_file.exists():
with embedded_query_file.open('r') as f:
collection_print['host_query'] = f.read()
self._display.vv(f"Using embedded query for {candidate.fqcn} v{candidate.ver}")
embedded_query_file = None
try:
embedded_query_file = files(f'ansible_collections.{candidate.namespace}.{candidate.name}') / 'extensions' / 'audit' / 'event_query.yml'
except ModuleNotFoundError:
pass
if embedded_query_file and embedded_query_file.exists():
with embedded_query_file.open('r') as f:
collection_print['host_query'] = f.read()
self._display.vv(f"Using embedded query for {candidate.fqcn} v{candidate.ver}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/playbooks/library/indirect_instance_count.py` around lines 179 - 183, The
embedded query lookup using
files(f'ansible_collections.{candidate.namespace}.{candidate.name}') can raise
ModuleNotFoundError and abort the whole callback; wrap the files(...) call and
subsequent .exists()/.open() usage in a try/except that catches
ModuleNotFoundError (and optionally ImportError) and skips setting
collection_print['host_query'] for that candidate so the fallback query logic
remains reachable; keep references to embedded_query_file, candidate,
collection_print, and self._display.vv so the exception handler logs a verbose
message and continues processing other candidates.

- Add conftest.py to tests/data/ to prevent pytest from collecting
  Ansible module files (example.py) as test modules
- Add hashivault-kv-oidc and hashivault-ssh-oidc to expected
  credential types list to match current devel

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants