Skip to content

Proposal: Removing db.session Usage from core/workflow Module #20449

@laipz8200

Description

@laipz8200

Self Checks

  • I have searched for existing issues search for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (我已阅读并同意 Language Policy).
  • [FOR CHINESE USERS] 请务必使用英文提交 Issue,否则会被关闭。谢谢!:)
  • Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

Part of #19429

Background

The core/workflow module currently relies on Flask-SQLAlchemy's db.session for database operations. This approach has several drawbacks:

  1. Hard to manage session lifecycle: Flask-SQLAlchemy's session is tied to the Flask request context, making it difficult to control in background tasks or multi-threaded environments.

  2. Causes bugs in complex workflows: The shared session can lead to unexpected behavior when multiple operations are performed concurrently, especially in the workflow engine which uses threading.

  3. Tight coupling to Flask: Direct usage of db.session creates a dependency on Flask, making the core workflow module less portable and harder to test.

Current Implementation Analysis

After examining the codebase, I found:

  1. The core/workflow module defines repository interfaces that abstract database operations:

    • WorkflowExecutionRepository
    • WorkflowNodeExecutionRepository
  2. The implementations of these interfaces are in core/repositories:

    • SQLAlchemyWorkflowExecutionRepository
    • SQLAlchemyWorkflowNodeExecutionRepository
  3. These implementations already use a better approach - they accept a session_factory or engine in their constructor and create sessions when needed:

    def __init__(self, session_factory: sessionmaker | Engine, ...):
        # ...
    
    def save(self, execution: NodeExecution) -> None:
        # ...
        with self._session_factory() as session:
            session.merge(db_model)
            session.commit()
  4. However, there are still places in the workflow module that import and potentially use db.session directly, such as in graph_engine.py.

Proposed Solution

  1. Complete the Repository Pattern Implementation:

    • Ensure all database operations in the workflow module go through the repository interfaces
    • Remove any direct imports of db from extensions.ext_database
  2. Session Factory Injection:

    • Modify any remaining components that use db.session directly to accept a session factory
    • Create sessions only when needed and ensure they're properly closed
  3. Dependency Inversion:

    • Make the workflow module completely agnostic of the specific database implementation
    • Allow the application layer to provide the appropriate session factory

Conclusion

Removing direct db.session usage from the core/workflow module will significantly improve its maintainability, testability, and reliability. The current architecture already has most of the pieces in place with the repository pattern, but completing this refactoring will eliminate a source of bugs and make the codebase more robust.

2. Additional context or comments

No response

3. Can you help us with this feature?

  • I am interested in contributing to this feature.

Metadata

Metadata

Assignees

Labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions