Skip to content

Refactor: Unify Child Column Access Methods and Remove Deprecated/Inefficient APIs #65892

@LiShuMing

Description

@LiShuMing

Refactoring request

Is your refactoring request related to a problem? Please describe.

This issue proposes a necessary refactoring to clean up redundant, inefficient, and inconsistently named methods for accessing child columns within complex data structures like ConstColumn, NullableColumn, ArrayColumn, StructColumn, and MapColumn.

The lack of uniformity and the existence of inefficient accessors complicate maintenance and increase the risk of developer error, especially when dealing with modern COW semantics.

Current Problems

  1. Inconsistent Naming: Various composite column types use different naming conventions (e.g., get_child_column(), column(), data_column()) to return a reference or pointer to a child column.
  2. Method Redundancy: There are too many ways to access the same child column, including raw pointers, references, and smart pointers.
  3. Inefficiency and Deprecation: Some methods return raw pointers or create unnecessary overhead compared to direct reference or smart pointer access, and should be removed.

Describe the solution you'd like

The goal is to enforce a consistent and clear API that distinguishes between accessing the column reference (for read-only/immediate use) and accessing the smart pointer (for ownership transfer or COW manipulation).

Step 1: Unify Method Naming

Introduce consistent naming conventions for child-column access across all column types:

  1. Return child column by reference
// Mutable reference
Column& xx_column();

// Const reference
const Column& xx_column() const;
  1. Return child column’s (immutable) ColumnPtr reference
ColumnPtr& xx_column_ptr();
const ColumnPtr& xx_column_ptr() const;
  1. Deprecated: raw pointer accessors
// Deprecated: equivalent to &xx_column(); unnecessary overhead
Column* xx_column_raw_ptr();
  1. Deprecated: MutablePtr accessors
// Deprecated: mutable child pointer can be obtained via &xx_column()
// and this method introduces extra cost
MutablePtr xx_column_mutable_ptr();

Once unified reference-based accessors exist, these pointer-based methods should be removed.

Step 2: Remove Deprecated & Inefficient Methods

After migrating usages:

  • Remove all raw-pointer accessors (xx_column_raw_ptr())
  • Remove all mutable-pointer accessors (xx_column_mutable_ptr())

This will:

  • reduce code duplication
  • improve clarity of ownership and mutability
  • remove unnecessary pointer indirections
  • make Column API easier to maintain and extend

Describe alternatives you've considered

Additional context

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions