-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Description
Refactoring request
Is your refactoring request related to a problem? Please describe.
This issue proposes a necessary refactoring to clean up redundant, inefficient, and inconsistently named methods for accessing child columns within complex data structures like ConstColumn, NullableColumn, ArrayColumn, StructColumn, and MapColumn.
The lack of uniformity and the existence of inefficient accessors complicate maintenance and increase the risk of developer error, especially when dealing with modern COW semantics.
Current Problems
- Inconsistent Naming: Various composite column types use different naming conventions (e.g.,
get_child_column(),column(),data_column()) to return a reference or pointer to a child column. - Method Redundancy: There are too many ways to access the same child column, including raw pointers, references, and smart pointers.
- Inefficiency and Deprecation: Some methods return raw pointers or create unnecessary overhead compared to direct reference or smart pointer access, and should be removed.
Describe the solution you'd like
The goal is to enforce a consistent and clear API that distinguishes between accessing the column reference (for read-only/immediate use) and accessing the smart pointer (for ownership transfer or COW manipulation).
Step 1: Unify Method Naming
Introduce consistent naming conventions for child-column access across all column types:
- Return child column by reference
// Mutable reference
Column& xx_column();
// Const reference
const Column& xx_column() const;
- Return child column’s (immutable) ColumnPtr reference
ColumnPtr& xx_column_ptr();
const ColumnPtr& xx_column_ptr() const;
- Deprecated: raw pointer accessors
// Deprecated: equivalent to &xx_column(); unnecessary overhead
Column* xx_column_raw_ptr();
- Deprecated: MutablePtr accessors
// Deprecated: mutable child pointer can be obtained via &xx_column()
// and this method introduces extra cost
MutablePtr xx_column_mutable_ptr();
Once unified reference-based accessors exist, these pointer-based methods should be removed.
Step 2: Remove Deprecated & Inefficient Methods
After migrating usages:
- Remove all raw-pointer accessors (xx_column_raw_ptr())
- Remove all mutable-pointer accessors (xx_column_mutable_ptr())
This will:
- reduce code duplication
- improve clarity of ownership and mutability
- remove unnecessary pointer indirections
- make Column API easier to maintain and extend
Describe alternatives you've considered
Additional context