chore(migration): Migrate code from googleapis/python-bigquery-dataframes into packages/bigframes #16493

Draft
chalmerlowe wants to merge 2223 commits into main from migration.python-bigquery-dataframes.migration.2026-03-31_18-48-47.migrate

Conversation

@chalmerlowe
Contributor

See #15999.

This PR should be merged with a merge-commit, not a squash-commit, in order to preserve the git history.

TrevorBergeron and others added 30 commits November 17, 2025 16:29
Thank you for opening a Pull Request! Before submitting your PR, there
are a few things you can do to make sure it goes smoothly:
- [ ] Make sure to open an issue as a
[bug/issue](https://github.com/googleapis/python-bigquery-dataframes/issues/new/choose)
before writing your code! That way we can discuss the change, evaluate
designs, and agree on the general idea
- [ ] Ensure the tests and linter pass
- [ ] Code coverage does not decrease (if any source code was changed)
- [ ] Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕
…when `connection_id` is not present (#2272)

Fixes #460856043 🦕
…2258)

Previously, when the total number of rows (row_count) was unknown (e.g.,
due to deferred computation or errors), it would incorrectly default to
0. This resulted in confusing UI, such as displaying "Page 1 of 0", and
allowed users to navigate to empty pages without automatically returning
to valid data.

Current display strategy for the interactive table widget:

* When `row_count` is a positive number (e.g., 50):
    * Total Rows Display: Shows the exact count, like "50 total rows".
    * Pagination Display: Shows the page relative to the total rows,
      like "Page 1 of 50".
    * Navigation: The "Next" button is disabled only on the final page.

* When `row_count` is `None` (unknown):
    * Total Rows Display: Shows "Total rows unknown".
    * Pagination Display: Shows the page relative to an unknown total,
      like "Page 1 of many".
    * Navigation: The "Next" button is always enabled, allowing you to
      page forward until the backend determines there is no more data.
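
The distinction between an unknown and a known row count can be sketched in Python as follows. This is only an illustration of the strategy described above, not the widget's actual code: the function names and the zero-based `page` index are assumptions.

```python
import math
from typing import Optional


def total_rows_label(row_count: Optional[int]) -> str:
    """Label for the total-rows display; None means the count is unknown."""
    if row_count is None:
        return "Total rows unknown"
    return f"{row_count} total rows"


def next_enabled(page: int, page_size: int, row_count: Optional[int]) -> bool:
    """Whether "Next" is clickable. `page` is zero-based (an assumption)."""
    if row_count is None:
        # Unknown total: always allow paging forward and let the backend
        # decide when there is no more data.
        return True
    last_page = max(0, math.ceil(row_count / page_size) - 1)
    return page < last_page
```

Keeping `None` separate from `0` is what avoids the confusing "Page 1 of 0" display mentioned earlier.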

Fixes #<428238610> 🦕
Fixes internal issue 445774480🦕

---------

Co-authored-by: Shenyang Cai <sycai@users.noreply.github.com>
Fixes b/447388852 🦕

---------

Co-authored-by: Shenyang Cai <sycai@users.noreply.github.com>
#2287)

This pull request addresses a pagination display bug in the `anywidget`
table where a small DataFrame (e.g., 5 rows) would incorrectly show
"Page 1 of 5" instead of "Page 1 of 1".

* **Fixed `table_widget.js` pagination logic:** Corrected the JavaScript
to accurately calculate total pages, ensuring "Page 1 of 1" is displayed
for datasets smaller than the page size.
* **Added comprehensive system test:** Enhanced `test_anywidget.py` by
improving the `test_widget_with_few_rows_should_have_only_one_page`
test. This test now explicitly asserts the correct `row_count` and
verifies that page navigation is correctly clamped to the first page,
thus confirming the backend conditions for the "Page 1 of 1" frontend
display.
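
A minimal Python sketch of the page-count arithmetic this fix relies on. The real logic lives in `table_widget.js`; the helper names and the zero-based page index below are assumptions made for illustration.

```python
import math


def total_pages(row_count: int, page_size: int) -> int:
    """Number of pages needed for row_count rows; never less than one."""
    return max(1, math.ceil(row_count / page_size))


def clamp_page(requested: int, row_count: int, page_size: int) -> int:
    """Clamp a zero-based page index into the valid range."""
    return min(max(requested, 0), total_pages(row_count, page_size) - 1)
```

With 5 rows and a page size of 10, `total_pages` gives 1, so any navigation request is clamped back to the first page.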

…2292)

This change aims to fix the tests failing in #2248 because of a 1-based
indexing error.

Fixes internal issue 417774347 🦕
Fixes b/447388852 🦕
…2293)

Also:

- include link to `bigframes.bigquery.ai` in README
- add partial ordering mode recommendation to starter sample
- remove 2.0 warning

Towards b/454350869 🦕
…2255)

This PR introduces single-column sorting functionality to the
interactive table widget.

1) **Three-State Sorting UI**

1.1) The sort indicator dot (●) is now hidden by default and only
appears when the user hovers the mouse over a column header.
1.2) Implemented a sorting cycle: unsorted (●) → ascending (▲) →
descending (▼) → unsorted (●).
1.3) Visual indicators (●, ▲, ▼) are displayed in column headers to
reflect the current sort state.
1.4) Sorting controls are now only enabled for columns with orderable
data types.
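
The three-state cycle in 1.2 can be sketched as a small state machine. This Python version is illustrative only; the state names and the `next_sort_state` helper are assumptions, not identifiers from the widget.

```python
from typing import Optional

# Order of the cycle: unsorted -> ascending -> descending -> unsorted.
SORT_CYCLE = [None, "asc", "desc"]

# Indicator shown in the column header for each state.
INDICATORS = {None: "●", "asc": "▲", "desc": "▼"}


def next_sort_state(current: Optional[str]) -> Optional[str]:
    """Return the state that follows `current` when a header is clicked."""
    index = SORT_CYCLE.index(current)
    return SORT_CYCLE[(index + 1) % len(SORT_CYCLE)]
```

Three clicks on the same header return the column to its unsorted state.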

2) **Tests**

2.1) Updated `paginated_pandas_df` fixture for better sorting test
coverage.
2.2) Added new system tests to verify ascending, descending, and
multi-column sorting.

3) **Frontend Unit Tests**
JavaScript-level unit tests have been added to validate the widget's
frontend logic, specifically the new sorting functionality and UI
interactions.

**How to Run Frontend Unit Tests**:
To execute these tests from the project root directory:
```bash
cd tests/js
npm install  # Only needed if dependencies haven't been installed or have changed
npm test
```

The docs have been updated to document the new features. The main description
now mentions column sorting and adjustable widths, and a new section has
been added to explain how to use the column resizing feature. The
sorting section was also updated to mention that the indicators are only
visible on hover.

Fixes #<459835971> 🦕

---------

Co-authored-by: Tim Sweña (Swast) <swast@google.com>
Fixes b/447388852 🦕
…sqlglot compiler (#2297)

This change aims to fix the `to_datetime` related tests failing in
#2248.

Fixes internal issue 417774347 🦕
This change aims to fix the `test_timestamp_series_diff_agg` test
failing in #2248.

Fixes internal issue 417774347 🦕
This change aims to fix some string-related tests failing in #2248.

Fixes internal issue 417774347🦕
The default maximum instances for cloud functions is 100, not 0. Updated
the `expected_max_instances` in the `parametrize` decorator to 100 for
the 'no-set' and 'set-None' test cases to accurately reflect the runtime
behavior.

Fixes b/465212379 🦕
See instructions at
https://pydata-sphinx-theme.readthedocs.io/en/latest/user_guide/analytics.html#google-analytics


Co-authored-by: Shuowei Li <shuowei@google.com>
tswast and others added 12 commits March 26, 2026 11:47
…es (#2533)

Updates documentation and internal comments to use the term "ObjectRef
column" instead of "Blob column", as per the official BigQuery
documentation. Links to the documentation are included in user-facing
docstrings.

---
*PR created automatically by Jules for task
[15739234298342142432](https://jules.google.com/task/15739234298342142432)
started by @tswast*

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: tswast <247555+tswast@users.noreply.github.com>
This will be the reference notebook to be used by the tech blog on AI
functions in BigFrames
SQL generator output fallbacks (SELECT 1 placeholder).

Fixes #<452681068> 🦕
Fixes internal issue 497970577🦕
…rames/main' into migration.python-bigquery-dataframes.migration.2026-03-31_18-48-47.migrate
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the foundational codebase for the bigframes package, including core execution nodes, BigQuery operation compilers, and extensive CI/CD configurations. The review feedback identifies a bug in the exponentiation logic for negative bases and highlights opportunities to improve behavioral parity with Pandas by returning null instead of zero for division and modulo by zero operations. Additionally, a suggested fix addresses a potential infinite loop in the repository root detection script used during environment setup.

I am having trouble creating individual review comments.

packages/bigframes/bigframes/core/compile/ibis_compiler/scalar_op_registry.py (1548)

high

The `odd_exponent` calculation for negative bases is incorrect for negative odd exponents. In BigQuery (and many other SQL dialects), the `MOD` operator returns a negative value for negative inputs (e.g., `MOD(-3, 2)` returns `-1`). As a result, the condition `== _ibis_num(1)` will fail for negative odd exponents, leading to an incorrect sign in the final result when `overflow_cond` is true. Using the absolute value of the exponent or checking for a non-zero remainder would fix this.

    odd_exponent = (x_val < _ZERO) & (y_val.cast(ibis_dtypes.int64).abs() % _ibis_num(2) == _ibis_num(1))
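
The sign issue can be reproduced outside SQL. The sketch below uses `math.fmod`, whose result takes the sign of the dividend, as a stand-in for the `MOD` behavior described above; Python's own `%` operator behaves differently (`-3 % 2` is `1`). The helper names are hypothetical.

```python
import math


def truncated_mod(a: int, b: int) -> int:
    """Remainder that keeps the sign of the dividend, like the MOD described above."""
    return int(math.fmod(a, b))


def is_odd_naive(n: int) -> bool:
    """Mirrors the flagged check: remainder compared against 1."""
    return truncated_mod(n, 2) == 1


def is_odd_abs(n: int) -> bool:
    """Mirrors the suggested fix: take the absolute value first."""
    return truncated_mod(abs(n), 2) == 1
```

For `n = -3` the naive check reports "not odd" because the remainder is `-1`, while the absolute-value variant reports "odd".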

packages/bigframes/.kokoro/trampoline_v2.sh (232-238)

medium

The `repo_root` function can enter an infinite loop if it is executed in a directory that is not part of a git repository. This happens because `dirname "/"` returns `/`, so the `while` loop condition `[[ ! -d "${dir}/.git" ]]` will never become false if `/.git` does not exist. Adding a check for the root directory or a maximum depth would make the script more robust.

    function repo_root() {
        local dir="$1"
        while [[ ! -d "${dir}/.git" && "$dir" != "/" ]]; do
            dir="$(dirname "$dir")"
        done
        if [[ ! -d "${dir}/.git" ]]; then
            echo "Error: Could not find .git directory in any parent of $1" >&2
            exit 1
        fi
        echo "${dir}"
    }
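
For comparison, the same upward search can be written so that it terminates by construction. This Python sketch is not part of the PR; `find_repo_root` is a hypothetical helper. It iterates over the finite list of parent directories instead of looping until a condition holds.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: str) -> Optional[Path]:
    """Return the nearest ancestor of `start` containing .git, or None."""
    current = Path(start).resolve()
    # `parents` is a finite sequence ending at the filesystem root, so the
    # loop cannot spin forever the way repeated dirname calls on "/" can.
    for candidate in (current, *current.parents):
        if (candidate / ".git").is_dir():
            return candidate
    return None
```

Returning `None` leaves it to the caller to decide whether a missing repository is an error.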

packages/bigframes/bigframes/core/compile/ibis_compiler/scalar_op_registry.py (1634)

medium

In `floordiv_op`, integer division by zero currently returns 0 (via `_ZERO * x_numeric` at line 1637). This is inconsistent with Pandas behavior for nullable integers (`Int64`), where division by zero should result in a null/NA value. Returning 0 can lead to silent errors in data processing. It is recommended to return null for the integer case.

    zero_result = _INF if (x.type().is_floating() or y.type().is_floating()) else ibis.null().cast(x.type())

packages/bigframes/bigframes/core/compile/ibis_compiler/scalar_op_registry.py (1741-1744)

medium

In `_int_mod`, the modulo operation with a zero divisor returns 0 (via `_ZERO * x`). To maintain consistency with Pandas and avoid mathematically incorrect results, this should return null instead. Returning 0 hides the division-by-zero error and produces an incorrect value.

        .when(
            y == _ZERO,
            ibis.null().cast(x.type()),
        )  # Return NULL for division by zero to match pandas behavior
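
The difference between the two behaviors can be sketched in plain Python, with `None` standing in for SQL NULL. Both functions are hypothetical illustrations of the review comment, not code from the compiler, and Python's `%` stands in for the non-zero case.

```python
from typing import Optional


def mod_zero_as_zero(x: Optional[int], y: Optional[int]) -> Optional[int]:
    """Behavior the review describes: a zero divisor yields 0 * x, i.e. 0."""
    if x is None or y is None:
        return None
    if y == 0:
        return 0 * x
    return x % y


def mod_zero_as_null(x: Optional[int], y: Optional[int]) -> Optional[int]:
    """Behavior the review suggests: a zero divisor yields null."""
    if x is None or y is None or y == 0:
        return None
    return x % y
```

With the second variant, a zero divisor surfaces as a missing value downstream instead of a plausible-looking 0.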
