Fix ALiBi distance matrix to use a proper (q, k) outer difference #419

Closed
systemblueio wants to merge 1 commit into ml-explore:main from systemblueio:fix/alibi-outer-difference

systemblueio commented Jun 7, 2026

Proposed changes

ALiBi.alibiMatrix builds the relative-distance matrix like this:

let x1 = MLXArray(key.offset ..< key.qSequenceLength).expandedDimensions(axis: 1)
let x2 = MLXArray(0 ..< key.kSequenceLength).expandedDimensions(axis: 1)
let distanceMatrix = -abs(expandedDimensions((x1 - x2), axes: [0, 1]))

Both vectors are expanded on axis 1, so their shapes are (q, 1) and (k, 1), and x1 - x2 is (q, 1) - (k, 1). That broadcast throws whenever the query and key lengths differ (unless one of them is 1, in which case the result is still a single column rather than a (q, k) matrix). When the lengths are equal, it collapses to a constant column ((offset + i) - i = offset) instead of the (i - j) outer difference. Either way, the ALiBi bias is wrong in the cases it is meant to handle.
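The shape argument above can be checked without MLX. The following is a minimal sketch in plain Swift, assuming NumPy-style broadcasting rules (which MLX is documented to follow); `broadcastShape` is a hypothetical helper written for illustration, not an MLX API.

```swift
/// Hypothetical helper: returns the NumPy-style broadcast of two shapes,
/// or nil when the shapes are incompatible.
func broadcastShape(_ a: [Int], _ b: [Int]) -> [Int]? {
    let rank = max(a.count, b.count)
    // Left-pad the shorter shape with 1s so both shapes have the same rank.
    let pa = Array(repeating: 1, count: rank - a.count) + a
    let pb = Array(repeating: 1, count: rank - b.count) + b
    var result: [Int] = []
    for (x, y) in zip(pa, pb) {
        if x == y || y == 1 {
            result.append(x)
        } else if x == 1 {
            result.append(y)
        } else {
            return nil  // sizes differ and neither is 1
        }
    }
    return result
}

let q = 3
let k = 5
let buggy = broadcastShape([q, 1], [k, 1])  // incompatible: 3 vs 5 on axis 0
let equal = broadcastShape([q, 1], [q, 1])  // a single column, not a matrix
let fixed = broadcastShape([q, 1], [1, k])  // the intended (q, k) shape
```

Here `buggy` is `nil`, `equal` is `[3, 1]`, and `fixed` is `[3, 5]`, which mirrors the three cases described in the paragraph.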

This PR expands the key vector on axis 0 instead, so the difference is (q, 1) - (1, k) and broadcasts to the intended (q, k) matrix, matching the reference Python implementation in mlx (x1[:, None] - x2[None, :]). It also adds MLXNNPositionalEncodingTests, which cover the relative-distance values and a case where the query and key lengths differ. There is no existing issue for this; the bug was found while reading the layer.
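To make the intended values concrete, here is a small sketch of the (q, 1) - (1, k) outer difference using plain Swift arrays rather than MLXArray. The function name `alibiDistances` is hypothetical, and the sketch omits the two leading axes that the real code adds with expandedDimensions(axes: [0, 1]).

```swift
/// Hypothetical stand-in for the fixed computation, using plain arrays.
/// Row r corresponds to query position offset + r, column j to key position j.
func alibiDistances(offset: Int, qSequenceLength: Int, kSequenceLength: Int) -> [[Int]] {
    let x1 = Array(offset ..< qSequenceLength)  // query positions, one per row
    let x2 = Array(0 ..< kSequenceLength)       // key positions, one per column
    // Pair every query position with every key position: -|i - j|.
    return x1.map { i in x2.map { j in -abs(i - j) } }
}

let d = alibiDistances(offset: 0, qSequenceLength: 3, kSequenceLength: 4)
// d[0] == [ 0, -1, -2, -3]
// d[1] == [-1,  0, -1, -2]
// d[2] == [-2, -1,  0, -1]

// With a non-zero offset, only the query positions from offset onward are produced.
let step = alibiDistances(offset: 2, qSequenceLength: 3, kSequenceLength: 4)
// step == [[-2, -1, 0, -1]]
```

Unlike the (q, 1) - (k, 1) version, each row varies across the key positions, and differing query and key lengths pose no problem.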

Note: I verified `swift build --build-tests` and `swift format lint --strict` locally, but could not run the suite because SwiftPM on the command line cannot build the Metal shaders (per the README). The new tests are therefore scoped to the CPU stream and will be exercised on CI.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

The ALiBi position-bias matrix expanded both index vectors on axis 1, so
`x1 - x2` was `(q, 1) - (k, 1)`. That throws for differing query and key
lengths and collapses to a constant column when they match, so the bias was
wrong wherever it mattered.

Expand the key vector on axis 0 instead, matching the reference Python
implementation (`x1[:, None] - x2[None, :]`), so the difference broadcasts to
the intended `(q, k)` relative-distance matrix.

Adds positional-encoding tests covering the relative-distance values and
differing query/key lengths.

systemblueio closed this by deleting the head repository on Jun 9, 2026