Description
Objective
Implement the base_model Keras Functional Model for the core pre-attention and attention stack that will be passed to the Cerebros Neural Architecture Search (NAS) algorithm to complete the model with the feed forward block and output head. The goal is to integrate the existing component layers (SingleHeadChunkedAttention, VoxelAttentionLayer), implement known layers (MAMBA, Linformer, Adapter), and wire them together according to the architecture most likely to leverage their strongest synergy.
This is the critical integration step that bridges our individual, novel components with the Cerebros NAS to create the final, trainable LLM.
Context
We have successfully prototyped several key ablations on linear-complexity Cerebros LLMs:
- An embedding -> iRoPE -> Cerebros NAS feed forward block -> head layer
- An embedding -> iRoPE -> SingleHeadChunkedAttention block -> Cerebros NAS feed forward block -> head layer
- An embedding -> iRoPE -> VoxelAttentionLayer block -> Cerebros NAS feed forward block -> head layer
The next task is to assemble a composite model combining the three prototypes above, along with yet-to-be-implemented MAMBA and Linformer layers, into a single, cohesive Keras Functional model (base_model). This base_model will then be fed into the Cerebros NAS to complete the architecture.
Task Breakdown
Use the existing LLM code as a template:
- Define Global Constants and Hyperparameters, renaming any that need to be renamed to disambiguate them from incumbent parameters, etc.:
  1.1. Parameters:
  - MAX_SEQ_LENGTH
  - EMBEDDING_DIM
  - VOCABULARY_SIZE
  - K_PROJ (for SingleHeadChunkedAttention)
  - DROPOUT_RATE_CHUNKED_ATTENTION
  - MAMBA_STATE_DIM, MAMBA_PROJ_DIM, DROPOUT_RATE_MAMBA
  - VOXEL_GRID_SIZE
  - DROPOUT_RATE_VOXEL
  - LINFORMER_K (projection dimension)
  - DROPOUT_RATE_LINFORMER
  - likely others
  1.2. Try to determine an optimal range of ablations to consider in first-pass hyperparameter tuning (a hedged sketch of these constants appears immediately below).
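A minimal sketch of the constants block, assuming TensorFlow/Keras. Every numeric value below is an illustrative placeholder to be replaced by the ablation ranges chosen in 1.2; BASE_MODEL_OUTPUT_PROJECTION_MULTIPLIER is included only because the adapter step later in this issue refers to it:

```python
# Illustrative placeholder values only -- final values come from the
# first-pass ablation ranges described in 1.2.
MAX_SEQ_LENGTH = 1024
EMBEDDING_DIM = 256
VOCABULARY_SIZE = 32000

# SingleHeadChunkedAttention
K_PROJ = 64
DROPOUT_RATE_CHUNKED_ATTENTION = 0.1

# MambaBlock
MAMBA_STATE_DIM = 16
MAMBA_PROJ_DIM = 512
DROPOUT_RATE_MAMBA = 0.1

# VoxelAttentionLayer
VOXEL_GRID_SIZE = 8
DROPOUT_RATE_VOXEL = 0.1

# LinformerLayer
LINFORMER_K = 128  # projection dimension for the K and V matrices
DROPOUT_RATE_LINFORMER = 0.1

# AdapterBlock / base_model output
BASE_MODEL_OUTPUT_PROJECTION_MULTIPLIER = 1
```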
- Implement Missing Keras Layers
Create Keras Layer classes for the components that do not yet have a final implementation.
- MambaBlock: Generate a single-head MAMBA block. It should inherit from tf.keras.layers.Layer and be compatible with the model graph. Ensure it includes a skip connection and stream merging as per the universal principles.
- LinformerLayer: Generate a Linformer layer implementation. It must project Key and Value matrices to a lower dimension (k) to achieve linear complexity. Ensure it includes a skip connection and stream merging.
- AdapterBlock: This layer takes an input of shape (BATCH_SIZE, SEQUENCE_LENGTH, EMBEDDING_DIM) and reduces it to (BATCH_SIZE, SEQUENCE_LENGTH). Implement this using an optimal approach like a gating mechanism or a linear projection along the feature dimension.
- Attention blocks satisfying the same constraints (skip connection, stream merging, normalization) for each of the existing layers (SingleHeadChunkedAttention, VoxelAttentionLayer). Hedged sketches of the LinformerLayer and AdapterBlock follow this list.
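A minimal sketch, assuming TensorFlow/Keras 2.x, of what the LinformerLayer and AdapterBlock could look like. The constructor signatures and the add-and-normalize residual inside LinformerLayer are simplifying assumptions, not a final implementation; the sequence-dimension merging this issue calls for is handled in the base_model assembly sketched further below:

```python
import tensorflow as tf

class LinformerLayer(tf.keras.layers.Layer):
    """Single-head attention with K and V projected to length k (linear-complexity sketch)."""

    def __init__(self, embedding_dim, k, dropout_rate=0.1, **kwargs):
        super().__init__(**kwargs)
        self.q_proj = tf.keras.layers.Dense(embedding_dim)
        self.k_proj = tf.keras.layers.Dense(embedding_dim)
        self.v_proj = tf.keras.layers.Dense(embedding_dim)
        # Learned projections that compress the sequence axis down to length k.
        self.e_proj = tf.keras.layers.Dense(k)  # applied to K along the sequence axis
        self.f_proj = tf.keras.layers.Dense(k)  # applied to V along the sequence axis
        self.out_proj = tf.keras.layers.Dense(embedding_dim)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.norm = tf.keras.layers.LayerNormalization()
        self.scale = tf.math.sqrt(tf.cast(embedding_dim, tf.float32))

    def call(self, inputs, training=False):
        q = self.q_proj(inputs)                                               # (B, L, D)
        k = self.k_proj(inputs)                                               # (B, L, D)
        v = self.v_proj(inputs)                                               # (B, L, D)
        # Compress the sequence axis: transpose to (B, D, L), project L -> k, transpose back.
        k = tf.transpose(self.e_proj(tf.transpose(k, [0, 2, 1])), [0, 2, 1])  # (B, k, D)
        v = tf.transpose(self.f_proj(tf.transpose(v, [0, 2, 1])), [0, 2, 1])  # (B, k, D)
        scores = tf.matmul(q, k, transpose_b=True) / self.scale               # (B, L, k)
        attn = self.dropout(tf.nn.softmax(scores, axis=-1), training=training)
        context = tf.matmul(attn, v)                                          # (B, L, D)
        # Skip connection and merge (add then normalize) as an illustrative choice.
        return self.norm(inputs + self.out_proj(context))


class AdapterBlock(tf.keras.layers.Layer):
    """Gated linear projection that collapses the feature axis: (B, L, D) -> (B, L)."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.gate = tf.keras.layers.Dense(1, activation="sigmoid")
        self.value = tf.keras.layers.Dense(1)

    def call(self, inputs):
        squeezed = self.gate(inputs) * self.value(inputs)  # (B, L, 1)
        return tf.squeeze(squeezed, axis=-1)               # (B, L)
```

The gated projection in AdapterBlock follows the "gating mechanism or a linear projection along the feature dimension" option named above; a plain Dense(1) followed by a squeeze would be the simpler alternative.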
- Construct the base_model Keras Functional Model
  - Assemble the full model using the Keras Functional API, strictly following the specified order and architectural rules (a hedged assembly sketch follows this list).
  - Input Layer: Create an input layer with shape [(MAX_SEQ_LENGTH,)]. Remember the input tensor must be nested within a list.
  - Embedding: Add an Embedding layer.
  - Parallel Embedding Streams:
    - Stream 1: Pass the embedding output through an iRoPE layer.
    - Stream 2: Create a skip connection by passing the embedding output directly.
  - Stream Merging: Merge the iRoPE and skip-connection streams correctly in the first attention block.
  - Attention Stack (Sequential Order):
    - For each block below, implement a skip connection around the core layer and merge the block's input with the block's output along the sequence dimension.
    - Add LayerNormalization and Dropout in appropriate positions for: SingleHeadChunkedAttention Block, MambaBlock, VoxelAttentionLayer, LinformerLayer.
  - Final Adapter: Pass the output of the attention stack to the AdapterBlock to reduce the feature dimension from (BATCH_SIZE, SEQUENCE_LENGTH, EMBEDDING_DIM) to (BATCH_SIZE, SEQUENCE_LENGTH * BASE_MODEL_OUTPUT_PROJECTION_MULTIPLIER).
- Create a composite model from this base model, the Cerebros feed forward block, and a head layer, and run some ablations on the pico-scale Bible dataset.
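A minimal assembly sketch under several stated assumptions: iRoPE, SingleHeadChunkedAttention, and VoxelAttentionLayer come from the existing repo code but the constructor arguments shown for them are guesses; MambaBlock is the still-to-be-implemented layer; LinformerLayer and AdapterBlock are the sketches above; and the per-block merge is shown additively for brevity (swap Add for Concatenate(axis=1) to merge along the sequence dimension as specified):

```python
import tensorflow as tf

def build_base_model():
    # Input tensor must be nested in a list, per the convention described above.
    input_layer = tf.keras.layers.Input(shape=(MAX_SEQ_LENGTH,))
    inputs = [input_layer]

    # Token embedding.
    embedded = tf.keras.layers.Embedding(VOCABULARY_SIZE, EMBEDDING_DIM)(input_layer)

    # Parallel embedding streams: iRoPE plus a direct skip connection.
    rope_stream = iRoPE()(embedded)  # existing repo layer; constructor args assumed
    skip_stream = embedded

    # Merge the two streams feeding the first attention block.
    x = tf.keras.layers.Add()([rope_stream, skip_stream])

    # Attention stack in the specified order. Constructor arguments for the
    # existing repo layers are assumptions and must be aligned with their
    # actual signatures.
    blocks_and_rates = [
        (SingleHeadChunkedAttention(k_proj=K_PROJ), DROPOUT_RATE_CHUNKED_ATTENTION),
        (MambaBlock(state_dim=MAMBA_STATE_DIM, proj_dim=MAMBA_PROJ_DIM),
         DROPOUT_RATE_MAMBA),  # to be implemented
        (VoxelAttentionLayer(grid_size=VOXEL_GRID_SIZE), DROPOUT_RATE_VOXEL),
        (LinformerLayer(EMBEDDING_DIM, LINFORMER_K,
                        dropout_rate=DROPOUT_RATE_LINFORMER), DROPOUT_RATE_LINFORMER),
    ]
    for block, rate in blocks_and_rates:
        block_out = tf.keras.layers.Dropout(rate)(block(x))
        # Skip connection around the core layer; additive merge shown for
        # brevity -- the issue specifies Concatenate(axis=1) along the
        # sequence dimension instead.
        x = tf.keras.layers.LayerNormalization()(
            tf.keras.layers.Add()([x, block_out]))

    # Final adapter collapses the feature dimension.
    output = AdapterBlock()(x)

    return tf.keras.Model(inputs=inputs, outputs=output, name="base_model")
```

Calling build_base_model().summary() would then provide the architecture check called for in the acceptance criteria, before the base_model is handed to the Cerebros NAS to add the feed forward block and head.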
Acceptance Criteria
- The script defines all necessary global constants to allow ablations to be run on each block's parameters independently.
- The script contains working Keras Layer implementations for MambaBlock, LinformerLayer, and AdapterBlock.
- The script successfully builds the base_model Keras Functional Model, adhering to the specified layer order and architectural principles (skip connections, merging, normalization).
- Running the script prints a model.summary() that reflects the correct architecture, with the final output shape being (batch_size, MAX_SEQ_LENGTH, 1).
- The model trains successfully and shows potential for synergistic results with fine-tuning.
Follow-up:
- Create a separate issue to construct and run a hyperparameter optimization run on this model.