# From #288

# The problem:
- One-hot-encoded labels with CategoricalCrossEntropy are more memory-heavy than integer labels with SparseCategoricalCrossEntropy.

# The solution:
Replace CategoricalCrossEntropy with SparseCategoricalCrossEntropy (see the sketches after the task list).

# Tasks:
- [ ] Refactor prepare_data to return a single integer label for each text token.
- [ ] Refactor the generate loop to use logits, not probs.
- [ ] Refactor stage 1-a to use the new label format and SparseCategoricalCrossEntropy.
- [ ] Refactor the Dataset object for Stage 1-b training to feed batches of integer labels, not one-hot-encoded labels.
- [ ] Verify the distributions in the final outputs are the same (e.g. numbers before top_... sampling, penalties, and equivalent results after ...).
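
As context for the label-format change, here is a minimal sketch (assuming a Keras/TensorFlow setup, since the loss class names match `tf.keras.losses`; the vocab size and tensors are made up for illustration). It shows that integer labels with SparseCategoricalCrossentropy give the same loss as one-hot labels with CategoricalCrossentropy, without materializing the one-hot matrix:

```python
import tensorflow as tf

vocab_size = 8                                      # illustrative value
logits = tf.random.normal((4, vocab_size))          # model outputs for 4 tokens
int_labels = tf.constant([1, 5, 2, 7])              # one integer label per token
onehot_labels = tf.one_hot(int_labels, vocab_size)  # memory-heavy equivalent

cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Both losses match (up to float error), but only the sparse variant
# avoids building the (num_tokens, vocab_size) one-hot matrix.
print(cce(onehot_labels, logits).numpy())
print(scce(int_labels, logits).numpy())
```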
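
For the logits-vs-probs and verification items, a hypothetical sketch (not the repo's actual generate loop): `tf.random.categorical` samples directly from unnormalized logits, and applying softmax to the same logits recovers the probabilities the old probs-based loop worked with, so the two paths can be compared before top_... sampling and penalties are applied.

```python
import numpy as np
import tensorflow as tf

vocab_size = 8
next_logits = tf.random.normal((1, vocab_size))  # placeholder last-step logits

# tf.random.categorical expects unnormalized log-probabilities (logits),
# so the generate loop can sample without ever computing probs.
next_id = tf.random.categorical(next_logits, num_samples=1)

# Equivalence check: softmax over the same logits reproduces the
# distribution the old probs-based loop sampled from.
probs = tf.nn.softmax(next_logits, axis=-1)
np.testing.assert_allclose(probs.numpy().sum(axis=-1), 1.0, rtol=1e-5)
```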