Lower validation & train loss with poorer performance

### Bug description

Thanks for this excellent tutorial; I learned a lot from this repo.

---

I followed chapter 5's **03_bonus_pretraining_on_gutenberg** using the full Gutenberg dataset:
```shell
ncdu 1.15.1 ~ Use the arrow keys to navigate, press ? for help                                                                                                                                                                               
--- /opt/repository/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg ----------------------------------------------------------------------------------------------------------------------------------------------------------------
   89.7 GiB [##########] /gutenberg                                                                                                                                                                                                          
   87.8 GiB [######### ] /gutenberg_preprocessed
   15.0 GiB [#         ] /model_checkpoints
   24.0 KiB [          ] /__pycache__
   12.0 KiB [          ]  previous_chapters.py
   12.0 KiB [          ]  pretraining_simple.py
   12.0 KiB [          ]  README.md
    4.0 KiB [          ]  prepare_dataset.py
    4.0 KiB [          ]  tests.py
```

The model performed well for the first 70 thousand steps: the text it appended to **Every effort moves** seemed reasonable and readable.
```shell
(base) @l40s:/opt/repository/LLMs-from-scratch/ch05/03_bonus_pretraining_on_gutenberg/model_checkpoints$ ll
total 15777456
drwxrwxr-x 2 ubuntu ubuntu      4096 Aug  4 02:53 ./
drwxrwxr-x 6 ubuntu ubuntu      4096 Aug  1 12:43 ../
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  2 11:08 model_pg_110263.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434735 Aug  1 15:01 model_pg_11338.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  2 13:27 model_pg_121652.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  2 16:11 model_pg_135104.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  2 18:26 model_pg_146207.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  2 20:45 model_pg_157532.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  3 00:47 model_pg_177488.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  3 03:07 model_pg_188954.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  3 05:24 model_pg_200250.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  3 06:02 model_pg_203300.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  3 08:20 model_pg_214703.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  3 10:40 model_pg_226160.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  3 12:58 model_pg_237489.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  3 15:23 model_pg_249333.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  3 19:23 model_pg_269109.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434735 Aug  1 18:28 model_pg_28314.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  3 22:52 model_pg_286227.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434912 Aug  4 02:53 model_pg_306036.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434735 Aug  1 20:47 model_pg_39673.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434735 Aug  2 00:13 model_pg_56588.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434735 Aug  2 03:40 model_pg_73617.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434735 Aug  2 05:58 model_pg_84892.pth
-rw-rw-r-- 1 ubuntu ubuntu 702434735 Aug  2 08:26 model_pg_97054.pth
```
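Because `ll` sorts names lexicographically, the checkpoints above are not in training order (for example, `model_pg_11338.pth` is listed after `model_pg_110263.pth`). A minimal sketch for ordering them by step, assuming every file follows the `model_pg_<step>.pth` pattern seen in the listing; the helper name `sort_checkpoints` is hypothetical and not part of the repo:

```python
import re

# Assumed filename pattern, taken from the listing above: model_pg_<step>.pth
CKPT_RE = re.compile(r"model_pg_(\d+)\.pth$")


def sort_checkpoints(names):
    """Return (step, name) pairs sorted by training step, skipping other files."""
    pairs = []
    for name in names:
        match = CKPT_RE.search(name)
        if match:
            pairs.append((int(match.group(1)), name))
    return sorted(pairs)


# A few names copied from the directory listing
names = [
    "model_pg_110263.pth",
    "model_pg_11338.pth",
    "model_pg_28314.pth",
    "model_pg_306036.pth",
    "model_pg_97054.pth",
]

for step, name in sort_checkpoints(names):
    print(step, name)
```

Sorting this way makes it easier to pick the checkpoints on either side of the step where the generated text starts to degrade.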

But when I logged in just now to the server, which has a single L40S GPU, the model's loss was much lower while the generated text was weird:
```shell
Ep 1 (Step 140900): Train loss 3.190, Val loss 3.301                                                                                                                                                                                                                 
Ep 1 (Step 141000): Train loss 2.814, Val loss 3.307                                                                                                                                                                                                                 
Every effort moves you, and I will not be able to help you. You are not going to be troubled with the idea of a new life. You are not going to be troubled with the idea of a new life. You are                                                                      
Ep 1 (Step 141100): Train loss 2.836, Val loss 3.298                                                                                                                                                                                                                 
Ep 1 (Step 141200): Train loss 3.174, Val loss 3.303                                                                                                                                                                                                                 
Ep 1 (Step 141300): Train loss 2.953, Val loss 3.305                                                                                                                                                                                                                 
Ep 1 (Step 141400): Train loss 3.290, Val loss 3.294                                                                                                                                                                                                                 
Ep 1 (Step 141500): Train loss 2.784, Val loss 3.306                                                                                                                                                                                                                 
Ep 1 (Step 141600): Train loss 2.707, Val loss 3.316                                                                                                                                                                                                                 
Ep 1 (Step 141700): Train loss 3.126, Val loss 3.293                                                                                                                                                                                                                 
Ep 1 (Step 141800): Train loss 2.819, Val loss 3.317                                                                                                                                                                                                                 
Ep 1 (Step 141900): Train loss 2.922, Val loss 3.302                                                                                                                                                                                                                 
Ep 1 (Step 142000): Train loss 2.770, Val loss 3.311 
....
Ep 1 (Step 303600): Train loss 1.942, Val loss 1.533
Ep 1 (Step 303700): Train loss 1.991, Val loss 1.545
Ep 1 (Step 303800): Train loss 2.034, Val loss 1.540
Ep 1 (Step 303900): Train loss 1.960, Val loss 1.539
Ep 1 (Step 304000): Train loss 1.966, Val loss 1.539
Every effort moves you 髫       1 髫    1 髫    1 髫    1 髫    1 髫    1 髫    1 髫    1 �
Ep 1 (Step 304100): Train loss 1.872, Val loss 1.533
Ep 1 (Step 304200): Train loss 2.053, Val loss 1.535
Ep 1 (Step 304300): Train loss 1.974, Val loss 1.536
Ep 1 (Step 304400): Train loss 1.944, Val loss 1.544
Ep 1 (Step 304500): Train loss 1.923, Val loss 1.539
Ep 1 (Step 304600): Train loss 1.891, Val loss 1.551
Ep 1 (Step 304700): Train loss 1.998, Val loss 1.545
Ep 1 (Step 304800): Train loss 1.892, Val loss 1.544
Ep 1 (Step 304900): Train loss 1.888, Val loss 1.543
Ep 1 (Step 305000): Train loss 2.020, Val loss 1.537
Every effort moves you  1 susceptible   1 susceptible   1 susceptible   1 susceptible   1 susceptible   1 susceptible   1 susceptible
Ep 1 (Step 305100): Train loss 1.906, Val loss 1.537
Ep 1 (Step 305200): Train loss 1.842, Val loss 1.542
Ep 1 (Step 305300): Train loss 2.080, Val loss 1.539
Ep 1 (Step 305400): Train loss 1.993, Val loss 1.536
Ep 1 (Step 305500): Train loss 2.016, Val loss 1.537
Ep 1 (Step 305600): Train loss 2.001, Val loss 1.533
Ep 1 (Step 305700): Train loss 1.844, Val loss 1.536
Ep 1 (Step 305800): Train loss 1.988, Val loss 1.533
Ep 1 (Step 305900): Train loss 1.590, Val loss 1.536
Ep 1 (Step 306000): Train loss 1.879, Val loss 1.536
Every effort moves you héré     1 dépouillé     1 dépouillé     1 dépouillé     1 dépouillé     1 dépouillé
```
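To compare the two parts of the log more systematically, the loss lines and the sample lines can be parsed separately. The following is only a sketch, assuming the log format shown above; `parse_log` and `repeated_ngram_fraction` are hypothetical helpers, and the repetition score is a rough heuristic rather than a standard metric:

```python
import re
from collections import Counter

# Assumed log format, e.g. "Ep 1 (Step 141000): Train loss 2.814, Val loss 3.307"
LOG_RE = re.compile(
    r"Ep (\d+) \(Step (\d+)\): Train loss ([\d.]+), Val loss ([\d.]+)"
)


def parse_log(lines):
    """Extract (step, train_loss, val_loss) tuples from matching log lines."""
    records = []
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            records.append(
                (int(match.group(2)), float(match.group(3)), float(match.group(4)))
            )
    return records


def repeated_ngram_fraction(text, n=3):
    """Fraction of whitespace-token n-grams that occur more than once."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(ngrams)


log_lines = [
    "Ep 1 (Step 141000): Train loss 2.814, Val loss 3.307",
    "Ep 1 (Step 305000): Train loss 2.020, Val loss 1.537",
]
for step, train_loss, val_loss in parse_log(log_lines):
    # A negative gap means the validation loss is below the training loss
    print(step, round(val_loss - train_loss, 3))

sample = "Every effort moves you  1 susceptible   1 susceptible   1 susceptible"
print(repeated_ngram_fraction(sample, n=2))
```

In the late part of the log the validation loss sits below the training loss while the samples collapse into a repeated pattern, so tracking both numbers per checkpoint may help narrow down where the behavior changes.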

### What operating system are you using?

Linux

### Where do you run your code?

Other cloud environment (AWS, Azure, GCP)

### Environment

```
[OK] Your Python version is 3.11.5
2024-08-04 03:26:08.627305: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-04 03:26:09.063873: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-04 03:26:09.194486: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-04 03:26:10.025488: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-04 03:26:13.318814: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/home/ubuntu/anaconda3/lib/python3.11/site-packages/pandas/core/arrays/masked.py:60: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.5' currently installed).
  from pandas.core import (
[OK] torch 2.4.0+cu121
[OK] jupyterlab 4.2.4
[OK] tiktoken 0.7.0
[OK] matplotlib 3.7.2
[OK] tensorflow 2.17.0
[OK] tqdm 4.66.4
[OK] numpy 1.26.4
[OK] pandas 2.2.2
[OK] psutil 6.0.0
```

