Possible to change starting epoch? #17396
-
Hey everyone! A project I've been helping out on, https://github.com/34j/so-vits-svc-fork, manually loads model checkpoints at training start / initialization. Currently it uses a rather hacky way to set the epoch that training continues from:

```python
def on_train_start(self) -> None:
    if not self.tuning:
        # self._temp_epoch is loaded from the model
        self.set_current_epoch(self._temp_epoch)
        total_batch_idx = self.current_epoch * len(self.trainer.train_dataloader)
        self.set_total_batch_idx(total_batch_idx)
        global_step = total_batch_idx * self.optimizers_count
        self.set_global_step(global_step)

def set_current_epoch(self, epoch: int):
    LOG.info(f"Setting current epoch to {epoch}")
    self.trainer.fit_loop.epoch_progress.current.completed = epoch
    assert self.current_epoch == epoch, f"{self.current_epoch} != {epoch}"

def set_global_step(self, global_step: int):
    LOG.info(f"Setting global step to {global_step}")
    self.trainer.fit_loop.epoch_loop.manual_optimization.optim_step_progress.total.completed = (
        global_step
    )
    self.trainer.fit_loop.epoch_loop.automatic_optimization.optim_progress.optimizer.step.total.completed = (
        global_step
    )
    assert self.global_step == global_step, f"{self.global_step} != {global_step}"

def set_total_batch_idx(self, total_batch_idx: int):
    LOG.info(f"Setting total batch idx to {total_batch_idx}")
    self.trainer.fit_loop.epoch_loop.batch_progress.total.ready = total_batch_idx + 1
    self.trainer.fit_loop.epoch_loop.batch_progress.total.completed = total_batch_idx
    assert (
        self.total_batch_idx == total_batch_idx + 1
    ), f"{self.total_batch_idx} != {total_batch_idx + 1}"

@property
def total_batch_idx(self) -> int:
    return self.trainer.fit_loop.epoch_loop.total_batch_idx + 1
```

However, this breaks support for … Is there a way to properly override the current epoch that training "starts" (or, in this case, continues) from?
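For comparison: when a checkpoint was saved by Lightning itself, none of this bookkeeping is needed, since passing `ckpt_path` to `Trainer.fit` restores the epoch, global step, and optimizer state. A minimal sketch (assuming a recent PyTorch Lightning; `model` and the path are placeholders):

```python
import pytorch_lightning as pl

# `model` is the LightningModule being trained; the checkpoint path is a placeholder.
trainer = pl.Trainer(max_epochs=10000)
trainer.fit(model, ckpt_path="checkpoints/last.ckpt")  # loop counters resume automatically
```

Our checkpoints are loaded manually rather than through Lightning, though, which is why the helpers above poke at the loop state directly.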
Replies: 1 comment · 1 reply
-
After looking at the Lightning code, I managed to fix it by also setting `self.trainer.fit_loop.epoch_progress.current.processed` to the specific epoch 😁
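In full, the fixed helper ends up looking something like this — a sketch under the same assumptions as the snippet above (a module-level `LOG` logger, and Lightning's loop progress trackers, which are private API and may move between versions):

```python
def set_current_epoch(self, epoch: int):
    LOG.info(f"Setting current epoch to {epoch}")
    # Keep `processed` in sync with `completed`: a mismatch makes the
    # fit loop treat the epoch as interrupted mid-way on restart.
    self.trainer.fit_loop.epoch_progress.current.processed = epoch
    self.trainer.fit_loop.epoch_progress.current.completed = epoch
    assert self.current_epoch == epoch, f"{self.current_epoch} != {epoch}"
```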