MAISI Mask Autoencoder #2026

cbe135 · 2025-09-04T10:40:34Z


While implementing our own mask autoencoder, we ran into two questions.
Our dataset masks have 1 input channel and 1 output channel.
The first question concerns the different input and output channel settings used to pretrain the provided model weights.
The output channel count of 128 makes sense, since there are 128 possible labels for organs and disease phenomena.
For the input channel count, why is it 7?
Second, to fine-tune the model on our dataset, we were thinking of averaging or otherwise modifying the first and last layers to get the desired input and output channel counts. Are there other recommended approaches?
Thank you.

Answered by guopengf

Can-Zhao · 2025-09-05T18:17:57Z

Collaborator

@guopengf

0 replies

guopengf · 2025-09-06T02:38:53Z

Collaborator

Hi @cbe135, the input channel count of the mask autoencoder is 8 (see tutorials/generation/maisi/configs/config_maisi3d-rflow.json, line 94 in 8b90a16: "in_channels": 8).
This is mainly because we use a binary representation to encode the input mask, which saves memory. Each channel represents one bit, so 8 channels can represent 2**8 = 256 labels (0 to 255); see the following function.

tutorials/generation/maisi/scripts/utils.py, lines 175 to 190 in 8b90a16:

    def binarize_labels(x: Tensor, bits: int = 8) -> Tensor:
        """
        Convert input tensor to binary representation.

        This function takes an input tensor and converts it to a binary representation
        using the specified number of bits.

        Args:
            x (Tensor): Input tensor with shape (B, 1, H, W, D).
            bits (int, optional): Number of bits to use for binary representation. Defaults to 8.

        Returns:
            Tensor: Binary representation of the input tensor with shape (B, bits, H, W, D).
        """
        mask = 2 ** torch.arange(bits).to(x.device, x.dtype)
        return x.unsqueeze(-1).bitwise_and(mask).ne(0).byte().squeeze(1).permute(0, 4, 1, 2, 3)

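For example, label 1 is 00000001 in binary. Note that binarize_labels puts the least significant bit in the first channel (its mask is 1, 2, 4, ..., 128), so the resulting channel vector is [1, 0, 0, 0, 0, 0, 0, 0].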
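To make the bit layout concrete, here is a minimal plain-Python sketch of the same per-label logic. It is not the MAISI implementation: `encode_label` and `decode_label` are hypothetical helper names introduced only for illustration, and they work on a single integer label instead of a tensor. The channel order (least significant bit first) is taken from the mask `2 ** torch.arange(bits)` in `binarize_labels`.

```python
from typing import List


def encode_label(label: int, bits: int = 8) -> List[int]:
    """Return the per-channel bits of one label, least significant bit first."""
    if not 0 <= label < 2 ** bits:
        raise ValueError(f"label {label} does not fit in {bits} bits")
    # Channel i holds bit i, mirroring (x & 2**i) != 0 in binarize_labels.
    return [(label >> i) & 1 for i in range(bits)]


def decode_label(channels: List[int]) -> int:
    """Inverse of encode_label: recombine per-channel bits into the label."""
    return sum(bit << i for i, bit in enumerate(channels))


print(encode_label(1))                  # [1, 0, 0, 0, 0, 0, 0, 0]
print(encode_label(131))                # [1, 1, 0, 0, 0, 0, 0, 1]
print(decode_label(encode_label(131)))  # 131
```

With 8 channels, every label from 0 to 255 round-trips through this encoding, which is why 8 input channels are enough for the whole label set.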

For your use case, are the labels in your dataset covered by the predefined label dict?

0 replies

cbe135 · 2025-09-06T10:05:35Z

Author

Hi @guopengf, thank you.
That makes sense.
Our mask target is hypo tumor, so it is not on the list.
Would it make sense to use one of the dummy labels?
Does it matter which one is used?
Also, is there any specific logic to how the labels in the label dict are ordered?
Thank you.

1 reply

guopengf Sep 9, 2025
Collaborator

The order of labels does not matter, so you can use any dummy label.
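In practice, using a dummy label means rewriting the dataset's own label values to the dummy's integer ID before the mask is binarized. The following is a rough plain-Python sketch of that remapping step under stated assumptions: `EXAMPLE_LABEL_DICT`, its entries, and the ID 200 are placeholders and not the real MAISI values, `remap_labels` is a hypothetical helper, and the mask is shown as a flat list of voxel labels for brevity.

```python
from typing import Dict, List

# Placeholder excerpt: the real names and integer IDs must be read from
# the label dict that ships with MAISI, not from this sketch.
EXAMPLE_LABEL_DICT: Dict[str, int] = {"background": 0, "dummy1": 200}


def remap_labels(voxels: List[int], mapping: Dict[int, int]) -> List[int]:
    """Replace each dataset label with its target ID; fail on unmapped labels."""
    missing = sorted(set(voxels) - set(mapping))
    if missing:
        raise KeyError(f"no target ID for dataset labels {missing}")
    return [mapping[v] for v in voxels]


# Dataset convention assumed here: 0 = background, 1 = tumor.
dataset_to_maisi = {
    0: EXAMPLE_LABEL_DICT["background"],
    1: EXAMPLE_LABEL_DICT["dummy1"],
}

flat_mask = [0, 0, 1, 1, 0]
print(remap_labels(flat_mask, dataset_to_maisi))  # [0, 0, 200, 200, 0]
```

Failing loudly on unmapped labels is a deliberate choice in this sketch: a label that silently keeps its original value could collide with an unrelated predefined category.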

mar-cry · 2025-09-10T08:31:50Z


Hello @guopengf, my dataset has three categories that are not among the 132 predefined categories in the label_dict, so I mapped them to dummy6, dummy7, and dummy8. I converted the masks into the input format expected by the Mask Autoencoder, encoded them, and then decoded them to reconstruct the masks. The reconstructed result differs greatly from the input. Is this normal? Here are illustrative examples of my input and output.

2 replies

guopengf Sep 10, 2025
Collaborator

Have you trained the AE with the dummy labels? If not, this result is expected. Please try fine-tuning the AE on your dataset.

mar-cry Sep 10, 2025

I have not fine-tuned the mask autoencoder on our dataset. I noticed that the input and output channels of the mask autoencoder differ from those of the usual image autoencoder. Is there any difference between the training pipelines of the two autoencoders, such as the training loss?
