Mini-batch loading doesn't prevent memory allocation errors in DOMINANT #118

@joshred83

Describe the bug
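Calling `DOMINANT.fit` with `batch_size=1024` on the Elliptic Bitcoin dataset fails with a memory allocation error. `process_graph` builds a dense adjacency matrix for the full graph before any mini-batching happens, so setting a batch size does not prevent the failure.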

To Reproduce
Steps to reproduce the behavior:

```python
from pygod.detector import DOMINANT
from torch_geometric.datasets import EllipticBitcoinDataset

dataset = EllipticBitcoinDataset(root="data/elliptic")
data = dataset[0]
model = DOMINANT(batch_size=1024, epochs=2)
model.fit(data)
```

Expected behavior
Reading the code suggests that the model should operate on mini-batches produced by DeepDetector's NeighborLoader routine. Instead, the adjacency matrix is computed from the full graph (`to_dense_adj(data.edge_index)` in `process_graph`), which results in memory errors:

```
  File "/home/red/dl-graph/simplest.py", line 12, in <module>
    model.fit(data)
  File "/home/red/miniforge3/envs/dl-graph-gpu/lib/python3.10/site-packages/pygod/detector/base.py", line 431, in fit
    self.process_graph(data)
  File "/home/red/miniforge3/envs/dl-graph-gpu/lib/python3.10/site-packages/pygod/detector/dominant.py", line 139, in process_graph
    DOMINANTBase.process_graph(data)
  File "/home/red/miniforge3/envs/dl-graph-gpu/lib/python3.10/site-packages/pygod/nn/dominant.py", line 132, in process_graph
    data.s = to_dense_adj(data.edge_index)[0]
  File "/home/red/miniforge3/envs/dl-graph-gpu/lib/python3.10/site-packages/torch_geometric/utils/_to_dense_adj.py", line 97, in to_dense_adj
    adj = scatter(edge_attr, idx, dim=0, dim_size=flattened_size, reduce='sum')
  File "/home/red/miniforge3/envs/dl-graph-gpu/lib/python3.10/site-packages/torch_geometric/utils/_scatter.py", line 75, in scatter
    return src.new_zeros(size).scatter_add_(dim, index, src)
RuntimeError: [enforce fail at alloc_cpu.cpp:118] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 166087221444 bytes. Error code 12 (Cannot allocate memory)
```
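For what it's worth, the allocation size in the error is exactly what a dense N x N adjacency matrix over the whole graph would need, assuming 4-byte (float32) entries and assuming the Elliptic graph has 203,769 nodes (that count comes from the dataset, not from the traceback). A quick sanity check, as a sketch:

```python
# Assumptions: 203,769 nodes in the Elliptic Bitcoin graph, float32 entries.
NUM_NODES = 203_769
BYTES_PER_FLOAT32 = 4


def dense_adj_bytes(num_nodes: int, bytes_per_entry: int = BYTES_PER_FLOAT32) -> int:
    """Memory needed for a dense num_nodes x num_nodes adjacency matrix."""
    return num_nodes * num_nodes * bytes_per_entry


full_graph = dense_adj_bytes(NUM_NODES)
print(full_graph)                      # 166087221444, the number in the error
print(f"{full_graph / 2**30:.1f} GiB")  # 154.7 GiB
```

By the same arithmetic, a 1,024-node block would need only 4 MiB. A NeighborLoader batch also contains sampled neighbors, so real batches would be larger than that, but still far from the full-graph figure.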

I'm not sure whether this is a limitation of the algorithm or a bug. If it is a limitation, the documentation and error messages don't explain what to expect or how to avoid it.
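If this does turn out to be a limitation, a pre-flight check along these lines would make the failure easier to understand. This is only a sketch: `check_dense_adj_fits` and its `available_bytes` argument are hypothetical names, not part of PyGOD or PyTorch Geometric, and it assumes 4 bytes per matrix entry.

```python
def check_dense_adj_fits(num_nodes: int, available_bytes: int,
                         bytes_per_entry: int = 4) -> int:
    """Hypothetical guard: fail early, with a readable message, if a dense
    num_nodes x num_nodes adjacency matrix would not fit in memory.

    Returns the number of bytes the matrix would require.
    """
    required = num_nodes * num_nodes * bytes_per_entry
    if required > available_bytes:
        raise MemoryError(
            f"A dense adjacency matrix for {num_nodes} nodes needs "
            f"{required / 2**30:.1f} GiB, but only "
            f"{available_bytes / 2**30:.1f} GiB is available. "
            "The matrix is built for the full graph, so batch_size "
            "does not reduce this requirement."
        )
    return required
```

Calling it with the Elliptic node count and, say, 32 GiB of RAM (`check_dense_adj_fits(203_769, 32 * 2**30)`) would raise before any allocation is attempted, rather than failing deep inside `scatter`.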

Here's some system information:
## System Information

- **PyTorch version:**  
  `2.6.0+cu124`

- **PyTorch Geometric version:**  
  `2.6.1`

- **PyGOD version:**  
  `1.1.0`

- **Python version:**  
  `Python 3.10.17`

- **OS:**  
  `Linux The-Tarrasque 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux`

- **CUDA version:**  
  `12.4`

  ```
  nvcc: NVIDIA (R) Cuda compiler driver
  Copyright (c) 2005-2024 NVIDIA Corporation
  Built on Thu_Mar_28_02:18:24_PDT_2024
  Cuda compilation tools, release 12.4, V12.4.131
  Build cuda_12.4.r12.4/compiler.34097967_0
  ```
