Open
Description
To report a problem with TensorBoard itself, please fill out the
remainder of this template.
Environment information (required)
Please run diagnose_tensorboard.py (link below) in the same environment from which you normally run TensorFlow/TensorBoard, and paste the output here:
/JAX/xla/xla/service/cpu/benchmarks/e2e/gemma2/keras$ python diagnose_tensorboard.py
Diagnostics
--- check: autoidentify
INFO: diagnose_tensorboard.py version c6ca9f1d004e2a1bc7c160abc43be229b82cad7e
--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='ip-10-252-30-225', release='6.8.0-1021-aws', version='#23~22.04.1-Ubuntu SMP Tue Dec 10 16:50:46 UTC 2024', machine='x86_64')
INFO: sys.getwindowsversion(): N/A
--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: '/home/../venv/gemma2-keras'
--- check: installed_packages
INFO: installed: tensorboard==2.18.0
INFO: installed: tensorflow==2.18.0
WARNING: no installation among: ['tensorflow-estimator', 'tensorflow-estimator-2.0-preview', 'tf-estimator-nightly']
INFO: installed: tensorboard-data-server==0.7.2
--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.18.0'
--- check: tensorflow_python_version
2025-02-17 17:43:29.606821: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-02-17 17:43:29.616812: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1739814209.629483 7716 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739814209.632905 7716 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-17 17:43:29.644947: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO: tensorflow.__version__: '2.18.0'
INFO: tensorflow.__git_version__: 'v2.18.0-rc2-4-g6550e4bd802'
--- check: tensorboard_data_server_version
INFO: data server binary: '/home/.../venv/gemma2-keras/lib/python3.10/site-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.7.2'
--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/../venv/gemma2-keras/bin/tensorboard\n'
--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]
--- check: readable_fqdn
INFO: socket.getfqdn(): 'ip-10-252-30-225.eu-west-1.compute.internal'
--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=8278350, st_dev=66305, st_nlink=2, st_uid=1007, st_gid=1008, st_size=4096, st_atime=1739813704, st_mtime=1739814201, st_ctime=1739814201)
INFO: mode: 0o40777
--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/home/.../venv/gemma2-keras/lib/python3.10/site-packages']; bad_roots (0): []
--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==2.1.0
astunparse==1.6.3
certifi==2024.12.14
charset-normalizer==3.4.1
etils==1.12.0
filelock==3.16.1
flatbuffers==24.12.23
fsspec==2024.12.0
gast==0.6.0
google-pasta==0.2.0
grpcio==1.69.0
gviz-api==1.10.0
h5py==3.12.1
idna==3.10
importlib_resources==6.5.2
jax==0.4.38
jaxlib==0.4.38
Jinja2==3.1.5
kagglehub==0.3.6
keras==3.8.0
keras-hub==0.18.1
keras-nlp==0.18.1
libclang==18.1.1
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
mdurl==0.1.2
ml-dtypes==0.4.1
mpmath==1.3.0
namex==0.0.8
networkx==3.4.2
numpy==2.0.2
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
opt_einsum==3.4.0
optree==0.13.1
packaging==24.2
pip==22.0.2
protobuf==4.25.6
Pygments==2.19.1
regex==2024.11.6
requests==2.32.3
rich==13.9.4
scipy==1.15.0
setuptools==59.6.0
six==1.17.0
sympy==1.13.1
tensorboard==2.18.0
tensorboard-data-server==0.7.2
tensorboard-plugin-profile==2.19.0
tensorflow==2.18.0
tensorflow-io-gcs-filesystem==0.37.1
tensorflow-text==2.18.1
termcolor==2.5.0
torch==2.5.1
tqdm==4.67.1
triton==3.1.0
typing_extensions==4.12.2
urllib3==2.3.0
Werkzeug==3.1.3
wheel==0.45.1
wrapt==1.17.0
zipp==3.21.0
Next steps
No action items identified. Please copy ALL of the above output,
including the lines containing only backticks, into your GitHub issue
or comment. Be sure to redact any sensitive information.
Issue description
I am running the JAX profiling example from https://docs.jax.dev/en/latest/profiling.html on the CPU:
```python
import jax

jax.profiler.start_trace("/tmp/tensorboard")

# Run the operations to be profiled
key = jax.random.key(0)
x = jax.random.normal(key, (5000, 5000))
y = x @ x
y.block_until_ready()

jax.profiler.stop_trace()
```
However, I see no captured trace for this default example.
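To narrow down whether the profiler wrote nothing at all or TensorBoard simply fails to display what was written, one option is to look for trace files on disk after `jax.profiler.stop_trace()` returns. The sketch below is a hypothetical helper (`find_xplane_files` is not part of JAX or TensorBoard) and assumes the profiler writes its output as `*.xplane.pb` files under `<log_dir>/plugins/profile/<run>/`, the layout the TensorBoard profile plugin reads.

```python
from pathlib import Path


def find_xplane_files(log_dir):
    """Return sorted paths of *.xplane.pb files under <log_dir>/plugins/profile.

    Hypothetical helper. Assumes the layout
    <log_dir>/plugins/profile/<run_name>/<host>.xplane.pb.
    """
    profile_root = Path(log_dir) / "plugins" / "profile"
    if not profile_root.is_dir():
        # No profile directory at all: nothing was written to this log dir.
        return []
    # Match trace files exactly one run-directory level below profile_root.
    return sorted(str(p) for p in profile_root.glob("*/*.xplane.pb"))
```

For example, calling `find_xplane_files("/tmp/tensorboard")` right after `jax.profiler.stop_trace()` and getting an empty list would suggest the trace was never written; a non-empty list would point instead at how TensorBoard is launched (for instance, its `--logdir`) or at the profile plugin.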
