Add support for symbol maps to emsymbolizer #24735

dschuff · 2025-07-17T22:28:30Z

Read symbol information from the symbol map, and function offset info
from the binary, and match them up.
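
The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes the symbol map is a text file of `index:name` lines, that function indices count imported functions first, and that the function offsets have already been read from the binary as `(start_offset, size)` pairs. The names `parse_symbol_map`, `symbolize`, `func_ranges`, and `num_imports` are hypothetical.

```python
import bisect


def parse_symbol_map(text):
    """Parse 'index:name' lines into a dict {function_index: name}.

    Assumption: one entry per line, split on the first ':'.
    """
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        index, _, name = line.partition(':')
        symbols[int(index)] = name
    return symbols


def symbolize(address, func_ranges, num_imports, symbols):
    """Return the name of the function containing `address`, or None.

    func_ranges: list of (start_offset, size) for the defined functions,
    sorted by start_offset and in code-section order (hypothetical input;
    a real tool would read these from the wasm binary).
    num_imports: number of imported functions, which occupy the first
    function indices and have no body in the code section.
    """
    starts = [start for start, _ in func_ranges]
    # Index of the last function whose start is <= address.
    pos = bisect.bisect_right(starts, address) - 1
    if pos < 0:
        return None
    start, size = func_ranges[pos]
    if address >= start + size:
        return None
    return symbols.get(num_imports + pos)


# Made-up data for illustration only.
symbols = parse_symbol_map("0:out_to_js\n1:foo\n2:main\n")
func_ranges = [(0x10, 0x08), (0x18, 0x20)]
name = symbolize(0x1a, func_ranges, num_imports=1, symbols=symbols)
```

With the made-up data above, address `0x1a` falls inside the second defined function, so `name` is `'main'`. An address outside every range yields `None`.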

kripken · 2025-07-18T03:41:00Z

test/test_other.py

@@ -10950,6 +10950,26 @@ def check_func_info(filename, address, func):
    # The name section will not show bar, as it's inlined into main
    check_func_info('test_dwarf.wasm', unreachable_addr, '__original_main')

+    # 2. Test symbol map


Maybe make this a separate test?

aheejin · 2025-07-18

We currently have all emsymbolizer modes that print functions and line info within a single test function:

emscripten/test/test_other.py

Lines 10867 to 10930 in a524013

  def test_emsymbolizer_srcloc(self):
    'Test emsymbolizer use cases that provide src location granularity info'
    def check_dwarf_loc_info(address, funcs, locs):
      out = self.run_process(
          [emsymbolizer, '-s', 'dwarf', 'test_dwarf.wasm', address],
          stdout=PIPE).stdout
      for func in funcs:
        self.assertIn(func, out)
      for loc in locs:
        self.assertIn(loc, out)

    def check_source_map_loc_info(address, loc):
      out = self.run_process(
          [emsymbolizer, '-s', 'sourcemap', 'test_dwarf.wasm', address],
          stdout=PIPE).stdout
      self.assertIn(loc, out)

    # We test two locations within test_dwarf.c:
    # out_to_js(0); // line 6
    # __builtin_trap(); // line 13
    self.run_process([EMCC, test_file('core/test_dwarf.c'),
                      '-g', '-gsource-map', '-O1', '-o', 'test_dwarf.js'])
    # Address of out_to_js(0) within foo(), uninlined
    out_to_js_call_addr = self.get_instr_addr('call\t0', 'test_dwarf.wasm')
    # Address of __builtin_trap() within bar(), inlined into main()
    unreachable_addr = self.get_instr_addr('unreachable', 'test_dwarf.wasm')

    # Function name of out_to_js(0) within foo(), uninlined
    out_to_js_call_func = ['foo']
    # Function names of __builtin_trap() within bar(), inlined into main(). The
    # first one corresponds to the innermost inlined function.
    unreachable_func = ['bar', 'main']

    # Source location of out_to_js(0) within foo(), uninlined
    out_to_js_call_loc = ['test_dwarf.c:6:3']
    # Source locations of __builtin_trap() within bar(), inlined into main().
    # The first one corresponds to the innermost inlined location.
    unreachable_loc = ['test_dwarf.c:13:3', 'test_dwarf.c:18:3']

    # 1. Test DWARF + source map together
    # For DWARF, we check for the full inlined info for both function names and
    # source locations. Source maps provide neither function names nor inlined
    # info. So we only check for the source location of the outermost function.
    check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
                         out_to_js_call_loc)
    check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_loc[0])
    check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)
    check_source_map_loc_info(unreachable_addr, unreachable_loc[0])

    # 2. Test source map only
    # The addresses, function names, and source locations are the same across
    # the builds because they are relative offsets from the code section, so we
    # don't need to recompute them
    self.run_process([EMCC, test_file('core/test_dwarf.c'),
                      '-gsource-map', '-O1', '-o', 'test_dwarf.js'])
    check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_loc[0])
    check_source_map_loc_info(unreachable_addr, unreachable_loc[0])

    # 3. Test DWARF only
    self.run_process([EMCC, test_file('core/test_dwarf.c'),
                      '-g', '-O1', '-o', 'test_dwarf.js'])
    check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
                         out_to_js_call_loc)
    check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)

And this function contains emsymbolizer modes that only print functions. So I think it'd be consistent to keep it this way or also split test_emsymbolizer_srcloc into three different functions.

dschuff · 2025-07-18

Yes, that (function-only vs. src location granularity) was my rationale for keeping this together with the name section tests. The test assertions are basically duplicated (DAMP-style), but it seemed best to keep them together. If you feel strongly, though, we can split both of these tests up.

Add support for symbol maps to emsymbolizer

3428db9

dschuff requested review from kripken and aheejin July 17, 2025 22:28

aheejin approved these changes Jul 18, 2025

kripken approved these changes Jul 18, 2025

dschuff added 3 commits July 18, 2025 17:34

remove unneeded debug print, add debug print behind DEBUG flag, fix comment

feb893c

fix strings, ruff check

8ac6232

Merge branch 'main' into symbolmap

53c0107

dschuff merged commit cbc71d9 into emscripten-core:main Jul 18, 2025
30 checks passed

cwoffenden pushed a commit to cwoffenden/emscripten that referenced this pull request Jul 19, 2025

Add support for symbol maps to emsymbolizer (emscripten-core#24735)

6803545
