Add support for symbol maps to emsymbolizer #24735

Merged
merged 4 commits into from
Jul 18, 2025

@dschuff dschuff commented Jul 17, 2025

Read symbol information from the symbol map, and function offset info
from the binary, and match them up.
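
The matching step described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the helper names (`parse_symbol_map`, `build_lookup`, `symbolize`) are hypothetical, the symbol map is assumed to contain one `index:name` entry per line, and the function ranges are assumed to have already been read from the binary as `(function index, start offset, size)` tuples.

```python
import bisect


def parse_symbol_map(text):
    """Parse 'index:name' lines into a {function_index: name} dict.

    Assumes one entry per line; only the first ':' separates the index
    from the name, so names containing '::' are preserved.
    """
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        index, _, name = line.partition(':')
        symbols[int(index)] = name
    return symbols


def build_lookup(func_ranges, symbols):
    """Join function ranges with names, sorted by start offset.

    func_ranges is a list of (func_index, start_offset, size) tuples,
    standing in for what would be read from the binary.
    """
    table = sorted(
        (start, start + size, symbols.get(index, f'wasm-function[{index}]'))
        for index, start, size in func_ranges)
    starts = [entry[0] for entry in table]
    return starts, table


def symbolize(address, starts, table):
    """Return the name of the function containing address, or None."""
    pos = bisect.bisect_right(starts, address) - 1
    if pos < 0:
        return None
    _start, end, name = table[pos]
    return name if address < end else None
```

A lookup then amounts to building the table once and calling `symbolize(address, starts, table)` for each address; a function index with no entry in the symbol map falls back to a placeholder name here, which is a choice made for this sketch.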

@dschuff dschuff requested review from kripken and aheejin July 17, 2025 22:28
@@ -10950,6 +10950,26 @@ def check_func_info(filename, address, func):
# The name section will not show bar, as it's inlined into main
check_func_info('test_dwarf.wasm', unreachable_addr, '__original_main')

# 2. Test symbol map

Maybe make this a separate test?

We currently test all emsymbolizer modes that print function names plus line info within a single test function:

emscripten/test/test_other.py

Lines 10867 to 10930 in a524013

def test_emsymbolizer_srcloc(self):
  'Test emsymbolizer use cases that provide src location granularity info'
  def check_dwarf_loc_info(address, funcs, locs):
    out = self.run_process(
        [emsymbolizer, '-s', 'dwarf', 'test_dwarf.wasm', address],
        stdout=PIPE).stdout
    for func in funcs:
      self.assertIn(func, out)
    for loc in locs:
      self.assertIn(loc, out)
  def check_source_map_loc_info(address, loc):
    out = self.run_process(
        [emsymbolizer, '-s', 'sourcemap', 'test_dwarf.wasm', address],
        stdout=PIPE).stdout
    self.assertIn(loc, out)
  # We test two locations within test_dwarf.c:
  # out_to_js(0); // line 6
  # __builtin_trap(); // line 13
  self.run_process([EMCC, test_file('core/test_dwarf.c'),
                    '-g', '-gsource-map', '-O1', '-o', 'test_dwarf.js'])
  # Address of out_to_js(0) within foo(), uninlined
  out_to_js_call_addr = self.get_instr_addr('call\t0', 'test_dwarf.wasm')
  # Address of __builtin_trap() within bar(), inlined into main()
  unreachable_addr = self.get_instr_addr('unreachable', 'test_dwarf.wasm')
  # Function name of out_to_js(0) within foo(), uninlined
  out_to_js_call_func = ['foo']
  # Function names of __builtin_trap() within bar(), inlined into main(). The
  # first one corresponds to the innermost inlined function.
  unreachable_func = ['bar', 'main']
  # Source location of out_to_js(0) within foo(), uninlined
  out_to_js_call_loc = ['test_dwarf.c:6:3']
  # Source locations of __builtin_trap() within bar(), inlined into main().
  # The first one corresponds to the innermost inlined location.
  unreachable_loc = ['test_dwarf.c:13:3', 'test_dwarf.c:18:3']
  # 1. Test DWARF + source map together
  # For DWARF, we check for the full inlined info for both function names and
  # source locations. Source maps provide neither function names nor inlined
  # info. So we only check for the source location of the outermost function.
  check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
                       out_to_js_call_loc)
  check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_loc[0])
  check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)
  check_source_map_loc_info(unreachable_addr, unreachable_loc[0])
  # 2. Test source map only
  # The addresses, function names, and source locations are the same across
  # the builds because they are relative offsets from the code section, so we
  # don't need to recompute them
  self.run_process([EMCC, test_file('core/test_dwarf.c'),
                    '-gsource-map', '-O1', '-o', 'test_dwarf.js'])
  check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_loc[0])
  check_source_map_loc_info(unreachable_addr, unreachable_loc[0])
  # 3. Test DWARF only
  self.run_process([EMCC, test_file('core/test_dwarf.c'),
                    '-g', '-O1', '-o', 'test_dwarf.js'])
  check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
                       out_to_js_call_loc)
  check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)

And the function in this diff contains the emsymbolizer modes that only print function names. So I think it'd be consistent either to keep it this way or to also split test_emsymbolizer_srcloc into three different functions.

Yes, that (function-only vs. src location granularity) was my rationale for keeping this together with the name section tests. The test assertions are basically duplicated (DAMP-style), but it seemed best to keep them together. If you feel strongly, though, we can split both of these tests up.

@dschuff dschuff merged commit cbc71d9 into emscripten-core:main Jul 18, 2025
30 checks passed
cwoffenden pushed a commit to cwoffenden/emscripten that referenced this pull request Jul 19, 2025