ruby: add DTV-based TLS access for ruby_current_ec#1226

Merged
christos68k merged 9 commits intoopen-telemetry:mainfrom
Shopify:ruby-dtv-ec-lookup-upstream
Apr 7, 2026

@dalehamel
Contributor

@dalehamel dalehamel commented Mar 4, 2026

What

Add DTV-based TLS access for finding ruby_current_ec when TLSDESC relocations are unavailable, and a shared dtv_read() eBPF helper for traversing the Dynamic Thread Vector.

Partially addresses #883, at least for Ruby.

Why

On x86_64 with the default TLS dialect, libruby.so uses R_X86_64_DTPMOD64 relocations instead of TLSDESC for thread-local storage. The existing Ruby profiler only supports TLSDESC-based TLS access, so it cannot find ruby_current_ec on these systems and falls back to the ractor path (which is slower and less reliable). This is also needed for systems where TLSDESC may not be available.

How

  • Adds a shared dtv_read() helper in support/ebpf/tsd.h (alongside the existing tsd_read()) that traverses the DTV using the DTVInfo produced by the libc introspection from #929 (Extract DTV info from __tls_get_addr, add to LibcInfo). This helper is available to all interpreters, not just Ruby.
  • Generalizes VisitTLSRelocations into VisitRelocations with a pluggable relocation type filter in libpf/pfelf/file.go, enabling lookup of DTPMOD64 relocations to find the TLS module ID offset for libruby.so.
  • Adds DTVInfo, current_ec_tls_offset, and tls_module_id fields to RubyProcInfo so the eBPF unwinder can use DTV traversal.
  • Implements UpdateLibcInfo for Ruby to receive DTVInfo when the libc package provides it (may arrive from a separate DSO like ld-linux.so).
  • Adds a DTV fallback path in ruby_tracer.ebpf.c between the existing TLSDESC path and the ractor fallback.
  • Adds metricID_UnwindErrBadDTVRead for observability of DTV read failures.
  • Includes an amd64 coredump test (an aarch64 test is not possible because the linker relaxes GD TLS to TLSDESC on aarch64).

NOTE: This depends on coredump artifacts that need to be added to the CI modulestore. They've been uploaded to a shared drive that @florianl has access to. Please re-run CI after uploading these artifacts.

@dalehamel
Contributor Author

Flagging that #1229 also uses DTV access. We need to verify that these two PRs don't conflict and are aligned in their approach to DTV access.

Member

@florianl florianl left a comment


Just minor style comments. Maybe we could add a DTV check helper function and use it in Ruby's Loader() and VisitTLSRelocations().

Comment thread interpreter/ruby/ruby.go Outdated
Comment thread support/ebpf/tsd.h Outdated
Comment thread libpf/pfelf/file.go Outdated
@dalehamel dalehamel force-pushed the ruby-dtv-ec-lookup-upstream branch from 18d5d03 to 1b19af1 March 9, 2026 14:02
@dalehamel
Contributor Author

Just minor style comments.

@florianl thanks for the first pass. I believe I've addressed your nits, but I need clarification on:

Maybe we could add a DTV check helper function and use it in Ruby's Loader() and VisitTLSRelocations().

Did you mean something along these lines?

diff --git a/interpreter/ruby/ruby.go b/interpreter/ruby/ruby.go
index 2f3eeddf..36c51210 100644
--- a/interpreter/ruby/ruby.go
+++ b/interpreter/ruby/ruby.go
@@ -4,7 +4,6 @@
 package ruby // import "go.opentelemetry.io/ebpf-profiler/interpreter/ruby"
 
 import (
-	"debug/elf"
 	"encoding/binary"
 	"errors"
 	"fmt"
@@ -1405,30 +1404,27 @@ func Loader(ebpf interpreter.EbpfHandler, info *interpreter.LoaderInfo) (interpr
 	// ruby_current_ec TLS symbol.
 	// We could potentially add a fallback for this in the future, but for now
 	// only unstripped ruby is supported. Many distro supplied rubies are stripped.
-	if err = ef.VisitTLSRelocations(func(r pfelf.ElfReloc, symName string) bool {
-		if symName == rubyCurrentEcTlsSymbol ||
-			libpf.SymbolValue(r.Addend) == currentEcSymbolAddress {
-			currentEcTpBaseTlsOffset = libpf.Address(r.Off)
-			return false
+	//
+	// We scan relocations in a single pass, looking for both TLSDESC (preferred)
+	// and DTPMOD64 (DTV fallback when TLSDESC is unavailable).
+	var tlsModuleIdOffset libpf.Address
+	if err = ef.VisitRelocations(func(r pfelf.ElfReloc, symName string) bool {
+		switch {
+		case ef.IsTLSDesc(r):
+			if symName == rubyCurrentEcTlsSymbol ||
+				libpf.SymbolValue(r.Addend) == currentEcSymbolAddress {
+				currentEcTpBaseTlsOffset = libpf.Address(r.Off)
+				return false
+			}
+		case ef.IsDTPMOD64(r):
+			log.Debugf("Found DTPMOD64 relocation at offset %x", r.Off)
+			tlsModuleIdOffset = libpf.Address(r.Off)
 		}
 		return true
+	}, func(r pfelf.ElfReloc) bool {
+		return ef.IsTLSDesc(r) || ef.IsDTPMOD64(r)
 	}); err != nil {
-		log.Warnf("failed to locate TLS descriptor: %v", err)
-	}
-
-	// Look for DTPMOD64 relocation to find the TLS module ID offset.
-	// This is used for DTV-based TLS access when TLSDESC is unavailable.
-	var tlsModuleIdOffset libpf.Address
-	if err = ef.VisitRelocations(func(r pfelf.ElfReloc, _ string) bool {
-		log.Debugf("Found DTPMOD64 relocation at offset %x", r.Off)
-		tlsModuleIdOffset = libpf.Address(r.Off)
-		return false
-	}, func(rela pfelf.ElfReloc) bool {
-		ty := rela.Info & 0xffff
-		return (ef.Machine == elf.EM_AARCH64 && elf.R_AARCH64(ty) == elf.R_AARCH64_TLS_DTPMOD64) ||
-			(ef.Machine == elf.EM_X86_64 && elf.R_X86_64(ty) == elf.R_X86_64_DTPMOD64)
-	}); err != nil {
-		log.Warnf("failed to find DTPMOD64 relocation: %v", err)
+		log.Warnf("failed to locate TLS relocations: %v", err)
 	}
 
 	log.Debugf("Discovered EC tls tpbase offset %x, dtpmod offset %x, fallback ctx %x, interp ranges: %v, global symbols: %x",
diff --git a/libpf/pfelf/file.go b/libpf/pfelf/file.go
index 27581c09..ed3f1ce9 100644
--- a/libpf/pfelf/file.go
+++ b/libpf/pfelf/file.go
@@ -635,16 +635,39 @@ func (f *File) DebuglinkFileName(elfFilePath string, elfOpener ELFOpener) string
 
 type ElfReloc *elf.Rela64
 
+// IsTLSDesc returns true if the relocation is a TLSDESC relocation for the
+// machine type of this ELF file.
+func (f *File) IsTLSDesc(rela ElfReloc) bool {
+	ty := rela.Info & 0xffff
+	switch f.Machine {
+	case elf.EM_AARCH64:
+		return elf.R_AARCH64(ty) == elf.R_AARCH64_TLSDESC
+	case elf.EM_X86_64:
+		return elf.R_X86_64(ty) == elf.R_X86_64_TLSDESC
+	default:
+		return false
+	}
+}
+
+// IsDTPMOD64 returns true if the relocation is a DTPMOD64 relocation for the
+// machine type of this ELF file.
+func (f *File) IsDTPMOD64(rela ElfReloc) bool {
+	ty := rela.Info & 0xffff
+	switch f.Machine {
+	case elf.EM_AARCH64:
+		return elf.R_AARCH64(ty) == elf.R_AARCH64_TLS_DTPMOD64
+	case elf.EM_X86_64:
+		return elf.R_X86_64(ty) == elf.R_X86_64_DTPMOD64
+	default:
+		return false
+	}
+}
+
 // VisitTLSDescriptors visits all TLS relocations and provides the relocation
 // for the TLS symbol, as well as a best-effort string for the symbol's name
 // it continues until the visitor returns false
 func (f *File) VisitTLSRelocations(visitor func(ElfReloc, string) bool) error {
-	checkFunc := func(rela ElfReloc) bool {
-		ty := rela.Info & 0xffff
-		return (f.Machine == elf.EM_AARCH64 && elf.R_AARCH64(ty) == elf.R_AARCH64_TLSDESC) ||
-			(f.Machine == elf.EM_X86_64 && elf.R_X86_64(ty) == elf.R_X86_64_TLSDESC)
-	}
-	return f.VisitRelocations(visitor, checkFunc)
+	return f.VisitRelocations(visitor, f.IsTLSDesc)
 }

-- 
2.52.0

Here, IsDTPMOD64 is the sort of 'can we fall back to DTV' helper I'm envisioning; we still prefer TLSDESC and fall back accordingly when visiting the relocations.

If not, could you clarify what kind of DTV helper you meant?

@florianl
Member

Did you mean something along these lines?

Yeah, I had something like this in mind. But it's also fine with me without it.

@dalehamel
Contributor Author

Did you mean something along these lines?

Yeah, I had something like this in mind. But it's also fine with me without it.

For now I think we should leave it out; we can revisit refactoring this code later. At this point it feels premature to me.

Contributor

@fabled fabled left a comment


A small comment on the filter function: make it just a const instead of a function argument. Also, would it be possible to get a musl-based coredump test?

Comment thread interpreter/ruby/ruby.go Outdated
@dalehamel dalehamel force-pushed the ruby-dtv-ec-lookup-upstream branch from 84eba46 to 1a9790f March 19, 2026 19:24
@dalehamel
Contributor Author

dalehamel commented Mar 19, 2026

A small comment on the filter function: make it just a const instead of a function argument.

Good idea, that's a lot cleaner. There's no need for debug/elf in ruby.go anymore, which always felt a bit wrong to me. I set it up to work with a bitmask as you suggested; take a look and let me know if that is what you had in mind.

Also, would it be possible to get a musl-based coredump test?

Done. This ended up being more of an adventure than I expected: I was taking the coredumps inside an Alpine Docker container, but if you try to import them on a glibc-based host, it fails in strange ways. It took me a while to realize this was because of CGO, i.e. using a gcore built for the wrong libc. As an aside, we should probably update the coredump tooling to bail out if the libcs don't match when running coredump new. The fix was to generate the coredump artifacts from inside the Docker container.
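The suggested guard against mismatched libcs could work roughly as follows. This is a hypothetical sketch, not existing coredump tooling: libcFromInterp and interpOf are illustrative names, it assumes both the host tool and the target executable are dynamically linked, and it infers the libc flavor from the ELF PT_INTERP path (ld-musl-* for musl, ld-linux-* for glibc).

```go
package main

import (
	"debug/elf"
	"fmt"
	"io"
	"os"
	"strings"
)

// libcFromInterp guesses the libc flavor from a PT_INTERP path (heuristic).
func libcFromInterp(interp string) string {
	switch {
	case strings.Contains(interp, "ld-musl"):
		return "musl"
	case strings.Contains(interp, "ld-linux"):
		return "glibc"
	default:
		return "unknown"
	}
}

// interpOf returns the program interpreter recorded in an ELF executable.
func interpOf(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	for _, p := range f.Progs {
		if p.Type != elf.PT_INTERP {
			continue
		}
		b, err := io.ReadAll(p.Open())
		if err != nil {
			return "", err
		}
		return strings.TrimRight(string(b), "\x00"), nil
	}
	return "", fmt.Errorf("%s: no PT_INTERP (statically linked?)", path)
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: libccheck <host-tool> <target-executable>")
		os.Exit(2)
	}
	var flavors [2]string
	for i, path := range os.Args[1:] {
		interp, err := interpOf(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		flavors[i] = libcFromInterp(interp)
	}
	if flavors[0] != flavors[1] {
		fmt.Fprintf(os.Stderr, "libc mismatch: host tool uses %s, target uses %s\n", flavors[0], flavors[1])
		os.Exit(1)
	}
	fmt.Println("libc matches:", flavors[0])
}
```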

As expected, though, the PR is now failing CI. @florianl can you please import ruby-3.4.7-musl-dtv-loop-amd64.moduledata.tar? That should make CI pass. (FWIW, I also verified that the tests themselves run fine despite the different libc; only the importing is problematic.)

FWIW, I also ran validation tests where I explicitly forced the single_main_ractor fallback for the EC to fail by appending _DISABLED to the symbol name. On main, both coredump tests fail, but on this branch they succeed because the DTV path works. This verifies that the DTV path legitimately fixes this case in the absence of that fallback. We could theoretically remove the fallback now; I'm not sure what other cases it would still be useful in, but I'll leave that out of scope for this PR.

Hopefully, when we upload the new blob and rerun, everything passes 🤞

@florianl florianl requested a review from fabled March 20, 2026 10:13
Contributor

@fabled fabled left a comment


Thanks! One nit, but approving at this point.

Comment thread libpf/pfelf/file.go Outdated
Comment on lines +683 to +685
checkFunc := func(rela ElfReloc) bool {
return f.classifyReloc(rela)&relTypes != 0
}
Contributor


nit: Perhaps the switch f.Machine could live here, assigning checkFunc (maybe call it filterFunc instead?) to a previously defined "plain" architecture-specific function. That moves the switch out of the inner loop and simplifies the code a little.

Copy link
Copy Markdown
Contributor Author


Thanks @fabled, I think I've addressed this; let me know if it isn't what you had in mind.

@dalehamel dalehamel force-pushed the ruby-dtv-ec-lookup-upstream branch 3 times, most recently from dece87c to d1f7408 March 24, 2026 13:13
Comment thread support/ebpf/tsd.h Outdated
Comment thread tools/coredump/testdata/amd64/ruby-3.4.7-dtv-loop.json
@dalehamel
Contributor Author

@fabled @florianl I have put this back in draft because, per @nsavoire's discoveries above, there are two issues in the code this depends on that need to be addressed first. I have opened a PR for each one:

My profuse apologies for both oversights, which made it into main via #929, which I submitted.

Also, to address the coredump tests not failing on main, I want to resubmit them, but first run:

objcopy --strip-symbol=ruby_single_main_ractor libruby.so.3.4

This makes the coredump artifact itself unable to take the single_main_ractor fallback, rather than requiring the symbol name to be fudged in Go as I had done above.

I'll prepare new coredump tests and, if/when the above two PRs land, rebase this branch and re-open it for review. It should not be merged in its current state.

@dalehamel dalehamel force-pushed the ruby-dtv-ec-lookup-upstream branch 2 times, most recently from a8756d6 to b6ef3e3 April 1, 2026 19:23
@dalehamel
Contributor Author

dalehamel commented Apr 1, 2026

@florianl I've added two replacement coredumps; please add these to the CI module data (the previously added DTV ones can be removed as unused). These two should properly fail on main without any modifications to the source code, as I stripped the fallback symbol out of libruby.so before taking the coredumps.

I've rebased onto the other fixes mentioned above, which have since merged. This PR should be green and ready to go again once the module data is uploaded.

@dalehamel dalehamel marked this pull request as ready for review April 1, 2026 19:25
@florianl
Member

florianl commented Apr 2, 2026

@fabled can you merge this PR?

@dalehamel
Contributor Author

Thanks for uploading. To prove that the coredump tests fail on main, I checked them out onto a branch based on main and ran CI:

https://github.com/open-telemetry/opentelemetry-ebpf-profiler/actions/runs/23900868312/job/69696908798?pr=1318

As expected, they both fail:

--- FAIL: TestCoreDumps (9.77s)
    --- FAIL: TestCoreDumps/testdata/amd64/ruby-3.4.7-dtv-loop.json (1.31s)
        coredump_test.go:40: 
            	Error Trace:	/home/runner/work/opentelemetry-ebpf-profiler/opentelemetry-ebpf-profiler/tools/coredump/coredump_test.go:40
            	Error:      	Not equal: 
            	            	expected: []main.ThreadInfo{main.ThreadInfo{LWP:0x26902, Frames:[]string{"libruby.so.3.4.7+0x312edf", "Object#is_prime+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:15", "libruby.so.3.4.7+0x316278", "libruby.so.3.4.7+0x31b0b7", "libruby.so.3.4.7+0x225a4d", "libruby.so.3.4.7+0x2fc1eb", "libruby.so.3.4.7+0x302c2e", "libruby.so.3.4.7+0x3103ad", "Range#each+0 in <cfunc>:0", "Object#is_prime+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:14", "Object#sum_of_primes+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:24", "block (2 levels) in <main>+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:34", "libruby.so.3.4.7+0x3162f1", "libruby.so.3.4.7+0x31b0b7", "libruby.so.3.4.7+0x22593d", "libruby.so.3.4.7+0x2fad84", "libruby.so.3.4.7+0x302c2e", "libruby.so.3.4.7+0x3103ad", "Range#each+0 in <cfunc>:0", "block in <main>+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:33", "Kernel#loop+0 in <internal:kernel>:168", "<main>+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:32", "libruby.so.3.4.7+0x316278", "libruby.so.3.4.7+0x11eb28", "libruby.so.3.4.7+0x1225ba", "ruby+0x1111", "libc.so.6+0x27249", "libc.so.6+0x27304", "ruby+0x1150"}}, main.ThreadInfo{LWP:0x26904, Frames:[]string{"libc.so.6+0x108f26", "libruby.so.3.4.7+0x2bbdf4", "libc.so.6+0x891f4", "libc.so.6+0x1098db"}}}
            	            	actual  : []main.ThreadInfo{main.ThreadInfo{LWP:0x26902, Frames:[]string{"libruby.so.3.4.7+0x312edf", "libruby.so.3.4.7+0x316278", "libruby.so.3.4.7+0x31b0b7", "libruby.so.3.4.7+0x225a4d", "libruby.so.3.4.7+0x2fc1eb", "libruby.so.3.4.7+0x302c2e", "libruby.so.3.4.7+0x3103ad", "libruby.so.3.4.7+0x3162f1", "libruby.so.3.4.7+0x31b0b7", "libruby.so.3.4.7+0x22593d", "libruby.so.3.4.7+0x2fad84", "libruby.so.3.4.7+0x302c2e", "libruby.so.3.4.7+0x3103ad", "libruby.so.3.4.7+0x316278", "libruby.so.3.4.7+0x11eb28", "libruby.so.3.4.7+0x1225ba", "ruby+0x1111", "libc.so.6+0x27249", "libc.so.6+0x27304", "ruby+0x1150"}}, main.ThreadInfo{LWP:0x26904, Frames:[]string{"libc.so.6+0x108f26", "libruby.so.3.4.7+0x2bbdf4", "libc.so.6+0x891f4", "libc.so.6+0x1098db"}}}
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -3,5 +3,4 @@
            	            	   LWP: (uint32) 157954,
            	            	-  Frames: ([]string) (len=29) {
            	            	+  Frames: ([]string) (len=20) {
            	            	    (string) (len=25) "libruby.so.3.4.7+0x312edf",
            	            	-   (string) (len=107) "Object#is_prime+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:15",
            	            	    (string) (len=25) "libruby.so.3.4.7+0x316278",
            	            	@@ -12,6 +11,2 @@
            	            	    (string) (len=25) "libruby.so.3.4.7+0x3103ad",
            	            	-   (string) (len=25) "Range#each+0 in <cfunc>:0",
            	            	-   (string) (len=107) "Object#is_prime+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:14",
            	            	-   (string) (len=112) "Object#sum_of_primes+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:24",
            	            	-   (string) (len=118) "block (2 levels) in <main>+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:34",
            	            	    (string) (len=25) "libruby.so.3.4.7+0x3162f1",
            	            	@@ -22,6 +17,2 @@
            	            	    (string) (len=25) "libruby.so.3.4.7+0x3103ad",
            	            	-   (string) (len=25) "Range#each+0 in <cfunc>:0",
            	            	-   (string) (len=107) "block in <main>+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:33",
            	            	-   (string) (len=38) "Kernel#loop+0 in <internal:kernel>:168",
            	            	-   (string) (len=98) "<main>+0 in /home/dalehamel/opentelemetry-ebpf-profiler/tools/coredump/testsources/ruby/loop.rb:32",
            	            	    (string) (len=25) "libruby.so.3.4.7+0x316278",
            	Test:       	TestCoreDumps/testdata/amd64/ruby-3.4.7-dtv-loop.json
    --- FAIL: TestCoreDumps/testdata/amd64/ruby-3.4.7-musl-dtv-loop.json (1.51s)
        coredump_test.go:40: 
            	Error Trace:	/home/runner/work/opentelemetry-ebpf-profiler/opentelemetry-ebpf-profiler/tools/coredump/coredump_test.go:40
            	Error:      	Not equal: 
            	            	expected: []main.ThreadInfo{main.ThreadInfo{LWP:0x1, Frames:[]string{"libruby.so.3.4.7+0x339606", "libruby.so.3.4.7+0x33df39", "libruby.so.3.4.7+0x342c27", "libruby.so.3.4.7+0x23c7a8", "libruby.so.3.4.7+0x3217b0", "libruby.so.3.4.7+0x328d64", "libruby.so.3.4.7+0x3393c5", "Range#each+0 in <cfunc>:0", "Object#is_prime+0 in /src/tools/coredump/testsources/ruby/loop.rb:14", "Object#sum_of_primes+0 in /src/tools/coredump/testsources/ruby/loop.rb:24", "block (2 levels) in <main>+0 in /src/tools/coredump/testsources/ruby/loop.rb:34", "libruby.so.3.4.7+0x33e009", "libruby.so.3.4.7+0x342c27", "libruby.so.3.4.7+0x23c9ed", "libruby.so.3.4.7+0x32034b", "libruby.so.3.4.7+0x328d64", "libruby.so.3.4.7+0x3393c5", "Range#each+0 in <cfunc>:0", "block in <main>+0 in /src/tools/coredump/testsources/ruby/loop.rb:33", "Kernel#loop+0 in <internal:kernel>:168", "<main>+0 in /src/tools/coredump/testsources/ruby/loop.rb:32", "libruby.so.3.4.7+0x33df39", "libruby.so.3.4.7+0x128ea8", "libruby.so.3.4.7+0x12f2fa", "ruby+0x10ff", "ld-musl-x86_64.so.1+0x41a2f", "ruby+0x1131"}}, main.ThreadInfo{LWP:0x8, Frames:[]string{"ld-musl-x86_64.so.1+0x68ef4", "ld-musl-x86_64.so.1+0x66867", "ld-musl-x86_64.so.1+0x4396b", "libruby.so.3.4.7+0x2dd9e3", "ld-musl-x86_64.so.1+0x67572", "ld-musl-x86_64.so.1+0x68ec0"}}}
            	            	actual  : []main.ThreadInfo{main.ThreadInfo{LWP:0x1, Frames:[]string{"libruby.so.3.4.7+0x339606", "libruby.so.3.4.7+0x33df39", "libruby.so.3.4.7+0x342c27", "libruby.so.3.4.7+0x23c7a8", "libruby.so.3.4.7+0x3217b0", "libruby.so.3.4.7+0x328d64", "libruby.so.3.4.7+0x3393c5", "libruby.so.3.4.7+0x33e009", "libruby.so.3.4.7+0x342c27", "libruby.so.3.4.7+0x23c9ed", "libruby.so.3.4.7+0x32034b", "libruby.so.3.4.7+0x328d64", "libruby.so.3.4.7+0x3393c5", "libruby.so.3.4.7+0x33df39", "libruby.so.3.4.7+0x128ea8", "libruby.so.3.4.7+0x12f2fa", "ruby+0x10ff", "ld-musl-x86_64.so.1+0x41a2f", "ruby+0x1131"}}, main.ThreadInfo{LWP:0x8, Frames:[]string{"ld-musl-x86_64.so.1+0x68ef4", "ld-musl-x86_64.so.1+0x66867", "ld-musl-x86_64.so.1+0x4396b", "libruby.so.3.4.7+0x2dd9e3", "ld-musl-x86_64.so.1+0x67572", "ld-musl-x86_64.so.1+0x68ec0"}}}
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -3,3 +3,3 @@
            	            	   LWP: (uint32) 1,
            	            	-  Frames: ([]string) (len=27) {
            	            	+  Frames: ([]string) (len=19) {
            	            	    (string) (len=25) "libruby.so.3.4.7+0x339606",
            	            	@@ -11,6 +11,2 @@
            	            	    (string) (len=25) "libruby.so.3.4.7+0x3393c5",
            	            	-   (string) (len=25) "Range#each+0 in <cfunc>:0",
            	            	-   (string) (len=68) "Object#is_prime+0 in /src/tools/coredump/testsources/ruby/loop.rb:14",
            	            	-   (string) (len=73) "Object#sum_of_primes+0 in /src/tools/coredump/testsources/ruby/loop.rb:24",
            	            	-   (string) (len=79) "block (2 levels) in <main>+0 in /src/tools/coredump/testsources/ruby/loop.rb:34",
            	            	    (string) (len=25) "libruby.so.3.4.7+0x33e009",
            	            	@@ -21,6 +17,2 @@
            	            	    (string) (len=25) "libruby.so.3.4.7+0x3393c5",
            	            	-   (string) (len=25) "Range#each+0 in <cfunc>:0",
            	            	-   (string) (len=68) "block in <main>+0 in /src/tools/coredump/testsources/ruby/loop.rb:33",
            	            	-   (string) (len=38) "Kernel#loop+0 in <internal:kernel>:168",
            	            	-   (string) (len=59) "<main>+0 in /src/tools/coredump/testsources/ruby/loop.rb:32",
            	            	    (string) (len=25) "libruby.so.3.4.7+0x33df39",
            	Test:       	TestCoreDumps/testdata/amd64/ruby-3.4.7-musl-dtv-loop.json
FAIL
FAIL	go.opentelemetry.io/ebpf-profiler/tools/coredump	9.777s

So this should be (finally) good to go.

@christos68k christos68k enabled auto-merge (squash) April 7, 2026 17:16
@christos68k
Member

@dalehamel Can you resolve conflicts? I'll merge after.

@christos68k christos68k disabled auto-merge April 7, 2026 17:16

When TLSDESC relocations are unavailable (e.g. some musl setups or
dynamically allocated TLS), the Ruby execution context can still be
found by traversing the Dynamic Thread Vector (DTV).

This commit:

- Adds a shared dtv_read() helper in tsd.h (alongside existing tsd_read)
  that traverses the DTV using DTVInfo from PR open-telemetry#929's libc introspection.
  This helper is available to all interpreters, not just Ruby.

- Generalizes VisitTLSRelocations into VisitRelocations with a pluggable
  relocation type filter, enabling lookup of DTPMOD64 relocations to find
  the TLS module ID offset for libruby.so.

- Adds DTVInfo, current_ec_tls_offset, and tls_module_id fields to
  RubyProcInfo so the eBPF unwinder can use DTV traversal.

- Implements UpdateLibcInfo for Ruby to receive DTVInfo when the libc
  package provides it (may arrive from a separate DSO like ld-linux.so).

- Adds a DTV fallback path in ruby_tracer.ebpf.c between the existing
  TLSDESC path and the ractor fallback.

Adds a coredump test for a shared-library Ruby 3.4.7 on amd64, where
libruby.so uses R_X86_64_DTPMOD64 relocations (the default x86_64 TLS
dialect). This exercises the DTV-based TLS lookup path: VisitRelocations
with DTPMOD64 filter, UpdateLibcInfo with DTVInfo, and dtv_read() in
the eBPF unwinder.

This test is amd64-only because aarch64 linkers aggressively relax
GD TLS to TLSDESC, making it impossible to produce DTPMOD64 relocations
on aarch64 with standard toolchains.

Captured via gdb attach + breakpoint on rb_vm_exec to ensure a clean
snapshot deep in the Ruby VM stack.

- Convert relocation type checks from chained || expressions to switch
  statements in both ruby.go Loader() and pfelf VisitTLSRelocations()
  for improved readability.
- Include module_id in DTV read failure DEBUG_PRINT for easier debugging.

Address review feedback from fabled: move arch-specific relocation type
checks into pfelf as abstract RelocType constants (RelTLSDESC, RelDTPMOD64)
with a bitmask-based filter, replacing the function argument in
VisitRelocations. This centralizes the arch-to-reloc mapping in
classifyReloc() and simplifies callers.

Ruby 3.4.7 built from source with ruby-install on Alpine (musl libc),
linked with --enable-shared. This exercises the DTV-based TLS access
path for ruby_current_ec on musl, where TLSDESC relocations are not
used and we fall back to DTPMOD64.

Move the architecture dispatch (switch f.Machine) from classifyReloc,
which was called on every relocation entry, into VisitRelocations where
it runs once to select an arch-specific classifier function. This avoids
redundant machine-type checks in the inner loop.

Rename checkFunc to filterFunc for clarity.

Addresses review feedback from @fabled.

DTV access is always indirect: TP+offset yields a pointer to the DTV
array, which must be dereferenced before indexing by module ID. This is
true for both glibc and musl (see open-telemetry#1295).

Thus, remove the conditional indirect check from dtv_read() in tsd.h and
always perform the pointer dereference. Also remove the now-invalid
reference to DTVInfo.Indirect in the Ruby interpreter debug logging.

Thanks @nsavoire for uncovering this.
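The always-indirect DTV access described above can be sketched in Go against a fake memory reader. This is an illustration only: the real helper is dtv_read() in support/ebpf/tsd.h, and dtvParams, memReader and fakeMem are hypothetical. The fields loosely mirror what DTVInfo is described as providing (a TP-relative offset and an entry size); the 16-byte entry size in the example assumes glibc's two-word dtv_t (musl entries would be one word), and treating 0 or all-ones as an unallocated block is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// memReader abstracts reads of 64-bit words from the target process (hypothetical).
type memReader interface {
	ReadUint64(addr uint64) (uint64, error)
}

// fakeMem is a map-backed stand-in for process memory, used only for the example.
type fakeMem map[uint64]uint64

func (m fakeMem) ReadUint64(addr uint64) (uint64, error) {
	v, ok := m[addr]
	if !ok {
		return 0, fmt.Errorf("unmapped address %#x", addr)
	}
	return v, nil
}

// dtvParams loosely mirrors DTVInfo; field names are assumptions.
type dtvParams struct {
	Offset     int64  // TP-relative offset of the slot holding the DTV pointer
	Multiplier uint64 // size in bytes of one DTV entry
}

// dtvRead returns the address of a TLS variable located tlsOffset bytes into
// the TLS block of module moduleID, for the thread whose thread pointer is tp.
func dtvRead(mem memReader, tp uint64, p dtvParams, moduleID, tlsOffset uint64) (uint64, error) {
	// 1. TP+offset holds a pointer to the DTV array: always dereference it.
	dtv, err := mem.ReadUint64(uint64(int64(tp) + p.Offset))
	if err != nil {
		return 0, fmt.Errorf("read DTV pointer: %w", err)
	}
	// 2. Index the DTV by module ID to find this thread's TLS block for the module.
	block, err := mem.ReadUint64(dtv + moduleID*p.Multiplier)
	if err != nil {
		return 0, fmt.Errorf("read DTV[%d]: %w", moduleID, err)
	}
	// Treat 0 and all-ones (assumed unallocated markers) as "not allocated yet".
	if block == 0 || block == ^uint64(0) {
		return 0, errors.New("TLS block not allocated for this thread")
	}
	// 3. The variable sits at a fixed offset inside the module's TLS block.
	return block + tlsOffset, nil
}

func main() {
	// Fake layout: TP=0x7000, DTV pointer at TP+8, 16-byte entries, module 2.
	mem := fakeMem{0x7008: 0x9000, 0x9020: 0xa000}
	addr, err := dtvRead(mem, 0x7000, dtvParams{Offset: 8, Multiplier: 16}, 2, 0x40)
	if err != nil {
		fmt.Println("DTV read failed:", err)
		return
	}
	fmt.Printf("ruby_current_ec lives at %#x\n", addr) // prints 0xa040
}
```

Reading the word at the returned address would then yield the execution context pointer itself.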

Regenerate both glibc and musl Ruby 3.4.7 DTV coredump tests with
ruby_single_main_ractor stripped from libruby.so via objcopy, and
coredump_filter set to 0x3f for full memory dumps.

Both fail on main (which lacks the DTV path) and pass on this branch.
This more robustly disables the fallback to single_main_ractor and
cleanly adds the coredumps as regression tests.
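For reference, the 0x3f coredump_filter value mentioned above can be decoded with the bit meanings documented in the core(5) man page. This small helper (describeCoredumpFilter is a hypothetical name) just lists which mapping types a mask enables.

```go
package main

import (
	"fmt"
	"strings"
)

// coredumpFilterBits lists the meaning of each coredump_filter bit, in bit
// order, as documented in core(5).
var coredumpFilterBits = []string{
	"anonymous private",   // bit 0
	"anonymous shared",    // bit 1
	"file-backed private", // bit 2
	"file-backed shared",  // bit 3
	"ELF headers",         // bit 4
	"private huge pages",  // bit 5
	"shared huge pages",   // bit 6
	"private DAX pages",   // bit 7
	"shared DAX pages",    // bit 8
}

// describeCoredumpFilter returns the mapping types enabled by mask.
func describeCoredumpFilter(mask uint) []string {
	var enabled []string
	for i, name := range coredumpFilterBits {
		if mask&(uint(1)<<i) != 0 {
			enabled = append(enabled, name)
		}
	}
	return enabled
}

func main() {
	fmt.Println(strings.Join(describeCoredumpFilter(0x3f), ", "))
	// anonymous private, anonymous shared, file-backed private, file-backed shared, ELF headers, private huge pages
}
```

Compared with the kernel's documented default of 0x33, 0x3f additionally includes file-backed private and file-backed shared mappings. The mask is applied by writing it to /proc/<pid>/coredump_filter before the dump is taken.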
@dalehamel dalehamel force-pushed the ruby-dtv-ec-lookup-upstream branch from 17f9eb3 to 80d11ce April 7, 2026 17:55
@dalehamel
Contributor Author

@dalehamel Can you resolve conflicts? I'll merge after.

Cheers, done. Blobs rebuilt on the top commit after the rebase.

@christos68k christos68k merged commit 24fddb5 into open-telemetry:main Apr 7, 2026
32 checks passed