feature: strategy for reading TLS when not using TLS descriptors

#710 has a lengthy discussion about this in this comment thread https://github.com/open-telemetry/opentelemetry-ebpf-profiler/pull/710#discussion_r2445465297, and this comment lists lots of great resources https://github.com/open-telemetry/opentelemetry-ebpf-profiler/pull/710#discussion_r2453276745, including:

- https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt
- https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
- https://www.akkadia.org/drepper/tls.pdf

But fundamentally:

- On aarch64, we so far seem to be able to reliably use TLS descriptors. The compiler will relocate the TLS references, and there is no need to directly access the DTV.
- On x86_64, the compiler uses the legacy TLS access mode by default, unless users pass `-mtls-dialect=gnu2`. This mode requires us to access TLS through the DTV, as there are no TLS descriptor relocations to do the work for us. For this we need to add additional code that will determine:
  - That the executable is using the legacy TLS mode. This is easy enough: we won't see the TLS descriptors, e.g. if we run `readelf -r PATH_TO_SO | grep TLS`.
  - How to get the DTV offset from tpbase/fsbase. I think we should just be able to disassemble `__tls_get_addr` from libc or ld, and the offset should be readily apparent (see below).
  - What the module ID is. We should be able to get it from `R_X86_64_DTPMOD64`. Basically, we read the ELF address this relocation applies to, add it to the base address in the process, and read the module ID from that location once the dynamic linker has loaded the module (no need to parse the link map).
  - This will need to be verified to work with both glibc and musl.
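To make the first check concrete, here is a minimal sketch that classifies the TLS access mode from the text output of `readelf -r`. The function name, the returned labels, and the sample lines are made up for illustration; only the relocation type names are real. A real implementation would parse the relocation sections directly rather than shelling out.

```python
import re

# TLS-related relocation type names as printed by `readelf -r` on x86_64.
_TLS_RELOC = re.compile(r"R_X86_64_(TLSDESC|DTPMOD64|DTPOFF64|TPOFF64)\b")


def classify_tls_relocs(readelf_output: str) -> str:
    """Guess the TLS access mode from relocation type names (hypothetical helper)."""
    kinds = set(_TLS_RELOC.findall(readelf_output))
    if "TLSDESC" in kinds:
        return "tlsdesc"        # TLS descriptors are present
    if "DTPMOD64" in kinds:
        return "legacy-dtv"     # module ID slot, resolved via the DTV
    if "TPOFF64" in kinds:
        return "static-offset"  # fixed offset from the thread pointer
    return "no-tls-relocs"


# Made-up excerpt in the general shape of `readelf -r` output.
SAMPLE = """\
000000003fd0  000000000000 R_X86_64_DTPMOD64                    0
000000003fd8  000300000012 R_X86_64_DTPOFF64 0000000000000000 foo + 0
"""

print(classify_tls_relocs(SAMPLE))
```

The order of the checks is a design choice: an object that has TLS descriptors is treated as the descriptor case even if other TLS relocations are also present.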



# Getting the DTV offset from fsbase

Disassembling `__tls_get_addr` shouldn't be too bad. Here it is for glibc (x86_64):

```
Dump of assembler code for function __tls_get_addr:
   0x00007ffff7fdc820 <+0>:     endbr64
   0x00007ffff7fdc824 <+4>:     mov    %fs:0x8,%rdx
   0x00007ffff7fdc82d <+13>:    mov    0x21874(%rip),%rax        # 0x7ffff7ffe0a8 <_rtld_global+4264>
   0x00007ffff7fdc834 <+20>:    cmp    %rax,(%rdx)
   0x00007ffff7fdc837 <+23>:    jne    0x7ffff7fdc84f <__tls_get_addr+47>
   0x00007ffff7fdc839 <+25>:    mov    (%rdi),%rax
   0x00007ffff7fdc83c <+28>:    shl    $0x4,%rax
   0x00007ffff7fdc840 <+32>:    mov    (%rdx,%rax,1),%rax
   0x00007ffff7fdc844 <+36>:    cmp    $0xffffffffffffffff,%rax
   0x00007ffff7fdc848 <+40>:    je     0x7ffff7fdc84f <__tls_get_addr+47>
   0x00007ffff7fdc84a <+42>:    add    0x8(%rdi),%rax
   0x00007ffff7fdc84e <+46>:    ret
   0x00007ffff7fdc84f <+47>:    push   %rbp
   0x00007ffff7fdc850 <+48>:    mov    %rsp,%rbp
   0x00007ffff7fdc853 <+51>:    and    $0xfffffffffffffff0,%rsp
   0x00007ffff7fdc857 <+55>:    call   0x7ffff7fd9e30 <__tls_get_addr_slow>
   0x00007ffff7fdc85c <+60>:    mov    %rbp,%rsp
   0x00007ffff7fdc85f <+63>:    pop    %rbp
   0x00007ffff7fdc860 <+64>:    ret
```

The first instruction after `endbr64` has what we need: `mov    %fs:0x8,%rdx`. The DTV is at $fs_base + 8, which matches what I've determined through source code analysis, again in this comment https://github.com/open-telemetry/opentelemetry-ebpf-profiler/pull/710#discussion_r2445465297.
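Reading the fast path of that disassembly as plain memory reads gives roughly the following. This is a sketch only: `read_u64`, the constant names, and the fake memory contents are assumptions for illustration, and the generation check (`cmp %rax,(%rdx)`) is skipped because an external reader cannot fall back to `__tls_get_addr_slow`.

```python
MASK64 = (1 << 64) - 1

GLIBC_DTV_OFFSET = 0x8       # from `mov %fs:0x8,%rdx`
GLIBC_DTV_ENTRY_SIZE = 16    # from `shl $0x4,%rax`
UNALLOCATED = MASK64         # from `cmp $0xffffffffffffffff,%rax`


def glibc_tls_address(read_u64, fsbase, module_id, offset):
    """Mirror the fast path: return the variable's address, or None if unallocated."""
    dtv = read_u64(fsbase + GLIBC_DTV_OFFSET)
    block = read_u64(dtv + module_id * GLIBC_DTV_ENTRY_SIZE)
    if block == UNALLOCATED:
        # The real function would call __tls_get_addr_slow here; we cannot.
        return None
    return (block + offset) & MASK64


# Made-up process memory: address -> 64-bit value.
fake_memory = {
    0x7000 + 0x8: 0x9000,        # DTV pointer stored at fsbase + 8
    0x9000 + 2 * 16: 0x50000,    # TLS block of module 2
    0x9000 + 3 * 16: MASK64,     # module 3 not allocated yet
}

addr = glibc_tls_address(fake_memory.__getitem__, 0x7000, 2, 0x18)
print(hex(addr))
```

Here `module_id` and `offset` correspond to the two fields read from `(%rdi)` and `0x8(%rdi)` in the disassembly.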

And here it is for musl (x86_64):

```
(gdb) disassemble __tls_get_addr 
Dump of assembler code for function __tls_get_addr:
   0x0000000000065370 <+0>:     mov    %fs:0x0,%rax
   0x0000000000065379 <+9>:     mov    (%rdi),%rcx
   0x000000000006537c <+12>:    mov    0x8(%rax),%rdx
   0x0000000000065380 <+16>:    mov    0x8(%rdi),%rax
   0x0000000000065384 <+20>:    add    (%rdx,%rcx,8),%rax
   0x0000000000065388 <+24>:    ret
```

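It again accesses the DTV at $fs_base + 8 (as we can interpret from the source code, it should be at an offset of 8), but we need to do a bit more work to get this value from the disassembly: musl first loads the thread pointer from `%fs:0x0` into `%rax`, and only then reads the DTV pointer from `0x8(%rax)`.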
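A rough sketch of recovering the offset from the raw instruction bytes for both shapes is below. It only recognises the two encodings in the listings above (`mov %fs:disp32,%reg`, optionally followed by `mov disp8(%reg),%reg2`) and scans byte by byte instead of decoding instructions properly, so treat it as an illustration, not a robust disassembler. The function name and the byte strings (hand-assembled from the listings) are assumptions.

```python
import struct


def find_dtv_offset(code: bytes):
    """Return the DTV offset relative to fsbase, or None if no pattern matches."""
    for i in range(len(code) - 8):
        # 64 48 8b /r 25 disp32  ==  mov %fs:disp32, %reg  (reg in rax..rdi)
        if (code[i:i + 3] != b"\x64\x48\x8b"
                or (code[i + 3] & 0xC7) != 0x04 or code[i + 4] != 0x25):
            continue
        reg = (code[i + 3] >> 3) & 7
        disp = struct.unpack_from("<i", code, i + 5)[0]
        if disp != 0:
            return disp  # glibc shape: the DTV pointer is read directly
        if reg == 4:
            return None  # %rsp as a base needs a SIB byte; not handled here
        # musl shape: %reg now holds the thread pointer; look for
        # 48 8b /r disp8  ==  mov disp8(%reg), %reg2
        for j in range(i + 9, len(code) - 3):
            modrm = code[j + 2]
            if (code[j:j + 2] == b"\x48\x8b"
                    and (modrm >> 6) == 1 and (modrm & 7) == reg):
                return struct.unpack_from("<b", code, j + 3)[0]
        return None
    return None


# endbr64; mov %fs:0x8,%rdx
GLIBC_PROLOGUE = bytes.fromhex("f30f1efa 64488b1425 08000000")
# mov %fs:0x0,%rax; mov (%rdi),%rcx; mov 0x8(%rax),%rdx
MUSL_PROLOGUE = bytes.fromhex("64488b0425 00000000 488b0f 488b5008")

print(find_dtv_offset(GLIBC_PROLOGUE), find_dtv_offset(MUSL_PROLOGUE))
```

Note that the DTV entry stride also differs between the two listings: glibc shifts the module ID by 4 (16-byte entries), while musl indexes with a scale of 8.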

# Static TLS

As it happens, ruby can be significantly faster if we use a static build (with no libruby.so, all code directly in bin/ruby), and one of the reasons is that you can read the value directly at a fixed offset from $fsbase. Ruby checks the execution context a LOT, and so this ends up being a significant speedup. So, I'll use ruby as an example.

In this model, we just need to look up a constant when we disassemble:

```
Dump of assembler code for function rb_current_execution_context:
   0x000055555558d244 <+0>:     push   %rbp
   0x000055555558d245 <+1>:     mov    %rsp,%rbp
   0x000055555558d248 <+4>:     mov    %edi,%eax
   0x000055555558d24a <+6>:     mov    %al,-0x14(%rbp)
   0x000055555558d24d <+9>:     mov    $0xfffffffffffffff0,%rax
   0x000055555558d254 <+16>:    mov    %fs:(%rax),%rax
   0x000055555558d258 <+20>:    mov    %rax,-0x8(%rbp)
   0x000055555558d25c <+24>:    mov    -0x8(%rbp),%rax
   0x000055555558d260 <+28>:    pop    %rbp
   0x000055555558d261 <+29>:    ret
```

We also can't get away with reading the TLSDesc from the relocation table, because there isn't one. But we should be able to calculate that constant: `$0xfffffffffffffff0` is -16 in two's complement, which is the negative offset from $fsbase where the value is located. We can compute it using the TLS symbol value for `ruby_current_ec`, and we don't need to access the DTV at all.
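As a sketch of that calculation: on x86_64 the main executable's TLS block sits directly below the thread pointer, so the offset is usually the symbol value minus the aligned size of the `PT_TLS` segment. The helper names and the segment numbers below are made up for illustration, and the formula assumes the `PT_TLS` segment's start address is itself aligned.

```python
def to_signed64(value: int) -> int:
    """Interpret a 64-bit unsigned value as two's complement."""
    return value - (1 << 64) if value & (1 << 63) else value


def align_up(n: int, alignment: int) -> int:
    """Round n up to a multiple of alignment (a power of two)."""
    return (n + alignment - 1) & ~(alignment - 1)


def static_tls_offset(sym_value: int, tls_memsz: int, tls_align: int) -> int:
    """Offset of a TLS symbol from fsbase for the main executable (assumed layout)."""
    return sym_value - align_up(tls_memsz, tls_align)


# The immediate from the disassembly, read as a signed value.
imm = to_signed64(0xFFFFFFFFFFFFFFF0)

# Hypothetical PT_TLS segment: 16 bytes, 8-byte aligned, symbol at value 0.
computed = static_tls_offset(sym_value=0, tls_memsz=16, tls_align=8)

print(imm, computed)
```

With an offset in hand, the value is read from `fsbase + offset`, with no DTV access and no module ID.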

cc @fabled 
