
Support more than 32 extensions #2848

@moste00

Description

Feature

  • New architecture module
  • Support for processor extension
  • Add more instruction details (elaborated below)
  • Binding support for: language
  • Other (elaborated below)

Make cs_mode capable of storing an arbitrary number of modes and retrieving them.

Describe the feature you'd like

The cs_mode type is used to encode, among other things, extensions and features of the target architecture.

[Images: screenshots of the cs_mode definition and its values]

Each feature occupies one bit of an integer. This has the advantage of being compact, quick to set/reset, and quick to test for the existence of a feature. It is essentially a SmallSet implementation for integers. The problem is that an integer of N bits can only ever hold N features, so at most 64 with a 64-bit type. Currently the type is 32 bits wide because the biggest value is 2^31, as seen above.

Indeed:

$> cat probe_size.c 

#include <stdio.h>
#include <capstone/capstone.h>

int main() {
    printf("Size of cs_mode: %zu bytes\n", sizeof(cs_mode));
    return 0;
}

$> ./a.out

Size of cs_mode: 4 bytes

But even if the field became 64 bits wide, it still couldn't hold all the possible extensions of, e.g., RISC-V, which has 26 extensions involved in the vector sub-architecture alone [1] (overlapping and subsetting each other).
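For reference, the one-bit-per-feature scheme described above can be sketched roughly as follows. The type and function names (`demo_flags`, `demo_flag`, and so on) are hypothetical and are not Capstone's actual cs_mode values; the point is only to show where the hard capacity limit comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag type and names for illustration only;
 * these are NOT Capstone's actual cs_mode values. */
typedef uint32_t demo_flags;

#define DEMO_FLAGS_CAPACITY 32u /* one feature per bit, so 32 at most */

/* Map a feature index (0..31) to its single-bit flag. */
static inline demo_flags demo_flag(unsigned index)
{
    assert(index < DEMO_FLAGS_CAPACITY); /* a 33rd feature has no bit left */
    return (demo_flags)(UINT32_C(1) << index);
}

/* Enable a feature: plain bitwise OR. */
static inline demo_flags demo_flags_set(demo_flags m, demo_flags f)
{
    return m | f;
}

/* Disable a feature: AND with the inverted flag. */
static inline demo_flags demo_flags_reset(demo_flags m, demo_flags f)
{
    return m & (demo_flags)~f;
}

/* Test whether every bit of f is present in m. */
static inline bool demo_flags_test(demo_flags m, demo_flags f)
{
    return (m & f) == f;
}
```

Every operation is a single integer instruction, which is why the scheme is attractive, but the capacity is fixed by the width of the underlying type.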

Additional context

There are multiple levels of needed features, each level encompassing the one before it.

1- At minimum, the new type should hold at least 256 extensions. This means that changing the enum to a 256-bit unsigned integer type, e.g. unsigned _BitInt(256) (supported starting from clang 16.0), should satisfy the issue. This has the advantage of working with all the current code with minimal refactoring, as ordinary 32-bit and 64-bit literals work normally with the wider type and widen as usual.

The problems with this solution are: (A) it requires much newer compilers than we do at the moment, as wider types are very recent; (B) it is guaranteed to work only on Clang and GCC; (C) it is just another strict upper limit like the one we're trying to avoid, only higher.

2- Using bitfield structs or arrays to encode an arbitrary (compile-time determined) number of extensions. This can store an arbitrary number of extensions, but all code that currently uses cs_mode would need refactoring to treat instances of it as structs or arrays instead of integers (e.g. no | or & operators).
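A minimal sketch of what option 2 could look like, assuming an array of 64-bit words wrapped in a struct. All names here (`ext_set`, `ext_set_add`, `EXT_SET_CAPACITY`, ...) are hypothetical and not part of Capstone; the capacity of 256 is an arbitrary compile-time choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; capacity is fixed at compile time and can be raised. */
#define EXT_SET_CAPACITY  256u
#define EXT_SET_WORD_BITS 64u
#define EXT_SET_WORDS     (EXT_SET_CAPACITY / EXT_SET_WORD_BITS)

typedef struct {
    uint64_t words[EXT_SET_WORDS];
} ext_set;

static inline ext_set ext_set_empty(void)
{
    ext_set s = { { 0 } };
    return s;
}

/* Extensions are identified by index, not by a pre-shifted mask. */
static inline void ext_set_add(ext_set *s, unsigned ext)
{
    assert(ext < EXT_SET_CAPACITY);
    s->words[ext / EXT_SET_WORD_BITS] |= UINT64_C(1) << (ext % EXT_SET_WORD_BITS);
}

static inline void ext_set_remove(ext_set *s, unsigned ext)
{
    assert(ext < EXT_SET_CAPACITY);
    s->words[ext / EXT_SET_WORD_BITS] &= ~(UINT64_C(1) << (ext % EXT_SET_WORD_BITS));
}

static inline bool ext_set_has(const ext_set *s, unsigned ext)
{
    assert(ext < EXT_SET_CAPACITY);
    return (s->words[ext / EXT_SET_WORD_BITS] >> (ext % EXT_SET_WORD_BITS)) & 1u;
}

/* Stands in for `a | b` on integer modes. */
static inline ext_set ext_set_union(const ext_set *a, const ext_set *b)
{
    ext_set r;
    for (size_t i = 0; i < EXT_SET_WORDS; i++)
        r.words[i] = a->words[i] | b->words[i];
    return r;
}

/* Stands in for `(a & b) != 0` on integer modes. */
static inline bool ext_set_intersects(const ext_set *a, const ext_set *b)
{
    for (size_t i = 0; i < EXT_SET_WORDS; i++)
        if (a->words[i] & b->words[i])
            return true;
    return false;
}
```

Call sites that today write `mode & SOME_FLAG` would become something like `ext_set_has(&mode, SOME_EXT)`, which is the refactoring cost mentioned above.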

There is still the problem, however, that the mode object is not capable of relating extensions to each other: each extension is an isolated member of a flat set.

3- The most complex implementation would also provide the ability to declare the "dependencies" of an extension and its "conflicts". An extension can be enabled if and only if all of its dependencies have been enabled before it (or can be enabled right now on behalf of their dependent) and none of its conflicts are enabled.

This ability is inspired by the RISC-V extension system, where precisely this need arises for correct handling. However, I suspect it's niche and never used outside of RISC-V, so it might make more sense to implement it on top of cs_mode for RISC-V only.
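The dependency/conflict rule from option 3 could be sketched like this. The extension names and their relationships are made up for illustration and do not describe real RISC-V extensions. The sketch uses a plain 32-bit mask for brevity and implements only the stricter variant, where dependencies must already be enabled (it does not auto-enable them).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extensions; the relations below are invented for the example. */
enum {
    DEMO_EXT_BASE,
    DEMO_EXT_WIDE,    /* depends on BASE, conflicts with COMPACT */
    DEMO_EXT_COMPACT, /* depends on BASE, conflicts with WIDE */
    DEMO_EXT_COUNT
};

#define DEMO_BIT(ext) (UINT32_C(1) << (ext))

typedef struct {
    uint32_t deps;      /* all of these must already be enabled */
    uint32_t conflicts; /* none of these may be enabled */
} demo_ext_rule;

static const demo_ext_rule demo_rules[DEMO_EXT_COUNT] = {
    [DEMO_EXT_BASE]    = { 0, 0 },
    [DEMO_EXT_WIDE]    = { DEMO_BIT(DEMO_EXT_BASE), DEMO_BIT(DEMO_EXT_COMPACT) },
    [DEMO_EXT_COMPACT] = { DEMO_BIT(DEMO_EXT_BASE), DEMO_BIT(DEMO_EXT_WIDE) },
};

/* Enable `ext` in `*enabled` only if its rule is satisfied.
 * Returns true on success; leaves `*enabled` untouched on failure. */
static inline bool demo_try_enable(uint32_t *enabled, unsigned ext)
{
    assert(ext < DEMO_EXT_COUNT);
    const demo_ext_rule *r = &demo_rules[ext];

    if ((*enabled & r->deps) != r->deps)
        return false; /* a dependency is missing */
    if ((*enabled & r->conflicts) != 0)
        return false; /* a conflicting extension is active */

    *enabled |= DEMO_BIT(ext);
    return true;
}
```

With these made-up rules, enabling WIDE before BASE is rejected, and once WIDE is enabled, COMPACT is rejected as a conflict. A table like this could live in the RISC-V module only, layered on top of whatever flat set cs_mode ends up being.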

[1] https://fprox.substack.com/p/taxonomy-of-risc-v-vector-extensions
