Specifying class names for files when using generalize-tsvs #158

We're trying to build a workflow with Schema Automator that requires as little manual intervention as possible after generating a schema from multiple TSVs. One issue we've run into is that there is no way to specify the class names derived from the different files.

When running `schemauto generalize-tsvs filea.tsv fileb.tsv filec.tsv`, the names of the resulting classes are derived from the individual filenames. That derivation happens here:

https://github.com/linkml/schema-automator/blob/99aff03af4ef5f3356297c3a85cf786a990d1818/schema_automator/generalizers/csv_data_generalizer.py#L249-L250
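To see which names this rule produces, here is a minimal standalone sketch that applies the same `os.path.splitext(os.path.basename(...))[0]` expression from the linked code to a few example paths. `default_class_name` is a hypothetical helper for illustration, not part of Schema Automator.

```python
import os


def default_class_name(path: str) -> str:
    """Mimic the filename-based rule: drop the directory and the last extension."""
    return os.path.splitext(os.path.basename(path))[0]


for path in ["filea.tsv", "data/fileb.tsv", "exports/samples.v2.tsv"]:
    print(path, "->", default_class_name(path))
```

Only the last extension is stripped, so `exports/samples.v2.tsv` yields `samples.v2`: the raw class name is whatever the file happens to be called.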

I wrote a small replacement for `CSVDataGeneralizer.convert_multiple` that lets me set class names explicitly, as well as schema metadata such as `id`, `name` and `description`, which are [not configurable either](https://github.com/linkml/schema-automator/blob/99aff03af4ef5f3356297c3a85cf786a990d1818/schema_automator/generalizers/csv_data_generalizer.py#L522-L526).

This works, except that I can't infer foreign keys this way, because `CSVDataGeneralizer.infer_linkages` derives class names with the same filename-based logic:

https://github.com/linkml/schema-automator/blob/99aff03af4ef5f3356297c3a85cf786a990d1818/schema_automator/generalizers/csv_data_generalizer.py#L130-L131
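One way to keep the two code paths consistent would be to route both through a single lookup that prefers an explicit mapping and falls back to the filename rule. This is a minimal sketch assuming a hypothetical `class_names` dictionary keyed by file path; neither `resolve_class_names` nor that parameter exists in Schema Automator today.

```python
import os
from typing import Dict, List, Optional


def resolve_class_names(
    files: List[str], class_names: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Map each input file to a class name (hypothetical helper).

    An explicit entry in ``class_names`` wins; otherwise fall back to the
    filename-based rule (basename without its last extension).
    """
    overrides = class_names or {}
    resolved = {}
    for file in files:
        default = os.path.splitext(os.path.basename(file))[0]
        resolved[file] = overrides.get(file, default)
    # Two files resolving to the same class name would be ambiguous, so reject it.
    if len(set(resolved.values())) != len(resolved):
        raise ValueError(f"duplicate class names: {sorted(resolved.values())}")
    return resolved


files = ["filea.tsv", "fileb.tsv", "filec.tsv"]
names = resolve_class_names(files, {"filea.tsv": "ClassA", "fileb.tsv": "ClassB"})
print(names)
```

If both `convert_multiple` and `infer_linkages` consulted the same resolved mapping, a foreign key inferred against `filea.tsv` would point at `ClassA` rather than `filea`, and files without an override would keep their current names.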

Should it be possible to specify class names explicitly here? I'm not sure what a CLI flag for this would look like; maybe something like `schemauto generalize-tsvs --class ClassA=filea.tsv --class ClassB=fileb.tsv --class ClassC=filec.tsv`.
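The proposed `--class NAME=FILE` flag (not an existing `schemauto` option) could be parsed into a file-to-class mapping with a repeatable argument. This sketch uses the standard library's `argparse` purely for illustration; the real CLI may be built on a different framework, so treat it as an outline of the parsing logic only. `parse_class_option` and the program name are hypothetical.

```python
import argparse
from typing import Dict, List, Tuple


def parse_class_option(value: str) -> Tuple[str, str]:
    """Split one 'ClassName=path/to/file.tsv' value into (file, class name)."""
    name, sep, path = value.partition("=")
    if not sep or not name or not path:
        raise argparse.ArgumentTypeError(f"expected NAME=FILE, got {value!r}")
    return path, name


parser = argparse.ArgumentParser(prog="generalize-tsvs-sketch")
parser.add_argument("tsv_files", nargs="*")
parser.add_argument(
    "--class",
    dest="class_specs",  # 'class' is a Python keyword, so store under another name
    action="append",
    default=[],
    type=parse_class_option,
    help="Explicit class name for a file, as NAME=FILE (repeatable)",
)

args = parser.parse_args(
    ["--class", "ClassA=filea.tsv", "--class", "ClassB=fileb.tsv", "filec.tsv"]
)
class_names: Dict[str, str] = dict(args.class_specs)
files: List[str] = [path for path, _ in args.class_specs] + args.tsv_files
print(files, class_names)
```

Files passed only positionally keep their filename-derived names, so the flag would be opt-in per file. Splitting on the first `=` also leaves paths that contain `=` intact.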

For the time being, I can run the built-in `convert_multiple` function and then replace the affected values in the resulting schema in code. (EDIT: That was a poor idea. The better workaround is to create a symbolic link to each file, named after the desired class.)
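The symlink workaround can be scripted so that the links never touch the original data directory. This minimal sketch assumes a POSIX system (creating symbolic links may need extra privileges on Windows) and uses a hypothetical `link_as_classes` helper that creates one `<ClassName>.tsv` link per file; the demo builds throwaway TSVs in temporary directories.

```python
import os
import tempfile
from typing import Dict, List


def link_as_classes(class_to_file: Dict[str, str], link_dir: str) -> List[str]:
    """Create '<ClassName>.tsv' symlinks in link_dir pointing at the real files."""
    links = []
    for class_name, file in class_to_file.items():
        link = os.path.join(link_dir, f"{class_name}.tsv")
        # Absolute target so the link resolves regardless of link_dir's location.
        os.symlink(os.path.abspath(file), link)
        links.append(link)
    return links


with tempfile.TemporaryDirectory() as data_dir, tempfile.TemporaryDirectory() as link_dir:
    # Throwaway input files standing in for filea.tsv and fileb.tsv.
    for stem in ("filea", "fileb"):
        with open(os.path.join(data_dir, f"{stem}.tsv"), "w") as fh:
            fh.write("id\tname\n1\texample\n")
    links = link_as_classes(
        {
            "ClassA": os.path.join(data_dir, "filea.tsv"),
            "ClassB": os.path.join(data_dir, "fileb.tsv"),
        },
        link_dir,
    )
    link_names = [os.path.basename(p) for p in links]
    # Reading through the link reaches the original file's contents.
    with open(links[0]) as fh:
        first_line = fh.readline()
    print(link_names, repr(first_line))
```

Because the class name comes from the basename of the path passed in, running `schemauto generalize-tsvs ClassA.tsv ClassB.tsv` from inside the link directory would yield classes named after the links rather than the original files.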

Excerpt from `csv_data_generalizer.py` shown in the permalink preview:

```python
for file in files:
    c = os.path.splitext(os.path.basename(file))[0]
```