Specifying class names for files when using generalize-tsvs #158

@ptgolden

We're trying to build a workflow with Schema Automator that needs as little manual intervention as possible after a schema is generated from multiple TSVs. One issue we've run into is that we can't specify the class names derived from the different files.

When running `schemauto generalize-tsvs filea.tsv fileb.tsv filec.tsv`, the names of the resulting classes are derived from the individual file names. That derivation happens here:

```python
for file in files:
    c = os.path.splitext(os.path.basename(file))[0]
```
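To make the consequence concrete, here is the same expression wrapped in a standalone function. The name `class_name_from_path` is hypothetical and not part of Schema Automator. Note that `os.path.splitext` strips only the last extension, so a file such as `samples.v2.tsv` becomes a class named `samples.v2`:

```python
import os


def class_name_from_path(path: str) -> str:
    """Hypothetical helper mirroring the derivation above:
    drop the directory part, then drop the final extension."""
    return os.path.splitext(os.path.basename(path))[0]


for f in ["filea.tsv", "data/fileb.tsv", "samples.v2.tsv"]:
    print(f, "->", class_name_from_path(f))
# filea.tsv -> filea
# data/fileb.tsv -> fileb
# samples.v2.tsv -> samples.v2
```

In other words, the file name stem is the only input, so renaming the file is currently the only way to control the class name.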

I wrote a small replacement for `CSVDataGeneralizer.convert_multiple` that lets me set class names explicitly, as well as some schema metadata (`id`, `name`, and `description`) that is not otherwise configurable.

This works, except that I can't infer foreign keys this way, because `CSVDataGeneralizer.infer_linkages` derives class names in exactly the same way:

```python
for file in files:
    c = os.path.splitext(os.path.basename(file))[0]
```
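Since both methods need the same file-to-class mapping, one possible approach is to compute that mapping once, with optional explicit overrides, and pass it to both steps. The sketch below is a hypothetical helper (`resolve_class_names` and its `overrides` parameter are not Schema Automator API): an override wins when present, otherwise it falls back to the current file name derivation.

```python
import os
from typing import Dict, List, Optional


def resolve_class_names(
    files: List[str], overrides: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Hypothetical: return a {file path: class name} mapping.

    An explicit override is used when given; otherwise the class name is
    the file name stem, as convert_multiple and infer_linkages derive it.
    """
    overrides = overrides or {}
    mapping: Dict[str, str] = {}
    for file in files:
        name = overrides.get(file) or os.path.splitext(os.path.basename(file))[0]
        # Two files mapping to one class name would silently merge classes.
        if name in mapping.values():
            raise ValueError(f"duplicate class name {name!r} for {file}")
        mapping[file] = name
    return mapping


print(resolve_class_names(["filea.tsv", "fileb.tsv", "filec.tsv"],
                          {"filea.tsv": "ClassA"}))
# {'filea.tsv': 'ClassA', 'fileb.tsv': 'fileb', 'filec.tsv': 'filec'}
```

Computing the mapping in one place is what would keep schema generation and foreign key inference in agreement; the duplicate check also catches `a/x.tsv` and `b/x.tsv`, which today would both become `x`.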

Should it be possible to specify class names explicitly here? I'm not sure what a CLI flag for this would look like. Perhaps something like `schemauto generalize-tsvs --class ClassA=filea.tsv --class ClassB=fileb.tsv --class ClassC=filec.tsv`.
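A repeatable `--class CLASS=FILE` option like the one proposed could be parsed roughly as follows. This uses the standard library's `argparse` only to keep the sketch self-contained; it does not reflect how `schemauto` actually defines its commands, and `--class` is a proposed flag, not an existing one. The parsed pairs are inverted into a file-to-class-name dictionary.

```python
import argparse
from typing import Tuple


def parse_class_option(value: str) -> Tuple[str, str]:
    """Split 'ClassName=path/to/file.tsv' at the first '='."""
    class_name, sep, path = value.partition("=")
    if not sep or not class_name or not path:
        raise argparse.ArgumentTypeError(f"expected CLASS=FILE, got {value!r}")
    return class_name, path


parser = argparse.ArgumentParser(prog="generalize-tsvs-sketch")
# 'class' is a Python keyword, so store the values under a different dest.
parser.add_argument("--class", dest="classes", action="append", default=None,
                    type=parse_class_option, metavar="CLASS=FILE")
parser.add_argument("files", nargs="*")

args = parser.parse_args(["--class", "ClassA=filea.tsv",
                          "--class", "ClassB=fileb.tsv"])
overrides = {path: cls for cls, path in (args.classes or [])}
print(overrides)  # {'filea.tsv': 'ClassA', 'fileb.tsv': 'ClassB'}
```

Splitting at the first `=` means a path that itself contains `=` still parses correctly.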

For the time being, I can run the built-in `convert_multiple` function and then replace the affected values in the resulting schema in code. (EDIT: That was a poor idea. A better workaround is to create a symbolic link to each file, named after the desired class.)
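The symlink workaround succeeds because the derivation uses `os.path.basename` on the path as given and never resolves links. A minimal sketch, assuming a POSIX system where `os.symlink` needs no special privileges (`link_as_class_names` is a hypothetical helper):

```python
import os
import tempfile
from typing import Dict, List


def link_as_class_names(class_to_file: Dict[str, str], link_dir: str) -> List[str]:
    """Hypothetical helper: create <link_dir>/<ClassName>.tsv symlinks
    pointing at the original files and return the link paths.

    os.symlink raises FileExistsError if a link already exists.
    """
    os.makedirs(link_dir, exist_ok=True)
    links = []
    for class_name, target in class_to_file.items():
        link = os.path.join(link_dir, f"{class_name}.tsv")
        # Absolute target so the link resolves wherever link_dir lives.
        os.symlink(os.path.abspath(target), link)
        links.append(link)
    return links


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "filea.tsv")
    with open(src, "w") as fh:
        fh.write("id\tname\n1\tx\n")
    links = link_as_class_names({"ClassA": src}, os.path.join(tmp, "links"))
    print([os.path.basename(p) for p in links])  # ['ClassA.tsv']
```

The returned link paths are then passed to `schemauto generalize-tsvs` in place of the original files, and each resulting class takes its name from the link.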

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests