Description
We're trying to create a workflow using Schema Automator that requires as little intervention as possible after generating a schema from multiple TSVs. One issue we've run into is that there is no way to specify the class names derived from the different files.
When running schemauto generalize-tsvs filea.tsv fileb.tsv filec.tsv, the names of the resulting classes are derived from the individual filenames. That derivation happens here:
schema-automator/schema_automator/generalizers/csv_data_generalizer.py
Lines 249 to 250 in 99aff03
    for file in files:
        c = os.path.splitext(os.path.basename(file))[0]
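To make the derivation concrete, here is a minimal standalone sketch of what that expression produces. The helper name class_name_from_path is hypothetical and not part of Schema Automator; only the os.path expression is taken from the quoted lines.

```python
import os


def class_name_from_path(path: str) -> str:
    """Mirror the quoted expression: basename minus its final extension."""
    return os.path.splitext(os.path.basename(path))[0]


# The directory part is dropped and only the last extension is removed.
paths = ["data/filea.tsv", "fileb.tsv", "archive/filec.tar.gz"]
print([class_name_from_path(p) for p in paths])
# ['filea', 'fileb', 'filec.tar']
```

Because the name comes straight from the filename, the only way to change the class name without touching the code is to change the filename itself.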
I decided to write a small function as a replacement for CSVDataGeneralizer.convert_multiple that allows me to set class names explicitly. (It also sets some metadata like id, name and description, which are not configurable themselves.)
This works fine, except that I'm not able to infer foreign keys using this method. The reason is that CSVDataGeneralizer.infer_linkages uses the same method for deriving class names:
schema-automator/schema_automator/generalizers/csv_data_generalizer.py
Lines 130 to 131 in 99aff03
    for file in files:
        c = os.path.splitext(os.path.basename(file))[0]
Should it be possible to explicitly specify class names here? I'm not sure what a CLI flag would look like that allows this. Maybe something like schemauto generalize-tsvs --class ClassA=filea.tsv --class ClassB=fileb.tsv --class ClassC=filec.tsv.
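As a rough illustration of how the proposed ClassA=filea.tsv values could be interpreted, here is a sketch in plain Python. The --class flag does not exist in schemauto today, and parse_class_specs is a hypothetical helper, not an existing API. It assumes that a value without "=" should fall back to the current filename-based derivation.

```python
import os


def parse_class_specs(specs):
    """Return {file path: class name} from values like 'ClassA=filea.tsv'.

    Hypothetical behaviour: a bare path with no '=' falls back to the
    filename-based derivation that the generalizer uses today.
    """
    mapping = {}
    for spec in specs:
        name, sep, path = spec.partition("=")
        if not sep:
            # No explicit class name given; derive it from the filename.
            path = spec
            name = os.path.splitext(os.path.basename(path))[0]
        if not name or not path:
            raise ValueError(f"expected NAME=PATH or PATH, got {spec!r}")
        mapping[path] = name
    return mapping


print(parse_class_specs(["ClassA=filea.tsv", "ClassB=fileb.tsv", "filec.tsv"]))
# {'filea.tsv': 'ClassA', 'fileb.tsv': 'ClassB', 'filec.tsv': 'filec'}
```

A mapping like this could in principle be consulted in both convert_multiple and infer_linkages, so that the two methods agree on class names.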
For the time being, I can run the built-in convert_multiple function and then replace any values in the resulting schema in code. (EDIT: That was a poor idea. The better workaround is to create a symbolic link to each file, where the link's file name is the desired class name.)
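The symbolic-link workaround can be scripted. The following is a minimal sketch, assuming a platform where os.symlink is permitted; link_with_class_names is a hypothetical helper and the file and class names are placeholders.

```python
import os
import tempfile


def link_with_class_names(class_to_file, link_dir):
    """Create one symlink per source file, named after the desired class.

    Returns the link paths, which can be passed to the generalizer in
    place of the original files.
    """
    links = []
    for class_name, source in class_to_file.items():
        ext = os.path.splitext(source)[1]  # keep the original extension
        link = os.path.join(link_dir, class_name + ext)
        os.symlink(os.path.abspath(source), link)
        links.append(link)
    return links


# Demo with a throwaway file; real usage would point at existing TSVs.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "filea.tsv")
    with open(source, "w") as fh:
        fh.write("id\tname\n1\tfoo\n")
    link_dir = os.path.join(tmp, "links")
    os.mkdir(link_dir)
    links = link_with_class_names({"ClassA": source}, link_dir)
    print([os.path.basename(p) for p in links])  # ['ClassA.tsv']
```

Since both convert_multiple and infer_linkages derive the class name from the path they are given, passing the link paths should yield the desired names in both places.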