Update file I/O for Harmonica #91

**TASKS**
- [x] add script to pivot conditions as column headers to row values
- [ ] try to incorporate conditions into SchemaAutomator so they are included in the model

Some "conditions" data may be included within a subjects file, e.g. https://github.com/linkml/dm-bip/blob/main/toy_data/raw_data/subject.tsv. In this layout there is only one line of data per participant/subject: the column headers represent the condition names, and each participant/subject has a yes or no value in each condition column to indicate whether they have that condition.
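
A minimal sketch of the pivot from the first task (condition column headers become row values), assuming a tab-separated file with one ID column and yes/no condition columns. `pivot_conditions` and the column names are hypothetical, not the actual dm-bip script:

```python
import csv
import io


# Hypothetical helper; not part of dm-bip or Harmonica.
def pivot_conditions(tsv_text, id_column, condition_columns):
    """Turn one-row-per-participant yes/no condition columns into long rows.

    Each condition column header becomes a value in the "condition" field,
    and the participant's yes/no cell becomes the "status" field.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    long_rows = []
    for record in reader:
        for condition in condition_columns:
            long_rows.append({
                id_column: record[id_column],
                "condition": condition,
                # Normalize case/whitespace so "Yes " and "yes" compare equal.
                "status": record[condition].strip().lower(),
            })
    return long_rows


# Hypothetical input; the real subject.tsv column names may differ.
example = (
    "participant_id\tasthma\tdiabetes\n"
    "P1\tyes\tno\n"
    "P2\tno\tno\n"
)
rows = pivot_conditions(example, "participant_id", ["asthma", "diabetes"])
for row in rows:
    print(row)
```

The condition columns are passed in explicitly rather than guessed, since a subjects file also contains non-condition columns.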

However, some information about conditions is more complex, e.g. https://github.com/linkml/dm-bip/blob/main/toy_data/raw_data_conditions/conditions_complex-questions.tsv. Here the conditions information comes from survey questions: each participant/subject is asked the full set of questions, the answers may or may not overlap across participants/subjects, and the file has multiple lines per participant/subject, one for each survey question/answer.
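
To illustrate how the survey layout differs, here is a rough sketch that groups such a file by participant and collects the distinct answers per question (the smaller set that would actually need annotating when answers overlap). It assumes hypothetical `participant_id`, `question`, and `answer` columns; the real file's headers may differ:

```python
import csv
import io
from collections import defaultdict


# Hypothetical helper; column names are assumptions, not the real file's.
def summarize_survey(tsv_text, id_column="participant_id",
                     question_column="question", answer_column="answer"):
    """Group a long survey file that has several rows per participant.

    Returns two dicts:
      - answers_by_participant: participant -> list of (question, answer)
      - distinct_answers: question -> sorted distinct answers across everyone
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    answers_by_participant = defaultdict(list)
    distinct = defaultdict(set)
    for record in reader:
        question, answer = record[question_column], record[answer_column]
        # A list (not a dict) keeps multiple answers to the same question.
        answers_by_participant[record[id_column]].append((question, answer))
        distinct[question].add(answer)
    return (dict(answers_by_participant),
            {q: sorted(a) for q, a in distinct.items()})


# Hypothetical rows; Q1/Q2 stand in for real survey questions.
survey = (
    "participant_id\tquestion\tanswer\n"
    "P1\tQ1\tyes\n"
    "P1\tQ2\ttype A\n"
    "P2\tQ1\tno\n"
    "P2\tQ2\ttype A\n"
)
answers, distinct = summarize_survey(survey)
print(answers["P1"])
print(distinct)
```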

Currently, for the INCLUDE project, the Data Intake team reformats the "conditions" data. Taking the non-survey-question conditions as the first example, the information is reformatted so that each unique condition value appears once in the file. That file is annotated with Harmonica, and the resulting annotations file includes additional columns for the ontology CURIE and label for each ontology the data is annotated with. However, the steps performed before and after annotation to produce this file are not known.
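
As a sketch of what that pre-annotation file could look like, the snippet below reduces long condition rows to one row per unique condition and adds empty CURIE/label placeholder columns per ontology. The `<prefix>_curie`/`<prefix>_label` naming and the `MONDO`/`HP` prefixes are assumptions for illustration, not the format the Data Intake team or Harmonica actually uses:

```python
import csv
import io


# Hypothetical helper; column naming is an assumption.
def unique_conditions_tsv(long_rows, ontologies):
    """Build a TSV with one row per distinct condition value.

    For each ontology prefix, empty <prefix>_curie and <prefix>_label columns
    are added as placeholders for the annotation results.
    """
    conditions = sorted({row["condition"] for row in long_rows})
    fieldnames = ["condition"]
    for prefix in ontologies:
        fieldnames += [f"{prefix}_curie", f"{prefix}_label"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for condition in conditions:
        # Columns missing from the dict are written as "" (DictWriter restval).
        writer.writerow({"condition": condition})
    return out.getvalue()


long_rows = [
    {"participant_id": "P1", "condition": "asthma", "status": "yes"},
    {"participant_id": "P2", "condition": "asthma", "status": "no"},
    {"participant_id": "P1", "condition": "diabetes", "status": "no"},
]
text = unique_conditions_tsv(long_rows, ["MONDO", "HP"])
print(text)
```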

For the data ingest pipeline, it was initially discussed that the reformatting needed to create the conditions file would be done as manual, customized steps. More recently, there have been discussions about automating this more fully within the pipeline and changing the file I/O for Harmonica. One option is to have Harmonica annotate the conditions from the participant/subject file and add the annotations back into the participant/subject file. Another option is to include the annotation as a step within LinkMLMap.
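
For the first option, one possible way to add the annotations back is a simple join on the condition value, sketched below. It assumes the long rows produced by a pivot like the one in the first task and a hypothetical `annotations` mapping of condition to (CURIE, label); the `EX:` CURIE is a placeholder, not a real ontology term:

```python
# Hypothetical helper; not Harmonica's API.
def attach_annotations(long_rows, annotations):
    """Join per-condition annotations back onto participant-level rows.

    `annotations` maps a condition value to a (curie, label) pair; conditions
    with no annotation get empty strings so every row keeps the same columns.
    """
    merged = []
    for row in long_rows:
        curie, label = annotations.get(row["condition"], ("", ""))
        merged.append({**row, "condition_curie": curie,
                       "condition_label": label})
    return merged


long_rows = [
    {"participant_id": "P1", "condition": "asthma", "status": "yes"},
    {"participant_id": "P1", "condition": "diabetes", "status": "no"},
]
# Placeholder CURIE/label, not real ontology terms.
annotations = {"asthma": ("EX:0001", "example label for asthma")}
merged = attach_annotations(long_rows, annotations)
for row in merged:
    print(row)
```

Joining on the long form keeps one annotation per condition value regardless of how many participants share it; re-widening back to one row per participant would be a separate step.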


For the first option (annotating the participant/subject file and adding the annotations back), @amc-corey-cox, do you want Harmonica to take a config file, for example, that specifies which columns are conditions to annotate, or do you want that to be a separate pre-processing script? There are a couple of ways to extract the conditions information and then combine the annotated results with the subjects, so I would like to align the plans with you.
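
For discussion, a minimal sketch of what such a config might contain, assuming JSON and hypothetical `id_column`/`condition_columns` keys (no such config format exists yet). The check that every named column is present in the file header would be the same whether it lives in Harmonica or in a pre-processing script:

```python
import json


# Hypothetical config format and helper; not an existing Harmonica option.
def load_condition_config(config_text, header):
    """Read a JSON config naming the ID column and the condition columns,
    and fail early if any named column is missing from the file header."""
    config = json.loads(config_text)
    id_column = config["id_column"]
    condition_columns = list(config["condition_columns"])
    missing = [c for c in [id_column, *condition_columns] if c not in header]
    if missing:
        raise ValueError(f"columns not found in header: {missing}")
    return id_column, condition_columns


config_text = """
{
  "id_column": "participant_id",
  "condition_columns": ["asthma", "diabetes"]
}
"""
header = ["participant_id", "sex", "asthma", "diabetes"]
id_column, condition_columns = load_condition_config(config_text, header)
print(id_column, condition_columns)
```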


