Description
Discuss strategies, methodologies, and approaches that we implemented and/or considered.
Need to assess precision and recall.
Methods: Describe the strategies you used to detect the different types, and the benefits and limitations of each strategy. Provide a link to the GitHub repo where your code for this task resides.
Evaluation: Report the precision and recall for each strategy. To compute precision and recall, you need the true type of each column: the team members will collaboratively and manually label the columns with their true types. Note that a given column may have values of different types; therefore, its true type may consist of multiple labels. Include only labels for types that occur frequently, and omit any outliers. Include visualizations that summarize your findings, for example, a histogram showing, for each type, the number of columns in which that type appears, and a visualization showing the prevalence of heterogeneous columns, i.e., columns whose values belong to multiple types.
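Because a column's true type can be a set of labels, precision and recall are easiest to compute per type across all columns. The following is a minimal Python sketch of one way to do that, and to produce the counts behind the two suggested visualizations. The function names, column names, and type labels (`phone`, `city`, `borough`, `zip`) are hypothetical and chosen only for illustration; they are not taken from the project code.

```python
from collections import Counter


def per_type_precision_recall(true_labels, predicted_labels):
    """Compute (precision, recall) for each type label.

    Both arguments map a column name to a set of type labels.
    A value of None means the metric is undefined for that label
    (no predictions for precision, no true occurrences for recall).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for column, truth in true_labels.items():
        predicted = predicted_labels.get(column, set())
        for label in predicted & truth:   # predicted and correct
            tp[label] += 1
        for label in predicted - truth:   # predicted but wrong
            fp[label] += 1
        for label in truth - predicted:   # true label that was missed
            fn[label] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted_total = tp[label] + fp[label]
        actual_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else None
        recall = tp[label] / actual_total if actual_total else None
        scores[label] = (precision, recall)
    return scores


def type_histogram(labels_by_column):
    """Count, for each type, the number of columns in which it appears."""
    return Counter(
        label for labels in labels_by_column.values() for label in labels
    )


def heterogeneous_fraction(labels_by_column):
    """Fraction of columns that carry more than one type label."""
    if not labels_by_column:
        return 0.0
    mixed = sum(1 for labels in labels_by_column.values() if len(labels) > 1)
    return mixed / len(labels_by_column)


# Hypothetical manual labels and detector output for four columns.
true_labels = {
    "col_a": {"phone"},
    "col_b": {"city", "borough"},
    "col_c": {"city"},
    "col_d": {"zip"},
}
predicted_labels = {
    "col_a": {"phone"},
    "col_b": {"city"},
    "col_c": {"city", "zip"},
    "col_d": {"zip"},
}

scores = per_type_precision_recall(true_labels, predicted_labels)
histogram = type_histogram(true_labels)
mixed_share = heterogeneous_fraction(true_labels)
```

The `histogram` counts can be passed directly to a bar chart, and `mixed_share` gives the prevalence of heterogeneous columns. Running the same functions once per detection strategy yields the per-strategy comparison the evaluation asks for.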