Add embeddings and search file support #35

hallacy · 2021-09-30T04:34:28Z

Adds support for the embeddings endpoint
Add search.prepare_data to the CLI to validate search files for upload

* Add validators for search files
* Clean up fields

rachellim · 2021-10-01T00:43:19Z

openai/cli.py

+        "-f",
+        "--file",
+        required=True,
+        help="JSONL, JSON, CSV, TSV, TXT or XLSX file containing prompt-completion examples to be analyzed."


Copy pasta of prompt-completion from fine tuning?

rachellim · 2021-10-01T00:45:22Z

openai/cli.py

+    sub.add_argument(
+        "-p",
+        "--purpose",
+        help="Why are you uploading this file? (see https://beta.openai.com/docs/api-reference/ for purposes)",


It seems unintuitive to me that the Search purpose (which is what this is ostensibly under) is further split into different purposes. Maybe each of them should be a different tool instead? (Or do they all fall under the search umbrella, and if so, how? We can chat outside this PR if I'm missing some context.)
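For readers without the full diff, here is a rough sketch of how the two flags under review could be wired into an `argparse` sub-command. It is not the code from this PR: the program name `example-cli`, the help strings, and the sample values are hypothetical, and only the flag names and the `search.prepare_data` command name come from the thread above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with one sub-command (layout is an assumption)."""
    parser = argparse.ArgumentParser(prog="example-cli")  # hypothetical program name
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Command name taken from the PR description; everything else is illustrative.
    sub = subparsers.add_parser(
        "search.prepare_data",
        help="Validate a search file before upload.",
    )
    sub.add_argument(
        "-f",
        "--file",
        required=True,
        # Hypothetical help text that describes search data rather than
        # prompt-completion examples, in the spirit of the first review comment.
        help="File containing the search documents to be analyzed.",
    )
    sub.add_argument(
        "-p",
        "--purpose",
        help="Why are you uploading this file?",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["search.prepare_data", "-f", "docs.jsonl", "-p", "search"]
    )
    print(args.command, args.file, args.purpose)
```

Because `--purpose` is optional here, `args.purpose` is `None` when the flag is omitted. Whether the real command should require it, or restrict it to a fixed set of choices, is exactly the open question raised in the review.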

hallacy · 2021-10-01T16:21:01Z

I think we're gonna split the validation code from the embeddings code. Closing this request down

* Add CLI option to download files (#34)

* Option to check if file has been uploaded in the past before uploading (#33)

  The check is done based on filename, file purpose and file size

* Add fine-tuning hparams directly into the fine-tunes CLI (#35)

* update fine_tunes cli use_packing argument (#38)

* A file verification and remediation tool. It applies the following validations:
  - prints the number of examples, and warns if it's lower than 100
  - ensures prompt and completion columns are present
  - optionally removes any additional columns
  - ensures all completions are non-empty
  - infers which type of fine-tuning the data is most likely in (classification, conditional generation and open-ended generation)
  - optionally removes duplicate rows
  - infers the existence of a common suffix, and if there is none, suggests one for classification and conditional generation
  - optionally prepends a space to each completion, to make tokenization better
  - optionally splits into training and validation set for the classification use case
  - optionally ensures there's an ending string for all completions
  - optionally lowercases completions or prompts if more than a 1/3 of alphanumeric characters are upper case

  It interactively asks the user to accept or reject recommendations. If the user is happy, then it saves the modified output file as a jsonl, which is ready for being used in fine-tuning with the printed command.

* Completion: remove from kwargs before passing to EngineAPI (#37)

* Version bump before pushing to external

Co-authored-by: Todor Markov <[email protected]>
Co-authored-by: Boris Power <[email protected]>
Co-authored-by: Dave Cummings <[email protected]>
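A few of the validations listed in the commit message above can be sketched in plain Python. This is an illustration under stated assumptions, not the tool's actual implementation: `check_examples` and `uppercase_fraction` are hypothetical names, rows are assumed to be dictionaries with string values, and the steps the commit describes as optional and interactive are applied unconditionally here.

```python
MIN_EXAMPLES = 100  # warning threshold named in the commit message


def uppercase_fraction(texts):
    """Fraction of alphanumeric characters that are upper case."""
    alnum = [ch for text in texts for ch in text if ch.isalnum()]
    if not alnum:
        return 0.0
    return sum(ch.isupper() for ch in alnum) / len(alnum)


def check_examples(rows):
    """Return (cleaned_rows, messages) for a list of example dicts.

    Hypothetical helper: the real tool asks the user before each change.
    """
    messages = [f"Found {len(rows)} examples"]
    if len(rows) < MIN_EXAMPLES:
        messages.append(f"Warning: fewer than {MIN_EXAMPLES} examples")

    # Ensure the prompt and completion columns are present in every row.
    for i, row in enumerate(rows):
        missing = {"prompt", "completion"} - set(row)
        if missing:
            raise ValueError(f"row {i} is missing {sorted(missing)}")

    # Drop additional columns and rows whose completion is empty.
    cleaned = [
        {"prompt": r["prompt"], "completion": r["completion"]}
        for r in rows
        if r["completion"].strip()
    ]
    if len(cleaned) < len(rows):
        messages.append(f"Dropped {len(rows) - len(cleaned)} empty completions")

    # Remove duplicate rows while preserving the original order.
    seen = set()
    unique = []
    for r in cleaned:
        key = (r["prompt"], r["completion"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    if len(unique) < len(cleaned):
        messages.append(f"Removed {len(cleaned) - len(unique)} duplicate rows")

    # Suggest lowercasing when more than 1/3 of alphanumerics are upper case.
    if uppercase_fraction(r["completion"] for r in unique) > 1 / 3:
        messages.append("Suggestion: lowercase completions")

    return unique, messages
```

The sketch returns its findings as messages rather than prompting, which keeps it testable. An interactive version would ask the user to accept or reject each message before applying the matching change.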



hallacy added 2 commits September 29, 2021 21:22

Validate search files (#69)

c902a55

* Add validators for search files
* Clean up fields

Add embeddings endpoint (#85)

c2d5bf8

hallacy requested review from rachellim, BorisPower and madeleineth September 30, 2021 04:34

rachellim reviewed Oct 1, 2021


hallacy closed this Oct 1, 2021

safa0 pushed a commit to safa0/openai-agents-python that referenced this pull request Apr 27, 2025

Merge pull request openai#35 from AliYmn/28

b6c9572

docs: Fix typos in documentation files
