Skip to content

Commit 0c253ad

Browse files
committed
OpenAI API example to create instruction examples
1 parent 9c93300 commit 0c253ad

File tree

5 files changed

+1653
-4
lines changed

5 files changed

+1653
-4
lines changed

ch07/02_dataset-utilities/README.md

Lines changed: 17 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -11,8 +11,8 @@ pip install -r requirements-extra.txt
1111

1212

1313

14-
15-
### Finding near duplicates
14+
 
15+
## Finding Near-duplicates
1616

1717
The `find-near-duplicates.py` function can be used to identify duplicates and near-duplicates in an instruction dataset. For example,
1818

@@ -23,6 +23,7 @@ python find-near-duplicates.py --json_file instruction-examples.json
2323
```
2424

2525
```
26+
scikit-learn version: 1.3.1
2627
2728
2829
==================================================
@@ -69,3 +70,17 @@ Duplicate pair found with similarity 1.00:
6970
7071
```
7172

73+
74+
 
75+
## Creating Passive Voice Entries
76+
77+
- The [create-passive-voice-entries.ipynb](create-passive-voice-entries.ipynb) notebook uses OpenAI's GPT-4 to create "passive voice" entries for an instruction dataset, as shown in the example below
78+
79+
```python
80+
{
81+
'instruction': 'Identify the verb in the following sentence',
82+
'input': 'The cat sleeps on the couch.',
83+
'output': 'The verb in the sentence is "sleeps."',
84+
'output_2': 'The sentence is "sleeps."' # <---- Newly created entry
85+
}
86+
```

0 commit comments

Comments
 (0)