Commit e0cc6f7

new: remove confusing arg 'import_mode' and set it automatically depending on if imported or launched from cli
Signed-off-by: thiswillbeyourgithub <[email protected]>
1 parent 9995843 commit e0cc6f7

7 files changed: +15 -22 lines changed

README.md

Lines changed: 1 addition & 2 deletions
@@ -108,7 +108,7 @@ wdoc --path=$link --task=summarize --filetype="online_pdf"
 * **Markdown formatted answers and summaries**: using [rich](https://github.com/Textualize/rich).
 * **Sane embeddings**: By default use sophisticated embeddings like [multi query retrievers](https://python.langchain.com/docs/how_to/MultiQueryRetriever) but also include SVM, KNN, parent retriever etc. Customizable.
 * **Fully documented** Lots of docstrings, lots of in code comments, detailed `--help` etc. Take a look at the [examples.md](https://github.com/thiswillbeyourgithub/wdoc/blob/main/wdoc/docs/examples.md) for a list of shell and python examples. The full help can be found in the file [help.md](https://github.com/thiswillbeyourgithub/wdoc/docs/help.md) or via `python -m wdoc --help`. I work hard to maintain an exhaustive documentation. The complete documentation in a single page is available [on the website](https://wdoc.readthedocs.io/en/latest/all_docs.html).
-* **Scriptable / Extensible**: You can use `wdoc` in other python project using `--import_mode`. Take a look at the scripts [below](#scripts-made-with-wdoc). There is even [an open-webui Tool](https://openwebui.com/t/qqqqqqqqqqqqqqqqqqqq/wdoctool).
+* **Scriptable / Extensible**: You can use `wdoc` as an executable or as a library. Take a look at the scripts [below](#scripts-made-with-wdoc). There is even [an open-webui Tool](https://openwebui.com/t/qqqqqqqqqqqqqqqqqqqq/wdoctool).
 * **Statically typed**: Runtime type checking. Opt out with an environment flag: `WDOC_TYPECHECKING="disabled / warn / crash" wdoc` (by default: `warn`). Thanks to [beartype](https://beartype.readthedocs.io/en/latest/) it shouldn't even slow down the code!
 * **LLM (and embeddings) caching**: speed things up, as well as index storing and loading (handy for large collections).
 * **Good PDF parsing** PDF parsers are notoriously unreliable, so 15 (!) different loaders are used, and the best according to a parsing scorer is kept. Including table support via [openparse](https://github.com/Filimoa/open-parse/) (no GPU needed by default) or via [UnstructuredPDFLoader](https://python.langchain.com/docs/integrations/document_loaders/unstructured_pdfloader/).
@@ -136,7 +136,6 @@ Click to read more
 - add test for each loader
 - the logit bias is wrong for openai models: the token is specific to a given family of model
 - rewrite the python API to make it more useable. (also related to https://github.com/thiswillbeyourgithub/wdoc/issues/13)
-- be careful to how to use import_mode
 - pay attention to how to modify the init and main.py files
 - pay attention to how the --help flag works
 - pay attention to how the USAGE document is structured

scripts/AnkiFiltered/AnkiFilteredDeckCreator.py

Lines changed: 0 additions & 1 deletion
@@ -76,7 +76,6 @@ def __init__(
 instance = wdoc(
     query_eval_modelname=query_eval_modelname,
     task=task,
-    import_mode=True,
     query=query,
     **kwargs,
 )

scripts/TheFiche/TheFiche.py

Lines changed: 0 additions & 1 deletion
@@ -73,7 +73,6 @@ def run_wdoc(query: str, kwargs2: dict) -> Tuple[wdoc, dict]:
 "call to wdoc, optionaly cached"
 instance = wdoc(
     task="query",
-    import_mode=True,
     query=query,
     **kwargs2,
 )

tests/test_wdoc.py

Lines changed: 0 additions & 3 deletions
@@ -321,7 +321,6 @@ def test_summary_tim_urban():
     filetype="auto",
     debug=False,
     verbose=False,
-    import_mode=True,
 )
 out = inst.summary_task()
 assert "tim urban" in out["summary"].lower()
@@ -382,7 +381,6 @@ def test_query_tim_urban():
     filetype="auto",
     debug=False,
     verbose=False,
-    import_mode=True,
 )
 out = inst.query_task(
     query="What is the allegory used by the speaker",
@@ -407,7 +405,6 @@ def test_whisper_tim_urban():
     whisper_lang="en",
     debug=False,
     verbose=False,
-    import_mode=True,
 )

wdoc/__main__.py

Lines changed: 3 additions & 0 deletions
@@ -18,6 +18,9 @@
 from .wdoc import is_verbose, wdoc, whi, deb
 from .utils.misc import piped_input

+# if __main__ is called, then we are using the cli instead of importing the class from python
+wdoc.__import_mode__ = False
+

 def cli_launcher() -> None:
     """entry point function, modifies arguments on the fly for easier
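
The mechanism this change relies on can be sketched in isolation. The following is a minimal illustration, not wdoc's actual code: `Tool` and its `run` method are hypothetical stand-ins for the `wdoc` class, and `cli_launcher` here only mimics the role of the real entry point. The sketch assumes the behaviour described in the removed help text, namely that import mode returns the result instead of printing it.

```python
from typing import Optional


class Tool:
    """Hypothetical stand-in for the wdoc class (not the real API)."""

    # Class-level default: assume the class was imported as a library.
    # Names that start AND end with two underscores are exempt from
    # Python's private name mangling, so `self.__import_mode__` works.
    __import_mode__: bool = True

    def run(self, text: str) -> Optional[str]:
        result = text.upper()
        if self.__import_mode__:
            # Library use: hand the result back to the caller.
            return result
        # CLI use: print the result instead of returning it.
        print(result)
        return None


def cli_launcher() -> None:
    """Hypothetical entry point: flips the flag before using the class.

    In the commit the flip happens at module level in wdoc/__main__.py;
    it is placed inside a function here only to keep the sketch testable.
    """
    Tool.__import_mode__ = False
    Tool().run("hello")
```

Since the flag is a class attribute, flipping it affects every instance created afterwards in the same process, which is likely acceptable when the only writer is the CLI entry point. A caller importing the class never has to pass anything and gets return values by default.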

wdoc/docs/help.md

Lines changed: 0 additions & 5 deletions
@@ -351,11 +351,6 @@
 information to a remote server, you can use `---private`.
 Note that the values of `llms_api_bases` are whitelisted when using `private`.

-* `--import_mode`: bool, default `False`
-* if True, will return the answer from query instead of printing it.
-The idea is to use if when you import wdoc instead of running
-it from the cli.
-
 * `--disable_md_printing`: bool, default `True` if in a pipe and `False` otherwise.
 * if True, instead of using rich to display some information, default to simpler colored prints.
 * Naturally this is disablef if we are in a pipe, for example if you want to

wdoc/wdoc.py

Lines changed: 11 additions & 10 deletions
@@ -122,6 +122,7 @@ class wdoc:
 VERSION: str = "2.8.0"
 allowed_extra_args = extra_args_types
 md_printer = md_printer
+__import_mode__: bool = True

 @optional_typecheck
 @set_func_signature
@@ -154,7 +155,6 @@ def __init__(
 file_loader_n_jobs: int = -1,
 private: Union[bool, int] = False,
 llms_api_bases: Optional[Union[dict, str]] = None,
-import_mode: Union[bool, int] = False,
 disable_md_printing: bool = is_piped,
 out_file: Optional[Union[str, Path]] = None,
 oneoff: bool = False,
@@ -445,7 +445,6 @@ def print_exception(exc_type, exc_value, exc_traceback):
 self.file_loader_parallel_backend = file_loader_parallel_backend
 self.file_loader_n_jobs = file_loader_n_jobs
 self.llms_api_bases = llms_api_bases
-self.import_mode = import_mode
 self.oneoff = oneoff

 if disable_llm_cache:
@@ -593,7 +592,7 @@ def print_exception(exc_type, exc_value, exc_traceback):
 if self.task in ["query", "search", "summary_then_query"]:
     self.prepare_query_task()

-if self.import_mode:
+if self.__import_mode__:
     deb(
         "Ready to query or summarize, call your_instance.query_task(your_question)"
     )
@@ -785,7 +784,7 @@ def summarize_documents(
 if self.summary_n_recursion > 0:
     for n_recur in range(1, self.summary_n_recursion + 1):
         summary_text = copy.deepcopy(recursive_summaries[n_recur - 1])
-        if not self.import_mode:
+        if not self.__import_mode__:
             red(f"Doing summary check #{n_recur} of {item_name}")

         # remove any chunk count that is not needed to summarize
@@ -866,7 +865,7 @@ def summarize_documents(
 )
 if prev_real_text is not MISSING:
     if real_text == prev_real_text:
-        if not self.import_mode:
+        if not self.__import_mode__:
             red(
                 f"Identical summary after {n_recur} "
                 "recursion, adding more recursion will not "
@@ -878,7 +877,7 @@ def summarize_documents(

 assert n_recur not in recursive_summaries
 if summary_text not in recursive_summaries:
-    if not self.import_mode:
+    if not self.__import_mode__:
         red(
             f"Identical summary after {n_recur} "
             "recursion, adding more recursion will not "
@@ -891,7 +890,7 @@ def summarize_documents(

 best_sum_i = max(list(recursive_summaries.keys()))
 doc_total_tokens = doc_total_tokens_in + doc_total_tokens_out
-if not self.import_mode:
+if not self.__import_mode__:
     print("\n\n")
     md_printer("# Summary")
     md_printer(f"## {path}")
@@ -915,7 +914,9 @@ def summarize_documents(

 # save to output file
 if self.out_file:
-    assert not self.import_mode, "Can't use import_mode with --out_file"
+    assert (
+        not self.__import_mode__
+    ), "Can't use __import_mode__ with --out_file"
     for nrecur, sum in recursive_summaries.items():
         out_file = Path(self.out_file)
         if len(recursive_summaries) > 1 and nrecur < max(
@@ -958,7 +959,7 @@ def summarize_documents(
     relevant_docs=self.loaded_docs,
 )

-if not self.import_mode:
+if not self.__import_mode__:
     red(
         self.ntfy(
             f"Total cost of those summaries: {results['doc_total_tokens']} tokens for ${results['doc_total_cost']:.5f} (estimate was ${estimate_dol:.5f})"
@@ -1683,7 +1684,7 @@ def retrieve_documents(inputs):
 if len(docs) < self.interaction_settings["top_k"]:
     red(f"Only found {len(docs)} relevant documents")

-if self.import_mode:
+if self.__import_mode__:
     if "unfiltered_docs" in output:
         red(
             f"Number of documents using embeddings: {len(output['unfiltered_docs'])}"
