From 6c5975d6759c9ff71b18f538bbdf377ab1f6e571 Mon Sep 17 00:00:00 2001
From: Andrew Sears
Date: Sun, 7 Apr 2024 22:57:01 -0400
Subject: [PATCH 1/4] fix scikit link

---
 nbs/2. Topic Modeling with NMF and SVD.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nbs/2. Topic Modeling with NMF and SVD.ipynb b/nbs/2. Topic Modeling with NMF and SVD.ipynb
index 9b09ca1..85bd50b 100644
--- a/nbs/2. Topic Modeling with NMF and SVD.ipynb
+++ b/nbs/2. Topic Modeling with NMF and SVD.ipynb
@@ -100,7 +100,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "- [Data source](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html): Newsgroups are discussion groups on Usenet, which was popular in the 80s and 90s before the web really took off. This dataset includes 18,000 newsgroups posts with 20 topics.\n",
+ "- [Data source](https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html#loading-the-20-newsgroups-dataset): Newsgroups are discussion groups on Usenet, which was popular in the 80s and 90s before the web really took off. This dataset includes 18,000 newsgroups posts with 20 topics.\n",
 "- [Chris Manning's book chapter](https://nlp.stanford.edu/IR-book/pdf/18lsi.pdf) on matrix factorization and LSI \n",
 "- Scikit learn [truncated SVD LSI details](http://scikit-learn.org/stable/modules/decomposition.html#lsa)\n",
 "\n",

From 046980aafc9c76dbb82c7d108d0dff628abdf29e Mon Sep 17 00:00:00 2001
From: Andrew Sears
Date: Sun, 7 Apr 2024 23:01:41 -0400
Subject: [PATCH 2/4] Fix reuters-21578 link

---
 nbs/2. Topic Modeling with NMF and SVD.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nbs/2. Topic Modeling with NMF and SVD.ipynb b/nbs/2. Topic Modeling with NMF and SVD.ipynb
index 85bd50b..0d7c743 100644
--- a/nbs/2. Topic Modeling with NMF and SVD.ipynb
+++ b/nbs/2. Topic Modeling with NMF and SVD.ipynb
@@ -105,7 +105,7 @@
 "- Scikit learn [truncated SVD LSI details](http://scikit-learn.org/stable/modules/decomposition.html#lsa)\n",
 "\n",
 "### Other Tutorials\n",
- "- [Scikit-Learn: Out-of-core classification of text documents](http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html): uses [Reuters-21578](https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection) dataset (Reuters articles labeled with ~100 categories), HashingVectorizer\n",
+ "- [Scikit-Learn: Out-of-core classification of text documents](http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html): uses [Reuters-21578](https://archive.ics.uci.edu/dataset/137/reuters+21578+text+categorization+collection) dataset (Reuters articles labeled with ~100 categories), HashingVectorizer\n",
 "- [Text Analysis with Topic Models for the Humanities and Social Sciences](https://de.dariah.eu/tatom/index.html): uses [British and French Literature dataset](https://de.dariah.eu/tatom/datasets.html) of Jane Austen, Charlotte Bronte, Victor Hugo, and more"
 ]
 },

From 097774493be39f9cdb822c9deb7a9f118439dd00 Mon Sep 17 00:00:00 2001
From: Andrew Sears
Date: Sun, 7 Apr 2024 23:05:13 -0400
Subject: [PATCH 3/4] Fix Dariah Tatom links

---
 nbs/2. Topic Modeling with NMF and SVD.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nbs/2. Topic Modeling with NMF and SVD.ipynb b/nbs/2. Topic Modeling with NMF and SVD.ipynb
index 0d7c743..1bb27e0 100644
--- a/nbs/2. Topic Modeling with NMF and SVD.ipynb
+++ b/nbs/2. Topic Modeling with NMF and SVD.ipynb
@@ -106,7 +106,7 @@
 "\n",
 "### Other Tutorials\n",
 "- [Scikit-Learn: Out-of-core classification of text documents](http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html): uses [Reuters-21578](https://archive.ics.uci.edu/dataset/137/reuters+21578+text+categorization+collection) dataset (Reuters articles labeled with ~100 categories), HashingVectorizer\n",
- "- [Text Analysis with Topic Models for the Humanities and Social Sciences](https://de.dariah.eu/tatom/index.html): uses [British and French Literature dataset](https://de.dariah.eu/tatom/datasets.html) of Jane Austen, Charlotte Bronte, Victor Hugo, and more"
+ "- [Text Analysis with Topic Models for the Humanities and Social Sciences](https://github.com/DARIAH-DE/tatom): uses [British and French Literature dataset](https://github.com/DARIAH-DE/tatom/tree/develop/data) of Jane Austen, Charlotte Bronte, Victor Hugo, and more"
 ]
 },
 {

From a30e70ab0e54b7e540c70620b31dbb15fed73f5b Mon Sep 17 00:00:00 2001
From: Andrew Sears
Date: Sun, 7 Apr 2024 23:13:02 -0400
Subject: [PATCH 4/4] fix facebook blog link

---
 nbs/2. Topic Modeling with NMF and SVD.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nbs/2. Topic Modeling with NMF and SVD.ipynb b/nbs/2. Topic Modeling with NMF and SVD.ipynb
index 1bb27e0..7182fd9 100644
--- a/nbs/2. Topic Modeling with NMF and SVD.ipynb
+++ b/nbs/2. Topic Modeling with NMF and SVD.ipynb
@@ -418,7 +418,7 @@
 "The SVD algorithm factorizes a matrix into one matrix with **orthogonal columns** and one with **orthogonal rows** (along with a diagonal matrix, which contains the **relative importance** of each factor).\n",
 "\n",
 "\"\"\n",
- "(source: [Facebook Research: Fast Randomized SVD](https://research.fb.com/fast-randomized-svd/))\n",
+ "(source: [Facebook Research: Fast Randomized SVD](https://research.facebook.com/blog/2014/9/fast-randomized-svd/))\n",
 "\n",
 "SVD is an **exact decomposition**, since the matrices it creates are big enough to fully cover the original matrix. SVD is extremely widely used in linear algebra, and specifically in data science, including:\n",
 "\n",