diff --git a/nbs/2. Topic Modeling with NMF and SVD.ipynb b/nbs/2. Topic Modeling with NMF and SVD.ipynb
index 9b09ca1..7182fd9 100644
--- a/nbs/2. Topic Modeling with NMF and SVD.ipynb
+++ b/nbs/2. Topic Modeling with NMF and SVD.ipynb
@@ -100,13 +100,13 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "- [Data source](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html): Newsgroups are discussion groups on Usenet, which was popular in the 80s and 90s before the web really took off. This dataset includes 18,000 newsgroups posts with 20 topics.\n",
+ "- [Data source](https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html#loading-the-20-newsgroups-dataset): Newsgroups are discussion groups on Usenet, which was popular in the 80s and 90s before the web really took off. This dataset includes 18,000 newsgroups posts with 20 topics.\n",
 "- [Chris Manning's book chapter](https://nlp.stanford.edu/IR-book/pdf/18lsi.pdf) on matrix factorization and LSI \n",
 "- Scikit learn [truncated SVD LSI details](http://scikit-learn.org/stable/modules/decomposition.html#lsa)\n",
 "\n",
 "### Other Tutorials\n",
- "- [Scikit-Learn: Out-of-core classification of text documents](http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html): uses [Reuters-21578](https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection) dataset (Reuters articles labeled with ~100 categories), HashingVectorizer\n",
- "- [Text Analysis with Topic Models for the Humanities and Social Sciences](https://de.dariah.eu/tatom/index.html): uses [British and French Literature dataset](https://de.dariah.eu/tatom/datasets.html) of Jane Austen, Charlotte Bronte, Victor Hugo, and more"
+ "- [Scikit-Learn: Out-of-core classification of text documents](http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html): uses [Reuters-21578](https://archive.ics.uci.edu/dataset/137/reuters+21578+text+categorization+collection) dataset (Reuters articles labeled with ~100 categories), HashingVectorizer\n",
+ "- [Text Analysis with Topic Models for the Humanities and Social Sciences](https://github.com/DARIAH-DE/tatom): uses [British and French Literature dataset](https://github.com/DARIAH-DE/tatom/tree/develop/data) of Jane Austen, Charlotte Bronte, Victor Hugo, and more"
 ]
 },
 {
@@ -418,7 +418,7 @@
 "The SVD algorithm factorizes a matrix into one matrix with **orthogonal columns** and one with **orthogonal rows** (along with a diagonal matrix, which contains the **relative importance** of each factor).\n",
 "\n",
 "\"\"\n",
- "(source: [Facebook Research: Fast Randomized SVD](https://research.fb.com/fast-randomized-svd/))\n",
+ "(source: [Facebook Research: Fast Randomized SVD](https://research.facebook.com/blog/2014/9/fast-randomized-svd/))\n",
 "\n",
 "SVD is an **exact decomposition**, since the matrices it creates are big enough to fully cover the original matrix. SVD is extremely widely used in linear algebra, and specifically in data science, including:\n",
 "\n",