This repository contains a collection of Jupyter notebooks and resources for learning and experimenting with Natural Language Processing (NLP) and Large Language Models (LLMs). The project is organized into different subdirectories, each focusing on specific topics or tutorials.

    .gitignore
    README.md
    notebooks/
        huggingface_learning/
            datasets_tutorial.ipynb
        nlp_learning/
            tokenization.ipynb
        SQLDB_chain/
            experiment_001.ipynb

- Hugging Face Learning
  - `datasets_tutorial.ipynb`: Demonstrates how to use the Hugging Face `datasets` library to load and process datasets such as WMT14 for machine translation tasks.
- NLP Learning
  - `tokenization.ipynb`: Explores tokenization techniques using `spaCy` for English and French text, vocabulary building, and dataset preprocessing for translation tasks.
- SQLDB Chain
  - `experiment_001.ipynb`: Experiments with the `langchain_community` library for SQL database tools, including error handling and debugging.
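
The vocabulary-building step mentioned for `tokenization.ipynb` can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it splits on whitespace as a stand-in for the `spaCy` tokenizer so that it runs with the standard library alone, and the special tokens and the names `SPECIALS`, `build_vocab`, and `encode` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical special tokens; the notebook may use a different set or order.
SPECIALS = ["<pad>", "<unk>", "<bos>", "<eos>"]


def build_vocab(sentences, min_freq=1):
    """Map each token to an integer id, reserving the first ids for SPECIALS."""
    # Whitespace splitting is a stand-in for a real tokenizer such as spaCy.
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    # Sort by descending frequency, then alphabetically, for a stable ordering.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Convert a sentence to ids, mapping unseen tokens to <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in sentence.lower().split()]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]


english = ["the cat sleeps", "the dog runs"]
vocab_en = build_vocab(english)
print(encode("the cat barks", vocab_en))
```

A translation setup would typically build one such vocabulary per language (for example, one for English and one for French) and raise `min_freq` to drop rare tokens.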
- Python 3.11 or higher
- Jupyter Notebook
- Required Python libraries:
  - `spacy`
  - `datasets`
  - `sqlalchemy`
  - `torch`
  - `langchain_community`
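
One way to confirm these requirements before opening a notebook is a small check like the one below. It is a sketch that assumes each library is importable under the name listed above; the helper names `missing_modules` and `python_ok` are made up for this example and are not part of the repository.

```python
import sys
from importlib.util import find_spec

# Import names taken from the requirements list above.
REQUIRED = ["spacy", "datasets", "sqlalchemy", "torch", "langchain_community"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


def python_ok(minimum=(3, 11)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


print("Python version OK:", python_ok())
print("Missing libraries:", missing_modules(REQUIRED))
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as `torch`.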
- Clone the repository:

      git clone <repository-url>
      cd NLP_LLM_Coding

- Create a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Download `spaCy` language models:

      python -m spacy download en_core_web_sm
      python -m spacy download fr_core_news_sm
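
After the setup steps above, a quick sanity check along these lines can confirm that the virtual environment is active and that both language models were installed. It is a sketch under two assumptions: the environment was created with the standard `venv` module (so `sys.prefix` differs from `sys.base_prefix`), and the downloaded `spaCy` models are installed as regular Python packages under their model names. The function names are hypothetical.

```python
import sys
from importlib.util import find_spec

# Model names taken from the download commands above.
SPACY_MODELS = ["en_core_web_sm", "fr_core_news_sm"]


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_models(models=SPACY_MODELS):
    """Return the model packages that cannot be found on the import path."""
    return [name for name in models if find_spec(name) is None]


print("Virtual environment active:", in_virtualenv())
print("spaCy models not installed:", missing_models())
```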
- Open the Jupyter notebooks in the `notebooks/` directory to explore the tutorials and experiments:

      jupyter notebook

This project is licensed under the MIT License. See the LICENSE file for details.
- Hugging Face for the `datasets` library.
- spaCy for NLP tools.
- LangChain for SQL database tools.