FlashRead

A Spanish text processing and vocabulary flashcard application that analyzes Spanish literature and creates interactive learning tools.

Features

Text Processing: Analyze Spanish text files to extract word frequencies and create vocabulary lists
Interactive Flashcards: Pygame-based interface for vocabulary practice
Word Filtering: Filter words by length, frequency, and starting letters
Syllable Hyphenation: Spanish word syllabification using the silabeador library
Statistical Analysis: Visualize word frequency distributions with matplotlib and seaborn
Multiple Case Options: Display words in lowercase, UPPERCASE, or Title Case
Data Persistence: Save and load processed data for faster startup
Command-Line Interface: Easy-to-use CLI for data generation and app launching

Project Structure

flashread/
├── src/                          # Source code modules
│   ├── __init__.py              # Package initialization
│   ├── cli.py                   # Command-line interface
│   ├── text_processor.py        # Text processing and vocabulary management
│   ├── flashcard_app.py         # Pygame flashcard interface
│   └── utils.py                 # Shared utilities
├── data/                        # Generated data files
│   ├── word_frequencies.csv     # Processed word counts
│   ├── vocabulary.csv           # Filtered vocabulary
│   └── plots/                   # Generated visualizations
├── corpus/                      # Spanish text files for analysis
│   ├── Paulo Coelho - El Alquimista.txt
│   └── J.K. Rowling - Harry Potter (Books 1-7).txt
├── 04.txt - 08.txt             # Word lists for vocabulary filtering
├── flashread.py                # Main entry point
├── requirements.txt            # Python dependencies
└── README.md                   # This file

Installation

Clone the repository:

git clone <repository-url>
cd flashread

Recommended: Create and activate a virtual environment:

python -m venv flashread-env
source flashread-env/bin/activate  # On Linux/Mac
# or
flashread-env\Scripts\activate     # On Windows

Install the package in your virtual environment:

pip install -e .

This installs FlashRead and its dependencies, making the flashread command available only in your current virtual environment.

NLTK data will be downloaded automatically on first run.

Usage

FlashRead now provides a modern command-line interface with two main commands:

Generate Vocabulary Data

Process corpus files and generate vocabulary data:

# Basic generation
python flashread.py generate

# Generate with frequency plot
python flashread.py generate --plot

# Use custom parameters
python flashread.py generate --min-frequency 5 --corpus-dir my_corpus --data-dir my_data

# Force regeneration of existing data
python flashread.py generate --force

Generate Command Options:

--corpus-dir: Directory containing text files (default: corpus)
--data-dir: Directory for saving processed data (default: data)
--min-frequency: Minimum word frequency for vocabulary inclusion (default: 3)
--plot: Generate and display word frequency plot
--force: Force reprocessing even if data files exist

Launch Flashcard Interface

Run the interactive pygame flashcard application:

# Basic launch
python flashread.py run

# Use custom data directory
python flashread.py run --data-dir my_data

# Use specific vocabulary file
python flashread.py run --vocabulary custom_vocab.csv

Run Command Options:

--data-dir: Directory containing processed data (default: data)
--vocabulary: Vocabulary file to use (default: vocabulary.csv)
--corpus-dir: Corpus directory (used if vocabulary needs generation)

Interface Controls

Word Display: Random words appear at the top of the screen
Next Word: Click on the word to display a new random word
Length Sliders: Adjust minimum and maximum word length
Letter Toggles: Enable/disable words starting with specific letters
Case Selection: Choose between lowercase, UPPERCASE, or Title case
Hyphenation Toggle: Show/hide syllable breaks in Spanish words

Architecture

Class-Based Design

FlashRead uses a clean, object-oriented architecture:

TextProcessor Class (src/text_processor.py):

File processing and text analysis
Word frequency calculation
Vocabulary DataFrame creation
Statistical analysis and plotting
Data persistence (save/load processed data)

FlashCardApp Class (src/flashcard_app.py):

Pygame interface and UI management
Word display and formatting
User interaction handling
Application state management

FlashReadCLI Class (src/cli.py):

Command-line argument parsing
Orchestrates text processing and app launching
Progress reporting and error handling

Data Management

Persistent Storage: Generated files are stored in the data/ directory
Cached Processing: Word frequencies and vocabulary are saved to avoid reprocessing
Incremental Updates: Only reprocess when forced or when source files change
Organized Output: Plots and data files are systematically organized

Advanced Usage

Custom Word Lists

You can use custom word list files by placing them as 04.txt through 08.txt in the project root. These files should contain one word per line and will be filtered against the corpus word frequencies.

Data Analysis

Generate detailed frequency analysis:

from src.text_processor import TextProcessor

processor = TextProcessor()
processor.process_all_corpus_files()
processor.plot_word_frequencies(top_n=50, save_plot=True)
stats = processor.get_vocabulary_stats()
print(stats)

Custom Vocabulary Creation

Create vocabulary with different frequency thresholds:

processor = TextProcessor()
processor.load_word_frequencies()  # Load existing frequencies
vocab_df = processor.create_vocabulary_dataframe(min_frequency=10)
processor.save_vocabulary('high_frequency_vocab.csv')

Dependencies

nltk: Natural language processing
matplotlib: Data visualization
seaborn: Statistical plotting
pandas: Data manipulation
pygame: Interactive GUI
silabeador: Spanish syllabification

Development

Running Tests

# Test data generation
python flashread.py generate --corpus-dir corpus --data-dir test_data

# Test flashcard interface
python flashread.py run --data-dir test_data

Adding New Features

The modular architecture makes it easy to extend:

Text Processing: Add methods to TextProcessor class
UI Features: Extend FlashCardApp class
CLI Commands: Add new subparsers to FlashReadCLI
Utilities: Add shared functions to utils.py

Data Sources

The application works with Spanish literature texts including:

Paulo Coelho's "El Alquimista"
J.K. Rowling's Harry Potter series (Spanish translations)

Performance

Efficient Processing: Text files are processed once and cached
Smart Loading: Only loads necessary data for each operation
Memory Management: Large datasets are handled efficiently with pandas
Responsive UI: Pygame interface maintains 60 FPS performance

Contributing

Feel free to contribute by:

Adding more Spanish texts to the corpus
Improving the user interface
Adding new analysis features
Optimizing text processing algorithms
Adding tests and documentation

License

This project is for educational purposes. Please respect copyright of the included literary works.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
src		src
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CONTRIBUTE.md		CONTRIBUTE.md
README.md		README.md
flashread.py		flashread.py
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

FlashRead

Features

Project Structure

Installation

Usage

Generate Vocabulary Data

Launch Flashcard Interface

Interface Controls

Architecture

Class-Based Design

Data Management

Advanced Usage

Custom Word Lists

Data Analysis

Custom Vocabulary Creation

Dependencies

Development

Running Tests

Adding New Features

Data Sources

Performance

Contributing

License

About

Uh oh!

Releases

Packages

Languages

curcio/flashread

Folders and files

Latest commit

History

Repository files navigation

FlashRead

Features

Project Structure

Installation

Usage

Generate Vocabulary Data

Launch Flashcard Interface

Interface Controls

Architecture

Class-Based Design

Data Management

Advanced Usage

Custom Word Lists

Data Analysis

Custom Vocabulary Creation

Dependencies

Development

Running Tests

Adding New Features

Data Sources

Performance

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages