
Memory Re-tagging and Organization System

Henry edited this page Jul 20, 2025 · 1 revision

This guide presents a comprehensive three-phase approach to implementing a memory re-tagging and organization system for the MCP Memory Service.

Overview

As memory databases grow, maintaining consistent and meaningful tags becomes crucial for effective retrieval. This system addresses untagged memories, inconsistent tagging, and the need for regular organization.

Problem Statement

Common challenges in memory organization:

  • Memories created without tags or with generic tags
  • Inconsistent tag naming (e.g., "bug-fix" vs "debugging" vs "bugfix")
  • No systematic way to review and improve tags
  • Difficulty finding related memories due to poor tagging
  • Tag proliferation without hierarchical organization

Three-Phase Implementation

Phase Overview

  1. Phase 1: Manual recurring review process (immediate implementation)
  2. Phase 2: Automated tag suggestion system (medium-term)
  3. Phase 3: Integration with multi-layered consolidation (long-term)

Phase 1: Manual Recurring Review

Prompt-Based Review Process

Implement a recurring prompt template for regular maintenance:

Weekly Memory Maintenance Prompt:

"Memory Maintenance Mode: Review and tag untagged memories from last week.
Tasks:
1. Use retrieve_memory to find memories without proper tags
2. Analyze content semantically for themes
3. Suggest appropriate tags based on:
   - Project names (mcp-memory-service, dashboard, etc.)
   - Technologies (python, react, chromadb, etc.)
   - Activity types (debugging, implementation, research)
   - Status indicators (resolved, in-progress, blocked)
   - Domain categories (frontend, backend, devops)
4. Update memories with new tags
5. Document tagging decisions for consistency"

Implementation Steps

Step 1: Find Untagged Memories

async def find_untagged_memories(storage, time_range="last week"):
    """Find memories with missing or generic tags"""
    
    # Search broadly within the time range
    all_memories = await storage.recall_natural(time_range)
    
    untagged = []
    poorly_tagged = []
    
    for memory in all_memories:
        tags = memory.metadata.get("tags", [])
        
        if not tags:
            untagged.append(memory)
        elif len(tags) == 1 and tags[0] in ["test", "misc", "other"]:
            poorly_tagged.append(memory)
    
    return untagged, poorly_tagged

Step 2: Analyze Content for Themes

def analyze_content_themes(content):
    """Extract potential tags from content"""
    
    themes = {
        "technologies": [],
        "projects": [],
        "activities": [],
        "status": [],
        "domains": []
    }
    
    # Technology detection
    tech_keywords = {
        "python": ["python", "py", "pip", "asyncio", "pytest"],
        "javascript": ["javascript", "js", "node", "npm", "react"],
        "docker": ["docker", "container", "dockerfile"],
        "git": ["git", "commit", "branch", "merge"],
        "mcp": ["mcp", "protocol", "server.call_tool"],
    }
    
    content_lower = content.lower()
    
    for tech, keywords in tech_keywords.items():
        if any(keyword in content_lower for keyword in keywords):
            themes["technologies"].append(tech)
    
    # Activity detection
    if any(word in content_lower for word in ["debug", "fix", "error", "issue"]):
        themes["activities"].append("debugging")
    if any(word in content_lower for word in ["implement", "create", "build", "add"]):
        themes["activities"].append("implementation")
    if any(word in content_lower for word in ["test", "verify", "check"]):
        themes["activities"].append("testing")
    
    # Status detection
    if any(word in content_lower for word in ["resolved", "fixed", "complete"]):
        themes["status"].append("resolved")
    if any(word in content_lower for word in ["working on", "in progress", "wip"]):
        themes["status"].append("in-progress")
    
    return themes

Step 3: Re-tag Memories

from datetime import datetime

async def retag_memory(storage, memory, new_tags):
    """Re-tag a memory by storing an updated copy and removing the original"""
    
    # Create new memory with improved tags, preserving the
    # original tags for auditability
    updated_memory = Memory(
        content=memory.content,
        metadata={
            **memory.metadata,
            "tags": new_tags,
            "retagged_date": datetime.now().isoformat(),
            "original_tags": memory.metadata.get("tags", [])
        }
    )
    
    # Store the updated version first, then delete the old one, so a
    # failure between the two steps cannot lose the memory
    await storage.store(updated_memory)
    await storage.delete(memory.id)
    
    return updated_memory

Phase 2: Automated Tag Suggestion

Semantic Analysis System

Use existing ChromaDB embeddings to suggest tags:

class TagSuggestionEngine:
    def __init__(self, storage):
        self.storage = storage
        self.tag_embeddings = {}
    
    async def build_tag_profiles(self):
        """Build semantic profiles for each tag"""
        
        # Get all unique tags
        stats = await self.storage.get_stats()
        all_tags = stats.get("all_tags", [])
        
        for tag in all_tags:
            # Get memories with this tag
            memories = await self.storage.search_by_tag([tag])
            
            # Combine content of a sample of tagged memories
            combined_content = " ".join([m.content for m in memories[:10]])
            
            # Generate embedding for tag profile
            embedding = self.storage.embedding_function([combined_content])[0]
            self.tag_embeddings[tag] = embedding
    
    async def suggest_tags_for_memory(self, memory, top_k=5):
        """Suggest (tag, similarity) pairs based on semantic similarity"""
        
        # Get memory embedding
        memory_embedding = self.storage.embedding_function([memory.content])[0]
        
        # Calculate similarities with tag profiles
        # (cosine_similarity compares two embedding vectors)
        similarities = []
        for tag, tag_embedding in self.tag_embeddings.items():
            similarity = cosine_similarity(memory_embedding, tag_embedding)
            similarities.append((tag, similarity))
        
        # Return the top (tag, score) pairs so callers can apply
        # a confidence threshold to the scores
        similarities.sort(key=lambda x: x[1], reverse=True)
        return similarities[:top_k]

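The engine assumes a `cosine_similarity` helper for comparing embedding vectors. A minimal stdlib-only version, with toy vectors standing in for real ChromaDB embeddings (the vectors and tag names below are illustrative only), shows the ranking step in isolation:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy tag profiles; in practice these come from ChromaDB embeddings
tag_embeddings = {
    "python":    [1.0, 0.1, 0.0],
    "debugging": [0.2, 1.0, 0.1],
    "frontend":  [0.0, 0.2, 1.0],
}

memory_embedding = [0.9, 0.3, 0.0]  # hypothetical embedding of a Python-heavy memory

ranked = sorted(
    ((tag, cosine_similarity(memory_embedding, emb))
     for tag, emb in tag_embeddings.items()),
    key=lambda pair: pair[1], reverse=True,
)
print([tag for tag, _ in ranked])  # ['python', 'debugging', 'frontend']
```

In production the same ranking runs over real embeddings, but the sort-by-similarity logic is identical.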
Batch Processing Tools

async def batch_tag_by_pattern(storage, search_pattern, new_tags):
    """Tag all memories matching a pattern"""
    
    # Search for memories matching the pattern
    results = await storage.retrieve(search_pattern, n_results=100)
    
    updated_count = 0
    for result in results:
        memory = result.memory
        current_tags = memory.metadata.get("tags", [])
        
        # Merge new tags with existing
        merged_tags = list(set(current_tags + new_tags))
        
        # Update memory
        await retag_memory(storage, memory, merged_tags)
        updated_count += 1
    
    return updated_count

Tag Clustering

from sklearn.cluster import KMeans
import numpy as np

async def discover_tag_clusters(storage, n_clusters=10):
    """Discover natural tag groupings"""
    
    # Get all memories with embeddings
    all_memories = await storage.get_all_with_embeddings()
    
    # Extract embeddings
    embeddings = np.array([m.embedding for m in all_memories])
    
    # Cluster embeddings
    kmeans = KMeans(n_clusters=n_clusters)
    clusters = kmeans.fit_predict(embeddings)
    
    # Analyze each cluster
    cluster_themes = {}
    for i in range(n_clusters):
        cluster_memories = [m for j, m in enumerate(all_memories) if clusters[j] == i]
        
        # Extract common tags in cluster
        tag_counts = {}
        for memory in cluster_memories:
            for tag in memory.metadata.get("tags", []):
                tag_counts[tag] = tag_counts.get(tag, 0) + 1
        
        # Top tags represent cluster theme
        top_tags = sorted(tag_counts.items(), key=lambda x: x[1], reverse=True)[:5]
        cluster_themes[i] = [tag for tag, _ in top_tags]
    
    return cluster_themes

Phase 3: Multi-Layered Consolidation

Integration with Issue #11

Connect re-tagging with the memory consolidation system:

Daily Processing Layer:

async def daily_tag_processing():
    """Process new memories for tagging"""
    
    # Get memories from last 24 hours
    recent_memories = await storage.recall_natural("today")
    
    for memory in recent_memories:
        if not memory.metadata.get("tags"):
            # Auto-suggest tags
            suggested_tags = await tag_engine.suggest_tags_for_memory(memory)
            
            # Apply suggestions with confidence threshold
            confident_tags = [tag for tag, conf in suggested_tags if conf > 0.7]
            
            if confident_tags:
                await retag_memory(storage, memory, confident_tags)

Weekly Processing Layer:

async def weekly_tag_consolidation():
    """Consolidate and standardize tags"""
    
    # Identify similar tags
    tag_groups = {
        "debugging": ["debug", "bug-fix", "bugfix", "troubleshooting"],
        "implementation": ["implement", "implementation", "feature", "development"],
        "testing": ["test", "testing", "qa", "verification"]
    }
    
    for canonical_tag, variations in tag_groups.items():
        for variation in variations:
            # Find memories with this variation
            memories = await storage.search_by_tag([variation])
            
            # Update to canonical form
            for memory in memories:
                current_tags = memory.metadata.get("tags", [])
                new_tags = [canonical_tag if t == variation else t for t in current_tags]
                await retag_memory(storage, memory, new_tags)
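The variant-to-canonical rewrite above can be factored into a small pure function that is easy to unit-test. This is a sketch, with `tag_groups` in the same shape as the dictionary above:

```python
def canonicalize_tags(tags, tag_groups):
    """Map tag variants to canonical tags, preserving order and deduplicating."""
    # Invert {canonical: [variants]} into a {variant: canonical} lookup
    reverse = {variant: canonical
               for canonical, variants in tag_groups.items()
               for variant in variants}
    seen = set()
    result = []
    for tag in tags:
        canonical = reverse.get(tag, tag)
        if canonical not in seen:  # merging variants can create duplicates
            seen.add(canonical)
            result.append(canonical)
    return result

tag_groups = {"debugging": ["debug", "bug-fix", "bugfix", "troubleshooting"]}
print(canonicalize_tags(["bug-fix", "python", "debug"], tag_groups))
# ['debugging', 'python']
```

Note that deduplication matters: once "bug-fix" and "debug" both map to "debugging", a naive rewrite would leave the tag twice.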

Monthly Processing Layer:

async def monthly_tag_analysis():
    """Analyze tag effectiveness and suggest improvements"""
    
    report = {
        "tag_usage": {},
        "undertagged_memories": 0,
        "overtagged_memories": 0,
        "suggested_merges": [],
        "suggested_splits": []
    }
    
    # Analyze tag distribution
    all_memories = await storage.get_all()
    
    for memory in all_memories:
        tag_count = len(memory.metadata.get("tags", []))
        
        if tag_count == 0:
            report["undertagged_memories"] += 1
        elif tag_count > 10:
            report["overtagged_memories"] += 1
        
        for tag in memory.metadata.get("tags", []):
            report["tag_usage"][tag] = report["tag_usage"].get(tag, 0) + 1
    
    # Suggest merges for rarely used tags
    for tag, count in report["tag_usage"].items():
        if count < 3:
            similar_tags = await find_similar_tags(tag)
            if similar_tags:
                report["suggested_merges"].append({
                    "merge": tag,
                    "into": similar_tags[0]
                })
    
    return report

Standardized Tag Schema

Recommended Categories

Projects/Repositories:

  • Format: project-name (lowercase, hyphenated)
  • Examples: mcp-memory-service, memory-dashboard, github-actions

Technologies:

  • Format: technology (lowercase, no version numbers)
  • Examples: python, typescript, react, docker, chromadb

Activities:

  • Format: activity-type (lowercase, hyphenated)
  • Examples: debugging, implementation, testing, documentation, research

Status:

  • Format: status-indicator (lowercase, hyphenated)
  • Examples: resolved, in-progress, blocked, needs-review

Domains:

  • Format: domain-area (lowercase)
  • Examples: frontend, backend, devops, database, architecture

Priority:

  • Format: priority-level (lowercase, hyphenated)
  • Examples: urgent, high-priority, normal-priority, low-priority

Time-based:

  • Format: YYYY-MM or YYYY-QN (for monthly/quarterly tags)
  • Examples: 2025-06, 2025-Q2
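A schema like this is easy to enforce mechanically. The sketch below encodes the formats above as two regexes; the patterns are one interpretation of the conventions, not an official validator:

```python
import re

# Lowercase words joined by single hyphens: "mcp-memory-service", "in-progress"
TAG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
# Time-based tags: YYYY-MM or YYYY-QN
TIME_PATTERN = re.compile(r"^\d{4}-(?:0[1-9]|1[0-2]|Q[1-4])$")

def is_valid_tag(tag):
    """Check a tag against the schema above."""
    return bool(TIME_PATTERN.match(tag) or TAG_PATTERN.match(tag))

candidates = ["mcp-memory-service", "2025-Q2", "Bug_Fix", "python3.11"]
print([t for t in candidates if not is_valid_tag(t)])
# ['Bug_Fix', 'python3.11']
```

Rejecting "python3.11" also enforces the "no version numbers" rule for technology tags as a side effect of disallowing dots.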

Implementation Examples

Example 1: Weekly Review Script

async def weekly_memory_review():
    """Complete weekly memory maintenance"""
    
    print("=== Weekly Memory Review ===")
    
    # Step 1: Find untagged memories
    untagged, poorly_tagged = await find_untagged_memories(storage)
    print(f"Found {len(untagged)} untagged memories")
    print(f"Found {len(poorly_tagged)} poorly tagged memories")
    
    # Step 2: Process each memory
    for memory in untagged + poorly_tagged:
        print(f"\nMemory: {memory.content[:100]}...")
        
        # Analyze themes
        themes = analyze_content_themes(memory.content)
        
        # Suggest tags
        suggested_tags = []
        for category, tags in themes.items():
            suggested_tags.extend(tags)
        
        print(f"Suggested tags: {suggested_tags}")
        
        # Apply tags
        if suggested_tags:
            await retag_memory(storage, memory, suggested_tags)
            print("Re-tagged successfully")
    
    # Step 3: Generate report
    stats = await storage.get_stats()
    print("\n=== Review Complete ===")
    print(f"Total memories: {stats['total_memories']}")
    print(f"Unique tags: {stats['unique_tags']}")

Example 2: Automated Suggestion

async def auto_tag_new_memory(memory):
    """Automatically suggest tags for new memories"""
    
    # Get suggestions from multiple sources
    theme_tags = analyze_content_themes(memory.content)
    semantic_tags = await tag_engine.suggest_tags_for_memory(memory)
    
    # Combine and deduplicate
    all_suggestions = []
    for category, tags in theme_tags.items():
        all_suggestions.extend(tags)
    all_suggestions.extend([tag for tag, _ in semantic_tags[:3]])
    
    # Remove duplicates while preserving order
    seen = set()
    final_tags = []
    for tag in all_suggestions:
        if tag not in seen:
            seen.add(tag)
            final_tags.append(tag)
    
    return final_tags

Best Practices

1. Tag Hygiene

  • Keep tags lowercase for consistency
  • Use hyphens for multi-word tags
  • Avoid special characters
  • Limit to 5-7 tags per memory
  • Remove redundant tags
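These rules can also be applied automatically before a memory is stored. A sketch of a normalizer follows; the 7-tag cap comes from the guideline above:

```python
import re

def clean_tags(tags, max_tags=7):
    """Lowercase, hyphenate, strip special characters, dedupe, and cap tag count."""
    cleaned = []
    seen = set()
    for tag in tags:
        tag = tag.strip().lower()
        tag = re.sub(r"[\s_]+", "-", tag)            # spaces/underscores -> hyphens
        tag = re.sub(r"[^a-z0-9-]", "", tag)         # drop special characters
        tag = re.sub(r"-{2,}", "-", tag).strip("-")  # collapse stray hyphens
        if tag and tag not in seen:
            seen.add(tag)
            cleaned.append(tag)
    return cleaned[:max_tags]

print(clean_tags(["Bug Fix", "bug-fix", "In Progress!", "  python  "]))
# ['bug-fix', 'in-progress', 'python']
```

Running every tag through a normalizer like this keeps manual and automated tagging from drifting apart.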

2. Regular Maintenance

  • Weekly: Review new memories
  • Monthly: Consolidate similar tags
  • Quarterly: Analyze tag effectiveness
  • Yearly: Major schema updates

3. Documentation

# Document tagging decisions
TAGGING_RULES = {
    "projects": {
        "description": "Project or repository names",
        "format": "lowercase-hyphenated",
        "examples": ["mcp-memory-service", "react-dashboard"]
    },
    "technologies": {
        "description": "Programming languages and tools",
        "format": "lowercase",
        "avoid": ["version numbers", "minor variants"],
        "examples": ["python", "javascript", "docker"]
    }
}

4. User Training

Create a tagging guide for consistent application:

QUICK TAGGING GUIDE:

When storing a memory, ask yourself:
1. What project is this about? → project tag
2. What technologies are involved? → tech tags
3. What am I doing? → activity tag
4. What's the status? → status tag
5. What domain? → domain tag

Example:
"Fixed React component rendering issue in dashboard"
Tags: react-dashboard, react, debugging, resolved, frontend
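Running this example through the Phase 1 theme analyzer shows how manual and automatic tagging overlap. The analyzer is reproduced here in trimmed form so the snippet is self-contained; note that its keyword table buckets React under javascript, the kind of variant the weekly consolidation pass exists to reconcile:

```python
def analyze_content_themes(content):
    """Trimmed copy of the Phase 1 analyzer: only the rules this example hits."""
    themes = {"technologies": [], "activities": [], "status": []}
    content_lower = content.lower()
    if any(k in content_lower for k in ["javascript", "js", "node", "npm", "react"]):
        themes["technologies"].append("javascript")
    if any(w in content_lower for w in ["debug", "fix", "error", "issue"]):
        themes["activities"].append("debugging")
    if any(w in content_lower for w in ["resolved", "fixed", "complete"]):
        themes["status"].append("resolved")
    return themes

themes = analyze_content_themes("Fixed React component rendering issue in dashboard")
print(themes)
# {'technologies': ['javascript'], 'activities': ['debugging'], 'status': ['resolved']}
```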

Conclusion

The three-phase memory re-tagging system provides:

  1. Immediate value through manual review processes
  2. Scalability with automated suggestions
  3. Long-term organization via consolidation integration

By implementing this system, the MCP Memory Service can maintain a well-organized, easily searchable knowledge base that grows more valuable over time.
