Longest Repeated Substring (LRS)

Algorithms for finding the longest substring that appears multiple times in a text.

Overview

The module provides multiple strategies for finding the longest repeated substring:

Suffix Tree-based LRS: Optimal for complex queries and repeated operations
Suffix Array-based LRS: Memory efficient using LCP array
Hybrid approach: Automatically selects best strategy
Brute force: Simple implementation for small texts

Usage

from algo.algorithms.strings.longest_repeated_substring import *

# Basic usage
lrs = find_longest_repeated_substring("banana")  # Returns "ana"

# With positions
substring, positions = find_lrs_with_positions("mississippi")

# Compare strategies
comparison = compare_lrs_strategies("text")

# Find all repeated substrings
substrings = find_all_repeated_substrings("banana", min_length=2)

Performance

Suffix Tree: O(n) construction, O(n) LRS computation
Suffix Array: O(n log n) construction, O(n) LRS computation
Brute Force: O(n³) for comparison
Memory usage: Suffix Tree > Suffix Array > Brute Force

API Reference

Longest Repeated Substring Algorithms

This module provides high-level longest repeated substring functionality using different underlying data structures and strategies. It demonstrates the practical applications of Suffix Tree and Suffix Array data structures for finding the longest substring that appears multiple times in a single string.

Strategies Implemented:

Suffix Tree-based LRS (faster for complex queries)
Suffix Array-based LRS (memory efficient using LCP array)
Hybrid approach combining multiple strategies
Brute force approach for small strings

Performance Characteristics:

Suffix Tree: O(n) construction, O(n) LRS computation where n is text length
Suffix Array: O(n log n) construction, O(n) LRS computation using LCP array
Brute Force: O(n³) for comparison
Memory usage: Suffix Tree > Suffix Array > Brute Force

Use Cases:

Finding repetitive patterns in DNA sequences
Code duplication detection
Text compression algorithms
Data deduplication
Pattern analysis in logs
Finding common phrases in documents

`BruteForceLRS`

Brute force longest repeated substring implementation.

Best for: Very small strings, educational purposes, verification. Use when: You need a simple, easy-to-understand implementation.

`find_lrs(text)`

Find the longest repeated substring using brute force approach.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required

Returns:

Type	Description
`LRSResult`	LRSResult containing the longest repeated substring

`HybridLRS()`

Hybrid LRS that combines multiple strategies for optimal performance.

Uses different strategies based on input characteristics and performance requirements.

Initialize hybrid LRS with all underlying implementations.

`compare_strategies(text)`

Compare all strategies and return results from each.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required

Returns:

Type	Description
`Dict[str, LRSResult]`	Dictionary mapping strategy names to their results

`find_lrs(text, strategy=None)`

Find the longest repeated substring using optimal strategy.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required
`strategy`	`Optional[LRSStrategy]`	Force specific strategy (optional)	`None`

Returns:

Type	Description
`LRSResult`	LRSResult from the selected strategy

`LRSResult(substring, length, positions=list(), occurrences=0, strategy_used='unknown', computation_time=0.0, metadata=dict())` `dataclass`

Represents a longest repeated substring result with metadata.

`__post_init__()`

Update fields based on substring if not provided.

`repr()`

Return detailed representation.

`str()`

Return string representation.

`LRSStrategy`

Bases: Enum

Different LRS computation strategies.

`BRUTE_FORCE = 'brute_force'` `class-attribute` `instance-attribute`

Brute force LRS strategy. Best for very small strings and educational purposes.

`HYBRID = 'hybrid'` `class-attribute` `instance-attribute`

Hybrid LRS strategy that combines multiple approaches for optimal performance.

`SUFFIX_ARRAY = 'suffix_array'` `class-attribute` `instance-attribute`

Suffix Array-based LRS strategy. Best for memory efficiency and one-time computations.

`SUFFIX_TREE = 'suffix_tree'` `class-attribute` `instance-attribute`

Suffix Tree-based LRS strategy. Best for complex queries and multiple operations.

`SuffixArrayLRS()`

Suffix Array-based longest repeated substring implementation.

Best for: Memory-constrained environments, one-time computations. Use when: You need space efficiency and can tolerate slightly slower construction.

Initialize suffix array-based LRS.

`find_all_repeated_substrings(text, min_length=2)`

Find all repeated substrings using suffix array and LCP array.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required
`min_length`	`int`	Minimum length of substrings to consider	`2`

Returns:

Type	Description
`List[LRSResult]`	List of LRSResult objects for all repeated substrings

`find_lrs(text)`

Find the longest repeated substring using suffix array and LCP array.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required

Returns:

Type	Description
`LRSResult`	LRSResult containing the longest repeated substring and metadata

`SuffixTreeLRS()`

Suffix Tree-based longest repeated substring implementation.

Best for: Complex queries, multiple LRS operations, when memory is not a constraint. Use when: You need fast repeated queries on the same text.

Initialize suffix tree-based LRS.

`find_all_repeated_substrings(text, min_length=2)`

Find all repeated substrings of at least the given length.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required
`min_length`	`int`	Minimum length of substrings to consider	`2`

Returns:

Type	Description
`List[LRSResult]`	List of LRSResult objects for all repeated substrings

`find_lrs(text)`

Find the longest repeated substring using suffix tree.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required

Returns:

Type	Description
`LRSResult`	LRSResult containing the longest repeated substring and metadata

`compare_lrs_strategies(text)`

Compare all LRS strategies and return performance metrics.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required

Returns:

Type	Description
`Dict[str, Dict]`	Dictionary with strategy comparisons

`count_repeated_substrings(text, min_length=2, strategy=LRSStrategy.SUFFIX_TREE)`

Count the number of unique repeated substrings in a text.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required
`min_length`	`int`	Minimum length of substrings to consider	`2`
`strategy`	`LRSStrategy`	Strategy to use for computation	`SUFFIX_TREE`

Returns:

Type	Description
`int`	Number of unique repeated substrings

`demo_lrs()`

Demonstrate different LRS strategies.

`find_all_repeated_substrings(text, min_length=2, strategy=LRSStrategy.SUFFIX_TREE)`

Find all repeated substrings in a text.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required
`min_length`	`int`	Minimum length of substrings to consider	`2`
`strategy`	`LRSStrategy`	Strategy to use (SUFFIX_TREE or SUFFIX_ARRAY supported)	`SUFFIX_TREE`

Returns:

Type	Description
`List[str]`	List of repeated substrings sorted by length (descending)

`find_longest_repeated_substring(text, strategy=LRSStrategy.SUFFIX_TREE)`

Find the longest repeated substring in a text.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required
`strategy`	`LRSStrategy`	Strategy to use for computation	`SUFFIX_TREE`

Returns:

Type	Description
`str`	The longest repeated substring

`find_lrs_with_positions(text, strategy=LRSStrategy.SUFFIX_TREE)`

Find LRS and return positions where it occurs in the text.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required
`strategy`	`LRSStrategy`	Strategy to use for computation	`SUFFIX_TREE`

Returns:

Type	Description
`Tuple[str, List[int]]`	Tuple of (substring, positions_in_text)

`get_repetition_statistics(text, min_length=2)`

Get comprehensive statistics about repeated substrings in a text.

Parameters:

Name	Type	Description	Default
`text`	`str`	Text to analyze	required
`min_length`	`int`	Minimum length of substrings to consider	`2`

Returns:

Type	Description
`Dict[str, any]`	Dictionary with repetition statistics

Longest Repeated Substring (LRS)

Overview

Usage

Performance

API Reference

BruteForceLRS

find_lrs(text)

HybridLRS()

compare_strategies(text)

find_lrs(text, strategy=None)

LRSResult(substring, length, positions=list(), occurrences=0, strategy_used='unknown', computation_time=0.0, metadata=dict()) dataclass

__post_init__()

__repr__()

__str__()

LRSStrategy

BRUTE_FORCE = 'brute_force' class-attribute instance-attribute

HYBRID = 'hybrid' class-attribute instance-attribute

SUFFIX_ARRAY = 'suffix_array' class-attribute instance-attribute

SUFFIX_TREE = 'suffix_tree' class-attribute instance-attribute

SuffixArrayLRS()

find_all_repeated_substrings(text, min_length=2)

find_lrs(text)

SuffixTreeLRS()

find_all_repeated_substrings(text, min_length=2)

find_lrs(text)

compare_lrs_strategies(text)

count_repeated_substrings(text, min_length=2, strategy=LRSStrategy.SUFFIX_TREE)

demo_lrs()

find_all_repeated_substrings(text, min_length=2, strategy=LRSStrategy.SUFFIX_TREE)

find_longest_repeated_substring(text, strategy=LRSStrategy.SUFFIX_TREE)

find_lrs_with_positions(text, strategy=LRSStrategy.SUFFIX_TREE)

get_repetition_statistics(text, min_length=2)

`BruteForceLRS`

`find_lrs(text)`

`HybridLRS()`

`compare_strategies(text)`

`find_lrs(text, strategy=None)`

`LRSResult(substring, length, positions=list(), occurrences=0, strategy_used='unknown', computation_time=0.0, metadata=dict())` `dataclass`

`__post_init__()`

`repr()`

`str()`

`LRSStrategy`

`BRUTE_FORCE = 'brute_force'` `class-attribute` `instance-attribute`

`HYBRID = 'hybrid'` `class-attribute` `instance-attribute`

`SUFFIX_ARRAY = 'suffix_array'` `class-attribute` `instance-attribute`

`SUFFIX_TREE = 'suffix_tree'` `class-attribute` `instance-attribute`

`SuffixArrayLRS()`

`find_all_repeated_substrings(text, min_length=2)`

`find_lrs(text)`

`SuffixTreeLRS()`

`find_all_repeated_substrings(text, min_length=2)`

`find_lrs(text)`

`compare_lrs_strategies(text)`

`count_repeated_substrings(text, min_length=2, strategy=LRSStrategy.SUFFIX_TREE)`

`demo_lrs()`

`find_all_repeated_substrings(text, min_length=2, strategy=LRSStrategy.SUFFIX_TREE)`

`find_longest_repeated_substring(text, strategy=LRSStrategy.SUFFIX_TREE)`

`find_lrs_with_positions(text, strategy=LRSStrategy.SUFFIX_TREE)`

`get_repetition_statistics(text, min_length=2)`