Longest Common Substring (LCS)

Algorithms for finding the longest substring shared between two strings.

Overview

The module provides multiple strategies for computing the longest common substring between two texts:

Suffix Tree-based LCS: Fastest for complex queries and repeated operations
Suffix Array-based LCS: Memory efficient using LCP array optimization
Hybrid approach: Automatically selects optimal strategy based on input
Brute force: Simple implementation for small strings

Usage

from algo.algorithms.strings.longest_common_substring import *

# Basic usage
lcs = find_longest_common_substring("abcdef", "defghi")  # Returns "def"

# With positions
substring, pos1, pos2 = find_lcs_with_positions("banana", "ananas")

# Compare strategies
comparison = compare_lcs_strategies("text1", "text2")

# Find all common substrings
substrings = find_all_common_substrings("abc", "bcd", min_length=1)

Performance

Suffix Tree: O(n + m) construction, O(n + m) LCS computation
Suffix Array: O((n + m) log(n + m)) construction, O(n + m) LCS computation
Brute Force: O(n * m * min(n, m))
Memory usage: Suffix Tree > Suffix Array > Brute Force

API Reference

Longest Common Substring Algorithms

This module provides high-level longest common substring functionality using different underlying data structures and strategies. It demonstrates the practical applications of Suffix Tree and Suffix Array data structures for finding the longest substring that appears in multiple strings.

Strategies Implemented:

Suffix Tree-based LCS (faster for complex queries)
Suffix Array-based LCS (memory efficient)
Hybrid approach combining multiple strategies
Brute force approach for small strings

Performance Characteristics:

Suffix Tree: O(n + m) construction, O(n + m) LCS computation where n, m are text lengths
Suffix Array: O((n + m) log(n + m)) construction, O(n + m) LCS computation
Brute Force: O(n * m * min(n, m)) for comparison
Memory usage: Suffix Tree > Suffix Array > Brute Force

Use Cases:

Finding similarities between documents
DNA sequence analysis
Code plagiarism detection
Data deduplication
Version control systems
Text comparison tools

`BruteForceLCS`

Brute force longest common substring implementation.

Best for: Very small strings, educational purposes, verification. Use when: You need a simple, easy-to-understand implementation.

`find_lcs(text1, text2)`

Find the longest common substring using brute force approach.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required

Returns:

Type	Description
`LCSResult`	LCSResult containing the longest common substring

`HybridLCS()`

Hybrid LCS that combines multiple strategies for optimal performance.

Uses different strategies based on input characteristics and performance requirements.

Initialize hybrid LCS with all underlying implementations.

`compare_strategies(text1, text2)`

Compare all strategies and return results from each.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required

Returns:

Type	Description
`Dict[str, LCSResult]`	Dictionary mapping strategy names to their results

`find_lcs(text1, text2, strategy=None)`

Find the longest common substring using optimal strategy.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required
`strategy`	`Optional[LCSStrategy]`	Force specific strategy (optional)	`None`

Returns:

Type	Description
`LCSResult`	LCSResult from the selected strategy

`LCSResult(substring, length, positions_text1=list(), positions_text2=list(), strategy_used='unknown', computation_time=0.0, metadata=dict())` `dataclass`

Represents a longest common substring result with metadata.

`__post_init__()`

Update length based on substring if not provided.

`repr()`

Return detailed representation.

`str()`

Return string representation.

`LCSStrategy`

Bases: Enum

Different LCS computation strategies.

`BRUTE_FORCE = 'brute_force'` `class-attribute` `instance-attribute`

Brute force LCS strategy.

`HYBRID = 'hybrid'` `class-attribute` `instance-attribute`

Hybrid LCS strategy that combines multiple approaches for optimal performance.

`SUFFIX_ARRAY = 'suffix_array'` `class-attribute` `instance-attribute`

Suffix Array-based LCS strategy.

`SUFFIX_TREE = 'suffix_tree'` `class-attribute` `instance-attribute`

Suffix Tree-based LCS strategy.

`SuffixArrayLCS()`

Suffix Array-based longest common substring implementation.

Best for: Memory-constrained environments, one-time computations. Use when: You need space efficiency and can tolerate slightly slower construction.

Initialize suffix array-based LCS.

`find_lcs(text1, text2)`

Find the longest common substring between two texts using suffix array.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required

Returns:

Type	Description
`LCSResult`	LCSResult containing the longest common substring and metadata

`find_lcs_with_lcp(text1, text2)`

Find LCS using generalized suffix array with LCP array optimization.

This method builds a generalized suffix array for both texts and uses the LCP array to efficiently find the longest common substring.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required

Returns:

Type	Description
`LCSResult`	LCSResult with detailed LCP analysis

`SuffixTreeLCS()`

Suffix Tree-based longest common substring implementation.

Best for: Complex queries, multiple LCS operations, when memory is not a constraint. Use when: You need fast repeated queries on the same text pairs.

Initialize suffix tree-based LCS.

`find_all_common_substrings(text1, text2, min_length=1)`

Find all common substrings between two texts.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required
`min_length`	`int`	Minimum length of substrings to consider	`1`

Returns:

Type	Description
`List[LCSResult]`	List of LCSResult objects for all common substrings

`find_lcs(text1, text2)`

Find the longest common substring between two texts using suffix tree.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required

Returns:

Type	Description
`LCSResult`	LCSResult containing the longest common substring and metadata

`compare_lcs_strategies(text1, text2)`

Compare all LCS strategies and return performance metrics.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required

Returns:

Type	Description
`Dict[str, Dict]`	Dictionary with strategy comparisons

`demo_lcs()`

Demonstrate different LCS strategies.

`find_all_common_substrings(text1, text2, min_length=2, strategy=LCSStrategy.SUFFIX_TREE)`

Find all common substrings between two texts.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required
`min_length`	`int`	Minimum length of substrings to consider	`2`
`strategy`	`LCSStrategy`	Strategy to use (only SUFFIX_TREE supported for now)	`SUFFIX_TREE`

Returns:

Type	Description
`List[str]`	List of common substrings sorted by length (descending)

`find_lcs_with_positions(text1, text2, strategy=LCSStrategy.SUFFIX_TREE)`

Find LCS and return positions where it occurs in both texts.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required
`strategy`	`LCSStrategy`	Strategy to use for computation	`SUFFIX_TREE`

Returns:

Type	Description
`Tuple[str, List[int], List[int]]`	Tuple of (substring, positions_in_text1, positions_in_text2)

`find_longest_common_substring(text1, text2, strategy=LCSStrategy.SUFFIX_TREE)`

Find the longest common substring between two texts.

Parameters:

Name	Type	Description	Default
`text1`	`str`	First text	required
`text2`	`str`	Second text	required
`strategy`	`LCSStrategy`	Strategy to use for computation	`SUFFIX_TREE`

Returns:

Type	Description
`str`	The longest common substring

Longest Common Substring (LCS)

Overview

Usage

Performance

API Reference

BruteForceLCS

find_lcs(text1, text2)

HybridLCS()

compare_strategies(text1, text2)

find_lcs(text1, text2, strategy=None)

LCSResult(substring, length, positions_text1=list(), positions_text2=list(), strategy_used='unknown', computation_time=0.0, metadata=dict()) dataclass

__post_init__()

__repr__()

__str__()

LCSStrategy

BRUTE_FORCE = 'brute_force' class-attribute instance-attribute

HYBRID = 'hybrid' class-attribute instance-attribute

SUFFIX_ARRAY = 'suffix_array' class-attribute instance-attribute

SUFFIX_TREE = 'suffix_tree' class-attribute instance-attribute

SuffixArrayLCS()

find_lcs(text1, text2)

find_lcs_with_lcp(text1, text2)

SuffixTreeLCS()

find_all_common_substrings(text1, text2, min_length=1)

find_lcs(text1, text2)

compare_lcs_strategies(text1, text2)

demo_lcs()

find_all_common_substrings(text1, text2, min_length=2, strategy=LCSStrategy.SUFFIX_TREE)

find_lcs_with_positions(text1, text2, strategy=LCSStrategy.SUFFIX_TREE)

find_longest_common_substring(text1, text2, strategy=LCSStrategy.SUFFIX_TREE)

`BruteForceLCS`

`find_lcs(text1, text2)`

`HybridLCS()`

`compare_strategies(text1, text2)`

`find_lcs(text1, text2, strategy=None)`

`LCSResult(substring, length, positions_text1=list(), positions_text2=list(), strategy_used='unknown', computation_time=0.0, metadata=dict())` `dataclass`

`__post_init__()`

`repr()`

`str()`

`LCSStrategy`

`BRUTE_FORCE = 'brute_force'` `class-attribute` `instance-attribute`

`HYBRID = 'hybrid'` `class-attribute` `instance-attribute`

`SUFFIX_ARRAY = 'suffix_array'` `class-attribute` `instance-attribute`

`SUFFIX_TREE = 'suffix_tree'` `class-attribute` `instance-attribute`

`SuffixArrayLCS()`

`find_lcs(text1, text2)`

`find_lcs_with_lcp(text1, text2)`

`SuffixTreeLCS()`

`find_all_common_substrings(text1, text2, min_length=1)`

`find_lcs(text1, text2)`

`compare_lcs_strategies(text1, text2)`

`demo_lcs()`

`find_all_common_substrings(text1, text2, min_length=2, strategy=LCSStrategy.SUFFIX_TREE)`

`find_lcs_with_positions(text1, text2, strategy=LCSStrategy.SUFFIX_TREE)`

`find_longest_common_substring(text1, text2, strategy=LCSStrategy.SUFFIX_TREE)`