Edit Distance
Edit Distance Algorithms (Levenshtein Distance)
Edit distance is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another. This is also known as the Levenshtein distance.
Applications:
- Spell checking and correction
- DNA sequence analysis
- Plagiarism detection
- Fuzzy string matching
- Data deduplication
This module implements multiple approaches:
- Naive recursive approach
- Memoized recursive approach
- Dynamic programming (bottom-up)
- Space-optimized dynamic programming
- Operations tracking (backtracking)
Time Complexities:
- Naive recursive: O(3^max(m,n)) - exponential
- Memoized: O(m×n) - with O(m×n) space
- DP: O(m×n) - with O(m×n) space
- Space-optimized DP: O(m×n) time, O(min(m,n)) space
EditOperation
Bases: Enum
Types of edit operations.
DELETE = 'delete'
class-attribute
instance-attribute
Delete a character from the string.
INSERT = 'insert'
class-attribute
instance-attribute
Insert a character into the string.
MATCH = 'match'
class-attribute
instance-attribute
Characters in both strings match, no operation needed.
SUBSTITUTE = 'substitute'
class-attribute
instance-attribute
Substitute one character for another in the string.
apply_operations(str1, operations)
Apply a sequence of edit operations to transform one string to another.
Takes a string and a list of operations (from edit_distance_with_operations) and applies them to get the target string.
Time Complexity: O(n×k) where k is number of operations Space Complexity: O(n) - for result string
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
str1
|
str
|
Source string |
required |
operations
|
List[Tuple[EditOperation, int, Optional[str]]]
|
List of operations to apply |
required |
Returns:
| Type | Description |
|---|---|
str
|
Result string after applying all operations |
Examples:
edit_distance_dp(str1, str2)
Calculate edit distance using dynamic programming (bottom-up).
This is the standard DP approach, building up solutions from smaller subproblems. Most commonly used implementation due to good performance and clarity.
Time Complexity: O(m×n) - fill entire DP table Space Complexity: O(m×n) - DP table storage
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
str1
|
str
|
First string |
required |
str2
|
str
|
Second string |
required |
Returns:
| Type | Description |
|---|---|
int
|
Minimum edit distance between the strings |
Examples:
edit_distance_memoized(str1, str2)
Calculate edit distance using memoized recursion.
This approach caches subproblem results to avoid recomputation, reducing time complexity to polynomial.
Time Complexity: O(m×n) - each subproblem computed once Space Complexity: O(m×n) - memoization table + recursion stack
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
str1
|
str
|
First string |
required |
str2
|
str
|
Second string |
required |
Returns:
| Type | Description |
|---|---|
int
|
Minimum edit distance between the strings |
Examples:
edit_distance_recursive(str1, str2)
Calculate edit distance using naive recursion.
This is the most straightforward implementation but has exponential time complexity. Only suitable for very short strings.
Time Complexity: O(3^max(m,n)) - exponential Space Complexity: O(max(m,n)) - recursion stack depth
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
str1
|
str
|
First string |
required |
str2
|
str
|
Second string |
required |
Returns:
| Type | Description |
|---|---|
int
|
Minimum edit distance between the strings |
Examples:
edit_distance_space_optimized(str1, str2)
Calculate edit distance with space optimization.
Since we only need the previous row to compute the current row, we can optimize space complexity from O(m×n) to O(min(m,n)).
Time Complexity: O(m×n) - same computation as standard DP Space Complexity: O(min(m,n)) - only store current and previous row
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
str1
|
str
|
First string |
required |
str2
|
str
|
Second string |
required |
Returns:
| Type | Description |
|---|---|
int
|
Minimum edit distance between the strings |
Examples:
edit_distance_with_costs(str1, str2, insert_cost=1, delete_cost=1, substitute_cost=1)
Calculate edit distance with custom operation costs.
Allows different costs for different operations, which can be useful in various applications (e.g., DNA analysis where certain mutations are more likely than others).
Time Complexity: O(m×n) - standard DP approach Space Complexity: O(m×n) - DP table storage
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
str1
|
str
|
First string |
required |
str2
|
str
|
Second string |
required |
insert_cost
|
int
|
Cost of inserting a character |
1
|
delete_cost
|
int
|
Cost of deleting a character |
1
|
substitute_cost
|
int
|
Cost of substituting a character |
1
|
Returns:
| Type | Description |
|---|---|
int
|
Minimum edit distance with custom costs |
Examples:
edit_distance_with_operations(str1, str2)
Calculate edit distance and return the sequence of operations.
This function not only computes the minimum edit distance but also backtracks to find the actual sequence of operations needed.
Time Complexity: O(m×n) - DP table construction + backtracking Space Complexity: O(m×n) - DP table storage
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
str1
|
str
|
First string (source) |
required |
str2
|
str
|
Second string (target) |
required |
Returns:
| Type | Description |
|---|---|
int
|
Tuple of (edit_distance, operations_list) |
List[Tuple[EditOperation, int, Optional[str]]]
|
Operations list contains tuples of (operation_type, position, character) |
Examples:
similarity_ratio(str1, str2)
Calculate similarity ratio between two strings based on edit distance.
Returns a value between 0.0 (completely different) and 1.0 (identical). Formula: 1 - (edit_distance / max(len(str1), len(str2)))
Time Complexity: O(m×n) - uses edit distance calculation Space Complexity: O(min(m,n)) - uses space-optimized version
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
str1
|
str
|
First string |
required |
str2
|
str
|
Second string |
required |
Returns:
| Type | Description |
|---|---|
float
|
Similarity ratio between 0.0 and 1.0 |
Examples: