Skip to content

Edit Distance

Edit Distance Algorithms (Levenshtein Distance)

Edit distance is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another. This is also known as the Levenshtein distance.

Applications:

  • Spell checking and correction
  • DNA sequence analysis
  • Plagiarism detection
  • Fuzzy string matching
  • Data deduplication

This module implements multiple approaches:

  1. Naive recursive approach
  2. Memoized recursive approach
  3. Dynamic programming (bottom-up)
  4. Space-optimized dynamic programming
  5. Operations tracking (backtracking)

Time Complexities:

  • Naive recursive: O(3^max(m,n)) - exponential
  • Memoized: O(m×n) - with O(m×n) space
  • DP: O(m×n) - with O(m×n) space
  • Space-optimized DP: O(m×n) time, O(min(m,n)) space

EditOperation

Bases: Enum

Types of edit operations.

DELETE = 'delete' class-attribute instance-attribute

Delete a character from the string.

INSERT = 'insert' class-attribute instance-attribute

Insert a character into the string.

MATCH = 'match' class-attribute instance-attribute

Characters in both strings match, no operation needed.

SUBSTITUTE = 'substitute' class-attribute instance-attribute

Substitute one character for another in the string.

apply_operations(str1, operations)

Apply a sequence of edit operations to transform one string to another.

Takes a string and a list of operations (from edit_distance_with_operations) and applies them to get the target string.

Time Complexity: O(n×k) where k is number of operations Space Complexity: O(n) - for result string

Parameters:

Name Type Description Default
str1 str

Source string

required
operations List[Tuple[EditOperation, int, Optional[str]]]

List of operations to apply

required

Returns:

Type Description
str

Result string after applying all operations

Examples:

>>> ops = [(EditOperation.SUBSTITUTE, 0, 'b')]
>>> apply_operations("cat", ops)
'bat'

edit_distance_dp(str1, str2)

Calculate edit distance using dynamic programming (bottom-up).

This is the standard DP approach, building up solutions from smaller subproblems. Most commonly used implementation due to good performance and clarity.

Time Complexity: O(m×n) - fill entire DP table Space Complexity: O(m×n) - DP table storage

Parameters:

Name Type Description Default
str1 str

First string

required
str2 str

Second string

required

Returns:

Type Description
int

Minimum edit distance between the strings

Examples:

>>> edit_distance_dp("kitten", "sitting")
3
>>> edit_distance_dp("intention", "execution")
5

edit_distance_memoized(str1, str2)

Calculate edit distance using memoized recursion.

This approach caches subproblem results to avoid recomputation, reducing time complexity to polynomial.

Time Complexity: O(m×n) - each subproblem computed once Space Complexity: O(m×n) - memoization table + recursion stack

Parameters:

Name Type Description Default
str1 str

First string

required
str2 str

Second string

required

Returns:

Type Description
int

Minimum edit distance between the strings

Examples:

>>> edit_distance_memoized("kitten", "sitting")
3
>>> edit_distance_memoized("sunday", "saturday")
3

edit_distance_recursive(str1, str2)

Calculate edit distance using naive recursion.

This is the most straightforward implementation but has exponential time complexity. Only suitable for very short strings.

Time Complexity: O(3^max(m,n)) - exponential Space Complexity: O(max(m,n)) - recursion stack depth

Parameters:

Name Type Description Default
str1 str

First string

required
str2 str

Second string

required

Returns:

Type Description
int

Minimum edit distance between the strings

Examples:

>>> edit_distance_recursive("kitten", "sitting")
3
>>> edit_distance_recursive("", "abc")
3
>>> edit_distance_recursive("abc", "")
3

edit_distance_space_optimized(str1, str2)

Calculate edit distance with space optimization.

Since we only need the previous row to compute the current row, we can optimize space complexity from O(m×n) to O(min(m,n)).

Time Complexity: O(m×n) - same computation as standard DP Space Complexity: O(min(m,n)) - only store current and previous row

Parameters:

Name Type Description Default
str1 str

First string

required
str2 str

Second string

required

Returns:

Type Description
int

Minimum edit distance between the strings

Examples:

>>> edit_distance_space_optimized("kitten", "sitting")
3
>>> edit_distance_space_optimized("abc", "def")
3

edit_distance_with_costs(str1, str2, insert_cost=1, delete_cost=1, substitute_cost=1)

Calculate edit distance with custom operation costs.

Allows different costs for different operations, which can be useful in various applications (e.g., DNA analysis where certain mutations are more likely than others).

Time Complexity: O(m×n) - standard DP approach Space Complexity: O(m×n) - DP table storage

Parameters:

Name Type Description Default
str1 str

First string

required
str2 str

Second string

required
insert_cost int

Cost of inserting a character

1
delete_cost int

Cost of deleting a character

1
substitute_cost int

Cost of substituting a character

1

Returns:

Type Description
int

Minimum edit distance with custom costs

Examples:

>>> edit_distance_with_costs("cat", "bat", insert_cost=2, delete_cost=2, substitute_cost=1)
1
>>> edit_distance_with_costs("abc", "def", substitute_cost=2)
6

edit_distance_with_operations(str1, str2)

Calculate edit distance and return the sequence of operations.

This function not only computes the minimum edit distance but also backtracks to find the actual sequence of operations needed.

Time Complexity: O(m×n) - DP table construction + backtracking Space Complexity: O(m×n) - DP table storage

Parameters:

Name Type Description Default
str1 str

First string (source)

required
str2 str

Second string (target)

required

Returns:

Type Description
int

Tuple of (edit_distance, operations_list)

List[Tuple[EditOperation, int, Optional[str]]]

Operations list contains tuples of (operation_type, position, character)

Examples:

>>> distance, ops = edit_distance_with_operations("cat", "bat")
>>> distance
1
>>> ops
[(EditOperation.SUBSTITUTE, 0, 'b')]

similarity_ratio(str1, str2)

Calculate similarity ratio between two strings based on edit distance.

Returns a value between 0.0 (completely different) and 1.0 (identical). Formula: 1 - (edit_distance / max(len(str1), len(str2)))

Time Complexity: O(m×n) - uses edit distance calculation Space Complexity: O(min(m,n)) - uses space-optimized version

Parameters:

Name Type Description Default
str1 str

First string

required
str2 str

Second string

required

Returns:

Type Description
float

Similarity ratio between 0.0 and 1.0

Examples:

>>> similarity_ratio("kitten", "sitting")
0.5714285714285714
>>> similarity_ratio("identical", "identical")
1.0
>>> similarity_ratio("", "abc")
0.0