When spellcheck finds a misspelled word, how does it find words that are “close by” to suggest corrections? We can use “edit distance” which is how many edits we need to make to get our correct word.
The problem here, is given two strings, find their minimum cost alignment (which is the number of columns that differ).
Sub Problem
- Instead of matching the entire words, we want to find the best alignment of the first i characters of the first string with the first j characters of the second string. Therefore, given and , I want the edit distance between , .
- The last columns will be
- is placed over a blank
- is left over
- This leaves 1 + the cost of alignment of and
- is placed over a blank
- is left over
- This leaves 1 + the cost of alignment of and
- is over
- We get the alignment of and + 0 or 1 depending on if the last indices of and align
- The base case is j and i edits are needed if nothing from x is aligned with y, where here, x would be empty.
- is placed over a blank
This uses a bottom up approach.
editdistance(x, y)
for i = 0, 1, 2, ..., m:
E(i, 0) = i
for j = 0, 1, 2, ..., n:
E(0, j) = j
for i = 1, 2, ..., m:
for j = 1, 2, ..., n:
E(i, j) = min {
1 + E(i - 1, j),
1 + E(i, j - 1),
diff(i, j) + E(i - 1, j - 1)
}
return E(m, n)
This has a runtime of which is that big for loop.