Leetcode - Repeated DNA Sequences

Question

The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.

For example, "ACGAATTCCG" is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Ideas:

substrings - > Sliding Window O(n) solution
start at beginning of array [0:11] to get first 10 letters, move towards end
DS:
- Hashmap: sorted string → list of variations
On update:
while r < len(s)
- Check if window is in hashmap and that its value is the empty list:
- if true add to res
- if not add key to map and initialize to []
- Probs some idomatic python way to do both operations at once
- increment l and r pointers
Hashmap will now only contain a single occurance of subsequences have have appeared at least twice
We could probs just use a hashset here but when I read the question I didn’t fully understand what they were asking
- We could use 2 hash sets, one for all subsequences and one for our result and basically just append to the result if we are in all patterns already, otherwise add to all patterns
Edge cases:
- Not 10 chars long
- Char is not in sequence (we would actually solve this as well)

Solution 1 Solution 2

Idk why this gave me so much trouble before? This is lowkey easy. O(m*n) solution
O(n) solution, the secret here was instead of doing the O(m) check if we have a duplicate, we just make res a set and then convert it to a list on return which is O(n).

🤖 Dan Huynh

Recent Notes

Dan Huynh

Software Architecture

ECE406

ECE459

Algorithms

Explorer

Leetcode - Repeated DNA Sequences

Graph View

Recent Notes

Dan Huynh

Software Architecture

ECE406