Question
The DNA sequence is composed of a series of nucleotides abbreviated as
'A','C','G', and'T'.
- For example,
"ACGAATTCCG"is a DNA sequence.When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string
sthat represents a DNA sequence, return all the10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Ideas:
- substrings - > Sliding Window O(n) solution
- start at beginning of array [0:11] to get first 10 letters, move towards end
- DS:
- Hashmap: sorted string → list of variations
- On update:
- while r < len(s)
- Check if window is in hashmap and that its value is the empty list:
- if true add to res
- if not add key to map and initialize to []
- Probs some idomatic python way to do both operations at once
- increment l and r pointers
- Hashmap will now only contain a single occurance of subsequences have have appeared at least twice
- We could probs just use a hashset here but when I read the question I didn’t fully understand what they were asking
- We could use 2 hash sets, one for all subsequences and one for our result and basically just append to the result if we are in all patterns already, otherwise add to all patterns
- Edge cases:
- Not 10 chars long
- Char is not in sequence (we would actually solve this as well)