187 Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Solution
class Solution {
public:
vector<string> findRepeatedDnaSequences(string s) {
map<string, int> record;
vector<string> res;
int n = s.size();
if ( n <= 9 ) return res;
for (int i = 0; i <= n-10; i++ ) {
string sub_s = s.substr(i, 10);
if ( record.find(sub_s) == record.end() ) {
record[sub_s] = 1;
continue;
}
if ( record[sub_s] == 1 ) {
res.push_back(sub_s);
record[sub_s] += 1;
}
}
return res;
}
};
Notes
Note the use of hash table.
Similar problem
For a given string, count the number of all unique sub strings.