187 Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:

["AAAAACCCCC", "CCCCCAAAAA"].

Solution

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        map<string, int> record;
        vector<string> res;
        int n = s.size();
        if ( n <= 9 ) return res;
        for (int i = 0; i <= n-10; i++ ) {
            string sub_s = s.substr(i, 10);
            if ( record.find(sub_s) == record.end() ) {
                record[sub_s] = 1;
                continue;
            }
            if ( record[sub_s] == 1 ) {
                res.push_back(sub_s);
                record[sub_s] += 1;
            }
        }
        return res;
    }
};

Notes
Note the use of hash table.

Similar problem
For a given string, count the number of all unique sub strings.

results matching ""

    No results matching ""