Hashtable

Definition

  • an effective data structure for implementing dictionaries, mapping keys and values.
  • uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
  • (under reasonable assumptions) $$O(1)$$ time complexity to search for an element in a hash table.

Hashing

  • With hashing, this element is stored in slot $$h(k)$$. We use a hash function $$h$$ to compute the slot from the key k. Here, h maps the universe U of keys into the slots of a hash table.
  • The hash function reduces the range of array indices and hence the size of the array. Instead of a size of $$|U|$$, the array can have a much smaller size $$m$$.

Collision Handling

  • Collision: two keys hash to the same slot.
  • Chaining:
    • Use a linked list at every slot. Slot j contains a pointer to the head of the list of all stored elements that hash to j.
  • Open addressing:
    • All elements occupy the hash table itself. No lists and no elements are stored outside the table.
    • Use the key and a probe number as the input for hashing function: $$h(k, i), i = 0, 1, ......, m-1$$. We require that for every key k, the probe sequence ($$$$) be a permutation of $$<0, 1,="" ......,="" m-1="">$$, so that every hash-table position is eventually considered as a slot for a new key as the table fills up.

Hashing function

  • For integers: modular hashing. Take modular w.r.t. the table size.
  • For strings: treat it as a large integer. Below is a modular hashing function for string s
    int hash = 0;
    for (int i = 0; i <= s.size()-1; i++) {
      hash = (R * hash + s[i]) % M;
    }
    
  • Q: Why does Java use 31 in the hashCode() for String?
    A: It's prime, so that when the user mods out by another number, they have no common factors (unless it's a multiple of 31). 31 is also a Mersenne prime (like 127 or 8191) which is a prime number that is one less than a power of 2. This means that the mod can be done with one shift and one subtract if the machine's multiply instruction is slow.

Implement a simple Hashtable:

http://www.algolist.net/Data_structures/Hash_table/Simple_example

class HashItem{
  private:
    int key, val;
  public:
    HashItem(): key(0), val(0) {}
    HashItem(int k, int v): key(k), val(v){}
    int getKey(){ return key; }
    int getVal(){ return val; }
};

const int SIZE = 256;
class HashTable{
  private:
    HashItem **table;
  public:
    HashTable(){
      table = new HashItem*[SIZE];
      for ( int i = 0; i <= SIZE-1; i++) { table[i] = NULL; }
    }
    void put(int key, int val) {
      int hash = key%SIZE;
      while ( table[hash] != NULL and table[hash]->getKey() != key ) {
        hash = (hash+1)%SIZE;
      }
      table[hash] = new HashItem(key, val);
    }

    int get(int key) {
      int hash = key%SIZE;
      while ( table[hash] != NULL and table[hash]->getKey() != key ) {
        hash = (hash+1)%SIZE;
      }
      if ( table[hash] == NULL ) return -1;
      return table[hash]->getVal();
    }
};

results matching ""

    No results matching ""