Skip to content

Files

Latest commit

23ed48e ยท Mar 6, 2022

History

History
72 lines (54 loc) ยท 6.34 KB

HashTable.md

File metadata and controls

72 lines (54 loc) ยท 6.34 KB

Hash Table

key์— ๋Œ€ํ•œ hash ๊ฐ’์„ ์ธ๋ฑ์Šค๋กœ ์‚ฌ์šฉํ•˜์—ฌ key-value ์Œ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ณ  ์กฐํšŒํ•˜๋ฉฐ, key-value ์Œ์˜ ๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜์— ๋”ฐ๋ผ ๋™์ ์œผ๋กœ ํฌ๊ธฐ๊ฐ€ ์ฆ๊ฐ€ํ•˜๋Š” ์ž๋ฃŒ๊ตฌ์กฐ์ด๋‹ค.

hash table์€ array๋กœ ์ด๋ฃจ์–ด์ ธ์žˆ๊ณ  array ๊ฐ๊ฐ์˜ ์ฃผ์†Œ๋ฅผ bucket ์ด๋ผ ๋ถ€๋ฅธ๋‹ค. hash function ์€ key-value์˜ key๊ฐ’์„ array์˜ index๋กœ ๋งคํ•‘์‹œํ‚ค๋Š” ํ•จ์ˆ˜์ด๋‹ค.

Hash Functin

ํ•ด์‹œ ํ•จ์ˆ˜๋Š” ์ž„์˜ ํฌ๊ธฐ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ณ ์ • ํฌ๊ธฐ์˜ ๊ฐ’์œผ๋กœ ๋งคํ•‘ํ•˜๋Š”๋ฐ ์‚ฌ์šฉ๋˜๋Š” ํ•จ์ˆ˜์ด๋‹ค.

  • hash function๋กœ hash table์ด๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” array์— ์ €์žฅ๋  ๋ฐ์ดํ„ฐ์˜ ์œ„์น˜(index)๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.
    • index๋ฅผ ๊ตฌํ•  ๋•Œ, ๋ณดํ†ต mod ์—ฐ์‚ฐ์„ ์‚ฌ์šฉํ•œ๋‹ค.
    • hash function์„ ํ†ตํ•ด index๋ฅผ ๊ตฌํ•˜๋Š” ๊ณผ์ •์„ hashing ์ด๋ผ๊ณ  ํ•œ๋‹ค.

Hash Collision

1/m ํ™•๋ฅ ์˜ ํ•ด์‹œ ์ถฉ๋Œ (hash table์˜ ํฌ๊ธฐ: m)

hash function์˜ ํ‘œํ˜„๋ฒ”์œ„๋ฅผ m์œผ๋กœ ์ขํž˜์œผ๋กœ์จ ์„œ๋กœ ๋‹ค๋ฅธ hashCode๊ฐ’์„ ๊ฐ€์ง„ ๊ฐ์ฒด๊ฐ€ 1/m ์˜ ํ™•๋ฅ ๋กœ ๋™์ผํ•œ ํ•ด์‹œ ๋ฒ„ํ‚ท ์ธ๋ฑ์Šค๋ฅผ ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋Š” ํ•ด์‹œ ์ถฉ๋Œ(hash collision) ์ด ๋ฐœ์ƒํ•œ๋‹ค. ํ•ด์‹œ ์ถฉ๋Œ์€ hash function์„ ์•„๋ฌด๋ฆฌ ์ž˜ ๊ตฌํ˜„ํ•˜์—ฌ๋„ ์ƒ๊ด€์—†์ด ๋ฐœ์ƒํ•œ๋‹ค.

ํ•ด์‹œ ์ถฉ๋Œ ํ•ด๊ฒฐ๋ฐฉ๋ฒ•

Open Addressing (๊ฐœ๋ฐฉ ์ฃผ์†Œ๋ฒ•)

๋ฐ์ดํ„ฐ๋ฅผ ์‚ฝ์ž…ํ•˜๋ ค๋Š” hash bucket์ด ์ด๋ฏธ ์‚ฌ์šฉ์ค‘์ด๋ผ๋ฉด ๋น„์–ด์žˆ๋Š” bucket์„ ์ฐพ์•„ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฝ์ž…ํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค.

  • Worst Case์˜ ๊ฒฝ์šฐ ๋น„์–ด์žˆ๋Š” bucket์„ ์ฐพ์ง€ ๋ชปํ•˜๊ณ  ํƒ์ƒ‰์„ ์‹œ์ž‘ํ•œ ์œ„์น˜๋กœ ๋˜๋Œ์•„์˜ฌ ์ˆ˜ ์žˆ๋‹ค.
    • ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜๋Š” bucket์ด ๋ชจ์—ฌ์žˆ์œผ๋ฉด Worse Case ๋ฐœ์ƒ ๋นˆ๋„๊ฐ€ ๋†’์•„์ง„๋‹ค.
  • ๋น„์–ด์žˆ๋Š” bucket์„ ํƒ์ƒ‰ํ•˜๋Š” ๋ฐฉ์‹์—๋Š” Linear Probing, Quadratic Probing, Double Hashing ์ด ์žˆ๋‹ค.
    • Linear Probing(์„ ํ˜• ํƒ์ƒ‰)
      • ํ•ด์‹œ ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•œ ํ˜„์žฌ bucket index๋ถ€ํ„ฐ ๊ณ ์ •ํญ n๋งŒํผ ์ด๋™ํ•˜๋ฉด์„œ ๋น„์–ด์žˆ๋Š” bucket์„ ์ฐพ์•„ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค.
      • Primary Clustering ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.
      • ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜๋Š” filled bucket๋“ค์ด ๋ชจ์—ฌ์žˆ๋‹ค๋ฉด ๋น„์–ด์žˆ๋Š” bucket์„ ์ฐพ๊ธฐ ๊นŒ์ง€์˜ ํƒ์ƒ‰ ์‹œ๊ฐ„์ด ๋Š˜์–ด๋‚œ๋‹ค. Primary Cluster๊ฐ€ ํ˜•์„ฑ๋˜๋ฉด ๋น ๋ฅด๊ฒŒ ์ปค์ง€๊ณ  ๊ฒฐ๊ตญ ์„ฑ๋Šฅ ์ €ํ•˜๋ฅผ ์•ผ๊ธฐํ•œ๋‹ค.
    • Quadratic Probing(์ œ๊ณฑ ํƒ์ƒ‰)
      • ํ•ด์‹œ ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•œ ํ˜„์žฌ bucket index๋ถ€ํ„ฐ n^2 ๋งŒํผ ์ด๋™ํ•˜๋ฉด์„œ ๋น„์–ด์žˆ๋Š” bucket์„ ์ฐพ์•„ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ Linear Probing ๋ฐฉ์‹์˜ Primary Clustering ๋ฐœ์ƒ ๊ฐ€๋Šฅ์„ฑ์„ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค.
      • Secondary Clustering ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.
      • n^2 ๊ฐ„๊ฒฉ์œผ๋กœ filled bucket์ด ์กด์žฌํ•˜์—ฌ key์˜ hash๊ฐ’(index) ๋ณด๋‹ค ํ›จ์”ฌ ๋–จ์–ด์ง„ ๊ณณ์— ๋ฐ์ดํ„ฐ๊ฐ€ ์‚ฝ์ž…๋˜๋Š” ํ˜„์ƒ์ด๋‹ค. ์ด๋Š” filled bucket ๊ตฐ์ง‘์„ ํฌ๊ฒŒ๋งŒ๋“ค์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— Primary Clustering ๋ณด๋‹ค ๋œ ์‹ฌ๊ฐํ•œ ๋ฌธ์ œ์ด๋‹ค.
    • Double Hashing(์ด์ค‘ ํ•ด์‹ฑ)
      • hash๊ฐ’์„ ๋‹ค๋ฅธ hash function์œผ๋กœ ํ•œ๋ฒˆ ๋” ํ•ด์‹ฑํ•˜์—ฌ hash์˜ ๊ทœ์น™์„ฑ์„ ์—†์• ๋Š” ๋ฐฉ์‹์œผ๋กœ Secondary Clustering ๋ฐœ์ƒ ๊ฐ€๋Šฅ์„ฑ์„ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค.

Open Addressing์˜ ๋ฐ์ดํ„ฐ ํƒ์ƒ‰ ๋ฐ ์‚ญ์ œ

  • ํƒ์ƒ‰
    • target ๋ฐ์ดํ„ฐ๋ฅผ ์ฐพ๊ฑฐ๋‚˜ empty bucket์— ๋„๋‹ฌํ•˜๊ธฐ ์ „๊นŒ์ง€ ํƒ์ƒ‰(probing)์„ ์ง„ํ–‰ํ•œ๋‹ค. target ๋ฐ์ดํ„ฐ๋ฅผ ์ฐพ๋Š” ๊ณผ์ •์—์„œ empty bucket์— ๋„๋‹ฌํ•˜๋ฉด ํƒ์ƒ‰(probing)์„ ์ข…๋ฃŒํ•˜๋Š” ๋ฌธ์ œ์ ์ด ์žˆ๋‹ค.
  • ์‚ญ์ œ
    • ์ค‘๊ฐ„ ์š”์†Œ๋ฅผ ์‚ญ์ œํ•˜๊ฒŒ ๋˜๋ฉด ํƒ์ƒ‰ ์ค‘ empty bucket์„ ๋งŒ๋‚˜ ํƒ์ƒ‰์„ ์ข…๋ฃŒํ•˜๊ฒŒ ๋˜์–ด ์›ํ•˜๋Š” ์š”์†Œ๋ฅผ ํƒ์ƒ‰ํ•  ์ˆ˜ ์—†๋Š” ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.
    • ์‚ญ์ œํ•œ ๋ฐ์ดํ„ฐ์˜ bucket์— dummy node ๋ฅผ ๋„ฃ๊ฑฐ๋‚˜ flag (Occupied, Empty, Deleted) ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ํƒ์ƒ‰์ด ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ์ง„ํ–‰๋˜๋„๋ก ํ•  ์ˆ˜ ์žˆ๋‹ค.

Separate Chaining (๋ถ„๋ฆฌ ์—ฐ๊ฒฐ๋ฒ•)

๊ฐ๊ฐ์˜ hash bucket์„ LinkedList๋ฅผ ๊ฐ€๋ฆฌํ‚ค๋Š” ํฌ์ธํ„ฐ๋กœ ๊ตฌ์„ฑํ•˜์—ฌ ํ•ด์‹œ ์ถฉ๋Œ์‹œ ํ•ด๋‹น bucket์˜ LinkedList์— ์ถ”๊ฐ€ํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค.

  • ์ผ๋ฐ˜์ ์œผ๋กœ Seperate Chaining ๋ฐฉ์‹์€ Open Addressing ๋ฐฉ์‹๋ณด๋‹ค ๋น ๋ฅด๋‹ค.

    • Open Addressing์€ ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜๋Š” bucket์ด ๋ชจ์—ฌ์žˆ์œผ๋ฉด Worst Case ๋ฐœ์ƒ ๋นˆ๋„๊ฐ€ ๋†’์•„์ง„๋‹ค
    • Seperate Chaining์€ ํ•ด์‹œ ์ถœ๋Œ์„ ํ”ผํ•˜๋„๋ก ๋ณด์กฐ ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ์กฐ์ •ํ•˜๋ฉด Worse Case์— ๊ฐ€๊นŒ์›Œ์ง€๋Š” ๋นˆ๋„๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค.
  • Java 7์—์„œ HashMap์€ Seperate Chaining ๋ฐฉ์‹์œผ๋กœ ๊ตฌํ˜„๋˜์–ด์žˆ๋‹ค.

  • Seperate Chaining ๋ฐฉ์‹์€ ๋ณด์กฐ ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•ด์‹œ ์ถฉ๋Œ ๊ฐ€๋Šฅ์„ฑ์„ ์ค„์ธ๋‹ค.

  • Red Black Tree๋กœ ๊ตฌํ˜„ํ•œ Separate Chaining

    • ํ•˜๋‚˜์˜ hash bucket์— ํ•ด๋‹นํ•˜๋Š” LinkedList์˜ ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜๊ฐ€ 8๊ฐœ๊ฐ€ ๋˜๋ฉด ๊ตฌ์กฐ๋ฅผ LinkedList -> Red-Black Tree ๋กœ ๋ณ€ํ™˜ํ•œ๋‹ค. LinkedList ํƒ์ƒ‰ ์„ฑ๋Šฅ์˜ Worst Case๋Š” O(N) ์ด์ง€๋งŒ Red-Black Tree์˜ ๊ฒฝ์šฐ O(logN)์ด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

    • ํ•˜๋‚˜์˜ hash bucket์— ํ• ๋‹น๋œ ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜๊ฐ€ 6๊ฐœ์ด๋ฉด ๊ตฌ์กฐ๋ฅผ Red-Black Tree -> LinkedList๋กœ ๋ณ€ํ™˜ํ•œ๋‹ค. ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜๊ฐ€ ์ ์œผ๋ฉด Red-Black Tree์™€ LinkedList๊ฐ„์˜ ํƒ์ƒ‰ ์„ฑ๋Šฅ ์ฐจ์ด๊ฐ€ ๊ฑฐ์˜ ์—†๊ณ , Red-Black Tree๋Š” LinkedList๋ณด๋‹ค ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์ด ๋งŽ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

    • ๋ณ€ํ™˜ ๊ธฐ์ค€์„ 6๊ฐœ, 8๊ฐœ ์ฒ˜๋Ÿผ 2๊ฐœ๋ผ๋Š” ์—ฌ์œ ๋ฅผ ๋‘์–ด ๊ณผ๋„ํ•œ ๊ตฌ์กฐ ๋ณ€ํ™˜์„ ๋ง‰์„ ์ˆ˜ ์žˆ๋‹ค. ๋งŒ์•ฝ ๋ณ€ํ™˜ ๊ธฐ์ค€์ด 6๊ฐœ , 7๊ฐœ ์ด๋ผ๋ฉด ๋ฐ์ดํ„ฐ๊ฐ€ ๋ฐ˜๋ณต์ ์œผ๋กœ ์‚ฝ์ž…, ์‚ญ์ œ๋˜๋Š” ๊ฒฝ์šฐ ๋ถˆํ•„์š”ํ•œ LinkedList โ†” Tree ๋ณ€ํ™˜์ด ์ผ์–ด๋‚˜ ์„ฑ๋Šฅ ์ €ํ•˜๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค.

Open Addressing vs Seperate Chaining

Open Addressing Seperate Chaining
Worst Case O(M) O(M)
์บ์‹œ ํšจ์œจ ์ข‹๋‹ค (์—ฐ์†๋œ ๊ณต๊ฐ„์— ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค) Open Addressing ๋ณด๋‹ค ์ข‹์ง€ ์•Š๋‹ค (ํ•ด์‹œ ์ถฉ๋Œ์‹œ LinkedList์— ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค)
๊ณต๊ฐ„ ํšจ์œจ ์ข‹๋‹ค (ํ•ด์‹œ ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•œ ๊ฒฝ์šฐ์—๋„ probing์„ ํ†ตํ•ด ๋นˆ bucket์— ์ €์žฅ๋˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค) Open Addressing ๋ณด๋‹ค ์ข‹์ง€ ์•Š๋‹ค (ํ•ด์‹œ ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•˜๋ฉด LinkedList์— ์ถ”๊ฐ€๋˜๊ธฐ ๋•Œ๋ฌธ์— ์‚ฌ์šฉ๋˜์ง€ ์•Š๋Š” bucket์ด ์กด์žฌํ•œ๋‹ค)
Resizing ๋นˆ๋„ ๋†’๋‹ค (bucket ์‚ฌ์šฉ๋ฅ ์ด ๋†’์•„ load factor์˜ ์ž„๊ณ„์ ์— ์‰ฝ๊ฒŒ ๋„๋‹ฌํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค) Open Addressing ๋ณด๋‹ค ๋‚ฎ๋‹ค (bucket ์‚ฌ์šฉ๋ฅ ์ด ๋‚ฎ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค)

Dynamic Resizing

hash bucket์˜ ๊ฐœ์ˆ˜๊ฐ€ ์ ๋‹ค๋ฉด ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ์„ ์•„๋‚„ ์ˆ˜ ์žˆ์ง€๋งŒ ํ•ด์‹œ ์ถฉ๋Œ๋กœ ์ธํ•ด ์„ฑ๋Šฅ ์ €ํ•˜๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋ฏ€๋กœ ์ž๋ฐ”์˜ HashMap์€ load factor๊ฐ€ ์ž„๊ณ„์ (0.75)์„ ๋„˜์–ด๊ฐˆ ๊ฒฝ์šฐ์— hash bucket ๊ฐœ์ˆ˜๋ฅผ 2๋ฐฐ๋กœ ๋Š˜๋ฆฐ๋‹ค.