《【精品数据结构】Search.ppt》由会员分享,可在线阅读,更多相关《【精品数据结构】Search.ppt(41页珍藏版)》请在三一文库上搜索。
1、Search,Given: Distinct keys k1, k2, , kn and collection T of n records of the form (k1, I1), (k2, I2), , (kn, In) where Ij is the information associated with key kj for 1 = j = n. Search Problem: For key value K, locate the record (kj, Ij) in T such that kj = K. Searching is a systematic method for
2、locating the record(s) with key value kj = K.,Successful vs. Unsuccessful,A successful search is one in which a record with key kj = K is found. An unsuccessful search is one in which no record with kj = K is found (and presumably no such record exists).,Approaches to Search,1. Sequential and list m
3、ethods (lists, tables, arrays). 2. Direct access by key value (hashing) 3. Tree indexing methods.,Searching Ordered Arrays,Sequential Search Binary Search Dictionary Search,Lists Ordered by Frequency,Order lists by (expected) frequency of occurrence. Perform sequential search Cost to access first re
4、cord: 1 Cost to access second record: 2 Expected search cost:,Examples(1),(1) All records have equal frequency.,Examples(2),(2) Exponential Frequency,Zipf Distributions,Applications: Distribution for frequency of word usage in natural languages. Distribution for populations of cities, etc. 80/20 rul
5、e: 80% of accesses are to 20% of the records. For distributions following 80/20 rule,Self-Organizing Lists,Self-organizing lists modify the order of records within the list based on the actual pattern of record accesses. Self-organizing lists use a heuristic for deciding how to reorder the list. The
6、se heuristics are similar to the rules for managing buffer pools.,Heuristics,Order by actual historical frequency of access. (Similar to LFU buffer pool replacement strategy.) Move-to-Front: When a record is found, move it to the front of the list. Transpose: When a record is found, swap it with the
7、 record ahead of it.,Text Compression Example,Application: Text Compression. Keep a table of words already seen, organized via Move-to-Front heuristic. If a word not yet seen, send the word. Otherwise, send (current) index in the table. The car on the left hit the car I left. The car on 3 left hit 3
8、 5 I 5. This is similar in spirit to Ziv-Lempel coding.,Searching in Sets,For dense sets (small range, high percentage of elements in set). Can use logical bit operators. Example: To find all primes that are odd numbers, compute: 0011010100010100 This function is entirely dependent on the lower 4 bi
9、ts of the key. Mid-square method: Square the key value, take the middle r bits from the result for a hash table of 2r slots.,Examples (2),For strings: Sum the ASCII values of the letters and take results modulo M. int h(char* x) int i, sum; for (sum=0, i=0; xi != 0; i+) sum += (int) xi; return(sum %
10、 M); This is only good if the sum is large compared to M.,Examples (3),ELF Hash: From Executable and Linking Format (ELF), UNIX System V Release 4. int ELFhash(char* key) unsigned long h = 0; while(*key) h = (h 24; h ,Open Hashing,What to do when collisions occur? Open hashing treats each hash table
11、 slot as a bin.,Bucket Hashing,Divide the hash table slots into buckets. Example: 8 slots/bucket. Include an overflow bucket. Records hash to the first slot of the bucket, and fill bucket. Go to overflow if necessary. When searching, first check the proper bucket. Then check the overflow.,Closed Has
12、hing,Closed hashing stores all records directly in the hash table. Each record i has a home position h(ki). If another record occupies is home position, then another slot must be found to store i. The new slot is found by a collision resolution policy. Search must follow the same policy to find reco
13、rds not in their home slots.,Collision Resolution,During insertion, the goal of collision resolution is to find a free slot in the table. Probe sequence: The series of slots visited during insert/search by following a collision resolution policy. Let b0 = h(K). Let (b0, b1, ) be the series of slots
14、making up the probe sequence.,Insertion,/ Insert e into hash table HT template bool hashdict: hashInsert(const Elem ,Search,/ Search for the record with Key K template bool hashdict: hashSearch(const Key / K not in hash table ,Probe Function,Look carefully at the probe function p(). pos = (home + p(
15、getkey(e), i) % M; Each time p() is called, it generates a value to be added to the home position to generate the new slot to be examined. p() is a function both of the elements key value, and of the number of steps taken along the probe sequence. Not all probe functions use both parameters.,Linear
16、Probing,Use the following probe function: p(K, i) = i; Linear probing simply goes to the next slot in the table. Past bottom, wrap around to the top. To avoid infinite loop, one slot in the table must always be empty.,Linear Probing Example,Primary Clustering: Records tend to cluster in the table un
17、der linear probing since the probabilities for which slot to use next are not the same for all slots.,Improved Linear Probing,Instead of going to the next slot, skip by some constant c. Warning: Pick M and c carefully. The probe sequence SHOULD cycle through all slots of the table. Pick c to be rela
18、tively prime to M. There is still some clustering Ex: c=2, h(k1) = 3; h(k2) = 5. Probe sequences for k1 and k2 are linked together.,Pseudo-Random Probing(1),The ideal probe function would select the next slot on the probe sequence at random. An actual probe function cannot operate randomly. (Why?),P
19、seudo-Random Probing(2),Select a (random) permutation of the numbers from 1 to M-1: r1, r2, , rM-1 All insertions and searches use the same permutation. Example: Hash table size of M = 101 r1=2, r2=5, r3=32. h(k1)=30, h(k2)=28. Probe sequence for k1: Probe sequence for k2:,Quadratic Probing,Set the
20、ith value in the probe sequence as h(K, i) = i2; Example: M=101 h(k1)=30, h(k2) = 29. Probe sequence for k1 is: Probe sequence for k2 is:,Secondary Clustering,Pseudo-random probing eliminates primary clustering. If two keys hash to the same slot, they follow the same probe sequence. This is called s
21、econdary clustering. To avoid secondary clustering, need probe sequence to be a function of the original key value, not just the home position.,Double Hashing,p(K, i) = i * h2(K) Be sure that all probe sequence constants (h2(K) are relatively prime to M. This will be true if M is prime, or if M=2m a
22、nd the constants are odd. Example: Hash table of size M=101 h(k1)=30, h(k2)=28, h(k3)=30. h2(k1)=2, h2(k2)=5, h2(k3)=5. Probe sequence for k1 is: Probe sequence for k2 is: Probe sequence for k3 is:,Analysis of Closed Hashing,The load factor is a = N/M where N is the number of records currently in th
23、e table.,Deletion,Deleting a record must not hinder later searches. We do not want to make positions in the hash table unusable because of deletion.,Tombstones (1),Both of these problems can be resolved by placing a special mark in place of the deleted record, called a tombstone. A tombstone will not stop a search, but that slot can be used for future insertions.,Tombstones (2),Unfortunately, tombstones add to the average path length. Solutions: Local reorganizations to try to shorten the average path length. Periodically rehash the table (by order of most frequently accessed record).,