System.Linq.Enumerable (http://msdn.microsoft.com/en-us/library/system.linq.enumerable_members.aspx); as a result DSA does not provide implementations of these algorithms. Most of the algorithms defined in System.Linq.Enumerable deal mainly with sequences rather than sets exclusively.

Figure 5.1: a) A ∩ B; b) A ∪ B
Set union can be implemented as a simple traversal of both sets adding each
item of the two sets to a new union set.
algorithm Union(set1, set2)
  Pre: set1, and set2 ≠ ∅
       union is a set
  Post: A union of set1, and set2 has been created
  foreach item in set1
    union.Add(item)
  end foreach
  foreach item in set2
    union.Add(item)
  end foreach
  return union
end Union
The run time of our Union algorithm is O(m + n) where m is the number
of items in the first set and n is the number of items in the second set. This
runtime applies only to sets that exhibit O(1) insertions.
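To make this concrete, the following is a minimal sketch of the same union over .NET's HashSet<T>, whose Add is amortised O(1); the class and method names (SetAlgorithms, UnionExample) are ours for illustration and are not part of DSA.

using System.Collections.Generic;

static class SetAlgorithms
{
    // Union by traversing both sets and adding every item to a new set;
    // HashSet<T>.Add silently ignores duplicates, so the result remains a set.
    // With amortised O(1) insertion this runs in O(m + n).
    public static HashSet<T> UnionExample<T>(HashSet<T> set1, HashSet<T> set2)
    {
        var union = new HashSet<T>();
        foreach (var item in set1)
            union.Add(item);
        foreach (var item in set2)
            union.Add(item);
        return union;
    }
}

In practice the BCL already exposes this behaviour through HashSet<T>.UnionWith, which mutates the receiving set rather than returning a new one.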
Set intersection is also trivial to implement. The only major thing worth
pointing out about our algorithm is that we traverse the set containing the
fewest items. We can do this because if we have exhausted all the items in the
smaller of the two sets then there are no more items that are members of both
sets, thus we have no more items to add to the intersection set.
algorithm Intersection(set1, set2)
  Pre: set1, and set2 ≠ ∅
       intersection, and smallerSet are sets
  Post: An intersection of set1, and set2 has been created
  if set1.Count < set2.Count
    smallerSet ← set1
  else
    smallerSet ← set2
  end if
  foreach item in smallerSet
    if set1.Contains(item) and set2.Contains(item)
      intersection.Add(item)
    end if
  end foreach
  return intersection
end Intersection
The run time of our Intersection algorithm is O(n) where n is the number of items in the smaller of the two sets. Just like our Union algorithm, a linear run time can only be attained when operating on sets with O(1) look up (for Contains) and O(1) insertion.
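A matching sketch of the intersection over HashSet<T> is shown below; like the pseudocode it walks the smaller set, and it depends on Contains being roughly O(1). The name IntersectionExample is again ours.

using System.Collections.Generic;

static class SetIntersection
{
    // Intersection by walking the smaller of the two sets and keeping only
    // the items also contained in the other; with O(1) Contains this runs
    // in O(n) where n is the size of the smaller set.
    public static HashSet<T> IntersectionExample<T>(HashSet<T> set1, HashSet<T> set2)
    {
        HashSet<T> smaller = set1.Count < set2.Count ? set1 : set2;
        HashSet<T> larger  = ReferenceEquals(smaller, set1) ? set2 : set1;

        var intersection = new HashSet<T>();
        foreach (var item in smaller)
            if (larger.Contains(item))
                intersection.Add(item);
        return intersection;
    }
}

The BCL equivalent is HashSet<T>.IntersectWith, which again mutates the set it is called on.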
5.1 Unordered
Sets in the general sense do not enforce the explicit ordering of their mem-
bers. For example the members of B = {6, 2, 9} conform to no ordering scheme
because it is not required.
Most libraries provide implementations of unordered sets, so DSA does not; we mention them here simply to disambiguate between an unordered set and an ordered set.
We will only look at insertion for an unordered set and cover briefly why a
hash table is an efficient data structure to use for its implementation.
5.1.1 Insertion
An unordered set can be efficiently implemented using a hash table as its backing data structure. As mentioned previously we only add an item to a set if that item is not already in the set, so the backing data structure we use must provide fast look up and insertion.
A hash map generally provides the following:

1. O(1) for insertion
2. approaching O(1) for look up
These bounds depend on how good the hashing algorithm of the hash table is, but most hash tables employ efficient general purpose hashing algorithms, so the run time complexities of the hash table in your library of choice should be very close to those listed above.
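To make the membership check concrete, the snippet below uses .NET's HashSet<T>, whose Add performs the look up and the insertion in one near constant time operation and returns false when the item is already present; the class and variable names are ours.

using System;
using System.Collections.Generic;

class UnorderedSetInsertionDemo
{
    static void Main()
    {
        var set = new HashSet<string>();

        // Add hashes the item, checks membership and inserts in one step;
        // it returns false for duplicates, so no separate Contains call is needed.
        Console.WriteLine(set.Add("apple"));  // True  - newly inserted
        Console.WriteLine(set.Add("apple"));  // False - already a member
        Console.WriteLine(set.Count);         // 1
    }
}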
5.2 Ordered
An ordered set is similar to an unordered set in the sense that its members are
distinct, but an ordered set enforces some predefined comparison on each of its
members to produce a set whose members are ordered appropriately.
In DSA 0.5 and earlier we used a binary search tree (defined in §3) as the internal backing data structure for our ordered set. From version 0.6 onwards we replaced the binary search tree with an AVL tree, primarily because an AVL tree is always balanced.
The ordered set has its order realised by performing an inorder traversal
upon its backing tree data structure which yields the correct ordered sequence
of set members.
Because an ordered set in DSA is simply a wrapper for an AVL tree that
additionally ensures that the tree contains unique items you should read §7 to
learn more about the run time complexities associated with its operations.
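As a rough illustration of the idea (not DSA's actual OrderedSet), the sketch below wraps .NET's SortedSet<T>, which is backed by a self-balancing binary search tree (a red-black tree rather than an AVL tree); uniqueness is enforced by the tree itself and enumeration yields the members in sorted order, mirroring an inorder traversal.

using System;
using System.Collections.Generic;

// Hypothetical wrapper: a balanced search tree that only admits unique items
// and enumerates them in order.
class OrderedSetSketch<T>
{
    private readonly SortedSet<T> tree = new SortedSet<T>();

    // Returns false when the item is already a member of the set.
    public bool Add(T item) => tree.Add(item);

    public bool Contains(T item) => tree.Contains(item);

    // SortedSet<T> enumerates its items in sorted order.
    public IEnumerable<T> InOrder() => tree;
}

class OrderedSetDemo
{
    static void Main()
    {
        var set = new OrderedSetSketch<int>();
        foreach (var n in new[] { 6, 2, 9, 2 })
            set.Add(n);                                       // the second 2 is ignored
        Console.WriteLine(string.Join(", ", set.InOrder()));  // 2, 6, 9
    }
}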
5.3 Summary
Sets provide a way of having a collection of unique objects, either ordered or
unordered.
When implementing a set (either ordered or unordered) it is key to select the correct backing data structure. As we discussed in §5.1.1, because we check whether an item is already contained within the set before adding it, this check needs to be as quick as possible. For unordered sets we can rely on a hash table and use the key of an item to determine whether or not it is already contained within the set; with a hash table this check runs in near constant time. Ordered sets cost a little more for this check, however the logarithmic growth incurred by using a binary search tree as the backing data structure is acceptable.
Another key property of sets implemented using the approach we describe is that both have favourably fast look-up times. Just like the check before insertion, for a hash table this run time complexity should be near constant. Ordered sets as described in §3 perform a binary chop at each stage when searching for the existence of an item, yielding a logarithmic run time.
We can use sets to facilitate many algorithms that would otherwise be a little
less clear in their implementation. For example in §11.4 we use an unordered
set to assist in the construction of an algorithm that determines the number of
repeated words within a string.
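As a taste of that algorithm (the full treatment is in §11.4), the sketch below counts repeated words using an unordered set: a failed Add means the word has been seen before. The method name and the simple space-based word splitting are our simplifications.

using System;
using System.Collections.Generic;

static class RepeatedWords
{
    // Counts how many words in the input have already occurred earlier in it;
    // HashSet<string>.Add returns false exactly when the word is a repeat.
    public static int CountRepeatedWords(string input)
    {
        var seen = new HashSet<string>();
        int repeats = 0;
        foreach (var word in input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!seen.Add(word))
                repeats++;
        }
        return repeats;
    }
}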
Chapter 6
Queues
Queues are an essential data structure found in vast amounts of software, from user mode applications to kernel mode components that are core to the system. Fundamentally they honour a first in first out (FIFO) strategy: the item first put into the queue will be the first served, the second item added to the queue will be the second to be served, and so on.
A traditional queue only allows you to access the item at the front of the
queue; when you add an item to the queue that item is placed at the back of
the queue.
Historically queues always have the following three core methods:

Enqueue: places an item at the back of the queue;

Dequeue: retrieves the item at the front of the queue, and removes it from the queue;

Peek: retrieves the item at the front of the queue without removing it from the queue (this operation is sometimes referred to as Front)
To demonstrate the behaviour of a queue we will walk through a scenario in which we invoke each of the previously mentioned methods and observe the mutations upon the queue data structure. The following list describes the operations performed upon the queue in Figure 6.1:
1. Enqueue(10)
2. Enqueue(12)
3. Enqueue(9)
4. Enqueue(8)
5. Enqueue(3)
6. Dequeue()
7. Peek()
8. Enqueue(33)
9. Peek()
10. Dequeue()
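The same sequence of operations, expressed against .NET's Queue<T> (whose Enqueue, Dequeue and Peek correspond directly to the methods above), might look like the sketch below; the comments show the values the FIFO behaviour predicts.

using System;
using System.Collections.Generic;

class QueueWalkthrough
{
    static void Main()
    {
        var queue = new Queue<int>();

        queue.Enqueue(10);
        queue.Enqueue(12);
        queue.Enqueue(9);
        queue.Enqueue(8);
        queue.Enqueue(3);

        Console.WriteLine(queue.Dequeue()); // 10 - the first item enqueued is served first
        Console.WriteLine(queue.Peek());    // 12 - now at the front, not removed

        queue.Enqueue(33);

        Console.WriteLine(queue.Peek());    // still 12
        Console.WriteLine(queue.Dequeue()); // 12
    }
}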
6.1 A standard queue
A standard queue is simply the kind of queue described prior to this section. In DSA we don't provide a standard queue implementation because queues are so popular and such a core data structure that pretty much every mainstream library provides one for your language of choice. In this section we will discuss how you can, if required, implement an efficient queue data structure yourself.
The main property of a queue is that we have access to the item at the
front of the queue. The queue data structure can be efficiently implemented
using a singly linked list (defined in Dostları ilə paylaş: |