Computing and Information Systems - Theses

Permanent URI for this collection

Search Results

Now showing 1 - 1 of 1
  • Item
    Thumbnail Image
    A suite of greedy methods for set cover computation
    Lim, Ching Lih ( 2015)
    The Set Cover problem is simple to describe: if a collection of sets is given, each containing a subset of a universe of elements, the challenge is to find a smallest (cardinality) sub-collection such that the union of the sets in the sub-collection is equal to the universe. Since this problem is NP-hard, there is no known efficient exact algorithm for non-trivial problem instances. An approximation algorithm is thus employed to construct a nearly optimal solution within reasonable resource bounds. One such algorithm is the greedy approach, which repeatedly adds the set with the greatest number of uncovered elements to the solution until every element in the universe is covered. In this thesis we make several contributions to Set Cover and the greedy approach. First, the approach can be implemented with a priority queue to keep track of set sizes so as to always be able to quickly find a set with the most uncovered elements. However, this implementation is costly in one aspect as some or all sets in the queue must be updated to their correct sizes after adding one set. A variant of the approach, Lazy-Greedy, is therefore proposed to evaluate sets lazily, reducing that cost. We show that while the worst-case time complexity is worsened, on typical input instances the revised approach implementation is faster than the standard eager greedy implementation, while retaining the same approximation ratio of the greedy method. Second, to process a large instance on secondary storage efficiently, Lazy-Greedy is re-engineered to break an input into a logarithmic number of buckets of related set sizes, and to process each bucket lazily. It has two innovations: automated management of a large number of buckets, and a fast search for a particular bucket. Although the algorithm handles a number of large instances efficiently, it runs more slowly than one benchmark disk-friendly algorithm that has a weaker approximation ratio. Finally, we propose two input-processing algorithms of the same following task given that some sets in a typical problem instance have hapax legomena (singleton elements). These sets must appear in every valid solution, including the optimal solution, because the hapax legomena will not be covered otherwise. A partial solution of such sets can be pre-built, with remaining sets and elements passed to a Set Cover algorithm to solve and complete the solution. The in-memory algorithm of counting element occurrences produces complete solutions faster than the direct approach of solving without the pre-processing. The disk-oriented algorithm of transforming an instance into a list of set-element items solves slower than a semi-external covering algorithm due to the relatively expensive cost of sorting the list.