The Set Library  0.2.1
C Data Structure Library: Set Library
Version:
0.2.1
Author:
Jun Woong (woong.jun at gmail.com)
Date:
last modified on 2013-04-23

Introduction

This document specifies the Set Library which belongs to the C Data Structure Library. The basic structure is from David Hanson's book, "C Interfaces and Implementations." I modifies the original implementation to make it more appropriate for my other projects and to enhance its readibility; for example a prefix is used more strictly in order to avoid the user namespace pollution.

Since the book explains its design and implementation in a very comprehensive way, not to mention the copyright issues, it is nothing but waste to repeat it here, so I finish this document by giving introduction to the library; how to use the facilities is deeply explained in files that define them.

The Set Library reserves identifiers starting with set_ and SET_, and imports the Assertion Library (which requires the Exception Handling Library) and the Memory Management Library.

How to Use The Library

The Set Library implements sets whose concept does not differ from that in mathematics. You can also regard it as a table where values do not have associated keys and act as keys by themselves. Since a value in a set is also a key, two instances of the same value refer to the same element in the set. The storage used to maintain a set itself is managed by the library, but any storage allocated for data stored in sets should be managed by a user program.

Similarly for other data structure libraries, use of the Set Library follows this sequence: create, use and destroy. Except for functions to inspect sets, all other functions do one of them in various ways.

set_new() that creates an empty set takes three unusual arguments. The first one is a hint for the expected length of the set it will create, and the other two are to specify user-defined functions that perform creation and comparison of hash values to represent members. Some important conditions that those functions have to satisfy are described in set_new().

In general, a null pointer given to an argument expecting a set is considered an error which results in an assertion failure, but the functions for set operations (set_union(), set_inter(), set_minus() and set_diff()) take a null pointer as a valid argument and treat it as representing an empty set. Also note that they always produce a distinct set; none of them alters the original set.

Boilerplate Code

Using a set starts with creating one using set_new(). As explained in the function, it is important to provide three arguments properly. If members to a set are generated by the Hash Library, the second and third arguments can be granted null pointers, which lets internal default functions used for the set. There are other ways to create sets from an existing set with set_union(), set_inter(), set_minus() and set_diff() (getting a union, intersection, difference, symmetric difference of sets, respectively); since this library does not define the concept of a universe, no support for a complement. All set creation functions allocate storage for a set and if no allocation is possible, an exception is raised instead of returning a failure indicator like a null pointer.

Once a set created, a member can be added to and removed from a set using set_put() and set_remove(). Adding a member to a set also entails memory allocation, and thus an exception can be raised. set_member() inspects if a set contains a specific member, and set_length() gives the number of members in a set, a.k.a. the length of a set.

There are two ways to apply some operations on every member in a set; set_map() takes a user-defined function and calls it for each of members, and set_toarray() converts a set into a dynamic arrays. Storage for the generated array is allocated by the library (thus, an exception is possible again), but a user program is responsible for releasing the storage when the array is no longer necessary.

set_free() takes a set and releases the storage used to maintain it. Note that any storage allocated by a user program to contain or represent members is not deallocated by the library.

As an example, the following code creates two sets (whose expected length is set to 20 and members are generated by the Hash Library) and those sets have input characters as members by turns. It then obtains from them a union and an intersection and enumerates members in the resultsing sets.

      int c;
      char b;
      unsigned i = 0;
      void **pa, **pb;
      set_t *myset1, *myset2, *u, *t;

      myset1 = set_new(20, NULL, NULL);
      myset2 = set_new(20, NULL, NULL);

      while ((c = getchar()) != EOF) {
          b = c;
          set_put((i++ % 2 == 0)? myset1: myset2, hash_new(&b, 1));
      }

      u = set_union(myset1, myset2);
      t = set_inter(myset1, myset2);

      set_free(&myset1);
      set_free(&myset2);

      pa = set_toarray(u, NULL);
      printf("union: ");
      for (pb = pa; *pb; pb++)
          printf("%c", *(char *)*pb);
      MEM_FREE(pa);

      pa = set_toarray(t, NULL);
      printf("\nintersection: ");
      for (pb = pa; *pb; pb++)
          printf("%c", *(char *)*pb);
      MEM_FREE(pa);

      hash_reset();
      set_free(&u);
      set_free(&t);

where hash_new() and hash_reset() come from the Hash Library, and MEM_NEW() and MEM_FREE() from the Memory Management Library.

Things to note include:

Future Directions

Storing Hash Numbers

Modifying the data structure for sets to have hash numbers explicitly makes it possible for a user-provided hashing function to be called only once for each member in sets and for a user-provided comparison function to be called only when the hash numbers differ, which leads to the performance improvement.

Improvement on Set Operations

Set operations like set_union(), set_inter(), set_minus() and set_diff() can be improved when two sets on which the operations are performed have the same number of buckets by applying the operations to each pair of corresponding buckets.

Contact Me

Visit http://code.woong.org to get the lastest version of this library. Only a small portion of my homepage (http://www.woong.org) is maintained in English, thus one who is not good at Korean would have difficulty when navigating most of other pages served in Korean. If you think the information you are looking for is on pages written in Korean, do not hesitate to send me an email to ask for help.

Any comments about the library are welcomed. If you have a proposal or question on the library just email me, and I will reply as soon as possible.

Copyright

I do not wholly hold the copyright of this library; it is mostly held by David Hanson as stated in his book, "C: Interfaces and Implementations:"

Copyright (c) 1994,1995,1996,1997 by David R. Hanson.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

For the parts I added or modified, the following applies:

Copyright (C) 2009-2013 by Jun Woong.

This package is a set implementation by Jun Woong. The implementation was written so as to conform with the Standard C published by ISO 9899:1990 and ISO 9899:1999.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 All Files Functions Typedefs