code projects
I'm working on these days

beluga a standard C compiler

beluga is a standard C compiler being developed based on an earlier version of lcc. It supports C90 (to be precise, ISO/IEC 9899:1990) as its ancestor does, and there are plans to extend its coverage to C99 (and eventually C11).

Compared to its parent, beluga carefully implements the language standard and thus provides production-quality diagnostics including caret diagnostics, range highlighting, typedef preservation and macro expansion tracking:

screenshot for enhanced front-end features

The generated code is not highly optimized, but is satisfactory for daily use. (This is a hobby project; it is never easy for me alone to catch up with production compilers like gcc and clang+llvm.)

beluga currently produces assembly output for x86 only (and uses an assembler from the target system). Thanks to its origin, however, it can be readily retargeted to other platforms. Support for 64-bit machines (like x86-64) requires significant new features to be implemented and is one of the most important goals of this project.

I'm also redesigning each part of the compiler for better structure (e.g., see below for an integrated preprocessor) and plan to completely replace the back-end interface and implementation to ease the adoption of more ambitious optimization techniques, mainly based on a CFG.

An integrated preprocessor

The preprocessor, formerly developed as a separate executable under the name of sea-canary, has been integrated into the compiler. It reads source code and delivers tokens (not characters) to the compiler proper via a token stream (not via a temporary file). It is fairly fast, is correct enough to pass many complicated test cases, produces highly compact output and has rich diagnostics. For example, with the -Wtoken-paste-order option, it catches code that subtly depends on the unspecified evaluation order of the ## operator, like this:

#define concat(x, y, z) x ## y ## z
concat(3.14e, -, f)    /* non-portable */

and, due to the line mapper shared by the compiler, it pinpoints problematic spots as precisely as possible:

range highlighting on sub-expression from macro expansion

The current version conforms to C90, but supports features like empty arguments and variadic macros introduced in C99 and widely used now.

How to install

Refer to INSTALL.md for installing beluga.

License

Refer to

ocelot a language extension library

ocelot is a collection of libraries to provide features that the C language lacks, various data structures that most programs use in common, and facilities for interaction between a program and its environment.

This package collects libraries into three categories called cbl, cdsl and cel. Libraries belonging to cbl (C basic library) provide features that the language lacks and include alternative memory allocators and an exception handling facility. Those belonging to cdsl (C data structure library) implement various data structures frequently used by most programs. Those belonging to cel (C environment library) aid interaction with the execution environment.

The src directory contains sub-directories cbl, cdsl and cel for the libraries of each category:

  • cbl: C basic library
    • arena.h/c: arena library (lifetime-based memory allocator)
    • assert.h/c: assertion library
    • except.h/c: exception library
    • memory.h/c: memory library (for production)
    • memory.h/memoryd.c: memory library (for debugging)
    • text.h/c: text library (high-level string manipulation)
  • cdsl: C data structure library
    • bitv.h/c: bit-vector library
    • dlist.h/c: doubly-linked list library
    • dwa.h/c: double-word arithmetic library
    • hash.h/c: hash library
    • list.h/c: list library (singly-linked list)
    • set.h/c: set library
    • stack.h/c: stack library
    • table.h/c: table library
  • cel: C environment library
    • conf.h/c: configuration library (configuration file parser)
    • opt.h/c: option library (option parser)

The libraries were formerly documented with doxygen, but now use markdown for easier maintenance and access. The doc directory contains their documentation.

As of the 0.4.0 release, which breaks backward compatibility, the soname has been adjusted from 1.x to 0.x in order to match the release version.

How to install

Refer to INSTALL.md for installing ocelot.

License

Refer to

quokka an interactive file renamer

quokka is an interactive file renamer, which helps to rename multiple files in a systematic manner.

It provides a set of rules:

  • to change letter case (#case)
  • to delete characters at a specified position (#delete)
  • to change file extensions (#extension)
  • to insert a text into a specified position (#insert)
  • to remove a text (#remove)
  • to replace a text (#replace)
  • to serialize file names (#serialize)
  • to strip a set of characters off (#strip) and
  • to import lines of a file for insertion (#import)

with options for fine control. You can combine these rules as you want by adding them to the rule chain. Each rule and the rule chain are edited interactively, as you would at a shell prompt. The following, for example, shows how to change file extensions to .node using quokka:

> #extension
entering '#extension'

#extension> change to node
file extensions will change to 'node'

#extension> preview
current rule being edited
-------------------------
change extensions to 'node' not using limit

files will be renamed as follows when you type 'done' and 'rename'
------------------------------------------------------------------
./alphanum.js  | ./alphanum.node
./global.js    | ./global.node
./mycolors.js  | ./mycolors.node
./validator.js | ./validator.node

#extension> done
files will be renamed as follows when you type 'rename'
-------------------------------------------------------
./alphanum.js  | ./alphanum.node
./global.js    | ./global.node
./mycolors.js  | ./mycolors.node
./validator.js | ./validator.node

> rename
files are being renamed
-----------------------
./alphanum.js  | ./alphanum.node  [ok]
./global.js    | ./global.node    [ok]
./mycolors.js  | ./mycolors.node  [ok]
./validator.js | ./validator.node [ok]

4 files successfully renamed
you need to 'reset' file list and rules after 'rename'

> exit

where > indicates quokka's prompt and #extension before it shows that the user is editing the #extension rule. Typing help lists the commands quokka accepts in general and in a specific rule mode. (In fact, quokka displays characters in color for better readability.)
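
The rule chain can be pictured as a list of functions, each mapping a file name to a new one, applied in order. The sketch below only illustrates that idea; the helper names (changeExtension, lowerCase, applyChain) are hypothetical and are not taken from quokka's source.

```javascript
// Hypothetical rule factories; each returns a function from name to name.
function changeExtension(ext) {
    return function (name) {
        var dot = name.lastIndexOf('.')
        var slash = name.lastIndexOf('/')
        // a dot counts as an extension separator only inside the base name
        // and not as its first character (e.g., '.bashrc' has no extension)
        var base = (dot > slash + 1) ? name.slice(0, dot) : name
        return base + '.' + ext
    }
}

function lowerCase() {
    return function (name) {
        var slash = name.lastIndexOf('/')
        // leave the directory part untouched
        return name.slice(0, slash + 1) + name.slice(slash + 1).toLowerCase()
    }
}

// Applies every rule in the chain, in order, to each file name.
function applyChain(chain, names) {
    return names.map(function (name) {
        return chain.reduce(function (current, rule) {
            return rule(current)
        }, name)
    })
}

var chain = [ lowerCase(), changeExtension('node') ]
console.log(applyChain(chain, [ './Alphanum.js', './global.js' ]))
// [ './alphanum.node', './global.node' ]
```

Keeping each rule a plain function makes the order of the chain the only thing that decides the final name.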

Although its source code contains some code related to MS Windows, quokka currently supports and is tested only on Unix-like environments. For now, nothing is guaranteed for MS Windows.

Among the libraries used, alphanum.js has been modified to meet quokka's needs: it now behaves more like ls -v and returns the sorted array instead of nothing. If you need to replace that module with, say, an updated one, you must apply these changes to it.

Usage tips

A few useful tips follow below.

  1. Sort files in a natural order

    The -v option makes quokka behave in the same way as ls -v when sorting file names; it affects how numbers in file names are handled. Without the option, quokka performs lexicographic comparison, which puts, say, img10 before img2 because 1 has a smaller character code than 2. This looks natural to most (if not all) programmers, but ordinary users would rather place 10 after 2, which is what the -v option does.

  2. Control the sorting order

    quokka can accept the names of files to rename from an external file given through the -f option. For example, you can edit the file obtained by redirecting the output of ls -t -1 (where -t sorts by modification time and -1 lists one file name per line) and give it to quokka with the -f option.

  3. One-line multiple-command

    quokka is designed to accept multiple commands in a line. For example, you can change files' extensions to docx by this one-line input:

    > #extension change to docx done rename
    

    instead of these multiple lines:

    > #extension
    #extension> change to docx
    #extension> done
    > rename
    

    The point is that the newline character does not differ from other whitespace characters in separating commands.

  4. Names with embedded spaces

    Earlier versions of quokka used quotation marks for spaces embedded in file names. This approach caused trouble with readline's auto-completion supported by node.js, and led me to escape spaces with a leading backslash instead. Since the backslash character is now used for escaping spaces, backslashes themselves must also be escaped. For example,

    #replace> replace \  .
    

    makes quokka replace a space with a period (note the space after \), and

    #strip> strip \\
    

    makes quokka strip off all instances of \. In most cases, the smart auto-completion explained below keeps you from forgetting to escape spaces.

  5. Smart auto-completion

    Recent versions of quokka support auto-completion that is smart in the sense that it is aware of the input context and suggests appropriate words. For example, pressing the Tab key after HDTV when quokka expects arguments for the replace command shows every partial string starting with HDTV in the names of the files to rename. This saves you from having to copy characters from your terminal screen with the mouse.
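
The difference between the two orderings described in the first tip can be sketched as follows. This is a minimal illustration of natural-order comparison, not the modified alphanum.js that quokka ships, and it does not claim to match ls -v in every corner case; the function names are hypothetical.

```javascript
// Splits a name into alternating runs of non-digits and digits.
function chunks(name) {
    return name.match(/\d+|\D+/g) || []
}

// Compares two names so that digit runs are ordered by numeric value.
function naturalCompare(a, b) {
    var ca = chunks(a), cb = chunks(b)
    var n = Math.min(ca.length, cb.length)
    for (var i = 0; i < n; i++) {
        var x = ca[i], y = cb[i]
        if (x === y) continue
        if (/^\d/.test(x) && /^\d/.test(y)) {
            var diff = parseInt(x, 10) - parseInt(y, 10)
            if (diff !== 0) return diff
            // same value but different zero padding: fall back to text order
        }
        return (x < y) ? -1 : 1
    }
    return ca.length - cb.length
}

var names = [ 'img10', 'img2', 'img1' ]
console.log(names.slice().sort())                  // [ 'img1', 'img10', 'img2' ]
console.log(names.slice().sort(naturalCompare))    // [ 'img1', 'img2', 'img10' ]
```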

How to install

Refer to INSTALL.md for installing quokka.

License

Refer to

wcwidth.js a JavaScript port of C's wcwidth()

wcwidth.js is a simple JavaScript port of wcwidth() implemented in C by Markus Kuhn.

wcwidth() and its string version, wcswidth(), are defined by IEEE Std 1003.1-2001, a.k.a. POSIX.1-2001, and return the number of columns used to represent a wide character and string on fixed-width output devices like terminals. Markus's implementation assumes wide characters to be encoded in ISO 10646, which is almost true for JavaScript; almost, because JavaScript uses UCS-2 and has problems with surrogate pairs. wcwidth.js converts surrogate pairs to Unicode code points to handle them correctly.
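
The conversion of surrogate pairs mentioned above follows the standard UTF-16 decoding rule. The sketch below shows that rule in isolation; toCodePoints is a hypothetical name, and the code is not copied from wcwidth.js.

```javascript
// Returns the code points of a string, combining valid surrogate pairs.
function toCodePoints(str) {
    var points = []
    for (var i = 0; i < str.length; i++) {
        var hi = str.charCodeAt(i)
        if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < str.length) {
            var lo = str.charCodeAt(i + 1)
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                // standard UTF-16 decoding of a high/low pair
                points.push((hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000)
                i++
                continue
            }
        }
        points.push(hi)    // BMP character or unpaired surrogate
    }
    return points
}

console.log(toCodePoints('a\uD842\uDFB7'))    // [ 97, 134071 ], i.e., U+0061 and U+20BB7
```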

Following the original implementation, this library defines the column width of an ISO 10646 character as follows:

  • the null character (U+0000) has a column width of opts.nul (whose default value is 0);
  • other C0/C1 control characters and DEL will lead to a column width of opts.control (whose default value is 0);
  • non-spacing and enclosing combining characters (general category code Mn or Me in the Unicode database) have a column width of 0;
  • SOFT HYPHEN (U+00AD) has a column width of 1;
  • other format characters (general category code Cf in the Unicode database) and ZERO WIDTH SPACE (U+200B) have a column width of 0;
  • Hangul Jamo medial vowels and final consonants (U+1160-U+11FF) have a column width of 0;
  • spacing characters in the East Asian Wide (W) or East Asian Full-width (F) category as defined in Unicode Technical Report #11 have a column width of 2; and
  • all remaining characters (including all printable ISO 8859-1 and WGL4 characters, Unicode control characters, etc.) have a column width of 1.

A surrogate high or low value which constitutes no pair is considered to have a column width of 1 according to the behavior of widespread terminals.
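
To make the rules above concrete, here is a deliberately simplified sketch. It is not the library's implementation: the table of combining characters is left out entirely, only a few of the wide ranges are listed, and the -1 convention for control characters is not handled. The names simpleWidth and simpleStringWidth are hypothetical.

```javascript
// Simplified column width of one code point; NOT the library's full table.
function simpleWidth(cp, opts) {
    opts = opts || {}
    var nul = (opts.nul === undefined) ? 0 : opts.nul
    var control = (opts.control === undefined) ? 0 : opts.control

    if (cp === 0) return nul
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0)) return control    // C0, DEL, C1
    if (cp === 0x200B) return 0                   // ZERO WIDTH SPACE
    if (cp >= 0x1160 && cp <= 0x11FF) return 0    // Hangul Jamo medial/final
    // a few of the East Asian Wide/Full-width ranges
    if ((cp >= 0x1100 && cp <= 0x115F) ||                        // Hangul Jamo initial
        (cp >= 0x2E80 && cp <= 0xA4CF && cp !== 0x303F) ||       // CJK through Yi
        (cp >= 0xAC00 && cp <= 0xD7A3) ||                        // Hangul syllables
        (cp >= 0xFF00 && cp <= 0xFF60)) {                        // full-width forms
        return 2
    }
    return 1
}

// Sums widths over a string; for...of iterates by code point, and an
// unpaired surrogate falls through to the default width of 1.
function simpleStringWidth(str, opts) {
    var total = 0
    for (var ch of str) {
        total += simpleWidth(ch.codePointAt(0), opts)
    }
    return total
}

console.log(simpleStringWidth('한글'))    // 4
console.log(simpleStringWidth('a\tb'))    // 2
```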

See the documentation from the C implementation for details.

wcwidth.js is simple to use:

var wcwidth = require('wcwidth.js')

wcwidth('한글')    // 4
wcwidth('\0')      // 0; NUL
wcwidth('\t')      // 0; control characters

If you plan to replace NUL or control characters with, say, ??? before printing, use wcwidth.config(), which returns a closure that runs wcwidth with your configuration:

var mywidth = wcwidth.config({
    nul:     3,
    control: 3
})

mywidth('\0\f')      // 6
mywidth('한\t글')    // 7

Setting these options to -1 gives a function that returns -1 for a string containing an instance of NUL or control characters:

mywidth = wcwidth.config({
    nul:     0,
    control: -1
})

mywidth('java\0script')    // 10
mywidth('java\tscript')    // -1

This is useful when detecting if a string has non-printable characters.

Due to the risk of monkey-patching, no String getter is provided anymore. Even if discouraged, you can still monkey-patch by yourself as follows:

String.prototype.__defineGetter__('wcwidth', function () {
    return wcwidth(this);
})
'한글'.wcwidth    // 4

JavaScript has no character type, so it is meaningless to have two versions of wcwidth as POSIX does for C. wcwidth also accepts a code value obtained by charCodeAt():

wcwidth('한')                  // 2
wcwidth('글'.charCodeAt(0))    // 2

How to install

Refer to INSTALL.md for installing wcwidth.js.

License

Refer to

ontime a human-readable cron

ontime is a cron-like job scheduler with readable time expressions.

For example, the following code invokes the given function at 4:30 AM and 9 AM every day (i.e., twice a day).

var ontime = require('ontime')

ontime({
    cycle: [ '04:30:00', '9:00:00' ]
}, function (ot) {
    // do your job here
    ot.done()
    return
})

It supports:

  • to describe jobs that should get done yearly, monthly, weekly, daily, every minute, every second or on specified times;
  • to skip running a job based on a specified step; e.g., to run it every 2 weeks;
  • to use a local time or UTC;
  • to track the last day of a month; possible to run a job on the last day of every month; and
  • to wait for the currently running job to finish, which ensures that at most one instance of your job runs at a time

but does not yet support:

Options

Options to ontime control the cycle of a job, choose between local time and UTC, enable tracking of the last day of a month and so on. In explaining options, each section header shows the option it explains and its default value in parentheses.

Time expressions (cycle: '')

ontime determines the cycle of a job based on the format of time expressions. The time expression basically has the form of an ISO-8601 Date Format, YYYY-MM-DDThh:mm:ss where YYYY indicates a year, MM a month, DD a day of the month, hh an hour, mm a minute and ss a second, except that:

  • A unit can be omitted only when all greater units are also omitted, which means the day part (DD) cannot be omitted unless the year and month parts (YYYY-MM-) are. This makes ontime's time expression differ from the original ISO-8601 format because the latter allows smaller units to be omitted in times. For example, 12 and 12:00 denote hh and hh:mm respectively in the ISO-8601 format, but ss and mm:ss respectively in ontime's format;
  • A space can be used to separate the time part from the date part instead of T as in 2010-01-09 11:00:00; and
  • ontime allows digits not to be zero-padded; for example, it accepts 2014-5-4T0:0:0 as well as 2014-05-04T00:00:00.

The time expression is given to ontime through the cycle option. You can give a single expression of the string type like '01-01T12:00:00' or multiple ones as an array of strings like [ '01-01T12:00:00', '7-1T0:0:0' ].

Yearly jobs

The year part(YYYY-) should be omitted to specify yearly jobs.

ontime({
    cycle: '2-9T00:00:00'
}, function (ot) {
    console.log('my birthday!')
    ot.done()
    return
})

This code prints on February 9 every year.

Note how the last day of February is handled on a leap year. If you set the time expression to February 29 as in '2-29T00:00:00', the job will be triggered only in leap years. See the keepLast option to change this behavior.

Monthly jobs

The year and month parts(YYYY-MM-) should be omitted for monthly jobs.

ontime({
    cycle: [ '1T12:00:00', '15T12:00:00' ]
}, function (ot) {
    console.log('review the project')
    ot.done()
    return
})

This code prints on the 1st and 15th days of each month.

Note how the last day of a month is handled. If you set the time expression to the 31st day as in 31 23:59:59, the job will run only in January, March, May, July, August, October and December since the other months have no 31st day. Use the keepLast option to change this behavior.

Daily jobs

The whole date part (YYYY-MM-DDT) should be omitted for daily jobs; note that the separator T should also be dropped.

ontime({
    cycle: '12:00:00'
}, function (ot) {
    console.log('lunch time!')
    ot.done()
    return
})

This code prints at noon every day.

Weekly jobs

Weekly jobs have a different format to specify a day of a week.

ontime({
    cycle: [ 'Sunday 12:00:00', 'sat 12:00:00' ]
}, function (ot) {
    console.log('weekend!')
    ot.done()
    return
})

This code prints on Saturday and Sunday every week.

Hourly jobs

The date and hour parts(YYYY-MM-DDThh:) should be omitted for hourly jobs.

ontime({
    cycle: [ '00:00', '30:00' ]
}, function (ot) {
    console.log('30 mins to next run')
    ot.done()
    return
})

This code prints every 30 minutes (twice an hour).

Jobs on every minute

By omitting all units except for seconds, a job can be invoked every minute.

ontime({
    cycle: [ '10', '30', '50' ]
}, function (ot) {
    console.log('20 secs to next run')
    ot.done()
    return
})

This code prints on the 10th, 30th and 50th seconds of every minute.

Jobs on every second

An empty string denotes jobs that get started every second.

var count = 0

ontime({
    cycle: '',
}, function (ot) {
    console.log(++count)
    ot.done()
    return
})

This counts up every second.

Jobs on specified times

You can trigger your job on explicitly specified times.

ontime({
    cycle: [ '2100-1-9 9:00:00',
             '2200-1-9 9:0:0' ]
}, function (ot) {
    console.log('what is this day?')
    ot.done()
    return
})

This prints at 9 AM on 9 January 2100 and at the same time in 2200, provided you are still running node.js by then.
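
Since the cycle type follows from the shape of the expression alone, the rules in the sections above can be sketched as a small classifier. This is an illustration under stated assumptions, not ontime's parser: classifyCycle and the names it returns are hypothetical, and day names and value ranges are not validated.

```javascript
// Hypothetical helper: names the cycle type implied by a time expression.
function classifyCycle(expr) {
    var s = expr.trim()
    if (s === '') return 'every second'

    // weekly: a day name followed by a full time, e.g. 'sat 12:00:00'
    if (/^[a-z]+\s+\d{1,2}:\d{1,2}:\d{1,2}$/i.test(s)) return 'weekly'

    var parts = s.split(/[T ]/)    // date and time are separated by T or a space
    if (parts.length > 2) throw new Error('invalid expression: ' + expr)
    var time = parts[parts.length - 1].split(':')
    var date = (parts.length === 2) ? parts[0].split('-') : []

    var units = date.concat(time)
    var allNumeric = units.every(function (u) { return /^\d+$/.test(u) })
    // a date part requires the full hh:mm:ss, because a unit may be
    // omitted only when all greater units are omitted too
    if (!allNumeric || time.length > 3 || date.length > 3 ||
        (date.length > 0 && time.length !== 3)) {
        throw new Error('invalid expression: ' + expr)
    }

    var names = [ 'every minute', 'hourly', 'daily',
                  'monthly', 'yearly', 'specified time' ]
    return names[units.length - 1]
}

console.log(classifyCycle('2-9T00:00:00'))        // yearly
console.log(classifyCycle('1T12:00:00'))          // monthly
console.log(classifyCycle('Sunday 12:00:00'))     // weekly
console.log(classifyCycle('30:00'))               // hourly
console.log(classifyCycle('2100-1-9 9:00:00'))    // specified time
```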

Mixing different cycles

In order to keep the model and the interface simple, a single type of job cycle is allowed for each invocation of ontime. Mixing different cycle types can be achieved by introducing multiple invocations to ontime as in:

ontime({
    cycle: [ '01-09 11:30:00',        // yearly
             'Saturday 12:00:00' ]    // weekly
}, job)    // mixed types of cycle result in error

ontime({
    cycle: '01-09 11:30:00'    // yearly
}, job)
ontime({
    cycle: 'Sat 12:0:0'    // weekly
}, job)

Skipping steps (step: 1)

The step option enables a job to be skipped periodically. Setting it to n forces ontime to skip a given job n-1 times after a run, which leads to launching the job every n cycles.

ontime({
    cycle:    '31T00:00:00',
    keepLast: true,
    step:     3
}, function (ot) {
    console.log('every 3 months')
    ot.done()
    return
})

This prints on the last day of a month every three months.
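
The effect of step can be pictured with a counter wrapped around the job. The sketch assumes the job runs on the first cycle and is then skipped n-1 times; whether ontime counts from the first cycle in exactly this way is an assumption, and everyNth is a hypothetical name.

```javascript
// Hypothetical wrapper: runs `job` on the first cycle, then on every n-th one.
// The returned function reports whether the job ran on this cycle.
function everyNth(n, job) {
    var count = 0
    return function () {
        var due = (count % n === 0)
        count++
        if (due) job.apply(this, arguments)
        return due
    }
}

var runs = []
var tick = everyNth(3, function (label) { runs.push(label) })
var months = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May' ]
months.forEach(function (m) { tick(m) })
console.log(runs)    // [ 'Jan', 'Apr' ]
```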

Note how this option interacts with the single option.

A local time vs. UTC (utc: false)

Setting the utc option to true makes ontime interpret the time expressions as UTC.

This is useful

  • when you cannot be sure of what the time zone on your system is; and
  • when you do not want your job to be missed or run twice when the clock shifts back or forward an hour for DST.

Preserving a single instance (single: false)

ontime launches a job on its scheduled time. If the job takes longer than the time interval of the cycle, more than one instance of the job may run at the same time. The single option keeps another instance of a job from starting if there is already a running one.

To be precise, with single set to false, ontime schedules the next run at the start of the current run. Changing that to true has the next run scheduled when the ot.done() method is invoked by a user.

The following two diagrams show the difference, where labelled | and + denote time spots to start new instances, and * indicates their execution.

ontime({
    cycle:  [ A, B ],
    single: false    // default
}, function (ot) {
    // ...
    ot.done()
    return
})

     A          A    B     A         BA
- - -|----------|----+-----|---------+|- - -
     *************
                *********
                     ****
                           *****

ontime({
    cycle:  [ A, B ],
    single: true
}, function (ot) {
    // ...
    ot.done()
    return
})

     A          A    B     A         BA
- - -|----------|----+-----|---------+|- - -
     *************   *********       ****

Exclusiveness of job execution is guaranteed only within a single invocation of ontime. Two different invocations of ontime do not block each other.
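
The two diagrams can be reproduced with a tiny simulation. It models only the timing rule described above (with single, a time spot that falls inside a running job starts nothing) and uses a fixed duration for every run; startTimes is a hypothetical name and no real timers are involved.

```javascript
// Returns the time spots at which a job instance actually starts.
// `triggers` are ascending time spots; each run lasts `duration` units.
function startTimes(triggers, duration, single) {
    var starts = []
    var busyUntil = -Infinity
    triggers.forEach(function (t) {
        // with single, a trigger that falls inside a running job is skipped
        if (single && t < busyUntil) return
        starts.push(t)
        busyUntil = t + duration
    })
    return starts
}

var triggers = [ 0, 4, 8, 12 ]
console.log(startTimes(triggers, 10, false))    // [ 0, 4, 8, 12 ]
console.log(startTimes(triggers, 10, true))     // [ 0, 12 ]
```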

Keeping the last day of a month (keepLast: false)

It is sometimes necessary to run a job on the last day of each month, which has often been worked around by running it on the first day of each month instead. By setting the keepLast option to true, ontime automatically adjusts the date part (DD) to the last day according to the value of the month part (MM) if necessary.

ontime({
    cycle:    '31T10:00:00',
    keepLast: true
}, function (ot) {
    console.log('the last day')
    ot.done()
    return
})

This code prints on the 31st day of a month when the month has a 31st day, on the 28th or 29th in February, or on the 30th day otherwise. Another example goes for yearly jobs:

ontime({
    cycle:    '2-29T10:00:00',
    keepLast: true
}, function (ot) {
    console.log('the last day of Feb')
    ot.done()
    return
})

This code prints on 29 February on a leap year and on 28 February otherwise.
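
The adjustment that keepLast performs amounts to clamping the requested day to the length of the month. A minimal sketch, assuming local time and 1-based month numbers; lastDayOfMonth and clampDay are hypothetical names, not part of ontime's interface.

```javascript
// Last day of a month; `month` is 1-based (1 = January).
function lastDayOfMonth(year, month) {
    // day 0 of the following month is the last day of this one
    return new Date(year, month, 0).getDate()
}

// Clamps a requested day to the last day of the given month.
function clampDay(year, month, day) {
    return Math.min(day, lastDayOfMonth(year, month))
}

console.log(clampDay(2024, 2, 31))    // 29 (leap year)
console.log(clampDay(2023, 2, 29))    // 28
console.log(clampDay(2023, 4, 31))    // 30
console.log(clampDay(2023, 1, 31))    // 31
```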

Logging messages (log: false)

ontime has a very simple form of logging that is useful when checking if your configuration works as intended. It can be turned on by setting the log option to true.

Methods

A job function should be defined to accept at least one argument, which is referred to as ot in this document. The argument provides these methods:

  • ot.done(): should be called after the job has been finished. This is important especially when single is set to true because scheduling the next run is done in the method.
  • ot.cancel(): clears timers for scheduling jobs that the ontime instance knows. This does not terminate the current execution of a job; you still need to call ot.done() for that purpose.

How to install

Refer to INSTALL.md for installing ontime.

License

Refer to

canary a music streaming server/client

canary is a package of a music streaming server and its companion iOS client that run on top of DAAP. Employing DAAP for streaming and mDNS/DNS-SD for service advertisement lets canary work perfectly with iTunes.

This document explains the server. See the files in the client directory for the client.

The server supports, among other things:

  • iTunes as a client,
  • rescan of songs based on a schedule,
    • it cleverly does nothing unless files or directories change
  • authorization via a password,
  • delivery of mp3/ogg files if your client can play them and
  • multiple paths to contain your songs

but does not yet support:

  • adding or editing smart playlists and
  • what I don't know yet but you do
    • please let me know about them!

The initial scan of songs is fairly fast thanks to the high performance of the music-metadata module: 7 minutes for 4,500+ songs on my Gentoo machine with an Intel Atom D525, 4 GB of RAM and a 5400-rpm HDD. Once the database has been built, rescanning is even faster: 30 seconds under the same conditions. The server remembers the mtime (modification time) of files and reads only added or modified files.
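
The idea behind the fast rescan can be sketched as a comparison of remembered and current modification times. This is only an illustration of the technique; the function name, the plain-object maps and the millisecond timestamps are assumptions, and canary's actual bookkeeping lives in its database.

```javascript
function has(obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key)
}

// `known` maps path -> mtime (ms) remembered from the previous scan;
// `current` maps path -> mtime (ms) as found on disk now.
// Returns the paths whose metadata needs to be (re)read.
function filesToRead(known, current) {
    return Object.keys(current).filter(function (path) {
        // added files are unknown; modified files have a different mtime
        return !has(known, path) || known[path] !== current[path]
    }).sort()
}

var known = { '/music/a.mp3': 1000, '/music/b.mp3': 2000 }
var current = { '/music/a.mp3': 1000, '/music/b.mp3': 2500, '/music/c.ogg': 3000 }
console.log(filesToRead(known, current))    // [ '/music/b.mp3', '/music/c.ogg' ]
```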

Prerequisites

canary can run with avahi or dns-sd, or launch its own instance of mDNS/DNS-SD service implemented in pure JavaScript (node-mdns-js) when you have neither installed.

Having more than one instance of mDNS/DNS-SD service on the same machine confuses the service and prevents it from working properly.

The value for mdns in server.json (see below) chooses a service for mDNS publication.

  • avahi: avahi-publish-service is probed to execute;
  • dns-sd: dns-sd is probed to execute;
  • mdns-js: mdns-js is launched without probing the two above;
  • auto: canary tries to execute either of avahi-publish-service or dns-sd, and launches mdns-js on failure. This is the default behavior;
  • off: no service advertisement activated.

If your system has avahi or dns-sd, please make sure that avahi-publish-service or dns-sd can be run without specifying a path from the location where canary runs.

Whenever avahi or dns-sd fails to start, mdns-js is selected as a fallback.
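
The selection rules above can be summarized in a small decision function. It is a sketch, not canary's code: chooseMdns is a hypothetical name, the available argument stands in for the result of probing the two commands, and trying avahi before dns-sd in auto mode is an assumption.

```javascript
// `setting` is the mdns value; `available` says which commands a
// (hypothetical) probe managed to run, e.g. { avahi: false, 'dns-sd': true }.
// Returns the service to use, or null when advertisement is off.
function chooseMdns(setting, available) {
    var probe
    switch (setting) {
        case 'off':     return null
        case 'mdns-js': return 'mdns-js'
        case 'avahi':
        case 'dns-sd':  probe = [ setting ]; break
        case 'auto':    probe = [ 'avahi', 'dns-sd' ]; break
        default:        throw new Error('unknown mdns value: ' + setting)
    }
    for (var i = 0; i < probe.length; i++) {
        if (available[probe[i]]) return probe[i]
    }
    return 'mdns-js'    // fallback when the probed services cannot start
}

console.log(chooseMdns('auto', { avahi: false, 'dns-sd': true }))    // dns-sd
console.log(chooseMdns('avahi', {}))                                 // mdns-js
console.log(chooseMdns('off', { avahi: true }))                      // null
```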

If you cannot get the service advertisement to work with any of these options, please let me know so that I can help.

Configuration

Two configuration files need to be provided for the server, one for its database and the other for the server itself.

The server configuration, config/server.json, looks like:

{
    "name":     "canary music",
    "port":     3689,
    "runAs": {
        "uid": "userid",
        "gid": "groupid"
    },
    "password": "password",
    "scan": {
        "path":  [ "/path/to/mp3/files" ],
        "cycle": [ "17:00:00" ],
        "utc":   false
    },
    "mdns":  "auto",
    "debug": false
}

  • the name of the server will be published and broadcast via Avahi. If your client knows DNS-SD, it will appear on it;
  • the server will run on port; it must be set to the default port 3689 for iTunes to work with the server;
  • runAs, if specified, makes the server drop privileges by changing its uid and gid to the given ones, which is useful when the server initially runs as root, for example, when started by an init.d script. If it is not specified, the server warns when it runs as root;
  • if password is a non-empty string, the server requires a client to send the password on every request. This, for example, forces iTunes to ask for a password on its initial connection to the server;
  • scan specifies the schedule for rescanning files:
    • path is an array of directories of music files to serve;
    • cycle and utc: their meanings are clear from their names, but you can refer to ontime for how to specify the rescanning schedule. canary-server accepts other ontime options as well, except single, which is always set to true;
  • mdns selects a service for mDNS advertisement. Possible values are auto, avahi, dns-sd, mdns-js and off. See the Prerequisites section above;
  • debug controls the server's log level. Setting this to true makes the server verbose.
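
As an illustration of the runAs option, dropping privileges in node.js can be sketched with process.setgid() and process.setuid(). This is not canary's code; dropPrivileges is a hypothetical name, and the proc parameter exists only so that the sketch can be exercised without being root.

```javascript
// Drops privileges as described for runAs. `proc` defaults to the global
// process object; passing a stand-in makes the sketch testable.
// Returns true if ids were changed, false if runAs was not given.
function dropPrivileges(runAs, proc) {
    proc = proc || process
    if (!runAs) {
        if (proc.getuid && proc.getuid() === 0) {
            console.warn('warning: the server is running as root')
        }
        return false
    }
    // the group must change first; after setuid() the process may
    // no longer have permission to call setgid()
    if (runAs.gid !== undefined) proc.setgid(runAs.gid)
    if (runAs.uid !== undefined) proc.setuid(runAs.uid)
    return true
}

// Example with a stand-in that only records the calls:
var calls = []
var fake = {
    getuid: function () { return 0 },
    setgid: function (g) { calls.push('gid:' + g) },
    setuid: function (u) { calls.push('uid:' + u) }
}
dropPrivileges({ uid: 'userid', gid: 'groupid' }, fake)
console.log(calls)    // [ 'gid:groupid', 'uid:userid' ]
```

Changing the gid before the uid is the detail that matters here; the reverse order typically fails once root privileges are gone.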

config/db.json contains:

{
    "host":          "localhost",
    "port":          27017,
    "db":            "canary",
    "user":          "user",
    "password":      "password",
    "reconnectTime": 2
}

The options from host to password inclusive specify basic information for DB connection. If no authentication is required, user and password can be omitted.

reconnectTime specifies a time interval in seconds for which the server waits before trying to reconnect when disconnected from the DB.

How to run

As with other node.js programs, you can run canary-server with

node server.js -c config/

where the -c option (or --config) specifies a configuration directory the server will use.

Clients tested

The following DAAP clients have been tested with canary-server. If your favorite client is not on the list or does not work with the server, please open a new issue to describe the problem concisely.

Help needed

canary-server was implemented in a very short time. It already works well but needs many improvements, including, but not limited to:

  • support for other DBs, especially MySQL and sqlite; MongoDB is getting popular but there are still many who don't have or are not familiar with it;
  • testing with files that have varied and sometimes weird metadata; the metadata of my own files is normalized, so I do not have enough samples to stress the server's metadata handling.

How to install

Refer to INSTALL.md for installing canary.

License

Refer to