C Library Experiment

This document describes an investigation into replacing the existing C library in L4Re with another one.

  1. Providing Another C Library
  2. Porting Newlib
    1. Source File Specialisation
    2. Build Modes
    3. Program Initialisation
    4. Wildcards, Inputs and Outputs
    5. Mathematical Routines
    6. Functionality Alternatives
    7. Problematic Files
    8. Configuration Settings
    9. System-Specific Resources
    10. IPC and C Library Dependencies
  3. Package Manifest

L4Re provides a C library based on uClibc across a collection of different packages:

Package                          Purpose
pkg/l4re-core/uclibc             C library distribution and core support
pkg/l4re-core/uclibc-headers     C library header files
pkg/l4re-core/uclibc-minimal     Selected C library functionality
pkg/l4re-core/libc_backends      Integration with system mechanisms
pkg/libc_be_stdin                Standard input support

Although some integration with the underlying system is done in the libc_backends package, a substantial amount of customisation is also done in the uclibc package itself.

One notable file is __uClibc_main.c (found in pkg/l4re-core/uclibc/lib/contrib/uclibc/libc/misc/internals), defining various program initialisation routines:

Routine          Purpose
__uClibc_init    Library initialisation
__uClibc_main    Program initialisation
__uClibc_fini    Program finalisation

The __uClibc_init function may be called from the dynamic loader, but it is also called by __uClibc_main to perform initialisation in situations where the dynamic loader is not used. The __uClibc_main function is itself called by crt1.S, which uClibc provides for each architecture and which appears to be introduced into executables by resources in the pkg/l4re-core/ldscripts package.

(A useful introduction to crt1.S can be found at pkg/l4re-core/uclibc/lib/contrib/uclibc/docs/crt.txt in the L4Re distribution as part of uClibc. The accompanying PORTING file is also informative.)

Providing Another C Library

It is worth investigating other C library implementations to see if they are organised in a more understandable way. In principle, a lot of C library functionality should be fairly generic, with only the parts interacting with underlying system mechanisms needing specialisation.

A few implementations were considered:

The GNU C Library (glibc) and uClibc-ng were briefly but not seriously considered.

The criteria for evaluation included the following:

Practical attempts to construct L4Re packages were made with dietlibc, musl-libc and Newlib. Since musl-libc's internal characteristics were regarded as potentially troublesome, efforts continued only with dietlibc and Newlib, building packages that provide limited C library functionality. Eventually, a decision was taken to concentrate solely on Newlib.

Porting Newlib

Newlib employs automake resources to describe the essence of the eventual Makefiles that are generated for building the software. Although automake resources are arguably unsuitable for L4Re packages, their descriptions of the build inputs and products are fairly concise and informative for the construction of appropriate Makefiles for the L4Re build system.

Thus, as has been done in the adaptation of the libext2fs software to L4Re, it becomes feasible to consider writing dedicated build resources for L4Re, partly to maintain technological cohesion with many of the other packages, and partly to exercise control over the functionality that will be made available. The aim was to start with a minimal subset of Newlib, demonstrating the fundamental mechanisms, showing working programs, and validating the exercise.

As can be seen in the pkg/libc_newlib package, a number of techniques are employed to reproduce Newlib's original build system behaviour and to generate appropriately configured library files.

Source File Specialisation

A single file such as stdio/vfprintf.c is used to produce several different objects, each configured in a different way through the use of compiler definitions. To integrate this technique with the L4Re build system, special Makefile rules were added to create extra source files such as stdio/svfprintf.c (purely by copying each "master" file), with each of these extra source files being mentioned in the normal source file variables (such as SRC_C) handled by the build system.

Accompanying the file-copying rule for each generated source file is a file-specific definition that customises the compilation of that file. This provides the means of configuration needed to generate an object from a "master" file that is different from other objects originating from that same file. Consequently, the build system generates a list of object files that includes all such extra objects, employs its own rules to request their compilation, and customises those rules by applying the file-specific definition provided for each corresponding source file.

For example:

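# Create the specialised source file by copying the "master" file.
stdio/svfprintf.c: stdio/vfprintf.c
	$(CP) $< $@

# File-specific definition applied only when compiling the copied file.
CPPFLAGS_stdio/svfprintf.c += -DSTRING_ONLY -I$(THISDIR)/stdio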

Special rules such as these apparently need to be placed after the build system include statements. Otherwise, the build system mechanisms seem to be disrupted.
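The ordering constraint can be illustrated with a minimal sketch of a library Makefile. The target name and the relative L4DIR path are hypothetical, and the use of lib.mk assumes the usual L4Re convention for library packages; only the rule and the CPPFLAGS definition are taken from the example above.

```make
PKGDIR ?= ..
L4DIR  ?= $(PKGDIR)/../..

# Hypothetical library name, chosen for illustration only.
TARGET  = libc_newlib.a

# Absolute path to the sources, as used by the -I option below.
THISDIR = $(PKGDIR_ABS)/libc

# The master file and its generated copy are both listed as normal sources.
SRC_C   = stdio/vfprintf.c stdio/svfprintf.c

# Build system include statement (assumed to be lib.mk for a library).
include $(L4DIR)/mk/lib.mk

# Special rules and file-specific definitions follow the include.
stdio/svfprintf.c: stdio/vfprintf.c
	$(CP) $< $@

CPPFLAGS_stdio/svfprintf.c += -DSTRING_ONLY -I$(THISDIR)/stdio
```

Keeping the variable assignments before the include and the extra rules after it mirrors the observation that the build system's own mechanisms need to be set up first.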

Build Modes

In the src/l4/mk directory within the L4Re distribution, a file called modes.inc defines a number of build modes that determine how files are to be compiled and linked. A few distinctions are made between different modes:

Clearly, it is not desirable to make a replacement C library dependent on the existing C library. Nor should programs built to use the replacement C library link against the existing C library. Consequently, the following build modes were added:

Mode             Properties
minimal          Statically link to l4re only; employ GCC headers
libminimal       Avoid linking to the existing C library
static_newlib    Statically link to the replacement C library

The minimal mode was used to investigate the limits of linkage restrictions and whether it was possible to build and run a simple "Hello world" program using only low-level L4Re facilities.

The libminimal mode is used to build the replacement C library itself. Meanwhile, the static_newlib mode is used to build programs against the replacement C library.

A revised modes.inc file can be found in the pkg/libc_newlib/mk directory.

Program Initialisation

L4Re already has infrastructure for launching programs, and that infrastructure appears to rely on the presence of __uClibc_main and related functions (discussed above). As a temporary measure, programs using the replacement C library are therefore built with an extra file that defines those functions.

(In the pkg/test_newlib package, a file called crt0.c is used to provide the functions.)

Ultimately, the inclusion of these initialisation functions will be automated, presumably by modifying the pkg/l4re-core/ldscripts package.
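Until then, a program Makefile might combine the static_newlib mode with the extra initialisation file along the following lines. This is only a sketch: the target name, the hello.c source file and the relative L4DIR path are hypothetical, and it assumes that the revised modes.inc is the one in effect and that the mode is selected through the usual MODE variable.

```make
PKGDIR ?= .
L4DIR  ?= $(PKGDIR)/../..

# Hypothetical program name.
TARGET  = hello_newlib

# Build against the replacement C library (mode from the revised modes.inc).
MODE    = static_newlib

# hello.c is a hypothetical program source; crt0.c supplies __uClibc_main
# and related functions until their inclusion is automated.
SRC_C   = hello.c crt0.c

include $(L4DIR)/mk/prog.mk
```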

Wildcards, Inputs and Outputs

The L4Re build system appears to favour explicitly-named source files in the SRC variables. However, it is not particularly convenient to list all Newlib source files explicitly. Consequently, wildcard expansion is used to select files from subdirectories. For example...

$(wildcard $(THISDIR)/stdio/*.c)

...selects the C source files from the pkg/libc_newlib/libc/stdio directory.

One complicating factor appears to be the current directory when the Makefile is invoked. In some circumstances, the package source directory appears to be used, permitting the original source directory (stdio in the example above) to be located. However, in other circumstances, the package build directory appears to be used, rendering relative paths invalid and defeating the wildcard expansion.

To ensure that files are always found, the variable THISDIR is defined as the absolute path to the package sources where the Makefile is situated:

THISDIR = $(PKGDIR_ABS)/libc

Here, PKGDIR_ABS is a variable set by the build system itself. By using this technique, the wildcard expansion will always locate the indicated files.

Another complicating factor involves the way that certain variables are processed by the build system. Once a list of source files has been constructed from absolute paths, these files must then be converted back to relative paths before being assigned to the SRC_C variable. It appears that SRC_C must only contain relative paths since the build system constructs other paths directly from them.
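One way to perform this conversion, sketched here with standard GNU make text functions, is to strip the THISDIR prefix from each expanded path. The STDIO_ABS variable name is hypothetical and is introduced only for this illustration.

```make
THISDIR   = $(PKGDIR_ABS)/libc

# Absolute paths found by wildcard expansion (hypothetical helper variable).
STDIO_ABS = $(wildcard $(THISDIR)/stdio/*.c)

# Remove the THISDIR prefix from each entry, leaving paths of the form
# stdio/<name>.c relative to the directory containing the Makefile.
SRC_C    += $(patsubst $(THISDIR)/%,%,$(STDIO_ABS))
```

The patsubst pattern matches each whole word against $(THISDIR)/% and keeps only the part matched by %, so entries that do not start with the prefix are left unchanged.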

Mathematical Routines

The core C library functionality in Newlib (libc) also relies directly on mathematical routines (libm) with certain routines being built directly into the C library. The files from libm involved were identified and have been explicitly listed in the libc Makefile.

(Note that the involvement of these incorporated routines is separate from a complete libm which is currently not built.)

Functionality Alternatives

A few different choices can be made with Newlib to select functionality appropriate for particular deployment environments. The principal areas of concern are the standard input/output (stdio) library's routines for formatting and scanning and the standard library's (stdlib) routines for memory allocation (malloc).

Here, it appeared to be more convenient to use the conventional stdio routines, discarding the alternative "nano" routines, but to use the "nano" implementation of malloc instead of the conventional version. It should be noted that both of these areas of functionality specialise source files to produce further variants amongst the generated objects:

stdio
  vfprintf.c, vfwprintf.c, vfscanf.c, vfwscanf.c
stdlib
  nano-mallocr.c

Problematic Files

A number of files are excluded from the resulting C library due to problems building them. Some are platform-specific and are unlikely to be used on most architectures (xdr/xdr_float_vax.c, for instance). Others may need further investigation (posix/engine.c, for instance).
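Where sources are gathered by wildcard expansion, such exclusions can be expressed by filtering the collected list. The following is a sketch only: the EXCLUDED_C variable name is hypothetical, and the fragment assumes that SRC_C already holds relative paths and that the filtering is done before the build system include statement.

```make
# Files named in the text as problematic (hypothetical variable name).
EXCLUDED_C = xdr/xdr_float_vax.c posix/engine.c

# Use immediate assignment so that SRC_C can safely refer to its own
# previous value while the excluded entries are removed.
SRC_C := $(filter-out $(EXCLUDED_C),$(SRC_C))
```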

Configuration Settings

The following configuration definitions were asserted for the C library:

The possibilities associated with reentrancy are documented in the file pkg/libc_newlib_headers/include/reent.h, and the above use of REENTRANT_SYSCALLS_PROVIDED indicates that system-level functionality is provided using functions that accept a parameter referring to thread-specific data. The intention is to avoid multiple threads corrupting shared global state (such as errno).

The __DYNAMIC_REENT__ definition is apparently more usually asserted in the pkg/libc_newlib_headers/include/sys/config.h file. This file contains a lot of complexity that should probably be eliminated in favour of a single set of L4Re-appropriate settings that will most likely apply to all supported architectures.
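As an illustration of how such definitions might be asserted from the library Makefile instead of config.h, the two names mentioned above could be passed as preprocessor options. This is an assumption about the mechanism, not a record of the package's actual settings, and any further definitions used by the package are not shown.

```make
# Only the definitions named in the text appear here; the use of the
# global CPPFLAGS variable for this purpose is an assumption.
CPPFLAGS += -DREENTRANT_SYSCALLS_PROVIDED -D__DYNAMIC_REENT__
```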

System-Specific Resources

The libc/l4re directory has been created in the libc_newlib package to hold functionality integrating Newlib with L4Re. This location was chosen as being rather more obvious than a subdirectory of libc/sys, which contains a fairly haphazard collection of system- and/or architecture-specific entries.

The intention within the libc/l4re directory is to provide definitions of functions called by the rest of the C library implementation. These functions will, in turn, invoke system-level mechanisms in order to deliver the necessary results back to the C library and ultimately to callers of C library functions.

For instance, the _open_r function is a re-entrant implementation of the functionality required to perform file-opening operations within the broader L4Re system. To achieve this, it invokes filesystem client functions that are able to interact with objects within L4Re, these functions being provided by a separate library. Due to the involvement of this "client" library, the _open_r function and other filesystem-related functions are found in the sys_client.c file.

IPC and C Library Dependencies

As noted in the context of providing system-level integration within the C library, additional libraries are needed to provide code that knows how to interact with other elements of the system. Ultimately, two such libraries are involved:

Both of these libraries are labelled "minimal" due to the way they are built - not relying on a complete C library - and due to the original development of their functionality employing the conventional L4Re libraries. Eventually, these "minimal" forms should become the standard forms of these libraries, and the original library dependencies will be forgotten.

A frustrating circular dependency issue can arise with these libraries. They may require functionality provided by the C library (such as standard string handling functions), yet they are themselves needed to build the C library, and so problems occur when statically linking programs against the stack of libraries. An informative article about this is "Library order in static linking".

A solution employed by existing L4Re components is to introduce the following linking-related definition in the Makefiles for the "minimal" C library dependencies:

PC_EXTRA = Link_Libs= %{static:-lc_newlib}

As a consequence, the necessary C library functionality will be available to these libraries when incorporated into a statically linked program, even if no other part of the program requires such functionality.

Package Manifest

In addition to packages provided for the filesystem experiment, the following packages are currently used to explore this work:

The idl4re tool is also required.