S-Lang provides built-in support for two different I/O facilities.
The simpler interface is modeled upon the C language stdio streams
interface and consists of functions such as fopen, fgets, etc. The
other interface is modeled on a lower level POSIX interface
consisting of functions such as open, read, etc. In addition to
permitting more control, the lower level interface permits one to
access network objects as well as disk files.
The stdio interface consists of the following functions:

    fopen, which opens a file for reading or writing.
    fclose, which closes a file opened by fopen.
    fgets, used to read a line from the file.
    fputs, which writes text to the file.
    fprintf, used to write formatted text to the file.
    fwrite, which may be used to write objects to the file.
    fread, which reads a specified number of objects from the file.
    feof, which is used to test whether the file pointer is at the
      end of the file.
    ferror, which is used to see whether or not the stream associated
      with the file has an error.
    clearerr, which clears the end-of-file and error indicators for
      the stream.
    fflush, used to force all buffered data associated with the
      stream to be written out.
    ftell, which is used to query the file position indicator of the
      stream.
    fseek, which is used to set the position of the file position
      indicator of the stream.
    fgetslines, which reads all the lines in a text file and returns
      them as an array of strings.
In addition, the interface supports the popen and pclose functions
on systems where the corresponding C functions are available.
Before reading or writing to a file, it must first be opened using
the fopen function. The only exceptions to this rule involve use of
the pre-opened streams: stdin, stdout, and stderr. fopen accepts two
arguments: a file name and a string argument that indicates how the
file is to be opened, e.g., for reading, writing, update, etc. It
returns a File_Type stream object that is used as an argument to all
other functions of the stdio interface. Upon failure, it returns
NULL. See the reference manual for more information about fopen.
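As a minimal sketch of opening a file for writing and handling the return values, the following function creates a file and writes two lines to it. The function name write_demo_file and the path /tmp/stdio_demo.txt are hypothetical choices made only for illustration.

```slang
% Sketch only: write_demo_file and the path used below are illustrative
% assumptions, not part of the S-Lang library.
define write_demo_file (file)
{
   variable fp = fopen (file, "w");    % "w" creates or truncates the file
   if (fp == NULL)
     verror ("Unable to open %s for writing", file);

   % fputs returns -1 upon failure, so its return value is checked
   if (-1 == fputs ("first line\n", fp))
     verror ("Write to %s failed", file);

   % fprintf also returns a value; here it is explicitly discarded
   () = fprintf (fp, "%s: %d\n", "answer", 42);

   if (-1 == fclose (fp))
     verror ("Failed to close %s", file);
}

write_demo_file ("/tmp/stdio_demo.txt");
```

Checking the result of fclose is worthwhile when writing, since buffered data may only reach the disk at that point.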
In this section, some simple examples of the use of the stdio
interface are presented. It is important to realize that all the
functions of the interface return something, and that return value
must be dealt with.
The first example involves writing a function to count the number of lines in a text file. To do this, we shall read in the lines, one by one, and count them:
    define count_lines_in_file (file)
    {
       variable fp, line, count;

       fp = fopen (file, "r");    % Open the file for reading
       if (fp == NULL)
         verror ("%s failed to open", file);

       count = 0;
       while (-1 != fgets (&line, fp))
         count++;

       () = fclose (fp);
       return count;
    }
Note that &line was passed to the fgets function. When fgets
returns, line will contain the line of text read in from the file.
Also note how the return value from fclose was handled.
Although the preceding example closed the file via fclose, there is
no need to explicitly close a file because S-Lang will automatically
close the file when it is no longer referenced. Since the only
variable to reference the file is fp, it would have automatically
been closed when the function returned.
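Automatic closing makes short helpers convenient. The following sketch counts lines with fgetslines and never calls fclose. The name count_lines_with_fgetslines is made up for this example, and the approach assumes the file is small enough to hold in memory.

```slang
% Sketch only: count_lines_with_fgetslines is a hypothetical helper.
define count_lines_with_fgetslines (file)
{
   variable fp, lines;

   fp = fopen (file, "r");
   if (fp == NULL)
     verror ("%s failed to open", file);

   % fgetslines returns an array of strings, or NULL upon failure
   lines = fgetslines (fp);
   if (typeof (lines) != Array_Type)
     verror ("Failed to read %s", file);

   % No fclose: the file is closed once fp goes out of scope
   return length (lines);
}
```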
Suppose that it is desired to count the number of characters in the
file instead of the number of lines. To do this, the while loop
could be modified to count the characters as follows:

    while (-1 != fgets (&line, fp))
      count += strlen (line);
The main difficulty with this approach is that it will not work for
binary files, i.e., files that contain null characters. For such
files, the file should be opened in binary mode via

    fp = fopen (file, "rb");

and then the data read in using the fread function:

    while (-1 != fread (&line, Char_Type, 1024, fp))
      count += bstrlen (line);
The fread function requires two additional arguments: the type of
object to read (Char_Type in this case), and the number of such
objects to read. The function returns the number of objects actually
read, or -1 upon failure. The bstrlen function was used to compute
the length of line because for Char_Type or UChar_Type objects, the
fread function assigns a binary string (BString_Type) to line.
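Putting the binary-mode fragments together gives a complete function. This is a sketch under the assumptions stated in the text: the name count_bytes_in_file is hypothetical, and the 1024 byte chunk size is simply the value used above.

```slang
% Sketch only: count_bytes_in_file is a hypothetical name.
define count_bytes_in_file (file)
{
   variable fp, buf, count = 0;

   fp = fopen (file, "rb");            % binary mode
   if (fp == NULL)
     verror ("%s failed to open", file);

   % fread assigns a BString_Type to buf and returns -1 once no more
   % data can be read
   while (-1 != fread (&buf, Char_Type, 1024, fp))
     count += bstrlen (buf);

   () = fclose (fp);
   return count;
}
```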
The foreach construct also works with File_Type objects. For
example, the number of characters in a file may be counted via

    foreach (fp) using ("char")
      {
         ch = ();
         count++;
      }

To count the number of lines as well as the number of characters,
one can use:

    foreach (fp) using ("line")
      {
         line = ();
         num_lines++;
         count += strlen (line);
      }
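For completeness, the line-oriented foreach loop can be wrapped in a function that declares its variables and returns both totals. This is only a sketch; the name count_lines_and_chars is an assumption made for illustration.

```slang
% Sketch only: count_lines_and_chars is a hypothetical name.
define count_lines_and_chars (file)
{
   variable fp, line, num_lines = 0, num_chars = 0;

   fp = fopen (file, "r");
   if (fp == NULL)
     verror ("%s failed to open", file);

   foreach (fp) using ("line")
     {
        line = ();                     % the line includes its newline
        num_lines++;
        num_chars += strlen (line);
     }

   % S-Lang functions may return more than one value
   return (num_lines, num_chars);
}
```

A caller would receive both values with a multiple assignment such as `(nl, nc) = count_lines_and_chars (file);`.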
Finally, it should be mentioned that neither of these examples should
be used to count the number of characters in a file when that
information is more readily accessible by another means. For
example, it is preferable to get this information via the stat_file
function:

    define count_chars_in_file (file)
    {
       variable st;

       st = stat_file (file);
       if (st == NULL)
         error ("stat_file failed.");
       return st.st_size;
    }
The previous examples illustrate how to read and write objects of a
single data-type from a file, e.g.,

    num = fread (&a, Double_Type, 20, fp);

would result in a Double_Type[num] array being assigned to a if
successful. However, suppose that the binary data file consists of
numbers in a specified byte-order. How can one read such objects
with the proper byte swapping? The answer is to use the fread
function to read the objects as Char_Type and then unpack the
resulting string into the specified data type, or types. This
process is facilitated using the pack and unpack functions.
The pack function follows the syntax

    BString_Type pack (format-string, item-list);

and combines the objects in the item-list according to format-string
into a binary string and returns the result. Likewise, the unpack
function may be used to convert a binary string into separate data
objects:

    (variable-list) = unpack (format-string, binary-string);
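A short round trip may help make the two signatures concrete. The sketch below packs a 16 bit unsigned integer and a 32 bit float (the J and F specifiers from the table of data-types) and then unpacks them again. The variable names are arbitrary.

```slang
% Pack two values into a binary string: 2 bytes for J, 4 bytes for F
variable bstr = pack ("JF", 258, 1.5);
variable nbytes = bstrlen (bstr);      % 6

% Unpack them again using the same format string
variable n, x;
(n, x) = unpack ("JF", bstr);          % n is 258, x is 1.5
```

Using the same format string on both sides is what keeps the packed layout and the unpacked values in agreement.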
The format string consists of one or more data-type specification characters, and each may be followed by an optional decimal length specifier. Specifically, the data-types are specified according to the following table:
    c     char
    C     unsigned char
    h     short
    H     unsigned short
    i     int
    I     unsigned int
    l     long
    L     unsigned long
    j     16 bit int
    J     16 bit unsigned int
    k     32 bit int
    K     32 bit unsigned int
    f     float
    d     double
    F     32 bit float
    D     64 bit float
    s     character string, null padded
    S     character string, space padded
    x     a null pad character
A decimal length specifier may follow the data-type specifier. With
the exception of the s and S specifiers, the length specifier
indicates how many objects of that data type are to be packed or
unpacked from the string. When used with the s or S specifiers, it
indicates the field width to be used. If the length specifier is not
present, the length defaults to one.
With the exception of c, C, s, S, and x, each of these may be
prefixed by a character that indicates the byte-order of the object:

    >    big-endian order (network order)
    <    little-endian order
    =    native byte-order

The default is native byte order.
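The effect of the byte-order prefix can be seen by packing a value in one order and unpacking it in another. The sketch below also shows one way to read big-endian data from a file by combining fread with unpack. The helper name read_be_int32s is hypothetical, and the factor of 4 assumes the 32 bit k specifier.

```slang
variable be = pack (">k", 1);          % bytes 00 00 00 01
variable le = pack ("<k", 1);          % bytes 01 00 00 00

variable right = unpack (">k", be);    % 1
variable wrong = unpack ("<k", be);    % 0x01000000: bytes read reversed

% Sketch only: read num big-endian 32 bit ints from an open stream.
define read_be_int32s (fp, num)
{
   variable buf;
   variable fmt = sprintf (">k%d", num);

   % Each k object occupies 4 bytes; insist on a complete read
   if (4 * num != fread (&buf, Char_Type, 4 * num, fp))
     error ("Unable to read the requested number of bytes");

   return unpack (fmt, buf);           % an array when num > 1
}
```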
Here are a few examples that should make this clearer:

    a = pack ("cc", 'A', 'B');         % ==> a = "AB";
    a = pack ("c2", 'A', 'B');         % ==> a = "AB";
    a = pack ("xxcxxc", 'A', 'B');     % ==> a = "\0\0A\0\0B";
    a = pack ("h2", 'A', 'B');         % ==> a = "\0A\0B" or "A\0B\0"
    a = pack (">h2", 'A', 'B');        % ==> a = "\0A\0B"
    a = pack ("<h2", 'A', 'B');        % ==> a = "A\0B\0"
    a = pack ("s4", "AB", "CD");       % ==> a = "AB\0\0"
    a = pack ("s4s2", "AB", "CD");     % ==> a = "AB\0\0CD"
    a = pack ("S4", "AB", "CD");       % ==> a = "AB  "
    a = pack ("S4S2", "AB", "CD");     % ==> a = "AB  CD"
When unpacking, if the length specifier is greater than one, then an
array of that length will be returned. In addition, trailing
whitespace and null characters are stripped when unpacking an object
given by the S specifier. Here are a few examples:

    (x,y) = unpack ("cc", "AB");           % ==> x = 'A', y = 'B'
    x = unpack ("c2", "AB");               % ==> x = ['A', 'B']
    x = unpack ("x<H", "\0\xAB\xCD");      % ==> x = 0xCDABuh
    x = unpack ("xxs4", "a b c\0d e f");   % ==> x = "b c\0"
    x = unpack ("xxS4", "a b c\0d e f");   % ==> x = "b c"
Consider the task of reading the Unix system file /var/log/utmp,
which contains login records about who logged onto the system. This
file format is documented in section 5 of the online Unix man pages,
and consists of a sequence of entries formatted according to the C
structure utmp defined in the utmp.h C header file. The actual
details of the structure may vary from one version of Unix to the
other. For the purposes of this example, consider its definition
under the Linux operating system running on an Intel processor:
    struct utmp {
       short ut_type;      /* type of login */
       pid_t ut_pid;       /* pid of process */
       char ut_line[12];   /* device name of tty - "/dev/" */
       char ut_id[2];      /* init id or abbrev. ttyname */
       time_t ut_time;     /* login time */
       char ut_user[8];    /* user name */
       char ut_host[16];   /* host name for remote login */
       long ut_addr;       /* IP addr of remote host */
    };
On this system, pid_t is defined to be an int and time_t is a long.
Hence, a format specifier for the pack and unpack functions is
easily constructed to be:

    "h i S12 S2 l S8 S16 l"
However, this particular definition is naive because it does not
allow for structure padding performed by the C compiler in order to
align the data types on suitable word boundaries. Fortunately, the
intrinsic function pad_pack_format may be used to modify a format by
adding the correct amount of padding in the right places. In fact,
pad_pack_format applied to the above format on an Intel-based Linux
system produces the result:

    "h x2 i S12 S2 x2 l S8 S16 l"

Here we see that 4 bytes of padding were added.
The other missing piece of information is the size of the structure.
This is useful because we would like to read in one structure at a
time using the fread function. Knowing the size of the various data
types makes this easy; however it is even easier to use the
sizeof_pack intrinsic function, which returns the size (in bytes) of
the structure described by the pack format.
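The two intrinsics can be tried interactively to see what padding a given platform requires. The sketch below prints the sizes of the naive and padded formats. The variable names are arbitrary, and the numbers printed depend on the platform's type sizes and alignment rules, so no particular output is assumed.

```slang
variable naive  = "h i S12 S2 l S8 S16 l";
variable padded = pad_pack_format (naive);

% %S converts any object to its string representation
() = fprintf (stdout, "naive format:  %s (%S bytes)\n",
              naive, sizeof_pack (naive));
() = fprintf (stdout, "padded format: %s (%S bytes)\n",
              padded, sizeof_pack (padded));
```

The difference between the two sizes is the number of pad bytes that pad_pack_format inserted.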
So, with all the pieces in place, it is rather straightforward to write the code:
    variable format, size, fp, buf;

    typedef struct
    {
       ut_type, ut_pid, ut_line, ut_id,
       ut_time, ut_user, ut_host, ut_addr
    } UTMP_Type;

    format = pad_pack_format ("h i S12 S2 l S8 S16 l");
    size = sizeof_pack (format);

    define print_utmp (u)
    {
       () = fprintf (stdout, "%-16s %-12s %-16s %s\n",
                     u.ut_user, u.ut_line, u.ut_host, ctime (u.ut_time));
    }

    fp = fopen ("/var/log/utmp", "rb");
    if (fp == NULL)
      error ("Unable to open utmp file");

    () = fprintf (stdout, "%-16s %-12s %-16s %s\n",
                  "USER", "TTY", "FROM", "LOGIN@");

    variable U = @UTMP_Type;

    while (-1 != fread (&buf, Char_Type, size, fp))
      {
         set_struct_fields (U, unpack (format, buf));
         print_utmp (U);
      }

    () = fclose (fp);
A few comments about this example are in order. First of all, note
that a new data type called UTMP_Type was created, although this was
not really necessary. We also opened the file in binary mode, but
this too is optional under a Unix system where there is no
distinction between binary and text modes. The print_utmp function
does not print all of the structure fields. Finally, the return
values from fprintf and fclose were dealt with.