nonterm: nonterm1 nonterm2 nonterm3 ;
The corresponding action calls an external procedure named for the left hand side and taking the values of the right side non-terminals as arguments.
{$$=nonterm(parsestate,$1,$2,$3);}
Note that this form of parsing action was requested by John Caron so that the same .y file could be used for C and Java parsers. In line with this, all non-terminals are defined to return a type of "Object", which is "void*" for C parsers and "Object" for Java parsers. The cost is a lot of casting in the action procedures.
Note the extra "parsestate" argument. The parsers are constructed as reentrant and this argument contains the per-parser state information.
The bodies of the action procedures are defined in a separate file called "dapparselex.c". That file also contains the lexer required by the parser; lex was not used because the lexemes are simple.
One of the issues that any bottom-up parser must address is the accumulation of sets of items (nodes, etc.).
The canonical way that this is handled in the oc parsers is to use the following form of production.
1 declarations:
2       /* empty */ {$$=declarations(parsestate,NULL,NULL);}
3     | declarations declaration {$$=declarations(parsestate,$1,$2);}
4     ;
The base case action (line 2) is called with NULL arguments to indicate the base case. The recursive case action (line 3) is called with the values of the two right side non-terminals.
The corresponding action code is defined as follows.
1 Object
2 declarations(DAPparsestate* state, Object decls, Object decl)
3 {
4     OClist* alist = (OClist*)decls;
5     if(alist == NULL) alist = oclistnew();
6     else oclistpush(alist,(ocelem)decl);
7     return alist;
8 }
The base case is handled in line 5: it creates and returns a new Sequence (an OClist); a Sequence is a dynamically extendible array of arbitrary items (see below). The recursive case is in line 6, where it is assumed that the Sequence argument is defined and that there is a decl object to be appended to the sequence.
This pattern, in various forms, is ubiquitous in the parsers.
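The accumulation pattern can be exercised outside the parser. The following is a minimal sketch, not the oc source: the growable list type `minilist` and its functions are hypothetical stand-ins for OClist, oclistnew, and oclistpush, the `parsestate` argument is omitted, and `Object` is `void*` as in the C parsers.

```c
#include <assert.h>
#include <stdlib.h>

typedef void* Object; /* as in the C parsers: every non-terminal value is a void* */

/* Hypothetical stand-in for OClist: a dynamically extendible array of pointers. */
typedef struct minilist {
    size_t length;
    size_t alloc;
    void** content;
} minilist;

static minilist* minilistnew(void)
{
    return (minilist*)calloc(1, sizeof(minilist));
}

/* Append elem, doubling the allocation when full; returns 0 on failure. */
static int minilistpush(minilist* l, void* elem)
{
    if(l->length >= l->alloc) {
        size_t newalloc = (l->alloc == 0 ? 4 : 2 * l->alloc);
        void** newcontent = (void**)realloc(l->content, newalloc * sizeof(void*));
        if(newcontent == NULL) return 0;
        l->content = newcontent;
        l->alloc = newalloc;
    }
    l->content[l->length++] = elem;
    return 1;
}

/* Same shape as the declarations() action: a NULL list means "base case". */
static Object declarations(Object decls, Object decl)
{
    minilist* alist = (minilist*)decls;
    if(alist == NULL) alist = minilistnew();  /* base case: empty production */
    else minilistpush(alist, decl);           /* recursive case: append one item */
    return alist;
}
```

A bottom-up parse of three declarations would call `declarations(NULL, NULL)` once for the empty production and then `declarations(list, item)` once per reduced declaration, so the list grows left to right in source order.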
Currently there is no need for this parser; it is included in the source tree but is not used.
unsigned int magic | – | A magic number to identify this structure. |
OCtype octype | – | Defines the general kind of node. |
OCtype etype | – | Used for attribute nodes and primitive nodes to define the primitive type. |
char* name | – | From the DDS. |
char* fullname | – | Fully qualified name such as a.b.c. |
OCnode* container | – | Parent node of this node. |
OCnode* root | – | Root node of the tree containing this node. |
OCnode* datadds | – | The correlated DATA DDS node, if any. |
OCdiminfo dim | – | Extra information about dimension nodes. |
OCarrayinfo array | – | Extra information about nodes that have rank > 0. |
OClist* subnodes | – | (Sequence) |
Sequence* attributes | – | (Sequence) |
struct OCSKIP skip | – | Extra information about the node vis-a-vis the datadds data to improve access times. |
OCtypeinfo | – | Extra information about type definitions if netcdf-4 is being supported. |
OCtypeinfo | – | Extra information about group definitions if netcdf-4 is being supported. |
The auxiliary structs are as follows.
Struct | Field | Description |
---|---|---|
struct OCdiminfo | | |
OCnode* array | – | The defining array node, if known |
unsigned int arrayindex | – | The rank position in the defining array node, if known |
ocindex_t declsize | – | Dimension size as specified in the (data)DDS |
struct OCarrayinfo | | |
OClist* dimensions | – | (Sequence) |
unsigned int rank | – | Number of dimensions, i.e. the length of the dimensions list |
struct OCattribute | | |
char* name | – | Name of the attribute |
OCtype etype | – | Primitive type of the attribute |
size_t nvalues | – | Length of the values field |
char** values | – | List of values associated with the attribute |
Note that the totalsize of a Sequence cannot be computed, because the number of records is unknown until the data is fetched.
The skip values are pre-computed recursively by the procedure occomputeskipdata in ocnode.c; that procedure should be consulted to see how the computation is carried out.
A good analogy for the OCstate is the FILE object used by C standard IO. Like a FILE, an OCstate provides the context for some operation or object.
The state is used for a variety of purposes and is, as a rule, the first argument of every API procedure.
unsigned int magic | – | A magic number to identify this structure. |
CURL* curl | – | The handle to a CURL connection. Its lifetime is that of the OCstate structure. |
OClist* trees | – | The set of root objects for previously fetched DAP requests. See OC Trees. |
OCURI* uri | – | URI for fetching data. |
OCbytes* packet | – | buffer for temporary storage of fetched data. |
OCcontent* contentlist | – | linked list of all created OCcontent objects. |
struct OCerrdata error | – | A struct to hold error return info from server (see below). |
struct OCcurlflags curlflags | – | The curl flags to set before fetch. (see below). |
struct OCSSL ssl | – | SSL related Authorization and authentication information. (see below). |
struct OCproxy proxy | – | Proxy information. (see below). |
struct OCcredentials creds | – | Credentials for BASIC (i.e. password-based) authentication (see below). |
The auxiliary structs are as follows. For the curl flags, consult the curl documentation.
Struct | Field | Description | |
---|---|---|---|
struct OCerrdata | |||
char* code | – | A numeric error code (in ascii) from the dap server | |
long httpcode | – | Any HTTP error code returned (e.g. 404) | |
struct OCcurlflags | |||
int compress | – | CURLOPT_ENCODING | |
int verbose | – | CURLOPT_VERBOSE | |
int timeout | – | CURLOPT_TIMEOUT | |
int followlocation | – | CURLOPT_FOLLOWLOCATION | |
int maxredirs | – | CURLOPT_MAXREDIRS (=10) | |
char* useragent | – | CURLOPT_USERAGENT | |
char* cookiejar | – | CURLOPT_COOKIESESSION, CURLOPT_COOKIEJAR | |
char* cookiefile | – | CURLOPT_COOKIEFILE | |
struct OCSSL | |||
int validate | – | CURLOPT_SSL_VERIFYPEER | |
char* certificate | – | CURLOPT_SSLCERT | |
char* key | – | CURLOPT_SSLKEY | |
char* keypasswd | – | CURLOPT_KEYPASSWD | |
char* cainfo | – | CURLOPT_CAINFO | |
char* capath | – | CURLOPT_CAPATH | |
int verifypeer | – | CURLOPT_SSL_VERIFYPEER | |
struct OCproxy | |||
char* host | – | The proxy host name | |
int port | – | The proxy port number | |
struct OCcredentials | |||
char* username | – | The username for logging into the proxy | |
char* password | – | The password for logging into the proxy | |
Associated with the root node of every tree is an instance of OCtree, which is used to store information about the fetch and the tree.
The OCtree structure contains the following fields.
OCdxd dxdclass | – | Enumeration instance: one of OCDAS, OCDDS or OCDATADDS. |
char* constraint | – | The constraint string used when fetching the DAP object. |
char* text | – | The text of the DAP object as received from the server. |
OCnode* root | – | Cross link to the root node to which this OCtree instance is attached. |
OCstate* state | – | Cross link to the state containing the root. |
OClist* nodes | – | A list of all nodes in the tree rooted at root. |
When the dxdclass is OCDATADDS, the following additional fields are defined and used.
unsigned long bod | – | offset in the datadds packet to the beginning of the binary XDR data. |
char* filename | – | name of the temporary file for holding datadds data. |
FILE* file | – | FILE object for the temporary file. |
unsigned long filesize | – | size of the temporary file. |
XDR* xdrs | – | XDR handle for walking the temporary file. |
OCmemdata* memdata | – | root of the compiled datadds packet. |
One important thing to understand is that the externally visible API hides the actual definitions of the OCstate, OCnode, and OCcontent types. This is accomplished by defining alternate, externally visible, types that are internally mapped to the appropriate actual type and are the values passed into and out of the API procedures.
The types and mapping are as follows.
For each API procedure, it is important to verify that its arguments are semantically correct. This is handled by the macro OCVERIFY.
If OC_FASTCONSISTENCY is defined, then OCVERIFY checks, by casting, for an expected magic number at the beginning of the external object. If OC_FASTCONSISTENCY is not defined, then a table of all created objects is searched instead. Since the fast consistency check is preferable, the object table is useful only in certain debugging situations where it is desirable to track all of the created objects.
Once an API argument is verified, it needs to be cast to the appropriate internal type. This is accomplished using the OCDEREF macro, which casts the argument to the proper type and stores it in a specified local variable of internal type.
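The fast-consistency path and the dereference step can be illustrated together. This is a minimal sketch under stated assumptions: `SKETCH_VERIFY`, `SKETCH_DEREF`, `SketchNode`, `SketchObject`, and the magic value are all hypothetical and are not the actual OCVERIFY and OCDEREF definitions. It only shows the idea of placing a magic number first in the internal struct and checking it through a cast before trusting an opaque handle.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NODE_MAGIC 0x0c0c0c0cU  /* hypothetical magic value */

typedef struct SketchNode {
    unsigned int magic;   /* must be first so the check below can find it */
    int octype;
} SketchNode;

typedef void* SketchObject;  /* opaque handle seen by API callers */

/* Fast consistency check: treat the handle as pointing at an unsigned int
   and compare it with the expected magic number. */
#define SKETCH_VERIFY(magic, obj) \
    ((obj) != NULL && *(const unsigned int*)(obj) == (magic))

/* Cast a verified handle into a local variable of the internal type. */
#define SKETCH_DEREF(type, var, obj) type* var = (type*)(obj)

/* Hypothetical API procedure: verify, dereference, then use the node. */
static int sketch_getoctype(SketchObject link, int* octypep)
{
    assert(octypep != NULL);
    if(!SKETCH_VERIFY(SKETCH_NODE_MAGIC, link)) return -1; /* reject bad handles */
    SKETCH_DEREF(SketchNode, node, link);
    *octypep = node->octype;
    return 0;
}
```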
A navigational interface has been defined that allows for simplified walking of the data dds packet data. The navigational interface has been modified multiple times, and the one described here is a variation on the one designed by Patrick West for the IDL client for OPeNDAP.
The oc user's manual (ocuserman.html) should be read to obtain a working understanding of the navigational interface (the oc_data_XXX procedures). This section discusses the complexities underlying that interface.
In addition to the OCstate structure and the OCnode structure, the navigational interface defines an OCcontent structure.
unsigned int magic | – | A magic number to identify this structure. |
OCmode mode | – | The access mode (see below). |
OCstate* state | – | the state object to which this content is associated. |
OCnode* node | – | the OCnode that serves as template for the data pointed to by this content object. |
OCtree tree | – | The specific tree of nodes, typically refers to the DDS tree associated with a DATADDS fetch. |
int packed | – | True if this content points to packed data, which means that the node octype is OC_PRIMITIVE, its etype is OC_BYTE or OC_CHAR, and it is not a scalar object. |
struct OCCACHE | – | Cache to track last index and xdr positions (see below). |
struct OCcontent* next | – | link to next OCcontent object; allows reclamation and reuse. |
The OCcontent object represents a subset of the data (aka an instance) within the data part of a DATADDS response. The node field serves as a template for accessing the data (in xdr format) pointed to by the OCcontent object.
The mapping between nodes and contents is one-to-many. That is, there often will be multiple data instances of a given node type in a DATADDS response. Consider the following example.
Dataset {
    Structure {
        int16 f11[2];
        float32 f12;
    } S1;
    Structure {
        int16 f21;
        float32 f22[2];
    } S2[3];
} D1;
If we have a data response with this DDS, then the following instances will exist.
Class | Count | Instances |
---|---|---|
D1 | 1 | D1 |
S1 | 1 | D1.S1 |
f11 | 2 | D1.S1.f11[0] D1.S1.f11[1] |
f12 | 1 | D1.S1.f12 |
S2 | 3 | D1.S2[0] D1.S2[1] D1.S2[2] |
f21 | 3 | D1.S2[0].f21 D1.S2[1].f21 D1.S2[2].f21 |
f22 | 6 | D1.S2[0].f22[0] D1.S2[0].f22[1] D1.S2[1].f22[0] D1.S2[1].f22[1] D1.S2[2].f22[0] D1.S2[2].f22[1] |
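The Count column follows a simple rule when no Sequences are involved: the number of instances of a node is the product of the element counts (1 for a scalar) of that node and of every node containing it. The following minimal sketch uses a hypothetical, stripped-down `inode` struct rather than OCnode. Sequences are deliberately excluded, because their record counts are unknown until the data is fetched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down node: a name, the product of its own
   dimension sizes (1 for a scalar), and a link to its container. */
typedef struct inode {
    const char* name;
    size_t nelems;
    const struct inode* container;
} inode;

/* Number of instances of a node in a DATADDS response without Sequences:
   the product of the element counts along the path to the root. */
static size_t instancecount(const inode* node)
{
    size_t count = 1;
    for(; node != NULL; node = node->container)
        count *= node->nelems;
    return count;
}
```

Built for the example DDS, f22 yields 3 * 2 = 6, matching the table.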
The goal is to allow the user to navigate to all of the instances contained in a given DATADDS data packet and, when desired, extract the instance as usable data. Note, however, that only primitive typed arrays (or scalars) can have their data extracted. It is not possible in the current interface to, for example, extract a whole Structure object; rather, it must be done by extracting each field in turn. This may require recursion if one of the fields is itself, for example, a Grid, Structure, or Sequence.
The most important internal procedures are as follows.
Procedure | Abbreviated Semantics | |
---|---|---|
OCcontent* ocnewcontent(OCstate* state) | – | Obtain an unused OCcontent object, either off the free list or using malloc(). |
void ocfreecontent(OCstate* state, OCcontent* content) | – | Release a content object onto the free list for later reuse |
int ocrootdata(struct OCstate*, struct OCnode*, struct OCcontent*) | – | Obtain an OCcontent object that points to the data dds as a whole |
int ocdataith(struct OCstate*, OCcontent*, size_t, OCcontent*) | – | Move to the i'th "position" of this object as controlled by the object's type and a mode. |
int ocgetcontent(struct OCstate*, struct OCcontent*, void* memory, size_t memsize, size_t start, size_t count) | – | Extract the data associated with the current content. As mentioned above, this can only be done for primitive array or scalar data. |
int ocxdrread(struct OCcontent*, XXDR*, char* memory, size_t, ocindex_t index, ocindex_t count) | – | This is the workhorse internal procedure to actually extract the xdr formatted data and convert it to the proper form in memory. |
int ocskipinstance(OCnode* node, XXDR* xdrs, int state, int* tagp) | – | In order to get to some point in the data, it is often necessary to skip over preceding data. This can be a complex activity when sequences and strings are involved. This procedure handles the skipping over of arbitrary data. |
OCmode modetransition(OCnode* node, OCmode srcmode) | – | This procedure determines the mode of the new content returned by the ocdataith procedure. |
One note about OCcontent objects: the reason there are explicit create and destroy operations is to allow (indeed, force) the user to control the number of created OCcontent objects and to reuse previously created ones. If the API created a new object for every call to, say, ocdimcontent, then the number of OCcontent objects would explode, growing with the product of the dimension sizes. Nor would there be any way to reclaim them, because it would be impossible to know which are still actively in use.
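The free-list behavior attributed to ocnewcontent and ocfreecontent can be sketched as follows. This is an illustration under assumptions, not the oc code: `Content`, `State`, `newcontent`, and `freecontent` are hypothetical, and the real OCcontent carries many more fields. Released objects are pushed onto a singly linked free list through their `next` field and are handed out again before any new allocation is made.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for OCcontent and the free list kept in the state. */
typedef struct Content {
    int inuse;
    struct Content* next;   /* link used only while on the free list */
} Content;

typedef struct State {
    Content* freelist;
} State;

/* Take an object off the free list if possible, else allocate one. */
static Content* newcontent(State* state)
{
    Content* c = state->freelist;
    if(c != NULL)
        state->freelist = c->next;                 /* reuse a released object */
    else
        c = (Content*)calloc(1, sizeof(Content));  /* none free: allocate */
    if(c == NULL) return NULL;
    c->next = NULL;
    c->inuse = 1;
    return c;
}

/* Release an object by pushing it onto the free list for later reuse. */
static void freecontent(State* state, Content* c)
{
    if(c == NULL) return;
    c->inuse = 0;
    c->next = state->freelist;
    state->freelist = c;
}
```

Because reuse is explicit, a loop that walks every element of a large array can cycle through a single content object instead of allocating one per element.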
It is important to understand the modetransition procedure in order to understand how the navigation works. The idea is that we have the following pieces of information:
The transition table has three columns.
Case | Current Mode | Current OCtype | New Mode |
---|---|---|---|
1 | OCARRAYMODE | OC_Grid | OCFIELDMODE |
2 | OCARRAYMODE | OC_Structure | OCFIELDMODE |
3 | OCARRAYMODE | OC_Sequence | OCSEQUENCEMODE |
4 | OCSEQUENCEMODE | any type | OCFIELDMODE |
5 | OCFIELDMODE | OC_Sequence | OCARRAYMODE |
6 | OCFIELDMODE | OC_Grid | OCARRAYMODE |
7 | OCFIELDMODE | OC_Structure | OCARRAYMODE |
8 | OCFIELDMODE | OC_Primitive | OCPRIMITIVEMODE |
The general idea is that given a set of objects (i.e. an array of them or a sequence of them), asking for the i'th element should cause a transition to pointing to the actual i'th data item in the set. This is seen in cases 1, 2, and 3, where we are transitioning from referencing an array of Grids or Structures or Sequences to referencing a specific Grid/Structure/Sequence in the array. Note that, for purposes of the transitions, scalars are considered arrays of size 1. Also note that arrays of sequences are supported here, but are illegal according to the DAP 2 specification.
Case 4 also shows the same kind of transition, but here the transition is from a pointer to a whole Sequence to the fields of a specific (i'th) record in the Sequence.
Cases 5, 6, 7, and 8 occur when we are moving to a specific i'th field of a Grid object, Structure object, or Sequence record. If the field octype is OC_Structure or OC_Grid, we assume that we are moving to an array of those objects, hence the new mode is OCARRAYMODE. If the field type is OC_Sequence, then we are moving to the Sequence object, hence the mode becomes OC_Sequence. If the field type is OC_Primitive, then we have reached the point where actual data extraction is possible, so the mode becomes OCPRIMITIVEMODE.
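The transition table can be read as a two-level switch on the current mode and then the octype. The following is a minimal sketch, not the oc source: the enum definitions, their numeric values, and the `OCBADMODE` result for undefined transitions are assumptions, and the function takes the octype directly rather than an OCnode*. Case 5 is coded exactly as the table lists it (an OC_Sequence field yields OCARRAYMODE).

```c
#include <assert.h>

/* Mode and type names follow the document; numeric values and OCBADMODE
   are assumptions made for this sketch. */
typedef enum OCmode {
    OCBADMODE = 0, OCARRAYMODE, OCFIELDMODE, OCSEQUENCEMODE, OCPRIMITIVEMODE
} OCmode;

typedef enum OCtype {
    OC_Primitive, OC_Grid, OC_Structure, OC_Sequence
} OCtype;

/* Mode of the content produced by ocdataith, given the current mode and
   the octype of the node being moved to. */
static OCmode modetransition(OCtype octype, OCmode srcmode)
{
    switch(srcmode) {
    case OCARRAYMODE:                  /* cases 1-3: pick one array element */
        switch(octype) {
        case OC_Grid:
        case OC_Structure: return OCFIELDMODE;
        case OC_Sequence:  return OCSEQUENCEMODE;
        default: break;
        }
        break;
    case OCSEQUENCEMODE:               /* case 4: pick one record */
        return OCFIELDMODE;
    case OCFIELDMODE:                  /* cases 5-8: pick one field */
        switch(octype) {
        case OC_Sequence:
        case OC_Grid:
        case OC_Structure: return OCARRAYMODE;
        case OC_Primitive: return OCPRIMITIVEMODE;
        }
        break;
    default:
        break;
    }
    return OCBADMODE;                  /* no transition defined */
}
```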
The ocdataith and ocskipinstance procedures use the OCSKIP and OCCACHE information to efficiently point to, or skip over, objects in the xdr data cache. For example, if the user is trying to reach the i'th element in a primitive typed array field inside a structure, and the offset of the field is known in the OCSKIP information, then a simple calculation will immediately produce a pointer into the xdr data packet to the beginning of that primitive typed field. At that point, oc_data_get can quickly extract the data directly from the xdr data packet.
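The "simple calculation" amounts to adding up known sizes. This minimal sketch assumes a hypothetical `skipinfo` summary (per-instance size, field offset within an instance, and per-element size) rather than the real OCSKIP layout, and the numbers in the usage are illustrative, not the actual XDR layout of the example DDS. It is only valid when every size involved is fixed, i.e. no Sequences or Strings precede the target data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of pre-computed skip data for one field. */
typedef struct skipinfo {
    size_t instancesize;  /* xdr bytes occupied by one instance of the container */
    size_t fieldoffset;   /* xdr offset of the field from the start of an instance */
    size_t elemsize;      /* xdr bytes per element of the primitive field */
} skipinfo;

/* Offset, from the start of the container array, of element `elem` of the
   field inside container instance `inst`: no data needs to be read. */
static size_t fieldelementoffset(const skipinfo* s, size_t inst, size_t elem)
{
    return inst * s->instancesize + s->fieldoffset + elem * s->elemsize;
}
```

With an instance size of 24, a field offset of 8, and 4-byte elements, element 1 of the field in instance 2 starts at 2*24 + 8 + 1*4 = 60.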
Even if the offset is not known, other information such as the total object size, or even the instance size, can speed up access by changing what would otherwise be a series of data reads (looking for counts or record tags, for example) into a mix of data reads and repositionings that is faster than the reads alone.
Further, by caching the last referenced index and its corresponding xdr data packet offset, the OCCACHE information can speed up a subsequent call to oc_data_ith for index+1, because the search can start from the cached position rather than from position zero.
The logging interface is defined by the following procedures, but they are just the internal versions of the ones described in ocuserman.html.
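The benefit of caching the last index can be shown with variable-size instances, where each skip costs a read. In this minimal sketch the `cache` struct, `readsize`, and `seekinstance` are hypothetical, and the `sizes` array stands in for decoding counts or record tags from the xdr data; `reads` counts how many such decodes occur.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache: last index reached and its offset in the packet. */
typedef struct cache {
    int valid;
    size_t index;
    size_t offset;
} cache;

/* Stand-in for decoding an instance's size from the xdr data. */
static size_t readsize(const size_t* sizes, size_t i, size_t* reads)
{
    (*reads)++;
    return sizes[i];
}

/* Offset of instance `target`, starting from the cached position when it is
   at or before the target, otherwise from position zero. */
static size_t seekinstance(cache* c, const size_t* sizes, size_t target, size_t* reads)
{
    size_t i = 0, offset = 0;
    if(c->valid && c->index <= target) {
        i = c->index;
        offset = c->offset;
    }
    for(; i < target; i++)
        offset += readsize(sizes, i, reads);  /* skip over instance i */
    c->valid = 1;
    c->index = target;
    c->offset = offset;
    return offset;
}
```

Seeking to index 2 and then index 3 costs two reads plus one, instead of two plus three; seeking backwards falls back to a scan from zero.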
The uri is assumed to be (most generally) of the form
[param=...,param=...,...]protocol://username:password@host:port/file?constraint
The constraint, in turn, is composed of projections and selections.
?projection,projection,...&selection&selection...
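Given this form, the projection and selection parts can be separated at the first '&'. The following is a minimal sketch, not ocuriparse: `splitconstraint` is hypothetical, it ignores percent-encoding and any quoting, and keeping the leading '&' on the selection part is a choice made here to match the template above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split a constraint (without the leading '?') into its projection list and
   its selection clauses. Each output is a malloc'd string, or NULL if that
   part is absent; the selection part keeps its leading '&'. Returns 0 on
   allocation failure, 1 otherwise. */
static int splitconstraint(const char* constraint, char** projp, char** selp)
{
    const char* amp = strchr(constraint, '&');
    size_t projlen = (amp == NULL ? strlen(constraint) : (size_t)(amp - constraint));
    *projp = NULL;
    *selp = NULL;
    if(projlen > 0) {
        *projp = (char*)malloc(projlen + 1);
        if(*projp == NULL) return 0;
        memcpy(*projp, constraint, projlen);
        (*projp)[projlen] = '\0';
    }
    if(amp != NULL) {
        *selp = (char*)malloc(strlen(amp) + 1);
        if(*selp == NULL) { free(*projp); *projp = NULL; return 0; }
        strcpy(*selp, amp);
    }
    return 1;
}
```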
The OCURI structure contains the following fields.
char* uri | – | The uri as originally passed in to the parser | |
char* protocol | – | Protocol field (e.g. "https") of the uri | |
char* user | – | User name field; NULL if not present | |
char* password | – | Password field; NULL if not present | |
char* host | – | Host field | |
char* port | – | Port number; 0 if not present | |
char* file | – | File part of the uri, with the leading '/' | |
char* constraint | – | Constraint (not including leading '?'); NULL if not present | |
char* projection | – | The projections in the constraint; NULL if not present. | |
char* selection | – | The selections in the constraint; NULL if not present. | |
char* params | – | The parameters prefixed to the uri (the [param=...] part); NULL if not present. | |
char** paramlist | – | A "compiled" version of the params in envv format, where paramlist[i] is the param name and paramlist[i+1] is the param value. The whole list is NULL terminated. It is assumed that the name part and the value part are never NULL. Rather, the empty string ("") is used to indicate no value. |
The most important parts of the ocuri API are as follows.
Operation | Semantics | |
---|---|---|
int ocuriparse(const char* uri, OCURI** ocurip) | – | Creates an instance of OCURI, stores the pointer to it in ocurip, and fills the created instance with data from parsing the uri string into its component parts. It returns 0 if parsing fails, 1 otherwise. |
void ocurifree(OCURI* ocuri) | – | Free all the memory associated with the argument, including the argument instance. |
int ocuridecodeparams(OCURI* ocuri) | – | Parses ocuri->params into ocuri->paramlist |
const char* ocurilookup(OCURI* ocuri, const char* param) | – | Searches ocuri->paramlist for a match to param. If not found, then return NULL, otherwise return the value associated with the param; an empty value is represented by the zero-length string "", not by NULL. |
char* ocuriencode(char* s, char* allowable); | – | Applies URL character encoding and returns a new encoded instance of s. The set of characters to not encode is specified by the allowable argument. |
char* ocuribuild(OCURI* ocuri, const char* prefix, const char* suffix, int flags) | – | Construct a url string from the fields in ocuri; the new url is prefixed (before any parameters are added) with the prefix argument and suffixed (before any constraints are added) with the suffix argument; the protocol, host, port, and file parts are always included, and the flags argument (possibly a bitwise OR of multiple flags) determines what other parts are included as follows |
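The envv layout of paramlist makes lookup a stride-2 scan. This minimal sketch, `paramlookup`, is hypothetical and is not the ocurilookup source; it assumes only what the OCURI table states: names at even positions, values right after them, a terminating NULL, and "" rather than NULL for a valueless parameter. The parameter names in the usage are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search a NULL-terminated name/value list laid out as
   {name0, value0, name1, value1, ..., NULL}. Returns the value, which may
   be "" for a valueless parameter, or NULL if the name is absent. */
static const char* paramlookup(char** paramlist, const char* param)
{
    size_t i;
    if(paramlist == NULL || param == NULL) return NULL;
    for(i = 0; paramlist[i] != NULL; i += 2) {
        if(strcmp(paramlist[i], param) == 0)
            return paramlist[i + 1];
    }
    return NULL;
}
```

Returning "" for a present-but-empty parameter lets callers distinguish "set with no value" from "not set" by comparing against NULL.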
The canonical code for non-destructive walking of a Sequence is as follows.
for(i=0;i<oclistlength(list);i++) {
    T* element = (T*)oclistget(list,i);
    ...
}
OCbytes provides two ways to access its internal buffer of characters.
One is "ocbytescontents()", which returns a direct pointer to the buffer,
and the other is "ocbytesdup()", which returns a malloc'd, null-terminated
copy of the contents.
Multi-Dimensional Array Handling
Within a data packet, the DAP protocol "linearizes" multi-dimensional
arrays into a single dimension. The rule for converting a multi-dimensional
array to a single dimension is as follows. Suppose we have the DDS field
Int F[2][5][3];
There are obviously a total of 2 X 5 X 3 = 30 integers in F.
Thus, these three dimensions will be reduced to a single dimension of size 30.
A particular point in the three dimensions, say [x][y][z], is reduced to
a number in the range 0..29 by computing
((x*5)+y)*3+z.
The corresponding general C code is as follows.
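#include <stddef.h>

/* Map the indices of an element of a rank-dimensional array to its offset
   in the linearized (row-major) form used in a DAP data packet. */
size_t
dimmap(int rank, size_t* indices, size_t* sizes)
{
    int i;
    size_t count = 0;
    for(i=0;i<rank;i++) {
        count *= sizes[i];   /* scale the offset so far by this dimension's size */
        count += indices[i]; /* then add the index within this dimension */
    }
    return count;
}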
In this code, the indices variable corresponds to x, y, and z.
The sizes variable corresponds to 2, 5, and 3.
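The linearization can be checked, and inverted, on the example F[2][5][3]. This minimal sketch restates dimmap with const parameters and adds a hypothetical inverse, `dimunmap`, which is not part of oc; it recovers the indices by peeling off the fastest-varying (last) dimension first with % and /.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major linearization, equivalent to dimmap. */
static size_t dimmap(int rank, const size_t* indices, const size_t* sizes)
{
    size_t count = 0;
    int i;
    for(i = 0; i < rank; i++)
        count = count * sizes[i] + indices[i];
    return count;
}

/* Inverse of dimmap: recover per-dimension indices from a linear offset,
   starting with the last (fastest-varying) dimension. */
static void dimunmap(int rank, size_t offset, const size_t* sizes, size_t* indices)
{
    int i;
    for(i = rank - 1; i >= 0; i--) {
        indices[i] = offset % sizes[i];
        offset /= sizes[i];
    }
}
```

For the last element F[1][4][2], dimmap gives ((1*5)+4)*3+2 = 29, the top of the range 0..29, and dimunmap(3, 29, ...) yields 1, 4, 2 again.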
Change Log
Copyright
Copyright 2009, UCAR/Unidata and OPeNDAP, Inc.