
Work notes
==========

These are my notes on the evolution of rc. I used to keep these in separate files on my development
machine, but it makes more sense to include them here. Warning: geeky stuff ahead.

Unlike in most parsers, the unary minus operator is currently not part of the 'data' rules but of
'integer' and 'float'. When expression support is more complete, it may make more sense to put
this in 'data'. However, 'integer' is used in other places as well, and these places allow
negative numbers too. Maybe we can replace the calls to 'integer' with 'data' and restrict the data
to integers only (perhaps by always trying to cast it). Then you can do things like
`resource(10 + 1) 123;`. But maybe that goes too far.
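The "always try to cast it" idea could look roughly like the following minimal C++ sketch. It
assumes a hypothetical `Data` variant (integer, float, or string) standing in for whatever the
'data' rule produces; the names `Data` and `cast_to_integer` are illustrative only and do not come
from rc. Integers pass through, floats are accepted only when they have no fractional part, and
everything else is rejected.

```cpp
#include <cmath>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-in for an evaluated 'data' value.
using Data = std::variant<std::int64_t, double, std::string>;

// Try to turn an arbitrary 'data' value into an integer (e.g. a resource ID).
std::optional<std::int64_t> cast_to_integer(const Data& d) {
    if (const auto* i = std::get_if<std::int64_t>(&d)) {
        return *i;  // already an integer
    }
    if (const auto* f = std::get_if<double>(&d)) {
        double whole = 0.0;
        // Accept only whole numbers within +/-2^53, where every integer is
        // exactly representable; NaN and infinity fail these checks.
        if (std::modf(*f, &whole) == 0.0 &&
            whole >= -9007199254740992.0 && whole <= 9007199254740992.0) {
            return static_cast<std::int64_t>(whole);
        }
    }
    return std::nullopt;  // strings, fractional floats, etc.
}
```

With this, an expression such as `10 + 1` would evaluate to the integer 11 and cast cleanly, while
`10.5` or a string would be reported as an error.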

When filling in boolean fields of user-defined types or messages, you can write either
`field = true` or `field = false`. I thought it would be nice if you could also write just
`field` and `!field`. However, that introduced a reduce/reduce conflict in the parser. The name of
the field is an IDENT token, but the 'type' rule of 'data' already accepts an IDENT in the same
position. The parser cannot tell whether the IDENT is the name of the boolean field or the name of
a type. Maybe this can be solved by shuffling the parser rules around a bit.

Support for the built-in types point, rect, and rgb_color is currently hardcoded into the
decompiler. The other built-in types (app_flags, mini_icon, etc.) are not supported at all.
It would be better to use the type symbol table for this as well. Then the decompiler could also
support user-defined types (although these type definitions must somehow be provided to the
decompiler). This is advanced stuff that probably no one will ever use.
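A table-driven decompiler might work roughly like this minimal C++ sketch. It assumes a
hypothetical `TypeTable` that maps a type name to an ordered list of field kinds; rc's real type
symbol table is not shown here. For reference, rgb_color is four uint8 fields, while point and rect
are two and four floats. Field values are read in host byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

enum class FieldKind { UInt8, Int32, Float };

struct TypeDef {
    std::string name;
    std::vector<FieldKind> fields;  // in declaration order
};

// Hypothetical type symbol table keyed by type name.
using TypeTable = std::map<std::string, TypeDef>;

std::size_t field_size(FieldKind k) {
    switch (k) {
        case FieldKind::UInt8: return 1;
        case FieldKind::Int32: return 4;
        case FieldKind::Float: return 4;
    }
    return 0;
}

// Render raw resource bytes using the field layout from the table.
// Returns std::nullopt for an unknown type or a size mismatch.
std::optional<std::string> decompile(const TypeTable& table, const std::string& type,
                                     const std::vector<std::uint8_t>& bytes) {
    auto it = table.find(type);
    if (it == table.end()) return std::nullopt;

    std::size_t total = 0;
    for (FieldKind k : it->second.fields) total += field_size(k);
    if (total != bytes.size()) return std::nullopt;

    std::ostringstream out;
    out << type << " { ";
    std::size_t offset = 0;
    for (std::size_t i = 0; i < it->second.fields.size(); ++i) {
        FieldKind k = it->second.fields[i];
        if (i > 0) out << ", ";
        switch (k) {
            case FieldKind::UInt8:
                out << static_cast<unsigned>(bytes[offset]);
                break;
            case FieldKind::Int32: {
                std::int32_t v;
                std::memcpy(&v, bytes.data() + offset, sizeof v);
                out << v;
                break;
            }
            case FieldKind::Float: {
                float v;
                std::memcpy(&v, bytes.data() + offset, sizeof v);
                out << v;
                break;
            }
        }
        offset += field_size(k);
    }
    out << " }";
    return out.str();
}
```

Adding a new built-in or user-defined type then means adding a table entry rather than new
decompiler code.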

The built-in types are added to the symbol table "by hand"; you can see this near the bottom of
'parser.y'. This is a bit cumbersome, so I have devised an alternative: we put the built-in type
definitions in an rdef script and install it in a "standard include dir", for example
~/config/include/rc. Before it compiles the user's rdef files, the compiler first loads all
scripts from that standard folder. (This also allows us to use these rdef files for decompiling,
and users can simply install their own. See above.)
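Building the input list for that scheme could look like this minimal C++17 sketch. The function
names `list_std_dir` and `order_inputs` are hypothetical. It assumes the standard scripts use the
.rdef extension and are loaded in sorted order, and that a missing standard dir is silently
ignored.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// List the regular files in the standard include dir. A missing or
// unreadable dir is not an error: the compiler simply loads nothing from it.
std::vector<std::string> list_std_dir(const fs::path& dir) {
    std::vector<std::string> files;
    std::error_code ec;
    for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
        std::error_code file_ec;
        if (it->is_regular_file(file_ec)) {
            files.push_back(it->path().string());
        }
    }
    return files;
}

// Keep only *.rdef scripts from the standard dir, sort them so the load
// order is deterministic, then append the user's files in command-line order.
std::vector<std::string> order_inputs(std::vector<std::string> std_files,
                                      const std::vector<std::string>& user_files) {
    std_files.erase(std::remove_if(std_files.begin(), std_files.end(),
                                   [](const std::string& f) {
                                       return fs::path(f).extension() != ".rdef";
                                   }),
                    std_files.end());
    std::sort(std_files.begin(), std_files.end());
    std_files.insert(std_files.end(), user_files.begin(), user_files.end());
    return std_files;
}
```

Because the standard scripts come first, the user's own rdef files can use every type they define.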

In "auto names" mode, the decompiler currently does not use the enum symbol table. So if two
resources have the same name and that name is a valid C/C++ identifier, the decompiler adds
two conflicting symbols to the enum statement. This can also happen when multiple input files
have conflicting resource IDs.
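Consulting a symbol table before emitting a name could be as simple as this minimal C++ sketch.
The `used` set stands in for the enum symbol table, and the numeric-suffix scheme (foo, foo_2,
foo_3, ...) is an assumption made for illustration, not something rc does today.

```cpp
#include <cctype>
#include <set>
#include <string>

// True if s looks like a C/C++ identifier: letters, digits, underscores,
// not starting with a digit. Keywords are not checked in this sketch.
bool is_identifier(const std::string& s) {
    if (s.empty() || std::isdigit(static_cast<unsigned char>(s[0]))) return false;
    for (char c : s) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') return false;
    }
    return true;
}

// Pick an enum symbol for a resource name, appending a numeric suffix on
// clashes. Returns an empty string if the name can't be used as a symbol.
std::string unique_symbol(const std::string& name, std::set<std::string>& used) {
    if (!is_identifier(name)) return "";
    std::string candidate = name;
    for (int n = 2; used.count(candidate) != 0; ++n) {
        candidate = name + "_" + std::to_string(n);
    }
    used.insert(candidate);  // record it so later resources see the clash
    return candidate;
}
```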

When you decompile certain apps (BeMail, Slayer) and then compile the resulting rdef files again,
the archive and message fields in the new .rsrc file are larger than the original's. I think this
is because rc doesn't add the message fields as "fixedSize" (see the BMessage docs in the Be Book).
This doesn't really hurt, though, since the BMessage is unflattened properly regardless.

Right now, archives are treated as messages. Maybe we should give them their own type,
B_ARCHIVED_OBJECT (type code 'ARCV').
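Giving archives their own type mostly means emitting a different four-character type code. This
minimal C++ sketch spells the codes out explicitly, with the first character in the most
significant byte (which is how GCC evaluates multi-character literals such as 'ARCV'). The
`MessageKind` enum and `type_code_for` are hypothetical; 'MSGG' is the B_MESSAGE_TYPE code.

```cpp
#include <cstdint>

// Pack four characters into a type code, first character in the high byte.
constexpr std::uint32_t four_cc(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
           static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

constexpr std::uint32_t kMessageType = four_cc('M', 'S', 'G', 'G');         // B_MESSAGE_TYPE
constexpr std::uint32_t kArchivedObjectType = four_cc('A', 'R', 'C', 'V');  // B_ARCHIVED_OBJECT

// Hypothetical: the compiler would track whether a message() block is really an archive.
enum class MessageKind { Message, Archive };

constexpr std::uint32_t type_code_for(MessageKind kind) {
    return kind == MessageKind::Archive ? kArchivedObjectType : kMessageType;
}
```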

New options, stolen from other resource compilers (rez and beres):

-D symbol[=value], --define symbol[=value]
    Set the value of symbol to value (or 1 if no value is supplied).

--no-names
    Do not write any names to the resource file.

-h
    Write the resources as C structs to a header file.

-d
    Dump the resources to stdout.
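Handling the argument of -D / --define could look like this minimal C++ sketch. The function name
`parse_define` and the string-to-string `defines` map are assumptions; a later definition of the
same symbol simply overrides an earlier one.

```cpp
#include <map>
#include <string>

// Parse "NAME=VALUE" or a bare "NAME" (which defines NAME as "1").
// Returns false if the symbol name is empty.
bool parse_define(const std::string& arg, std::map<std::string, std::string>& defines) {
    std::string::size_type eq = arg.find('=');
    std::string name = arg.substr(0, eq);  // whole string if there is no '='
    if (name.empty()) return false;
    defines[name] = (eq == std::string::npos) ? "1" : arg.substr(eq + 1);
    return true;
}
```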

Attributes. It would be nice to have a tool that can take text-based descriptions of attributes and
write out an attribute file. Of course, rc is the ideal candidate for this, since it already does
the same for resources. However, resources and attributes are not the same. Attributes are
name/data pairs; each name must be unique, and attributes don't have IDs. They do have a type code,
but it isn't used to identify the attribute. It is probably best to add a new kind of definition:
attribute(). Should this statement allow only simple data, or must attributes be able to handle
flattened messages, arrays, etc. too? A source file should contain either resource() statements or
attribute() statements, not both.
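The rules above (unique names, type code not used as a key, no mixing of resource() and
attribute() in one file) could be enforced with something like this minimal C++ sketch. The class
`SourceFileDefs` and its methods are hypothetical; storing the data as raw bytes leaves open the
question of which kinds of data attribute() should accept.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the definitions found in one source file.
class SourceFileDefs {
public:
    // Call for every resource() statement; fails if attribute() was used.
    bool add_resource_statement() { return claim(Kind::Resources); }

    // Call for every attribute() statement. The name is the key; the type
    // code is stored but plays no part in identifying the attribute.
    // Fails on a duplicate name or if resource() was already used.
    bool add_attribute(const std::string& name, std::uint32_t type,
                       std::vector<std::uint8_t> data) {
        if (!claim(Kind::Attributes)) return false;
        return attributes_.emplace(name, Attribute{type, std::move(data)}).second;
    }

    std::size_t attribute_count() const { return attributes_.size(); }

private:
    struct Attribute {
        std::uint32_t type;
        std::vector<std::uint8_t> data;
    };
    enum class Kind { Unknown, Resources, Attributes };

    // The first statement decides what kind of file this is.
    bool claim(Kind kind) {
        if (kind_ == Kind::Unknown) kind_ = kind;
        return kind_ == kind;
    }

    Kind kind_ = Kind::Unknown;
    std::map<std::string, Attribute> attributes_;
};
```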

User-defined symbolic constants. To keep things simple, adding a #define keyword would suffice,
although this always creates global symbols, so there is a chance of name collisions. In addition
(or instead), we can extend user-defined types to have their own (local) defines too. If we use the
#define keyword, we should infer the type of the constant from the data (100 is an integer, 10.5 is
a float, etc.). This is necessary because we don't have a separate preprocessor step like C/C++
does; that is why our symbolic constants are typed.