YOSH Syntax Concepts
The YOSH design is still evolving: frankly, there are some fundamental problems with it I just haven't solved yet. I am trying to design syntax for a still-developing feature set, incorporate useful or familiar ideas from traditional shells, and resolve all the features against each other so the result will be a consistent, reliable, and secure syntax. I am more attached to some of the ideas I'm trying to incorporate than to others, though: those are the ones that will almost certainly make it into the final cut.
Traditional POSIX-style Shell Constructs
The basic concept of YOSH is to add new functionality to the classic UNIX shell methodology: similar to what Microsoft did with PowerShell, except more suited to UNIX sensibilities. I want to use these concepts so the functionality will be familiar and comfortable to people used to UNIX shells - myself included. However, the way they work in YOSH may be very different. The classic concepts I want to use include:
- Basic Redirection Syntax: Redirection to or from a file (>, <), through another process (|) and so on.
- Command Sequencing and Job Control: Specifically, using semicolons to separate commands on a line and ampersands to run a job in the background.
- GNU-style Command Line Arguments: Short options prefixed by a single dash, long options prefixed by a double-dash, etc.
- General Command Format: In classic shells commands and arguments are simply separated by spaces. I want to embrace this style and apply it to YOSH's objects.
- Variable Syntax: The dollar-sign notation is a useful way to distinguish environment variables from other values.
- Array Behavior: In shells that have support for arrays, they are usually implemented such that an array of one item is equivalent to just that one item. Because YOSH will make a datatype's form explicitly known, I feel like this will be much more convenient than mandating that any piece of code that might potentially yield multiple values must always yield an array...
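To illustrate the array behavior described in the last item, here is a minimal Python sketch. It is not YOSH syntax (which isn't settled): Python lists stand in for shell arrays, and the helper names as_list, collapse, and same_value are hypothetical.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """View any value as a list: a lone item becomes a one-item list."""
    if isinstance(value, list):
        return value
    return [value]


def collapse(value: Any) -> Any:
    """Collapse a one-item list to the item itself; leave anything else alone."""
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


def same_value(a: Any, b: Any) -> bool:
    """Compare two values under the 'array of one equals the item' rule."""
    return as_list(collapse(a)) == as_list(collapse(b))


print(same_value(["foo.txt"], "foo.txt"))             # one-item array vs. the item
print(same_value(["foo.txt", "bar.txt"], "foo.txt"))  # two items vs. one
```

Under this rule, code that might yield several values can hand back a bare item when there happens to be only one, and the consumer can still treat the result uniformly.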
Other Programming Language Concepts
- C-style Command Blocks: I feel like BASH's structured programming constructs are horrendous. I much prefer just using some syntax or keywords to enclose a group of statements into a block, rather than having to use directives like "done" or "elif" or "fi"...
- Smalltalk-style Object Interactions: I feel the Smalltalk concept of sending messages to an object could be a good fit for putting objects into a POSIX-style shell. The YOSH version may work a bit differently, however.
- Evaluation Rules for a Command Argument are Similar to the Evaluation Rules for the Command Name: BASH treats your first command token as a command name and (generally) all the rest as strings. If you refer to ./foo.txt at the beginning of a command, I think it should mean the same thing as ./foo.txt anywhere else in your command.
- Parentheses for Grouping Expressions: BASH, etc. provide similar facilities through slightly different syntax: I feel it'd be simplest to just use plain parens, if possible.
- Expression Evaluation with Infix Operators: A shell with parens and traditional shell-style command structure would almost look like LISP when you start embedding expressions in other expressions... LISP holds a special place in my heart but it's not what I'm going for here. I want to be able to put mathematical expressions in the middle of a command and have them interact with environment variables, commands, and objects. BASH can do this using the $(()) syntax, of course, but I want a solution that's more natural.
- Prolog/Haskell-style Positional Solving: In most languages, a function that might normally be expressed as F(x)→y can only be evaluated to get the value of y for a given x. In Prolog, the same expression can also be evaluated to obtain the value of x for a given value of y. This isn't something I expect to be able to work universally, but I'd like to implement it as a syntax form that objects could potentially support. Haskell doesn't support this kind of evaluation, but its pattern matching lets you strip an applied constructor off a value to get back at what it was applied to. Both ideas appeal to me: I'd like to be able to express things similarly whether I'm testing a condition or asking the shell to make the condition true.
- Lazy Evaluation (sort-of): Haskell works on the idea of "Lazy Evaluation": a function generating a sequence of values is normally evaluated only far enough to generate the data that's being consumed. This idea is useful for expressing certain types of problems: it allows the user to express their sequence-generating functions as though they simply run forever, when in fact they'll only be run to the extent that's needed. Traditional UNIX shells have something similar: it's not true lazy evaluation, but it can work similarly. For instance, run primes 2 | head -5, then run ps to see if primes is still running. It's not, because head terminated: that closed the reading end of the pipe, so the next time primes tried to write it received SIGPIPE and stopped. primes doesn't strictly limit its output to what's consumed by head in this case, but its output will block if the pipe buffer fills, and it'll stop when nobody wants its output any more. This idea affects the design of YOSH data exchange formats: specifically, it must be a streaming format, and preferably one that can be versatile about what's written and when: the goal is to be able to treat a large class of problems as though shell evaluation were truly lazy.
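The lazy evaluation idea in the last item can be sketched with a Python generator. This is only an analogy, not YOSH and not how a pipe works: a generator is evaluated strictly on demand, whereas primes in a pipeline may run ahead until the pipe buffer fills. The trial-division primality test is just the simplest thing that works.

```python
from itertools import count, islice
from typing import Iterator


def primes(start: int = 2) -> Iterator[int]:
    """Yield primes >= start forever; the consumer decides how many to take."""
    for n in count(max(start, 2)):
        # Trial division by every candidate divisor up to sqrt(n).
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n


# Roughly the analogue of `primes 2 | head -5`: only five values are computed,
# even though primes() is written as though it runs forever.
first_five = list(islice(primes(2), 5))
print(first_five)  # [2, 3, 5, 7, 11]
```

The producer is written as an endless loop and the consumer alone determines how much work gets done, which is the property a streaming YOSH data format would need to preserve.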
Cool Stuff
- Formal Notation Characters as Operators: Basically I want to incorporate some symbols from formal mathematical notation in the syntax of the language. I understand the problems of taking this too far: people won't understand it. But some ideas I think are simple enough that they can be picked up pretty quickly. For instance, Haskell uses <- to represent ∈, the set membership relation. Similarly I'd like to be able to use -> to represent →, to say something "yields" some value... There are other such concepts I think would be nice to use - and I would also like to be able to display the formal symbols when on a terminal with that capability...
YOSH Syntax Characters
- Type Coercion Operator ":": I have chosen the colon as the "Type Coercion Operator". This is similar to the use of the colon operator in Pascal, except that due to YOSH's more flexible type system it is more like a type coercion than a type declaration. I sometimes also call this operator a "constructor application operator" as the form is roughly equivalent to calling the constructor (the right-hand side of the expression) with the data object (the left-hand side of the expression) as an argument.
- Referencing/Dereferencing Operator "@": Currently I'm thinking of using the at-sign to represent ideas like references or symbolic links on the filesystem. I haven't quite worked out the details of its use, however.
- Short and Long Symbol Literal Prefixes "-", "--": Command-line arguments in YOSH are handled differently than in traditional shells. Commands are expected to provide enough information about the available command-line options to allow for tab-completion, for instance. So the basic idea is that command-line arguments are actually symbols, like the symbol type in Lisp, not strings - and the use of the symbol literal syntax helps inform YOSH of what stuff it should leave alone. The two forms behave similarly to GNU calling argument formats: a dash followed by multiple characters is actually a sequence of symbols expressed in short form, while a double-dash followed by a symbol is a long-form symbol.
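The short and long symbol forms in the last item could be read roughly as follows. This is a Python sketch under assumptions, not the YOSH implementation: the Symbol type and parse_token function are hypothetical names, and tokens that are not symbol literals are simply passed through as strings.

```python
from typing import List, NamedTuple, Union


class Symbol(NamedTuple):
    """A symbol value, as distinct from a plain string (hypothetical type)."""
    name: str


def parse_token(token: str) -> List[Union[Symbol, str]]:
    """Turn one command-line token into symbols, or pass it through unchanged."""
    if token.startswith("--") and len(token) > 2:
        # Long form: everything after the double dash is a single symbol.
        return [Symbol(token[2:])]
    if token.startswith("-") and not token.startswith("--") and len(token) > 1:
        # Short form: each character after the dash is its own symbol.
        return [Symbol(ch) for ch in token[1:]]
    # Anything else (including a bare "-" or "--") is not a symbol literal.
    return [token]


print(parse_token("-la"))        # two short symbols: l and a
print(parse_token("--all"))      # one long symbol: all
print(parse_token("./foo.txt"))  # not a symbol, left alone
```

Because the arguments arrive as symbols rather than strings, a command that declares which symbols it accepts gives the shell what it needs for tab-completion.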
Internationalization Concerns
- Dollar-sign Notation, etc.: Ideally I'd like YOSH to work in foreign character sets: I don't know if foreign character sets include the dollar-sign or even all the other characters that are common in shell syntax. I do know that in the C programming language, for instance, there are alternate spellings for some operators (trigraphs, digraphs, and the iso646.h names such as "and", "or", and "not") because some character sets didn't have the required characters...
Characters I Won't Use as Syntax
- Dot: Many languages use the dot in order to specify that a field of a struct or class is being accessed. However, the dot is too common in filenames: treating it specially would cause problems with handling filenames.
- Backslash: It would be more precise to say that I want to treat the backslash as equivalent to the slash. I don't like that Windows still uses backslash as a directory separator - basically just because I prefer slashes, and because it frustrates me when I try to use tools that expect backslashes together with tools that expect slashes and possibly treat backslashes as escape characters. This happens, for instance, when I use win32 xemacs in order to run a compilation command through BASH: xemacs tab completion generates backslashes, but BASH treats backslashes as escape characters... I also feel the distinction between slashes and backslashes will tend to confuse people who are used to one form or the other: so I want to at least avoid relying on the backslash.
Unfortunate Compromises
Based on the current work-in-progress design, there are some things I know about the shell syntax which aren't really ideal: what can you do, right?
- Files (not on PATH) must be referenced by an explicit relative or absolute path: In YOSH the distinction between data files and executables is slightly blurred: a data file may be invoked the same way as an executable, as an evaluation. If such a data file isn't on the PATH, it must be referenced with an explicit relative or absolute path: so far so good, that's typical. But also, executables may be evaluated as arguments to another command, and executables from the PATH may be referenced by subdirectory relative to the PATH - that, and other necessary disambiguation cases, means that a file reference must be explicitly identified as being relative to the filesystem root or the current directory in all cases. In other words, you can't say "cat file.txt", it must be "cat ./file.txt". I'm hoping this will be acceptable.
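The rule above amounts to a simple test on each token. Here is a minimal Python sketch of it, with assumptions labeled: the function names are hypothetical, and counting "../" (and bare "." or "..") as explicit is my guess, since the text only mentions the filesystem root and the current directory.

```python
def is_explicit_file_reference(token: str) -> bool:
    """True if the token names a file relative to the root or current directory."""
    # Assumption: "../", "." and ".." count as explicit, alongside "/" and "./".
    return token.startswith(("/", "./", "../")) or token in (".", "..")


def classify(token: str) -> str:
    """Say how a bare token would be resolved under the rule described above."""
    if is_explicit_file_reference(token):
        return "file"
    # Everything else is looked up relative to the PATH, subdirectories included.
    return "path-lookup"


print(classify("./file.txt"))  # file
print(classify("file.txt"))    # path-lookup
print(classify("net/tool"))    # path-lookup (subdirectory relative to the PATH)
```

This is why "cat file.txt" can't mean the file in the current directory: without the leading "./", the token is indistinguishable from something to be found on the PATH.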
George Caswell
Last modified: Thu Nov 15 20:49:13 EST 2007