Rich Datatypes Without Datatype Fascism

Introduction

The basic idea of the YOSH shell is to introduce a richer set of datatypes to the command shell and to the programs that run in it. Some people I've discussed the idea with have assumed that this means programs operating in this environment must communicate in a common data format mandated by the shell. In a sense this is true, but the key idea is that programs must use a common format to identify how their data is encoded; they do not need to change how they actually encode that data.

Conformity vs. Inspectability

Basically, there are two ways to make different tools work together automatically. The first is to make them all use the same datatype; the second is to make them state explicitly which datatype they're communicating in, so that their peers know how to translate the data if necessary.
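To make the second approach concrete, here is a minimal Python sketch of a receiving tool that accepts data as-is when the declared type matches what it expects, and otherwise looks up a translator. The MIME-style type names (such as `text/csv`), the `CONVERTERS` registry, and the `receive` function are hypothetical illustrations, not part of any YOSH interface.

```python
import csv
import io
import json
from typing import Callable, Dict, Tuple

# Hypothetical registry of translators, keyed by (declared type, wanted type).
Converter = Callable[[bytes], bytes]
CONVERTERS: Dict[Tuple[str, str], Converter] = {}


def converter(src: str, dst: str):
    """Register a function that translates data declared as `src` into `dst`."""
    def register(fn: Converter) -> Converter:
        CONVERTERS[(src, dst)] = fn
        return fn
    return register


@converter("text/csv", "application/json")
def csv_to_json(data: bytes) -> bytes:
    # Each CSV row becomes a JSON object keyed by the header row.
    rows = list(csv.DictReader(io.StringIO(data.decode("utf-8"))))
    return json.dumps(rows).encode("utf-8")


def receive(declared_type: str, wanted_type: str, data: bytes) -> bytes:
    """Pass data through if it already matches; otherwise translate it."""
    if declared_type == wanted_type:
        return data
    fn = CONVERTERS.get((declared_type, wanted_type))
    if fn is None:
        raise TypeError(f"no translation from {declared_type} to {wanted_type}")
    return fn(data)
```

Neither tool had to give up its own format: the producer keeps writing CSV, the consumer keeps reading JSON, and the explicit type declaration is what lets the translation happen automatically.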

Conformity is generally a problem because people using old tools don't want to rewrite them to use a different data format, and people writing new tools want to choose (or design) the format that's best for the job, rather than getting roped into some suboptimal format just because it's the local standard.

What interests me more than conformity is inspectability: the power to know just enough about the formatting of a piece of data to know what to do with it, if it's not in the format you expect. This accomplishes a number of useful things:

Of course, this can't be accomplished without a certain level of conformity. Processes may communicate the data itself however they like, but they must agree on a standard format for identifying the data structure they've used.
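A minimal sketch of that split, assuming a hypothetical one-line `YOSH-Type:` header: the header is the only part the processes must agree on, and the payload after it passes through byte for byte in whatever format the producer chose. The header layout and the `describe` and `read_description` names are illustrative assumptions only.

```python
from typing import Tuple

# Hypothetical header: one ASCII line naming the payload's type, followed by
# the payload bytes exactly as the producing program wrote them.
HEADER_PREFIX = b"YOSH-Type: "


def describe(data_type: str, payload: bytes) -> bytes:
    """Prefix an unmodified payload with a line identifying its format."""
    if "\n" in data_type:
        raise ValueError("type name must fit on one line")
    return HEADER_PREFIX + data_type.encode("ascii") + b"\n" + payload


def read_description(stream: bytes) -> Tuple[str, bytes]:
    """Split a described stream into (type name, untouched payload)."""
    if not stream.startswith(HEADER_PREFIX):
        raise ValueError("stream carries no type description")
    header, sep, payload = stream.partition(b"\n")
    if not sep:
        raise ValueError("unterminated type header")
    return header[len(HEADER_PREFIX):].decode("ascii"), payload
```

Only the first newline is consumed, so a payload containing its own newlines (CSV, for example) comes back unchanged.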

YOSH approaches to providing inspectability

YOSH-native communication format

Programs targeted at the YOSH environment may, of course, be written specifically for it, though I know there won't be many programs written this way for a long time, if ever. Still, this capability is an important building block for the other approaches. The native format will be designed so that, as far as possible, it is never misidentified as another file format, and vice versa. It should also be designed so that data of some other format, once wrapped in it, is easy to extract again.
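The Python sketch below illustrates both properties under stated assumptions: a hypothetical signature `YOSH_MAGIC` that is checked against the leading signatures of a few well-known formats (ELF, PNG, gzip, PDF), and a wrapper that stores a length-prefixed type name followed by the foreign payload verbatim, so extraction is a simple slice. The signature bytes and header layout are invented for illustration, and avoiding collisions with a small table does not by itself guarantee that no other format shares the prefix.

```python
import struct
from typing import Optional, Tuple

# Leading signatures of some well-known existing formats.
KNOWN_SIGNATURES = {
    b"\x7fELF": "ELF executable",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\x1f\x8b": "gzip",
    b"%PDF-": "PDF document",
}

# Hypothetical YOSH signature. None of the signatures above begins with a NUL
# byte, and a leading NUL keeps it from being mistaken for plain text.
YOSH_MAGIC = b"\x00YOSH\r\n\x1a"


def sniff(data: bytes) -> Optional[str]:
    """Name the format suggested by the data's leading bytes, if known."""
    if data.startswith(YOSH_MAGIC):
        return "YOSH native"
    for sig, name in KNOWN_SIGNATURES.items():
        if data.startswith(sig):
            return name
    return None


def wrap(type_name: str, payload: bytes) -> bytes:
    """Magic, then a 2-byte big-endian name length, the name, and the payload verbatim."""
    name = type_name.encode("utf-8")
    return YOSH_MAGIC + struct.pack(">H", len(name)) + name + payload


def unwrap(data: bytes) -> Tuple[str, bytes]:
    """Recover the type name and the original payload from wrapped data."""
    if not data.startswith(YOSH_MAGIC):
        raise ValueError("not YOSH-wrapped data")
    offset = len(YOSH_MAGIC)
    (name_len,) = struct.unpack_from(">H", data, offset)
    offset += 2
    name = data[offset:offset + name_len].decode("utf-8")
    return name, data[offset + name_len:]
```

Because the wrapped payload is stored unchanged, `unwrap` hands back bytes that `sniff` still recognizes as their original format.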

Type Identification Channels

YOSH will provide ways to describe the behavior of a program statically. This information about the program's behavior and the data formats it uses (which may include calling arguments, input and output formats, etc.) could be attached to the executable itself through extended file attributes, supplied through a set of centrally located rules or an interface script on the path, or possibly embedded in the ELF binary itself.
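As a rough sketch of how a shell might consult these channels in order, the Python function below first reads an extended attribute from the executable and then falls back to a table of central rules. The attribute name `user.yosh.output-type` and the rule entries are hypothetical; `os.getxattr` is a real call but is only available on Linux, so the sketch skips that channel elsewhere. Interface scripts and ELF-embedded descriptions are left out.

```python
import os
from typing import Dict, Optional

# Hypothetical extended-attribute name in the Linux "user." namespace.
XATTR_NAME = "user.yosh.output-type"

# Hypothetical stand-in for a set of centrally located rules, keyed by program name.
CENTRAL_RULES: Dict[str, str] = {
    "ls": "text/plain",
    "wc": "text/plain",
}


def declared_output_type(executable: str) -> Optional[str]:
    """Look up a program's declared output type, trying each channel in turn."""
    # 1. Extended attribute on the executable (os.getxattr exists only on Linux).
    getxattr = getattr(os, "getxattr", None)
    if getxattr is not None:
        try:
            return getxattr(executable, XATTR_NAME).decode("utf-8")
        except OSError:
            pass  # attribute absent, file missing, or no xattr support
    # 2. Central rules, matched on the program's base name.
    return CENTRAL_RULES.get(os.path.basename(executable))
```

Ordering the channels this way lets a description attached to a specific binary override the general rule for programs of that name; a real design might choose a different precedence.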

Default Assumptions

For processes that are not written for or adapted to the YOSH environment, some baseline assumptions will still be available to bind these programs to YOSH in useful ways:


George Caswell
Last modified: Thu Nov 15 20:36:14 EST 2007