Saturday, 17 March 2012

"Fabric theory": talking about cultural and computational diversity with the same words

In recent months, I have been pondering a lot about certain similarities between human languages, cultures, programming languages and computing platforms: they are all abstract constructs capable of giving a unique form or flavor to anything that is made with them or stems from them. Different human languages encourage different types of ideas, ways of expression, metaphors and poetry while discouraging others. Different programming languages encourage different programming paradigms, design philosophies and algorithms while discouraging others. The different characteristics of different computing platforms, musical instruments, human cultures, ideologies, religions or subcultural groups all similarly lead to specific "built-in" preferences in expression.

I'm sure this sounds quite meta, vague or superficial when explained this way, but I'm convinced that the similarities are far more profound than most people assume. In order to bring these concepts together, I've chosen to use the English word "fabric" to refer to the set of form-giving characteristics of languages, computers or just about anything. I've picked this word partly because of its dual meaning, i.e. you can consider a fabric a separate, underlying, form-giving framework just as well as an actual material from which the different artifacts are made. You may suggest a better word if you find one.

Fabrics

The fabric of a human language stems (primarily) from its grammar and vocabulary. The principle of lingustic relativity, also known as the Sapir-Whorf hypothesis, suggests that language defines a lot about how our ways of thinking end up being like, and there is even a bunch of experimental support for this idea. The stronger, classical version of the hypothesis, stating that languages build hard barriers that actually restrict what kind of ideas are possible, is very probably false, however. I believe that all human languages are "human-complete", i.e. they are all able to express the same complete range of human thoughts, although the expression may become very cumbersome in some cases. In most Indo-European languages, for example, it is very difficult to talk about people without mentioning their real or assumed genders all the time, and it may be very challenging to communicate mathematical ideas in an Aboriginal language that has a very rudimentary number system.

Many programmers seem to believe that the Sapir-Whorf hypothesis also works with programming languages. Edsger Dijkstra, for example, was definitely quite Whorfian when stating that teaching BASIC programming to students made them "mentally mutilated beyond hope of regeneration". The fabric of a programming language stems from its abstract structure, not much unlike those of natural languages, although a major difference is that the fabrics of programming languages tend to be much "purer" and more clear-cut, as they are typically geared towards specific application areas, computation paradigms and software development philosophies.

Beyond programming languages there are computer platforms. In the context of audiovisual computer art, the fabric of a hardware platform stems both from its "general-purpose" computational capabilities and the characteristics of its special-purpose circuitry, especially the video and sound hardware. The effects of the fabric tend to be the clearest in the most restricted platforms, such as 8-bit home computers and video game consoles. The different fabrics ("limitations") of different platforms are something that demoscene artists have traditionally been concerned about. Nowadays, there is even an academic discipline with an expanding series of books, "Platform Studies", that asks how video games and other forms of computer art have been shaped by the fabrics of the platforms they've been made for.

The fabric of a human culture stems from a wide memetic mess including things like taboos, traditions, codes of conduct, and, of course, language. In modern societies, a lot stems from bureaucratic, economic and regulatory mechanisms. Behavior-shaping mechanisms are also very prominent in things like video games, user interfaces and interactive websites, where they form a major part of the fabric. The fabric of a musical instrument stems partly from its user interface and partly from its different acoustic ranges and other "limitations". It is indeed possible to extend the "fabric theory" to quite a wide variety of concepts, even though it may get a little bit far-fetched at times.

Noticing one's own box

In many cases, a fabric can become transparent or even invisible. Those who only speak one language can find it difficult to think beyond its fabric. Likewise, those who only know about one culture, one worldview, one programming language, one technique for a specific task or one just-about-anything need some considerable effort to even notice the fabric, let alone expand their horizons beyond it. History shows that this kind of mental poverty leads even some very capable minds into quite disastrous thoughts, ranging from general narrow-mindedness and false sense of objectivity to straightforward religious dogmatism and racism.

In the world of computing, difficult-to-notice fabrics come out as standards, de-facto standards and "best practices". Jaron Lanier warns about "lock-ins", restrictive standards that are difficult to outthink. MIDI, for example, enforces a specific, finite formalization of musical notes, effectively narrowing the expressive range of a lot of music. A major concern risen by "You are not a gadget" is that technological lock-ins of on-line communication (e.g. those prominent in Facebook) may end up trivializing humanity in a way similar to how MIDI trivializes music.

Of course, there's nothing wrong with standards per se. Standards, also including constructs such as lingua francas and social norms, can be very helpful or even vital to humanity. However, when a standard becomes an unquestionable dogma, there's a good chance for something evil to happen. In order to avoid this, we always need individuals who challenge and deconstruct the standards, keeping people aware of the alternatives. Before we can think outside the box, we must first realize that we are in a box in the first place.

Constraints

In order to make a fabric more visible and tangible, it is often useful to introduce artificial constraints to "tighten it up". In a human language, for example, one can adopt a form of constrained writing, such as a type of poetry, to bring up some otherwise-invisible aspects of the linguistic fabric. In normal, everyday prose, words are little more than arbitrary sequences of symbols, but when working under tight constraints, their elementary structures and mutual relationships become important. This is very similar to what happens when programming in a constrained environment: previously irrelevant aspects, such as machine code instruction lengths, suddenly become relevant.

Constrained programming has long traditions in a multitude of hacker subcultures, including the demoscene, where it has obtained a very prominent role. Perhaps the most popular type of constraint in all hacker subcultures in general is the program length constraint, which sets an upper limit to the size of either the source code or the executable. It seems to be a general rule that working with ever smaller program sizes brings the programmer ever closer to the underlying fabric: in larger programs, it is possible to abstract away a lot of it, but under tight constraints, the programmer-artist must learn to avoid abstraction and embrace the fabric the way it is. In the smallest size classes, even such details as the ordering of sound and video registers in the I/O space become form-giving, as seen in the sub-32-byte C-64 demos by 4mat of Ate Bit, for example.

Mind-benders

Sometimes a language or a platform feels tight enough even without any additional constraints. A lot of this feeling is subjective, caused by the inability to express oneself in the previously learned way. When learning a new human language that is completely different to one's mother tongue, one may feel restricted when there's no counterpart for a specific word or grammatical cosntruct. When encountering such a "boundary", the learner needs to rethink the idea in a way that goes around it. This often requires some mind-bending. The same phenomenon can be encountered when learning different programming languages, e.g. learning a declarative language after only knowing imperative ones.

Among both human and programming languages, there are experimental languages that have been deliberately constructed as "mind-benders", having the kind of features and limitations that force the user to rethink a lot of things when trying to express an idea. Among constructed human languages, a good example is Sonja Elen Kisa's minimalistic "Toki Pona" that builds everything from just over 120 basic words. Among programming languages, the mind-bending experiments are called "esoteric programming languages", with the likes of Brainfuck and Befunge often mentioned as examples.

In computer platforms, there's also a lot of variance in "objective tightness". Large amounts of general-purpose computing resources make it possible to accurately emulate smaller computers; that is, a looser fabric may sometimes completely engulf a tighter one. Because of this, the experience of learning a "bigger" platform after a "smaller" one is not usually very mind-bending compared to the opposite direction.

Nothing is neutral

Now, would it be possible to create a language or a computer that would be totally neutral, objective and universal? I don't think so. Trying to create something that lacks fabric is like trying to sculpt thin air, and fabrics are always built from arbitrarities. Whenever something feels neutral, the feeling is usually deceptive.

Popular fabrics are often perceived as neutral, although they are just as arbitrary and biased as the other ones. A tribe that doesn't have very much contact with other tribes typically regards its own language and culture as "the right one" and everyone else as strange and deviant. When several tribes come together, they may choose one language as their supposedly neutral lingua franca, and a sufficiently advanced group of tribes may even construct a simplified, bland mix-up of all of its member languages, an "Esperanto". But even in this case, the language is by no means universal; the fabric that is common between the source languages is still very much present. Even if the language is based on logical principles, i.e. a "Lojban", the chosen set of principles is arbitrary, not to mention all the choices made when implementing those principles.

Powerful computers can usually emulate many less powerful ones, but this does not make them any less arbitrary. On the contrary, modern IBM PC compatibles are full of arbitrary desgin choices stacked on one another, forming a complex spaghetti of historical trials and errors that would make no sense at all if designed from scratch. The modern IBM PC platform therefore has a very prominent fabric, and the main reason why it feels so neutral is its popularity. Another reason is that the other platforms have many a lot of the same design choices, making today's computer platforms much less diverse than what they were a couple of decades ago. For example, how many modern platforms can you name that use something other than RGB as their primary colorspace, or something other than a power of two as their word length?

Diversity is diminishing in many other areas as well. In countries with an astounding diversity, like Papua-New-Guinea, many groups are abandoning their unique native languages and cultures in favor of bigger and more prestigious ones. I see some of that even in my own country, where many young and intelligent people take pride in "thinking in English", erroreusnly assuming that second-language English would be somehow more expressive for them than their mother tongue. In a dystopian vision, the diversity of millennia-old languages and cultures is getting replaced by a global English-language monoculture where all the diversity is subcultural at best.

Conclusion

It indeed seems to be possible to talk about human languages, cultures, programming languages, computing platforms and many other things with similar concepts. These concepts also seem so useful at times that I'm probably going to use them in subsequent articles as well. I also hope that this article, despite its length, gives some food for thought to someone.

Now, go to the world and embrace the mind-bending diversity!

Friday, 30 December 2011

IBNIZ - a hardcore audiovisual virtual machine and an esoteric programming language

Some days ago, I finished the first public version of my audiovisual virtual machine, IBNIZ. I also showed it off on YouTube with the following video:

As demonstrated by the video, IBNIZ (Ideally Bare Numeric Impression giZmo) is a virtual machine and a programming language that generates video and audio from very short strings of code. Technically, it is a two-stack machine somewhat similar to Forth, but with the major execption that the stack is cyclical and also used at an output buffer. Also, as every IBNIZ program is implicitly inside a loop that pushes a set of loop variables on the stack on every cycle, even an empty program outputs something (i.e. a changing gradient as video and a constant sawtooth wave as audio).


How does it work?

To illustrate how IBNIZ works, here's how the program ^xp is executed, step by step:

So, in short: on every loop cycle, the VM pushes the values T, Y and X. The operation ^ XORs the values Y and X and xp pops off the remaining value (T). Thus, the stack gets filled by color values where the Y coordinate is XORed by the X coordinate, resulting in the ill-famous "XOR texture".

The representation in the figure was somewhat simplified, however. In reality, IBNIZ uses 32-bit fixed-point arithmetic where the values for Y and X fall between -1 and +1. IBNIZ also runs the program in two separate contexts with separate stacks and internal registers: the video context and the audio context. To illustrate this, here's how an empty program is executed in the video context:

The colorspace is YUV, with the integer part of the pixel value interpreted as U and V (roughly corresponding to hue) and the fractional part interpreted as Y (brightness). The empty program runs in the so-called T-mode where all the loop variables -- T, Y and X -- are entered in the same word (16 bits of T in the integer part and 8+8 bits of Y and X in the fractional). In the audio context, the same program executes as follows:

Just like in the T-mode of the video context, the VM pushes one word per loop cycle. However, in this case, there is no Y or X; the whole word represents T. Also, when interpreting the stack contents as audio, the integer part is ignored altogether and the fractional part is taken as an unsigned 16-bit PCM value.

Also, in the audio context, T increments in steps of 0000.0040 while the step is only 0000.0001 in the video context. This is because we need to calculate 256x256 pixel values per frame (nearly 4 million pixels if there are 60 frames per second) but suffice with considerably fewer PCM samples. In the current implementation, we calculate 61440 audio samples per second (60*65536/64) which is then downscaled to 44100 Hz.

The scheduling and main-looping logic is the only somewhat complex thing in IBNIZ. All the rest is very elementary, something that can be found as instructions in the x86 architecture or as words in the core Forth vocabulary. Basic arithmetic and stack-shuffling. Memory load and store. An if/then/else structure, two kinds of loop structures and subroutine definition/calling. Also an instruction for retrieving user input from keyboard or pointing device. Everything needs to be built from these basic building blocks. And yes, it is Turing complete, and no, you are not restricted to the rendering order provided by the implicit main loop.

The full instruction set is described in the documentation. Feel free to check it out experiment with IBNIZ on your own!


So, what's the point?

The IBNIZ project started in 2007 with the codename "EDAM" (Extreme-Density Art Machine). My goal was to participate in the esoteric programming language competition at the same year's Alternative Party, but I didn't finish the VM at time. The project therefore fell to the background. Every now and then, I returned to the project for a short while, maybe revising the instruction set a little bit or experimenting with different colorspaces and loop variable formats. There was no great driving force to insppire me to finish the VM until mid-2011 after some quite succesful experiments with very short audiovisual programs. Once some of my musical experiments spawned a trend that eventually even got a name of its own, "bytebeat", I really had to push myself to finally finishing IBNIZ.

The main goal of IBNIZ, from the very beginning, was to provide a new platform for the demoscene. Something without the usual fallbacks of the real-world platforms when writing extremely small demos. No headers, no program size overhead in video/audio access, extremely high code density, enough processing power and preferrably a machine language that is fun to program with. Something that would have the potential to displace MS-DOS as the primary platform for sub-256-byte demoscene productions.

There are also other considerations. One of them is educational: modern computing platforms tend to be mind-bogglingly complex and highly abstracted and lack the immediacy and tangibility of the old-school home computers. I am somewhat concerned that young people whose mindset would have made them great programmers in the eighties find their mindset totally incompatible with today's mainstream technology and therefore get completely driven away from programming. IBNIZ will hopefully be able to serve as an "oldschool-style platform" in a way that is rewarding enough for today's beginninng programming hobbyists. Also, as the demoscene needs all the new blood it can get, I envision that IBNIZ could serve as a gateway to the demoscene.

I also see that IBNIZ has potential for glitch art and livecoding. By taking a nondeterministic approach to experimentation with IBNIZ, the user may encounter a lot of interesting visual and aural glitch patterns. As for livecoding, I suspect that the compactness of the code as well as the immediate visibility of the changes could make an IBNIZ programming performance quite enjoyable to watch. The live gigs of the chip music scene, for example, might also find use for IBNIZ.


About some design choices and future plans

IBNIZ was originally designed with an esoteric programming language competition in mind, and indeed, the language has already been likened to the classic esoteric language Brainfuck by several critical commentators. I'm not that sure about the similarity with Brainfuck, but it does have strong conceptual similarities with FALSE, the esoteric programming language that inspired Brainfuck. Both IBNIZ and FALSE are based on Forth and use one-character-long instructions, and the perceived awkwardness of both comes from unusual, punctuation-based syntax rather than deliberate attempts at making the language difficult.

When contrasting esotericity with usefulness, it should be noted that many useful, mature and well-liked languages, such as C and Perl, also tend to look like total "line noise" to the uninitiated. Forth, on the other hand, tends to look like mess of random unrelated strings to people unfamiliar with the RPN syntax. I therefore don't see how the esotericity of IBNIZ would hinder its usefulness any more than the usefulness of C, Perl or Forth is hindered by their syntaxes. A more relevant concern would be, for example, the lack of label and variable names in IBNIZ.

There are some design choices that often get questioned, so I'll perhaps explain the rationale for them:

  • The colors: the color format has been chosen so that more sensible and neutral colors are more likely than "coder colors". YUV has been chosen over HSV because there is relatively universal hardware support for YUV buffers (and I also think it is easier to get richer gradients with YUV than with HSV).
  • Trigonometric functions: I pondered for a long while whether to include SIN and ATAN2 and I finally decided to do so. A lot of demoscene tricks depend, including all kinds of rotating and bouncing things as well as more advanced stuff such as raycasting, depends on the availability of trigonometry. Both of these operations can be found in the FPU instruction set of the x86 and are relatively fundamental mathematical stuff, so we're not going into library bloat here.

  • Floating point vs fixed point: I considered floating point for a long while as it would have simplified some advanced tricks. However, IBNIZ code is likely to use a lot of bitwise operations, modular bitwise arithmetic and indefinitely running counters which may end up being problematic with floating-point. Fixed point makes the arithmetic more concrete and also improves the implementability of IBNIZ on low-end platforms that lack FPU.
  • Different coordinate formats: TYX-video uses signed coordinates because most effects look better when the origin is at the center of the screen. The 'U' opcode (userinput), on the other hand, gives the mouse coordinates in unsigned format to ease up pixel-plotting (you can directly use the mouse coordinates as part of the framebuffer memory address). T-video uses unsigned coordinates for making the values linear and also for easier coupling with the unsigned coordinates provided by 'U'.

Right now, all the existing implementations of IBNIZ are rather slow. The C implementation is completely interpretive without any optimization phase prior to execution. However, a faster implementation with some clever static analysis is quite high on the to-do list, and I expect a considerable performance boost once native-code JIT compilers come into use. After all, if we are ever planning to displace MS-DOS as a sizecoding platform, we will need to get IBNIZ to run at least faster than DOSBOX.

The use of externally-provided coordinate and time values will make it possible to scale a considerable portion of IBNIZ programs to a vast range of different resolutions from character-cell framebuffers on 8-bit platforms to today's highest higher-than-high-definition standards. I suspect that a lot of IBNIZ programs can be automatically compiled into shader code or fast C-64 machine language (yes, I've made some preliminary calculations for "Ibniz 64" as well). The currently implemented resolution, 256x256, however, will remain as the default resolution that will ensure compatibility. This resolution, by the way, has been chosen because it is in the same class with 320x200, the most popular resolution of tiny MS-DOS demos.

At some point of time, it will also become necessary to introduce a compact binary representation of IBNIZ code -- with variable bit lengths primarily based on the frequency of each instruction. The byte-per-character representation already has a higher code density than the 16-bit x86 machine language, and I expect that a bit-length-optimized representation will really break some boundaries for low size classes.

An important milestone will be a fast and complete version that runs in a web brower. I expect this to make IBNIZ much more available and accessible than it is now, and I'm also planning to host an IBNIZ programming contest once a sufficient web implementation is on-line. There is already a Javascript implementation but it is rather slow and doesn't support sound, so we will still have to wait for a while. But stay tuned!

Tuesday, 15 November 2011

Materiality and the demoscene: when does a platform feel real?

I've just finished reading Daniel Botz's 428-page PhD dissertation "Kunst, Code und Maschine: Die Ästhetik der Computer-Demoszene".

The book is easily the best literary coverage of the demoscene I've seen so far. It is basically a history of demos as an artform with a particular emphasis on the esthetical aspects of demos, going very deeply into different styles and techniques and their development, often in relation to the features of the three "main" demoscene platforms (C-64, Amiga and PC).

What impressed me the most in the book and gave me most food for thought, however, was the theoretical insight. Botz uses late Friedrich Kittler's conception of media materiality as a theoretical device to explain how the demoscene relates to the hardware platforms it uses, often contrasting the relationship to that of the mainstream media art. In short: the demoscene cares about the materiality of the platforms, while the mainstream art world ignores it.

To elaborate: mainstream computer artists regard computers as tools, universal "anything machines" that can translate pure, immaterial, technology-independent ideas into something that can be seen, heard or otherwise experienced. Thus, ideas come before technology. Demosceners, however, have an opposite point of view; for them, technology comes before ideas. A computer platform is seen as a material that can be brought into different states, in a way comparable to how a sculptor brings blocks of stone into different forms. The possibilities of a material can be explored with direct, uncompromising interaction such as low-level programming. The platform is not neutral, its characteristics are essential to what demos written for it end up being like. While a piece of traditional computer art can often be safely removed from its specific technological context, a demo is no longer a demo if the platform is neglected.

The focus on materiality also results in a somewhat unusual relationship with technology. For most people, computer platforms are just evolutionary stages on a timeline of innovation and obsolescence. A device serves for a couple of years before getting abandoned in favor of a new model that is essentially the same with higher specs. The characteristics of a digital device boil down to numerical statistics in the spirit of "bigger is better". The demoscene, however, sees its platforms as something more multi-faceted. An old computer or gaming console may be interesting as an artistic material just because of its unique combination of features and limitations. It is fine to have historical, personal or even political reasons for choosing a specific platform, but they're not necessary; the features of the system alone are enough to grow someone's creative enthusiasm. As so many people misunderstand the relationship between demoscene and old hardware as a form of "retrocomputing", it is very delightful to see such an accurate insight to it.

But is it really that simple?

I'm not entirely familiar with the semantic extent of "materiality" in media studies, but it is apparent that it primarily refers to physicality and concreteness. In many occasions, Botz contrasts materiality against virtuality, which, I think, is an idea that stems from Gilles Deleuze. This dichotomy is simple and appealing, but I disagree with Botz in how central it is to what the demoscene is doing. After all, there are, for example, quite many 8-bit-oriented demoscene artists who totally approve virtualization. Artists who don't care whether their works are shown with emulators or real hardware at parties, as long as the logical functionality is correct. Some even produce art for the C-64 without having ever owned a material C-64. Therefore, virtualization is definitely not something that is universally frowned upon on the demoscene. It is apparently also possible to develop a low-level, concrete material relationship with an emulated machine, a kind of "material" that is totally virtual to begin with!

Computer programming is always somewhat virtual, even in its most down-to-the-metal incarnations. Bits aren't physical objects; concentrations of electrons only get the role of bits from how they interact with the transistors that form the logical circuits. A low-level programmer who strives for a total, optimal control of a processor doesn't need to be familiar with these material interactions; just knowing the virtual level of bits, registers, opcodes and pipelines is enough. The number of abstraction layers between the actual bit-twiddling and the layer visible to the programmer doesn't change how programming a processor feels like. A software emulator or an FPGA reimplementation of the C-64 can deliver the same "material feeling" to the programmer as the original, NMOS-based C-64. Also, if the virtualization is perfect enough to model the visible and audible artifacts that stem from the non-binary aspects of the original microchips, even a highly experienced enthusiast can be fooled.

I therefore think it is more appropriate to consider the "feel of materiality" that demosceners experience to stem from the abstract characteristics of the platform than its physicality. Programming an Atari VCS emulator running in an X86 PC on top of an operating system may very well feel more concrete than programming the same PC directly with the X86 assembly language. When working with a VCS, even a virtualized one, a programmer needs to be aware of the bit-level machine state at all times. There's no display memory in the VCS; the only way to draw something on the screen is by telling the processor to put specific values in specific video chip registers at specific clock cycles. The PC, however, does have a display memory that holds the pixel values of the on-screen picture, as well as a video chip that automatically refreshes its contents to the screen. A PC programmer can therefore use very generic algorithms to render graphics in the display memory without caring about the underlying hardware, while on the VCS everything needs to be thought out from the specific point of view of the video chip and the CPU.

It seems that the "feel of materiality" has particularly much to do with complexity -- of both the platform and the manipulated data. A high-resolution picture, taking up megabytes of display memory, looks nearly identical on a computer screen regardless of whether it is internally represented in RGB or YUV colorspace. However, when we get a pixel artist to create versions of the same picture for various formats that use less than ten kilobytes of display memory, such as PC textmode or C-64 multicolor, the specific features and constraints of each format shine out very clearly. High levels of complexity allow for generic, platform-independent and general-purpose techniques whereas low levels of complexity require the artist to form a "material relationship" with the format.

Low complexity and the "feel of materiality" are also closely related to the "feel of total control" which I regard as an important state that demosceners tend to reach for. The lower the complexity of a platform, the easier it is to reach a total understanding of its functionality. Quite often, coders working on complex platforms choose to deliberately lower the perceived complexity by concentrating on a reduced, "essential" subset of the programming interface and ignoring the rest. Someone who codes for a modern PC, for example, may want to ignore the polygonal framework of the 3D API altogether and exclusively concentrate on shader code. Those who write softsynths, even for tiny size classes, tend to ignore high-level synthesis frameworks that may be available on the OS and just use a low-level PCM-soundbuffer API. Subsets that provide nice collections of powerful "Lego blocks" are the way to go. Even though bloated system libraries may very well contain useful routines that can be discovered and abused in things like 4-kilobyte demos, most democoders frown upon this idea and may even consider it cheating.

Emulators, virtual platforms and reduced programming interfaces are ways of creating pockets of lowered complexity within highly complex systems -- pockets that feel very "material" and controllable for a crafty programmer. Even virtual platforms that are highly abstract, idealistic and mathematical may feel "material". The "oneliner music platform", merely defined as C-like expression syntax that calculates PCM sample values, is a recent example of this. All of its elements are defined on a relatively high level, no specification of any kind of low-level machine, virtual or otherwise. Nevertheless, a kind of "material characteristic" or "immanent esthetics" still emerges from this "platform", both in how the sort formulas tend to sound like and what kind of hacks and optimizations are better than others.

The "oneliner music platform" is perhaps an extreme example, but in general, purely virtual platforms have been there for a while already. Things like Java demos, as well as multi-platform portable demos, have been around since the late 1990s, although they've usually remained quite marginal. For some reason, however, Botz seems to ignore this aspect of the demoscene nearly completely, merely stating that multi-platform demos have started to appear "in recent years" and that the phenomenon may grow bigger in the future. Perhaps this is a deliberate bias chosen to avoid topics that don't fit well within Botz's framework. Or maybe it's just an accident. I don't know.

Conclusion

To summarize: when Botz talks about the materiality of demoscene platforms, he often refers to phenomena that, in my opinion, could be more fruitfully analyzed with different conceptual devices, especially complexity. Wherever the dichotomy of materiality and immateriality comes up, I see at least three separate conceptual dimensions working under the hood:

1. Art vs craft (or "idea-first" vs "material-first"). This is the area where Botz's theory works very well: demoscene is, indeed, more crafty or "material-first" than most other communities of computer art. However, the material (i.e. the demo platform) doesn't need to be material (i.e. physical); the crafty approach works equally well with emulated and purely virtual platforms. The "artsy" approach, leading to conceptual and "avant-garde" demos, has gradually become more and more accepted, however there's still a lot of crafty attitude in "art demos" as well. I consider chip musicians, circuit-benders and homebrew 8-bit developers about as crafty on average as demosceners, by the way.

2. Physicality vs virtuality. There's a strong presence of classic hardware enthusiasm on the demoscene as well as people who build their own hardware, and they definitely are in the right place. However, I don't think the physical hardware aspect is as important in the demoscene as, for example in the chip music, retrogaming and circuit-bending communities. On the demoscene, it is more important to demonstrate the ability to do impressive things in limited environments than to be an owner of specific physical gear or to know how to solder. A C-64 demo can be good even if it is produced with an emulator and a cross-compiler. Also, as demo platforms can be very abstract and purely virtual as well and still be appealing to the subculture, I don't think there's any profound dogma that would drive demosceners towards physicality.

3. Complexity. The possibility of forming a "material relationship" with an emulated platform shows that the perception of "materiality", "physicality" and "controllability" is more related to the characteristics of the logical platform than to how many abstraction layers there are under the implementation. A low computational complexity, either in the form of platform complexity or program size, seems to correlate with a "feeling of concreteness" as well as the prominence of "emergent platform-specific esthetics". What I see as the core methodology of the demoscene seems to work better at low than high levels of complexity and this is why "pockets of lowered complexity" are often preferred by sceners.

Don't take me wrong: despite all the disagreements and my somewhat Platonist attitude to abstract ideas in general, I still think virtuality and immateriality have been getting too much emphasis in today's world and we need some kind of a countercultural force that defends the material. Botz also covers possible countercultural aspects of the demoscene, deriving them from the older hacker culture, and I found all of them very relevant. My basic disagreement comes from the fact that Botz's theory doesn't entirely match with how I perceive the demoscene to operate, and the subculture as a whole cannot therefore be put under a generalizing label such as "defenders and lovers of the materiality of the computer".

Anyway, I really enjoyed reading Botz's book and especially appreciated the theoretical insight. I recommend the book to everyone who is interested in the demoscene, its history and esthetic variety, AND who reads German well. I studied the language for about five years at school but I still found the text quite difficult to decipher at places. I therefore sincerely hope that my problems with the language haven't led me to any critical misunderstandings.

Friday, 28 October 2011

Some deep analysis of one-line music programs.

It is now a month since I posted the YouTube video "Experimental music from very short C programs" and three weeks since I blogged about it. Now that the initial craze seems to be over, it's a good time to look back what has been done and consider what could be done in the future.

The developments since my last post can be summarized by my third video. It still represents the current state of the art quite well and includes a good variety of different types of formulas.


The videos only show off a portion of all the formulas that could be included. To compensate, I've created a text file where I've collected all the "worthy" formulas I've encountered so far. Most of them can be tested in the on-line JavaScript and ActionScript test tools. Some of them don't even work directly in C code, as they depend on JS/AS-specific features.

As I'm sure that many people still find these formulas rather magical and mysterious, I've decided to give you a detailed technical analysis and explanation on the essential techniques. As I'm completely self-educated in music theory, please pardon my notation and terminology that may be unorthodox at times. You should also have a grasp of C-like expression syntax and binary arithmetic to understand most of the things I'm going to talk about.

I've sorted my formula collection by length. By comparing the shortest and longest formulas, it is apparent that the longest formulas show a much more constructivist approach, including musical data stored in constants as well as entire piece-by-piece-constructed softsynths. The shortest formulas, on the other hand, are very often discovered via non-deterministic testing, from educated guesses to pure trial-and-error. One of my aims with this essay is to bring some understanding and determinism to the short side as well.

Pitches and scales

A class of formulas that is quite prominent among the shortest ones is what I call the 't* class'. The formulas of this type multiply the time counter t with some expression, resulting in a sawtooth wave that changes its pitch according to that expression.

A simple example of a t*-class formula would be t*(t>>10) which outputs a rising and falling sound (accompanied by some aliasing artifacts that create their own sounds). Now, if we introduce an AND operator to this formula, we can restrict the set of pitches and thus create melodies. An example that has been individually discovered by several people, is the so-called "Forty-Two Melody": t*(42&t>>10) or t*2*(21&t>>11).

The numbers that indicate pitches are not semitones or anything like that, but multiplies of a base frequency (sampling rate divided by 256, i.e. 31.25 Hz at the default 8 kHz rate). Here is a table that maps the integer pitches 1..31 to cents and Western note names. The pitches on a gray background don't have good counterparts in the traditional Western system, so I've used quarter-tone flat and sharp symbols to give them approximate names.


By using this table, we can decode the Forty-Two Melody into a human-readable form. The melody is 32 steps long and consists of eight unique pitch multipliers (including zero which gives out silence).


The "Forty-Two Melody" contains some intervals that make it sound a little bit silly, detuned or "Arabic" to Western ears. If we want to avoid this effect, we need to design our formulas so that they only yield pitches that are at familiar intervals from one another. A simple solution is to include a modulo operator to wrap larger numbers to the range where simple integer ratios are more probable. Modifying the Forty-Two Melody into t*((42&t>>10)%14), for example, completely transforms the latter half of the melody into something that sounds a little bit nicer to Western ears. Bitwise AND is also useful for limiting the pitch set to a specific scale; for example t*(5+((t>>11)&5)) produces pitch multipliers of 4, 5, 8 and 9, which correspond to E3, G3, C4 and D4.

Ryg's 44.1 kHz formula presented in the third video contains two different melody generators:

((t*("36364689"[t>>13&7]&15))/12&128)
+(((((t>>12)^(t>>12)-2)%11*t)/4|t>>13)&127)

The first generator, in the first half of the formula, is based on a string constant that contains a straight-forward list of pitches. This list is used for the bass pattern. The other generator, whose core is the subexpression ((t>>12)^(t>>12)-2)%11, is more interesting, as it generates a rather deep self-similar melody structure with just three operators (subtraction, exclusive or, modulo). Rather impressive despite its profound repetitiveness. Here's an analysis of the series it generates:

It is often a good idea to post-process the waveform output of a plain t* formula. The sawtooth wave tends to produce a lot of aliasing artifacts, particularly at low sampling rates. Attaching a '&128' or '&64' in the end of a t* formula switches the output to square wave which usually sounds a little bit cleaner. An example of this would be Niklas Roy's t*(t>>9|t>>13)&16 which sounds a lot noisier without the AND (although most of the noise in this case comes from the unbounded multiplication arithmetic, not from aliasing).

Bitwise waveforms and harmonies

Another class of formulas that is very prominent among the short ones is the bitwise formula. At its purest, such a formula only uses bitwise operations (shifts, negation, AND, OR, XOR) combined with constants and t. A simple example is t&t>>8 -- the "Sierpinski Harmony". Sierpinski triangles appear very often in plotted visualizations of bitwise waveforms, and t&t>>8 represents the simplest type of formula that renders into a nice Sierpinski triangle.

Bitwise formulas often sound surprisingly multitonal for their length. This is based on the fact that an 8-bit sawtooth wave can be thought of consisting of eight square waves, each an octave apart from its neighbor. Usually, these components fuse together in the human brain, forming the harmonics of a single timbre, but if we turn them on and off a couple of times per second or slower, the brain might perceive them as separate tones. For example, t&48 sounds quite monotonal, but in t&48&t>>8, the exactly same waveform sounds bitonal because it abruptly extends the harmonic content of the previous waveform.

The loudest of the eight square-wave components of an 8-bit wave is, naturally, the one represented by the most significant bit (&128). In the sawtooth wave, it is also the longest in wavelength. The second highest bit (&64) represents a square wave that has half the wavelength and amplitude, the third highest halves the parameters once more, and so on. By using this principle, we can analyze the musical structure of the Sierpinski Harmony:


The introduction of ever lower square-wave components can be easily heard. One can also hear quite well that every newly introduced component is considerably lower in pitch than the previous one. However, if we include a prime multiplier in the Sierpinski Harmony, we will encounter an anomaly. In (t*3)&t>>8, the loudest tone actually goes higher at a specific point (and the interval isn't an octave either).

This phenomenon can be explained with aliasing artifacts and how they are processed by the brain. The main wavelength in t*3 is not constant but alternates between two values, 42 and 43, averaging to 42.67 (256/3). The human mind interprets this kind of sound as a waveform of the average length (42.67 samples) accompanied by an extra sound that represents the "error" (or the difference from the ideal wave). In the t*3 example, this extra sound has a period of 256 samples and sounds like a buzzer when listened separately.

The smaller the wavelengths we are dealing with are, the more prominent these aliasing artifacts become, eventually dominating over their parent waveforms. By listening to (t*3)&128, (t*3)&64 and (t*3)&32, we notice an interval of an octave between them. However, when we step over from (t*3)&32 to (t*3)&16, the interval is definitely not an octave. This is the threshold where the artifact wave becomes dominant. This is why t&t>>8, (t*3)&t>>8 and (t*5)&t>>8 sound so different. It is also the reason why high-pitched melodies may sound very detuned.

Variants of the Sierpinski harmony can be combined to produce melodies. Examples of this approach include:

t*5&(t>>7)|t*3&(t*4>>10) (from miiro)

(t*5&t>>7)|(t*3&t>>10) (from viznut)

t*9&t>>4|t*5&t>>7|t*3&t/1024 (from stephth)

Different counters are the driving force of bitwise formulas. At their simplest, counters are just bitshifted versions of the main counter (t). These are implicitly synchronized with each other and work on different temporal levels of the musical piece. However, it has also been fruitful to experiment with counters that don't have a simple common denominator, and even with ones whose speeds are nearly identical. For example, t&t%255 brings a 256-cycle counter and a 255-cycle counter together with an AND operation, resulting in an ambient drone sound that sounds like something achievable with pulse-width modulation. This approach seems to be more useful for loosely structured soundscapes than clear-cut rhythms or melodies.

Some oneliner songs attach a bitwise operation to a melody generator for transposing the output by whole octaves. A simple example is Rrrola's t*(0xCA98>>(t>>9&14)&15)|t>>8 which would just loop a simple series of notes without the trailing '|t>>8'. This part gradually fixes the upper bits of the output to 1s, effectively raising the pitch of the melody and fading its volume out. Also the formulas from Ryg and Kb in my third video use this technique. The most advanced use of it I've seen so far, however, is in Mu6k's song (the last one in the 3rd video) which synthesizes its lead melody (along with some accompanying beeps) by taking the bassline and selectively turning its bits on and off. This takes place within the subexpression (t>>8^t>>10|t>>14|x)&63 where the waveform of the bass is input as x.

Modular wrap-arounds and other synthesis techniques

All the examples presented so far only use counters and bitwise operations to synthesize the actual waveforms. It's therefore necessary to talk a little bit about other operations and their potential as well.

By accompanying a bitwise formula with a simple addition or substraction, it is possible to create modular wrap-around artifacts that produce totally different sounds. Tiny, nearly inaudible sounds may become very dominant. Harmonious sounds often become noisy and percussive. By extending the short Sierpinski harmony t&t>>4 into (t&t>>4)-5, something that sounds like an "8-bit" drum appears on top of it. The same principle can also be applied to more complex Sierpinski harmony derivatives as well as other bitwise formulas:

(t*9&t>>4|t*5&t>>7|t*3&t/1024)-1

I'm not going into a deep analysis of how modular wrap-arounds affect the harmonic structure of a sound, as I guess someone has already done the math before. However, modular addition can be used for something that sounds like oscillator hard-sync in analog synthesizers, although its technical basis is different.

Perhaps the most obvious use for summing in a softsynth, however, is the one where modular wrap-around is not very useful: mixing of several sound sources together. A straight-forward recipe for this is (A&127)+(B&127), which may be a little long-winded when aiming at minimalism. Often, just a simple XOR operation is enough to replace it, although it usually produces artifacts that may sound good or bad depending on the case. XOR can also be used for effects that sound like hard-sync.

Of course, modular wrap-around effects are also achievable with multiplication and division, and on the other hand, even without addition or subtraction. I'll illustrate this with just a couple of interesting-sounding examples:

t>>4|t&((t>>5)/(t>>7-(t>>15)&-t>>7-(t>>15))) (from droid, js/as only)

(int)(t/1e7*t*t+t)%127|t>>4|t>>5|t%127+(t>>16)|t (from bst)

t>>6&1?t>>5:-t>>4 (from droid)

There's a lot in these and other synthesis algorithms that could be discussed, but as they already belong to a zone where traditional sound synthesis lore applies, I choose to go on.

Deterministic composition

When looking at the longest formulas in the collection, it is apparent that there's a lot of intelligent design behind most of them. Long constants and tables, sometimes several of them, containing scales, melodies, basslines and drum patterns. The longest formula in the collection is "Long Line Theory", a cover of the soundtrack of the 64K demo "Chaos Theory" by Conspiracy. The original version by mu6k was over 600 characters long, from which the people on Pouet.net optimized it down to 300 characters, with some arguable quality tradeoffs.

It is, of course, possible to synthesize just about anything with a formula, especially if there's no upper limit for the length. Synthesis and sequencing logic can be built section by section, using rather generic algorithms and proven engineering techniques. There's no magic in it. But on the other hand, there's no magic in pure non-determinism either: it is very difficult to find anything outstanding with totally random experimentation after the initial discovery phase is over.

Many of the more sophisticated formulas seem to have a good balance between random experimentation and deterministic composition. It is often apparent in their structure that some elements are results of random discoveries while others have been built with an engineer's mindset. Let's look at Mu6k's song (presented in the end of the 3rd video, 32 kHz):

(((int)(3e3/(y=t&16383))&1)*35) +
(x=t*("6689"[t>>16&3]&15)/24&127)*y/4e4 +
((t>>8^t>>10|t>>14|x)&63)

I've split the formula on three lines according to the three instruments therein: drum, bass and lead.

My assumption is that the song has been built around the lead formula that was discovered first, probably in the form of t>>6^t>>8|t>>12|t&63 or something (the original version of this formula ran at 8 kHz). As usual with pure bitwise formulas, all the intervals are octaves, but in this case, the musical structure is very nice.

As it is possible to transpose a bit-masking melody simply by transposing the carrier wave, it's a good idea to generate a bassline and reuse it as the carrier. Unlike the lead generator, the bassline generator is very straight-forward in appearance, consisting of four pitch values stored in a string constant. A sawtooth wave is generated, stored to a variable (so that it can be reused by the lead melody generator) and amplitude-modulated.

Finally, there's a simple drum beat that is generated by a combination of division and bit extraction. The extracted bit is scaled to the amplitude of 35. Simple drums are often synthesized by using fast downward pitch-slides and the division approach does this very well.

In the case of Ryg's formula I discussed some sections earlier, I might also guess that the melody generator, the most chaotic element of the system, was the central piece which was later coupled with a bassline generator whose pitches were deliberately chosen to harmonize with the generated melody.

The future

I have been contacted by quite many people who have brought up different ideas of future development. We should, for example, have a social website where anyone could enter new formulas, listen to the in a playlist-like manner and rate them. Another branch of ideas is about the production of new rateable formulas by random generation or by breeding old ones together with genetic algorithms.

All of these ideas are definitely interesting, but I don't think the time is yet right for them. I have been developing my audiovisual virtual machine, which is the main reason why I did these experiments in the first place. I regard the current concept of "oneliner music" as a mere placeholder for the system that is yet to be released. There are too many problems with the C-like infix syntax and other aspects of the concept, so I think it's wiser to first develop a better toy and then think about a community mechanism. However, these are just my own priorities. If someone feels like building the kind of on-line community I described, I'll support the idea.

I've mentioned this toy before. It was previously called EDAM, but now I've chosen to name it IBNIZ (Ideally Bare Numeric Impression giZmo). One of the I letters could also stand for "immediate" or "interactive", as I'm going to emphasize an immediate, hands-on modifiability of the code. IBNIZ will hopefully be relevant as a demoscene platform for extreme size classes, as a test bed for esoteric algorithmic trickery, as an appealing introduction to hard-core minimalist programming, and also as a fun toy to just jam around with. Here's a little screenshot of the current state:


In my previous post, I mentioned the possibility of opening a door for 256-byte demos that are interesting both graphically and musically. The oneliner music project and IBNIZ will provide valuable research for the high-level, algorithmic aspects of this project, but I've also made some
hands-on tests on the platform-level feasability of the idea. It is now apparent that a stand-alone MS-DOS program that generates PCM sound and synchronized real-time graphics can easily fit in less then 96 bytes, so there's a lot of room left for both music and graphics in the 256-byte size
class. I'll probably release a 128- or 256-byte demo as a proof-of-concept, utilizing something derived from a nice oneliner music formula as the soundtrack.

I would like to thank everyone who has been interested in the oneliner music project, as all the hype made me very determined to continue my quests for unleashing the potential of the bit and the byte. My next post regarding this quest will probably appear once there's a version of IBNIZ worth releasing to the public.