Showing posts with label computationally minimal art. Show all posts
Showing posts with label computationally minimal art. Show all posts

Thursday, 25 September 2014

Choosing low-tech visual styles for games

A month ago, I participated in Ludum Dare, a 48-hour game development contest. This was the first time I finished a game-like project since about 2005.

The theme of the contest was "connected worlds". I made a game called Quantum Dash that experiments with parallel universes as a central game mechanic. The player operates in three universes at the same time, and when connecting "interdimensional cords", the differences between these universes explosively cancel each other. The "Dash" part in the name refers to the Boulder Dash style grid physics I used. I found the creation process very refreshing, I am quite happy with the result considering the circumstances, and I will very likely continue making games (or at least rapid prototypes thereof).



My relationship with computer games became somewhat dissonant during the nineties. At that time, the commercial industry became radically more centralized and profit-oriented. Eccentric European coder-auteur-heroes disappeared from computer magazines, giving way to American industry giants and their campaigns. There was also the rise of the "gamer" subculture that I considered rather repulsive from early on due to its glorification of hardware upgrades and disinterest towards real computer skills.

Profit maximization in the so-called serious game industry is largely driven by a specific, Hollywood-style "bigger is better" approach to audiovisual esthetics. That is, a strive for photorealism. This approach is, of course, very appealing to shareholders: It is easy to imagine the grail -- everyone knows what the real world looks like -- but no one will ever reach it despite getting closer all the time. Increases in processing power and development budgets quite predictably map to increases in photorealism. There is also inherent obsolescence: yesterday's near-photorealism looks bad compared to today's near-photorealism, so it is easy to make consumers desire revamped versions of earlier titles instead of anything new.

In the early noughties, the cult of photorealism was still so dominant that even non-commercial and small-scale game productions followed it. Thus, independent games often looked like inadequate, "poor man's" versions of AAA games. But the cult was starting to lose its grip: independent games were already looking for new paths. In his spring 2014 paper, game researcher Jesper Juul gives 2005 as an important year in this respect: since 2005, the Grand Prize winners of the Independent Games Festival have invariably followed styles that diverge from the industrial mainstream.

Juul defines "Independent Style" as follows: "Independent Style is a representation of a representation. It uses contemporary technology to emulate low-tech and usually “cheap” graphical materials and visual styles, signaling that a game with this style is more immediate, authentic and honest than are big-budget titles with high-end 3-dimensional graphics."

The most prominent genre within I.S. is what Juul calls "pixel style", reminiscent of older video game technology and also overlapping with the concept of "Computationally Minimal Art" I formulated a few years ago. My game, Quantum Dash, also fits in this substyle. I found the stylistic approach appealing because it is quick and easy to implement from scratch in a limited time. Part of this easiness stems from the fact that CMA is native to the basic fabric of digital electronic computers. Another attracting aspect is the long tradition of low-tech video games which makes it easy to reflect prior work and use the established esthetic language.

Another widely used approach simulates art made with physical materials such as cut-out paper (And Yet It Moves) or wax pastels on paper (Crayon Physics). Both this approach and the aforementioned pixel style apparently refer to older technologies, which makes it tempting to generalize the idea of past references to other genres of I.S. as well. However, I think Juul somewhat stumbles with this attempt with styles that don't have a real historical predecessor: "The pixel style 3d games Minecraft and Fez also cannot refer to an earlier time when 3d games were commonly made out of large volumetric pixels (voxels), so like Crayon Physics Deluxe, the historical reference is somewhat counterfactual, but still suggests a simpler, if nonexistent, earlier technology."

I think it would be more fruitful to concentrate on complexity than history when analyzing Independent Style. The esthetic possibility space of modern computing is mind-bogglingly large. It is easy to get lost in all the available potential complexity. However, by introducing constraints and stylistic choices that dramatically reduce the complexity, it is easier even for a solo artist to explore and grasp the space. The contraints and choices don't need to refer to any kind of history -- real or counterfactual -- to be effective.

The voxel style in Minecraft can still be considered somewhat historical -- a 3D expansion of grid-based 2D games such as Boulder Dash. However, I suspect that the esthetic experimentation in independent games will eventually lead to a much wider variety of styles and constraints -- including a bunch that cannot be explained with historical references.

The demoscene has been experimenting with different visual styles for a long time. Even at times when technical innovation was the primary concern, the goal was to find new things that just look good -- and realism was just one possible way of looking good. In 1996, when realtime raytracing was a hot new photorealistic thing among democoders, there was a production called Paper by Psychic Link that dropped jaws with its paper-inspired visuals -- a decade before paper simulation became trendy in the independent games scene. Now that the new PC hardware no longer challenges the demo artist the way it used to, there is much more emphasis on stylistic experimentation in non-constrained PC demos.

Because of this longer history of active experimentation, I think it would be useful for many more independent game developers to look for stylistic inspiration in demoscene works. Of course, not all the tricks and effects adapt well to games, but the technological and social conditions in their production are quite similar to those in low-budget games. After all, demos are real-time-rendering computer programs produced by small groups without budgets, usually over relatively short time periods, so there's very little room for "big-budget practices" there.

Here's a short list of demos with unique esthetic elements that might be able to inspire game esthetics as well. Two of them are for 8-bit computers and the rest for (semi-)modern PCs.
I'm expanding into game design and development primarily because I want to experiment with the power of interactivity, especially in relation to some of my greater-than-life goals. So, audiovisuals will be a secondary concern.

Still, due to my background, I want to take effort in choosing a set of simple and lightweight esthetic approaches to be used. They will definitely be computationally minimal, but I want to choose some fresh techniques in order to contrast favorably against the square-pixel style that is already quite mainstream in independent games. But that'll be a topic for another post.

Thursday, 19 April 2012

The relationship between "New Aesthetic" and Computationally Minimal Art

A couple of weeks ago, something called "New Aesthetic" was brought to my attention. It is difficult to find any sort of coherent definition for the idea, but it seems like an umbrella label for a wide variety of visual things that somehow look computational, often in not-so-computational contexts. The main spreader of the meme is apparently a Tumblr blog that collects pictures of things such as pixellated glitches in textiles, real-life voxel sculptures, mugs decorated with website graphics, digitally glitched photographs, satellite images as well as all kinds of other things that evocate suitably futuristic associations.



Despite the profound vagueness of the umbrella term, it is not difficult to notice the general trend it refers to. Just a decade ago, a computationally inspired real-life object would have been a unique novelty item, but nowadays there are such things all around us. I mentioned an aspect of this trend back in 2010 in my article on Computationally Minimal Art, where I noticed that "retrocomputing esthetics" is not just thriving in its respective subcultures (such as demoscene or chip music scene) but popping up every now and then in mainstream contexts as well -- often completely without the historical or nostalgic vibe usually associated with retrocomputing.

As the concept of "New Aesthetic" overlaps a lot of my ponderings, I now feel like building some semantics in order to relate the ideas to one another:

"New Aesthetics", as I see it, is a rather vague umbrella term that contains a wide variety of things but has a major subset that could be called "Computationally Inspired".

"Computationally Inspired" is anything that brings the concepts and building blocks of the "digital world" into non-native contexts. T-shirts, mugs and other real-life objects decorated with big-pixel art or website imagery are obvious examples. In a wide sense, even anything that makes the basic digital building blocks more visible within a digital context might be "Computationally Inspired" as well: big-pixel low-fi computer graphics on a new high-end computer, for example.

"Computationally Minimal" is anything that uses a very low amount of computational resources, often making the digital building blocks such as pixels very discernible. Two years ago, I defined "Computationally Minimal Art" as follows: "[A] form of discrete art governed by a low computational complexity in the domains of time, description length and temporary storage. The most essential features of Computationally Minimal Art are those that persist the longest when the various levels of complexity approach zero."

We can see that Computationally Inspired and Computationally Minimal have a lot of overlap but neither is a subset of another. Cross-stitch patterns are CM almost by definition as they have a limited number of discrete "pixels" with a limited number of different colors, but they are not CI unless they depict something that comes from the "computer world", such as video game characters. On the other hand, a sculpture based on a large amount of digitally corrupted data is definitely CI but falls out of the definition of CM due to the size of the source data.

What CM and CI and especially their intersection have in common, however, is the tendency of showing off discrete digital data and/or computational processes, which gives them a lot of esthetic similarity. In CI, this is usually a goal in itself, while in CM, it is most often a side-effect of the related goal of low computational complexity. In either case, however, the visual result often looks like big-pixel graphics. This has caused confusion among many New Aesthetic bloggers who use adjectives such as "retro", "8-bit" or "nostalgic" when referring to this phenomenon, when what they are witnessing is just a way how the essence of digital technology tends to manifest visually.

There has been a lot of on-line discussion revolving New Aesthetic during the past month, and a lot of it seems like pseudo-intellectual, reality-detached mumbo-jumbo to me. In order to gain some insight and substance, I would like to recommend all the bloggers to take serious a look into the demoscene and other established forms of computer-centric expression. You may also find out that a lot of this stuff is actually not that new to begin with, it has just been gaining a lot of new momentum recently.

Friday, 30 December 2011

IBNIZ - a hardcore audiovisual virtual machine and an esoteric programming language

Some days ago, I finished the first public version of my audiovisual virtual machine, IBNIZ. I also showed it off on YouTube with the following video:

As demonstrated by the video, IBNIZ (Ideally Bare Numeric Impression giZmo) is a virtual machine and a programming language that generates video and audio from very short strings of code. Technically, it is a two-stack machine somewhat similar to Forth, but with the major execption that the stack is cyclical and also used at an output buffer. Also, as every IBNIZ program is implicitly inside a loop that pushes a set of loop variables on the stack on every cycle, even an empty program outputs something (i.e. a changing gradient as video and a constant sawtooth wave as audio).


How does it work?

To illustrate how IBNIZ works, here's how the program ^xp is executed, step by step:

So, in short: on every loop cycle, the VM pushes the values T, Y and X. The operation ^ XORs the values Y and X and xp pops off the remaining value (T). Thus, the stack gets filled by color values where the Y coordinate is XORed by the X coordinate, resulting in the ill-famous "XOR texture".

The representation in the figure was somewhat simplified, however. In reality, IBNIZ uses 32-bit fixed-point arithmetic where the values for Y and X fall between -1 and +1. IBNIZ also runs the program in two separate contexts with separate stacks and internal registers: the video context and the audio context. To illustrate this, here's how an empty program is executed in the video context:

The colorspace is YUV, with the integer part of the pixel value interpreted as U and V (roughly corresponding to hue) and the fractional part interpreted as Y (brightness). The empty program runs in the so-called T-mode where all the loop variables -- T, Y and X -- are entered in the same word (16 bits of T in the integer part and 8+8 bits of Y and X in the fractional). In the audio context, the same program executes as follows:

Just like in the T-mode of the video context, the VM pushes one word per loop cycle. However, in this case, there is no Y or X; the whole word represents T. Also, when interpreting the stack contents as audio, the integer part is ignored altogether and the fractional part is taken as an unsigned 16-bit PCM value.

Also, in the audio context, T increments in steps of 0000.0040 while the step is only 0000.0001 in the video context. This is because we need to calculate 256x256 pixel values per frame (nearly 4 million pixels if there are 60 frames per second) but suffice with considerably fewer PCM samples. In the current implementation, we calculate 61440 audio samples per second (60*65536/64) which is then downscaled to 44100 Hz.

The scheduling and main-looping logic is the only somewhat complex thing in IBNIZ. All the rest is very elementary, something that can be found as instructions in the x86 architecture or as words in the core Forth vocabulary. Basic arithmetic and stack-shuffling. Memory load and store. An if/then/else structure, two kinds of loop structures and subroutine definition/calling. Also an instruction for retrieving user input from keyboard or pointing device. Everything needs to be built from these basic building blocks. And yes, it is Turing complete, and no, you are not restricted to the rendering order provided by the implicit main loop.

The full instruction set is described in the documentation. Feel free to check it out experiment with IBNIZ on your own!


So, what's the point?

The IBNIZ project started in 2007 with the codename "EDAM" (Extreme-Density Art Machine). My goal was to participate in the esoteric programming language competition at the same year's Alternative Party, but I didn't finish the VM at time. The project therefore fell to the background. Every now and then, I returned to the project for a short while, maybe revising the instruction set a little bit or experimenting with different colorspaces and loop variable formats. There was no great driving force to insppire me to finish the VM until mid-2011 after some quite succesful experiments with very short audiovisual programs. Once some of my musical experiments spawned a trend that eventually even got a name of its own, "bytebeat", I really had to push myself to finally finishing IBNIZ.

The main goal of IBNIZ, from the very beginning, was to provide a new platform for the demoscene. Something without the usual fallbacks of the real-world platforms when writing extremely small demos. No headers, no program size overhead in video/audio access, extremely high code density, enough processing power and preferrably a machine language that is fun to program with. Something that would have the potential to displace MS-DOS as the primary platform for sub-256-byte demoscene productions.

There are also other considerations. One of them is educational: modern computing platforms tend to be mind-bogglingly complex and highly abstracted and lack the immediacy and tangibility of the old-school home computers. I am somewhat concerned that young people whose mindset would have made them great programmers in the eighties find their mindset totally incompatible with today's mainstream technology and therefore get completely driven away from programming. IBNIZ will hopefully be able to serve as an "oldschool-style platform" in a way that is rewarding enough for today's beginninng programming hobbyists. Also, as the demoscene needs all the new blood it can get, I envision that IBNIZ could serve as a gateway to the demoscene.

I also see that IBNIZ has potential for glitch art and livecoding. By taking a nondeterministic approach to experimentation with IBNIZ, the user may encounter a lot of interesting visual and aural glitch patterns. As for livecoding, I suspect that the compactness of the code as well as the immediate visibility of the changes could make an IBNIZ programming performance quite enjoyable to watch. The live gigs of the chip music scene, for example, might also find use for IBNIZ.


About some design choices and future plans

IBNIZ was originally designed with an esoteric programming language competition in mind, and indeed, the language has already been likened to the classic esoteric language Brainfuck by several critical commentators. I'm not that sure about the similarity with Brainfuck, but it does have strong conceptual similarities with FALSE, the esoteric programming language that inspired Brainfuck. Both IBNIZ and FALSE are based on Forth and use one-character-long instructions, and the perceived awkwardness of both comes from unusual, punctuation-based syntax rather than deliberate attempts at making the language difficult.

When contrasting esotericity with usefulness, it should be noted that many useful, mature and well-liked languages, such as C and Perl, also tend to look like total "line noise" to the uninitiated. Forth, on the other hand, tends to look like mess of random unrelated strings to people unfamiliar with the RPN syntax. I therefore don't see how the esotericity of IBNIZ would hinder its usefulness any more than the usefulness of C, Perl or Forth is hindered by their syntaxes. A more relevant concern would be, for example, the lack of label and variable names in IBNIZ.

There are some design choices that often get questioned, so I'll perhaps explain the rationale for them:

  • The colors: the color format has been chosen so that more sensible and neutral colors are more likely than "coder colors". YUV has been chosen over HSV because there is relatively universal hardware support for YUV buffers (and I also think it is easier to get richer gradients with YUV than with HSV).
  • Trigonometric functions: I pondered for a long while whether to include SIN and ATAN2 and I finally decided to do so. A lot of demoscene tricks depend, including all kinds of rotating and bouncing things as well as more advanced stuff such as raycasting, depends on the availability of trigonometry. Both of these operations can be found in the FPU instruction set of the x86 and are relatively fundamental mathematical stuff, so we're not going into library bloat here.

  • Floating point vs fixed point: I considered floating point for a long while as it would have simplified some advanced tricks. However, IBNIZ code is likely to use a lot of bitwise operations, modular bitwise arithmetic and indefinitely running counters which may end up being problematic with floating-point. Fixed point makes the arithmetic more concrete and also improves the implementability of IBNIZ on low-end platforms that lack FPU.
  • Different coordinate formats: TYX-video uses signed coordinates because most effects look better when the origin is at the center of the screen. The 'U' opcode (userinput), on the other hand, gives the mouse coordinates in unsigned format to ease up pixel-plotting (you can directly use the mouse coordinates as part of the framebuffer memory address). T-video uses unsigned coordinates for making the values linear and also for easier coupling with the unsigned coordinates provided by 'U'.

Right now, all the existing implementations of IBNIZ are rather slow. The C implementation is completely interpretive without any optimization phase prior to execution. However, a faster implementation with some clever static analysis is quite high on the to-do list, and I expect a considerable performance boost once native-code JIT compilers come into use. After all, if we are ever planning to displace MS-DOS as a sizecoding platform, we will need to get IBNIZ to run at least faster than DOSBOX.

The use of externally-provided coordinate and time values will make it possible to scale a considerable portion of IBNIZ programs to a vast range of different resolutions from character-cell framebuffers on 8-bit platforms to today's highest higher-than-high-definition standards. I suspect that a lot of IBNIZ programs can be automatically compiled into shader code or fast C-64 machine language (yes, I've made some preliminary calculations for "Ibniz 64" as well). The currently implemented resolution, 256x256, however, will remain as the default resolution that will ensure compatibility. This resolution, by the way, has been chosen because it is in the same class with 320x200, the most popular resolution of tiny MS-DOS demos.

At some point of time, it will also become necessary to introduce a compact binary representation of IBNIZ code -- with variable bit lengths primarily based on the frequency of each instruction. The byte-per-character representation already has a higher code density than the 16-bit x86 machine language, and I expect that a bit-length-optimized representation will really break some boundaries for low size classes.

An important milestone will be a fast and complete version that runs in a web brower. I expect this to make IBNIZ much more available and accessible than it is now, and I'm also planning to host an IBNIZ programming contest once a sufficient web implementation is on-line. There is already a Javascript implementation but it is rather slow and doesn't support sound, so we will still have to wait for a while. But stay tuned!

Friday, 28 October 2011

Some deep analysis of one-line music programs.

It is now a month since I posted the YouTube video "Experimental music from very short C programs" and three weeks since I blogged about it. Now that the initial craze seems to be over, it's a good time to look back what has been done and consider what could be done in the future.

The developments since my last post can be summarized by my third video. It still represents the current state of the art quite well and includes a good variety of different types of formulas.


The videos only show off a portion of all the formulas that could be included. To compensate, I've created a text file where I've collected all the "worthy" formulas I've encountered so far. Most of them can be tested in the on-line JavaScript and ActionScript test tools. Some of them don't even work directly in C code, as they depend on JS/AS-specific features.

As I'm sure that many people still find these formulas rather magical and mysterious, I've decided to give you a detailed technical analysis and explanation on the essential techniques. As I'm completely self-educated in music theory, please pardon my notation and terminology that may be unorthodox at times. You should also have a grasp of C-like expression syntax and binary arithmetic to understand most of the things I'm going to talk about.

I've sorted my formula collection by length. By comparing the shortest and longest formulas, it is apparent that the longest formulas show a much more constructivist approach, including musical data stored in constants as well as entire piece-by-piece-constructed softsynths. The shortest formulas, on the other hand, are very often discovered via non-deterministic testing, from educated guesses to pure trial-and-error. One of my aims with this essay is to bring some understanding and determinism to the short side as well.

Pitches and scales

A class of formulas that is quite prominent among the shortest ones is what I call the 't* class'. The formulas of this type multiply the time counter t with some expression, resulting in a sawtooth wave that changes its pitch according to that expression.

A simple example of a t*-class formula would be t*(t>>10) which outputs a rising and falling sound (accompanied by some aliasing artifacts that create their own sounds). Now, if we introduce an AND operator to this formula, we can restrict the set of pitches and thus create melodies. An example that has been individually discovered by several people, is the so-called "Forty-Two Melody": t*(42&t>>10) or t*2*(21&t>>11).

The numbers that indicate pitches are not semitones or anything like that, but multiplies of a base frequency (sampling rate divided by 256, i.e. 31.25 Hz at the default 8 kHz rate). Here is a table that maps the integer pitches 1..31 to cents and Western note names. The pitches on a gray background don't have good counterparts in the traditional Western system, so I've used quarter-tone flat and sharp symbols to give them approximate names.


By using this table, we can decode the Forty-Two Melody into a human-readable form. The melody is 32 steps long and consists of eight unique pitch multipliers (including zero which gives out silence).


The "Forty-Two Melody" contains some intervals that make it sound a little bit silly, detuned or "Arabic" to Western ears. If we want to avoid this effect, we need to design our formulas so that they only yield pitches that are at familiar intervals from one another. A simple solution is to include a modulo operator to wrap larger numbers to the range where simple integer ratios are more probable. Modifying the Forty-Two Melody into t*((42&t>>10)%14), for example, completely transforms the latter half of the melody into something that sounds a little bit nicer to Western ears. Bitwise AND is also useful for limiting the pitch set to a specific scale; for example t*(5+((t>>11)&5)) produces pitch multipliers of 4, 5, 8 and 9, which correspond to E3, G3, C4 and D4.

Ryg's 44.1 kHz formula presented in the third video contains two different melody generators:

((t*("36364689"[t>>13&7]&15))/12&128)
+(((((t>>12)^(t>>12)-2)%11*t)/4|t>>13)&127)

The first generator, in the first half of the formula, is based on a string constant that contains a straight-forward list of pitches. This list is used for the bass pattern. The other generator, whose core is the subexpression ((t>>12)^(t>>12)-2)%11, is more interesting, as it generates a rather deep self-similar melody structure with just three operators (subtraction, exclusive or, modulo). Rather impressive despite its profound repetitiveness. Here's an analysis of the series it generates:

It is often a good idea to post-process the waveform output of a plain t* formula. The sawtooth wave tends to produce a lot of aliasing artifacts, particularly at low sampling rates. Attaching a '&128' or '&64' in the end of a t* formula switches the output to square wave which usually sounds a little bit cleaner. An example of this would be Niklas Roy's t*(t>>9|t>>13)&16 which sounds a lot noisier without the AND (although most of the noise in this case comes from the unbounded multiplication arithmetic, not from aliasing).

Bitwise waveforms and harmonies

Another class of formulas that is very prominent among the short ones is the bitwise formula. At its purest, such a formula only uses bitwise operations (shifts, negation, AND, OR, XOR) combined with constants and t. A simple example is t&t>>8 -- the "Sierpinski Harmony". Sierpinski triangles appear very often in plotted visualizations of bitwise waveforms, and t&t>>8 represents the simplest type of formula that renders into a nice Sierpinski triangle.

Bitwise formulas often sound surprisingly multitonal for their length. This is based on the fact that an 8-bit sawtooth wave can be thought of consisting of eight square waves, each an octave apart from its neighbor. Usually, these components fuse together in the human brain, forming the harmonics of a single timbre, but if we turn them on and off a couple of times per second or slower, the brain might perceive them as separate tones. For example, t&48 sounds quite monotonal, but in t&48&t>>8, the exactly same waveform sounds bitonal because it abruptly extends the harmonic content of the previous waveform.

The loudest of the eight square-wave components of an 8-bit wave is, naturally, the one represented by the most significant bit (&128). In the sawtooth wave, it is also the longest in wavelength. The second highest bit (&64) represents a square wave that has half the wavelength and amplitude, the third highest halves the parameters once more, and so on. By using this principle, we can analyze the musical structure of the Sierpinski Harmony:


The introduction of ever lower square-wave components can be easily heard. One can also hear quite well that every newly introduced component is considerably lower in pitch than the previous one. However, if we include a prime multiplier in the Sierpinski Harmony, we will encounter an anomaly. In (t*3)&t>>8, the loudest tone actually goes higher at a specific point (and the interval isn't an octave either).

This phenomenon can be explained with aliasing artifacts and how they are processed by the brain. The main wavelength in t*3 is not constant but alternates between two values, 42 and 43, averaging to 42.67 (256/3). The human mind interprets this kind of sound as a waveform of the average length (42.67 samples) accompanied by an extra sound that represents the "error" (or the difference from the ideal wave). In the t*3 example, this extra sound has a period of 256 samples and sounds like a buzzer when listened separately.

The smaller the wavelengths we are dealing with are, the more prominent these aliasing artifacts become, eventually dominating over their parent waveforms. By listening to (t*3)&128, (t*3)&64 and (t*3)&32, we notice an interval of an octave between them. However, when we step over from (t*3)&32 to (t*3)&16, the interval is definitely not an octave. This is the threshold where the artifact wave becomes dominant. This is why t&t>>8, (t*3)&t>>8 and (t*5)&t>>8 sound so different. It is also the reason why high-pitched melodies may sound very detuned.

Variants of the Sierpinski harmony can be combined to produce melodies. Examples of this approach include:

t*5&(t>>7)|t*3&(t*4>>10) (from miiro)

(t*5&t>>7)|(t*3&t>>10) (from viznut)

t*9&t>>4|t*5&t>>7|t*3&t/1024 (from stephth)

Different counters are the driving force of bitwise formulas. At their simplest, counters are just bitshifted versions of the main counter (t). These are implicitly synchronized with each other and work on different temporal levels of the musical piece. However, it has also been fruitful to experiment with counters that don't have a simple common denominator, and even with ones whose speeds are nearly identical. For example, t&t%255 brings a 256-cycle counter and a 255-cycle counter together with an AND operation, resulting in an ambient drone sound that sounds like something achievable with pulse-width modulation. This approach seems to be more useful for loosely structured soundscapes than clear-cut rhythms or melodies.

Some oneliner songs attach a bitwise operation to a melody generator for transposing the output by whole octaves. A simple example is Rrrola's t*(0xCA98>>(t>>9&14)&15)|t>>8 which would just loop a simple series of notes without the trailing '|t>>8'. This part gradually fixes the upper bits of the output to 1s, effectively raising the pitch of the melody and fading its volume out. Also the formulas from Ryg and Kb in my third video use this technique. The most advanced use of it I've seen so far, however, is in Mu6k's song (the last one in the 3rd video) which synthesizes its lead melody (along with some accompanying beeps) by taking the bassline and selectively turning its bits on and off. This takes place within the subexpression (t>>8^t>>10|t>>14|x)&63 where the waveform of the bass is input as x.

Modular wrap-arounds and other synthesis techniques

All the examples presented so far only use counters and bitwise operations to synthesize the actual waveforms. It's therefore necessary to talk a little bit about other operations and their potential as well.

By accompanying a bitwise formula with a simple addition or substraction, it is possible to create modular wrap-around artifacts that produce totally different sounds. Tiny, nearly inaudible sounds may become very dominant. Harmonious sounds often become noisy and percussive. By extending the short Sierpinski harmony t&t>>4 into (t&t>>4)-5, something that sounds like an "8-bit" drum appears on top of it. The same principle can also be applied to more complex Sierpinski harmony derivatives as well as other bitwise formulas:

(t*9&t>>4|t*5&t>>7|t*3&t/1024)-1

I'm not going into a deep analysis of how modular wrap-arounds affect the harmonic structure of a sound, as I guess someone has already done the math before. However, modular addition can be used for something that sounds like oscillator hard-sync in analog synthesizers, although its technical basis is different.

Perhaps the most obvious use for summing in a softsynth, however, is the one where modular wrap-around is not very useful: mixing of several sound sources together. A straight-forward recipe for this is (A&127)+(B&127), which may be a little long-winded when aiming at minimalism. Often, just a simple XOR operation is enough to replace it, although it usually produces artifacts that may sound good or bad depending on the case. XOR can also be used for effects that sound like hard-sync.

Of course, modular wrap-around effects are also achievable with multiplication and division, and on the other hand, even without addition or subtraction. I'll illustrate this with just a couple of interesting-sounding examples:

t>>4|t&((t>>5)/(t>>7-(t>>15)&-t>>7-(t>>15))) (from droid, js/as only)

(int)(t/1e7*t*t+t)%127|t>>4|t>>5|t%127+(t>>16)|t (from bst)

t>>6&1?t>>5:-t>>4 (from droid)

There's a lot in these and other synthesis algorithms that could be discussed, but as they already belong to a zone where traditional sound synthesis lore applies, I choose to go on.

Deterministic composition

When looking at the longest formulas in the collection, it is apparent that there's a lot of intelligent design behind most of them. Long constants and tables, sometimes several of them, containing scales, melodies, basslines and drum patterns. The longest formula in the collection is "Long Line Theory", a cover of the soundtrack of the 64K demo "Chaos Theory" by Conspiracy. The original version by mu6k was over 600 characters long, from which the people on Pouet.net optimized it down to 300 characters, with some arguable quality tradeoffs.

It is, of course, possible to synthesize just about anything with a formula, especially if there's no upper limit for the length. Synthesis and sequencing logic can be built section by section, using rather generic algorithms and proven engineering techniques. There's no magic in it. But on the other hand, there's no magic in pure non-determinism either: it is very difficult to find anything outstanding with totally random experimentation after the initial discovery phase is over.

Many of the more sophisticated formulas seem to have a good balance between random experimentation and deterministic composition. It is often apparent in their structure that some elements are results of random discoveries while others have been built with an engineer's mindset. Let's look at Mu6k's song (presented in the end of the 3rd video, 32 kHz):

(((int)(3e3/(y=t&16383))&1)*35) +
(x=t*("6689"[t>>16&3]&15)/24&127)*y/4e4 +
((t>>8^t>>10|t>>14|x)&63)

I've split the formula on three lines according to the three instruments therein: drum, bass and lead.

My assumption is that the song has been built around the lead formula that was discovered first, probably in the form of t>>6^t>>8|t>>12|t&63 or something (the original version of this formula ran at 8 kHz). As usual with pure bitwise formulas, all the intervals are octaves, but in this case, the musical structure is very nice.

As it is possible to transpose a bit-masking melody simply by transposing the carrier wave, it's a good idea to generate a bassline and reuse it as the carrier. Unlike the lead generator, the bassline generator is very straight-forward in appearance, consisting of four pitch values stored in a string constant. A sawtooth wave is generated, stored to a variable (so that it can be reused by the lead melody generator) and amplitude-modulated.

Finally, there's a simple drum beat that is generated by a combination of division and bit extraction. The extracted bit is scaled to the amplitude of 35. Simple drums are often synthesized by using fast downward pitch-slides and the division approach does this very well.

In the case of Ryg's formula I discussed some sections earlier, I might also guess that the melody generator, the most chaotic element of the system, was the central piece which was later coupled with a bassline generator whose pitches were deliberately chosen to harmonize with the generated melody.

The future

I have been contacted by quite many people who have brought up different ideas of future development. We should, for example, have a social website where anyone could enter new formulas, listen to the in a playlist-like manner and rate them. Another branch of ideas is about the production of new rateable formulas by random generation or by breeding old ones together with genetic algorithms.

All of these ideas are definitely interesting, but I don't think the time is yet right for them. I have been developing my audiovisual virtual machine, which is the main reason why I did these experiments in the first place. I regard the current concept of "oneliner music" as a mere placeholder for the system that is yet to be released. There are too many problems with the C-like infix syntax and other aspects of the concept, so I think it's wiser to first develop a better toy and then think about a community mechanism. However, these are just my own priorities. If someone feels like building the kind of on-line community I described, I'll support the idea.

I've mentioned this toy before. It was previously called EDAM, but now I've chosen to name it IBNIZ (Ideally Bare Numeric Impression giZmo). One of the I letters could also stand for "immediate" or "interactive", as I'm going to emphasize an immediate, hands-on modifiability of the code. IBNIZ will hopefully be relevant as a demoscene platform for extreme size classes, as a test bed for esoteric algorithmic trickery, as an appealing introduction to hard-core minimalist programming, and also as a fun toy to just jam around with. Here's a little screenshot of the current state:


In my previous post, I mentioned the possibility of opening a door for 256-byte demos that are interesting both graphically and musically. The oneliner music project and IBNIZ will provide valuable research for the high-level, algorithmic aspects of this project, but I've also made some
hands-on tests on the platform-level feasability of the idea. It is now apparent that a stand-alone MS-DOS program that generates PCM sound and synchronized real-time graphics can easily fit in less then 96 bytes, so there's a lot of room left for both music and graphics in the 256-byte size
class. I'll probably release a 128- or 256-byte demo as a proof-of-concept, utilizing something derived from a nice oneliner music formula as the soundtrack.

I would like to thank everyone who has been interested in the oneliner music project, as all the hype made me very determined to continue my quests for unleashing the potential of the bit and the byte. My next post regarding this quest will probably appear once there's a version of IBNIZ worth releasing to the public.

Sunday, 2 October 2011

Algorithmic symphonies from one line of code -- how and why?

Lately, there has been a lot of experimentation with very short programs that synthesize something that sounds like music. I now want to share some information and thoughts about these experiments.

First, some background. On 2011-09-26, I released the following video on Youtube, presenting seven programs and their musical output:


This video gathered a lot of interest, inspiring many programmers to experiment on their own and share their findings. This was further boosted by Bemmu's on-line Javascript utility that made it easy for anyone (even non-programmers, I guess) to jump in the bandwagon. In just a couple of days, people had found so many new formulas that I just had to release another video to show them off.


Edit 2011-10-10: note that there's now a third video as well! http://www.youtube.com/watch?v=tCRPUv8V22o

It all started a couple of months ago, when I encountered a 23-byte C-64 demo, Wallflower by 4mat of Ate Bit, that was like nothing I had ever seen on that size class on any platform. Glitchy, yes, but it had a musical structure that vastly outgrew its size. I started to experiment on my own and came up with a 16-byte VIC-20 program whose musical output totally blew my mind. My earlier blog post, "The 16-byte frontier", reports these findings and speculates why they work.

Some time later, I resumed the experimentation with a slightly more scientific mindset. In order to better understand what was going on, I needed a simpler and "purer" environment. Something that lacked the arbitrary quirks and hidden complexities of 8-bit soundchips and processors. I chose to experiment with short C programs that dump raw PCM audio data. I had written tiny "/dev/dsp softsynths" before, and I had even had one in my email/usenet signature in the late 1990s. However, the programs I would now be experimenting with would be shorter and less planned than my previous ones.

I chose to replicate the essentials of my earlier 8-bit experiments: a wave generator whose pitch is controlled by a function consisting of shifts and logical operators. The simplest waveform for /dev/dsp programs is sawtooth. A simple for(;;)putchar(t++) generates a sawtooth wave with a cycle length of 256 bytes, resulting in a frequency of 31.25 Hz when using the the default sample rate of 8000 Hz. The pitch can be changed with multiplication. t++*2 is an octave higher, t++*3 goes up by 7 semitones from there, t++*(t>>8) produces a rising sound. After a couple of trials, I came up with something that I wanted to share on an IRC channel:

main(t){for(t=0;;t++)putchar(t*(((t>>12)|(t>>8))&(63&(t>>4))));}

In just over an hour, Visy and Tejeez had contributed six more programs on the channel, mostly varying the constants and changing some parts of the function. On the following day, Visy shared our discoveries on Google+. I reshared them. A surprising flood of interested comments came up. Some people wanted to hear an MP3 rendering, so I produced one. All these reactions eventually led me to release the MP3 rendering on Youtube with some accompanying text screens. (In case you are wondering, I generated the screens with an old piece of code that simulates a non-existing text mode device, so it's just as "fakebit" as the sounds are).

When the first video was released, I was still unsure whether it would be possible for one line of C code to reach the sophistication of the earlier 8-bit experiments. Simultaneities, percussions, where are they? It would also have been great to find nice basslines and progressions as well, as those would be useful for tiny demoscene productions.

At some point of time, some people noticed that by getting rid of the t* part altogether and just applying logical operators on shifted time values one could get percussion patterns as well as some harmonies. Even a formula as simple as t&t>>8, an aural corollary of "munching squares", has interesting harmonic properties. Some small features can be made loud by adding a constant to the output. A simple logical operator is enough for combining two good-sounding formulas together (often with interesting artifacts that add to the richness of the sound). All this provided material for the "second iteration" video.

If the experimentation continues at this pace, it won't take many weeks until we have found the grail: a very short program, maybe even shorter than a Spotify link, that synthesizes all the elements commonly associated with a pop song: rhythm, melody, bassline, harmonic progression, macrostructure. Perhaps even something that sounds a little bit like vocals? We'll see.

Hasn't this been done before?

We've had the technology for all this for decades. People have been building musical circuits that operate on digital logic, creating short pieces of software that output music, experimenting with chaotic audiovisual programs and trying out various algorithms for musical composition. Mathematical theory of music has a history of over two millennia. Based on this, I find it quite mind-boggling that I have never before encountered anything similar to our discoveries despite my very long interest in computing and algorithmic sound synthesis. I've made some Google Scholar searches for related papers but haven't find anything. Still, I'm quite sure that at many individuals have come up with these formulas before, but, for some reason, their discoveries remained in obscurity.

Maybe it's just about technological mismatch: to builders of digital musical circuits, things like LFSRs may have been more appealing than very wide sequential counters. In the early days of the microcomputer, there was already enough RAM available to hold some musical structure, so there was never a real urge to simulate it with simple logic. Or maybe it's about the problems of an avant-garde mindset: if you're someone who likes to experiment with random circuit configurations or strange bit-shifting formulas, you're likely someone who has learned to appreciate the glitch esthetics and never really wants to go far beyond that.

Demoscene is in a special position here, as technological mismatch is irrelevant there. In the era of gigabytes and terabytes, demoscene coders are exploring the potential of ever shorter program sizes. And despite this, the sense of esthetics is more traditional than with circuit-benders and avant-garde artists. The hack value of a tiny softsynth depends on how much its output resembles "real, big music" such as Italo disco.

The softsynths used in the 4-kilobyte size class are still quite engineered. They often use tight code to simulate the construction of an analog synthesizer controlled by a stored sequence of musical events. However, as 256 bytes is becoming the new 4K, there has been ever more need to play decent music in the 256-byte size class. It is still possible to follow the constructivist approach in this size class -- for example, I've coded some simple 128-byte players for the VIC-20 when I had very little memory left. However, since the recent findings suggest that an approach with a lot of random experimentation may give better results than deterministic hacking, people have been competing in finding more and more impressive musical formulas. Perhaps all this was something that just had to come out of the demoscene and nowhere else.

Something I particularly like in this "movement" is its immediate, hands-on collaborative nature, with people sharing the source code of their findings and basing their own experimentation on other people's efforts. Anyone can participate in it and discover new, mind-boggling stuff, even with very little programming expertise. I don't know how long this exploration phase is going to last, but things like this might be useful for a "Pan-Hacker movement" that advocates hands-on hard-core hacking to greater masses. I definitely want to see more projects like this.

How profound is this?

Apart from some deterministic efforts that quickly bloat the code up to hundreds of source-code characters, the exploration process so far has been mostly trial-and-error. Some trial-and-error experimenters, however, seem to have been gradually developing an intuitive sense of what kind of formulas can serve as ingredients for something greater. Perhaps, at some time in the future, someone will release some enlightening mathematical and music-theoretical analysis that will explain why and how our algorithms work.

It already seems apparent, however, that stuff like this stuff works in contexts far beyond PCM audio. The earlier 8-bit experiments, such as the C-64 Wallflower, quite blindly write values to sound and video chip registers and still manage to produce interesting output. Media artist Kyle McDonald has rendered the first bunch of sounds into monochrome bitmaps that show an interesting, "glitchy" structure. Usually, music looks quite bad when rendered as bitmaps -- and this applies even to small chiptunes that sound a lot like our experiments, so it was interesting to notice the visual potential as well.


I envision that, in the context of generative audiovisual works, simple bitwise formulas could generate source data not only for the musical output but also drive various visual parameters as a function of time. This would make it possible, for example, for a 256-byte demoscene production to have an interesting and varying audiovisual structure with a strong, inherent synchronization between the effects and the music. As the formulas we've been experimenting with can produce both microstructure and macrostructure, we might assume that they can be used to drive low-level and high-level parameters equally well. From wave amplitudes and pixel colors to layer selection, camera paths, and 3D scene construction. But so far, this is mere speculation, until someone extends the experimentation to these parameters.

I can't really tell if there's anything very profound in this stuff -- after all, we already have fractals and chaos theory. But at least it's great for the kind of art I'm involved with, and that's what matters to me. I'll probably be exploring and embracing the audiovisual potential for some time, and you can expect me to blog about it as well.

Edit 2011-10-29: There's now a more detailed analysis available of some formulas and techniques.

Tuesday, 21 June 2011

The 16-byte frontier: extreme results from extremely small programs.

While mainstream software has been getting bigger and more bloated year after year, the algorithmic artists of the demoscene have been following the opposite route: building ever smaller programs to generate ever more impressive audiovisual show-offs.

The traditional competition categories for size-limited demos are 4K and 64K, limiting the size of the stand-alone executable to 4096 and 65536 bytes, respectively. However, as development techniques have gone forward, the 4K size class has adopted many features of the 64K class, or as someone summarized it a couple of years ago, "4K is the new 64K". There are development tools and frameworks specifically designed for 4K demos. Low-level byte-squeezing and specialized algorithmic beauty have given way to high-level frameworks and general-purpose routines. This has moved a lot of "sizecoding" activity into more extreme categories: 256B has become the new 4K. For a fine example of a modern 256-byter, see Puls by Rrrrola.



The next hexadecimal order of magnitude down from 256 bytes is 16 bytes. Yes, there are some 16-byte demos, but this size class has not yet established its status on the scene. At the time of writing this, the smallest size category in the pouet.net database is 32B. What's the deal? Is the 16-byte limit too tight for anything interesting? What prevents 16B from becoming the new 256B?

Perhaps the most important platform for "bytetros" is MS-DOS, using the no-nonsense .COM format that has no headers or mandatory initialization at all. Also, in .COM files we only need a couple of bytes to obtain access to most of the vital things such as the graphics framebuffer. At the 16-byte size class, however, these "couples of bytes" quickly fill up the available space, leaving very little room for the actual substance. For example, here's a disassembly of a "TV noise" effect (by myself) in fifteen bytes:
addr  bytes     asm
0100 B0 13 MOV AL,13H
0102 CD 10 INT 10H
0104 68 00 A0 PUSH A000H
0107 07 POP ES
0108 11 C7 ADC DI,AX
010A 14 63 ADC AL,63H
010C AA STOSB
010D EB F9 JMP 0108H

The first four lines, summing up to a total of eight bytes, initialize the popular 13h graphics mode (320x200 pixels with 256 colors) and set the segment register ES to point in the beginning of this framebuffer. While these bytes would be marginal in a 256-byte demo, they eat up a half of the available space in the 16-byte size class. Assuming that the infinite loop (requiring a JMP) and the "putpixel" (STOSB) are also part of the framework, we are only left with five (5) bytes to play around with! It is possible to find some interesting results besides TV noise, but it doesn't require many hours from the coder to get the feeling that there's nothing more left to explore.

What about other platforms, then? Practically all modern mainstream platforms and a considerable portion of older ones are out of the question because of the need for long headers and startup stubs. Some platforms, however, are very suitable for the 16-byte size class and even have considerable advantages over MS-DOS. The hardware registers of the Commodore 64, for example, are more readily accessible and can be manipulated in quite unorthodox ways without risking compatibility. This spares a lot of precious bytes compared to MS-DOS and thus opens a much wider space of possibilities for the artist to explore.

So, what is there to be found in the 16-byte possibility space? Is it all about raster effects, simple per-pixel formulas and glitches? Inferior and uglier versions of the things that have already made in 32 or 64 bytes? Is it possible to make a "killer demo" in sixteen bytes? A recent 23-byte Commodore 64 demo, Wallflower by 4mat of Ate Bit, suggests that this might be possible:



The most groundbreaking aspect in this demo is that it is not just a simple effect but appears to have a structure reminiscent of bigger demos. It even has an end. The structure is both musical and visual. The visuals are quite glitchy, but the music has a noticeable rhythm and macrostructure. Technically, this has been achieved by using the two lowest-order bytes of the system timer to calculate values that indicate how to manipulate the sound and video chip registers. The code of the demo follows:
* = $7c
ora $a2
and #$3f
tay
sbc $a1
eor $a2
ora $a2
and #$7f
sta $d400,y
sta $cfd7,y
bvc $7c

When I looked into the code, I noticed that it is not very optimized. The line "eor $a2", for example, seems completely redundant. This inspired me to attempt a similar trick within the sixteen-byte limitation. I experimented with both C-64 and VIC-20, and here's something I came up with for the VIC-20:
* = $7c
lda $a1
eor $9004,x
ora $a2
ror
inx
sta $8ffe,x
bvc $7c

Sixteen bytes, including the two-byte PRG header. The visual side is not that interesting, but the musical output blew my mind when I first started the program in the emulator. Unfortunately, the demo doesn't work that well in real VIC-20s (due to an unemulated aspect of the I/O space). I used a real VIC-20 to come up with good-sounding alternatives, but this one is still the best I've been able to find. Here's an MP3 recording of the emulator output (with some equalization to silent out the the noisy low frequencies).

And no, I wasn't the only one who was inspired by Wallflower. Quite soon after it came out, some sceners came up with "ports" to ZX Spectrum (in 12 or 15 bytes + TAP header) and Atari XL (17 bytes of code + 6-byte header). However, I don't think they're as good in the esthetic sense as the original C-64 Wallflower.

So, how and why does it work? I haven't studied the ZX and XL versions, but here's what I've figured out of 4mat's original C-64 version and my VIC-20 experiment:

The layout of the zero page, which contains all kinds of system variables, is quite similar in VIC-20 and C-64. On both platforms, the byte at the address $A2 contains a counter that is incremented 60 times per second by the system timer interrupt. When this byte wraps over (every 256 steps), the byte at the address $A1 is incremented. This happens every 256/60 = 4.27 seconds, which is also the length of the basic macrostructural unit in both demos.

In music, especially in the rhythms and timings of Western pop music, binary structures are quite prominent. Oldschool homecomputer music takes advantage of this in order to maximize simplicity and efficiency: in a typical tracker song, for example, four rows comprise a beat, four beats (16 rows) comprise a bar, and four bars (64 rows) comprise a pattern, which is the basic building block for the high-level song structure. The macro-units in our demos correspond quite well to tracker patterns in terms of duration and number of beats.

The contents of the patterns, in both demos, are calculated using a formula that can be split into two parts: a "chaotic" part (which contains additions, XORs, feedbacks and bit rotations), and an "orderly" part (which, in both demos, contains an OR operation). The OR operation produces most of the basic rhythm, timbres and rising melody-like elements by forcing certain bits to 1 at the ends of patterns and smaller subunits. The chaotic part, on the other hand, introduces an unpredictable element that makes the output interesting.

It is almost a given that the outcomes of this approach are esthetically closer to glitch art than to the traditional "smooth" demoscene esthetics. Like in glitching and circuit-bending, hardware details have a very prominent effect in "Wallflower variants": a small change in register layout can cause a considerable difference in what the output of a given algorithm looks and sounds like. Demoscene esthetics is far from completely absent in "Wallflower variants", however. When the artist chooses the best candidate among countless of experiments, the judgement process strongly favors those programs that resemble actual demos and appear to squeeze a ridiculous amount of content in a low number of bytes.

When dealing with very short programs that escape straightforward rational understanding by appearing to outgrow their length, we are dealing with chaotic systems. Programs like this aren't anything new. The HAKMEM repository from the seventies provides several examples of short audiovisual hacks for the PDP-10 mainframe, and many of these are adaptations of earlier PDP-1 hacks, such as Munching Squares, dating back to the early sixties. Fractals, likewise producing a lot of detail from simple formulas, also fall under the label of chaotic systems.

When churning art out of mathematical chaos, be that fractal formulas or short machine-code programs, it is often easiest for the artist to just randomly try out all kinds of alternatives without attempting to understand the underlying logic. However, this easiness does not mean that there is no room for talent, technical progress or rational approach in the 16-byte size class. Random toying is just a characteristic of the first stages of discovery, and once a substantial set of easily discoverable programs have been found, I'm sure that it will become much more difficult to find new and groundbreaking ones.

Some years ago, I made a preliminary design for a virtual machine called "Extreme-Density Art Machine" (or EDAM for short). The primary purpose of this new platform was to facilitate the creation of extremely small demoscene productions by removing all the related problems and obstacles present in real-world platforms. There is no code/format overhead; even an empty file is a valid EDAM program that produces a visual result. There will be no ambiguities in the platform definition, no aspects of program execution that depend on the physical platform. The instruction lengths will be optimized specifically for visual effects and sound synthesis. I have been seriously thinking about reviving this project, especially now that there have been interesting excursions to the 16-byte possibility space. But I'll tell you more once I have something substantial to show.

Thursday, 2 June 2011

What should big pixels look like?

There has been some fuss recently about a new algorithm that vectorizes pixel art. And yes, judging from the example pictures, this algorithm by Johannes Kopf and Dani Lischinski indeed seems to produce results superior to the likes of hq*x and scale*x or mainstream vectorization algorithms. Let me duplicate the titular example for reference:
Impressive, yes, but as in case with all such algorithms, the first question that came to my mind was: "But does it manage dithering and antialiasing?". The paper explicitly answers this question: no.

All the depixelization algorithms so far have been succesful only with a specific type of pixel art. Pixel art of a cartoonish style that has clear lines and not too many details. This kind of pixel art may have been mainstream in Japan, but in the Western sphere, especially in Europe, there has been a strong tradition of optimalism: the tendency of maximizing the amount of detail and shading within the limited grid of pixels. An average pixel artwork on the Commodore 64 or the ZX Spectrum has an extensive amount of careful manual dithering. If we wish to find a decent general-purpose pixel art depixelization algorithm, it would definitely need to take care of that.

I once experimented by writing an undithering filter that attempts to smooth out dithering while keeping non-dithering-related pixels intact. The filter works as follows:
  • Flag a pixel as a dithering candidate if it differs enough from its cardinal neighborhood (no more than one of the four neighbors are more similar to the reference pixel than the neighbor average).
  • Extend the area of dither candidates: flag a pixel if at least five of its eight neighbors are flagged. Repeat until no new pixels are flagged.
  • For each flagged pixel, replace its color with the weighed average of all the flagged pixels within the surrounding 3x3 rectangle.
Would it be possible to improve the performance of a depixelization algorithm by first piping the picture thru my undithering filter? Let's try out. Here is an example of how the filter manages with a fullscreen C-64 multicolor-mode artwork (from the demoscene artist Frost of Panda Design) and how the results are scaled by the hq4x algorithm:

The undithering works well enough within the smooth areas, and hq4x is even able to recognize the undithered areas as gradients and smooth them a little bit further. However, when looking at the border between the nose and the background, we'll notice careful manual antialiasing that even adds some lonely dithering pixels to smooth out the staircasing. My algorithm doesn't recognize these lonely pixels as dithering, and neither does it recognize the loneliest pixels in the outskirts of dithered gradients as dithering. It is a difficult task to algorithmically detect whether a pixel is intended to be a dithering pixel or a contour/detail pixel. Detecting antialiasing would be a totally different task, requiring a totally new set of processing stages.

There seems to be still a lot of work to do. But suppose that, some day, we will discover the ultimate depixelization algorithm. An image recognition and rerendering pipeline that succesfully recognizes and interprets contours, gradients, dithering, antialiasing and everything else in all conceivable cases, and rerenders it in a high resolution and color without any distracting artifacts. Would that be the holy grail? I wouldn't say so.

The case is that we already have the ultimate depixelization algorithm -- the one running in the visual subsystem of the human brain. It is able to fill in amazing amounts of detail when coping with low amounts of reliable data. It handles noisiness and blurriness better than any digital system. It can extrapolate very well from low-complexity shapes such as silhouette drawings or groups of blurry dots on a CRT screen.

A fundamental problem with the "unlimited resolution" approach of pixel art upscaling is that it attempts to fill in details that aren't there -- a task in which the human brain is vastly superior. Replacing blurry patterns with crisp ones can even effectively turn off the viewer's visual imagination: a grid of blurry dots in the horizon can be just about anything, but if they get algorithmically substituted by some sort of crisp blobs, the illusion disappears. I think it is outright stupid to waste computing resources and watts for something that kills the imagination.

The reason why pixel art upscaling algorithms exist in the first place is that sharp rectangular pixels (the result of nearest-neighbor upscaling) look bad. And I have to agree with this. Too easily recognizable pixel boundaries distract the viewer from the art. The scaling algorithms designed for video scaling partially solve this problem with their interpolation, but the results are still quite bad for the opposite reason -- because there is no respect for the nature of the individual pixel.

When designing a general-purpose pixel art upscaling algorithm, I think the best route would go somewhere between the "unlimited resolution" approach and the "blurry interpolation" approach. Probably something like CRT emulation with some tasteful improvements. Something that keeps the pixels blurry enough for the visual imagination to work while still keeping them recognizable and crisp enough so that the beauty of the patterns can be appreciated.

Nevertheless, I was very fascinated by the Kopf-Lischinski algorithm, but not because of how it would improve existing art, but for its potential of providing nice, organic and blobby pixels to paint new art with. A super-low-res pixel art painting program that implements this kind of algorithm would make a wonderful toy and perhaps even vitalize the pixel art scene in a new and refreshing way. Such a vitalization would also be a triumph for the idea of Computationally Minimal Art which I have been advocating.

Monday, 15 March 2010

Defining Computationally Minimal Art (or, taking the "8" out of "8-bit")

[Icon Watch designed by &design]

Introduction


"Low-tech" and "8-bit" are everywhere nowadays. Not only are the related underground subcultures thriving, but "retrocomputing esthetics" seems to pop up every now and then in mainstream contexts as well: obvious chip sounds can be heard in many pop music songs, and there are many examples of "old video game style" in TV commercials and music videos. And there are even "pixel-styled" physical products, such as the pictured watch sold by the Japanese company "&design". I'm not a grand follower of popular culture, but it seems to me that the trend is increasing.


The most popular and widely accepted explanation for this phenomenon is the "nostalgia theory", i.e. "People of the age group X are collectively rediscovering artifacts from the era Y". But I'm convinced that there's more to it -- something more profound that is gradually integrating "low-tech" or "8-bit" into our mainstream cultural imagery.


Many people have became involved with low-tech esthetics via nostalgia, but I think it is only the first phase. Many don't experience this phase at all and jump directly to the "second phase", where pixellated graphics or chip sounds are simply enjoyed the way they are, totally ignoring the
historical baggage. There is even an apparent freshness or novelty value for some people. This happens with audiences that are "too young" (like the users of Habbo Hotel) or otherwise more or less unaffected by the "oldskool electronic culture" (like many listeners of pop music).


Since the role of specific historical eras and computer/gaming artifacts is diminishing, I think it is important to provide a neutral conceptual basis for "low-tech esthetics"; an independent and universal definition that does not refer to the historical timeline or some specific cultural technology. My primary goal in this article is to provide this definition
and label it as "Computationally Minimal Art". We will also be looking for support for the universality of Computationally Minimal Art and finding ur-examples that are even older than electricity.


A definition: Computationally Minimal Art


Once we strip "low-tech esthetics" from its historical and cultural connections, we will be left with "pixellated shapes and bleepy sounds" that share an essential defining element. This element stems from what is common to the old computing/gaming hardware in general, and it is perfectly possible to describe it in generic terms, without mentioning specific platforms or historical eras.


[Space Invaders sprite]

The defining element is LOW COMPUTATIONAL COMPLEXITY, as expressed in all aspects of the audiovisual system: the complexity of the platform (i.e. the number of transistors or logic gates in the hardware), the complexity of the software (i.e. the length in bits of the program code and static data), as well as the time complexity (i.e. how many state changes the computational
tasks require). A more theoretical approach would eliminate the differentiation of software and hardware and talk about description/program length, memory complexity and time complexity.


There's little more that needs to be defined; all the important visible and audible features of "low-tech" emerge from the various kinds of low complexity. Let me elaborate with a couple of examples:


  • A low computing speed leads to a low number of processed and output bits per time frame. In video output, this means low resolutions and limited color schemes. In audio output, this means simple waveforms on a low number of discrete channels.

  • A short program+data length, combined with a low processing speed, makes it preferrable to have a small set of small predefined patterns (characters, tiles, sprites) that are extensively reused.

  • A limited amount of temporary storage (emerging from the low hardware complexity) also supports the former two examples via the small amount of available video memory.

  • In general, the various types of low complexity make it possible for a human being (with some expertise) to "see the individual bits with a naked eye and even count them".

In order to complete the definition, we will still have to know what "low" means. It may not be wise to go for an arbitrary threshold here ("less than X transistors in logic, less than Y bits of storage and less than Z cycles per second"), so I would like to define it as "the lower the better". Of course, this does not mean that a piece of low-tech artwork would ideally
constitute of one flashing pixel and static square-wave noise, but that the most essential elements of this artistic branch are those that persist the longest when the complexity of the system approaches zero.


Let me therefore dub the idealized definition of "low-tech art" as Computationally Minimal Art (CMA).


To summarize: "Computationally Minimal Art is a form of discrete art governed by a low computational complexity in the domains of time, description length and temporary storage. The most essential features of Computationally Minimal Art are those that persist the longest when the
various levels of complexity approach zero."


How to deal with the low complexity?


Traditionally, of course, low complexity was the only way to go. The technological and economical conditions of the 1970s and 1980s made the microelectronic artist bump into certain "strict boundaries" very soon, so the art needed to be built around these boundaries regardless of the artist's actual esthetic ideals. Today, on the other hand, immense and virtually non-limiting amounts of computing capacity are available for practically everyone who desires it, so computational minimalism is nearly always a conscious choice. There are, therefore, clear differences in how the low complexity has been dealt with in different eras and
disciplines.


I'm now going to define two opposite approaches to low complexity in computational art: optimalism (or "oldschool" attitude), which aims at pushing the boundaries in order to fit in "as much beauty as possible", and reductivism (or "newschool" attitude), which idealizes the low complexity itself as a source of beauty.


Disclaimer: All the exaggeration and generalization is intentional! I'm intending to point out differences between various extremities, not to portray any existing "philosophies" accurately.


Optimalism


Optimalism is a battle of maximal goals against a minimal environment. There are absolute predefined boundaries that provide hard upper limits for the computational complexity, and these boundaries are then pushed by fitting as much expressive power as possible between them. This approach is the one traditionally applied to mature and static hardware platforms by the
video game industry and the demoscene, and it is characterized by the appreciation of optimization in order to reach a high content density regardless of the limitations.


[Frog, Landscape and a lot of Clouds by oys]

A piece of traditional European-style pixel graphics ("Frog, Landscape and a lot of Clouds" by oys) exemplifies many aspects of optimalism. The resolution and color constraints of a video mode (in this case, non-tweaked C-64
multicolor) provide the hard limits, and it is the responsibility of the artist to fill up the space as wisely and densely as possible. Large single-colored areas would look "unfinished", so they are avoided, and if it is possible to fit in more detail or dithering somewhere, it should be done. It is also avoidable to leave an available color unused -- an idea which leads to the infamous "Dutch color scheme" when applied to high/truecolor video modes.


When applied to chip music, the optimalist dogma tells, among all, to fill in all the silent parts and avoid "simple beeps". Altering the values of as many sound chip registers per frame as possible is thought to be efficient use of the chip. This adds to the richness of the sound, which is though to correlate with the quality of the music.


[Artefacts by Plush]

On platforms such as the Commodore 64, the demoscene and video game industry seem to have been having relatively similar ideals. Once an increased computing capacity becomes available, however, an important difference between these cultures is revealed. Whenever the video game
industry gets more disk space or other computational resources, it will try to use it up as aggressively as possible, without starting any optimization efforts until the new boundaries have been reached. The demoscene, on the other hand, values optimality and content density so much that it often prefers to stick to old hardware or artificial boundaries in order to keep the "art of optimality" alive. The screenshot is from the 4K demo "Artefacts" by Plush (C-64).


Despite the cultural differences, however, the core esthetic ideal of optimalism is always "bigger is better"; that an increased perceived content complexity is a requirement for increased beauty. Depending on the circumstances, more or less pushing of boundaries is required.


Reductivism


Reductivism is the diagonal opposite of optimalism. It is the appreciation of minimalism within a maximal set of possibilities, embracing the low complexity itself as an esthetic goal. The approach can be equated with the artistic discipline of minimal art, but it should be remembered that the idea is much older than that. Pythagoras, who lived around 2500 years ago, already appreciated the role of low complexity -- in the form of mathematical beauty such as simple numerical ratios -- in music and art.


The reductivist approach does not lead to a similar pushing of boundaries as optimalism, and in many cases, strict boundaries aren't even introduced. Regardless, a kind of pushing is possible -- by exploring ever simpler structures and their expressive power -- but most reductivists don't seem to be interested in this aspect. It is usually enough that the output comes out as "minimal enough" instead of being "as minimal as possible".


[VVVVVV by Terry Cavanagh]

The visuals of the recent acclaimed Flash-based platformer game, VVVVVV, are a good example of computational minimalism with a reductivist approach. The author, Terry Cavanagh, has not only chosen a set of voluntary "restrictions" (reminiscent of mature computer platforms) to guide the
visual style, but keeps to a reductivist attitude in many other aspects as well. Just look at the "head-over-heels"-type main sprite -- it is something that a child would be able to draw in a minute, and yet it is perfect in the same iconic way as the Pac-Man character is. The style totally serves its purpose: while it is charming in its simplicity and downright naivism, it
shouts out loud at the same time: "Stop looking at the graphics, have fun with the actual game instead!"


[Thrust]

Although reductivism may be regarded as a "newschool" approach, it is possible to find some slightly earlier examples of it as well. The graphics of the 1986 computer game Thrust, for example, has been drawn with simple geometrical lines and arcs. The style is reminiscent of older vector-based arcade games such as Asteroids and Gravitar, and it definitely serves a technical purpose on such hardware. But on home computers with bitmapped screens and sprites, the approach can only be an esthetical one.


Optimalism versus Reductivism


Optimalism and reductivism sometimes clash, and an example of this can be found in the chip music community. After a long tradition of optimalism thru the efforts of the video game industry and the demoscene, a new kind of cultural branch was born. This branch, sometimes mockingly called
"cheaptoon", seems to get most of its kicks from the unrefined roughness of the pure squarewave rather than the pushing of technological and musical boundaries that has been characteristic of the "oldschool way". To an optimalist, a reductivist work may feel lazy or unskilled, while an
optimalist work may feel like "too full" or "too refined" to a reductivist mindset.


Still, when working within constraints, there is room for both approaches. Quite often, an idea is good for both sides; a simple and short algorithm, for example, may be appreciated by an optimalist because the saved bytes leave room for something more,, while a reductivist may regard
the technical concept as beautiful on its own right.


Comparison to Low-Complexity Art


Now I would like to compare my definition of Computationally Minimal Art to another concept with a somewhat similar basis: Jürgen Schmidhuber's Low-Complexity Art.


[A low-complexity face picture by Juergen Schmidhuber]

While CMA is an attempt to formalize "low-tech computer art", Schmidhuber's LCA comes from another direction, being connected to an ages-old tradition that attempts to define beauty by mathematical simplicity. The specific mathematical basis used in Schmidhuber's theory is Kolmogorov complexity, which defines the complexity of a given string of information (such as a picture) as the length of the shortest computer program that outputs it. Kolmogorov's theory works on a high level of generalization, so the choice of language does not matter as long as you
stick to it.


Schmidhuber sees, in "down-to-earth coder terms", that the human mind contains a built-in "compressor" that attempts to represent sensory input in a form as compact as possible. Whenever this compression process succeeds well, the input is perceived as esthetically pleasing. It is a well-studied fact that people generally perceive symmetry and regularity as more beautiful than unsymmetry and irregularity, so this hypothesis of a "mental compressor" cannot be dismissed as just an arbitrary crazy idea.


Low-Complexity Art tests this hypothesis by deliberately producing graphical images that are as compressible as possible. One of the rules of LCA is that an "informed viewer" should be able to perceive the algorithmic simplicity quite easily (which also effectively limits the time complexity of the algorithm, I suppose). Schmidhuber himself has devised a system based
on indexed circle segments for his pictures.


[Superego by viznut/pwp]

The above picture is from "Superego", a tiny pc demo I made in 1998. The picture takes some tens of bytes and the renderer takes less than 100 bytes of x86 code. Unfortunately, there is only one such picture in the demo, although the 4K space could have easily contained tens of pictures. This is because the picture design process was so tedious and counter-intuitive --
something that Schmidhuber has encountered with his own system as well. Anyway, when I encountered Schmidhuber's LCA a couple of years after this experiment, I immediately realized its relevance to size-restricted demoscene productions -- even though LCA is clearly a reductivist approach as opposed to the optimalism of the mainstream demoscene.


What Low-Complexity Art has in common with Computationally Minimal Art is the concern about program+data length; a minimalized Kolmogorov complexity has its place in both concepts. The relationship with other types of complexity is different, however. While CMA is concerned about all the types of complexity of the audiovisual system, LCA leaves time and memory complexity out of the rigid mathematical theory and into the domain of a "black box" that processes sensory input in the human brain. This makes LCA much more theoretical and psychological than CMA, which is mostly concerned about "how the actual bits move". In other words, LCA makes you look at
visualizations of mathematical beauty and ignore the visualization process, while CMA assigns an utmost importance to the visualizer component as well.


Psychological considerations


Now, an important question: why would anyone want to create Computationally Minimal Art for purely esthetical reasons -- novelty and counter-esthetic values aside? After all, those "very artificial bleeping sounds and flashing pixels" are quite alien to an untrained human mind, aren't they? And even many fans admit that a prolonged exposure to those may cause headache.


It is quite healthy-minded to assume that the perception mechanisms of the human species, evolved during hundreds of millions of years, are "optimized" for perceiving the natural world, a highly complex three-dimensional environment with all kinds of complex lighting and shading conditions. The extremely brief technological period has not yet managed to alter the "built-in defaults" of the human mind anyhow. Studies show, for example, that people all over the world prefer to be surrounded by wide-open landscapes with some water and trees here and there -- a preference that was fixed to our minds during our millions of years on the African savannah.


[Synchiropus splendidus, photographed by Luc Viatour]

So, the untrained mind prefers a photorealistic, high-fidelity sensory input, and that's it? No, it isn't that simple, as the natural surroundings haven't evolved independently from the sensory mechanisms of their inhabitants. Fruits and flowers prefer to be symmetric and vivid-colored because animals prefer them that way, and animals prefer them that way because it is beneficial for their survival to like those features, and so on. The natural world is full of signalling which is a result of millions of years of coevolutionary feedback loops, and this is also an important source for our own sense of esthetics. (The fish in the picture, by the way, is a Synchiropus splendidus, photographed by Luc
Viatour
.)


I'm personally convinced that natural signalling has a profound preference for low complexity. Symmetries, regularities and strong contrasts are important because they are easy and effortless to detect, and the implementation requires a relatively low amount of genetic coding on both
the "transmitter" and "receiver" sides. These are completely analogous to the various types of computational complexity.


So, why does enjoying Computationally Minimal Art require "mental training" in the first place? I think it is not because of the minimality itself but because of certain pecularities that arise from the higher complexity of the natural world. We can't see individual atoms or even cells, so we haven't evolved a built-in sense for pixel patterns. Also, the sound generation
mechanisms in nature are mostly optimized to the constraints of pneumatics rather than electricity, so we don't really hear squarewave arpeggios in the woods (although some birds may come quite close).


But even though CMA requires some special adjustment from the human mind, it is definitely not alone in this area. Our cultural surroundings are full of completely unnatural signals that need similar adjustments. Our music uses instruments that sound totally different from any animal, and
practically all musical genres (apart from the simplest lullabies, I think) require an adjustment period. So, I don't think there's nothing particularly "alien" in electronic CMA apart from the fact that it still hasn't yet integrated in our mainstream culture.


CMA unplugged


The final topic we cover here is the extent where Computationally Minimal Art, using our strict definition, can be found. As the definition is independent from technology, it is possible to find ur-examples that predate computers or even electricity.


In our search, we are ignoring the patterns found in the natural world because none of them seem to be discrete enough -- that is, they fail to have "human-countable bits". So, we'll limit ourselves to the artifacts found in human culture.


[Bubble Bobble cross-stitch from spritestitch.com

Embroidery is a very old area of human culture that has its own tradition of pixel patterns. I guess everyone familiar with electronic pixel art has seen cross-stitch works that immediately bring pixel graphics in mind. The similarities have been widely noted, and there have been href="http://www.spritestitch.com/">quite many craft projects inspired by old video games. But is this just a superficial resemblance or can we count it as Computationally Minimal Art?


[Traditional monochrome bird patterns in cross-stitch]

Cross-stitch patterns are discrete, as they use a limited set of colors and a rigid grid form which dictates the positions of each of the X-shaped, single-colored stitches. "Individual bits are perceivable" because each pixel is easily visible and the colors of the "palette" are usually easy to tell apart. The low number of pixels limits the maximum description length, and one doesn't need to keep many different things in mind while working either. Thus, cross-stitch qualifies all the parts of the definition of Computationally Minimal Art.


What about the minimization of complexity? Yes, it is also there! Many traditional patterns in textiles are actually algorithmic or at least highly repetitive rather than "fully hand-pixelled". This is somewhat natural, as the old patterns have traditionally been memorized, and the memorization is much easier if mnemonic rules can be applied.


There are also some surprising similarities with electronic CMA. Many techniques (like knitting and weaving) proceed one complete row of "pixels" at a time (analogous to the raster scan of TV-like displays), and often, the set of colors is changed between rows, which is corresponds very well to the use of raster synchronization in oldschool computer graphics. There are even peculiar technique-specific constraints in color usage, just like there are similar constraints in many old video chips.


[Pillow from 'Introduction to Fair Isle']

The picture above (source) depicts a pillow knitted with the traditional Fair Isle technique. It is apparent that there are two colors per "scanline", and these colors are changed between specific lines (compare to rasterbars). The patterns are based on sequential repetition, with the sequence changing on a per-scanline basis.


Perhaps the most interesting embroidery patterns from the CMA point of view are the oldest ones that remain popular. During centuries, the traditional patterns of various cultures have reached a kind of multi-variable optimality, minimizing the algorithmical and technical complexity while maximizing the eye-pleasingness of the result. These patterns may very well
be worth studying by electronic CMA artists as well. Things like this are also an object of study for the field of ethnomathematics, so that's another word you may want to look up if you're interested.


What about the music department, then? Even though human beings have written music down in discrete notation formats for a couple of millennia already, the notes alone are not enough for us. CMA emphasizes the role of the rendering, and the performance therefore needs to be discrete as well. As it seems that every live performance has at least some non-discrete variables, we will need to limit ourselves to automatic systems.


[A musical box]

The earliest automatic music was mechanical, and arguably the simplest conceivable automatic music system is the musical box. Although the musical box isn't exactly discrete, as the barrel rotates continuously rather than stepwise, I'm sure that the pins have been positioned in an engineer's accuracy as guided by written music notation. So, it should be discrete enough to satisfy our demands, and we may very well declare the musical box as being the mechanical counterpart of chip music.


Conclusion


I hope these ideas can provide food for thought for people interested in the various forms of "low-tech" electronic art as well as computational art or "discrete art" in general. I particularly want people to realize the universality of Computationally Minimal Art and how it works very well outside of the rigid "historical" contexts it is often confined into.


I consciously skipped all the cultural commentary in the main text on my quest for proving the universality of my idea, so perhaps it's time for that part now.


In this world of endless growth and accumulation, I see Computationally Minimal Art as standing for something more sustainable, tangible and crafty than what the growth-oriented "mainstream cultural industry" provides. CMA represents the kind of simplicity and timelessness that is totally immune to the industrial trends of fidelity maximization and planned obsolescence. It is something that can be brought to a perfection by an individual artist,
without hiring a thousand-headed army of specialists.


As we are in the middle of a growth phase, we can only guess what kind of forms Computationally Minimal Art will get in the future, and what kind of position it will eventually acquire in our cultural framework. We are living interesting times indeed.