Nanite world
My article about the soup is here.
nanite, n. an electromechanical device measured in nanometers (microscopic size) and being developed for medical uses
In the software sense, nanites could be basic units of computation: simple, mobile entities that each execute one function. The outer layer of a nanite would be its interface, built of 1s and 0s. Two nanites could "couple" by matching a limited portion of their interfaces in complement (1 to 0 and 0 to 1). When one nanite couples with another, it executes the new nanite's function before its own. Nanite code can be written in assembly language, since the instructions can be mutated to produce new functionality. The operating system then becomes a nanite factory where nanites are grown to perform certain functions. We call this "directed evolution". Software is then written as a body of nanites that interface with each other in various ways to achieve the desired functionality. Since nanites are mobile and the operating system is nothing but a container, work can be distributed across machines and natural parallelization can occur. Nanites can also be trained for specific tasks, such as software repair. Linux is a suitable operating system on which to implement this idea. Work in progress.
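The coupling rule can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the `Nanite` class, the 4-bit coupling width, and the choice to match the tail of one interface against the head of the other are all hypothetical, since the idea above does not fix any of them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Nanite:
    interface: str                      # outer layer: a string of '0' and '1'
    function: Callable[[int], int]      # the one function this nanite executes
    partner: Optional["Nanite"] = None  # nanite coupled to this one, if any

    def can_couple(self, other: "Nanite", width: int = 4) -> bool:
        """True if the last `width` bits of self complement the first `width` bits of other."""
        mine = self.interface[-width:]
        theirs = other.interface[:width]
        if len(mine) != width or len(theirs) != width:
            return False
        return all(a != b for a, b in zip(mine, theirs))

    def couple(self, other: "Nanite", width: int = 4) -> bool:
        """Attach `other` if the interfaces match in complement (1 to 0, 0 to 1)."""
        if self.can_couple(other, width):
            self.partner = other
            return True
        return False

    def run(self, value: int) -> int:
        # The coupled nanite's function executes before this nanite's own.
        if self.partner is not None:
            value = self.partner.run(value)
        return self.function(value)


doubler = Nanite("10110011", lambda x: x * 2)
adder = Nanite("11001010", lambda x: x + 1)
doubler.couple(adder)   # tail "0011" complements head "1100"
result = doubler.run(5)  # adder runs first, then doubler
```

Here the coupled pair behaves like function composition, which is one simple way to read "a body of nanites interfacing with each other".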
Code search
This is a possible improvement to Google's code search (http://code.google.com/). In the field of bioinformatics there is a tool called BLAST (Basic Local Alignment Search Tool), which is used to find similarities between protein or nucleotide sequences. The paper is here. It is (usually) used when a biologist has isolated a sequence whose function is unknown. This sequence is then compared against a database of known sequences (this database is curated manually). If a statistically significant hit is produced, we can (maybe) conclude that the two sequences have similar functionality. I have been looking into ways of translating this into the field of code search. Perhaps we can find a statistically justified way of "signing" code functions in a certain language (like C, assembly or Python) and then create a database of these signatures with human-curated descriptions of their functionality. Then we can have a program take the source code of some software and run it against this database. Perhaps our program will then be able to "explain" in human terms what the software we are analyzing is doing? Work in progress.
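One crude way to picture "signing" a function is sketched below. It is not BLAST and carries no statistical significance test: it swaps in plain token k-gram sets compared by Jaccard similarity, purely as a stand-in. The function names, the tiny keyword list and the two-entry database are all made up for illustration.

```python
import re

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Deliberately tiny keyword list (an assumption, not a full C grammar).
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}


def signature(source, k=3):
    """Return the set of k-grams over tokens, with identifier names normalized away."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")   # variable and function names do not matter
        else:
            tokens.append(tok)    # numbers and punctuation are kept as-is
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a, b):
    """Jaccard similarity between two signatures (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(source, database, k=3):
    """Return (score, description) of the closest curated entry; database must be non-empty."""
    query = signature(source, k)
    scored = [(similarity(query, signature(code, k)), desc)
              for desc, code in database.items()]
    return max(scored, key=lambda s: s[0])


# Hypothetical curated database: description -> reference implementation.
database = {
    "sum of array elements":
        "int sum(int *a, int n){int s=0; for(int i=0;i<n;i++) s+=a[i]; return s;}",
    "string length":
        "int len(char *p){int n=0; while(*p++) n++; return n;}",
}
query = "int total(int *v, int count){int t=0; for(int j=0;j<count;j++) t+=v[j]; return t;}"
score, description = best_match(query, database)
```

A real version would need a scoring scheme with a significance estimate, which is the hard part that the BLAST analogy points at.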
Block Diagrammer
This could be an improvement to the way code is written (or maybe to the way programming is taught): automatic block-diagramming of code. The idea is nothing new, but I am not sure I have seen a good diagrammer out there. For studying large projects (such as the Linux kernel), people resort to all sorts of trickery. Maybe it is vi+ctags, maybe it is emacs, maybe just editor+grep; either way, the 30,000-foot view is in the connections between the various functions (who calls whom and what is being passed on). So, below is a C tokenizer written in Python (pretty manual labor, but it is a quick write) to start it off. The next step would be to produce a full parser that will be integrated into a block-diagram-producing module. Obviously, the ultimate test would be to run it on the Linux kernel code or something similarly large and complicated. Here is a C version of the tokenizer. It is obviously missing a parser, but I hand-coded the tokenizer (which wasn't difficult).
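To give a feel for how far a tokenizer alone can go toward the "who calls whom" view, here is a minimal regex-based sketch. It is not the tokenizer linked above; the token classes, the keyword list and the `call_edges` heuristic (an identifier followed by "(" inside braces counts as a call) are assumptions made for illustration, and the heuristic will be fooled by macros, function pointers and the preprocessor.

```python
import re

TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("STRING",  r'"(?:\\.|[^"\\])*"'),
    ("CHAR",    r"'(?:\\.|[^'\\])*'"),
    ("NUMBER",  r"\d+(?:\.\d+)?"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"->|\+\+|--|<<|>>|[<>=!+\-*/%&|^]=|&&|\|\||[-+*/%=<>!&|^~?:;,.(){}\[\]]"),
    ("SKIP",    r"\s+"),
    ("OTHER",   r"."),   # anything unrecognized, e.g. '#'
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.DOTALL)

# Keywords that can precede '(' without being a function call.
C_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}


def tokenize(code):
    """Yield (kind, text) pairs, dropping whitespace and comments."""
    for m in MASTER.finditer(code):
        if m.lastgroup not in ("SKIP", "COMMENT"):
            yield m.lastgroup, m.group()


def call_edges(code):
    """Return sorted (caller, callee) pairs found by a brace-depth heuristic."""
    tokens = list(tokenize(code))
    edges, depth, current = set(), 0, None
    for (kind, text), (_, following) in zip(tokens, tokens[1:]):
        if text == "{":
            depth += 1
        elif text == "}":
            depth -= 1
        elif kind == "IDENT" and following == "(" and text not in C_KEYWORDS:
            if depth == 0:
                current = text                 # a definition at file scope
            elif current is not None:
                edges.add((current, text))     # a call inside a function body
    return sorted(edges)


SAMPLE = """
int helper(int x) { return x + 1; }
/* main entry */
int main(void) {
    int y = helper(2);
    if (y > 2) printf("%d", y);
    return 0;
}
"""
edges = call_edges(SAMPLE)
```

The resulting edge list is already enough to feed a graph-drawing tool, though a proper parser is still needed for anything as tangled as kernel code.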
ClustalW multi-threaded
Below is a parallel version of the ClustalW tool for multiple sequence alignment. ClustalW is a widely used tool but is unfortunately sequential in nature (which makes it run for a very long time). There is a binary version that was sold by SGI (no code, sorry) and ran on Irix. There is also an MPI version out there for clusters. My version is a simple pthreads implementation for SMP machines. It has been cited in a number of scientific papers (here, here and here).
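The general shape of the parallelization can be shown with a toy example. ClustalW begins by comparing every pair of sequences, and those comparisons are independent of each other, so they can be handed to a pool of workers. The sketch below is in Python rather than C with pthreads, it assumes this all-pairs stage is the part being parallelized, and `pairwise_distance` is a crude placeholder (fraction of mismatched positions), not a real alignment score.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pairwise_distance(a, b):
    """Toy stand-in for an alignment score: mismatches plus length difference, normalized."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches / n


def distance_matrix(seqs, workers=4):
    """Fill a symmetric distance matrix, computing each pair in a worker thread."""
    n = len(seqs)
    pairs = list(combinations(range(n), 2))
    matrix = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with `pairs`.
        results = pool.map(lambda p: pairwise_distance(seqs[p[0]], seqs[p[1]]), pairs)
        for (i, j), d in zip(pairs, results):
            matrix[i][j] = matrix[j][i] = d
    return matrix


matrix = distance_matrix(["ACGT", "ACGA", "TCGA"])
```

In CPython the threads here will not run pure-Python work in parallel because of the global interpreter lock; the point is only the decomposition into independent pair jobs, which maps directly onto pthreads on an SMP machine.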
GACrack
Cracking the crypt(3) algorithm with genetic algorithms. The approach might be okay, but I need to work out the math (can it be done? probably not). Anyway, the code in Python is here.
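For readers unfamiliar with the approach, the loop of a genetic algorithm looks roughly like the sketch below. It is not the code linked above. Two things are assumptions on my part: SHA-256 from `hashlib` stands in for crypt(3), and fitness is the number of digest characters that match the target position by position. A good hash gives that fitness almost no useful gradient, which is exactly the doubt raised about the math.

```python
import hashlib
import random
import string

ALPHABET = string.ascii_lowercase


def digest(candidate):
    """Stand-in for crypt(3): SHA-256 hex digest (an assumption, not the real target)."""
    return hashlib.sha256(candidate.encode()).hexdigest()


def fitness(candidate, target):
    """Number of digest characters matching the target, position by position."""
    return sum(a == b for a, b in zip(digest(candidate), target))


def mutate(candidate, rng, rate=0.2):
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                   for c in candidate)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length candidates."""
    if len(a) < 2:
        return a
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target, length, rng, pop_size=50, generations=100):
    """Return the fittest candidate found; it is not guaranteed to match the target."""
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, target), reverse=True)
        if digest(pop[0]) == target:
            break
        parents = pop[:pop_size // 2]          # keep the fitter half
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, target))


rng = random.Random(0)
best = evolve(digest("ab"), length=2, rng=rng, generations=20)
```

On a two-letter password this is little better than random search, and the search space only gets worse from there.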
Cellular Automata and the Stock Market
In my daily work I see a lot of stock market data. Here is how the Standard and Poor's 500 index looks across one year (as a cellular automaton). The red dots are negative returns and the white dots are returns that were positive or unchanged. Black lines represent a stock that fell out of the index due to a corporate action (see more on cellular automata below). This image was generated with a little Python script using the Python Imaging Library (PIL). Pretty cool, huh?
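The color mapping is simple enough to sketch. This is not the original script: the layout (one row per trading day, one column per stock), the use of `None` for a stock that has left the index, and the function names are assumptions. Only the red/white/black scheme comes from the description above.

```python
RED, WHITE, BLACK = (255, 0, 0), (255, 255, 255), (0, 0, 0)


def return_color(r):
    """Map one daily return to a pixel color; None marks a stock no longer in the index."""
    if r is None:
        return BLACK
    return RED if r < 0 else WHITE


def returns_to_pixels(returns_by_day):
    """Turn rows of daily returns (one row per day, one column per stock) into RGB rows."""
    return [[return_color(r) for r in day] for day in returns_by_day]


def save_image(pixels, path):
    """Write the pixel grid to an image file (requires Pillow, the maintained PIL fork)."""
    from PIL import Image
    height, width = len(pixels), len(pixels[0])
    img = Image.new("RGB", (width, height))
    img.putdata([p for row in pixels for p in row])
    img.save(path)


# Made-up returns for three stocks over two days; the third has left the index.
pixels = returns_to_pixels([[0.01, -0.02, None],
                            [0.00, 0.03, None]])
```

Calling `save_image(pixels, "sp500.png")` would then write the picture, with a dropped stock showing up as a black vertical line.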
Cellular Automaton view of DNA evolution
My project at the Wolfram Science Summer School was one that had been bugging me forever (and still does): can we model DNA or RNA evolution as a cellular automaton rule? Dr. Stephen Wolfram writes in his book "A New Kind of Science" about various rules that create essentially four classes of behavior among cellular automata. One class is behavior that quickly ends up being repetitive (and uninteresting). The second class is nested repetitive behavior, which is essentially as uninteresting as the first class. Classes three and four, however, exhibit complex, random and intricate patterns that develop, go away and/or persist in the visual representation of the automaton's evolution. It is these two classes that are very interesting, because they seem to indicate that complex behavior can be created out of very simple rules and even simpler initial conditions. So, the basic question for me is: are we today the result of a simple rule or set of rules applied to an initial simple DNA chain over and over for billions of years? Can we model evolution as a cellular automaton or a set of automata working together? One obvious problem is that we do not have any DNA molecules that are ancient (well, that has changed a bit with the discovery of a 70-million-year-old T-Rex bone containing the original undamaged tissues). Having no ancient DNA of, for example, yeast, we cannot run models to compare it to today's yeast (which has, by the way, gone through several genome duplications/deletions). As Dr. Wentian Li pointed out in our private email exchange, perhaps we can have more success using chimpanzee and human genes, since the speciation event happened "only" 5 million years ago or so. I have recently been discussing a different approach to this problem with Dr. Dong Xu. Work in progress.
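For anyone who has not played with these rules, a one-dimensional, two-state automaton fits in a dozen lines. The sketch below uses Wolfram's standard rule numbering (rule 30 is a well-known example with random-looking output) and treats cells beyond the edges as 0, which is a boundary choice of mine. A DNA version would need four states (A, C, G, T) instead of two, so this is only the simplest relative of the model asked about above.

```python
def step(cells, rule):
    """Apply an elementary cellular automaton rule (0-255) once; cells outside the row are 0."""
    n = len(cells)
    out = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        # The three neighbors form a 3-bit index into the bits of the rule number.
        index = (left << 2) | (cells[i] << 1) | right
        out.append((rule >> index) & 1)
    return out


def run_automaton(cells, rule, steps):
    """Return the list of rows: the initial row followed by `steps` successive updates."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(step(history[-1], rule))
    return history


rows = run_automaton([0, 0, 0, 1, 0, 0, 0], rule=30, steps=2)
for row in rows:
    print("".join("#" if c else "." for c in row))
```

Each row depends only on the row before it, which is the property that makes "one rule applied over and over" such a tempting picture of sequence evolution.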
contact me at ognen @ naniteworld.com