Research Projects
Last updated: Wednesday, 26-Apr-2006 12:40:51 MDT  

Program Furcation

Distributing the Feature Load

Abstract

Microcontroller-level programming involves non-functional requirements and constraints that directly affect an application's capabilities. These constraints include RAM and storage shortage, available processing, and limited device I/O. Demands of features on distributed real-time embedded (DRE) systems is increasing, and while some DRE system's capabilities expand, the tiny, resource-limited devices will always have a niche in pervasive computing. The costs of implementing and validating such systems follow typical software engineering understanding where programming and testing far exceed the cost of the device.

In exchange of programming to the constraints of each device, a programmer can design the application, ignoring the device limitations and focusing on the whole system requirements. Then, a special compiler can automatically split ("furcate") the program into many executables. The individual execution unit contains a portion of the whole system, and each unit carries the algorothmic support to communicate with other units. Program furcation automatically distributes resource load through smaller interconnected executables, helping field-scientists simplify the design and testing of embedded system devices. A furcated program will be more reliable, because clusters of devices in the field will perform as single program systems, checking and validating each other's input.

 
Problem Statement
Years ago, programs had to fit within the confines of a platform with constrained resources. There were several work-arounds to solve this problem: prioritize and scale back the requirements, hand-code assembler to optimize speed and/or footprint, compress storage, etc. Most of the techniques to economize the resources were implemented by those with years of computing experience. While this problem evaporated as the resource limitations lifted, the problem still remains wholly unsolved.
 
Core Problems
  1. At present, pervasive devices require more resources (CPU performance, available RAM/ROM/flash, I/O) than are available.
  2. Resources impose loads on each other. For example, a storage resource (like flash) and a communication module (like radio) require special algorithms (fireware space) and dedicated CPU time. Adding resources can thus increase the utilization of others.
  3. In the future, despite advances in miniaturization and hybridization of components which increase resources, there will always be a need for low-resource devices/platforms.
 
Possible Solution: Automatic Program Furcation
In the domain of pervasive computing where each device uses a radio transponder, one possible solution is to break-up each assigned task (vertical feature) and place it on individual devices. The devices request information or operations from peer devices. Such an approach implies that a fully functional "device" comprises more than one device in a field.

     Thesis: Program furcation will automatically distribute resource load through smaller interconnected executables, helping field-scientists simplify the design and testing of embedded system devices.
 
What It is Not
Offloading Unlike Spectra which offloads CPU- or storage-intensive tasks on to a host computer, Furcated Programming distributes the tasks between related devices in a network cluster. The devices, then, act as one device outside their cluster.
Parallelization Increasing parallelism is not the main objective. Instead, the goal of this process is to distribute payload or workload onto multiple devices, and using the intrinsic communication capability, get the devices to work as one application. The reason is simply to address the cost vs. capability issue. The cost of manufacture of the "smart dust" will continue to decrease, but the cost of internal features will rise dramatically as the applications become more complex.
 

Program Furcation Examples
To illustrate the need for Program Furcation here are some constraint types:
ConstraintType*Action
Memory Large ExecutableAModules split into inter-communicating blocks.
Large Transient DataBModules split into inter-communicating blocks.
CPU Extra-long algorithmCUnit dedicated to computation *or* computation split off.
Unschedulable tasks statically foundDBumped tasks split off *or* priority balanced between nodes.
Device I/OHigher data resolutionEEach unit shares responsibility & primary node collects info.
Readings verificationFAll readings checked for reliability & common reading used.
Storage Rolling data logsGDistribution/redundancy to counteract flash aging.
Accumulated data storageH?? (difficult to determine at build time due to DB variability)
Augmented functionalityIReplacement/additional of related functionality.
Examples
AEmbedded CORBA application; Fourier algorithm (for purpose, see C).
BRulebase that predicts geologic major disturbances using seismic readings.
CFourier transform on data readings looking for unusual spikes in data transitions.
DA system that requires a lot of I/O and several small tasks.
ECollection of all light sensors for better resolution. (This is not a very good example for many reasons.)
FNeeding a very reliable data reading, all the sensors could be read and the odd ones discarded. (Not a very good example either.)
GInformational or statistical data with a sliding window which does not require long-term storage.
HLong-term storage for later collection like data readings.
IUpdated/patched program code or data tables. This is especially a problem, because of the resources involved; specifically, because of the new features, is one mote reprogrammed or the entire cluster? Another difference is it's implicitly in the field.
 

What Does the Programming Environment Look Like?
There are, of course, boundaries to the proposed solution. First, the device must have some broadcast communication device; second, the communication must be balanced against the corresponding power consumption (max-flow/min-cut); and third, not all datatypes are good transmission material. The imposed boundaries might begin to rule out out-of-the-box programming languages, and the changes required to the languages are an important part of the research.
Issues
  • Data: Pointers — The problem of transmitting pointers has been partly solved as early as UNIX RPCs where the pointer is 'copied' along with its data using a deep copy. The new 'pointer' becomes an offset into the data block. Each pointer is followed and copied into a block of offsets (care must be taken for security implications). At the same time, arrays and large linked structures (records) may result in the Fat Data issue. The compiler provides the proper detail needed for initiating the furcate function, hence the focus on compiler- level implementation. Anonymous data (void*) and some unions unavoidably excluded from support, because strong types must be known ahead of time. Similarly, pointer aliasing is reportedly problematic.
  • Data: Fat Data — Fat data (large data block) is strongly discouraged but may still be unavoidable. In most cases, the data being transmitted is fairly simple in type. Still, there may be arrays of that type. An option of compressing the fat data can help reduce the transmission flow.
  • Data: Strong Coupling — Since the idea is to use a max-flow/min-cut approach, the hope is that there will be loose coupling along the connection. Still, even strong coupling may be required. Therefore, the coupling is treated like fat data and packed together as a package. The receiver unpacks the data for the algorithm to use.
  • Networking: Transmission Loss — Transmission loss is a solved problem in connected/unconnected networks using sequencing and retransmission.
  • Networking: Message Echo — Again, this problem is solved using well-defined networking protocols.
  • Networking: Dead/Sleeping Destination — The dead destination problem may be solvable because other nearby devices may provide similar services. Still, the nodes should be synchronized so that they all are awake at the same time, or at least have a wake-up procedure. Otherwise, the requesting node will have to incorporate a timeout and recovery state machine.
 
Ideas
  • Contracts — The connection between each program slice would require foundational handshaking and interfacing datatypes. These can be set up in a contract at the beginning of the interface negotiation. As needed, a program slice could support different contracts based upon the requesters' needs. For instance, the data interface could either be integers or strings.
  • Resource Sharing — Using this concept in "smart dust" opens opportunities of resource sharing where if there are features A, B, C, the combined group of primary devices (A) could share nearby B and C devices. This might extend the life of the units by reducing singular P2P communications.
 

Target Program Sections
The concept of Program Furcation depends on two main factors: data encapsulation and program modularity. Of course, some programming languages are somewhat behaviorly challenged in this regard, so it's hard to stick to any promises. The following table describes the different types of program transitions and identifies their suitability for program furcation A and B represent separate and distinct modules, and the split point is represented with a lightning mark.
Straight-line Transition
The calling sequence from A to B is a good candidate, because it has a clean interface.
Transition into Loop/Recursion
A call from A to B which then recurses or loops is a potentially good candidate unless B neither concludes nor returns.
Mutual-Recursion
A mutually recursive call between A and B may not be a suitable case because of lengthy data exchange unless enough computational time passes between transmission (difficult to do with static analysis).
Root Loop/Recursion
Splitting a recursive A call is clearly not useful, because each target will have the same code (A). However, sharing responsibilities, like verification measures or duplicate sensor readings, might use this type of algorithmic split.
Another way to look at the problem is by call-tree analysis. The program is divided into steps (labeled A, B, C), transitions (arrows), and data (labeled X). The goal is to identify the best place to break a transition.
A Transitions to B which Accesses data X
Simple encapsulation. B provides read/write accessors; data path revolves around X.
A Transitions to B which Transititions to C
Simple call tree path. Data path depends on what B & C requires and how much entanglement there is between A & B.
A Transitions to B which Multiplexes to C1 - Cn
Common in normal call tree growth. (Typical data transfer is TBD.)
Many 'A's Transition to B which Transitions to C
This is common when a program makes library calls. (Typical data transfer is TBD.)
Many 'A's Transition to B which Multiplexes to C1 - Cn
Again, found in library calls that require a lot of internal support. (Typical data transfer is TBD.)
 

Language Elements Affecting Furcation
Just as there are patterns and antipatterns that impact the capabilities automatic furcation, there are elements to be found in languages that will affect the process. To be used, perhaps, as rules of thumb, this is not intended to be an exhaustive list nor is it to be exclusive. The goal is to identify elements that logically and hypothetically impact the furcation of programs.
ElementImpact Description
Abstraction
Inheritance
Encapsulation Encapsulation is crucial for furcation to work properly. No data should be exposed to outside readers/writers.
Componentization This is, in my opinion, an extension to Encapsulation and is thoroughly welcome to the whole process.
Delegation Delegation is Microsoft's term for function pointer passing (I believe). At this moment, I cannot see a problem with it as long as encapsulation rules are followed.
Data Aliasing
 

Where is the Science?[Thinking out loud...]

Computer science vs. software engineering in this topic is an on-going debate. I guess I need to discern the distinctions better, beyond the basic: "the science can be measured." Certainly, it addresses a specific need and is fraught with complications, like C-pointers; and it answers a specific problem with a programmatic solution (SMoP). But, how to make it a science?

It is not a mathematical model, but has some characteristics, because dividing up a program involves the coupling between program fragments. It is not a network model which identifies the communication of packets, but packet sizes are a crucial piece to the puzzle: vis. large packets shorten motes' life. It could be compiler theory, since tracing paths and making judgment calls on 'furcatability' requires information about data types.

The classic view of science is to observe something, hypothesize a generalized model, test the model, and refine until the model stabilizes. Of course, linked lists follow directed graph theory, so there is an actual model to build from. However, much of CS is based on artificial foundations (hence, "technos" in technology). Much of it appears to be: create an idea, generate an implementation of it, generalize it, etc., all of which seems a little backwards. Still, the concept of furcating a program has some merit and has a long, vested history. Some have tried to use this concept to increase parallelism, but the result was lackluster, at best.

I did realize that some problems with memory may be already answered in the target itself. Usually, embedded devices have limited or no memory allocation. Of course, depending on such an assumption, limits the "big picture" of Program Furcation: the ability to program an application without regard to resources. The stage of creating interfacing contracts has significant attraction, too, because the connections can be irrespective of the application. This means, for example, that two applications that share a particular use case could, in fact, share the same program slice.

The irony is that I do not have to cover the entire space in order to make this problem "interesting." I found earlier in my career that a complete answer is often less interesting that a carved niche in a wide open field. For example, distinguishing PostScript from a program listing using an automatic language detector is simply not 100% possible, but I proved that one can get very close (99.992%), thus making the problem and the solution computationally "interesting" and patentable. Therefore, I believe that while a 100% may be unreachable, how close can we get?

 

Related Work
Coign is similar to this system; however, the goals and methods are different. Coign attempted to increase parallelism by distributing the work load, whereas Program Furcation attempts to split programs to fulfill resource requirements. Again, Coign uses the program binary, and Furcation is integral with the compiler, and therefore, more information about the program is available.

Detours uses method-call rewrites to intercept DLL calls from functions. It is weakly similar to Program Furcation in that it modifies the structure of the intended program, but its goal differs significantly.
 
References
[1]Galen C. Hunt and Michael L. Scott. The Coign Automatic Distributed Partitioning System. Proceedings of the Third Symposium on Operating System Design and Implementation (OSDI '99), pp. 187-200. New Orleans, LA, February 1999. USENIX.
[2]Galen C. Hunt. Detours: Binary interception of win32 functions. In Proc. of the 3rd USENIX Windows NT Symp., Seattle, WA, July 1999.
[3]T. Leighton and S. Rao. An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science, pages 422-431, 1988.
[4]Eli Tilevich and Yannis Smaragdakis. J-orchestra: Automatic java application partitioning. In Proceedings of the 16th European Conference on Object-Oriented Programming, volume 2374 of LNCS, pages 178-204. Springer-Verlag, 2002.
[5]Daniel J. Palermo and Prithviraj Banerjee. Automatic Selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers. Technical Report CRHC-95-09, Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Ill.
[6]Jason Flinn and SoYoung Park and M. Satyanarayanan. Balancing Performance, Energy, and Quality in Pervasive Computing. In Proceedings of the 22nd International Conference on Distributed Computing Systems, Vienna, Austria, July 2002.
 

Improved? Names
Compiler-level Automatic Program Splitting (CAPS)
Compiler-Level Automatic Program Splitting (CLAPS)
Distributed Use State Transmission (DUST)
Distributed Use caSe Transfiguration (DUST)
Distributed Use Case Transfiguration Around Point-sized Environments (DUCTAPE)
Directly Implanted, Scattered Components over Examining Domains (DISCoverED)
Sliced Computational Headaches In Zone-Origined Identical Devices (SCHIZOID)
Compiler-Level Automatic Program-Trace Reduction Algorithm Partitioning (CLAP-TRAP)

Copyright © 2001-2006 Intelligent Algorithmic Solutions and Sean Walton eMail