Program Furcation |
Distributing the Feature Load |
| Abstract |
Microcontroller-level programming involves non-functional requirements and constraints
that directly affect an application's capabilities. These constraints include RAM and storage shortage,
available processing, and limited device I/O. Demands of features on distributed real-time
embedded (DRE) systems is increasing, and while some DRE system's capabilities
expand, the tiny, resource-limited devices will always have a niche in pervasive computing.
The costs of implementing and validating such systems follow typical software engineering
understanding where programming and testing far exceed the cost of the device.
In exchange of programming to the constraints of each device, a programmer can design
the application, ignoring the device limitations and focusing on the whole system requirements.
Then, a special compiler can automatically split ("furcate") the program into many executables.
The individual execution unit contains a portion of the whole system, and each unit carries the
algorothmic support to communicate with other units. Program furcation automatically distributes
resource load through smaller interconnected executables, helping field-scientists simplify the
design and testing of embedded system devices. A furcated program will be more reliable, because
clusters of devices in the field will perform as single program systems, checking and validating
each other's input. |
| |
| Problem Statement |
| Years ago, programs had to fit within the confines of a platform with constrained resources.
There were several work-arounds to solve this problem: prioritize and scale back the
requirements, hand-code assembler to optimize speed and/or footprint, compress storage,
etc. Most of the techniques to economize the resources were implemented by those with
years of computing experience. While this problem evaporated as the resource limitations
lifted, the problem still remains wholly unsolved. |
| |
| Core Problems |
- At present, pervasive devices require more resources (CPU performance, available
RAM/ROM/flash, I/O) than are available.
- Resources impose loads on each other. For example, a storage resource (like flash)
and a communication module (like radio) require special algorithms (fireware space)
and dedicated CPU time. Adding resources can thus increase the utilization of others.
- In the future, despite advances in miniaturization and hybridization of components
which increase resources, there will always be a need for low-resource devices/platforms.
|
| |
| Possible Solution: Automatic Program Furcation |
In the domain of pervasive computing where each device uses a radio transponder, one possible
solution is to break-up each assigned task (vertical
feature) and place it on individual devices. The devices request information or
operations from peer devices. Such an approach implies that a fully functional "device"
comprises more than one device in a field.
Thesis: Program furcation will automatically distribute
resource load through smaller interconnected executables, helping field-scientists simplify the design
and testing of embedded system devices. |
| |
| What It is Not |
| Offloading |
Unlike Spectra which offloads CPU- or storage-intensive tasks on to a host computer, Furcated Programming
distributes the tasks between related devices in a network cluster. The devices, then, act as one device outside their cluster. |
| Parallelization |
Increasing parallelism is not the main objective. Instead, the goal of this process is to distribute payload or workload onto multiple devices, and using the intrinsic communication capability, get the devices to work as one application.
The reason is simply to address the cost vs. capability issue. The cost of manufacture of the "smart dust" will continue to decrease, but the cost of internal features will rise dramatically as the applications become more complex. |
|
| |
|
| Program Furcation Examples |
| To illustrate the need for Program Furcation here are some constraint types: |
| Constraint | Type | * | Action |
| Memory | Large Executable | A | Modules split into inter-communicating blocks. |
| Large Transient Data | B | Modules split into inter-communicating blocks. |
| CPU | Extra-long algorithm | C | Unit dedicated to computation *or* computation split off. |
| Unschedulable tasks statically found | D | Bumped tasks split off *or* priority balanced between nodes. |
| Device I/O | Higher data resolution | E | Each unit shares responsibility & primary node collects info. |
| Readings verification | F | All readings checked for reliability & common reading used. |
| Storage | Rolling data logs | G | Distribution/redundancy to counteract flash aging. |
| Accumulated data storage | H | ?? (difficult to determine at build time due to DB variability) |
| Augmented functionality | I | Replacement/additional of related functionality. |
|
| Examples |
| A | Embedded CORBA application; Fourier algorithm (for purpose, see C). |
| B | Rulebase that predicts geologic major disturbances using seismic readings. |
| C | Fourier transform on data readings looking for unusual spikes in data transitions. |
| D | A system that requires a lot of I/O and several small tasks. |
| E | Collection of all light sensors for better resolution. (This is not a very good example for many reasons.) |
| F | Needing a very reliable data reading, all the sensors could be read and the odd ones discarded. (Not a very good example either.) |
| G | Informational or statistical data with a sliding window which does not require long-term storage. |
| H | Long-term storage for later collection like data readings. |
| I | Updated/patched program code or data tables. This is especially a problem, because of the resources involved; specifically, because of the new features, is one mote reprogrammed or the entire cluster? Another difference is it's implicitly in the field. |
|
| |
|
| What Does the Programming Environment Look Like? |
There are, of course, boundaries to the proposed solution. First, the device must have some
broadcast communication device; second, the communication must be balanced against
the corresponding power consumption (max-flow/min-cut); and third,
not all datatypes are good transmission material. The imposed boundaries might begin to
rule out out-of-the-box programming languages, and the changes required to the languages
are an important part of the research.
|
| Issues |
- Data: Pointers — The problem of transmitting pointers has been partly solved as early as UNIX
RPCs where the pointer is 'copied' along with its data using a deep copy. The new 'pointer' becomes an offset
into the data block. Each pointer is followed and copied into a block of offsets (care must be taken for security
implications). At the same time, arrays and large linked structures (records) may result in the Fat Data issue.
The compiler provides the proper detail needed for initiating the furcate function, hence the focus on compiler-
level implementation. Anonymous data (void*) and some unions unavoidably excluded from support, because
strong types must be known ahead of time. Similarly, pointer aliasing is reportedly problematic.
- Data: Fat Data — Fat data (large data block) is strongly discouraged but may still be unavoidable.
In most cases, the data being transmitted is fairly simple in type. Still, there may be arrays of that type. An
option of compressing the fat data can help reduce the transmission flow.
- Data: Strong Coupling — Since the idea is to use a max-flow/min-cut approach, the hope is that
there will be loose coupling along the connection. Still, even strong coupling may be required. Therefore, the
coupling is treated like fat data and packed together as a package. The receiver unpacks the data for the
algorithm to use.
- Networking: Transmission Loss — Transmission loss is a solved problem in connected/unconnected
networks using sequencing and retransmission.
- Networking: Message Echo — Again, this problem is solved using well-defined networking protocols.
- Networking: Dead/Sleeping Destination — The dead destination problem may be solvable because
other nearby devices may provide similar services. Still, the nodes should be synchronized so that they all are
awake at the same time, or at least have a wake-up procedure. Otherwise, the requesting node will have to
incorporate a timeout and recovery state machine.
|
| |
| Ideas |
- Contracts — The connection between each program slice would require foundational handshaking
and interfacing datatypes. These can be set up in a contract at the beginning of the interface negotiation. As
needed, a program slice could support different contracts based upon the requesters' needs. For instance, the
data interface could either be integers or strings.
- Resource Sharing — Using this concept in "smart dust" opens opportunities of resource sharing
where if there are features A, B, C, the combined group of primary devices (A) could share nearby B and C
devices. This might extend the life of the units by reducing singular P2P communications.
|
|
| |
|
| Target Program Sections |
| The concept of Program Furcation depends on two main factors: data encapsulation and program
modularity. Of course, some programming languages are somewhat behaviorly challenged in this regard, so it's
hard to stick to any promises. The following table describes the different types of program transitions and identifies
their suitability for program furcation A and B represent separate and distinct modules, and the split point is
represented with a lightning mark. |
 | Straight-line Transition
The calling sequence from A to B is a good candidate, because it has a clean interface. |
 | Transition into Loop/Recursion
A call from A to B which then recurses or loops is a potentially good candidate unless B neither concludes nor returns. |
 | Mutual-Recursion
A mutually recursive call between A and B may not be a suitable case because of lengthy data exchange unless
enough computational time passes between transmission (difficult to do with static analysis). |
 | Root Loop/Recursion
Splitting a recursive A call is clearly not useful, because each target will have the same code (A). However,
sharing responsibilities, like verification measures or duplicate sensor readings, might use this type of algorithmic split. |
| Another way to look at the problem is by call-tree analysis. The program is divided into steps
(labeled A, B, C), transitions (arrows), and data (labeled X). The goal is to identify the best place to break a transition. |
 | A Transitions to B which Accesses data X
Simple encapsulation. B provides read/write accessors; data path revolves around X. |
 | A Transitions to B which Transititions to C
Simple call tree path. Data path depends on what B & C requires and how much entanglement there is between A & B. |
 | A Transitions to B which Multiplexes to C1 - Cn
Common in normal call tree growth. (Typical data transfer is TBD.) |
 | Many 'A's Transition to B which Transitions to C
This is common when a program makes library calls. (Typical data transfer is TBD.) |
 | Many 'A's Transition to B which Multiplexes to C1 - Cn
Again, found in library calls that require a lot of internal support. (Typical data transfer is TBD.) |
|
| |
|
| Language Elements Affecting Furcation |
| Just as there are patterns and antipatterns that impact the capabilities automatic furcation, there are elements to be found in
languages that will affect the process. To be used, perhaps, as rules of thumb, this is not intended to be an exhaustive
list nor is it to be exclusive. The goal is to identify elements that logically and hypothetically impact the furcation of programs. |
| Element | Impact Description |
| Abstraction | |
| Inheritance | |
| Encapsulation |
Encapsulation is crucial for furcation to work properly. No data should be exposed to outside readers/writers. |
| Componentization |
This is, in my opinion, an extension to Encapsulation and is thoroughly welcome to the whole process. |
| Delegation |
Delegation is Microsoft's term for function pointer passing (I believe). At this moment, I cannot see a problem with it
as long as encapsulation rules are followed. |
| Data Aliasing | |
|
| |
|
| Where is the Science?[Thinking out loud...] |
Computer science vs. software engineering in this topic is an on-going debate. I guess I need to discern the
distinctions better, beyond the basic: "the science can be measured." Certainly, it addresses a specific need and is fraught with
complications, like C-pointers; and it answers a specific problem with a programmatic solution (SMoP). But, how to make it a
science?
It is not a mathematical model, but has some characteristics, because dividing up a program involves the coupling between
program fragments. It is not a network model which identifies the communication of packets, but packet sizes are a crucial
piece to the puzzle: vis. large packets shorten motes' life. It could be compiler theory, since tracing paths and making judgment
calls on 'furcatability' requires information about data types.
The classic view of science is to observe something, hypothesize a generalized model, test the model, and refine until the model
stabilizes. Of course, linked lists follow directed graph theory, so there is an actual model to build from. However, much of CS
is based on artificial foundations (hence, "technos" in technology). Much of it appears to be: create an idea, generate an
implementation of it, generalize it, etc., all of which seems a little backwards. Still, the concept of furcating a program has some
merit and has a long, vested history. Some have tried to use this concept to increase parallelism, but the result was lackluster,
at best.
I did realize that some problems with memory may be already answered in the target itself. Usually, embedded devices have
limited or no memory allocation. Of course, depending on such an assumption, limits the "big picture" of Program Furcation:
the ability to program an application without regard to resources. The stage of creating interfacing contracts has significant
attraction, too, because the connections can be irrespective of the application. This means, for example, that two applications
that share a particular use case could, in fact, share the same program slice.
The irony is that I do not have to cover the entire space in order to make this problem "interesting." I found earlier in my
career that a complete answer is often less interesting that a carved niche in a wide open field. For example, distinguishing
PostScript from a program listing using an automatic language detector is simply not 100% possible, but I proved that one can
get very close (99.992%), thus making the problem and the solution computationally "interesting" and patentable. Therefore, I
believe that while a 100% may be unreachable, how close can we get? |
| |
|
| Related Work |
Coign is similar to this system; however, the goals
and methods are different. Coign attempted to increase parallelism by distributing the work
load, whereas Program Furcation attempts to split programs to fulfill resource requirements. Again,
Coign uses the program binary, and Furcation is integral with the compiler, and therefore, more
information about the program is available.
Detours uses method-call rewrites to intercept DLL calls from functions. It is weakly
similar to Program Furcation in that it modifies the structure of the intended program, but its goal differs
significantly. |
| |
| References |
| [1] | Galen C. Hunt and Michael L. Scott. The Coign Automatic Distributed Partitioning System.
Proceedings of the Third Symposium on Operating System Design and Implementation (OSDI '99),
pp. 187-200. New Orleans, LA, February 1999. USENIX. |
| [2] | Galen C. Hunt. Detours: Binary interception of win32 functions. In Proc. of the 3rd USENIX Windows
NT Symp., Seattle, WA, July 1999. |
| [3] | T. Leighton and S. Rao. An approximate max-flow min-cut theorem for uniform multicommodity
flow problems with applications to approximation algorithms. In Proceedings of the 29th Annual
Symposium on Foundations of Computer Science, pages 422-431, 1988. |
| [4] | Eli Tilevich and Yannis Smaragdakis. J-orchestra: Automatic java application
partitioning. In Proceedings of the 16th European Conference on Object-Oriented
Programming, volume 2374 of LNCS, pages 178-204. Springer-Verlag, 2002. |
| [5] | Daniel J. Palermo and Prithviraj Banerjee. Automatic Selection of Dynamic
Data Partitioning Schemes for Distributed-Memory Multicomputers. Technical Report CRHC-95-09, Center for Reliable and
High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Ill.
| | [6] | Jason Flinn and SoYoung Park and M. Satyanarayanan. Balancing
Performance, Energy, and Quality in Pervasive Computing. In Proceedings of the 22nd International Conference on
Distributed Computing Systems, Vienna, Austria, July 2002.
| |
| |
|
| Improved? Names |
| • | Compiler-level Automatic Program Splitting (CAPS) |
| • | Compiler-Level Automatic Program Splitting (CLAPS) |
| • | Distributed Use State Transmission (DUST) |
| • | Distributed Use caSe Transfiguration (DUST) |
| • | Distributed Use Case Transfiguration Around Point-sized Environments (DUCTAPE) |
| • | Directly Implanted, Scattered Components over Examining Domains (DISCoverED) |
| • | Sliced Computational Headaches In Zone-Origined Identical Devices (SCHIZOID) |
| • | Compiler-Level Automatic Program-Trace Reduction Algorithm Partitioning (CLAP-TRAP) |
|
|
| Copyright © 2001-2006 Intelligent Algorithmic Solutions and Sean Walton |
eMail |