KOLA C++ Language-Specific Layer
Design Sketch
1. Introduction
The purpose of the C++ Language-Specific Layer of KOLA (CPPL) is to provide a simple and maximally transparent interface for writing C++ programs that take advantage of Khazana’s distributed infrastructure, and for porting existing C++ code on top of Khazana. The Language-Independent Layer (LIL) is designed to support abstract object structures on top of Khazana’s flat shared address space, and provides efficient mechanisms to bring those objects into local virtual memory and to maintain, manipulate, and synchronize them. However, its support cannot be extended to such vital issues as dynamic type identification and checking, object access detection and loading, object polymorphism and class inheritance, and transparent concurrency control, because these rely heavily on built-in programming language support, which varies greatly among object-oriented languages. This document describes one possible approach to implementing such object support for programs written in C++.
2. CPPL Design
The CPPL implementation relies heavily on the LIL (Language-Independent Layer). Throughout this section, the following LIL API is used:
// allocates memory for a new object in Khazana
object_id_t kh_malloc(size_t size);
// frees Khazana object
void kh_free(object_id_t oid);
// fetches khazana object into VM, with a lock context passed by reference and
// the operation mode (read, write, default)
Obj* obj_fetch(object_id_t oid, lock_ctx_t* lctx, op_mode_t op_mode);
// Releases the object from memory
void obj_release(Obj* obj);
// swizzles oid, returning a VM pointer
Obj* swizzle(object_id_t oid);
// unswizzles the object, returning its oid, given a VM pointer
object_id_t unswizzle(Obj* obj);
obj_fetch() relies on the CPPL-generated kh_object_swizzle(void* obj, kh_tag_t* tag) function, which initializes an object based on the type and size information stored in its tag. For more information on kh_object_swizzle(), see sec. 2.1.3.1.
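To make the contract between kh_malloc(), kh_free(), swizzle() and unswizzle() concrete, the following is a minimal in-process stand-in for part of this API. It is a sketch under stated assumptions, not the LIL implementation: oids are assumed to be 64-bit integers with 0 meaning "no object", Khazana's address space is modelled as a map from oid to a local heap buffer, and obj_fetch(), locking and op modes are left out. The namespace mock_lil and all function bodies are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>

// Hypothetical in-memory stand-in for some of the LIL calls listed above.
namespace mock_lil {

using object_id_t = std::uint64_t;     // assumption: integer oids, 0 = invalid

std::map<object_id_t, void*> by_oid;   // oid -> VM pointer (swizzle direction)
std::map<void*, object_id_t> by_ptr;   // VM pointer -> oid (unswizzle direction)
object_id_t next_oid = 1;

// Allocates storage for a new object and returns its oid (0 on failure).
object_id_t kh_malloc(std::size_t size) {
    void* mem = std::malloc(size);
    if (!mem) return 0;
    object_id_t oid = next_oid++;
    by_oid[oid] = mem;
    by_ptr[mem] = oid;
    return oid;
}

// Frees the object and forgets both mappings.
void kh_free(object_id_t oid) {
    auto it = by_oid.find(oid);
    if (it == by_oid.end()) return;
    by_ptr.erase(it->second);
    std::free(it->second);
    by_oid.erase(it);
}

// oid -> VM pointer; nullptr if the oid is unknown.
void* swizzle(object_id_t oid) {
    auto it = by_oid.find(oid);
    return it == by_oid.end() ? nullptr : it->second;
}

// VM pointer -> oid; 0 for purely local (non-Khazana) memory.
object_id_t unswizzle(void* obj) {
    auto it = by_ptr.find(obj);
    return it == by_ptr.end() ? 0 : it->second;
}

} // namespace mock_lil
```

A zero result from unswizzle() is what lets generated code such as lock() in section 2.1.3.2 tell a persistent object from a purely local one.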
2.1. Object Storage and Retrieval
This design assumes that the C++ compiler has full knowledge of the complete class hierarchy for any object stored or retrieved under CPPL. Furthermore, any C++ environment that retrieves an object must be able to recognize that object’s layout, have corresponding entries in the virtual table, and use the same format for all additional metadata as the environment that originally initialized that object in Khazana. This can be ensured by using the same version of the C++ compiler wherever the same object is accessed (see section 3.3).
Because an object may be loaded and viewed through one of its superclasses, CPPL must have an independent way of recognizing the object’s true type, as well as its real size. To solve this problem, we propose adding an extra tag to each object stored in Khazana, containing its type id and its size. We assume that the total number of types will not exceed 32767 (with typeid = 0 signaling an error type) and that no object will exceed 32768 bytes, so that each field fits in 16 bits and the whole tag is 32 bits long. However, since these assumptions may become too restrictive in the future, and since we may want to include other information in the tags later on, we intend to keep the tag size and contents flexible throughout the CPPL implementation.
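As a sketch of the tag layout under the limits assumed above, the structure below holds a 16-bit type id and a 16-bit size in 32 bits and rejects values outside those limits. The field names follow kh_tag_t from section 2.2.1; make_tag() and the KH_* constants are hypothetical helpers, and returning an error-type tag (typeid 0) on overflow is our assumption about how violations would be reported.

```cpp
#include <cstdint>

// Sketch of the 32-bit tag: a 16-bit type id and a 16-bit object size.
struct kh_tag_t {
    std::uint16_t type;   // CPPL type id, 0 = error type
    std::uint16_t size;   // object size in bytes
};
static_assert(sizeof(kh_tag_t) == 4, "the tag is expected to occupy 32 bits");

constexpr std::uint16_t KH_ERROR_TYPE = 0;
constexpr std::uint32_t KH_MAX_TYPES  = 32767;   // limits assumed in the text
constexpr std::uint32_t KH_MAX_SIZE   = 32768;

// Builds a tag, or an error-type tag if either limit is exceeded.
kh_tag_t make_tag(std::uint32_t type_id, std::uint32_t size) {
    if (type_id == KH_ERROR_TYPE || type_id > KH_MAX_TYPES || size > KH_MAX_SIZE)
        return kh_tag_t{KH_ERROR_TYPE, 0};
    return kh_tag_t{static_cast<std::uint16_t>(type_id),
                    static_cast<std::uint16_t>(size)};
}
```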
It should be noted that the tag’s type representation can at most map the object to its true type and allow CPPL to initialize the object transparently upon retrieval. The retrieving program must also contain all the necessary information about the true type of the fetched object (i.e., include all relevant .h files), and not just about the type it is being loaded as. Consider the following example:
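class A { public: int I; virtual void Print() {…} };
class B : public A { public: void Print() {…} };
Program 1:
B* b = new(__kh) B;            // persistent new (see section 2.1.2)
b->I = 10;
Program 2:
A* a = kh_retrieve<A>(b_oid);  // b_oid: the oid of the object created by Program 1
a->Print();                    // must invoke B::Print()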
Even though Program 2 may never refer to the retrieved object as an instance of class B, it must still contain all the information about class B in order for CPPL to be able to load the object correctly, and invoke the right virtual methods.
2.1.1. Transparent Swizzling and "Smart" Pointers
Any program built on top of Khazana should be able to utilize two different kinds of objects – persistent Khazana objects, stored in Khazana address space and brought into the local memory, as well as purely local objects. Indeed, it is unnecessary and even wasteful to mandate that all objects used by a program be stored in Khazana, which would have a prohibitive performance cost.
Instead, the programmer must be able to designate certain object references as references to persistent Khazana objects, maintained in the Khazana address space and automatically swizzled and unswizzled when necessary, while other objects are instantiated and treated as traditional C++ objects residing exclusively in local virtual memory. While local objects require no special treatment beyond what the C++ language environment already provides, we must create an infrastructure to deal with persistent object references, preferably in much the same way we deal with local pointers. We propose a template class that represents such persistent references, as follows:
template <class T>
class Ref {
  object_id_t address;                       // oid of the referenced Khazana object
public:
  /* Constructors */
  Ref(object_id_t oid) : address(oid) {}
  Ref(T* ptr) : address(unswizzle(ptr)) {}
  Ref() : address(0) {}
  /******************** Pointer-style operations *****************/
  // Cast operator
  operator T* () {return kh_retrieve<T>(address);}
  // Overload arrow
  T* operator->() {return kh_retrieve<T>(address);}
  // Overload star
  T& operator*() {return *kh_retrieve<T>(address);}
  // overload assignment to a regular pointer
  T* operator= (T* ptr) {address = unswizzle(ptr); return ptr;}
  // overload assignment to a "smart" pointer
  Ref<T>& operator= (const Ref<T>& ref) {address = ref.address; return *this;}
}; // end of Ref<T> declaration
As we can see, the operators proposed above allow transparent access to and manipulation of persistent Khazana objects, as well as their interaction with ordinary C++ pointers. Using these operators, a Khazana object can be retrieved, dereferenced, assigned to another reference, cast, and so on. The CPPL implementation of the overloaded operators is responsible for swizzling and unswizzling Khazana references (oids) as necessary.
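To check that these operators compose as intended, here is a self-contained version of Ref<T> over a toy object table. It is a sketch: kh_retrieve() and unswizzle() are hypothetical stand-ins backed by two maps rather than the CPPL and LIL calls, object_id_t is assumed to be an integer with 0 meaning a null reference, and the oid() accessor exists only for the example.

```cpp
#include <cstdint>
#include <map>

using object_id_t = std::uint64_t;          // assumption: integer oids, 0 = null

// Toy object table standing in for Khazana (hypothetical).
std::map<object_id_t, void*> object_at;     // oid -> local address
std::map<void*, object_id_t> oid_of;        // local address -> oid

template <class T> T* kh_retrieve(object_id_t oid) {
    auto it = object_at.find(oid);
    return it == object_at.end() ? nullptr : static_cast<T*>(it->second);
}

inline object_id_t unswizzle(void* p) {
    auto it = oid_of.find(p);
    return it == oid_of.end() ? 0 : it->second;
}

template <class T>
class Ref {
    object_id_t address;                    // stores the oid, never a raw pointer
public:
    Ref() : address(0) {}
    explicit Ref(object_id_t oid) : address(oid) {}
    Ref(T* ptr) : address(unswizzle(ptr)) {}

    operator T*() const { return kh_retrieve<T>(address); }     // cast
    T* operator->() const { return kh_retrieve<T>(address); }   // arrow
    T& operator*() const { return *kh_retrieve<T>(address); }   // star
    T* operator=(T* ptr) { address = unswizzle(ptr); return ptr; }
    object_id_t oid() const { return address; }                  // for the example only
};
```

Because a Ref<T> stores only the oid, copying it copies the oid, and every dereference goes back through kh_retrieve().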
2.1.2. Creating and Deleting Khazana Objects
To allow the creation of Khazana objects, CPPL adds overloaded new and delete operators to each class, as follows:
class O {
public:
  // Khazana new. operator new is implicitly static, so GetTag() and
  // getLockCtx() must be class-wide (static) functions here.
  void* operator new(size_t size, __KhClass __kh){
    object_id_t oid = kh_malloc(size, GetTag());
    if (!oid) throw std::bad_alloc();   // a throwing operator new must not return NULL
    void* obj = obj_fetch(oid, getLockCtx(), OP_MODE_DEFAULT);
    return obj;
  }
  // Regular new
  void* operator new(size_t size) {return ::operator new(size);}
  // Khazana delete
  void operator delete (void* obj, __KhClass __kh) {
    object_id_t oid = unswizzle(obj);
    kh_free(oid);
  }
  // Regular delete
  void operator delete (void* ptr) {::operator delete(ptr);}
  …
};
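Notice the use of the __KhClass instance __kh, which is necessary to distinguish between the local and persistent new and delete operators. Since C++ has no placement form of the delete expression, a persistent delete must run the destructor and then call the matching operator delete explicitly (that operator delete is also invoked automatically if a constructor throws during a persistent new). Thus, to create a persistent instance of class O in Khazana, or to delete one, one would use the following syntax:
Ref<O> o = new(__kh) O(…);                       // persistent new, calling the desired constructor
O* p = o; p->~O(); O::operator delete(p, __kh);  // persistent delete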
Local instances of class O will be created and deleted as usual:
O* o = new O;
delete o;
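The following self-contained sketch exercises the overload selection described above. KhClass and kh stand in for __KhClass and __kh (identifiers containing a double underscore are reserved for the implementation in C++), the persistent forms only update a counter where CPPL would call kh_malloc(), obj_fetch(), unswizzle() and kh_free(), and kh_delete() is a hypothetical helper. Since C++ has no delete(kh) expression, kh_delete() runs the destructor and then calls the class's placement operator delete explicitly.

```cpp
#include <cstddef>
#include <new>

struct KhClass {};          // stands in for __KhClass
KhClass kh;                 // stands in for the global instance __kh

int persistent_live = 0;    // objects currently allocated through the persistent forms

class O {
public:
    int n;
    explicit O(int v) : n(v) {}

    // Khazana new: selected by new(kh) O(...)
    static void* operator new(std::size_t size, KhClass) {
        void* p = ::operator new(size);   // placeholder for kh_malloc() + obj_fetch()
        ++persistent_live;
        return p;
    }
    // Khazana delete: invoked automatically only if a constructor throws in new(kh) O(...)
    static void operator delete(void* p, KhClass) {
        --persistent_live;                // placeholder for unswizzle() + kh_free()
        ::operator delete(p);
    }
    // Regular forms, used by plain new O(...) and delete p
    static void* operator new(std::size_t size) { return ::operator new(size); }
    static void operator delete(void* p) { ::operator delete(p); }
};

// Persistent delete spelled out: destroy, then deallocate with the matching form.
void kh_delete(O* p) {
    if (!p) return;
    p->~O();
    O::operator delete(p, kh);
}
```

Local objects keep using plain new and delete, which never touch the persistent counter.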
2.1.3. Loading and Storing Khazana Objects
CPPL will provide a function to load Khazana objects into memory given their oids, kh_retrieve(), and a function to store memory objects in Khazana and return the appropriate oids, kh_store(). In a sense, these functions are high-level equivalents of swizzle() and unswizzle() that also provide type checking, re-initialize the object metadata used by C++ compilers, and handle Khazana object tags. kh_retrieve() and kh_store() will be generated by the CPPL processor after it has parsed all the class declarations.
2.1.3.1. kh_retrieve() and kh_object_swizzle() Functions
__KhClass __kh; // our dummy object, used to select the non-modifying constructor
// Retrieves a Khazana object given its oid
template <class T> T* kh_retrieve(object_id_t oid)
{
  // fetch the object into local memory (obj_fetch() calls kh_object_swizzle())
  void* ptr = obj_fetch(oid, getLockCtx(), OP_MODE_DEFAULT);
  if (!ptr) return NULL;
  return (T*)ptr;
}
// Takes void* and tag, reconstructs the object in place at obj
E_Code kh_object_swizzle(void* obj, kh_tag_t* tag)
{
  kh_typeid_t t = tag->type;       // CPPL type id of the object
  kh_objsize_t size = tag->size;   // size of the object
  // ------------ recreate the object based on its type -------------
  // switch over the object’s type; ::new names the global placement new,
  // which the class-specific operator new overloads would otherwise hide
  switch (t){
    case TYPE_1: ::new(obj) class1(__kh); break;
    case TYPE_2: ::new(obj) class2(__kh); break;
    …
    default: return E_BAD_TYPE;    // hypothetical error code: unknown type id
  }
  return E_OK;
} // end of kh_object_swizzle()
This implementation relies on the placement form of operator new, which, instead of allocating new memory, simply runs a constructor on the memory passed to it. This form is declared by the standard header <new> and is shared by all classes; a program may not replace it with its own definition:
void* operator new(size_t size, void* p) noexcept;   // declared in <new>; simply returns p
Because each processed class declares its own operator new, the calls in kh_object_swizzle() must be written as ::new(obj) so that this global form is found. To ensure that the constructor called this way does not modify any of the retrieved object’s data members, the CPPL processor adds an additional "empty" public constructor to each class, in the form:
O::O(__KhClass __kh) {}
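A self-contained sketch of a generated kh_object_swizzle() for two classes follows. The tag's type id selects which class's non-modifying constructor is run in place with ::new; because Shape declares its own operator new, an unqualified new(obj) would not find the global placement form. Shape, Circle, the TYPE_* ids and E_BAD_TYPE are hypothetical. The test only checks that the reconstructed object has the right dynamic type: the C++ standard does not guarantee that old data member values survive a constructor that leaves them uninitialized, and optimizing compilers may not preserve them, so that part of the design depends on the specific compiler (see section 3.3).

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-ins for the CPPL declarations.
struct KhClass {};                       // plays the role of __KhClass
KhClass kh;                              // plays the role of __kh
enum E_Code { E_OK, E_BAD_TYPE };
enum : unsigned short { TYPE_SHAPE = 1, TYPE_CIRCLE = 2 };
struct kh_tag_t { unsigned short type; unsigned short size; };

class Shape {
public:
    Shape() {}
    explicit Shape(KhClass) {}           // non-modifying constructor
    virtual ~Shape() {}
    virtual int Sides() const { return 0; }
    // class-specific operators, as CPPL would add them
    static void* operator new(std::size_t size, KhClass) { return ::operator new(size); }
    static void operator delete(void* p, KhClass) { ::operator delete(p); }
    static void operator delete(void* p) { ::operator delete(p); }
};

class Circle : public Shape {
public:
    Circle() {}
    explicit Circle(KhClass k) : Shape(k) {}
    int Sides() const override { return 1; }
};

// Reconstructs the object at obj according to the type id in its tag.
E_Code kh_object_swizzle(void* obj, const kh_tag_t* tag) {
    switch (tag->type) {
        // ::new names the global placement form explicitly; Shape's own
        // operator new overloads would hide it from an unqualified new(obj).
        case TYPE_SHAPE:  ::new (obj) Shape(kh);  break;
        case TYPE_CIRCLE: ::new (obj) Circle(kh); break;
        default: return E_BAD_TYPE;      // type unknown to this program
    }
    return E_OK;
}
```

An unknown type id, for example one belonging to a class the retrieving program was not compiled with, is reported instead of producing an object with the wrong virtual table.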
2.1.3.2. lock() and unlock() functions
Every class would be supplemented with these two functions, which perform locking and unlocking. For locking, the programmer must provide an op_mode.
// Takes the op_mode – read, write, etc.
boolean lock(op_mode_t mode)
{
  boolean result = FALSE;
  object_id_t oid = unswizzle(this);
  if (oid){
    // (re)fetch the object in the requested mode, using this object's lock context
    result = (obj_fetch(oid, GetLockCtx(), mode) != NULL);
  }
  return result;
}
// Releases the lock acquired by lock()
boolean unlock()
{
  boolean result = FALSE;
  object_id_t oid = unswizzle(this);
  if (oid){
    obj_release(this);   // per the LIL API, obj_release() takes the VM pointer
    result = TRUE;
  }
  return result;
}
Correspondingly, each processed class needs a GetLockCtx() function that returns a pointer to the class's lock context member, which the CPPL processor also adds to each class.
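To show how the generated lock() and unlock() interact with the LIL, the sketch below pairs them with a toy LIL in which obj_fetch() only records the requested mode in the lock context and obj_release() clears it. The toy functions, the fields of lock_ctx_t and the Account class are hypothetical, and bool replaces the document's boolean so that the block compiles on its own. A purely local object has no oid, so both calls return false for it.

```cpp
#include <map>

// Hypothetical toy LIL: no real locking, only bookkeeping.
enum op_mode_t { OP_MODE_DEFAULT, OP_MODE_READ, OP_MODE_WRITE };
struct lock_ctx_t { op_mode_t held = OP_MODE_DEFAULT; bool locked = false; };
using object_id_t = unsigned long;

std::map<object_id_t, void*> objects;        // oid -> local address
std::map<const void*, object_id_t> oids;     // local address -> oid
std::map<object_id_t, lock_ctx_t*> holders;  // oid -> context holding its lock

object_id_t unswizzle(const void* p) {
    auto it = oids.find(p);
    return it == oids.end() ? 0 : it->second;
}

void* obj_fetch(object_id_t oid, lock_ctx_t* lctx, op_mode_t mode) {
    auto it = objects.find(oid);
    if (it == objects.end()) return nullptr;
    lctx->held = mode;                       // remember the mode we "locked" in
    lctx->locked = true;
    holders[oid] = lctx;
    return it->second;
}

void obj_release(void* obj) {
    auto it = holders.find(unswizzle(obj));
    if (it == holders.end()) return;
    it->second->locked = false;
    holders.erase(it);
}

class Account {
    lock_ctx_t lockctx;                      // per-object lock context added by CPPL
public:
    int balance = 0;
    lock_ctx_t* GetLockCtx() { return &lockctx; }

    bool lock(op_mode_t mode) {
        object_id_t oid = unswizzle(this);
        if (!oid) return false;              // purely local object: nothing to lock
        return obj_fetch(oid, GetLockCtx(), mode) != nullptr;
    }
    bool unlock() {
        if (!unswizzle(this)) return false;
        obj_release(this);
        return true;
    }
};
```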
2.2. CPPL Processor Output
To summarize the previous sections, the CPPL processor should produce the following files, functions, and variables:
2.2.1. kh_cppl_fund.h file included by all DDL files
This file contains basic type declarations and does not depend on the classes being processed. Every DDL file should include it.
// This defines the tags
typedef u_short kh_typeid_t;    // both fields are unsigned
typedef u_short kh_objsize_t;   // short ints for now
struct kh_tag_t {
  kh_typeid_t type;
  kh_objsize_t size;
};
// This defines Operation Modes
typedef op_mode_t khazana_op_mode_t;
// Choices for now are: { OP_MODE_DEFAULT, OP_MODE_READ, OP_MODE_WRITE};
// Similarly, lock contexts defined in Khazana
typedef lock_ctx_t khazana_lock_ctx_t;
// Dummy class for the dummy constructors
class __KhClass {};
// The placement new used by kh_object_swizzle() comes from the standard
// library and must not be redefined here
#include <new>
2.2.2. Things added to each class declaration
The following modifications will be made to each class processed by CPPL:
class O : public A, public B
{
  D d; // some embedded
  E e; // objects
  …
public:
  //--------------- additional member variables -----------------------------
  kh_typeid_t __typeid = _CLASS_O_KHTYPE_;
  lock_ctx_t __lockctx = lock_ctx_t::default_lock;
  // --------------------- Get/Set functions ---------------------------------
  kh_typeid_t GetKhTypeID() {return __typeid;}
  // fills in the tag; static, so that operator new can call it before the object exists
  static void GetKhTag(kh_tag_t* tag){tag->type = _CLASS_O_KHTYPE_; tag->size = sizeof(O);}
  // returns the lock context
  lock_ctx_t* GetLockCtx() {return &__lockctx;}
  // --------------------- Dummy non-modifying constructor -----------------------
  // notice that we will need to call dummy constructor for each embedded class
  // and for each parent class.
  O(__KhClass kh) : A(kh), B(kh), d(kh), e(kh){}
  // --------------------- overloaded new and delete operators ------------------------------
  // Khazana new (implicitly static: no object, and hence no __lockctx, exists yet,
  // so getLockCtx() here must be a class-wide accessor)
  void* operator new(size_t size, __KhClass __kh){
    kh_tag_t tag;
    GetKhTag(&tag);
    object_id_t oid = kh_malloc(size, &tag);
    if (!oid) throw std::bad_alloc();
    void* obj = obj_fetch(oid, getLockCtx(), OP_MODE_DEFAULT);
    return obj;
  }
  // Regular new
  void* operator new(size_t size) {return ::operator new(size);}
  // Khazana delete (called explicitly, or automatically if a constructor throws)
  void operator delete (void* obj, __KhClass __kh) {
    object_id_t oid = unswizzle(obj);
    kh_free(oid);
  }
  // Regular delete
  void operator delete (void* ptr) {::operator delete(ptr);}
  // ------------------------------ Lock and Unlock functions -------------------------------------
  // Takes the op_mode – read, write, etc.
  boolean Lock(op_mode_t mode)
  {
    boolean result = FALSE;
    object_id_t oid = unswizzle(this);
    if (oid){ result = (obj_fetch(oid, GetLockCtx(), mode) != NULL); }
    return result;
  }
  // Releases the lock acquired by Lock()
  boolean Unlock()
  {
    boolean result = FALSE;
    object_id_t oid = unswizzle(this);
    if (oid){ obj_release(this); result = TRUE; }
    return result;
  }
};// end of class O declaration
2.2.3. kh_cppl_mod.h processor file
This file will be generated by the CPPL processor AFTER all the DDL processing, and the processor then automatically links it with all DDL files. The file contains the implementations of the kh_retrieve() and kh_object_swizzle() functions, as described in section 2.1.3.1.
2.3. Dynamic Type Checking
The Khazana type id of each class is stored by the CPPL processor and can be accessed through the GetKhTypeID() method, as described in sec. 2.2.2. This information, however, is relevant only to CPPL itself and should be transparent to the programmer.
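Another important responsibility of CPPL is to ensure that the type of an object loaded from Khazana is compatible with the type of the reference it is being loaded into. One possible solution is to use the C++ <typeinfo> facilities, adding an extra check to the automatically generated kh_retrieve() function:
#include <typeinfo>
template <class T> T* kh_retrieve(object_id_t oid)
{
…
// temp: the reconstructed object; typeid(*temp) yields its dynamic type,
// whereas typeid(temp) would only yield the static pointer type
if (typeid(*temp) != typeid(T)) return NULL;
…
}
This would ensure language-level type safety. Note, however, that an exact typeid match rejects an object retrieved through one of its superclasses, as in the example of section 2.1; a dynamic_cast<T*>(temp) check on a pointer to a polymorphic type would instead accept any object whose true type is T or a class derived from T.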
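The difference between an exact typeid match and a compatibility check can be seen in the following sketch, which assumes (hypothetically) that every persistent class derives from a polymorphic root class KhObject. check_exact() mirrors a typeid comparison on the reconstructed object, while check_compatible() uses dynamic_cast; only the latter accepts a B retrieved as an A, as in the example of section 2.1.

```cpp
#include <typeinfo>

// Hypothetical class hierarchy with a polymorphic root.
struct KhObject { virtual ~KhObject() {} };
struct A : KhObject {};
struct B : A {};
struct C : KhObject {};

// Exact match: accepts only objects whose true type is exactly T.
template <class T> T* check_exact(KhObject* obj) {
    return (obj && typeid(*obj) == typeid(T)) ? static_cast<T*>(obj) : nullptr;
}

// Compatibility: accepts objects whose true type is T or derived from T.
template <class T> T* check_compatible(KhObject* obj) {
    return dynamic_cast<T*>(obj);
}
```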
2.4. Cross-Platform Compatibility of Khazana Objects
One potential problem arising from distributing Khazana objects across various platforms is the difference in word sizes and representations.
To remedy big-endian vs. little-endian conflicts, we propose storing all objects in big-endian word format. Most architectures would require no additional changes, since they are big-endian. For the PDP-11 and VAX families and for Intel microprocessors, however, additional transformations would have to be added to the kh_store() and kh_retrieve() routines to ensure that the objects being written or retrieved conform to the big-endian standard.
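One way to implement these transformations is to serialize each word with shifts rather than by reinterpreting memory, as in the sketch below; put_be32() and get_be32() are hypothetical names. Because shifts act on values rather than on memory, the same code produces big-endian bytes on big- and little-endian hosts alike, so it needs no per-platform conditional.

```cpp
#include <cstdint>

// Writes v into out[0..3] in big-endian order, whatever the host byte order.
void put_be32(unsigned char* out, std::uint32_t v) {
    out[0] = static_cast<unsigned char>(v >> 24);   // most significant byte first
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}

// Reads a big-endian 32-bit word from in[0..3].
std::uint32_t get_be32(const unsigned char* in) {
    return (std::uint32_t(in[0]) << 24) | (std::uint32_t(in[1]) << 16) |
           (std::uint32_t(in[2]) << 8)  |  std::uint32_t(in[3]);
}
```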
The problem of word sizes, however, is more severe, since it changes the size of the objects being stored. Since most modern architectures use 32-bit words, we intend to adopt 32 bits as our standard. The CRAY and DEC Alpha families, which use 64-bit words, will not be supported by CPPL in the short term.
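Under the 32-bit standard, one way to keep persistent data members from changing size across platforms is to declare them with the fixed-width types from <cstdint> and guard the expected layout at compile time, as in the hypothetical PersistentRecord below.

```cpp
#include <cstddef>
#include <cstdint>

// Persistent data members declared with fixed-width types (hypothetical class).
struct PersistentRecord {
    std::int32_t  id;       // always 32 bits, unlike int or long
    std::int32_t  count;
    std::uint16_t flags;
    std::uint16_t kind;
};

// Compile-time guards: a build on a platform that breaks these assumptions stops here.
static_assert(sizeof(std::int32_t) == 4, "32-bit words are the CPPL standard");
static_assert(sizeof(PersistentRecord) == 12, "unexpected padding in PersistentRecord");
static_assert(offsetof(PersistentRecord, flags) == 8, "unexpected member layout");
```

This covers data members only: compiler-generated fields such as virtual-table pointers still follow the platform's pointer width, so fixed-width types alone do not make an object's layout portable.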
3. Restrictions on Use of Some C++ Features
3.1. Static Members
CPPL does not support the use of static objects. Also, objects stored in Khazana cannot contain any static data members: the programmer must ensure that neither the classes from which the object’s type inherits nor the types of any of its embedded objects contain static data members.
3.2. Template Classes
CPPL does not support template classes, because of the difficulty of statically enumerating all possible instantiations of a template. One possible solution might require the programmer to enumerate all the applications of a particular template class, so that CPPL can generate all the corresponding instantiations of that class.
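The enumeration approach could look like the following sketch, in which the programmer lists each permitted application of a template with an explicit instantiation and gives each one its own CPPL type id. Stack, KhTypeId and the ids 101 and 102 are hypothetical. Requesting the type id of an application that was not enumerated, such as KhTypeId<Stack<float>>, fails at compile time because the primary template is left undefined.

```cpp
#include <cstddef>

template <class T>
class Stack {
public:
    T items[16];
    std::size_t top = 0;
    void push(const T& v) { items[top++] = v; }
    T pop() { return items[--top]; }
};

// Explicit instantiation definitions: the complete list of allowed applications.
template class Stack<int>;
template class Stack<double>;

// One CPPL type id per enumerated application; the primary template is undefined.
template <class T> struct KhTypeId;
template <> struct KhTypeId<Stack<int>>    { static constexpr unsigned short value = 101; };
template <> struct KhTypeId<Stack<double>> { static constexpr unsigned short value = 102; };
```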
3.3. C++ Compilers
To ensure that an object’s data members and methods are correctly re-established when it is loaded, the current design assumes that the same version of the C++ compiler is used everywhere a given object is accessed.