Contents |
Next |
|---|
Info on ARCHTEST: http://www.mpdiag.com/archtest.html. Info on Reasoning about Parallel Archtectures: http://www.mpdiag.com/rapa.html.
---------------------------------------------
ARCHTEST has been used by many companies in the development and testing of new shared memory systems.
In addition, considerable research has been based on ARCHTEST by Prof. G. C. Gopalakrishnan and his students at the University of Utah.
---------------------------------------------
(You can fish in a lake all day and not catch any fish, but that does not prove that there are no fish in the lake).
ARCHTEST can show only that a machine violates an architecture.
(However, the output from ARCHTEST can also suggest the frequency and severity of violations of particular architectures.)
---------------------------------------------
There are only nine unique tests known.
Therefore, we can test only for the architectures associated with those nine tests.
---------------------------------------------
There are some tests which involve only, say, rule R and CMP and UPO. If such a test fails, then it is clear the machine violates rule R.
Such a test is called a pure test for R.
For most rules there exists no pure test.
---------------------------------------------
Initially, A = B = U[i] = V[i] = 0, i = 0 to K.
P1 P2
L00: A = 0; L00: B = 0;
L01: - = A; L01: - = B;
L02: U[0] = B; L02: V[0] = A;
L10: A = 1; L10: B = 1;
L11: - = A; L11: - = B;
L12: U[1] = B; L12: V[1] = A;
L20: A = 2; L20: B = 2;
L21: - = A; L21: - = B;
L22: U[2] = B; L22: V[2] = A;
L30: A = 3; L30: B = 3;
L31: - = A; L31: - = B;
L32: U[3] = B; L32: V[3] = A; etc.
Seek 7.1. U[i] < j and V[j] < i. d1 = V[ U[i]+1 ] - i < 0.
Seek 7.2. V[i] < j and U[j] < i. d2 = U[ V[i]+1 ] - i < 0.
To show: Not A(CMP,UPO,RR,CC1).
To understand the ideas behind this test, we need to
understand executions, event, rules, and architectures.
---------------------------------------------
An execution is composed of one or more processes, each of which is composed of one or more statements.
A statement consists of one or more read events along with one write event for each process in the execution.
---------------------------------------------
Initially, A = B = U = V = X = Y = 0.
P1 P2 P3 P4
A = 1; U = A; X = B; B = 1;
V = B; Y = A;
Initially, A = B = U = X = 1, V = Y= 0.
The events for the first statement in P2 are:
(P2,L1,R,1,A,S2) (P2,L1,W,1,U,S1) (P2,L1,W,1,U,S2) (P2,L1,W,1,U,S3) (P2,L1,W,1,U,S4)The write events represent the times at which the new value of A becomes visible to the stores, S1, S2, S3, S4 for processes P1, P2, P3, P4, respectively.
---------------------------------------------
These behaviors called rules.
The computation rules are so obvious that they are usually taken for granted, but it is not possible to reason about the behavior of a machine without defining these rules explicitly.
The uniprocessor order rules describe the ordering rules that both uniprocessors and multiprocessors need to obey.
The order rules are simple. Program order (PO) requires that all events in a process occur in the order defined by the underlying program.
The cache coherence rules describe the patterns in time at which the output from a statement becomes visible to all processes in an execution.
Each rule induces a partial ordering on the events in an execution. (Actually it is more complicated than that.)
---------------------------------------------
A failure to obey one of the rules of computation might look like this:
Initially, A = 0.
P1
L0: A = 1;
Terminally, A = 13.
Operand A is initially zero. Then it is set to one. But at the
end of the execution it has the value 13. Clearly, the machine on
which the execution occurred did not compute in the way one normally
expects a machine to behave.---------------------------------------------
(P1,L0,R,0,B,S1) <srw (P1,L0,W,0,A,S1)
to show that in
P1
L0: A = B;
the read event(s) in a statement occur before the write events.---------------------------------------------
Initially, A = X = 0.
P1 P2
L0: A = 1; L0: X = A;
If X = 0 at the end of the execution, then it must be that
the read event in process P2 occurred before the write event in
process P1. Represent this by
(P2,L0,R,0,A,S2) <crw (P1,L0,W,1,A,S2)
to show that statement L0 in process P2 read a 0 from operand A
in S2 before statement L0 in process P1 wrote a 1 value into
operand A in S2.---------------------------------------------
Initially, A = X = 0.
P1 P2
L0: A = 1; L0: X = A;
Terminally, A = X = 1.
If X = 1 at the end of the execution, then it must be that
the write event in process P1 occurred before the read event in
process P2. Represent this by
(P1,L0,W,1,A,S2) <cwr (P2,L0,R,1,A,S2)
to show that statement L0 in process P1 wrote a 1 into operand A
in S2 (S2 is P2's view of storage) before statement L0 in process
P2 read a 1 value from operand A in S2.---------------------------------------------
Initially, A = 0.
P1 P2
L0: A = 1; L0: A = 2;
If A = 2 (in both S1 and S2, that is, in both P1's and P2's
view of storage) after this program has executed, then it must be
that the write events in process P1 (for both S1 and S2) occurred
before the write events in process P2. Represent this by
(P1,L0,W,1,A,S1) <cww (P2,L0,W,2,A,S1)
(P1,L0,W,1,A,S2) <cww (P2,L0,W,2,A,S2)
to show that statement L0 in process P1 wrote a 1 into operand A
in S1 and in S2 before statement L0 in process P2 wrote a 2 into
operand A in S1 and in S2, respectively.---------------------------------------------
X = 1; X = 1; Y = X;
Y = X; X = 2; X = 1;
Thus for this fragment
Pi
X = 1;
X = 2;
we would have
(Pi,-,W,1,X,Si) <upo (Pi,-,W,2,X,Si)
but UPO would not mean that (Pi,-,W,1,X,Si) would necessarily
occur before any of the other write events in the second
statement.The necessity of this behavior is so obvious that UPO is treated like CMP as being part of the rules that all machines, even multiprocessors, automatically obey.
There is one case which is not so clear cut. If two fetches from the same operand are made, will they always occur in order? This seems to me to be a reasonable assumption, but others may not agree.
Y = X;
Z = X;
I have chosen to give this rule and the associated relation special
names, URR and <urr, respectively.---------------------------------------------
If a machine obeys RR, and if a first read operation on shared data logically occurs in a program before a second read operation on shared data, then the first read operation is executed before the second read operation.
Write-Write Order (WW)
If a machine obeys WW, and if a first write operation on shared data logically occurs in a program before a second write operation on shared data, then the first write operation is executed before the second write operation.
Write-Read Order (WR)
If a machine obeys WR, and if a write operation on shared data logically occurs in a program before a read operation on shared data, then the write operation is executed before the read operation.
Many machines allow WR to be disobeyed.
Read-Write Order (RW)
If a machine obeys RW, and if a read operation on shared data logically occurs in a program before a write operation on shared data, then the read operation is executed before the write operation.
Program Order (PO)
A machine obeys PO if and only if it obeys RR, WW, RW, and WR.
---------------------------------------------
All processes see each operand change value at exactly the same instant. CC1 is also called write atomicity (CC1).
CC2
All processes see exactly the same sequence of changes of values for all operands. CC2 is also called write synchronization.
CC3
All processes see exactly the same sequence of changes of values for each separate operand.
CC4
Different processes may see an operand assume different sequences of values.
---------------------------------------------
Symbolically, if x <hb w1, then x <hb w2.
This is inconvenient to write when one is trying to establish the existence of a circuit. A more economical notation is to write x <hb w1 =cc1 w2 to show that x occurs before w1 which occurs at the same time as w2.
---------------------------------------------
If a machine obeys rules R1, R2, ..., Rn, then denote by A(R1,R2,...,Rn) the architecture for the machine.
--------------------------------------------
Suppose that every program which can calculate any given result on M1 can calculate the same result on M2, and vice versa.
Then M1 and M2 are indistinguishable.
This is written: M1 <=> M2.
Alternatively, A1 and A2 are indistinguishable.
Otherwise, M1 and M2 (A1 and A2) are distinguishable.
---------------------------------------------
Then A1 <=> A2. The architecture for Sun machines requires machines to be CC2. The documents cite this theorem as support for the machines being in fact CC1.
---------------------------------------------
Initially, A = B = U = V = 0.
P1 P2
L00: A = 1; L00: B = 1;
L01: - = A; L01: - = B;
L02: U = B; L02: V = A;
Terminally, A = B = 1, U = V = 0.
(P1,L02,R,0,B,S1) <crw (P2,L00,W,1,B,S1) (<crw by CC1)
=cc1 (P2,L00,W,1,B,S2)
<upo (P2,L01,R,1,B,S2)
<rr (P2,L02,R,0,A,S2)
<crw (P1,L00,W,1,A,S2)
=cc1 (P1,L00,W,1,A,S1)
<upo (P1,L01,R,1,A,S1)
<rr (P1,L02,R,0,B,S1)
---------------------------------------------
(P1,L32,R,4,B,S1) <crw (P2,L50,W,5,B,S1)
.
(P2,L50,W,5,B,S1) =cc1 (P2,L50,W,5,B,S2)
By write atomicity the write into B in S1 occurred at the
same instant as the write into B in S2.
(P2,L50,W,5,B,S2) <upo (P2,L51,R,5,B,S2)
By uniprocessor order the write into B in S2 in statement
L50 occurred before the read from B in S2 in statement L51.
(P2,L51,R,5,B,S2) <rr (P2,L52,R,2,A,S2)
By read order the read event in statement L51 occurred
before the read event in statement L52.The second half of the circuit repeats the reasoning for the first half.
The existence of the circuit shows that the machine failed to obey A(CMP,UPO,RR,CC1).
---------------------------------------------
Initially, A = B = U[i] = V[i] = 0, i = 0 to K.
P1 P2
L00: A = 0; L00: B = 0;
L01: - = A; L01: - = B;
L02: U[0] = B; L02: V[0] = A;
L10: A = 1; L10: B = 1;
L11: - = A; L11: - = B;
L12: U[1] = B; L12: V[1] = A;
L20: A = 2; L20: B = 2;
L21: - = A; L21: - = B;
L22: U[2] = B; L22: V[2] = A;
L30: A = 3; L30: B = 3;
L31: - = A; L31: - = B;
L32: U[3] = B; L32: V[3] = A; etc.
Seek 7.1. U[i] < j and V[j] < i. d1 = V[ U[i]+1 ] - i < 0.
Seek 7.2. V[i] < j and U[j] < i. d2 = U[ V[i]+1 ] - i < 0.
To show: Not A(CMP,UPO,RR,CC1).
If a circuit is found, then
d measures the length of the circuit.Test T7 fails on many machines. The reason is that fetches are made from the store buffer, and the store buffer is not subjected to the same MESI discipline as the cache. ---------------------------------------------
Eight machines relaxed the rule of WR, that is, they allowed reads to occur before logically preceding writes Call these machines POK machines.
Twelve machines relaxed both WR and CC1. This has been called processor consistency (Pcon).
More into at: http://www.mpdiag.com/results.html. ============================================
============================================
---------------------------------------------
Goal: See the results of subjecting both operating system and hardware to the same tests.
---------------------------------------------
Ans: Critical section routines.
(Are there others? Probably very, very few.)
Most critical section routines use atomic instructions but there are several in the literature which do not:
Question: can critsec routines run correctly on a PCon machine?
---------------------------------------------
It is easy to generate data that appears to come from a SC machine. ARCHTEST does this and correctly concludes that the machine is SC.
How to generate data that appears to come for a non-SC machine?
At present ARCHTEST generates valid data and then modifies it slightly at random points. Hand-checking shows that ARCHTEST correctly picks out the errors.
---------------------------------------------
This program would help ensure that ARCHTEST analyzes data correctly. It could also be of use to other researchers.
---------------------------------------------
One important lemma necessary for the proof had 224 subcases. I wrote a program in Pascal to generate and to check each subcase.
I will be rewriting this program in C and making it available with a copyleft statement in order for others to build on it.
---------------------------------------------
| Prev | Contents | Next |
|---|