h37846 s 00001/00001/00500 d D 1.13 97/08/07 17:40:52 carde 14 13 c removed link to cern webmaker -- link was broken c e s 00008/00008/00493 d D 1.12 97/07/01 11:08:53 wegis 13 12 c e s 00001/00001/00500 d D 1.11 97/06/20 10:48:02 wegis 12 11 c Update e s 00001/00001/00500 d D 1.10 97/06/10 17:22:48 wegis 11 10 c Update e s 00001/00001/00500 d D 1.9 97/06/10 17:07:22 wegis 10 9 c Update e s 00000/00000/00501 d D 1.8 97/06/10 16:55:54 wegis 9 8 c Update e s 00018/00007/00483 d D 1.7 97/06/06 13:34:12 wegis 8 7 c e s 00007/00018/00483 d D 1.6 97/06/06 13:32:33 wegis 7 6 c e s 00006/00006/00495 d D 1.5 97/06/04 15:10:37 wegis 6 5 c e s 00001/00001/00500 d D 1.4 96/06/21 14:58:32 rmg 5 4 c Replaced bogus footer comments with good ones. e s 00002/00002/00499 d D 1.3 96/06/21 12:12:14 rmg 4 3 c Replaced bogus footer comments with good ones. e s 00030/00003/00471 d D 1.2 96/04/15 09:39:42 hassan 3 1 c partial new look e s 00000/00000/00000 d R 1.2 95/12/19 10:28:12 Codemgr 2 1 c SunPro Code Manager data about conflicts, renames, etc... c Name history : 1 0 sfaq/verifier.html e s 00478/00000/00000 d D 1.1 95/12/19 10:28:11 mrm 1 0 c date and time created 95/12/19 10:28:11 by mrm e u U f e 0 t T I 1 I 13 E 13
E 3 I 3
D 13 E 13 E 3
This paper presents the details of the lowest-levels of the Java security mechanism. Before any downloaded code is executed, it is scanned and verified to ensure that it conforms to the specifications of the virtual machine.
The language was created for developing programs in a heterogenous network-wide environment. Since one of the initial goals of the language was to be used in embedded systems with a minimum amount of memory, the Java language is designed to be small and to use a small amount of hardware resources.
The Java compiler generates class files, which have an architecturally neutral, binary intermediate format. Embedded in the class file are bytecodes, which are implementations for each of the class's methods, written in the instruction set of a virtual machine. The class file format has no dependencies on byte-ordering, pointer size, or underlying operating system.
The bytecodes are executed by means of a runtime system, an emulator for the virtual machine's instruction set. The same bytecodes can be run on any platform.
The lowest levels of the Java interpreter implement security in several ways.
An additional problem with compile-time checking is version skew. A user may have successfully compiled a class, say PurchaseStockOptions to be a subclass of TradingClass. But the definition of TradingClass might have changed since the time the class was compiled: methods might have disappeared or changed arguments; variables might have changed types or changed from dynamic (per object) to static (per class). The visibility of a method or variable may have changed from public to private.
All class files brought in from "the outside" are subjected to a verifier. This verifier ensures that the class file has the correct format. The bytecodes are verified using a simple theorem prover which establishes a set of "structural constraints" on the bytecodes.
The bytecode verifier also enhances the performance of the interpreter. Runtime checks that would otherwise have to be performed for each interpreted instruction can be eliminated. Rather, the interpreter can assume that these checks have already been performed. Though each individual check may be inexpensive, several machine instructions for the execution of each bytecode instruction are eliminated.
For example, the interpreter already knows that the code will adhere to the following constraints:
The verifier is extremely conservative. It will refuse to certify some class files that a more sophisticated theorem prover might certify.
Other languages can be compiled into the class format. The bytecode verifier, by not being specifically tied to the Java language, allows users to import code from outside their firewall with confidence.
A class file contains:
For each field and method in the class, the bytes in the class file indicate the field or method's name and its type. The type of a field or method is indicated by a string called its signature. Fields may have an additional attribute giving the field's initial value. Methods may have an additional attribute giving the code for performing that method.
Methods may, in fact, have multiple code attributes. The attribute CODE indicates bytecode to be run through the interpreter. Methods might also have attributes such as SPARC-CODE or 386-CODE which are machine-code implementations of the method. The HotJava browser will ignore the machine-code implementation of any method from an untrustworthy source, since it cannot verify that machine code is structurally sound.
The current implementation of the HotJava browser believes that any class file that comes from the network is untrustworthy. It will only run machine code that has been loaded from local class files. However, the class format can allow authors to digitally sign class files. Future browsers may be more trusting of signed machine code coming from trusted sources.
Each method activation has a separate expression-evaluation stack and set of local registers. Each register and each stack location must be able to hold an integer, a single float, a handle, or a return address. Longs and double floats must fit into two consecutive stack locations or two consecutive registers. The virtual-machine instructions ("opcodes") will address longs and double floats in registers using the index of the lower-numbered register.
Objects on the stack and in registers are not (necessarily) tagged. The virtual-machine instruction set provides opcodes to operate on different primitive data types. For example, ineg, fneg, lneg, and dneg each negate the top item on the stack, but they assume that the top item on the stack is an integer, a single float, a long, or a double float, respectively.
The bytecode instructions can be divided into several categories:
This pass ensures that the class file has the format of a class file. The first several bytes must contain the right magic number. All recognized attributes need to be the proper length. The class file must not be truncated or have extra bytes at the end. The constant pool must not contain any unrecognized information.
For example, if a method contains a call to another method that returns an object of type foobarType, and that object is then immediately assigned to a field of the same type, the verifier doesn't bother to check if the type foobarType exists. However, if it is assigned to a field of the type anotherType, the definitions of both foobarType and anotherType must be loaded in to assure that foobarType is a subclass of anotherType.
The first time an instruction that references a class is executed, the verifier does the following:
After the verification has been performed, the instruction in the bytecode stream is replaced with an alternative form of the instruction. For example, the opcode new is replaced with new_quick. This alternative instruction indicates that the verification needed by this instruction has taken place, and need not be performed again. It is illegal for these _quick instructions to appear in Pass 3.
First, the bytes that make up the virtual instructions are broken up into a sequence of instructions, and the offset of the start of each instruction is kept in a bit table. The verifier then goes through the bytes a second time and parses the instructions. During this pass each instruction is converted into a structure. The arguments, if any, to each instruction are checked to make sure they are reasonable:
[Some extra information is kept about each instruction in a finally clause. This information is discussed further in the section Try / Finally].
Next, a data-flow analyzer is initialized. For the first instruction, the lower-numbered registers contain the types indicated by the method's type signature; the stack is empty. All other registers contain an illegal value. For all other instructions, indicate that this instruction has not yet been visited; there is yet no information on its stack or registers.
Finally, the data-flow analyzer is run. For each instruction, there is a "changed" bit indicating whether this instruction needs to be looked at. Initially, the "changed" bit is set only for the first instruction. The data-flow analyzer executes the following loop:
To merge two register states, compare each register. If the two types aren't identical, then unless both contain handles, indicate that the register contains an unknown (and unusable) value. For differing handle types, the merged state contains the common ancestor of the two types.
If the data-flow analyzer runs on the method without reporting any failures, then the method has been successfully verified by Pass 3 of the class file verifier.
Certain instructions and data types complicate the data-flow analyzer. We now examine each of these.
Whenever a long or double is moved into a register, the following register is marked as containing the second half of a long or double. This special value indicates that all references to the long or double must be through the lower numbered register.
Whenever any value is moved to a register, the preceding register is examined to see if it contains the first word of a long or a double. If so, that preceding register is changed to indicate that it now contains an unknown value. Since half of the long or double has been eradicated, the other half can no longer be used.
Dealing with 64-bit quantities on the stack is simpler. The verifier treats them as single units on the stack. For example, the verification code for the dadd opcode (add two double floats) checks that the top two items on the stack are both double floats. When calculating stack length, longs and double floats on the stack have length two.
Stack manipulation opcodes must treat doubles and longs as atomic units. For example, the verifier reports a failure if the top element of the stack is a double float and it encounters the opcodes pop or dup. The opcodes pop2 or dup2 must be used instead.
new myClass(i, j, k);are roughly the following:
new <myClass> # allocate uninitialized space dup # duplicate object on the stack <push arguments> invokenonvirtual myClass.<init> # initializeThis code leaves the newly created and initialized object on top of the stack.
The myClass initialization method sees the new uninitialized object as its this argument in register 0. It must either call an alternative myClass initialization method or call the initialization method of a superclass on the this object before it is allowed to do anything else with this.
In normal instance methods (what C++ calls virtual methods), the verifier indicates that register 0 initially contains an object of "the current class"; for constructor methods, register 0 instead contains a special type indicating an uninitialized object. After an appropriate initialization method is called (from the current class or the current superclass) on this object, all occurrences of this special type on the stack and in the registers are replaced by the current class type. The verifier prevents code from using the new object before it has been initialized and from initializing the object twice.
Similarly, a special type is created and pushed on the stack as the result of the opcode new. The special type indicates the instruction in which the object was created and the type of the uninitialized object created. When an initialization method is called on that object, all occurrences of the special type are replaced by the appropriate type.
The instruction number needs to be stored as part of the special type since there may be multiple instances of a non-yet-initialized type in existence on the stack at one type. For example, the code created for the following:
new InputStream(new Handle(),new InputStream("foo"))may have two uninitialized InputStream's active at once.
Code may not have an uninitialized object on the stack or in a register during a backwards branch, or in a register in code protected by an exception handler or a finally. Otherwise, a devious piece of code could fool the verifier into thinking it had initialized an object when it had, in fact, initialized an object created in a previous pass through the loop.
try { startFaucet(); waterLawn(); } finally { stopFaucet(); }The Java language guarantees that the faucet is turned off, even if an exception occurs while starting the faucet or watering the lawn. The code inside the brackets after the try is called the protected code. The code inside the brackets after the finally is the cleanup code. The cleanup code is guaranteed to be executed, even if the protected code does a "return" out of the function, or contains a break or continue to outside the try/finally, or gets an exception.
To implement this construct, the Java compiler uses the exception handling facilities, together with two special instructions, jsr (jump to subroutine) and ret (return from subroutine). The cleanup code is compiled as a subroutine. When it is called, the top object on the stack will be the return address; this return address is saved in a register. At the end of the cleanup code, it performs a ret to return to whatever code called the cleanup.
To implement try/finally, a special exception handler is set up around the protected code which catches all exceptions. This exception handler:
The cleanup code presents a special problem to the verifier. Usually, if a particular instruction can be reached via multiple paths and a particular register contains incompatible values through those multiple paths, then the register becomes unusable. However, a particular piece of cleanup code might be called from several different places:
Verifying code that contains finally's can be somewhat complicated. Fortunately, most code does not have finally's. The basic idea is the following:
However, before users will consent to bring over executable code from untrustworthy sources (i.e. most of the network!), they want assurances that the code cannot damage them. The byte-code verifier is the lowest-level of a many-tiered strategy [10].
E 6 I 6 D 7 1. The Java Language Overview. Available via /1.0alpha3/doc/overview/java/index.html
E 7
I 7
D 8
1. James Gosling and Henry McGilton. The Java Language Overview: A White Paper. Sun Microsystems Technical Report, May 1995. Available via /whitePaper/javawhitepaper_1.html
E 8
I 8
D 13
1. The Java Language Overview.
E 13
I 13
1. The Java Language Overview.
E 13
E 8
E 7
E 6
D 6
2. James Gosling and Henry McGilton. The Java Language Overview: A White Paper. Sun Microsystems Technical Report, May 1995. Available via http://java.sun.com/whitePaper/javawhitepaper_1.html
E 6
I 6
D 7
2. James Gosling and Henry McGilton. The Java Language Overview: A White Paper. Sun Microsystems Technical Report, May 1995. Available via /whitePaper/javawhitepaper_1.html
E 7
I 7
D 8
2. Donald E Knuth. The Art of Computer Programming, volume 1: Fundamental Algorithms. Addison-Wesley, Reading, Massachusetts, 1969.
E 8
I 8
D 13
2. James Gosling and Henry McGilton. The Java Language Overview: A White Paper. Sun Microsystems Technical Report, May 1995. Available via http://java.sun.com/whitePaper/javawhitepaper_1.html
E 13
I 13
2. James Gosling and Henry McGilton. The Java Language Overview: A White Paper. Sun Microsystems Technical Report, May 1995.
E 13
E 8
E 7
E 6
D 7
3. Donald E Knuth. The Art of Computer Programming, volume 1: Fundamental Algorithms. Addison-Wesley, Reading, Massachusetts, 1969.
E 7
I 7
D 8
3. The Java Virtual Machine Specification. Available via VM Spec
E 8
I 8
3. Donald E Knuth. The Art of Computer Programming, volume 1: Fundamental Algorithms. Addison-Wesley, Reading, Massachusetts, 1969.
E 8
E 7
D 6
4. The HotJava Overview. Available via http://java.sun.com/1.0alpha3/doc/overview/hotjava/index.html
E 6
I 6
D 7
4. The HotJava Overview. Available via /1.0alpha3/doc/overview/hotjava/index.html
E 7
I 7
D 8
5. The Unicode Consortium. The Unicode Standard: Worldwide Character Encoding. Addison-Wesley, Reading, Massachusetts, 1992. Available via http://unicode.org/
E 8
I 8
D 13
4. The HotJava Overview.
E 13
I 13
4. The HotJava Overview.
E 13
E 8
E 7
E 6
D 6
5. The Java Virtual Machine Specification. Available via http://java.sun.com/1.0alpha3/doc/vmspec/vmspec_1.html
E 6
I 6
D 7
5. The Java Virtual Machine Specification. Available via VM Spec
E 7
I 7
D 8
6. Alfred V.Aho, Ravi Sethi,, and Jeffrey D Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Massachusetts, 1988
E 8
I 8
D 13
5. The Java Virtual Machine Specification. Available via http://java.sun.com/1.0alpha3/doc/vmspec/vmspec_1.html
E 13
I 13
5. The Java Virtual Machine Specification. Available via http://java.sun.com/docs/books/vmspec/html/VMSpecTOC.doc.html
E 13
E 8
E 7
E 6
D 7
6. The Unicode Consortium. The Unicode Standard: Worldwide Character Encoding. Addison-Wesley, Reading, Massachusetts, 1992. Available via http://unicode.org/
E 7
I 7
D 8
7. Samuel P. Harbison. Modula-3. Prentice-Hall, Inc. 1992.
E 8
I 8
6. The Unicode Consortium. The Unicode Standard: Worldwide Character Encoding. Addison-Wesley, Reading, Massachusetts, 1992. Available via http://unicode.org/
E 8
E 7
D 7
7. Alfred V.Aho, Ravi Sethi,, and Jeffrey D Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Massachusetts, 1988
E 7
I 7
D 8
8. Guy L. Steele Jr. Common Lisp: The Language, Second Edition. Digital Press, Bedford, Massachusetts, 1990. Available via http://www.cs.cmu.edu/Web/Groups/AI/html/cltl/cltl2.html.
E 8
I 8
7. Alfred V.Aho, Ravi Sethi,, and Jeffrey D Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Massachusetts, 1988
E 8
E 7
I 8
8. Samuel P. Harbison. Modula-3. Prentice-Hall, Inc. 1992.
9. Guy L. Steele Jr. Common Lisp: The Language, Second Edition. Digital Press, Bedford, Massachusetts, 1990. Available via http://www.cs.cmu.edu/Web/Groups/AI/html/cltl/cltl2.html.
D 13
10. HotJava(tm): The Security Story. Available via http://java.sun.com/1.0alpha3/doc/security/security.html
E 13
I 13
10. HotJava(tm): The Security Story.
E 13
E 8
D 7
8. Samuel P. Harbison. Modula-3. Prentice-Hall, Inc. 1992.
9. Guy L. Steele Jr. Common Lisp: The Language, Second Edition. Digital Press, Bedford, Massachusetts, 1990. Available via http://www.cs.cmu.edu/Web/Groups/AI/html/cltl/cltl2.html.
D 6
10. HotJava(tm): The Security Story. Available via http://java.sun.com/1.0alpha3/doc/security/security.html
E 6
I 6
10. HotJava(tm): The Security Story. Available via /1.0alpha3/doc/security/security.html
I 8
D 13
Generated with CERN WebMaker
E 13
I 13
D 14
Generated with CERN WebMaker
E 14
I 14
Generated with CERN WebMaker
E 14
E 13
E 8
D 7
D 6
Generated with CERN WebMaker
E 6
I 6
Generated with CERN WebMaker
E 6
I 3
E 7
D 4
E 4
I 4
E 4
Copyright © 1996 Sun Microsystems, Inc., 2550 Garcia Ave., Mtn. View, CA 94043-1100 USA. All rights reserved. Contact the Java developer community via the newsgroup comp.lang.java Send questions or comments about this web site to Copyright © 1996 Sun Microsystems, Inc., 2550 Garcia Ave., Mtn. View, CA 94043-1100 USA. All rights reserved. Contact the Java developer community via the newsgroup comp.lang.java Send questions or comments about this web site to
D 5
About the Author
Sun Microsystems
D 13
Java Products Group
E 13
I 13
Java Products Group
E 13
fy@eng.sun.com
D 12
or JavaSoft technical support via email to java@java.sun.com.
webmaster@java.sun.com.
or JavaSoft technical support via email to java@java.sun.com.
webmaster@java.sun.com.
E 5
I 5
D 10
E 10
I 10
D 11
E 11
I 11
E 11
E 10
E 5