Primitive Types Considered Harmful
Sherman R. Alpert
IBM T.J. Watson Research Center
Yorktown Heights, NY 10598 USA
salpert@us.ibm.com
  
This article originally appeared in Java Report, November, 1998 (Volume 3, Number 11).
Thanks to SIGS Publications for granting permission to reproduce the article here.
The version that appears here is a slightly modified version.

1 Introduction

"100% Java" is the Java developers’ slogan, meaning "Don’t taint your Java software by incorporating non-Java components." Yet Java itself is neither 100% pure object nor 100% pure object-oriented; it is "tainted" with components of the procedural programming paradigm. The Java designers have borrowed many ideas from diverse sources (other object-oriented languages, especially C++ and Smalltalk; design patterns [1]; basic object-oriented design principles), picking and choosing good ideas from each. And they’ve done a truly fine job. However, at one particular juncture, I believe the language architects made the wrong choice, and that was the decision to incorporate non-object primitive types into the otherwise uniform object-oriented language model.

There are two types of types in Java. By primitive types, I mean elementary variable types such as int, boolean, float, long, short, and so on. These are sometimes referred to as "built-in" types. The other type of type is objects: a variable can be declared to be of type Object or String or any other class. Referring to the two types of types as metatypes, we have primitive and object metatypes (the truth is, the two metatypes are primitive and reference which includes not only objects, but interfaces and arrays—more on why they are "reference" types later). By incorporating both metatypes, Java is not populated purely with objects.

The result, as would be expected, is that Java is also not 100% object-oriented. Programmers operate on objects by sending them messages and accessing their subsumed fields using dot notation. We operate on primitive variables with operators built into the Java language, such as +, *, and ++. (There are exceptions: in some (very few) cases, we can also act on objects using language-level operators, such as + (concatenation) for String instances; this actually only adds to the potential confusion because the language is thus all the more inconsistent.)

So, what’s the problem? By building a world in which objects co-exist with primitive types, and procedural programming constructs co-exist with object-oriented mechanisms, Java offers a single world with a dichotomy of semantics. It mixes procedural semantics into an otherwise uniform object-oriented model. Primitives are not first-class objects, yet they exist in a language that involves, primarily, first-class objects. Researchers have conjectured that mixing programming paradigms in a single language can be problematic, and a uniform model should be more comprehensible and usable [2]. In this article, I will offer concrete examples of why mixing paradigms is problematic in Java. My primary goal is to illustrate that incorporating primitive types into Java causes problems for programmers, and hopefully motivate the Java designers to rectify the problem. Secondarily, I will offer a solution to the problem as grist for the Java designers’ mill.

1.1 Why primitive types?

It was certainly not illogical for the Java designers to include primitive types. There are several possible reasons they may have done so. The first potential reason is consistency: many of Java’s ideas originate in C++, and in C++ objects and primitive types also co-exist. When object-oriented constructs were grafted onto the C language, C’s data types remained as is. The motivation behind this is that Bjarne Stroustrup, C++’s designer, wanted to be as consistent as possible with the base C language, hoping to capitalize on programmers’ pre-C++ knowledge. "Unfortunately, user-defined and built-in types cannot be made to behave in a completely uniform manner. The reason is that the irregularity of the built-in types cannot be removed without causing serious incompatibilities with C" (emphasis mine) [3, p. 380]. For Java, this is a debatable issue. There is no base language with which Java ought to be consistent. On the other hand, Java programmers do have experience programming in non-object-oriented languages and are thus accustomed to using primitive data types.

Another point arguing against defining elementary types as objects (that is, as instances of some class) is, Do we really want multiple instances of elementary types with the same value? For example, do we really want multiple integers whose value is 1? If primitive types such as integers are objects, then we might be able to have two different integer objects whose value is identical.

But, really, the most important reasons for handling primitive types differently than objects are performance and verification. Primitive types such as integers, booleans, floats are part of the "hard currency," the constantly used bread-and-butter of an imperative language like Java. Simply, then, we want operations involving variables of these types—operations that are at the heart of our computing model—to run fast.

The Java Virtual Machine (JVM) maintains an execution stack for each thread. This Java stack is populated with stack frames: each time a method is invoked, a new stack frame is pushed onto the thread's stack. The stack frame represents the state of the method, including a local variable table (with entries for this (the object in which the method is running), the method's parameters, and variables declared within the method), a dynamic operand stack, and various data involved in invoking and returning from the method (such as a pointer to the invoker's stack frame). Variables declared in a Java program are represented inside the JVM by entries in the local variable table and operand stack. As mandated by the JVM specification, entries for primitive variables in the variable table and operand stack contain the actual values of those variables.

Objects are represented on the stack quite differently. An entry for an object contains not the object itself, but a pointer into memory (into the heap) where the object resides (see Figure 1). Hence, accessing an object referenced in an operand stack entry requires at least an access of that stack entry to retrieve the memory pointer and another access of the object in the heap itself. (Actually, the JVM specification does not mandate any particular heap access mechanism; hence, accessing an object may require even more memory lookups, depending on the JVM’s implementation [4, 5].) The upshot of all this is that accessing the values of primitive variables is faster than accessing data in objects, since primitives require fewer memory fetches.

Note that while the JVM specification dictates the numerical magnitudes for all primitive types (e.g., an int must contain 32 bits of data, a short must have 16), it does not specify the actual size of a stack word beyond the criterion that it must be at least 32 bits. Often, the size of a word on the virtual machine's stack is the native pointer size of the platform on which the JVM is implemented [4, 5]. For example, in a typical Windows 95 JVM, single-precision primitives reside in one 32-bit entry, and doubles and longs, which must be 64 bits wide, occupy two contiguous slots (see Figure 1). On platforms with a 64-bit native word size, a single stack word is wide enough to hold any primitive value, including a double or a long; but since, as per the JVM specification, doubles and longs must occupy two stack words regardless of the word size, such a JVM leaves some bits unused for every primitive variable [5]. For the rest of our discussion, I'll assume a 32-bit word size.

 

The other efficiency issue is speed of execution of operations. Sending a message to an object requires the JVM to perform multiple memory accesses to retrieve the object from the heap and several more to retrieve the method (based on pointers from the object to its class and possibly one or more superclass-chain pointers), and then to execute the method using a generic method-invocation mechanism. On the other hand, the value of a primitive variable is right in the stack, and the JVM has specific opcodes for specific primitive operations, such as adding two ints. These primitive-specific opcodes are executed directly by the virtual machine; they do not require accessing and executing a method defined in a class.

Type-specific opcodes also facilitate one aspect of security, specifically a portion of code verification. Bytecode verification includes making sure opcodes are type safe and consistent. So, for example, if opcodes in an applet tell the JVM to "load int from local variable 1 onto the operand stack" immediately followed by "store double into local variable 1," the bytecode verifier would reject the code. Since doubles occupy two words, storing a double implies storing the top two operand stack words into the two variable table entries starting at entry 1. This would result in overwriting whatever variable is currently in slot 2 of the variable table, which could be used to store pointers into insecure portions of memory. So, the types encoded into Java opcodes play an important role in bytecode verification.

Clearly, then, we do not wish to completely eliminate primitive types, conceptually, from the Java Virtual Machine. But we may want to eliminate them from Java source programs. Let's see why.

2 Concrete examples of the problem

Performance and security notwithstanding, including primitive types in Java is problematic. Let’s look at some concrete examples of why this is so. We’ll consider these in, more or less, increasing problem magnitude.

2.1 Dichotomy of basic semantics

In Java, not only do we have different entities—objects and primitives—populating the same world, but features of the language itself have different meaning depending on the type of entity they are being applied to. At first blush, it may appear that this gulf of semantics—two possible meanings for a single language feature—could cause confusion and programming errors. Perhaps surprisingly, I'm going to begin by arguing that some of these basic differences are, in fact, not problematic per se, and thus are not part of the "harmful" aspects of mixing primitive and object metatypes. However, I will point out problems associated with these semantic disparities.

The most basic difference is how we operate on the two different Java data entities. As I noted earlier, we operate on objects principally with dot notation mechanisms, resulting in sending messages to the objects or accessing their public member fields. Contrarily, we work on primitive variables with operators that are part of the Java language itself (+, *, etc.). The latter is probably a good thing: it capitalizes on programmers' previous experience with other languages and it is probably more natural. In a pure object-oriented language like Smalltalk, even mainstream operators, such as +, are really messages sent to objects: anInteger + someOtherInteger means send the + message to anInteger with someOtherInteger as the argument. Many novice Smalltalk programmers have problems accepting and appreciating this concept. Nonetheless, the real problem derives from the inability to send messages to primitive typed variables. We'll see later that this prevents our use of polymorphism and basic object-oriented design strategies.
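The contrast between operators and messages can be sketched with a standard library class. java.math.BigInteger instances are ordinary objects, so arithmetic on them is written as a message send, while the same arithmetic on ints uses a built-in operator. The class name OperatorsVsMessages and its method names are made up for this illustration.

```java
import java.math.BigInteger;

public class OperatorsVsMessages {
    // Primitive operands: the + operator is part of the language itself.
    public static int addPrimitives(int a, int b) {
        return a + b;
    }

    // Object operands: the addition is a message (method call) sent to one
    // operand, with the other operand as the argument.
    public static BigInteger addObjects(BigInteger a, BigInteger b) {
        return a.add(b);
    }

    public static void main(String[] args) {
        System.out.println(addPrimitives(2, 3));
        System.out.println(addObjects(BigInteger.valueOf(2), BigInteger.valueOf(3)));
    }
}
```

Both calls compute the same sum; only the second is open to the usual object-oriented machinery of overriding and polymorphism.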

I'm going to provide a couple of other examples of the dichotomy of semantics caused by having two basic types of program entities. These boil down to the same issues. First, value versus reference semantics: as we've already seen, variables that are declared to be a primitive type actually contain their values whereas variables declared to be instances of some class contain a reference (pointer) to the actual object. Second, all primitives are immutable whereas most objects are not. You can always change what a variable references by reassigning the variable. But you can also change mutable objects in place by sending messages or accessing public fields. That is, you can modify an object referenced by a variable without changing the variable (which object it points to) at all. You cannot modify immutable entities, including all primitives.

2.1.1 == != ==

The built-in language-level operator == means two different things, depending on whether its operands are objects or primitive types. In one case, == tests for identity, in the other, for simple equality. If code compares two variables whose declared type is a class, that means "Do the two variables point to the same object?" Comparing two primitive variables means "Are the two variables logically equal, that is, do they contain the same value?" To be concrete, in the following code snippet we're testing whether a and b reference the identical object:

Object a, b;
...
if (a == b) ...
And here we're testing whether x and y have the identical value:

int x, y;
...
if (x == y) ...
The truth is, there really is no difference in how == works -- under the covers, it operates the same for both primitives and objects. The expression a == b really means "Do the variables a and b (that is, the stack entries representing variables a and b) contain precisely the same bits?" It just so happens that in the case of primitive variables those bits represent the variables' values (do the variables contain the same value?), whereas for objects those bits represent pointers into the heap (do the two variables contain the same object reference, i.e., do they point to the identical object?).

The rest of the truth is that this really isn't a problem. The semantics of the == operator are actually, "Do two variables refer to the same thing?" In a very real sense, asking whether 3 == 3 is asking whether the thing on both sides of the == is the same "object." After all, the Java world really possesses a single 3 "object." So, logically, == means the same thing for primitives as for objects.

We, in fact, do want the ability to check whether two variables reference the identical object—this is useful in a number of situations (for example, for comparing two instances which have identical fields but are different: say, two Employees with the same name, or when using similar objects as Dictionary keys). So, the semantics of == for objects are just fine as is. The real problem, or at least confusion, with regard to the primitive vs. object dichotomy arises in the primitive wrapper classes. Java includes a wrapper class for each primitive type: the Integer class corresponds to ints, Boolean is used to wrap booleans, Float instances are used for floats, etc. The problem proceeds from the fact that multiple instances of these basic wrapper classes may have the identical value but not be ==. For example, the following expressions are false!

new Boolean(true) == new Boolean(true) //false
new Integer(2) == new Integer(2) //false
On the other hand, when the == operator compares two primitive types, the result is based on the logical equality of the operands:

int int1 = 2, int2 = 2;
boolean b1 = true, b2 = true;
int1 == int2 //true
b1 == b2 //true
The two Integer objects above are not the same object, but they should be. There ought to be only one instance of Integer with a value of 2, just like there is "one instance" of int with a value of 2. The same should hold for other primitive wrapper classes. In the first example, we're not interested in whether the two Integers are the same object, but rather whether they're logically equivalent. To determine this, we can't use an "equals" test; we have to use an equals() test. That is, we can’t use the == operator; we have to remember, instead, to send a message to one instance asking that it compare itself to the other:

new Integer(2).equals(new Integer(2)) //true
new Boolean(true).equals(new Boolean(true)) //true
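The following sketch puts the two comparisons side by side so they can be compiled and run. It uses the wrapper constructors discussed above, which recent JDKs flag as deprecated (so expect a compiler warning); the class name WrapperIdentity and its helper methods are hypothetical.

```java
public class WrapperIdentity {
    // == on two wrapper variables compares object references, so two
    // separately constructed Integers are never identical.
    public static boolean sameObject(int value) {
        Integer first = new Integer(value);
        Integer second = new Integer(value);
        return first == second;
    }

    // equals() compares the wrapped values instead.
    public static boolean sameValue(int value) {
        Integer first = new Integer(value);
        Integer second = new Integer(value);
        return first.equals(second);
    }

    public static void main(String[] args) {
        System.out.println(sameObject(2)); // false
        System.out.println(sameValue(2));  // true
    }
}
```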
2.1.2 Passing arguments by reference vs. by value

"When actual arguments are passed to a Java method, they are passed either by value or by reference: primitives are passed by value, objects by reference." This is the story often told to new Java programmers; in fact, you'll find many Java books that say just this. But it's a lie. The truth is that all variables are passed by value: an invoked method cannot change the contents of the actual parameters. It just happens that some variables possess reference semantics. Primitive variables contain a value and object variables contain an object reference; both are passed by value. The method may reassign its local copy of this variable but this will not change the actual parameter in the caller’s scope:

void doesNotChange (int anInt) {
    // Primitive type version of doesNotChange()
    anInt = 5;
}
//calling doesNotChange():
int i = 1;
doesNotChange(i);
// At this point i has not changed. Its value is still 1.
 

void doesNotChange (String aString) {
    // Object version of doesNotChange()
    aString = "xyz";
}
//calling doesNotChange():
String s = "abc";
doesNotChange(s);
// At this point s has not changed. It is still "abc".

However, passing an object reference gives the invoked method a pointer to the actual object. Even though the method cannot reassign this variable to a different object in the caller's scope, it can send messages to the object or directly modify public fields inside that object. So, here's the difference: while primitives passed as parameters to a method cannot be changed in any way, objects passed in the same fashion can be "permanently" modified in situ. Here's an example (assume a class named Zot with a public field named field1):

void doesChange (Zot aZotObject) {
    aZotObject.changeYourself();
    aZotObject.field1 = 2;
}
//calling doesChange:
Zot z = new Zot();
z.field1 = 1;
doesChange(z);
// At this point z has changed. In addition to any side effects
// in changeYourself(), the value of z's "field1" field is now 2.
So, there is no fundamental difference between primitives and objects with respect to being passed as arguments; the real difference is, once again, we can send messages to objects or access their internal fields and these actions may result in changing those objects. Hence, the real problem is simply that programmers must be cognizant of the potentially different results depending on whether method arguments are primitives or objects. Anything that a programmer has to remember is a potential source for errors.
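A minimal, self-contained sketch of the three cases above follows. The body of Zot is an assumption (the text only requires a public field named field1 and a changeYourself() method); the timesChanged counter and the class name PassByValueDemo are invented for the illustration.

```java
public class PassByValueDemo {
    // Hypothetical Zot: one public field plus a method with a side effect.
    public static class Zot {
        public int field1;
        public int timesChanged = 0;

        public void changeYourself() {
            timesChanged++;
        }
    }

    public static void doesNotChange(int anInt) {
        anInt = 5; // reassigns only the method's local copy of the value
    }

    public static void doesNotChange(String aString) {
        aString = "xyz"; // reassigns only the method's local copy of the reference
    }

    public static void doesChange(Zot aZotObject) {
        aZotObject.changeYourself(); // message sent to the caller's object
        aZotObject.field1 = 2;       // direct write to its public field
    }

    public static void main(String[] args) {
        int i = 1;
        doesNotChange(i);
        String s = "abc";
        doesNotChange(s);
        Zot z = new Zot();
        z.field1 = 1;
        doesChange(z);
        // i and s are unchanged; the object that z references is not.
        System.out.println(i + " " + s + " " + z.field1 + " " + z.timesChanged);
        // prints: 1 abc 2 1
    }
}
```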

The semantics of the = operator are quite similar to what we've just discussed. Two variables may reference the same object, in which case changing one implies changing the other (by definition). But, reassigning one does not affect the other—just as with primitives.
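Assignment under both metatypes can be sketched in a few lines. StringBuilder stands in here for any mutable object; the class name AssignmentDemo and its method names are hypothetical.

```java
public class AssignmentDemo {
    // Two variables that reference one mutable object see each other's changes.
    public static String aliasThenMutate() {
        StringBuilder a = new StringBuilder("abc");
        StringBuilder b = a;   // b now references the same object as a
        b.append("def");       // modifies the single shared object
        return a.toString();   // "abcdef"
    }

    // Reassigning one variable leaves the other variable and its object untouched.
    public static String aliasThenReassign() {
        StringBuilder a = new StringBuilder("abc");
        StringBuilder b = a;
        b = new StringBuilder("xyz"); // b points elsewhere; a is unaffected
        return a.toString();          // "abc"
    }

    // With primitives, assignment copies the value itself.
    public static int copyThenReassign() {
        int x = 1;
        int y = x;
        y = 5;
        return x; // 1
    }

    public static void main(String[] args) {
        System.out.println(aliasThenMutate());
        System.out.println(aliasThenReassign());
        System.out.println(copyThenReassign());
    }
}
```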

2.2 Round pegs in square holes

Let's move on to the weightier problems caused by mixing primitives and objects. The first is that we cannot use primitives where objects are expected.

Since Java is strongly typed, an object’s member fields are declared to be of specific types; when the type of a field is defined as Object or some subclass thereof, a primitive type may not be assigned to or used in that field. This is particularly problematic in the case of container classes (by that I do not mean the AWT Container components, but rather classes whose instances can contain collections or lists of other objects). Let’s get concrete. Vector is a commonly used container class. A Vector provides the functionality of an ordered collection of elements, with the ability to add, insert, remove, and find elements at any position in the list. Hence, it has many desirable features and capabilities beyond a simple array. As a result, we may want to use a Vector’s functionality for flexibly storing and accessing objects or primitives. The problem, of course, is that the Vector class is defined to contain elements of type Object. We cannot have a Vector with elements of type int, nor any other primitive type. The real problem is primitives and objects are fundamentally incompatible. Primitive types cannot be used where objects are expected—yet they coexist in the same environment.

As a result, if we desire a Vector’s capabilities, we are forced to do a bunch of extra object creation and casting. The objects involved are instances of the primitive wrapper classes defined by Java's developers precisely because of this incompatibility problem. To store an int in a Vector, we must first create a new instance of Integer to "wrapper" the int. We start by declaring the Vector and int variables:

Vector myNumbers = new Vector(100);
int i;
What we would like to do after assigning a value to i is simply:

myNumbers.addElement(i);

But this causes a compile error: there is no method addElement() with an argument of type int. And with good reason: a Vector may only contain Objects, not primitives. Vector's only addElement() method expects an Object parameter. So, instead of the above, we create and store an Integer based on i's value:

myNumbers.addElement(new Integer(i));

In later code, when we want to access the integers contained in the Vector, we must undo the type conversion we have just performed (note that we also first cast the element to class Integer because as far as the Vector is concerned, it contains elements whose type is Object):

// Retrieve the Integer object and then convert it to int:
Integer temp = (Integer)myNumbers.elementAt(index);
i = temp.intValue();
What we would rather have done is something like:

i = (int)myNumbers.elementAt(index);

Of course, this causes a compile error as well because of the type mismatch: we cannot cast (with the explicit cast) or convert (without the cast) from type Object to int.

This may be a relatively minor problem, but nonetheless we’ve incurred two types of unnecessary overhead. First, there is cognitive overhead for the programmer, who must remember that primitives cannot be used where objects are expected and must add wrapping and unwrapping code as necessary. Second, there is the runtime overhead of creating wrapper objects and converting primitive values to and from them.
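Put together, the wrapping and unwrapping steps look like the sketch below. It is written for a current JDK, so it declares the container as Vector<Object> to mirror the Object-typed Vector described above and creates wrappers with Integer.valueOf() rather than the constructor; the class name VectorOfInts and the sum() helper are hypothetical.

```java
import java.util.Vector;

public class VectorOfInts {
    // Stores ints in a Vector by wrapping each one, then unwraps them to add them up.
    public static int sum(int[] values) {
        Vector<Object> myNumbers = new Vector<Object>(100);
        for (int i = 0; i < values.length; i++) {
            // Wrap explicitly: the Vector holds Objects, not ints.
            myNumbers.addElement(Integer.valueOf(values[i]));
        }

        int total = 0;
        for (int index = 0; index < myNumbers.size(); index++) {
            // Cast from Object to Integer, then convert the Integer to an int.
            Integer temp = (Integer) myNumbers.elementAt(index);
            total += temp.intValue();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3})); // 6
    }
}
```

Every element costs one wrapper on the way in and one cast plus one conversion on the way out, which is the overhead described above.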

2.3 Good object-oriented design abrogated

The existence of primitive types also necessitates special case code, the effect of which is the undoing of polymorphism. Contrary to principles of good object-oriented design, we often must test for and write extra code specially for primitive operands. We cannot simply write code that sends a single polymorphic message that is understood (though perhaps implemented differently) by many different types of objects. Let's look at an example.

The String class has an overloaded static method named valueOf() that produces a String representation of its argument. For object arguments, the object’s toString() method is invoked:

public static String valueOf(Object obj) {
    return (obj == null) ? "null" : obj.toString();
}
So the decision as to "How should a particular object appear when printed" is up to the object being printed (it's in the toString() method defined in (or inherited by) each object's class). This adheres to good object-oriented design because it places responsibilities where they ought to be: the client merely asks the object for its printable representation and the object has the responsibility of providing its own printable form. This design also facilitates changing the way objects of a particular class print: we merely override the default version of toString() inherited from the class's superclass chain. This is nice because in fact an object's toString() method can be, and is, called from numerous code locations. Thus, to change the way objects of a particular class print, we can change code in a single location, that class's toString().

[ If we wish to be stricter still in our adherence to object-oriented design principles, we could go a step further. Note that when the obj argument is null, the printable appearance is up to the valueOf method’s code acting upon a passive datum. In Smalltalk, nil (the equivalent to Java’s null) is an object, the Singleton instance of the UndefinedObject class. Hence, it decides for itself how to appear when printed. If this were true in Java, valueOf would look like this:

public static String valueOf(Object obj) {
    return obj.toString();
}
]

The first method shown above should be the entire implementation of valueOf(). We should only require a single valueOf() method in the String class and it should simply send the toString() message to its argument, placing the responsibility for how an object should print with that object. However, because we have primitives to worry about (and we can't send messages such as toString() to them), String is forced to overload valueOf() to handle primitive type arguments. Hence, String implements methods like the following, which adopt a decidedly procedural approach: the client code decides how some type should be printed, rather than the type itself.

public static String valueOf(boolean b) {
    return b ? "true" : "false";
}
Of course, String also implements a valueOf() for an int argument, another for a float argument, and so on.

If I'm an object, the behavior for deciding what I should look like when printed is in me (the object) as it should be in an object-oriented language. For primitives, the model is procedural: the decision is made by code separate from the data, acting on those data. Since they do not decide for themselves (with their own method), we cannot change this behavior in a single location. If we always want booleans to print as "T" or "F" rather than "true" or "false," we have to find every place that code is implemented and modify them all.

But, more importantly, the real problem here is that we cannot write programs using sound object-oriented design practices such as exploiting polymorphism because we cannot send messages to primitive variables.
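The asymmetry can be seen in a small sketch. Temperature, PrintingDemo, and the describe() methods are hypothetical names used only for illustration; String.valueOf(Object) is the library method quoted above.

```java
public class PrintingDemo {
    // An object decides for itself how it prints by overriding toString().
    public static class Temperature {
        private final int degrees;

        public Temperature(int degrees) {
            this.degrees = degrees;
        }

        @Override
        public String toString() {
            return degrees + " C";
        }
    }

    // One polymorphic call covers every object type.
    public static String describe(Object obj) {
        return String.valueOf(obj);
    }

    // A boolean cannot receive toString(), so client code needs a
    // special-case overload that decides the printed form on its behalf.
    public static String describe(boolean b) {
        return b ? "T" : "F";
    }

    public static void main(String[] args) {
        System.out.println(describe(new Temperature(21))); // 21 C
        System.out.println(describe(true));                // T
    }
}
```

Changing how Temperature prints means editing one toString(); changing how booleans print means finding every overload like describe(boolean).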

2.4 The mirror is crack'd

Long after the decision to include two metatypes in the Java language, Java's developers realized the need for reflective capabilities. Reflection is the ability of a program to find out things about itself and its components at runtime. Java now includes classes such as Class, Field, and Method which allow a program to determine the class of an object, the names and types of fields within an object, the names of messages understood by an object, to dynamically retrieve and invoke a specific method in an object, and so forth.

Let's consider an example. Every object is an instance of a class and we can ask an object for its Class object. The problem is that a primitive type is not an instance of any class. So what does it mean, for example, to ask a primitive variable for its class?

Equation eq = new Equation("x = y + 4");
Class c = eq.getClass();
As we'd expect, this getClass() call returns a Class object representing the Equation class. But, how about primitives?

int i = 1;
i.getClass() // compile error:
// "The method getClass invoked for type int with arguments ()
// is not defined."
But we can retrieve a Class object (sort of) for an int field, as follows. If we define a class with a field named i of type int, we can ask the Class for the Field named i. What do we get? The answer is, we get some sort of hybrid class/primitive type thing:

class Test {
    public int i;
}

Test test = new Test();
Class c = test.getClass(); //retrieve the Test Class object
Field fieldI = c.getField("i"); //retrieve the Field named i
// Now ask for field i's class:
Class classForI = fieldI.getType();

We now have a Class object for int! We can ask it its name, and we find that we have a Class named "int":

classForI.getName() // returns "int"

Realizing part of the problem, the developers of Class included a message to ask if the Class is a primitive:

classForI.isPrimitive() // returns true

But, how can a Class be a primitive? Let's confuse things a bit more. We can access the Class object representing a defined class by coding, say:

Equation.class // returns the Equation Class object

But, we can also say:

int.class

and obtain a Class object whose name is "int". Note we have not sent the getClass() message to an object; we have used the reserved word for a built-in primitive type (int) and, using dot notation, accessed its class "field." And this returns a Class object!

You can also ask for a Class instance by name, as follows:

Class.forName("java.lang.String")
// Returns the Class object:
// (java.lang.Class) class java.lang.String
But, as you might expect, doing the same sort of thing fails for primitive type names:

Class.forName("int")
// throws ClassNotFoundException
We can retrieve a Class instance whose name is "int" using yet another approach. Here we do so by asking the corresponding wrapper class for its type (and we can do the same with all the primitive wrapper classes):

Integer.TYPE;
// Returns the Class object:
// (java.lang.Class) int
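The observations made so far about Class objects for primitives can be collected into one runnable sketch. All of the calls belong to java.lang.Class and java.lang.Integer; only the class name PrimitiveClassDemo and the helper forNameFinds() are invented here.

```java
public class PrimitiveClassDemo {
    // Returns true if Class.forName() can resolve the given name.
    public static boolean forNameFinds(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Class<?> viaLiteral = int.class;
        Class<?> viaWrapper = Integer.TYPE;

        System.out.println(viaLiteral.getName());             // int
        System.out.println(viaLiteral.isPrimitive());         // true
        System.out.println(viaLiteral == viaWrapper);         // true: the same Class object
        System.out.println(forNameFinds("java.lang.String")); // true
        System.out.println(forNameFinds("int"));              // false
    }
}
```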
When we use the reflection API, "magic" sometimes happens with regard to primitive types and wrapper classes. Sometimes we expect a primitive, but get instead an instance of one of the wrapper classes. Conversely, at times we must remember to wrap primitives before passing them as arguments to reflection methods. Let's look at some examples. Assume first the following class definition:

public class Tester {
    public boolean test (int i) {
        return i > 3;
    }
}
We can retrieve a Method object (in this case the one representing test()) as follows:

Tester t = new Tester();         //line 1
Class[] argTypes = new Class[1]; //line 2
argTypes[0] = Integer.TYPE;      //line 3
Method m = t.getClass().getMethod("test", argTypes); //line 4
To retrieve the Method, we send getMethod() to the Class (line 4). The second parameter to getMethod() is an array containing the types of the formal parameters of the method we're trying to retrieve. Even though the argument to test() is of type int, getMethod() requires an array of Classes for the argument types. We declare and create the array in line 2, and then have to ask the appropriate wrapper class for its TYPE (as above, this retrieves a Class) (line 3).

Now that we have a Method object, we can invoke it. Using Java's reflection API, we invoke a method using invoke(). test() has one formal argument whose type is int, but, invoke() expects an array of Objects containing the actual parameters. Again we can't use the expected primitive per se; we must instead use an instance of the appropriate wrapper class:

Object[] actualArguments = new Object[1];
actualArguments[0] = new Integer(3);
m.invoke(t, actualArguments); // equivalent to "t.test(3)"
Lastly, test() returns a primitive, a boolean. But, invoke() returns an Object! invoke() automatically wraps primitive return types rather than returning a primitive per se. Hence, the result of m.invoke(...) is an instance of the Boolean wrapper class.

I could provide many more examples, but I think you can see the real problem: the inclusion of primitive types forces Java's reflection mechanisms to incorporate what amount to kludges to handle these non-object types.
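For reference, the lookup and invocation steps walked through above can be assembled into one runnable sketch. Tester is the class from the text, nested here so the example is a single file; ReflectionDemo and invokeTest() are hypothetical names, and the checked reflection exceptions are wrapped in an unchecked one purely to keep the example short.

```java
import java.lang.reflect.Method;

public class ReflectionDemo {
    public static class Tester {
        public boolean test(int i) {
            return i > 3;
        }
    }

    // Looks up Tester.test(int) reflectively and invokes it with the given value.
    public static Object invokeTest(int value) {
        try {
            Tester t = new Tester();
            // The formal parameter is an int, but getMethod() wants Class objects.
            Class<?>[] argTypes = { Integer.TYPE };
            Method m = t.getClass().getMethod("test", argTypes);
            // The actual argument is an int, but invoke() wants Objects.
            Object[] actualArguments = { Integer.valueOf(value) };
            return m.invoke(t, actualArguments);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object result = invokeTest(5);
        // test() is declared to return boolean, yet a wrapper comes back.
        System.out.println(result.getClass().getName() + " " + result);
        // prints: java.lang.Boolean true
    }
}
```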

3 Fixing the problem: A proposal

As I noted early on, my main goal in writing this article is to assert and support the notion that Java's inclusion of primitive types causes problems for programmers and designers. Hopefully, I have provided the Java language designers with the motivation to correct a problem and hopefully they will devise a clever solution. Having said all this, however, I feel obligated to offer at least one potential solution.

Before discussing a fix, let's review the core problems:

1. We should be able to use primitive variables wherever objects can be used; we should be able, for example, to have a Vector or a Stack of ints. And we should be able to do so without giving it a second thought, without having to convert to and from primitive-wrapper objects on the way into and out of the Vector or Stack.

2. We should be able to write code that sends a single polymorphic message regardless of the type of the message receiver. Otherwise, polymorphism—and thus a big piece of good object-oriented design—is defeated and we need to code special cases for primitives.
2a. Hence, we should be able to send messages to primitive typed variables.

3. Java's reflection code has incorporated an assortment of kludges to be able to handle primitives. Reflection should be uniform and consistent regardless of metatype.
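
To make problems 1 and 2a concrete, here is a minimal sketch of the hand conversions they refer to. The class and method names (WrapperRoundTripDemo, roundTrip, describe) are hypothetical, and the generic Vector<Object> declaration and Integer.valueOf(int) come from later Java releases; they are used here only so that the sketch compiles cleanly today.

```java
import java.util.Vector;

class WrapperRoundTripDemo {
    // Stores an int in a Vector and reads it back, converting by hand in both directions.
    static int roundTrip(int value) {
        Vector<Object> v = new Vector<Object>();
        v.addElement(Integer.valueOf(value));   // wrap on the way in (problem 1)
        Object stored = v.elementAt(0);
        return ((Integer) stored).intValue();   // cast and unwrap on the way out (problem 1)
    }

    // Describes a value by sending it a message.
    static String describe(Object o) {
        return o.toString();
    }

    public static void main(String[] args) {
        int i = 42;
        System.out.println(roundTrip(i));
        // "i.toString()" does not compile: a primitive cannot receive a message (problem 2a),
        // so the caller falls back on a static helper in the wrapper class.
        System.out.println(Integer.toString(i));
        System.out.println(describe("abc"));
    }
}
```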

My basic proposal to remediate these problems is derived from the approach used in many modern Smalltalk environments. First, the primitive declaration types int, boolean, float, etc. should be deprecated. All such variables should henceforth be declared as the type of the currently associated primitive wrapper class. Thus, integer variables should be declared of type Integer rather than int, booleans should be type Boolean rather than boolean, variables containing a single-precision floating point should be Float instead of float, and so forth.

This immediately solves problems 1 through 3! However, a few problems immediately jump right out at us. First, what about literals? Currently, this is legal Java code:

int i = 5;
i = 1;

Based on my proposal, the following would have to be legal instead:

Integer i = 5;
i = 1;
But this is no big deal at all. Java already creates instances based on literals. In the following statement, the literal "abc" results in a String object (a new String instance is created if this is the first "abc" literal encountered in the compilation; otherwise s is pointed to an existing String that contains the characters "abc"; either way, it is a String object):

String s = "abc";

Java also provides language-level support for array creation based on literals, though arrays are reference variables, with a reference pointer in the stack and the actual data in the heap.

What about performance and verification? If all primitive variables are now treated like objects, what about the performance advantages of having primitive values reside directly in the stack and having specific JVM opcodes for them, as well as the bytecode verifier's usage of these type-specific bytecodes? The answer is: treat primitives as if they were objects when required, and treat them as primitives when possible. All variables of the "primitive" type (note the quotes; this means all variables declared as one of the primitive wrapper classes, Integer, Boolean, Float, etc.) should continue to contain the value of the variable, not a reference. That is, exactly as is done now, variables declared as a "primitive" type will have their values reside directly in the stack. With regard to opcodes, the compiler knows when an operation involves "primitive" variables. It can thus generate opcodes accordingly, that is, the identical code it generates now for primitives. So, hard-currency operations like + will generate the same efficient bytecodes as they do now. Nonetheless, if we send a message to a "primitive" variable, the corresponding method in the declared class (Integer, Boolean, Float, etc.) would be invoked.

Again, this is not a far leap for Java. Arrays are already handled as the primitives are with respect to opcodes. The Java compiler generates array-specific bytecodes that the JVM can execute more efficiently than generic method invocation. Yet, we can send messages to arrays—arrays inherit and understand all messages defined in class Object. They also have a public member field named length accessible using dot notation—e.g., myArray.length. And arrays can be used anywhere an object can—for example, as elements in a Vector.
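
The dual nature of arrays described above can be seen in a few lines. This is a minimal sketch; the class name ArrayAsObjectDemo and the helper storeAndFetch() are hypothetical, and Vector<Object> uses generics from a later Java release purely to avoid compiler warnings.

```java
import java.util.Vector;

class ArrayAsObjectDemo {
    // Puts the array into a Vector and takes it back out, as with any other object.
    static Object storeAndFetch(int[] data) {
        Vector<Object> v = new Vector<Object>();
        v.addElement(data);
        return v.elementAt(0);
    }

    public static void main(String[] args) {
        int[] myArray = { 10, 20, 30 };

        // Element access and length compile to array-specific bytecodes.
        System.out.println(myArray[1]);
        System.out.println(myArray.length);

        // Yet the array also understands messages inherited from Object.
        System.out.println(myArray.getClass().isArray());
        System.out.println(myArray.equals(myArray));

        // And it can be used wherever an object can.
        System.out.println(storeAndFetch(myArray) == myArray);
    }
}
```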

What about multiple different instances of the same primitive entity? As mentioned above, we don't want two different integer objects with the value 2. These entities are internal and hard-wired; we don't want multiple "copies" of them. This is solved by handling "primitive" objects as in Smalltalk. Smalltalk behaves as if it implements the Flyweight design pattern for "primitive" classes. In the Flyweight pattern1, clients of a class ask the class to create new instances with a particular value, but hidden within its implementation the class determines if an instance with that same value already exists; if so, that pre-existing instance is returned rather than creating a new one. This is discussed in greater detail in Alpert, Brown, and Woolf 6, but briefly, Smalltalk has only one integer 3 (for example); in Smalltalk, the following is the case (Smalltalk is dynamically typed, and hence there are no declared types):

i1 := 3.
i2 := 3.
i1 == i2 "true"
But, as discussed earlier, in the current implementation of Java, two Integer objects with the same internal int value are not ==. All the primitive wrapper classes behave similarly. Under my proposal, the following would be the case instead:

Integer i1 = 3;
Integer i2 = 3;
i1 == i2 // true
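
For readers unfamiliar with Flyweight, the lookup-before-create behavior can be sketched in ordinary Java. This is not the mechanism proposed in this article, which keeps "primitive" values directly on the stack; it is only a heap-based illustration of the pattern, and the class SharedInt and its methods are hypothetical names invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedInt {
    // One shared instance per distinct value, created on first request.
    private static final Map<Integer, SharedInt> POOL = new HashMap<Integer, SharedInt>();

    private final int value; // immutable once constructed

    // Private constructor: clients must go through of().
    private SharedInt(int value) {
        this.value = value;
    }

    // Returns the existing instance for this value, creating it only if none exists yet.
    static synchronized SharedInt of(int value) {
        Integer key = Integer.valueOf(value);
        SharedInt existing = POOL.get(key);
        if (existing == null) {
            existing = new SharedInt(value);
            POOL.put(key, existing);
        }
        return existing;
    }

    int intValue() {
        return value;
    }

    public static void main(String[] args) {
        SharedInt i1 = SharedInt.of(3);
        SharedInt i2 = SharedInt.of(3);
        System.out.println(i1 == i2); // prints "true": both names refer to the one shared 3
    }
}
```

Because clients can never call the constructor directly, identity comparison with == and value comparison give the same answer for SharedInt, which is the property wanted for "primitive" objects.
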
Once again, this is not a big leap for Java. Java currently behaves similarly with respect to String literals. If the same String literal is encountered twice in a single compilation unit (typically, a single class), the compiler generates code to have them both reference the identical String instance, rather than creating a new instance for each. That is, the generated code results in the same behavior as the Flyweight pattern. So, for example:

String s1 = "abc";
String s2 = "abc";
s1 == s2 // true
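
The String behavior can be checked directly, along with the contrasting case of an explicitly constructed copy. This is a minimal sketch; the class and method names are hypothetical.

```java
class StringLiteralIdentityDemo {
    // Two occurrences of the same literal refer to one shared String instance.
    static boolean sameLiteralInstance() {
        String s1 = "abc";
        String s2 = "abc";
        return s1 == s2;
    }

    // An explicitly constructed String is a distinct object with equal contents.
    static boolean explicitCopyIsSameInstance() {
        String s1 = "abc";
        String copy = new String("abc");
        return s1 == copy;
    }

    // intern() maps the copy back to the shared instance.
    static boolean internedCopyIsSameInstance() {
        String s1 = "abc";
        return new String("abc").intern() == s1;
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralInstance());        // prints "true"
        System.out.println(explicitCopyIsSameInstance()); // prints "false"
        System.out.println(internedCopyIsSameInstance()); // prints "true"
    }
}
```
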
Instances of the "primitive" classes should also be immutable. We don't want to be able to tell 3 to become 4 (of course, we can assign 4 to an Integer variable whose value is currently 3—we just cannot mutate the 3 object per se). We also want to be able to put an Integer in, say, a Hashtable, knowing its value (and thus its hash value) won't change.

So far, what I've proposed is easily implemented. There's one significant outstanding problem, however. Anything we can put in containers such as Vectors will eventually be removed from those containers. And when that happens, we need to know each such object's exact class so we can send it messages and invoke the appropriate method. The proposed "primitives" are instances of classes. But, based on this proposal, "primitive" variables are not represented inside the JVM (in the stack) as other objects are. Non-"primitive" object variables are represented on the stack as references that point to the actual objects, which maintain information about their class. But we want "primitive" variables to be represented as primitive types are currently, with only their values in the stack. This works fine for "standalone" instances: if you declare a variable of type Integer, the compiler knows to invoke methods in the Integer class for message-sends and to generate int opcodes for language-level operations (like +). But, when we place an Integer in a heterogeneous Vector containing objects of various classes, and then remove those objects, how do we know when we have an Integer? Consider this code:

Vector stuff = new Vector(100);
Integer i = 1;
String s = "abc";
stuff.addElement(i);
stuff.addElement(s);
...
for (Enumeration e = stuff.elements(); e.hasMoreElements(); )
    System.out.println(e.nextElement().toString());
We need to know which toString() to invoke for each element in the Vector. When we retrieve the stored Integer, how does the JVM know it's an instance of the Integer class?

One proposal is to emulate what some modern Smalltalk virtual machines do. As I mentioned earlier, many Smalltalk environments implement just what I've proposed so far. In Smalltalk, everything is an object and all "primitives" are instances of a class such as Integer and Boolean. But Smalltalk's "primitive" variables maintain their value, rather than an object reference, in the stack (these are known as immediate objects; see Liu7). To handle the class-identification problem, all variables on the stack contain tag bits (this is, in turn, based on earlier research by Ungar8 on improving Smalltalk's efficiency). Different configurations of tag bits represent different "primitive" classes, with one distinguished configuration meaning "this is an object reference."

For Java, we need eight different tag values for the eight primitive types, plus one for the "this is really a reference pointer" value. The latter could be tag value 0. So we need values 0 through 8, which takes four bits. If we wish to retain the JVM's word size, this may mean a loss of precision for some "primitive" variables. Some, but not all. Boolean (boolean), Character (char), Byte (byte), and Short (short) variables contain 16 bits or fewer of actual data, but occupy a full word on the JVM's stack. Hence, even in a JVM with a 32-bit word size, using four bits for class information would not affect the values of such variables. For Integer (int), Long (long), Float (float), and Double (double), we would lose four bits of information on a 32-bit platform (of course, double-word variables, that is, Doubles and Longs, require tag bits in only one word). A potential alternative is to increase the width of the stack. Again, the JVM specification does not mandate a specific word size for the stack: "Implementation designers must ... choose a word size that is at least 32 bits, but otherwise they can pick whatever word size will yield the most efficient implementation." (4, pp. 85-86) Further, JVMs that already have a larger word size may wind up forfeiting no data bits: in the 64-bit implementation mentioned earlier, there would already be more than four unused and available bits for all "primitive" variables on the stack.
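
The arithmetic of the tagging scheme can be sketched with plain int manipulation. Everything specific here is an assumption made for illustration: the class TaggedWordSketch, the particular tag numbers, and the choice to keep the tag in the low-order four bits of a 32-bit word are not prescribed by the proposal, which requires only that tag 0 be distinguishable as "reference."

```java
class TaggedWordSketch {
    static final int TAG_BITS = 4;                    // values 0 through 8 need four bits
    static final int TAG_MASK = (1 << TAG_BITS) - 1;  // low-order four bits

    // Hypothetical tag assignment: 0 means "object reference"; 1..8 would name the "primitive" classes.
    static final int TAG_REFERENCE = 0;
    static final int TAG_SHORT = 4;
    static final int TAG_INTEGER = 5;

    // Range of Integer values that still fit in the remaining 28 bits of a 32-bit word.
    static final int MAX_TAGGED_INT = (1 << (32 - TAG_BITS - 1)) - 1;
    static final int MIN_TAGGED_INT = -(1 << (32 - TAG_BITS - 1));

    // Packs an Integer value and its tag into one word; rejects values that need the lost bits.
    static int tagInteger(int value) {
        if (value < MIN_TAGGED_INT || value > MAX_TAGGED_INT) {
            throw new ArithmeticException("value needs more than 28 bits: " + value);
        }
        return (value << TAG_BITS) | TAG_INTEGER;
    }

    // A Short has only 16 data bits, so tagging never loses information.
    static int tagShort(short value) {
        return (value << TAG_BITS) | TAG_SHORT;
    }

    // Recovers the class tag from a word.
    static int tagOf(int word) {
        return word & TAG_MASK;
    }

    // Recovers the value; the arithmetic shift preserves the sign.
    static int payloadOf(int word) {
        return word >> TAG_BITS;
    }

    public static void main(String[] args) {
        int word = tagInteger(-5);
        System.out.println(tagOf(word) == TAG_INTEGER); // prints "true"
        System.out.println(payloadOf(word));            // prints "-5"
    }
}
```

In this sketch, the cost of the four tag bits shows up as the narrowed range enforced in tagInteger(), while tagShort() accepts every possible short.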

One small problem remains. In the general case, objects can be tested to determine whether they are equal to null. However, since "primitives" have no reference pointer, they cannot be compared to null using the JVM's current mechanisms. Hence, an additional change to the JVM would be required, or we might have to disallow such comparisons. In fact, a potential alternative to non–heap-resident "primitive" objects with tag bits is a proposal by James Gosling. He has suggested the implementation of an immutable class modifier (as in immutable class Complex {...}; see "Efficient Classes" in Gosling9). Instances of classes so flagged are not allocated on the heap and cannot be compared to or assigned null (see Gosling9 for other characteristics of immutable classes). Hence, "primitive" classes might be defined as immutable as per Gosling's proposal.

I've offered one solution that attempts to enable good object-oriented design while retaining the advantages of efficiency and verification. But, of course, every design decision involves trade-offs. We gain by eliminating the disadvantages of primitive types enumerated in this article; we lose four bits of information for some "primitives." Nonetheless, the more important point is that primitive types, as is, bear multiple disadvantages for object-oriented—and, specifically, Java—programmers.

References
1. Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Reading, MA: Addison-Wesley.
2. Rosson, M. B. & Alpert, S. R. (1990). The cognitive consequences of object-oriented design. Human-Computer Interaction, 5(4), 345-379.
3. Stroustrup, B. (1995). The Design and Evolution of C++. Reading, MA: Addison-Wesley.
4. Venners, B. (1998). Inside the Java Virtual Machine. New York: McGraw-Hill.
5. Lindholm, T. & Yellin, F. (1996). The Java Virtual Machine Specification. Reading, MA: Addison-Wesley. Also: http://java.sun.com/docs/books/vmspec/html/VMSpecTOC.doc.html
6. Alpert, S. R., Brown, K., & Woolf, B. (1998). The Design Patterns Smalltalk Companion. Reading, MA: Addison-Wesley.
7. Liu, C. (1996). Smalltalk, Objects, and Design. Greenwich, CT: Manning.
8. Ungar, D. M. (1987). The Design And Evaluation of a High Performance Smalltalk System. NY: ACM.
9. Gosling, J. (1998). The Evolution of Numerical Computing in Java. http://java.sun.com/people/jag/FP.html