Program Translation

Modern programming languages attempt to give programmers the capability of doing complex things with a computer, while writing instructions for the computer in a language close to their own natural language. For example, most programmers are comfortable enough with standard mathematical language to use expressions such as 1/3*l*w*h (the volume of a pyramid with base length l, base width w, and height h). They do not care to know the sequence of machine instructions that is needed to evaluate this expression, much less the machine coding for those instructions.

Also, when a programmer wants to specify a numerical value, they prefer to specify it as a string of decimal digits rather than machine binary code. Furthermore, a programmer has no interest in the character coding used for the decimal digits. Thus programming languages require powerful mechanisms for translating the language of a programmer into a language that the machine understands.
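As a rough illustration of the translation hidden behind a single numeric literal, the following Python sketch converts a string of decimal digit characters into a machine integer. The function name parse_decimal is hypothetical, and the sketch assumes a character coding (such as ASCII or Unicode) in which the codes for the digits 0 through 9 are consecutive.

```python
def parse_decimal(text: str) -> int:
    """Convert a string of decimal digits to an integer, one character at a time."""
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        # Assumption: digit characters have consecutive character codes,
        # so subtracting the code for "0" yields the digit's value.
        digit = ord(ch) - ord("0")
        if not 0 <= digit <= 9:
            raise ValueError(f"not a decimal digit: {ch!r}")
        value = value * 10 + digit
    return value


n = parse_decimal("1000")
print(n)       # the integer value
print(bin(n))  # the same value written in binary
```

The programmer writes only the digits; the character codes and the binary form are handled entirely by the translator.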

Overview

Historically, there have been two primary approaches to the problem of translating programming languages into machine language: compilers and interpreters. Although interpreters reduce the number of steps in software development iterations, they introduce a significant runtime translation overhead.

Compilation and interpretation can also be combined into a two-step approach using virtual machines. Virtual machines offer the important advantage of platform independence.

Interpreter runtime translation overhead can be reduced in two ways. One involves using a technique called just-in-time (JIT) compilation. The other involves writing computationally intensive portions of a program in some fast compiled language.

Compilers

The distinguishing characteristic of a compiler is that it is used once to translate a program into machine language. The machine language is saved in a file so that it can be executed as many times as you want without having to be translated again. This allows program execution at the full speed of the computer processor by eliminating the translation overhead. The drawback is that the translated program can only run on the family of processors for which the compiler was designed.

Interpreters

The distinguishing characteristic of an interpreter is that it translates a program into machine language every time that the program is executed. There is no saved translation. If an interpreter is written for each machine then you can move the program from one machine to another and still expect it to execute correctly. It is important to note that even though you need a different interpreter on each machine, the interpreter is the only software that is not portable. Programs written in the language for which the interpreter is designed are portable.

The drawback of interpreters is that the program is translated into machine instructions every time it is executed. This can slow down execution by a factor of a few hundred.

Virtual Machines

A virtual machine is a machine that is implemented in software rather than hardware. In most virtual machine implementations, portability is an important objective. The idea is to introduce an artificial machine language that is close to the language of real processors. Then the translation overhead for converting from the artificial (or virtual) machine language to a real machine language can be reduced significantly.

Program Translation

With a virtual machine, program translation is a two-step process. First, a program is compiled into the language of the virtual machine. When the program is executed, the virtual machine language is interpreted into the language of the real machine. This gives you the portability of an interpreter with significantly reduced execution overhead. With a virtual machine interpreter, the execution slowdown can be reduced to a factor of 5 to 25, depending on the nature of the program.
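The two steps can be sketched with a toy example in Python. Everything here is an illustrative assumption rather than a description of any real virtual machine: the instruction names (PUSH, LOAD, ADD, SUB, MUL, DIV), the stack-based design, and the helper functions compile_expr and run are all invented for this sketch. The expression is the pyramid volume formula used earlier.

```python
import ast
import operator

# Hypothetical instruction set for a tiny stack-based virtual machine.
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}
EXECUTE = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def compile_expr(source):
    """Step 1: translate an arithmetic expression into virtual machine code."""
    code = []

    def emit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            emit(node.left)    # code that leaves the left operand on the stack
            emit(node.right)   # code that leaves the right operand on the stack
            code.append((BINARY_OPS[type(node.op)], None))
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        else:
            raise SyntaxError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return code


def run(code, variables):
    """Step 2: interpret the virtual machine code on the real machine."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(variables[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(EXECUTE[op](left, right))
    return stack.pop()


program = compile_expr("1/3*l*w*h")   # compiled once
for instruction in program:
    print(instruction)
print(run(program, {"l": 3, "w": 4, "h": 5}))   # interpreted on each run
```

The compiled instruction list could be saved and carried to any machine that has its own version of run, which is where the portability comes from.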

Portability

The only non-portable software in a programming language that uses a virtual machine is the virtual machine interpreter. It is considerably simpler than an interpreter that translates directly from the programming language to real machine language because the virtual machine language is already close to real machine language. The compiler from the programming language to the virtual machine language can itself be compiled to virtual machine language so that it is portable to any machine with a virtual machine interpreter.

History

Historically, the idea of a virtual machine has been around for over 30 years. In the 1970s, the Pascal programming language used virtual machines on some platforms to allow easy porting of Pascal programs from one machine to another. One virtual machine language, called P-code, was used to support development of Pascal programs on early microprocessors. In addition to the portability advantage, P-code was exceptionally compact, an important consideration given the small memory capacities at that time.

Just-in-Time Compilation

Portability considerations are a powerful motivation for using interpreters. However, even with a virtual machine, the translation overhead for interpretation may be too much for some programs. This has led to a lot of research on how to reduce the overhead. Within the past few years, one idea has moved from the realm of research to the practical realm: just-in-time (JIT) compilation.

Operation of a JIT Compiler

A JIT compiler is a component of an interpreter that compiles (usually) virtual machine language into real machine language. Unlike a normal compiler, it does not save the compilation results in a file. Instead, they are saved in memory only for the duration of a program's execution. The first time a portion of code is executed, it is translated to real machine language, just in time to be executed. The next time that portion of code is executed, the saved machine language instructions are executed directly.
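The caching behavior described above can be modeled in a few lines of Python. This is only a toy model under stated assumptions: the class TinyJit and its methods are hypothetical, and Python's built-in compile function stands in for the generation of real machine code, which it does not actually produce.

```python
class TinyJit:
    """Toy model of JIT caching; the names and structure are illustrative only."""

    def __init__(self):
        self.cache = {}        # kept in memory only; lost when the program ends
        self.translations = 0  # counts how often translation actually happens

    def translate(self, source):
        # Stand-in for producing real machine code (an assumption of this sketch).
        self.translations += 1
        return compile(source, "<jit>", "eval")

    def execute(self, name, source, x):
        if name not in self.cache:
            # First execution of this portion of code: translate it now.
            self.cache[name] = self.translate(source)
        # This and every later execution reuse the saved translation.
        return eval(self.cache[name], {}, {"x": x})


jit = TinyJit()
results = [jit.execute("square_plus_one", "x * x + 1", x) for x in range(5)]
print(results)
print(jit.translations)
```

Although the code portion runs five times, the translation counter shows that it was translated only once.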

Effectiveness of JIT Compilation

You can get a pretty good idea of why JIT compilation is so effective by considering the speed of modern processors and the size of the programs that they execute. Recent processors are capable of executing about ten billion instructions per second. A large program could contain 10 million instructions. This means that for the program to run for one second, each instruction must be executed 1000 times on average. Generally, programs that take a reasonable time to develop can only have significant run time problems if they are executing some portions of their code a large number of times.

With JIT compilation, you only have translation overhead the first time a portion of code is executed. After that, you are executing the code at full speed. When you average the translation overhead over all of the times that the code portion is executed, it is just a small percentage.
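The averaging argument can be worked through numerically. In the Python sketch below, the processor speed and program size come from the text, while the translation cost of 100 times the cost of executing an instruction is purely an assumed figure chosen for illustration.

```python
instructions_per_second = 10_000_000_000  # from the text: about ten billion
program_size = 10_000_000                 # from the text: 10 million instructions

# Average number of times each instruction runs during one second of execution.
average_executions = instructions_per_second // program_size

# Assumption for illustration: translating an instruction costs as much as
# executing it 100 times.
translation_cost = 100

# An interpreter pays the translation cost on every execution.
interpreter_overhead = translation_cost

# A JIT compiler pays it once, spread over all executions of the instruction.
jit_overhead = translation_cost / average_executions

print(average_executions)
print(f"interpreter overhead per execution: {interpreter_overhead:.0%}")
print(f"JIT overhead per execution: {jit_overhead:.0%}")
```

Under these assumptions, the overhead per execution drops by the same factor as the average number of executions, and it shrinks further the longer the program runs.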

Mixing Languages

The translation overhead of a virtual machine can also be reduced by defining a standardized way of incorporating code that is compiled into the machine language of the underlying machine. For example, the Java language has a Java Native Interface (JNI) standard that defines how code compiled from other languages can be incorporated into a Java program.

Using JNI, a Java programmer can write most of the code in Java but use a small amount of C or assembly language code for the computationally intensive parts of the program.