qt-everywhere-src-5.15.1/qtwebengine/src/3rdparty/chromium/third_party/nasm/doc/internal.doc - orbit - Git at Google

 Internals of the Netwide Assembler
 ==================================

 The Netwide Assembler is intended to be a modular, re-usable x86
 assembler, which can be embedded in other programs, for example as
 the back end to a compiler.

 The assembler is composed of modules. The interfaces between them
 look like:

 		  +--- preproc.c ----+
 		  |		     |
 		  +---- parser.c ----+
 		  |	   |         |
 		  |     float.c      |
 		  |		     |
 		  +--- assemble.c ---+
 		  |        |         |
 	nasm.c ---+     insnsa.c     +--- nasmlib.c
 		  |		     |
 		  +--- listing.c ----+
 		  |		     |
 		  +---- labels.c ----+
 		  |		     |
 		  +--- outform.c ----+
 		  |		     |
 		  +----- *out.c -----+

 In other words, each of `preproc.c', `parser.c', `assemble.c',
 `labels.c', `listing.c', `outform.c' and each of the output format
 modules `*out.c' are independent modules, which do not directly
 inter-communicate except through the main program.

 The Netwide *Disassembler* is not intended to be particularly
 portable or reusable or anything, however. So I won't bother
 documenting it here. :-)

 nasmlib.c
 ---------

 This is a library module; it contains simple library routines which
 may be referenced by all other modules. Among these are a set of
 wrappers around the standard `malloc' routines, which will report a
 fatal error if they run out of memory, rather than returning NULL.

 preproc.c
 ---------

 This contains a macro preprocessor, which takes a file name as input
 and returns a sequence of preprocessed source lines. The only symbol
 exported from the module is `nasmpp', which is a data structure of
 type `Preproc', declared in nasm.h. This structure contains pointers
 to all the functions designed to be callable from outside the
 module.

 parser.c
 --------

 This contains a source-line parser. It parses `canonical' assembly
 source lines, containing some combination of the `label', `opcode',
 `operand' and `comment' fields: it does not process directives or
 macros. It exports two functions: `parse_line' and `cleanup_insn'.

 `parse_line' is the main parser function: you pass it a source line
 in ASCII text form, and it returns you an `insn' structure
 containing all the details of the instruction on that line. The
 parameters it requires are:

 - The location (segment, offset) where the instruction on this line
   will eventually be placed. This is necessary in order to evaluate
   expressions containing the Here token, `$'.

 - A function which can be called to retrieve the value of any
   symbols the source line references.

 - Which pass the assembler is on: an undefined symbol only causes an
   error condition on pass two.

 - The source line to be parsed.

 - A structure to fill with the results of the parse.

 - A function which can be called to report errors.

 Some instructions (DB, DW, DD for example) can require an arbitrary
 amount of storage, and so some of the members of the resulting
 `insn' structure will be dynamically allocated. The other function
 exported by `parser.c' is `cleanup_insn', which can be called to
 deallocate any dynamic storage associated with the results of a
 parse.

 names.c
 -------

 This doesn't count as a module - it defines a few arrays which are
 shared between NASM and NDISASM, so it's a separate file which is
 #included by both parser.c and disasm.c.

 float.c
 -------

 This is essentially a library module: it exports one function,
 `float_const', which converts an ASCII representation of a
 floating-point number into an x86-compatible binary representation,
 without using any built-in floating-point arithmetic (so it will run
 on any platform, portably). It calls nothing, and is called only by
 `parser.c'. Note that the function `float_const' must be passed an
 error reporting routine.

 assemble.c
 ----------

 This module contains the code generator: it translates `insn'
 structures as returned from the parser module into actual generated
 code which can be placed in an output file. It exports two
 functions, `assemble' and `insn_size'.

 `insn_size' is designed to be called on pass one of assembly: it
 takes an `insn' structure as input, and returns the amount of space
 that would be taken up if the instruction described in the structure
 were to be converted to real machine code. `insn_size' also requires
 to be told the location (as a segment/offset pair) where the
 instruction would be assembled, the mode of assembly (16/32 bit
 default), and a function it can call to report errors.

 `assemble' is designed to be called on pass two: it takes all the
 parameters that `insn_size' does, but has an extra parameter which
 is an output driver. `assemble' actually converts the input
 instruction into machine code, and outputs the machine code by means
 of calling the `output' function of the driver.

 insnsa.c
 --------

 This is another library module: it exports one very big array of
 instruction translations. It is generated automatically from the
 insns.dat file by the insns.pl script.

 labels.c
 --------

 This module contains a label manager. It exports six functions:

 `init_labels' should be called before any other function in the
 module. `cleanup_labels' may be called after all other use of the
 module has finished, to deallocate storage.

 `define_label' is called to define new labels: you pass it the name
 of the label to be defined, and the (segment,offset) pair giving the
 value of the label. It is also passed an error-reporting function,
 and an output driver structure (so that it can call the output
 driver's label-definition function). `define_label' mentally
 prepends the name of the most recently defined non-local label to
 any label beginning with a period.

 `define_label_stub' is designed to be called in pass two, once all
 the labels have already been defined: it does nothing except to
 update the "most-recently-defined-non-local-label" status, so that
 references to local labels in pass two will work correctly.

 `declare_as_global' is used to declare that a label should be
 global. It must be called _before_ the label in question is defined.

 Finally, `lookup_label' attempts to translate a label name into a
 (segment,offset) pair. It returns non-zero on success.

 The label manager module is (theoretically :) restartable: after
 calling `cleanup_labels', you can call `init_labels' again, and
 start a new assembly with a new set of symbols.

 listing.c
 ---------

 This file contains the listing file generator. The interface to the
 module is through the one symbol it exports, `nasmlist', which is a
 structure containing six function pointers. The calling semantics of
 these functions isn't terribly well thought out, as yet, but it
 works (just about) so it's going to get left alone for now...

 outform.c
 ---------

 This small module contains a set of routines to manage a list of
 output formats, and select one given a keyword. It contains three
 small routines: `ofmt_register' which registers an output driver as
 part of the managed list, `ofmt_list' which lists the available
 drivers on stdout, and `ofmt_find' which tries to find the driver
 corresponding to a given name.

 The output modules
 ------------------

 Each of the output modules, `outbin.o', `outelf.o' and so on,
 exports only one symbol, which is an output driver data structure
 containing pointers to all the functions needed to produce output
 files of the appropriate type.

 The exception to this is `outcoff.o', which exports _two_ output
 driver structures, since COFF and Win32 object file formats are very
 similar and most of the code is shared between them.

 nasm.c
 ------

 This is the main program: it calls all the functions in the above
 modules, and puts them together to form a working assembler. We
 hope. :-)

 Segment Mechanism
 -----------------

 In NASM, the term `segment' is used to separate the different
 sections/segments/groups of which an object file is composed.
 Essentially, every address NASM is capable of understanding is
 expressed as an offset from the beginning of some segment.

 The defining property of a segment is that if two symbols are
 declared in the same segment, then the distance between them is
 fixed at assembly time. Hence every externally-declared variable
 must be declared in its own segment, since none of the locations of
 these are known, and so no distances may be computed at assembly
 time.

 The special segment value NO_SEG (-1) is used to denote an absolute
 value, e.g. a constant whose value does not depend on relocation,
 such as the _size_ of a data object.

 Apart from NO_SEG, segment indices all have their least significant
 bit clear, if they refer to actual in-memory segments. For each
 segment of this type, there is an auxiliary segment value, defined
 to be the same number but with the LSB set, which denotes the
 segment-base value of that segment, for object formats which support
 it (Microsoft .OBJ, for example).

 Hence, if `textsym' is declared in a code segment with index 2, then
 referencing `SEG textsym' would return zero offset from
 segment-index 3. Or, in object formats which don't understand such
 references, it would return an error instead.

 The next twist is SEG_ABS. Some symbols may be declared with a
 segment value of SEG_ABS plus a 16-bit constant: this indicates that
 they are far-absolute symbols, such as the BIOS keyboard buffer
 under MS-DOS, which always resides at 0040h:001Eh. Far-absolutes are
 handled with care in the parser, since they are supposed to evaluate
 simply to their offset part within expressions, but applying SEG to
 one should yield its segment part. A far-absolute should never find
 its way _out_ of the parser, unless it is enclosed in a WRT clause,
 in which case Microsoft 16-bit object formats will want to know
 about it.

 Porting Issues
 --------------

 We have tried to write NASM in portable ANSI C: we do not assume
 little-endianness or any hardware characteristics (in order that
 NASM should work as a cross-assembler for x86 platforms, even when
 run on other, stranger machines).

 Assumptions we _have_ made are:

 - We assume that `short' is at least 16 bits, and `long' at least
   32. This really _shouldn't_ be a problem, since Kernighan and
   Ritchie tell us we are entitled to do so.

 - We rely on having more than 6 characters of significance on
   externally linked symbols in the NASM sources. This may get fixed
   at some point. We haven't yet come across a linker brain-dead
   enough to get it wrong anyway.

 - We assume that `fopen' using the mode "wb" can be used to write
   binary data files. This may be wrong on systems like VMS, with a
   strange file system. Though why you'd want to run NASM on VMS is
   beyond me anyway.

 That's it. Subject to those caveats, NASM should be completely
 portable. If not, we _really_ want to know about it.

 Porting Non-Issues
 ------------------

 The following is _not_ a portability problem, although it looks like
 one.

 - When compiling with some versions of DJGPP, you may get errors
   such as `warning: ANSI C forbids braced-groups within
   expressions'. This isn't NASM's fault - the problem seems to be
   that DJGPP's definitions of the <ctype.h> macros include a
   GNU-specific C extension. So when compiling using -ansi and
   -pedantic, DJGPP complains about its own header files. It isn't a
   problem anyway, since it still generates correct code.
	Internals of the Netwide Assembler
	==================================

	The Netwide Assembler is intended to be a modular, re-usable x86
	assembler, which can be embedded in other programs, for example as
	the back end to a compiler.

	The assembler is composed of modules. The interfaces between them
	look like:

	+--- preproc.c ----+
	\| \|
	+---- parser.c ----+
	\| \| \|
	\| float.c \|
	\| \|
	+--- assemble.c ---+
	\| \| \|
	nasm.c ---+ insnsa.c +--- nasmlib.c
	\| \|
	+--- listing.c ----+
	\| \|
	+---- labels.c ----+
	\| \|
	+--- outform.c ----+
	\| \|
	+----- *out.c -----+

	In other words, each of `preproc.c', `parser.c', `assemble.c',
	`labels.c', `listing.c', `outform.c' and each of the output format
	modules `*out.c' are independent modules, which do not directly
	inter-communicate except through the main program.

	The Netwide Disassembler is not intended to be particularly
	portable or reusable or anything, however. So I won't bother
	documenting it here. :-)

	nasmlib.c
	---------

	This is a library module; it contains simple library routines which
	may be referenced by all other modules. Among these are a set of
	wrappers around the standard `malloc' routines, which will report a
	fatal error if they run out of memory, rather than returning NULL.

	preproc.c
	---------

	This contains a macro preprocessor, which takes a file name as input
	and returns a sequence of preprocessed source lines. The only symbol
	exported from the module is `nasmpp', which is a data structure of
	type `Preproc', declared in nasm.h. This structure contains pointers
	to all the functions designed to be callable from outside the
	module.

	parser.c
	--------

	This contains a source-line parser. It parses `canonical' assembly
	source lines, containing some combination of the `label', `opcode',
	`operand' and `comment' fields: it does not process directives or
	macros. It exports two functions: `parse_line' and `cleanup_insn'.

	`parse_line' is the main parser function: you pass it a source line
	in ASCII text form, and it returns you an `insn' structure
	containing all the details of the instruction on that line. The
	parameters it requires are:

	- The location (segment, offset) where the instruction on this line
	will eventually be placed. This is necessary in order to evaluate
	expressions containing the Here token, `$'.

	- A function which can be called to retrieve the value of any
	symbols the source line references.

	- Which pass the assembler is on: an undefined symbol only causes an
	error condition on pass two.

	- The source line to be parsed.

	- A structure to fill with the results of the parse.

	- A function which can be called to report errors.

	Some instructions (DB, DW, DD for example) can require an arbitrary
	amount of storage, and so some of the members of the resulting
	`insn' structure will be dynamically allocated. The other function
	exported by `parser.c' is `cleanup_insn', which can be called to
	deallocate any dynamic storage associated with the results of a
	parse.

	names.c
	-------

	This doesn't count as a module - it defines a few arrays which are
	shared between NASM and NDISASM, so it's a separate file which is
	#included by both parser.c and disasm.c.

	float.c
	-------

	This is essentially a library module: it exports one function,
	`float_const', which converts an ASCII representation of a
	floating-point number into an x86-compatible binary representation,
	without using any built-in floating-point arithmetic (so it will run
	on any platform, portably). It calls nothing, and is called only by
	`parser.c'. Note that the function `float_const' must be passed an
	error reporting routine.

	assemble.c
	----------

	This module contains the code generator: it translates `insn'
	structures as returned from the parser module into actual generated
	code which can be placed in an output file. It exports two
	functions, `assemble' and `insn_size'.

	`insn_size' is designed to be called on pass one of assembly: it
	takes an `insn' structure as input, and returns the amount of space
	that would be taken up if the instruction described in the structure
	were to be converted to real machine code. `insn_size' also requires
	to be told the location (as a segment/offset pair) where the
	instruction would be assembled, the mode of assembly (16/32 bit
	default), and a function it can call to report errors.

	`assemble' is designed to be called on pass two: it takes all the
	parameters that `insn_size' does, but has an extra parameter which
	is an output driver. `assemble' actually converts the input
	instruction into machine code, and outputs the machine code by means
	of calling the `output' function of the driver.

	insnsa.c
	--------

	This is another library module: it exports one very big array of
	instruction translations. It is generated automatically from the
	insns.dat file by the insns.pl script.

	labels.c
	--------

	This module contains a label manager. It exports six functions:

	`init_labels' should be called before any other function in the
	module. `cleanup_labels' may be called after all other use of the
	module has finished, to deallocate storage.

	`define_label' is called to define new labels: you pass it the name
	of the label to be defined, and the (segment,offset) pair giving the
	value of the label. It is also passed an error-reporting function,
	and an output driver structure (so that it can call the output
	driver's label-definition function). `define_label' mentally
	prepends the name of the most recently defined non-local label to
	any label beginning with a period.

	`define_label_stub' is designed to be called in pass two, once all
	the labels have already been defined: it does nothing except to
	update the "most-recently-defined-non-local-label" status, so that
	references to local labels in pass two will work correctly.

	`declare_as_global' is used to declare that a label should be
	global. It must be called _before_ the label in question is defined.

	Finally, `lookup_label' attempts to translate a label name into a
	(segment,offset) pair. It returns non-zero on success.

	The label manager module is (theoretically :) restartable: after
	calling `cleanup_labels', you can call `init_labels' again, and
	start a new assembly with a new set of symbols.

	listing.c
	---------

	This file contains the listing file generator. The interface to the
	module is through the one symbol it exports, `nasmlist', which is a
	structure containing six function pointers. The calling semantics of
	these functions isn't terribly well thought out, as yet, but it
	works (just about) so it's going to get left alone for now...

	outform.c
	---------

	This small module contains a set of routines to manage a list of
	output formats, and select one given a keyword. It contains three
	small routines: `ofmt_register' which registers an output driver as
	part of the managed list, `ofmt_list' which lists the available
	drivers on stdout, and `ofmt_find' which tries to find the driver
	corresponding to a given name.

	The output modules
	------------------

	Each of the output modules, `outbin.o', `outelf.o' and so on,
	exports only one symbol, which is an output driver data structure
	containing pointers to all the functions needed to produce output
	files of the appropriate type.

	The exception to this is `outcoff.o', which exports _two_ output
	driver structures, since COFF and Win32 object file formats are very
	similar and most of the code is shared between them.

	nasm.c
	------

	This is the main program: it calls all the functions in the above
	modules, and puts them together to form a working assembler. We
	hope. :-)

	Segment Mechanism
	-----------------

	In NASM, the term `segment' is used to separate the different
	sections/segments/groups of which an object file is composed.
	Essentially, every address NASM is capable of understanding is
	expressed as an offset from the beginning of some segment.

	The defining property of a segment is that if two symbols are
	declared in the same segment, then the distance between them is
	fixed at assembly time. Hence every externally-declared variable
	must be declared in its own segment, since none of the locations of
	these are known, and so no distances may be computed at assembly
	time.

	The special segment value NO_SEG (-1) is used to denote an absolute
	value, e.g. a constant whose value does not depend on relocation,
	such as the _size_ of a data object.

	Apart from NO_SEG, segment indices all have their least significant
	bit clear, if they refer to actual in-memory segments. For each
	segment of this type, there is an auxiliary segment value, defined
	to be the same number but with the LSB set, which denotes the
	segment-base value of that segment, for object formats which support
	it (Microsoft .OBJ, for example).

	Hence, if `textsym' is declared in a code segment with index 2, then
	referencing `SEG textsym' would return zero offset from
	segment-index 3. Or, in object formats which don't understand such
	references, it would return an error instead.

	The next twist is SEG_ABS. Some symbols may be declared with a
	segment value of SEG_ABS plus a 16-bit constant: this indicates that
	they are far-absolute symbols, such as the BIOS keyboard buffer
	under MS-DOS, which always resides at 0040h:001Eh. Far-absolutes are
	handled with care in the parser, since they are supposed to evaluate
	simply to their offset part within expressions, but applying SEG to
	one should yield its segment part. A far-absolute should never find
	its way _out_ of the parser, unless it is enclosed in a WRT clause,
	in which case Microsoft 16-bit object formats will want to know
	about it.

	Porting Issues
	--------------

	We have tried to write NASM in portable ANSI C: we do not assume
	little-endianness or any hardware characteristics (in order that
	NASM should work as a cross-assembler for x86 platforms, even when
	run on other, stranger machines).

	Assumptions we _have_ made are:

	- We assume that `short' is at least 16 bits, and `long' at least
	32. This really _shouldn't_ be a problem, since Kernighan and
	Ritchie tell us we are entitled to do so.

	- We rely on having more than 6 characters of significance on
	externally linked symbols in the NASM sources. This may get fixed
	at some point. We haven't yet come across a linker brain-dead
	enough to get it wrong anyway.

	- We assume that `fopen' using the mode "wb" can be used to write
	binary data files. This may be wrong on systems like VMS, with a
	strange file system. Though why you'd want to run NASM on VMS is
	beyond me anyway.

	That's it. Subject to those caveats, NASM should be completely
	portable. If not, we _really_ want to know about it.

	Porting Non-Issues
	------------------

	The following is _not_ a portability problem, although it looks like
	one.

	- When compiling with some versions of DJGPP, you may get errors
	such as `warning: ANSI C forbids braced-groups within
	expressions'. This isn't NASM's fault - the problem seems to be
	that DJGPP's definitions of the <ctype.h> macros include a
	GNU-specific C extension. So when compiling using -ansi and
	-pedantic, DJGPP complains about its own header files. It isn't a
	problem anyway, since it still generates correct code.