 Welcome to YAFFS, the first file system developed specifically for NAND flash.

 It is now YAFFS2 - original YAFFS (YAFFS1) only supports 512-byte page
 NAND and is now deprecated. YAFFS2 supports 512b page in 'YAFFS1
 compatibility' mode (CONFIG_YAFFS_YAFFS1) and 2K or larger page NAND
 in YAFFS2 mode (CONFIG_YAFFS_YAFFS2).


 A note on licensing
 -------------------
 YAFFS is available under the GPL and via alternative licensing
 arrangements with Aleph One. If you're using YAFFS as a Linux kernel
 file system then it will be under the GPL. For use in other situations
 you should discuss licensing issues with Aleph One.


 Terminology
 -----------
 Page -  NAND addressable unit (normally 512b or 2Kbyte size) - can
 	be read, written, marked bad. Has associated OOB.
 Block - Erasable unit. 64 Pages. (128K on 2K NAND, 32K on 512b NAND)
 OOB -   'spare area' of each page for ECC, bad block markers and YAFFS
 	tags. 16 bytes per 512b - 64 bytes for 2K page size.
 Chunk - Basic YAFFS addressable unit. Same size as Page.
 Object - YAFFS Object: File, Directory, Link, Device etc.

 YAFFS design
 ------------

 YAFFS is a log-structured filesystem. It is designed particularly for
 NAND (as opposed to NOR) flash, to be flash-friendly, robust due to
 journalling, and to have low RAM and boot time overheads. File data is
 stored in 'chunks'. Chunks are the same size as NAND pages. Each page
 is marked with file id and chunk number. These marking 'tags' are
 stored in the OOB (or 'spare') region of the flash. The chunk number
 is determined by dividing the file position by the chunk size. Each
 chunk has a number of valid bytes, which equals the page size for all
 except the last chunk in a file.
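
 The chunk arithmetic above can be sketched as follows. This is only an
 illustration of the division described in the text, not code from
 YAFFS; the function names are made up for the example, and it does not
 address how the header chunk is numbered.

```python
def chunk_number(file_pos: int, chunk_size: int) -> int:
    """Index of the chunk holding byte `file_pos` (plain integer division)."""
    return file_pos // chunk_size


def valid_bytes(file_len: int, chunk: int, chunk_size: int) -> int:
    """Number of valid data bytes in chunk `chunk` of a `file_len`-byte file."""
    start = chunk * chunk_size
    if start >= file_len:
        return 0  # chunk lies entirely past the end of the file
    # Every chunk is full except possibly the last one.
    return min(chunk_size, file_len - start)
```

 For a 5000-byte file on 2K chunks, byte 5000 would fall in chunk 2, and
 chunk 2 holds 5000 - 4096 = 904 valid bytes.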

 File 'headers' are stored as the first page in a file, marked as a
 different type to data pages. The same mechanism is used to store
 directories, device files, links etc. The first page describes which
 type of object it is.

 YAFFS2 never re-writes a page, because the spec of NAND chips does not
 allow it. (YAFFS1 used to mark a block 'deleted' in the OOB). Deletion
 is managed by moving deleted objects to the special, hidden 'unlinked'
 directory. These records are preserved until all the pages containing
 the object have been erased (we know when this happens by keeping a
 count of chunks remaining on the system for each object - when it
 reaches zero the object really is gone).

 When data in a file is overwritten, the relevant chunks are replaced
 by writing new pages to flash containing the new data but the same
 tags.

 Pages are also marked with a short (2-bit) serial number that
 increments each time the page at this position is replaced. The
 reason for this is that if power loss/crash/other act of demonic
 forces happens before the replaced page is marked as discarded, it is
 possible to have two pages with the same tags. The serial number is
 used to arbitrate.
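
 One way to arbitrate with a 2-bit counter is to treat the copy whose
 serial is exactly one step ahead (modulo 4) as the newer one. The
 sketch below assumes that rule; the text does not spell out the exact
 comparison, and the names are hypothetical.

```python
SERIAL_BITS = 2
SERIAL_MASK = (1 << SERIAL_BITS) - 1  # 0b11, so serials wrap 3 -> 0


def next_serial(serial: int) -> int:
    """Serial number given to the replacement of a page with `serial`."""
    return (serial + 1) & SERIAL_MASK


def newer_copy(serial_a: int, serial_b: int) -> str:
    """Return "a" or "b" for the newer copy, or "unknown" if undecidable."""
    if next_serial(serial_a) == serial_b:
        return "b"
    if next_serial(serial_b) == serial_a:
        return "a"
    # Equal serials, or two steps apart: a 2-bit counter cannot tell.
    return "unknown"
```

 With this rule a copy with serial 0 is newer than one with serial 3,
 because the counter has wrapped.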

 A block containing only discarded pages (termed a dirty block) is an
 obvious candidate for garbage collection. Otherwise valid pages can be
 copied off a block thus rendering the whole block discarded and ready
 for garbage collection.

 In theory you don't need to hold the file structure in RAM... you
 could just scan the whole flash looking for pages when you need them.
 In practice though you'd want better file access times than that! The
 mechanism proposed here is to have a list of __u16 page addresses
 associated with each file. Since there are 2^18 pages in a 128MB NAND,
 a __u16 is insufficient to uniquely identify a page but it does
 identify a group of 4 pages - a small enough region to search
 exhaustively. This mechanism is clearly expandable to larger NAND
 devices - within reason. The RAM overhead with this approach is approx
 2 bytes per page - 512kB of RAM for a whole 128MB NAND.
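
 The numbers in this paragraph can be checked with a little arithmetic.
 The sketch below assumes 512-byte pages (which is what gives 2^18
 pages in 128MB) and one 16-bit entry per page; the function names are
 invented for the example.

```python
def page_group_layout(nand_bytes: int, page_bytes: int, entry_bits: int = 16):
    """Return (pages, pages per group, table size in bytes)."""
    pages = nand_bytes // page_bytes
    page_bits = (pages - 1).bit_length()     # bits needed for a full page address
    shift = max(0, page_bits - entry_bits)   # address bits that do not fit in an entry
    group_size = 1 << shift                  # pages sharing one entry value
    ram_bytes = pages * (entry_bits // 8)    # one entry per page
    return pages, group_size, ram_bytes


def candidate_pages(entry: int, group_size: int):
    """Pages to search exhaustively for a given table entry."""
    first = entry * group_size
    return list(range(first, first + group_size))
```

 For the 128MB case, page_group_layout(128 * 1024 * 1024, 512) gives
 262144 pages, groups of 4 pages, and 524288 bytes (512kB) of table.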

 Boot-time scanning to build the file structure lists only requires
 one pass reading NAND. If proper shutdowns happen the current RAM
 summary of the filesystem status is saved to flash, called
 'checkpointing'. This saves re-scanning the flash on startup, and gives
 huge boot/mount time savings.

 YAFFS regenerates its state by 'replaying the tape'  - i.e. by
 scanning the chunks in their allocation order (i.e. block sequence ID
 order), which is usually different from the media block order. Each
 block is still only read once - starting from the end of the media and
 working back.
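
 A rough model of this replay is sketched below. It is one reading of
 the paragraph, not the YAFFS scanner: it assumes blocks are visited
 from the highest sequence number down, pages within a block from last
 to first, and that the first sighting of an (object ID, chunk ID) pair
 is the current copy. The data layout is hypothetical.

```python
def replay(blocks):
    """Rebuild a chunk map from scanned tags.

    `blocks` is a list of (sequence_number, pages); `pages` lists, in
    write order, an (object_id, chunk_id) tuple per page, or None for an
    unwritten page. Returns {(object_id, chunk_id): (sequence, page_index)}.
    """
    current = {}
    # Newest block first; each block is visited exactly once.
    for seq, pages in sorted(blocks, key=lambda b: b[0], reverse=True):
        for page_index in range(len(pages) - 1, -1, -1):
            tags = pages[page_index]
            if tags is None:
                continue
            # Older copies seen later must not displace a newer one.
            current.setdefault(tags, (seq, page_index))
    return current
```

 The list passed in may be in any (media) order; sorting by sequence
 number is what recovers the allocation order.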

 YAFFS tags in YAFFS1 mode:

 18-bit Object ID (2^18 files, i.e. > 260,000 files). File id 0 is not
        valid and indicates a deleted page. File id 0x3ffff is also not valid.
        Synonymous with inode.
 2-bit  serial number
 20-bit Chunk ID within file. Limit of 2^20 chunks/pages per file (i.e.
        > 500MB max file size). Chunk ID 0 is the file header for the file.
 10-bit counter of the number of bytes used in the page.
 12-bit ECC on tags
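
 The fields listed above add up to 62 bits, so they fit in 8 bytes of
 the 16-byte OOB. The sketch below packs them into a single integer to
 show the widths and limits; the bit order is an assumption made for
 the example and is not taken from the YAFFS source.

```python
# (name, width in bits), in the order listed in the text.
FIELDS = (
    ("object_id", 18),
    ("serial", 2),
    ("chunk_id", 20),
    ("byte_count", 10),
    ("ecc", 12),
)


def pack_tags(**values) -> int:
    """Pack the named fields into one integer, first field in the low bits."""
    word = 0
    shift = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_tags(word: int) -> dict:
    """Inverse of pack_tags."""
    out = {}
    for name, width in FIELDS:
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

 An object ID of 2^18 or a chunk ID of 2^20 is rejected, which matches
 the limits quoted for the number of files and the file size.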

 YAFFS tags in YAFFS2 mode:
   4 bytes 32-bit chunk ID
   4 bytes 32-bit object ID
   2 bytes Number of data bytes in this chunk
   4 bytes Sequence number for this block
   3 bytes ECC on tags
  12 bytes ECC on data (3 bytes per 256 bytes of data)
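
 The non-ECC part of this layout is 4 + 4 + 2 + 4 = 14 bytes; with the
 ECC fields as listed, the total is 29 bytes. The sketch below packs
 those four fields with Python's struct module. Little-endian byte
 order and the absence of padding are assumptions for the example; the
 real on-flash layout depends on the platform and MTD.

```python
import struct

# chunk ID, object ID, number of data bytes, block sequence number
TAG_FORMAT = "<IIHI"   # "<" = little-endian, no padding (assumed)
TAG_ECC_BYTES = 3
DATA_ECC_BYTES = 12


def pack_yaffs2_tags(chunk_id: int, object_id: int, n_bytes: int, sequence: int) -> bytes:
    """Serialise the four non-ECC tag fields."""
    return struct.pack(TAG_FORMAT, chunk_id, object_id, n_bytes, sequence)


def unpack_yaffs2_tags(raw: bytes):
    """Return (chunk_id, object_id, n_bytes, sequence)."""
    return struct.unpack(TAG_FORMAT, raw)


def total_tag_bytes() -> int:
    """OOB bytes used by the fields listed in the text."""
    return struct.calcsize(TAG_FORMAT) + TAG_ECC_BYTES + DATA_ECC_BYTES
```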


 Page allocation and garbage collection

 Pages are allocated sequentially from the currently selected block.
 When all the pages in the block are filled, another clean block is
 selected for allocation. At least two or three clean blocks are
 reserved for garbage collection purposes. If there are insufficient
 clean blocks available, then a dirty block (i.e. one containing only
 discarded pages) is erased to free it up as a clean block. If no dirty
 blocks are available, then the dirtiest block is selected for garbage
 collection.

 Garbage collection is performed by copying the valid data pages into
 new data pages thus rendering all the pages in this block dirty and
 freeing it up for erasure. I also like the idea of selecting a block
 at random some small percentage of the time - thus reducing the chance
 of wear differences.
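
 The selection policy described above might look like the sketch
 below. It assumes that 'dirtiest' means the candidate block with the
 fewest valid pages, so a block with none (a dirty block) is always
 picked first; the function name and the random_fraction parameter are
 hypothetical.

```python
import random


def pick_gc_block(valid_pages, rng=random, random_fraction=0.0):
    """Choose a block to garbage-collect.

    `valid_pages` maps block number -> count of still-valid pages for
    each fully allocated block. Returns a block number, or None if
    there are no candidates.
    """
    if not valid_pages:
        return None
    # Occasionally pick at random to even out wear.
    if rng.random() < random_fraction:
        return rng.choice(sorted(valid_pages))
    # Otherwise take the dirtiest block; ties go to the lowest block number.
    return min(valid_pages, key=lambda block: (valid_pages[block], block))
```

 With random_fraction left at 0.0 the choice is fully deterministic;
 passing a seeded random.Random instance as rng makes the random case
 repeatable for testing.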

 YAFFS is single-threaded. Garbage-collection is done as a parasitic
 task of writing data. So each time some data is written, a bit of
 pending garbage collection is done. More pages are garbage-collected
 when free space is tight.


 Flash writing

 YAFFS only ever writes each page once, complying with the requirements
 of the most restrictive NAND devices.

 Wear levelling

 This comes as a side-effect of the block-allocation strategy. Data is
 always written to the next free block, so free blocks are all used equally.
 Blocks containing data that is written but never erased will not get
 back into the free list, so wear is levelled over only blocks which
 are free or become free, not blocks which never change.


 Some helpful info
 -----------------

 Formatting a YAFFS device is simply done by erasing it.

 Making an initial filesystem can be tricky because YAFFS uses the OOB
 and thus the bytes that get written depend on the YAFFS data (tags),
 and the ECC bytes and bad block markers which are dictated by the
 hardware and/or the MTD subsystem. The data layout also depends on the
 device page size (512b or 2K). Because YAFFS is only responsible for
 some of the OOB data, generating a filesystem offline requires
 detailed knowledge of what the other parts (MTD and NAND
 driver/hardware) are going to do.

 To make a YAFFS filesystem you have 3 options:

 1) Boot the system with an empty NAND device mounted as YAFFS and copy
    stuff on.

 2) Make a filesystem image offline, then boot the system and use
    MTDutils to write an image to flash.

 3) Make a filesystem image offline and use some tool like a bootloader to
    write it to flash.

 Option 1 avoids a lot of issues because the parts
 (YAFFS/MTD/hardware) all take care of their own bits and (if you have
 put things together properly) it will 'just work'. YAFFS just needs to
 know how many bytes of the OOB it can use. However sometimes it is not
 practical.

 Option 2 lets MTD/hardware take care of the ECC so the filesystem
 image just has to know which bytes to use for YAFFS Tags.

 Option 3 is hardest as the image creator needs to know exactly what
 ECC bytes, endianness and algorithm to use as well as which bytes are
 available to YAFFS.

 mkyaffs2image creates an image suitable for option 3 for the
 particular case of yaffs2 on 2K page NAND with default MTD layout.

 mkyaffsimage creates an equivalent image for 512b page NAND (i.e.
 yaffs1 format).

 Bootloaders
 -----------

 A bootloader using YAFFS needs to know how MTD is laying out the OOB
 so that it can skip bad blocks.

 YAFFS Tracing
 -------------