HP-UX Memory Management

White Paper

Version 1.4

5965-4641

Last modified September 22, 2000

(C)Copyright 1997,2000 Hewlett-Packard Company

Legal Notices

The information contained within this document is subject to change without notice.

HEWLETT-PACKARD MAKES NO WARRANTY OF ANY KIND WITH REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Hewlett-Packard shall not be liable for errors contained herein nor for incidental consequential damages in connection with the furnishing, performance, or use of this material.

Warranty. A copy of the specific warranty terms applicable to your Hewlett-Packard product and replacement parts can be obtained from your local Sales and Service Office.

Restricted Rights Legend. Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 for DOD agencies, and subparagraphs (c) (1) and (c) (2) of the Commercial Computer Software Restricted Rights clause at FAR 52.227-19 for other agencies.

Copyright Notices. (C)copyright 1983-2000 Hewlett-Packard Company, all rights reserved.

This documentation contains information that is protected by copyright. All rights are reserved. Reproduction, adaptation, or translation without written permission is prohibited except as allowed under the copyright laws.

(C)Copyright 1981, 1984, 1986 UNIX System Laboratories, Inc.

(C)copyright 1986-1992 Sun Microsystems, Inc.
(C)copyright 1985-86, 1988 Massachusetts Institute of Technology.
(C)copyright 1989-93 The Open Software Foundation, Inc.
(C)copyright 1986 Digital Equipment Corporation.
(C)copyright 1990 Motorola, Inc.
(C)copyright 1990, 1991, 1992 Cornell University
(C)copyright 1989-1991 The University of Maryland.
(C)copyright 1988 Carnegie Mellon University.

Trademark Notices. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Limited.

NFS is a trademark of Sun Microsystems, Inc.

OSF and OSF/1 are trademarks of the Open Software Foundation, Inc. in the U.S. and other countries.

First Edition: April 1997 (HP-UX Release 10.30)
Second Edition: September 2000 (HP-UX Release 11.11)

Contents

  1. Objectives of This Document
  2. Overview of Physical and Virtual Memory
    1. Pages
    2. Virtual Addresses
    3. Demand Paging

  3. The Role of Physical Memory
    1. Available Memory
    2. Lockable Memory
    3. Secondary Storage

  4. The Abstraction of Virtual Memory
    1. Virtual Space in PA-RISC
    2. Physical Addresses

  5. Memory-Relevant Portions of the Processor
    1. Translation Lookaside Buffer (TLB)
      1. The TLB Translates Addresses
      2. Organization and Types of TLB
      3. Block TLB
      4. TLB Entries

    2. The Page Table or PDIR
      1. Page Fault
      2. The Hashed Page Directory (hpde and hpde2_0) Structure

    3. Instruction and Data Cache
      1. Cache Organization

    4. How the CPU Uses Cache and TLB
    5. TLB Hits and Misses
    6. TLB Role in Access Control and Page Protection
    7. Cache Hits and Misses
    8. Registers

  6. Virtual Memory Structures
    1. Virtual Address Space (vas)
    2. Virtual Memory Elements of a pregion
    3. The Region, a System Resource
      1. a.out Support for Unaligned Pages
      2. Region Flags

    4. Finding the Pages of a Region
      1. Virtual Frame Descriptors (vfd)
      2. Disk Block Descriptor (dbd)
      3. Chunks -- Keeping the vfds and dbds Together in One Place
      4. Balanced Trees (B-Trees)
      5. Root of the B-tree
        1. vfd Prototypes

    5. pseudo-vas for Text and Shared Library pregions
    6. Hardware-Independent Page Information Table (pfdat)
      1. Flags Showing the Status of the Page
      2. Hardware-Dependent Layer Page Frame Data Entry

  7. Mapping Virtual to Physical Memory
    1. The HTBL
      1. When Multiple Addresses Hash to the Same HTBL Entry

    2. Mapping Physical to Virtual Addresses
      1. Address Aliasing

  8. Maintaining Page Availability
    1. Paging Thresholds
      1. The gpgslim Paging Threshold
      2. How Memory Thresholds are Tuned

    2. How Paging is Triggered
    3. vhand, the pageout daemon
      1. Two-Handed Clock Algorithm
      2. Factors Affecting vhand
      3. What Happens When vhand Wakes Up
      4. vhand Steals and Ages Pages

    4. The sched() Routine
      1. What to Deactivate or Reactivate
      2. When a Process is Deactivated
      3. When a Process is Reactivated
      4. Self-Deactivation
      5. Thrashing
      6. Serialization
      7. Deactivation Using the Pager

    5. Memory Resource Groups

  9. Swap Space Management
    1. Pseudo-Swap Space
    2. Physical Swap Space
      1. Device Swap Space
      2. File-System Swap Space

    3. Swap Space Parameters
    4. Swap Space Global Variables
    5. Swap Space Values
    6. Reservation of Physical Swap Space
      1. Swap Reservation Spinlock

    7. Reservation of Pseudo-Swap Space
      1. Pseudo-Swap and Lockable Memory

    8. How Swap Space is Prioritized
      1. Three Rules of Swap Space Allocation

    9. Swap Space Structures
    10. swaptab and swapmap Structures

  10. Overview of Demand Paging
    1. copy-on-write

  11. How Process Structures Are Set Up in Memory
    1. Region Type Dictates Complexity
    2. Duplicating pregions for Shared Regions
    3. Duplicating pregions for Private Regions
      1. Setting copy-on-write When the vfd is Valid
      2. Reconciling the Page and Swap Image
      3. Setting the Child Region's copy-on-write Status

    4. Duplicating a Process's Address Space
      1. Duplicating the uarea for the Child Process
      2. Reading from the Parent's copy-on-write Page
      3. Reading from the Child's copy-on-write Page

    5. Faulting In A Page
      1. Faulting In a Page of Stack or Uninitialized Data
      2. Faulting in a Page of Text or Initialized Data
      3. Retrieving the Page of Text or Initialized Data from Disk

  12. Virtual Memory and exec()
    1. Cleaning up from a vfork()
    2. Disposing of the Old pregions: dispreg()
    3. Building the New Process
    4. Virtual Memory and exit()

Tables

  1. Processor Architecture, Components and Purposes
  2. TLB flags (PA 2.x Architecture)
  3. struct hpde and struct hpde2_0, the Hashed Page Directory
  4. Security Checks in the TLB
  5. Types of Registers (PA-RISC 2.0)
  6. Principal Memory Management Kernel Structures
  7. Principal Elements of struct pregion
  8. Region (struct region)
  9. Unaligned a.out Support by Regions
  10. Region Flags
  11. Virtual Frame Descriptor (struct vfd)
  12. Disk Block Descriptor (struct dbd)
  13. B-tree Node Description (struct bnode)
  14. struct broot
  15. struct vfdcw
  16. Principal Entries in struct pfdat (Page Frame Data)
  17. Principal pf_flag Values
  18. struct hdlpfdat
  19. setmemthresholds() Paging Thresholds
  20. Paging Threshold Values
  21. pregion Elements used by vhand
  22. Variables Affecting vhand
  23. Configurable Swap Space Parameters
  24. Swap Space Characteristics (Global Variables)
  25. Device Swap Table swdevt[] (struct swdevt)
  26. File System Swap Table fswdevt[] (struct fswdevt)
  27. Swap Table Entry (struct swaptab)
  28. Swap Map Entry (struct swapmap)

Figures

  1. Physical Memory Available to Processes
  2. Major Sections of System Address Space (32 bit Kernel)
  3. Major Sections of System Address Space (64 bit Kernel)
  4. Bit Layout of 32-bit Physical Address
  5. Bit Layout of 64-bit Physical Address
  6. Bit Layout of 32-bit Virtual Address
  7. Processor Architecture, Showing Major Components
  8. Role of the TLB
  9. The TLB is a Cache for Address Translations
  10. Every Cache Entry Consists of a Cache Tag and Cache Line
  11. PPNs from Cache and TLB are Compared
  12. Virtual Address Translation
  13. Access Control to Virtual Pages
  14. Summary of Page Retrieval from TLB, Cache, PDIR
  15. Memory Management Structures
  16. Virtual Memory Elements of the pregion
  17. Virtual Frame Descriptor (vfd)
  18. Disk Block Descriptor (dbd)
  19. A chunk Contains an Array of vfddbds
  20. A Sample B-tree (order = 3, depth = 3)
  21. Mapping the pseudo-vas Structures
  22. Mapping from the htbl Entry to the Page Directory Entry
  23. How Multiple Addresses Hash to the Same htbl Entry
  24. Physical-to-virtual Address Translation
  25. Available Memory in the System
  26. Choosing a Swap Location
  27. The swaptab and swapmap Structures
  28. Duplicating pregions with Shared regions
  29. Duplicating a region of Type RT_PRIVATE
  30. The First Time a Read is Done to a copy-on-write Page
  31. Checking the Page Cache to Fault in a DBD_FSTORE Page

Objectives of This Document

OVERVIEW OF PHYSICAL AND VIRTUAL MEMORY

The memory management system is designed to make memory resources available safely and efficiently to threads and processes:

The data and instructions of any process (a program in execution) or thread of execution within a process must be available to the CPU by residing in physical memory at the time of execution.

To execute a process, the kernel creates a per-process virtual address space that is set up by the kernel; portions of the virtual space are mapped onto physical memory. Virtual memory allows the total size of user processes to exceed physical memory. Through "demand paging", HP-UX enables you to execute threads and processes by bringing virtual pages into main memory only as needed (that is, "on demand") and pushing out portions of a process's address space that have not been recently used.

The term "memory management" refers to the rules that govern physical and virtual memory and allow for efficient sharing of the system's resources by user and system processes.

The system uses a combination of pageout and deactivation to manage physical memory. Paging involves periodically writing recently unreferenced pages from main memory to disk. A page is the smallest unit of physical memory that can be mapped to a virtual address with a given set of access attributes. On a loaded system, unreferenced pages might total a large fraction of memory.

Deactivation takes place if the system is unable to maintain a large enough free pool of physical memory. When an entire process is deactivated, the pages associated with the process can be written out to secondary storage, since they are no longer referenced. A deactivated process cannot run, and therefore, cannot reference its data.

Secondary storage supplements physical memory. The memory management system monitors available memory and, when it is low, writes out pages of a process or thread to a secondary storage device called a swap device. The data is read from the swap device back into physical memory when it is needed for the process to execute.

Pages

A page is the smallest contiguous block of physical memory that can be allocated for storing data and code, and also the smallest unit of memory protection. The page size on all HP-UX systems is four kilobytes.
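Because the page size is fixed at 4096 bytes (2^12), sizes convert to page counts with simple shift arithmetic. The sketch below is illustrative only; the names are ours, not kernel identifiers:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE  4096u   /* HP-UX page size in bytes */
#define PAGE_SHIFT 12      /* log2(PAGE_SIZE) */

/* Number of 4 KB pages needed to hold nbytes, rounded up. */
static inline uint64_t bytes_to_pages(uint64_t nbytes)
{
    return (nbytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
```

For example, a 10,000-byte object spans three pages, even though it fills the third page only partially.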

On a PA-RISC system, every page of physical memory is addressed by a physical page number (PPN), which is a software "reduction" of the physical address. Access to pages (and thus to the data they contain) is done through virtual addresses, except under specific circumstances: when virtual translation is turned off (the D and I bits are off), pages are accessed by their absolute addresses.

Virtual Addresses

When a program is compiled, the compiler generates virtual addresses for the code. Virtual addresses represent a location in memory. These virtual addresses must be mapped to physical addresses (locations of the physical pages in memory) for the compiled code to execute. User programs use virtual addresses only.

The kernel and the hardware coordinate a mapping of these virtual and physical addresses for the CPU, called "address translation," to locate the process in memory.

The PA-RISC architecture is segmented; a complete virtual address consists of a space identifier (SID) and an offset within that space.

The offset may be 32 or 64 bits wide; earlier PA-RISC processors (before PA-RISC 2.0) support only 32-bit offsets.

From the point of view of a user program, the segmentation is not obvious; instead, user programs see an almost flat address space with either 32- or 64-bit virtual addresses (depending on how the process was compiled).

The kernel, however, deals in the full complexity of space and offset.

From the kernel point of view, every process running on a PA-RISC processor shares a single global virtual address space, with global virtual addresses (GVAs) composed of both space and offset. (These GVAs are 96 bits on PA-RISC 2.0 processors running in 64-bit (wide) mode, and smaller on earlier processors.) This global virtual address space is also shared by the kernel.

Although any process can create and attempt to read or write any global virtual address, the kernel uses page granularity access control mechanisms to prevent unwanted interference between processes.

When a virtual page is "paged" into physical memory, free physical pages are allocated to it by the physical memory allocator. These pages may be scattered randomly throughout memory, depending on their usage history. Translations are needed to tell the processor where the virtual pages are loaded. The process of translating a virtual address into a physical address is called virtual address translation.

The virtual address space can potentially be much greater than the physical address space. The virtual memory system enables the CPU to execute programs much larger than the available physical memory and allows you to run many more programs at a time than you could without a virtual memory system.

Demand Paging

For a process to execute, all the structures for data, text, and so on have to be set up. However, pages are not loaded in memory until they are "demanded" by a process -- hence the term, demand paging. Demand paging allows the various parts of a process to be brought into physical memory as the process needs them to execute. Only the working set of the process, not the entire process, need be in memory at one time. A translation is not established until the actual page is accessed.

THE ROLE OF PHYSICAL MEMORY

Memory is the "container" for data storage. The general repository for high-speed data storage is close to the CPU and is termed random access memory (RAM), or "main memory." For the CPU to execute a process, the code and data referenced by that process must reside in RAM, which is shared by all processes.

The more main memory in the system, the more data the system can access and the more (or larger) processes it can retain and execute without having to page or cause deactivation as frequently. Memory-resident resources (such as page tables) also take up space in main memory, reducing the space available to applications.

At boot time, the system loads HP-UX from disk into RAM, where it remains memory-resident until the system is shut down.

User programs and commands are also loaded from disk into RAM, but in small portions as they are needed. When a program terminates, the operating system frees the memory used by the process.

Disk access is slow compared to RAM access. Excessive disk access increases latency, reduces throughput, and can make the disk the bottleneck in the system. To avoid this, the system buffers disk I/O. Buffering, paging, and deactivation algorithms optimize disk access and determine when data and code for currently running programs are returned from RAM to disk. When a user or system program writes data to disk, the data is either written directly from the program's RAM (e.g., if writing to a "raw" device) or buffered in what is called the buffer cache and written to disk in relatively big chunks. Programs also read files and database structures from disk into RAM. When you issue the sync command before shutting down a system, all modified buffers of the buffer cache are flushed (written) out to disk.

On each processor there are also registers and cache, which are even faster than main memory. Program execution actually happens in registers, which get data from the cache and other registers. The cache contains the current working copy of parts of main memory. Most of the time when discussing memory management, cache and registers will be completely ignored; data and instructions will be treated as being accessed directly from main memory. They are mentioned here to reduce confusion.

From this point on, this section only discusses "main memory".

Figure 1 Physical Memory Available to Processes

	     +------------------------------+  | | |
	     |                              |  | | |
	     |                              |  | | |
	     |                              |  | | |
	     |                              |  | | |
	     |                              | Lockable memory
	     |                              |  | | |
	     |                              |  |Available memory
	     |                              |  | | |
	     +..............................+  | |Physical memory
	     |                              |    | |
	     |                              |    | |
	     +------------------------------+    | |
HP-UX kernel |                              |      |
at bootup    |                              |      |
	     +------------------------------+      |

Available Memory

The amount of main memory not reserved for the kernel is termed available memory. Available memory is used by the system for executing user processes.

Not all physical memory is available to user processes. Kernel text and initialized data occupy about 10 MB of RAM; additional memory is used by kernel bss (uninitialized data), and (especially) various structures allocated during kernel boot. Many of the structures allocated during kernel boot can be quite large. The sizes of some are determined by kernel tunables, but many are sized based on the amount of physical memory in the system, e.g. such a structure might have one 96 byte entry for every 4096 byte page of physical memory.

Instead of allocating all its data structures at system initialization, the HP-UX kernel dynamically allocates and releases some kernel structures as needed by the system during normal operation. This allocation comes from the available memory pool; thus, at any given time, part of the available memory is used by the kernel and the remainder is available for user programs.

Physical address space is the entire range of addresses used by hardware (4 GB on 32-bit (narrow mode) kernels), and is divided into memory address space, processor-dependent code (PDC) address space, and I/O address space. The next figure shows the expanse of memory available for computation. Memory address space takes up 15/16 of the system address space, while the address space allotted to PDC and I/O consumes a relatively small range of addresses.

Figure 2 Major Sections of System Address Space (32 bit Kernel)

          +-----------+
0x00000000| page zero |
          +-----------+
          |           |
          |           |       +-----------------------+
          |  Memory   |      /| PDC address space     |0xF0000000
          |  address  |     / |                       |
          |  space    |    /  +-----------------------+
          |           |   /   |                       |0xF1000000
          |           |  /    |                       |
          |           | /     | I/O Register          |
0xF0000000+-----------+/      | address               |
          | PDC & I/O |       | space                 |
0xFFFFFFFF+-----------+       |                       |
                       \      |                       |
                        \     +.......................+
                         \    | Central bus           |
                          \   | address space         |
                           \  +.......................+
                            \ | Broadcast address     |0xFFFC0000
                             \| space (local, global) |0xFFFFFFFF
                              +-----------------------+

Figure 3 Major Sections of System Address Space (64 bit Kernel)

                   +-----------------------+
0x00000000 00000000| page zero             |        
                   +.......................+
                   |                       |
                   |                       |
                   |                       |
                   |                       |
                   |                       |
                   |                       |
                   | Memory                |
                   | address               |
                   | space                 |
                   |                       |
                   |                       |
                   |                       |
                   |                       |
                   |                       |
                   |                       |
                   |                       |
                   |                       |
                   +-----------------------+
0xF0000000 00000000| PDC address space     |
0xF1000000 00000000|                       |
                   +-----------------------+
                   | I/O Register          |
                   | address               |
                   | space                 |
                   +.......................+
                   | Central bus           |
                   | address space         |
                   +.......................+
0xFFFFFFFF FFFC0000| Broadcast address     |
0xFFFFFFFF FFFFFFFF| space (local, global) |
                   +-----------------------+

Lockable Memory

Pages kept in memory for the lifetime of a process by means of a system call (such as mlock, plock, or shmctl) are termed locked memory. Locked memory cannot be paged and processes with locked memory cannot be deactivated. Typically, locked memory holds frequently accessed programs or data structures, such as critical sections of application code. Keeping them memory-resident improves application performance.

The lockable_mem variable tracks how much memory can be locked.

Available memory is a portion of physical memory, minus the amount of space required for the kernel and its data structures. The initial value of lockable_mem is the available memory on the system after boot-up, minus the value of the system parameter, unlockable_mem.

The value of lockable memory depends on several factors:

  • HP-UX places no explicit limits on the amount of available memory you may lock down; instead, HP-UX restricts how much memory cannot be locked.

  • Other kernel resources that use memory (such as the dynamic buffer cache) can also change the amount of lockable memory.

As the amount of memory that has been locked down increases, existing processes compete for a smaller and smaller pool of usable memory. If the number of pages in this remaining pool falls below the paging threshold called lotsfree, the system activates its paging mechanism by scheduling vhand, in an attempt to keep a reasonable amount of memory free for general system use.
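The trigger just described amounts to a comparison of the free-page count against the threshold. The sketch below is a simplification for illustration only; it is not kernel source, and the parameter name merely echoes the lotsfree tunable:

```c
#include <assert.h>

/* Illustrative only: if the pool of free pages falls below the
 * lotsfree threshold, the pageout daemon (vhand) should be scheduled. */
static int should_schedule_vhand(long free_pages, long lotsfree)
{
    return free_pages < lotsfree;
}
```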

Care must be taken to allow sufficient space for processes to make forward progress; otherwise, the system is forced into paging and deactivating processes constantly, to keep a reasonable amount of memory free.

Secondary Storage

Data is moved to secondary storage when the system is short of main memory, to make room for active processes. The data is typically stored on disks accessible either via system buses or over the network.

Swap refers to a physical memory management strategy (predating UNIX) in which entire processes are moved between main memory and secondary storage. Modern virtual memory systems no longer swap entire processes, but rather use a paging scheme, in which individual pages of data and instructions can be paged in from secondary storage as needed, or paged out again to free up memory for other uses. This is backed up by a deactivation scheme that allows whole processes to be pushed out if the system is desperately short of memory. However, the secondary storage dedicated to storing paged-out data is still referred to as "swap space".

Device swap can take the form of an entire disk or LVM(1) logical volume of a disk. A file system can be configured to offer free space for swap; this is termed file-system swap. If more swap space is required, it can be added dynamically to a running system, as either device swap or file-system swap. The swapon command is used to allocate disk space or a directory in a file system for swap.

(1) Logical Volume Manager (LVM) is a set of commands and underlying software to handle disk storage resources with more flexibility than offered by traditional disk partitions.

THE ABSTRACTION OF VIRTUAL MEMORY

A computer has a finite amount of RAM available, but each 32-bit HP-UX process has a 4 GB virtual address space apportioned in four one-gigabyte quadrants. (64-bit HP-UX processes have an even larger virtual address space, though they cannot actually use the full 16-exabyte range of virtual addresses addressable with 64 bits. It, too, is broken into four equal-sized quadrants.) This is termed virtual memory.

Virtual memory is the software construct that allows each process sufficient computational space in which to execute. It is accomplished with hardware support.
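Since each of the four quadrants of a 32-bit address space spans exactly 1 GB (2^30 bytes), the quadrant containing an offset is simply its top two bits. A minimal sketch, illustrative only (not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Quadrant (0-3) containing a 32-bit virtual offset:
 * each quadrant spans 1 GB (2^30 bytes), so the top two
 * bits of the offset select the quadrant. */
static inline unsigned quadrant32(uint32_t offset)
{
    return offset >> 30;
}
```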

Virtual Space in PA-RISC

As software is compiled and run, it generates virtual addresses that provide programmers with memory space many times larger than physical memory alone.

HP-UX is a Shared Address Space (SAS) operating system. A given virtual address (including space ID) refers to the same page of memory for all processes; translations are not changed when the process context changes.

Thus, the number of bits available for the space ID (segment) and offset (often simply called the "virtual address") determines the ultimate size of the total virtual address space available to the kernel and all processes together.

As PA-RISC evolved, the number of bits usable for space and offset has increased. On PA-RISC 2.0, the space ID is 32 bits (18 bits actually used in HP-UX 11.11) and the offset is effectively 42 bits (though stored in a 64-bit field). (PA-RISC 1.1 systems, and PA-RISC 2.0 running in narrow (32-bit) mode, have a smaller offset.)

NOTE: Understand, however, that a single process has significant limitations on the virtual address space it is allowed to access. For example, the text of a 32-bit SHARE_MAGIC executable is limited to 1 GB, and its data is limited to 1 GB. Also, the total amount of shared virtual address space in the system is limited to much less than is theoretically addressable; without using memory windows, the total shared space on a wide-mode (64-bit) system is limited to approximately 8 TB (i.e., two quadrants).

Physical Addresses

A physical address points to a page in memory that holds 4096 bytes of data. The physical address also contains an offset into this page. Thus, the complete physical address is composed of a physical page number (PPN) and a page offset. The PPN is the 20 or 52 most significant bits of the physical address where the page is located. These bits are concatenated with a 12-bit page offset to form the 32- or 64-bit physical address.

Figure 4 Bit Layout of 32-bit Physical Address

Page Number           Page Offset
+--------------------+------------+
|00000000000000000100|100001110011|
+--------------------+------------+
 0                 19 20         31

Figure 5 Bit Layout of 64-bit Physical Address

Page Number                                          Page Offset
+---------------------------------------------------+------------+
|000000000000000000000000000000000000000000000000100|100001110011|
+---------------------------------------------------+------------+
 0                                                 51 52         63
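The concatenation shown in the two figures above is a shift-and-OR in code. The helper below is illustrative only (not kernel source); it reassembles the example address 0x4873 from PPN 0x4 and page offset 0x873:

```c
#include <assert.h>
#include <stdint.h>

/* Physical address = PPN concatenated with a 12-bit page offset. */
static inline uint64_t phys_addr(uint64_t ppn, uint32_t page_off)
{
    return (ppn << 12) | (page_off & 0xFFFu);
}
```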

To handle the translation of a virtual address to a physical address, the virtual address also needs to be viewed as a virtual page number (VPN) and page offset. Since the page size is 4096 bytes, the low-order 12 bits of the offset are taken to be the offset into the page. The space ID and the high-order bits of the offset are the VPN.

For any given address, you can determine the page number by discarding the least significant 12 bits. What remains is the virtual page number for a virtual address, or the physical page number for a physical address.

The next figure shows the bit layout of a 32-bit virtual address of 0x0.4873 (space 0x0, offset 0x4873).

Figure 6 Bit Layout of 32-bit Virtual Address

32-bit Space ID                   32-bit Offset
+--------------------------------+--------------------+------------+
|00000000000000000000000000000000|00000000000000000100|100001110011|
+--------------------------------+--------------------+------------+

|                                                    | |           |
+----------------------------------------------------+ +-----------+
                            |                                |

                        VPN = 0x4                        Page Offset
                                                           0x873

The virtual page number must be translated to obtain the associated physical page number; the page offset, 0x873, is unchanged by translation.
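The decomposition in the figure can be reproduced with a shift and a mask; for the offset 0x4873 this yields VPN 0x4 and page offset 0x873. (Illustrative code only, not kernel source.)

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  0xFFFu

/* Page number within the space: discard the low 12 bits. */
static inline uint64_t vpn_of(uint64_t offset)
{
    return offset >> PAGE_SHIFT;
}

/* Byte offset within the 4096-byte page. */
static inline uint64_t page_off_of(uint64_t offset)
{
    return offset & PAGE_MASK;
}
```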

MEMORY-RELEVANT PORTIONS OF THE PROCESSOR

Figure 7 Processor Architecture, Showing Major Components

+---------------------------------------------------+
|  +--------------------+                           |
|  | Central Processing |                           |
|  |    Unit (CPU)      |    +-------------------+  |
|  +--------------------+    |   Floating Point  |  |
|            |-------------->|   Coprocessor     |  |
|            |               +-------------------+  |
|            |------------------------+             |
|            |                        |             |
|            V                        V             |
|  +--------------------+    +-------------------+  |
|  |                    |    |    Translation    |  |
|  |       Cache        |    |  Lookaside Buffer |  |
|  |                    |    |      (TLB)        |  |
|  +--------------------+    +-------------------+  |
|            |                        |             |
|            |<-----------------------+             |
|  +--------------------+                           |
|  |  System Interface  |                           |
|  |      Unit (SIU)    |                           |
|  +--------------------+                           |
|            |                                      |
+------------V--------------------------------------+
             |                                         Central Bus
==================================================================

The figure above and the table that follows name the principal processor components; of these, the registers, translation lookaside buffer, and cache are crucial to memory management and will be discussed in greater detail following the table.

Table 1 Processor Architecture, Components and Purposes

Component Purpose
Central Processing Unit (CPU) The main component responsible for reading program and data from memory, and executing the program instructions. Within the CPU are the following:
  • Registers, high-speed memory used to hold data while it is being manipulated by instructions, for computations, interruption processing, protection mechanisms, and virtual memory management. Registers are discussed shortly in greater detail.
  • Control Hardware (also called instruction or fetch unit) that coordinates and synchronizes the activity of the CPU by interpreting (decoding) instructions to generate control signals that activate the appropriate CPU hardware.
  • Execution Hardware to perform the actual arithmetic, logic, and shift operations. Execution Hardware can take on many specialized tasks but most common are the Arithmetic and Logic Unit (ALU) and the Shift Merge Unit (SMU).
Instruction and Data Cache The cache is a portion of high-speed memory used by the CPU for quick access to data and instructions. The most recently accessed data is kept in the cache.
Translation Lookaside Buffer (TLB) The processor component that enables the CPU to access data through virtual address space by:
  • Translating the virtual address to physical address.
  • Checking access rights, so that access is granted to instructions, data, or I/O only if the requesting process has proper authorization.
Floating Point Coprocessor An assist processor that carries out specialized tasks for the CPU.
System Interface Unit (SIU) Bus circuitry that allows the CPU to communicate with the central (native) bus.

Translation Lookaside Buffer (TLB)

The translation lookaside buffer (TLB) translates virtual addresses to physical addresses.

Figure 8 Role of the TLB

+---------------------------+\
|                           |  \
|                           |    \
|                           |      \
|                           |        \
|                           |          \
|                           |            \
|                           |             +--------+
|          Virtual          |    +---+    |Physical|
|          address          |<-->|TLB|<-->|address |
|          space            |    +---+    |space   |
|                           |             +--------+
|                           |            /
|                           |          /
|                           |        /
|                           |      /
|                           |    /
|                           |  /
+---------------------------+/

Address translation works down the memory hierarchy, hitting the fastest components first (such as the TLB on the processor), then moving on to the page directory table (pdir in main memory), and lastly to secondary storage.

The TLB translates addresses

The TLB looks up the translation for the virtual page numbers (VPNs) and gets the physical page numbers (PPNs) used to reference physical memory.

Figure 9 The TLB is a Cache for Address Translations

   Virtual address                                     Main Memory
 +-------------------+-----------+                      +--------+
 |Virtual Page Number|Byte Offset|                      | 0      |
 +-------------------+-----------+                      |        |
        |               |                               |        |
        |               +-------------------+           |        |
        V                                   |           |        |
       VPN      PPN   Rights ID O U T D P   |           |        |
 +------------+-------+----+---+-+-+-+-+-+  |           |        |
 |            |       |    |   | | | | | |  |        +------>[]  |
 +------------+-------+----+---+-+-+-+-+-+  |  PPN   |  |        |
T|            |       |    |   | | | | | |  |   +    |  |        |
L+------------+-------+----+---+-+-+-+-+-+  |  Offset|  |        |
B|            |       |    |   | | | | | |  |        |  |        |
 +------------+-------+----+---+-+-+-+-+-+  |        |  |        |
                    |                       |        |  |        |
                    V  Physical address     V        |  |        |
                +--------------------+-----------+   |  |        |
                |Physical Page Number|Byte Offset|---+  |physmem |
                +--------------------+-----------+      +--------+

Ideally, the TLB would be large enough to hold translations for every page of physical memory, but this would be prohibitively expensive. Instead, the TLB holds a subset of entries from the page directory table (PDIR) in memory, speeding up examination of the PDIR by caching copies of its most recently used translations.

Because the purpose of the TLB is to perform virtual-to-physical address translation, the TLB is only searched when memory is accessed while in virtual mode. This condition is indicated by the D-bit in the PSW (or the I-bit for instruction access).

Organization and Types of TLB

Depending on model, the TLB may be organized on the processor in one of two ways:

  • As a single unified TLB holding both instruction and data translations.
  • As separate Instruction (ITLB) and Data (DTLB) TLBs.

The advantage of having a split Data TLB (DTLB) and Instruction TLB (ITLB) is that it is possible to account for the different characteristics of data and instruction locality and type of access (frequent random access of data versus relatively sequential, single-use access of instructions).

Block TLB

Because TLB size is limited, it is desirable to use as few entries as possible to translate the largest possible amount of memory. PA-RISC 2.0 processors provide a variable page size, and memory is organized to use large page sizes wherever this is reasonable. In particular, the memory initially allocated for the kernel at boot time is mapped with the largest possible page size that fits it. (Other memory will be mapped with large pages if possible, but there are tradeoffs that may make this impractical, especially on small memory systems.)

PA-RISC processors before PA-RISC 2.0 do not support a general purpose variable page size. Instead, they may provide a block TLB. The block TLB is quite small, but its entries can map more than a single 4K page (i.e. multiple hpdes). Block TLB entries are used to reference kernel memory that remains resident. (Memory referenced by a block TLB entry cannot be paged out.) The block TLB is typically used for graphics, because graphics data is accessed in huge chunks. It is also used for mapping other static areas such as kernel text and data.

TLB Entries

Since the TLB translates virtual to physical addresses, each entry contains both the Virtual Page Number (VPN) and the Physical Page Number (PPN). Entries also contain Access Rights, an Access Identifier, and five flags.

Table 2 TLB flags (PA 2.x Architecture)

Flag Name Meaning
O Ordered Accesses to data for load and store are ranked by strength -- strongly ordered, ordered, and weakly ordered. (See PA-RISC 2.0 specifications for model and definitions.)
U Uncacheable Determines whether data references to a page from memory address space may be moved into the cache. Typically set to 1 for data references to a page that maps to the I/O address space or for memory address space that must not be moved into cache.
T(1) Page Reference Trap If set, any access to this page causes a reference trap to be handled either by hardware or software trap handlers.
D Dirty When set, this bit indicates that the associated page in memory differs from the same page on disk. The page must be flushed before being invalidated.
B Break This bit causes a trap on any instruction that is capable of writing to this page.
P Prediction method for branching Optional, used for performance tuning.

(1) The T, D, and B flags are only present in data or unified TLBs.

In PA 1.x architecture, an E bit (or "valid" bit) indicates that the TLB entry reflects the current attributes of the physical page in memory.

The Page Table or PDIR

The operating system maintains a table in memory called the Page Directory (PDIR) which keeps track of all virtual pages currently in memory. When a page is mapped in some virtual address space, it is allocated an entry in the PDIR. The PDIR is what links a virtual address to a physical page in memory.

The PDIR is implemented as a memory-resident table of software structures called hashed page directory entries (HPDEs), which contain virtual and physical addresses. When the processor needs to find a physical page not indexed in the TLB, it can search the PDIR with a virtual address to find the matching address.

The PDIR table is a hash table with collision chains. The virtual address is used to hash into one of the buckets in the hash table and the corresponding chain is searched until a chain entry with a matching virtual address is found.

Note that the page table is not a purely software construct. On systems that provide hardware for TLB miss handling, this is the table examined by the hardware to attempt to find an appropriate translation to insert in the TLB when resolving a TLB miss fault.

Page Fault

A trap occurs when a translation is missing from the translation lookaside buffer (TLB). If the processor can find the missing translation in the PDIR, it installs it in the TLB and allows execution to continue. If not, a page fault occurs.

A page fault is a trap taken when the address needed by a process is missing from main memory. This occurrence is also known as a PDIR miss. A PDIR miss indicates that the page is either on the free list, in the page cache, or on disk; the memory management system must then find the requested page on the swap device or in the file system and bring it into main memory.

Conversely, a PDIR hit indicates that a translation exists for the virtual address and can be installed in the TLB.

The Hashed Page Directory (hpde and hpde2_0) Structure

Each PDE contains information on the virtual-to-physical address translation, along with other information necessary for the management of each page of virtual memory.

PA-RISC 1.1 and PA-RISC 2.0 systems use different hashed page directory entry structures, with mostly similar field names and purposes. The following table combines the structural elements of the PA-RISC 1.1 hashed page directory entry (struct hpde) and the PA-RISC 2.0 hashed page directory entry (struct hpde2_0).

Table 3 struct hpde and struct hpde2_0, the Hashed Page Directory

Element PA-RISC Version Meaning
pde_valid PA-RISC 1.1 Flag set by the kernel to indicate a valid pde entry.
pde_invalid PA-RISC 2.0 Flag set by the kernel to indicate an invalid pde entry.
pde_vpage both Virtual page - the virtual offset divided by 4096.
pde_space both Contains the complete virtual space ID.
pde_rtrap both Data reference trap enable bit; when set, any access to the page causes a page reference trap interruption.
pde_dirty both Dirty bit; marked if the page differs in memory from what is on disk.
pde_dbrk both Data break; used by the TLB.
pde_ar both Access rights; used by the TLB.(1)
pde_uncache both Uncache bit.
pde_order PA-RISC 2.0 Strong ordering bit.
pde_br_predict PA-RISC 2.0 Branch prediction bit.
pde_ref_trickle both Trickle-up bit for references. Used with pde_ref on systems whose hardware can search the htbl directly.
pde_block_mapped both Block mapping flag; indicates page is mapped by block TLB and cannot be aliased.
pde_executed both Used by the stingy cache flush algorithm to indicate that page is referenced as text(2).
pde_ref both Reference bit set by the kernel when it receives certain interrupts; used by vhand to tell if a page has been used recently.
pde_accessed both Used by the stingy cache flush algorithm to indicate that the page may be in data cache.
pde_modified both Indicator to the high-level virtual memory routines as to whether the page has been modified since last written to a swap device.
pde_uip both Lock flag used by trap-handling code.
pde_protid both Protection ID, used by the TLB.
pde_os PA-RISC 2.0 Entry in use.
pde_alias both Virtual alias field. If set, the pde has been allocated from elsewhere in kernel memory, rather than as a member of the sparse PDIR.
pde_wx_demote PA-RISC 2.0 (64-bit kernels only) User space fic.
pde_phys PA-RISC 1.1 Physical page number; the physical memory address divided by the page size (4096 bytes).
pde_phys_u PA-RISC 2.0 Physical page number: most significant 25 bits.
pde_phys PA-RISC 2.0 Physical page number: least significant 27 bits of the physical address divided by the page size.
var_page PA-RISC 2.0 Page size.
pde_next both Pointer to next entry, or null if end of list.

(1) For detailed information on access rights, see the PA-RISC 2.0 Architectural reference, chapter 3, "Addressing and Access Control." For information about how programs can manipulate this field, see mmap(2) and mprotect(2) manpages.

(2) Stingy cache flush is a performance enhancement by which the kernel recognizes whether or not to flush the cache.

Instruction and Data Cache

Cache is fast, associative memory on the processor module that stores recently accessed instructions and data. From it, the processor learns whether it has immediate access to data or needs to go out to (slower) main memory for it.

Cacheable data going to the CPU from main memory passes through the cache. Conversely, the cache serves as the means by which the CPU passes data to and from main memory. Cache reduces the time required for the CPU to access data by maintaining a copy of the data and instructions most recently requested.

A cache improves system performance because most memory accesses are to addresses that are the same as, or very close to, previously accessed addresses. The cache takes advantage of this property by bringing in a block of data whenever the CPU requests an address. Though the hit rate depends on cache size, associativity, and workload, performance measurements show that the vast majority of accesses are satisfied from data already in the cache.

Cache Organization

Depending on model, PA-RISC processors are equipped with either a unified cache or separate caches for instructions and data (for better locality and faster performance). In multiprocessing systems, each processor has its own cache, and a cache controller maintains consistency.

Cache memory itself is organized as follows: