Introduction to PCI Express:

This blog describes the fundamentals of the Peripheral Component Interconnect Express (PCI Express, PCIe) protocol. In personal computers, peripheral devices connect to the processor subsystem over buses such as Peripheral Component Interconnect (PCI), Peripheral Component Interconnect eXtended (PCI-X), Accelerated Graphics Port (AGP), and PCIe. Typical peripherals are graphics cards, hard disk drives, SSDs, and Wi-Fi and Ethernet adapters. PCIe is the successor to PCI and replaces the PCI, PCI-X, and AGP buses used in earlier computers. The abbreviations PCIe and PCI-e both refer to PCI Express.

Comparison of PCI and PCIe:

Fig. 1 shows legacy PCI and PCIe slots. PCI is a parallel interface, whereas PCIe is a serial interface. In PCI, all devices share one bus; in PCIe, each device has its own dedicated point-to-point link.

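The difference in speed between a standard PCI interface and a 16-lane (x16) PCIe link is large. Legacy 32-bit, 33 MHz PCI has a peak data rate of 133 MB/s, shared by all devices on the bus, whereas a PCIe x16 link reaches about 16 GB/s in each direction (the figure for PCIe 3.0).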

Also, PCI slots are the same size for all devices, whereas PCIe slots differ in length depending on the number of lanes they support. The longest is the 16-lane (x16) slot and the shortest is the 1-lane (x1) slot.

Fig. 1 PCI and PCIe slots on motherboards

Fig. 2 shows the topologies of PCI and PCIe. Legacy PCI devices sit on a shared parallel bus, whereas PCIe devices connect through serial point-to-point links.

Fig. 2 Legacy PCI and PCIe slots topologies

PCI Special Interest Group (PCI-SIG):

PCI-SIG defines and maintains the technical specifications and standards for PCI and PCIe. It is a special interest group of around 900 member companies. It released the first PCIe specification, PCIe 1.0, in 2003 and has since published several newer generations, each roughly doubling the per-lane bandwidth. PCIe offers high data throughput, low power consumption, and a small physical footprint, which helps make today’s laptops and desktop computers smaller, more powerful, and more portable. A lane in PCIe is a serial connection made of one transmit and one receive differential pair; a link can combine several lanes so that data is transferred on them in parallel. Laptop expansion cards and storage interfaces such as SATA Express use one or more PCIe lanes. Table 1 shows the PCIe architectures and their bandwidth details:

Table 1 PCIe architectures and their bandwidth details
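
As a rough illustration of how per-lane rates turn into link bandwidth, here is a minimal Python sketch. The transfer rates and line-code efficiencies are the commonly published figures for PCIe 1.x to 5.0 and are assumptions here, not values taken from Table 1; the function name is made up for this example.

```python
# Per-lane raw rate in GT/s and line-code efficiency per PCIe generation.
# Commonly published figures, used here as assumptions.
GENERATIONS = {
    "1.x": (2.5, 8 / 10),      # 8b/10b encoding
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
    "4.0": (16.0, 128 / 130),
    "5.0": (32.0, 128 / 130),
}


def link_bandwidth_gbytes(generation, lanes):
    """Approximate usable bandwidth in GB/s, per direction."""
    rate_gt, efficiency = GENERATIONS[generation]
    # One transfer carries one bit per lane; divide by 8 for bytes.
    return rate_gt * efficiency / 8 * lanes


for gen in GENERATIONS:
    x1 = link_bandwidth_gbytes(gen, 1)
    x16 = link_bandwidth_gbytes(gen, 16)
    print(f"PCIe {gen}: x1 = {x1:.3f} GB/s, x16 = {x16:.2f} GB/s")
```

These numbers ignore packet headers and flow-control overhead, so real throughput is somewhat lower.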

PCIe Features:

The main features of PCIe protocol are:

  1. Point-to-point serial transfer protocol with a master-slave (requester-completer) configuration.
  2. Scalable: lanes can be aggregated so that link bandwidth grows in multiples of the single-lane transfer rate.
  3. Uses the same memory, IO, and configuration address spaces as PCI and is software-compatible with it.
  4. Uses a packet-based transaction protocol, like Ethernet.
  5. Has improved data integrity, with error detection and handling.

PCIe Pin descriptions:

PCIe slots come in several widths, of which the 1-lane PCIe x1 and the 16-lane PCIe x16 are the most common on motherboards. PCIe is a serial, point-to-point bus. Low-bandwidth cards use the smallest PCIe x1 slots, and graphics cards use the longest PCIe x16 slots. The PCIe x1 interface has 36 pins arranged as two rows of 18 pins, one row on each side of the connector. Of these 36 pins, only 6 are functional signal pins; the rest are power, ground, or auxiliary pins. The six functional pins operate as three differential signal pairs. Differential signals are more immune to external interference, consume little power, and help in clock recovery. The PCIe x1 signal description is shown in Table 2.

Table 2 PCIe x1 signal description

A multi-lane PCIe link replicates the transmit (PET) and receive (PER) differential pairs for every lane but shares a single REFCLK differential pair. For example, a two-lane link uses five signal pairs: one REFCLK pair, two PET pairs, and two PER pairs. PCIe x16 uses thirty-three signal pairs.
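
The pair counts above follow a simple rule: one shared REFCLK pair plus one PET and one PER pair per lane. A small Python sketch (the function name is only illustrative):

```python
def signal_pairs(lanes):
    """Differential pairs on a PCIe link: one shared REFCLK pair
    plus one PET and one PER pair for each lane."""
    if lanes < 1:
        raise ValueError("a link needs at least one lane")
    return 1 + 2 * lanes


for width in (1, 2, 4, 8, 16):
    pairs = signal_pairs(width)
    # Each differential pair occupies two pins.
    print(f"x{width}: {pairs} signal pairs, {2 * pairs} functional pins")
```

For x1 this gives 3 pairs (the 6 functional pins), for x2 it gives 5 pairs, and for x16 it gives 33 pairs.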

PCIe stack:

PCIe achieves reliable data transfers using a three-layer PCIe protocol stack as in Fig. 3.

Fig. 3 PCIe protocol stack

The physical layer transmits data on the link using 8b/10b encoding (used by the early PCIe generations). It also recovers the clock from the data it receives: frequent transitions in the data stream let the receiver regenerate the clock, and when the raw data does not toggle often enough, the 10-bit encoding guarantees enough transitions. A cyclic redundancy check (CRC) is used to detect data errors on the link; corrupted packets are then corrected by retransmission rather than by the CRC itself.

The data link layer checks received packets for errors, acknowledges packets that arrive intact, and requests retransmission of packets that arrive corrupted.
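
The check-and-acknowledge idea can be sketched in a few lines of Python. This is a simplified illustration, not the actual PCIe link CRC or framing: it uses `zlib.crc32` as a stand-in checksum, a 2-byte sequence number, and hypothetical function names.

```python
import zlib


def frame(seq, tlp):
    """Prefix a sequence number and append a CRC-32 computed over both."""
    body = seq.to_bytes(2, "big") + tlp
    return body + zlib.crc32(body).to_bytes(4, "big")


def receive(packet, expected_seq):
    """Return ('ACK', payload) for a good packet, ('NAK', None) otherwise."""
    body, crc = packet[:-4], int.from_bytes(packet[-4:], "big")
    seq = int.from_bytes(body[:2], "big")
    if zlib.crc32(body) != crc or seq != expected_seq:
        return "NAK", None   # sender is expected to retransmit
    return "ACK", body[2:]


good = frame(7, b"\xde\xad\xbe\xef")
# Flip one bit in the first payload byte to simulate a link error.
corrupted = good[:2] + bytes([good[2] ^ 0x01]) + good[3:]

print(receive(good, 7)[0])       # ACK
print(receive(corrupted, 7)[0])  # NAK
```

In this sketch the sender would keep each packet until it is acknowledged, so that a NAK can be answered by sending the same packet again.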

The transaction layer receives requests from the master device over a 32-bit interface, as 32-bit words called double words (DWs). It assembles them into packets containing the address and data and passes these to the data link layer. These packets are called transaction layer packets (TLPs). A TLP contains header and payload fields, as shown in Fig. 4.

Fig. 4 TLP packet structure

The detailed packet format is shown in Fig. 5.

Fig. 5 TLP Detailed packet format

Data transfers:

PCIe transactions consist of requests and completions. Depending on the destination, requests are classified into four types:

  • Memory write/read
  • IO write/read
  • Configuration
  • Message

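Depending on whether they require a completion, requests are further classified as posted or non-posted. A request is posted if it does not need a completion (memory writes and messages) and non-posted if it does (memory reads and all IO and configuration requests).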
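
The posted and non-posted split can be written down as a small lookup. The request names below are illustrative labels chosen for this sketch, not identifiers from the specification:

```python
# Posted requests are "fire and forget": no completion comes back.
POSTED = {"memory_write", "message"}

# Non-posted requests must be answered with a completion.
NON_POSTED = {
    "memory_read",
    "io_read",
    "io_write",
    "config_read",
    "config_write",
}


def needs_completion(request):
    """Return True if the requester must wait for a completion."""
    if request in POSTED:
        return False
    if request in NON_POSTED:
        return True
    raise ValueError(f"unknown request type: {request}")


for name in sorted(POSTED | NON_POSTED):
    kind = "non-posted" if needs_completion(name) else "posted"
    print(f"{name}: {kind}")
```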

Memory or IO Data write:

When the master device wants to write 32-bit data to a peripheral device, it initiates a write request on the PCIe link. The packet consists of a header, which is 3 or 4 DWs long (depending on whether 32-bit or 64-bit addressing is used), and one 32-bit DW of data to be written. Memory writes are posted, so they can be issued back to back in bursts. When write requests are non-posted and require a completion, throughput drops sharply because each write request must wait for its completion. IO and configuration transactions are single transactions. When the PCIe master requests a memory read, the PCIe device reads the data and returns it to the master in a completion with data (CplD) packet.
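
To make the "3 DW header plus one data DW" case concrete, here is a hedged Python sketch that packs a single-DW memory write with 32-bit addressing. The field positions follow the commonly documented TLP header layout and are an assumption here, since the figures are not reproduced in the text; the function name and example values are hypothetical.

```python
def mem_write32_tlp(requester_id, tag, address, data):
    """Build a 4-DW memory write TLP: 3 header DWs plus 1 data DW."""
    if address % 4:
        raise ValueError("address must be DW-aligned")
    if not 0 <= address < 1 << 32:
        raise ValueError("64-bit addresses need the 4 DW header")

    fmt = 0b010          # 3 DW header, with data
    tlp_type = 0b00000   # memory request
    length = 1           # payload length in DWs

    dw0 = (fmt << 29) | (tlp_type << 24) | length
    # Requester ID, tag, last-DW byte enables (0 for a single DW),
    # first-DW byte enables (all four bytes valid).
    dw1 = ((requester_id & 0xFFFF) << 16) | ((tag & 0xFF) << 8) | 0x0F
    dw2 = address        # address bits [31:2]; low two bits are zero
    return [dw0, dw1, dw2, data & 0xFFFFFFFF]


tlp = mem_write32_tlp(requester_id=0x0100, tag=0x01,
                      address=0xFEDC0000, data=0x12345678)
print(" ".join(f"{dw:08X}" for dw in tlp))
```

With 64-bit addressing the header would grow to 4 DWs, with the upper address bits in an extra DW.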

Terminologies associated with PCIe:

Other terminologies associated with PCIe topologies are the following:

  • Root Complex
  • Endpoint
  • PCIe-to-PCI bridge

Root Complex: It is the “root” of the inverted PCIe tree topology and acts on behalf of the CPU to communicate with the rest of the devices. It connects the system CPU to the PCIe topology and initiates configuration requests as the requester. Fig. 6 shows the position of the root complex in the PCIe topology.

Fig. 6 Root complex position in PCIe topology

Endpoint: According to the PCIe specification, a PCIe topology can contain up to 256 buses, 32 devices on each bus, and 8 functions in each device.

An endpoint can support up to 8 functions, and every function has its own configuration space. Each function can be an independent entity with its own functionality. PCIe-based NVM devices and PCIe-based SSDs are two examples of endpoint devices in a computer system.
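
The limits of 256 buses, 32 devices, and 8 functions correspond to 8, 5, and 3 bits, which together form a 16-bit bus/device/function (BDF) identifier. A minimal Python sketch, with illustrative function names:

```python
def encode_bdf(bus, device, function):
    """Pack bus (8 bits), device (5 bits), function (3 bits) into 16 bits."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (device << 3) | function


def decode_bdf(bdf):
    """Split a 16-bit identifier back into (bus, device, function)."""
    return bdf >> 8, (bdf >> 3) & 0x1F, bdf & 0x7


def format_bdf(bdf):
    """Render as bus:device.function in hexadecimal."""
    bus, device, function = decode_bdf(bdf)
    return f"{bus:02x}:{device:02x}.{function}"


bdf = encode_bdf(bus=3, device=0, function=1)
print(hex(bdf), format_bdf(bdf))
print("addressable functions:", 256 * 32 * 8)
```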

PCIe-to-PCI bridge: These are adapters that allow PCI devices to connect to PCIe slots by converting between the PCI and PCIe x1 protocols. The master sends requests with the necessary parameters to the bridge, which converts them into point-to-point transfers on the requested lane of the interface.

LTSSM: LTSSM stands for link training and status state machine, which manages the PCIe link between devices. It is the main control state machine that detects the link partner, polls it, configures the link, and recovers, resets, or disables the link at the right times during operation.
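
A heavily simplified view of the LTSSM can be sketched as a transition table in Python. The state names follow the usual LTSSM terminology (L0 is the normal operating state), but the event names are hypothetical labels invented for this example, and the low-power and loopback states are left out.

```python
from enum import Enum, auto


class State(Enum):
    DETECT = auto()
    POLLING = auto()
    CONFIGURATION = auto()
    L0 = auto()          # normal operation, link is up
    RECOVERY = auto()
    HOT_RESET = auto()
    DISABLED = auto()


# (current state, event) -> next state; event names are illustrative only.
TRANSITIONS = {
    (State.DETECT, "receiver_detected"): State.POLLING,
    (State.POLLING, "training_sets_exchanged"): State.CONFIGURATION,
    (State.POLLING, "timeout"): State.DETECT,
    (State.CONFIGURATION, "link_configured"): State.L0,
    (State.CONFIGURATION, "timeout"): State.DETECT,
    (State.L0, "link_error"): State.RECOVERY,
    (State.RECOVERY, "retrained"): State.L0,
    (State.RECOVERY, "timeout"): State.DETECT,
    (State.RECOVERY, "hot_reset"): State.HOT_RESET,
    (State.RECOVERY, "disable"): State.DISABLED,
    (State.HOT_RESET, "done"): State.DETECT,
    (State.DISABLED, "done"): State.DETECT,
}


def run(events, state=State.DETECT):
    """Apply events in order; unknown events leave the state unchanged."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


bring_up = ["receiver_detected", "training_sets_exchanged", "link_configured"]
print(run(bring_up).name)                                # L0
print(run(bring_up + ["link_error", "retrained"]).name)  # L0
print(run(bring_up + ["link_error", "timeout"]).name)    # DETECT
```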