How to identify CPU processor architecture on Linux

Multi-core processor architecture has become increasingly common. The trend is accelerated by the need to support multi-tenant hardware virtualization, high-performance computing applications, and Internet-scale workloads in data centers. As a server administrator or cloud architect, you need to know the CPU architecture of your servers so that server applications can take full advantage of the underlying hardware.

The trend of high core density hardware also guides the evolution of software development, introducing new types of parallel programming models. Multi-threaded applications developed under these models must be able to leverage parallel execution across different cores, multi-level cache, CPU/memory affinity, etc.

In this tutorial, I describe how to identify CPU processor architecture from the command line on Linux. A CPU processor architecture is characterized by the number of physical sockets/processors, the number of cores per processor, multi-level (L1/L2/L3) cache, NUMA (Non-uniform memory access) configuration, etc.
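
Before turning to dedicated tools, a rough count of logical CPUs and sockets can be read from /proc/cpuinfo. The following is a minimal sketch, assuming an x86 kernel that reports "processor" and "physical id" fields (other architectures may omit them). The function names count_logical_cpus and count_sockets are hypothetical helpers, not part of any package.

```shell
#!/bin/sh
# Hypothetical helpers: summarize cpuinfo-formatted text read from stdin.
# Assumes x86-style "processor" and "physical id" fields are present.

count_logical_cpus() {
    # One "processor" stanza exists per logical CPU (hardware thread).
    grep -c '^processor'
}

count_sockets() {
    # The number of distinct "physical id" values is the number of
    # populated sockets.
    grep '^physical id' | sort -u | wc -l | tr -d ' '
}

# Usage on a live system:
#   count_logical_cpus < /proc/cpuinfo
#   count_sockets < /proc/cpuinfo
```

This only gives counts. Cache and NUMA layout need the tools described below.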

Method One

likwid (Like I Knew What I’m Doing) is a suite of command-line tools designed to help developers build multi-threaded applications. likwid works with Linux kernel 2.6 and higher, and is regularly updated to support the latest generations of Intel/AMD processors. At the time of writing, it supports Intel Core2, Nehalem, Westmere and Sandy Bridge, as well as AMD K8, K10, and Bulldozer (Interlagos).

To build and install likwid on Linux from its source tarball (likwid-3.0.0 in this example):

$ tar xvfz likwid-3.0.0.tar.gz
$ cd likwid-3.0.0
$ make
$ sudo make install

likwid comes with several command-line tools:

  • likwid-topology: Display the NUMA and cache topology.
  • likwid-perfctr: Display the hardware performance counters of processors.
  • likwid-features: Display and change hardware prefetch control bits on Intel Core 2 processors.
  • likwid-pin: Pin a multi-threaded application to a specific CPU.
  • likwid-bench: Benchmarking tool for rapid prototyping of threaded assembly kernels.
  • likwid-mpirun: Script enabling CPU pinning of MPI and MPI/threaded hybrid applications.
  • likwid-perfscope: Frontend for likwid-perfctr which allows real-time plotting of performance metrics.
  • likwid-powermeter: Tool for accessing RAPL counters and querying Turbo mode steps on Intel processors.
  • likwid-memsweeper: Tool to clean up ccNUMA (cache-coherent NUMA) memory domains.

To visualize the CPU processor architecture:

$ likwid-topology -g
-------------------------------------------------------------
CPU type:    Intel Core Westmere processor
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets:    2
Cores per socket:    4
Threads per core:    2
-------------------------------------------------------------
HWThread    Thread        Core        Socket
0        0        0        0
1        0        0        1
2        0        10        0
3        0        10        1
4        0        1        0
5        0        1        1
6        0        9        0
7        0        9        1
8        1        0        0
9        1        0        1
10        1        10        0
11        1        10        1
12        1        1        0
13        1        1        1
14        1        9        0
15        1        9        1
-------------------------------------------------------------
Socket 0: ( 0 8 4 12 6 14 2 10 )
Socket 1: ( 1 9 5 13 7 15 3 11 )
-------------------------------------------------------------

*************************************************************
Cache Topology
*************************************************************
Level:    1
Size:    32 kB
Cache groups:    ( 0 8 ) ( 4 12 ) ( 6 14 ) ( 2 10 ) ( 1 9 ) ( 5 13 ) ( 7 15 ) ( 3 11 )
-------------------------------------------------------------
Level:    2
Size:    256 kB
Cache groups:    ( 0 8 ) ( 4 12 ) ( 6 14 ) ( 2 10 ) ( 1 9 ) ( 5 13 ) ( 7 15 ) ( 3 11 )
-------------------------------------------------------------
Level:    3
Size:    12 MB
Cache groups:    ( 0 8 4 12 6 14 2 10 ) ( 1 9 5 13 7 15 3 11 )
-------------------------------------------------------------

*************************************************************
NUMA Topology
*************************************************************
NUMA domains: 2
-------------------------------------------------------------
Domain 0:
Processors:  0 2 4 6 8 10 12 14
Relative distance to nodes:  10 20
Memory: 4207.48 MB free of total 8181.75 MB
-------------------------------------------------------------
Domain 1:
Processors:  1 3 5 7 9 11 13 15
Relative distance to nodes:  20 10
Memory: 4020.77 MB free of total 8192 MB
-------------------------------------------------------------

*************************************************************
Graphical:
*************************************************************
Socket 0:
+-----------------------------------------+
| +-------+ +-------+ +-------+ +-------+ |
| |  0  8 | | 4  12 | | 6  14 | | 2  10 | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| |  32kB | |  32kB | |  32kB | |  32kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| | 256kB | | 256kB | | 256kB | | 256kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------------------------------------+ |
| |                 12MB                | |
| +-------------------------------------+ |
+-----------------------------------------+
Socket 1:
+-----------------------------------------+
| +-------+ +-------+ +-------+ +-------+ |
| |  1  9 | | 5  13 | | 7  15 | | 3  11 | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| |  32kB | |  32kB | |  32kB | |  32kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| | 256kB | | 256kB | | 256kB | | 256kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------------------------------------+ |
| |                 12MB                | |
| +-------------------------------------+ |
+-----------------------------------------+

The above is example output from an HP ProLiant DL380 G7 server. It shows two physical sockets, each holding a quad-core CPU with Hyper-Threading enabled. Each core has 32kB of L1 cache and 256kB of L2 cache, and the four cores in a socket share 12MB of L3 cache.
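
The three summary lines at the top of the output are enough to work out the number of hardware threads. Here is a minimal sketch, assuming the "Sockets:", "Cores per socket:" and "Threads per core:" labels appear exactly as in the output above (other likwid versions may print them differently). The function name total_hw_threads is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read likwid-topology text from stdin and print
# sockets * cores-per-socket * threads-per-core.
# Assumes the three labels are formatted as in the sample output.
total_hw_threads() {
    awk -F':' '
        /^Sockets:/          { s = $2 + 0 }   # "+ 0" strips the padding
        /^Cores per socket:/ { c = $2 + 0 }
        /^Threads per core:/ { t = $2 + 0 }
        END                  { print s * c * t }
    '
}

# Usage on a live system:
#   likwid-topology | total_hw_threads
```

For the sample output, the product is 2 x 4 x 2 = 16, which matches HWThread IDs 0 through 15 in the thread table.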

Method Two

hwloc is a command-line suite that gathers various attributes of the underlying processor architecture, such as NUMA memory nodes, multi-level caches, processor sockets, processor cores, PCI devices/bridges, etc.

To install hwloc on Debian, Ubuntu or Linux Mint:

$ sudo apt-get install hwloc

To install hwloc on Fedora, CentOS or RHEL:

$ sudo yum install hwloc

Once the hwloc package is installed, you can use lstopo to show the processor architecture as follows.

$ lstopo --no-io

If you run lstopo in a Linux desktop environment, it pops up a window that visualizes the underlying processor architecture and cache hierarchy.

If lstopo is run in a server environment without a desktop, it prints the output in text format as follows.

Machine (16GB)
  NUMANode L#0 (P#0 8182MB) + Socket L#0 + L3 L#0 (12MB)
    L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
      PU L#2 (P#2)
      PU L#3 (P#10)
    L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
      PU L#4 (P#4)
      PU L#5 (P#12)
    L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
      PU L#6 (P#6)
      PU L#7 (P#14)
  NUMANode L#1 (P#1 8192MB) + Socket L#1 + L3 L#1 (12MB)
    L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4
      PU L#8 (P#1)
      PU L#9 (P#9)
    L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5
      PU L#10 (P#3)
      PU L#11 (P#11)
    L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6
      PU L#12 (P#5)
      PU L#13 (P#13)
    L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
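
The text output is easy to post-process. As a minimal sketch, the following counts the processing units (PUs) listed under each NUMA node. It assumes the layout shown above, where each "NUMANode" line precedes the "PU" lines that belong to it. Other hwloc versions may arrange or label the tree differently. The function name pus_per_numanode is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read lstopo text output from stdin and print
# "<node label> <PU count>" for each NUMA node, in order of appearance.
# Assumes each NUMANode line comes before the PU lines it contains.
pus_per_numanode() {
    awk '
        /NUMANode/ {
            node = $2            # logical label, e.g. "L#0"
            order[++n] = node
            count[node] = 0
        }
        /PU L#/ && node != "" { count[node]++ }
        END {
            for (i = 1; i <= n; i++) print order[i], count[order[i]]
        }
    '
}

# Usage on a live system:
#   lstopo --no-io | pus_per_numanode
```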

lstopo can also export the visualization to an image file if you specify an output file name as follows.

$ lstopo --no-io topo.png

Method Three

numactl is a command line tool for tuning NUMA hardware (such as pinning processes or threads to specific physical cores or ccNUMA nodes).

To install numactl on Debian, Ubuntu or Linux Mint:

$ sudo apt-get install numactl

To install numactl on Fedora, CentOS or RHEL:

$ sudo yum install numactl

If you want to check available NUMA nodes with numactl, do the following:

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 8181 MB
node 0 free: 4235 MB
node 1 cpus: 1 3 5 7 9 11 13 15
node 1 size: 8191 MB
node 1 free: 4048 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
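
The same output can drive a placement decision. As a minimal sketch, the helper below picks the NUMA node with the most free memory, assuming the "node N free: X MB" line format shown above. The function name freest_node and the ./my_app program are hypothetical placeholders. The --cpunodebind and --membind options are standard numactl options.

```shell
#!/bin/sh
# Hypothetical helper: read `numactl --hardware` output from stdin and
# print the ID of the node that reports the most free memory.
# Assumes lines of the form "node <id> free: <size> MB".
freest_node() {
    awk '
        /^node [0-9]+ free:/ {
            if (best == "" || $4 + 0 > max) { max = $4 + 0; best = $2 }
        }
        END { if (best != "") print best }
    '
}

# Usage on a live system (./my_app is a placeholder program):
#   node=$(numactl --hardware | freest_node)
#   numactl --cpunodebind="$node" --membind="$node" ./my_app
```

Binding both CPU and memory to one node keeps memory accesses local, avoiding the higher remote-node distance (20 versus 10 in the sample output).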

Dan Nanni is the founder and also a regular contributor of Xmodulo.com. He is a Linux/FOSS enthusiast who loves to get his hands dirty with his Linux box. He likes to procrastinate when he is supposed to be busy and productive. When he is otherwise free, he likes to watch movies and shop for the coolest gadgets.

11 thoughts on “How to identify CPU processor architecture on Linux”

  1. lscpu from util-linux/util-linux-ng is also nice. for example:

    # lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 16
    On-line CPU(s) list: 0-15
    Thread(s) per core: 2
    Core(s) per socket: 4
    Socket(s): 2
    NUMA node(s): 2
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 44
    Stepping: 2
    CPU MHz: 2533.410
    BogoMIPS: 5066.58
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 12288K
    NUMA node0 CPU(s): 0,2,4,6,8,10,12,14
    NUMA node1 CPU(s): 1,3,5,7,9,11,13,15

  2. Yes, this is more like a review of three pieces of software than an actual tutorial. Identifying your processor architecture may be necessary before you can even download and install third-party tools, so you need to understand how to do that without having to install them first.

      • It's common that you need the uname -p value to know whether to download a 32-bit or 64-bit version of software.

        It's rare indeed that you need to know number of cores or sockets. Most software that cares will ask the system itself.
