Building a PC: UEFI/BIOS settings + Disk Management + SSH Server Config + Installing Ubuntu OS, Nvidia GPU, Xilinx FPGA
With the increase in day-to-day usage of ML/AI-powered applications and other emerging workloads such as data analytics, graph processing, big data, and high-resolution games such as FIFA or GTA V, the need for a high-end PC has become very common, and in the long run owning a PC offers a better return than renting equivalent cloud infrastructure. This article describes the steps to assemble a PC from scratch and install high-end accelerators (a GPU and an FPGA). It also describes the numerous issues encountered during the process and their workarounds/fixes. As an example, to support every hardware component and make them compatible with each other and with the drivers/firmware, the OS had to be downgraded from Ubuntu 20.04 to the immediately previous version, and this process was repeated five times until the OS was compatible with the GPU and FPGA drivers. The major issue was that certain libraries (such as libboost-signals1.65.1) could not be installed using apt-get, so a different version of the OS had to be reinstalled, as these libraries come preinstalled with the OS.
The specs of the assembled PC are listed below:
- CPU: Intel 10th Gen Core i9 Extreme Edition (36 Threads)
- RAM: 128 (4 x 32) GB DDR4 4000 (PC4 32000)
- Asus ROG Strix X299-E Gaming II ATX LGA2066 Motherboard
- Samsung 970 EVO Plus 2 TB M.2 NVME
- Samsung 860 Evo 1 TB 2.5" Solid State Drive
- Seagate Barracuda Compute 2 TB 3.5" 7200RPM Internal Hard Drive
- Cooler Master Hyper 212 Evo CPU Cooler
- Cooler Master MasterBox TD500 Mesh w/ Controller ATX Mid Tower Case
- Corsair RM (2019) 850 W ATX Power Supply
- NVIDIA RTX 2080 Ti (11 GB GDDR6 memory)
- Xilinx Alveo Data Center Accelerator Card
Part 1: Assembling components
The Intel processor (LGA2066 socket) was placed in the motherboard and the cooler was screwed on top of the processor. The cooler stands like a tower, so it is important to ensure that the cabinet is wide enough to accommodate it. The fan was powered from the CPU PWM header, whose PWM profile is set from the UEFI/BIOS CPU fan control menu. The RAM DIMMs were inserted into the RAM slots in the order specified in the motherboard manual. The NVMe SSD was attached to the M.2 slot (insert the NVMe at a 30 degree angle with respect to the motherboard and then press it down parallel to the board). Generally, and on the motherboard used here, a small screw holds the NVMe parallel to the motherboard. The GPU and FPGA were then inserted into the PCIe slots and connected to the power supply (2 x 8-pin for the GPU and 8-pin for the FPGA). The HDMI output of the GPU was connected to the external display/monitor. The SSD and HDD were connected to the motherboard with SATA cables and powered from the power supply unit. The motherboard itself was connected to the power supply, and lastly a separate connection was made from the power supply to the motherboard for CPU power. The motherboard comes with a wireless antenna for Wi-Fi, which was hooked to the dual-band ports on the motherboard.
The motherboard used here has a 7-segment LED display that shows the state the boot process is currently at. For instance, after assembling the PC and powering it up, the motherboard showed a code which, per the motherboard manual, indicated that the RAM wasn't working properly; reseating the RAM DIMMs got past this error. These state codes are extremely helpful for debugging incorrect physical connections or unknown issues. When the state reaches "Boot into BIOS/OS", all the physical connections are correct, or at least the PC is booting into BIOS.
Part 2: UEFI/BIOS Setting and OS Installation
As far as possible, it's best to keep the default settings. It was verified that the CPU and RAM frequencies were as expected. Here you also have the option to change how the CPU drives the PWM output to the fan; ensure that the PWM profile is set so that the fan speed increases as the temperature rises. Secure Boot also needs attention (it can be turned off from the BIOS). UEFI Secure Boot is a verification mechanism for ensuring that code launched by the system is trusted; it forces the OS to ensure that all system-level drivers are authenticated/signed. However, on Ubuntu we often need third-party hardware and drivers, such as FPGAs and wireless cards, which are generally not signed. If Secure Boot is enabled, these drivers will not load during boot, and the user may not be able to use the hardware. Hence, as a workaround, we decided to turn off Secure Boot. We realized this late, so we turned it off using a process described later in the article. There might be ways to acquire signatures for the third-party drivers and authenticate them so that they work with Secure Boot, but this article does not explore that territory.
Meanwhile, the boot order was changed in the BIOS such that the PC boots from a bootable ubuntu flash drive after exiting BIOS.
Issue: After restarting, the PC did not boot from the flash drive; only a blank screen was displayed. After numerous tries, the root cause of the blank screen turned out to be related to the NVIDIA GPU driver. The solution is described below.
- At BIOS/UEFI screen, press ‘ESC’ to enter grub2 config, select “Ubuntu” and press “E”.
- Remove ‘quiet splash’ and append ‘nomodeset’ at the end of the line that starts with ‘linux’.
- Press F10 to boot.
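After the edit, the kernel line in the GRUB editor should look roughly like the following (the kernel version and root UUID are placeholders and will differ on your system):

```shell
# before: linux /boot/vmlinuz-... root=UUID=... ro quiet splash
# after:
linux /boot/vmlinuz-... root=UUID=... ro nomodeset
```

This change is one-time only; it applies to the current boot and is not persisted, which is why the GRUB configuration is edited permanently after installation.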
Install the OS in the NVME disk since booting from NVME is extremely quick. After Ubuntu OS is installed, configure the GRUB so that on next restart, the screen does not freeze.
1. sudo vi /etc/default/grub
2. Find the line starting with GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" and add nomodeset at the end, inside the quotes. Save and close the file.
3. Update GRUB using sudo update-grub2
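After the edit, the relevant line in /etc/default/grub reads:

```shell
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
```

Running sudo update-grub2 regenerates the boot configuration so the option takes effect on every subsequent boot.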
Now the screen won't freeze on restart. However, installing a proper NVIDIA driver fixes the freeze permanently. Moreover, the screen resolution was extremely poor as well; this was fixed by installing NVIDIA Linux x86_64 driver version >= 418.39 (alongside CUDA 10.1).
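As a sketch of one common install path (the package name below is from the graphics-drivers PPA added in Part 5; the exact version available may differ), a driver meeting the >= 418.39 requirement can be installed with:

```shell
# Show drivers recommended for the detected GPU
ubuntu-drivers devices
# Install a specific driver series from the PPA
sudo apt install nvidia-driver-418
# After a reboot, confirm the driver loaded
nvidia-smi
```

Once the proprietary driver is active, the nomodeset workaround is no longer needed and can be removed from the GRUB configuration.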
Ubuntu installation process is simple and hence is not described at depth in this article.
Part 3: Disk Partitioning + File System Mounting
The OS was installed on the NVMe. During the OS installation there is an option to partition the NVMe disk if desired; here it was left unpartitioned.
The process below describes detecting a new disk, partitioning it, creating a file system and mounting it.
1. Detecting the drive: The drive should be visible to the OS, and since everything in Ubuntu is represented as a file, the new disk should appear as a file in the /dev directory. Hence "ls /dev/sd*" (e.g. /dev/sda, /dev/sda1, /dev/sdb etc.) should list the different disks on the system.
2. Creating partitions: Partitions can be created in Ubuntu using the fdisk utility. To partition sda, type the command "sudo fdisk /dev/sda" and then "p" to view the current partitions. Since the device is new, it will not have any partitions; type "n" to create a new partition and then "p" to make it a primary partition. Here we create only one partition, so we set the first cylinder to 1 and the last cylinder to the full size of the disk (the defaults). Then type "w" to save the configuration. Now we have a partition at /dev/sda1.
3. Creating the file system: After creating partitions, we need a file system, which lets the OS store files/directories and look them up by storing each file as an inode (a vnode in the case of NFS or AFS). Use the mkfs.ext3 tool and provide a label and the partition name (sudo mkfs.ext3 -L /label /dev/sda1).
4. Mounting the file system: We need to mount the file system (here, at a directory under the root file system on the NVMe disk) so that it can be accessed by users. Make a directory (in this case /label) and then use sudo mount /dev/sda1 /label to mount the disk.
Type "mount" on the command line as a sanity check. The output should display the newly mounted file system and its mount point.
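The steps above can be sketched end to end as one command sequence (a sketch assuming a fresh disk at /dev/sda and the mount point /label; run with care, as fdisk is destructive). The /etc/fstab line at the end is an optional addition, not in the steps above, so the mount survives reboots:

```shell
# Feed fdisk the same keystrokes as the interactive session:
# n = new, p = primary, 1 = partition number, two defaults for first/last sector, w = write
printf 'n\np\n1\n\n\nw\n' | sudo fdisk /dev/sda

# Create an ext3 file system with the label used in the text
sudo mkfs.ext3 -L /label /dev/sda1

# Create the mount point and mount the new file system
sudo mkdir -p /label
sudo mount /dev/sda1 /label

# Sanity check: the new mount should appear in the output
mount | grep /label

# Optional: persist the mount across reboots via /etc/fstab
echo '/dev/sda1 /label ext3 defaults 0 2' | sudo tee -a /etc/fstab
```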
Part 4: Setting up SSH Server and Network Configuration
Enabling ssh server will allow the machine to listen on ssh port and accept remote connections. To setup ssh server, follow the below steps:
sudo apt update
sudo apt install openssh-server
Wait some time for the ssh server to start listening for incoming connections. To verify the server, type the sudo systemctl status ssh command and ensure that the server status is active (running).
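As a small addition beyond the steps above, the service can be enabled so that it starts on every boot, and the listening port can be checked directly:

```shell
# Start sshd automatically on every boot
sudo systemctl enable ssh
# Confirm something is listening on the default ssh port 22
ss -tln | grep ':22'
```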
In case you want to stop the ssh: sudo systemctl stop ssh
To start ssh: sudo systemctl start ssh
For ssh to work, you need to know the IP address of the machine, which can be found by typing "ifconfig" on the command line. However, the DHCP server might assign a different available IP address at every startup. Hence, it is advisable to check with the network provider (ISP) and get a static IP assigned for the system. Once the static IP is assigned, manually configure the IP, subnet mask, default gateway and DNS server in the wired/wireless settings. If the system is assigned a hostname, the DNS server should map that name to the static IP.
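On Ubuntu 18.04, the same static configuration can also be applied without the GUI via a netplan file; the file name, interface name and addresses below are hypothetical placeholders for illustration:

```yaml
# /etc/netplan/01-static.yaml (example values only)
network:
  version: 2
  ethernets:
    enp3s0:                      # replace with your interface name from ifconfig
      dhcp4: no
      addresses: [192.168.1.50/24]
      gateway4: 192.168.1.1
      nameservers:
        addresses: [8.8.8.8, 1.1.1.1]
```

Apply the configuration with sudo netplan apply.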
Part 5: Installing NVIDIA CUDA
With the rise in usage of ML/AI, and NVIDIA continuously improving their software stack, it has become extremely easy to install CUDA (refer to the NVIDIA website for the detailed official steps at https://developer.nvidia.com/cuda-10.1-download-archive-base). Following the steps below installs CUDA on the PC. Note that this is for Ubuntu 18.04 and CUDA 10.1 running on a 64-bit machine.
sudo apt update
sudo add-apt-repository ppa:graphics-drivers
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt-get update
sudo apt-get -o Dpkg::Options::="--force-overwrite" install cuda-10-1 cuda-drivers
Add the CUDA installation path to .bashrc and verify that CUDA was installed properly using "nvidia-smi" and "nvcc --version".
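The .bashrc addition typically looks like the following (the paths assume the default CUDA 10.1 install location under /usr/local):

```shell
# Append to ~/.bashrc, then run: source ~/.bashrc
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

The ${VAR:+:${VAR}} form appends the existing value only if it is non-empty, avoiding a stray trailing colon.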
cuDNN can be installed using “sudo apt install libcudnn7” and installation can be verified using
/sbin/ldconfig -N -v $(sed 's/:/ /' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep libcudnn
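The sed in the verification command above turns the colon-separated LD_LIBRARY_PATH into the space-separated directory list that ldconfig expects. Note that 's/:/ /' only replaces the first colon; if the variable holds more than two entries, the global flag g is needed:

```shell
# ':' -> ' ' on every occurrence, so each directory becomes a separate argument
paths="/usr/local/cuda-10.1/lib64:/usr/lib/x86_64-linux-gnu:/opt/lib"
echo "$paths" | sed 's/:/ /g'
# prints: /usr/local/cuda-10.1/lib64 /usr/lib/x86_64-linux-gnu /opt/lib
```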
Part 6: Installing Xilinx XRT and Deployment Target Platform
OS: Ubuntu 18.04 LTS, VITIS: 2020.2, XRT Version: 2019.1
Note that we tried other versions but failed because there was a mismatch between the firmware on the host and that on the FPGA card. After debugging, the versions mentioned above worked for the Alveo Data Center accelerator card. Also note that Secure Boot had to be disabled (refer to Part 7) to make the card work, per the solution provided in https://www.xilinx.com/support/answers/73454.html
Install Vitis (version 2020.2); Vitis installs Vivado as well. Download the runtime and target platform files from the official Xilinx website. All the steps below follow the official document (https://www.xilinx.com/support/documentation/boards_and_kits/accelerator-cards/2019_1/ug1301-getting-started-guide-alveo-accelerator-cards.pdf)
Verify that the card is properly attached to the PCIe slot and powered by running
sudo lspci -vd 10ee:
This command should show the Xilinx Device, its memory addresses and capabilities.
sudo apt-get install ocl-icd-libopencl1
sudo apt-get install opencl-headers
sudo apt-get install ocl-icd-opencl-dev
Install the runtime using sudo apt install ./<xrt_version>.deb (the leading ./ makes apt treat the argument as a local file).
Install the deployment shell by first creating a directory inside /opt/xilinx and then running
sudo apt install ./<xilinx-card>.deb
After the installation, the terminal will show a command to flash the FPGA
sudo /opt/xilinx/xrt/bin/xbutil flash -a
Perform a cold reboot of the system and then type
sudo /opt/xilinx/xrt/bin/xbutil flash scan
The output shows the card details, along with the shell/SC versions flashed on the card and those installed on the system. Ensure that the versions match. If they do not, follow https://www.xilinx.com/html_docs/accelerator_cards/alveo_doc_280/spv1560519139855.html to update the shell and XRT versions.
Finally, verify that the card is working by running the XRT validation utility, which performs some host-to-card and card-to-host transfers and reports a result for each test.
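Assuming the XRT install location used above, the validation step is the xbutil validate subcommand from the Xilinx getting-started guide:

```shell
# Runs built-in tests, including host-to-card and card-to-host DMA transfers
sudo /opt/xilinx/xrt/bin/xbutil validate
```

Each test should report a pass before the card is used for acceleration workloads.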
Part 7: Disabling Secure Boot
To disable secure boot, either turn it off from the UEFI menu or use the following command:
sudo mokutil --disable-validation
Reboot the system. Upon restart, a blue MOK management screen appears. Press "Enter" and then select "Change Secure Boot state". Enter the password and then select "Yes" to disable secure boot. Boot the system, open a terminal and type "sudo mokutil --sb-state". Check that the secure boot state is disabled.
Add multiple such machines connected to each other in an ad hoc network and configure a coordinator machine (with a public IP) which schedules tasks on them. Eureka… you have created a small data center!! 😝
Disclaimer: The intent of this article is to help people who are trying to assemble a computer and might come across unforeseen issues. This article might not have encapsulated all the errors but it is a result of one week of assembling and debugging to make the PC as well as the accelerators work. All the commands mentioned in this article were found in websites such as StackOverflow, Medium, NVIDIA, Xilinx, Ubuntu etc. A huge thanks to everyone (every website) who helped provide a fix to the issues faced during the process. Special thanks to Joon Kyung Kim for helping me with this.