Enable CAPI2.0 SNAP
Work on github

Snap is also a public Github repository. Create a "fork" (click the "Fork" button) on https://github.com/open-power/snap. Keep working on your own snap fork; when it works, submit a pull request to "open-power/snap" and request merging into the public upstream.

```
git clone https://github.com/[YOUR_USERNAME]/snap
```

capi2-bsp is a submodule of snap, listed in the ".gitmodules" file (a hidden file). Please point it to your own capi2-bsp fork, then run:

```
git submodule init
git submodule update
```

In any case, make sure that "hardware/capi2-bsp" is the one just generated in the last chapter.
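For illustration, after pointing the submodule at your fork, the relevant ".gitmodules" entry might look like the sketch below. This is an assumption for illustration only: the submodule name and path must match what is already in the file, and only the url line changes to your fork.

```
[submodule "hardware/capi2-bsp"]
	path = hardware/capi2-bsp
	url = https://github.com/[YOUR_USERNAME]/capi2-bsp
```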
SNAP structure

On the FPGA side, three parts need to be considered when moving to a new FPGA card: (a) the BSP, (b) snap_core, and (c) the DDR memory controller (mig). Some other components in SNAP also need to be updated for a new FPGA card.
Project hierarchy for SNAP
On CAPI2.0, module snap_core implements the data path with the DMA interface. The buffer interface is not used. The following picture shows the folders and files of the SNAP github repository.
Repository structure
All of the user-developed accelerators are in the "actions" directory, which already contains some examples. Each "action" has its own "sw", "hw", "tests", and other sub-directories. The hardware part uses "action_wrapper" as its top. The "software" directory includes libsnap, header files and some tools. The "hardware" directory is the main focus. "defconfig" has the config files for silent testing purposes, and "scripts" has the menu settings and other scripts.

How does SNAP work, and which files are used in each step?

make snap_config: the menu to select cards and other options is controlled by "script/Kconfig".

make model: this step creates a Vivado project. It first calls "hardware/setup/create_snap_ip.tcl" to generate the IP files in use, then calls "hardware/setup/create_framework.tcl" to build the project. "create_framework.tcl" does the following:

- It adds the BSP (board support package). In CAPI1.0 this is also called the PSL checkpoint file (b_route_design.dcp) or base_image; the path pointing to b_route_design.dcp is used to add it into the design. In CAPI2.0 it calls the make process in the capi2-bsp submodule to generate "capi_bsp_wrap" if it does not exist yet (this step is skipped if "capi_bsp_wrap" has already been generated). Then "create_framework.tcl" adds capi_bsp_wrap (the xcix or xci file) into the design.
- It adds the FPGA top files and the snap_core files (in "hardware/hdl/core").
- It adds the constraint files, in "hardware/setup/[FPGACARD]" or in "hardware/capi2-bsp/[FPGACARD]".
- It adds the user files (in "actions/[ACTION_NAME]/hw"). The user's action hardware uses a top file named "action_wrapper.vhd".
- It adds the simulation files (in "hardware/sim/core"), including the simulation top files and the simulation models. (If no_sim was selected in the snap_config menu, this step is skipped.)

After the above steps, "hardware/viv_project" has been created. Open it with the Vivado GUI and check the design hierarchy. Finally, the selected simulator is called to compile the simulation model.
make image: this step runs synthesis, implementation and bitstream generation. It calls "hardware/setup/snap_build.tcl" together with some related tcl scripts. In this step, "hardware/build" is created and filled with the output products: bit images, checkpoints (intermediate products for debugging) and reports (timing, clock, IO, utilization, etc.). If everything runs well and timing passes, the user gets the bitstream files (in the "build/Images" sub-directory) to program the FPGA card.
Modifications to snap git repositories

For a new FPGA card, the detailed items to update fall into these areas:

- Hardware: RTL, setup, simulation
- Software and tools
- Testing
- Publishing

The best way is to grep for keywords like "S241" or "AD8K5" in the directories and look for the locations that need modifications. A file ending with "_source", like "psl_fpga.vhd_source", is pre-processed to generate the output file without the "_source" suffix, like "psl_fpga.vhd". Such files contain #ifdef macros or comments like "-- only for NVME_USED=TRUE", which help to create a target VHDL/Verilog file for different configurations. The files to change are:

- snap_config and environmental files
- Hardware: psl_accel and psl_fpga (top) RTL files
- Hardware: tcl files for the workflow
- Hardware: xdc files for IO, floorplan, clock and bitstream settings
- Hardware: DDR memory controller IP (mig) in create_snap_ip.tcl, the DDR memory sim model, and other xdc files
- Hardware: create_ip, sim model and xdc files for other IPs
- Software: new card type, register definitions
- Testing: jenkins
- Readme and documents

Config files to change

| File name | Changes to do |
|---|---|
| scripts/Kconfig | Add the card to the Kconfig menu. Provide flash information (size/type/user address) |
| hardware/doc/SNAP-Registers.md | SNAP registers for new card - doc |
| hardware/setup/snap_config.sh | SNAP registers - setting |
RTL/xdc/tcl files to change

| File name | Changes to do |
|---|---|
| hardware/hdl/core/psl_accel_${FPGACARD}.vhd_source | specific to card |
| hardware/hdl/core/psl_accel_types.vhd_source | specific to card |
| hardware/hdl/core/psl_fpga_${FPGACARD}.vhd_source | specific to card |
| hardware/setup/${FPGACARD}/capi_bsp_pblock.xdc | specific to card |
| hardware/setup/${FPGACARD}/snap_${FPGACARD}.xdc | specific to card |
| hardware/setup/${FPGACARD}/snap_ddr4pins.xdc | specific to card |
| hardware/setup/build_mcs.tcl | declare card name |
| hardware/setup/create_framework.tcl | declare card name |
| hardware/setup/create_snap_ip.tcl | declare card name and the IPs in use |
| hardware/setup/flash_mcs.tcl | declare card name |
| hardware/setup/snap_bitstream_post.tcl | declare card name |
| hardware/setup/snap_bitstream_pre.tcl | declare card name |
| hardware/setup/snap_bitstream_step.tcl | declare card name |
| hardware/setup/snap_impl_step.tcl | declare card name |
| hardware/sim/ddr4_dimm_???.sv | DDR memory model for simulation. Find out how many DDR chips are connected together, the density and data width of each chip, and whether one chip is used for ECC (redundancy). Take an existing model as a template and modify it. |
| hardware/sim/top_capi?0.sv_source | Instantiate the DDR memory model |
| hardware/snap_check_psl (only for CAPI1.0) | declare card name |
Software files to change

| File name | Changes to do |
|---|---|
| software/lib/snap.c | declare card name |
| software/tools/snap_find_card | declare card name + SUBSYSTEM_ID |
| software/include/snap_regs.h | SNAP registers - setting |
Other files to change

| File name | Changes to do |
|---|---|
| actions/scripts/snap_jenkins.sh | jenkins tests (optional) |
| defconfig/${FPGACARD}*.defconfig | For silent jenkins testing (optional) |
| README.md | Announce that a new card is supported |
Update capi-utils

capi-utils is the third git repository that needs a few modifications. As before, fork it, make the modifications and submit a pull request.

```
git clone https://github.com/[YOUR_USERNAME]/capi-utils
```

There is only one file to modify: "psl-devices". Add a new line, for example:

```
0x1014 0x0665 U200 Xilinx 0x1002000 64 SPIx4
```

It lists the Subsystem Vendor ID, Subsystem Device ID, card name, FPGA chip, and then the "User_image_address" on the flash. For a SPI device, the block size is 64 bytes. "SPIx4" is the flash interface type; it may also be "DPIx16" or "SPIx8". "SPIx8" uses two bitstreams, so another starting address also needs to be provided, and the script "capi-flash-script" then needs two input bitstream files (primary and secondary) to program the flash.
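As a quick reminder of the field order, the example entry above can be split in a shell. This is only an illustration of the columns; the entry itself is the sample line from above, not a requirement:

```shell
# Fields of a psl-devices entry, in order:
# subsystem vendor ID, subsystem device ID, card name, FPGA vendor,
# user image address on flash, flash block size, flash interface type.
entry='0x1014 0x0665 U200 Xilinx 0x1002000 64 SPIx4'
set -- $entry
echo "card=$3 user_image_addr=$5 block_size=$6 flash_type=$7"
# → card=U200 user_image_addr=0x1002000 block_size=64 flash_type=SPIx4
```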
Strategy to enable a new card

To enable a new card on SNAP, complete the following tasks one by one.
Stage 1: Verify PCIe interface

1. Generate capi_bsp_wrap in capi2-bsp.
2. Make the modifications to the snap git repository described above.
3. Select an action example without DDR, for example hls_helloworld. Go through the make model and make image processes and build the bitstream files.
4. Plug the card into the Power9 server and connect a JTAG/USB cable to a laptop. Install Vivado Lab on this laptop (it requires a Windows or Linux operating system).
5. Start the Vivado Lab tool and open the Hardware Manager. Power on the server. Soon the FPGA target is recognized by the Vivado Lab tool.
6. Program the generated bitstream files (bin or mcs) to the card: in the Vivado Lab tool, select the FPGA chip, right-click, choose "Add Configuration Memory Device..." and program the bin or mcs files to the flash. See the pictures below and wait for it to finish (this may take about 10 minutes).
7. Unplug the JTAG/USB cable and reboot the server.
8. After the server has booted, log into the OS and run lspci to check whether the card is there (usually with device ID 0x0477). Then download snap, capi-utils and libcxl (from github). Go to the snap directory, make apps and run the application.

There is an alternative to steps 6 to 8, called "fast program bit-file when power on". Prepare the bit file on the laptop in advance; unlike the bin/mcs files, which are for the flash, the bit file programs the FPGA chip directly. When the server is powered on and Vivado Lab sees the FPGA, immediately right-click the device, choose "Program device..." and select the bit file. This only takes about 10 seconds and can be done before skiboot on the server starts to scan PCIe devices. Be aware that only the FPGA chip is programmed this way (the flash memory is still empty or holds old data), so the recent programming of the FPGA chip is lost when the server is powered off or rebooted.
Vivado Lab Edition
Add Configuration Memory Device and Program the flash
When installing Vivado Lab, please choose the same version as the Vivado tool that was used to build the images.

Tips for debugging: seeing 0477 in lspci is the most important milestone. If it does not show up, check the file "/sys/firmware/opal/msglog" for link-training-failed messages. A successful log looks like this, which means the PCIe device has been scanned and recognized. The number following "PHB#" is the PCIe device identifier in the format "domain:bus:slot.func":

```
[   63.403485191,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=CPU1 Slot2 (16x)
[   63.403572553,5] PHB#0000:01:00.0 [EP  ] 1014 0477 R:02 C:1200ff (  device) LOC_CODE=CPU1 Slot2 (16x)
```

Then check dmesg: run "dmesg > dmesg.log" and search for "cxl" in the "dmesg.log" file. A normal output looks like this:

```
[    9.301403] cxl-pci 0000:01:00.0: Device uses a PSL9
[    9.301523] cxl-pci 0000:01:00.0: enabling device (0140 -> 0142)
[    9.303327] cxl-pci 0000:01:00.0: PCI host bridge to bus 0006:00
[    9.306749] cxl afu0.0: Activating AFU directed mode
```

Today most Linux kernel versions already include the cxl module. Double-check it with:

```
modinfo cxl
```

If the PCIe device has been recognized as CAPI, run "ls /dev/cxl" and the "afu*" devices should be there. Application software can then open the device like an ordinary file.
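Several of the later commands need the "domain:bus:slot.func" identifier. As a small sketch, it can be extracted from an opal msglog line with sed; the sample line here is the endpoint line shown above:

```shell
# Extract the PCIe identifier (domain:bus:slot.func) that follows "PHB#".
line='[   63.403572553,5] PHB#0000:01:00.0 [EP  ] 1014 0477 R:02 C:1200ff (  device) LOC_CODE=CPU1 Slot2 (16x)'
id=$(printf '%s\n' "$line" | sed -n 's/.*PHB#\([0-9a-f:.]*\).*/\1/p')
echo "$id"   # → 0000:01:00.0
```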
```
ls /dev/cxl
afu0.0m  afu0.0s
```

Some other useful commands check the PCIe config (with the right PCIe identifier "domain:bus:slot.func"):

```
sudo lspci -s 0000:01:00.0 -vvv
```

This shows the settings coded in the Xilinx PCIe core, like the Subsystem Device ID:

```
0000:01:00.0 Processing accelerators: IBM Device 0477 (rev 02) (prog-if ff)
        Subsystem: IBM Device 0660
```

the link speed:

```
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
```

the Vital Product Data, which was coded in capi_vsec.vhdl:

```
Capabilities: [b0] Vital Product Data
        Product Name: U200 PCIe CAPI2 Adapter
        Read-only fields:
                [PN] Part number: Xilinx.U200
                [V1] Vendor specific: 0000000000000000
                [V2] Vendor specific: 0000000000000000
                [V3] Vendor specific: 0000000000000000
                [V4] Vendor specific: 0000000000000000
                [RV] Reserved: checksum good, 3 byte(s) reserved
        End
```

and the VSEC and kernel module:

```
Capabilities: [400 v1] Vendor Specific Information: ID=1280 Rev=0 Len=080 <?>
Kernel driver in use: cxl-pci
Kernel modules: cxl
```

If nothing shows up from "ls /dev/cxl", check the PCIe config space:

```
sudo hexdump /sys/bus/pci/devices/0000\:00\:00.1/config
```

Please pick the correct PCIe device identifier (here 0000:00:00.1). Make sure the VSEC is properly linked; if not, go back and check "capi_vsec.vhdl".
```
0000000 1014 0477 0146 0010 ff02 1200 0000 0000
0000010 000c 0000 2200 0006 000c 1000 2200 0006
0000020 000c 0000 0000 0002 0000 0000 1014 0668
0000030 0000 0000 0040 0000 0000 0000 00ff 0000
0000040 4801 0003 0008 0000 7005 0080 0000 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
0000060 7011 0000 0000 0000 0000 0000 0000 0000
0000070 b010 0002 8022 0000 2950 0000 f103 0043
0000080 0000 1103 0000 0000 0000 0000 0000 0000
0000090 0000 0000 0016 0000 0010 0000 000e 0000
00000a0 0003 001e 0000 0000 0000 0000 0000 0000
00000b0 0003 0000 2082 5300 0000 0000 0000 0000
00000c0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000100 0001 1c01 0000 0000 0000 0044 2030 0046  --> next_ptr: 1c0
0000110 0000 0000 e000 0000 0000 0000 0000 0000
0000120 0000 0000 0000 0000 0000 0000 0000 0000
*
00001c0 0019 1f01 0000 0000 0000 0000 0000 0000  --> next_ptr: 1f0
00001d0 0000 0000 0000 0000 0000 0000 0000 0000
*
00001f0 0002 e801 0000 0000 0000 3100 0000 0000  --> e80 (or 400) points to VSEC
0000200 0000 0000 00ff 8000 0000 0000 0000 0000
0000210 0000 0000 0000 0000 0000 0000 0000 0000
*
0000e80 000b 0001 1280 0800 0801 0021 0006 0200  --> VSEC starts from e80 (or 400)
0000e90 0000 b000 0000 0000 0000 0000 0000 0000
0000ea0 0100 0000 0040 0000 0200 0000 0400 0000
0000eb0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000ed0 0000 0000 0000 0000 0000 8000 0000 0000
0000ee0 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000
```
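The "next_ptr" annotations in the dump can be checked mechanically. Each PCIe extended capability starts with a 32-bit header whose top 12 bits hold the offset of the next capability; in the little-endian 16-bit words printed by hexdump, that is simply the second word shifted right by 4 bits. The sketch below (walk_caps is a hypothetical helper, not part of capi-utils) walks a saved dump file in that format:

```shell
# Walk the PCIe extended capability chain in a saved "hexdump .../config" dump.
# At each capability offset, word 1 is the capability ID and word 2 holds
# version/next pointer: next_offset = word2 >> 4. The chain must reach the
# VSEC (extended capability ID 0x000b) for the cxl driver to find it.
walk_caps() {
    dump="$1"
    off=$((0x100))                      # extended capabilities start at 0x100
    while [ "$off" -ne 0 ]; do
        line=$(grep "^0*$(printf '%x' "$off") " "$dump" | head -n 1)
        set -- $line                    # $1=offset $2=word1 $3=word2 ...
        printf 'offset %03x: cap id %s, next %03x\n' "$off" "$2" "$((0x$3 >> 4))"
        off=$((0x$3 >> 4))
    done
}
```

For the dump above this prints the chain 100 → 1c0 → 1f0 → e80 and then stops, confirming that the VSEC (cap id 000b) is reachable from the capability list.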
Stage 2: Verify Flash interface

Use capi-utils to program the bitstream files. If this succeeds, it proves that the flash interface has been configured correctly. After this step the JTAG connector is not needed anymore; use "capi-flash-script" to program the FPGA bitstreams.

The mechanism behind "capi-flash-script" is: there is a flash controller on the FPGA (in capi_bsp_wrap) that connects to the PCIe config space. The flash controller exposes four VSEC registers that allow the host system to control it:

- Flash Address Register
- Flash Size Register
- Flash Status/Control Register
- Flash Data Port

The details are described in the Coherent Accelerator Interface Architecture, Chapter 12.3, "CAIA Vendor-Specific Extended Capability Structure". The C program in capi-utils reads the FPGA bitstream "bin" file and writes its data to the VSEC "Flash Data Port" register, so the bytes are sent through PCIe to the flash controller and finally arrive in the flash memory on the card.
Stage 3: Verify DDR interface

Select another action example (hdl_example with DDR, or hls_memcopy). Run make model and make sim, and make sure the DDR simulation model works well. Run make image to generate the bitstream files, use capi-utils to program the bitstream "bin" file to the card, and run the application to see whether it works.

SNAP basically implements only one DDR bank (or channel), while most cards have two to four banks (N250S+ is one of the rare cards with only one DDR bank). To implement more DDR channels, depending on the user's needs, there are two options: the first is to extend the size of the first bank by attaching the second bank to the same DDR memory controller; the other is to use two (or more) memory controllers in parallel for higher throughput. The latter option means duplicating the DDR memory controller, which takes twice the area in the design. In this case the action_wrapper also needs changes to add the additional DDR ports, and for an HLS design another HLS DDR port should be added to "actions/[YOUR_ACTION]/hw/XXX.CPP". As this is an open-source project, everyone is welcome to contribute by implementing this and adding it to the SNAP design.
Stage 4: Verify other IO interfaces

This step depends on the card's capabilities and the specific IOs the card provides. As with the second (or further) DDR channels, users can freely add their own code for other IO interfaces.
Stage 5: Performance Validation

Check the results of "snap/actions/hls_memcopy/tests/test_*_throughput.sh" for bandwidth and "snap/actions/hls_latency_eval/test/test*.sh" for latency.
Stage 6: Pressure Test

Prepare bitstream files for basic tests, throughput tests, latency tests and max-power tests. Add image flashing tests, card reset tests and others. Run them intensively.
Cleanup and submit

Now a new FPGA card has been enabled for CAPI2.0 SNAP. Clean up your workspace, check the files, and submit your work!

- Submit the pull request for your capi2-bsp fork before the one for your snap fork. Assign a reviewer and wait for capi2-bsp to be merged into the https://github.com/open-power/capi2-bsp master branch.
- Update the submodule pointer to the latest "open-power/capi2-bsp" master and then submit the pull request for your forked snap.
- capi-utils is independent: just create a pull request and assign a reviewer. It can only be merged into the master branch after it has been reviewed.