---
title: "Oregon State University Provides Power9 GPU Resources"
date: "2018-10-19"
categories:
- "blogs"
tags:
- "featured"
---
By: Chris Sullivan, assistant director for biocomputing, Oregon State University Center for Genome Research and Biocomputing
The Oregon State University Open Source Lab (OSUOSL) and Center for Genome Research and Biocomputing (CGRB) are excited to now provide access to POWER9 _AC922 Newell_ Systems (8335-GTG).
The AC922 is the newest of IBM's AI-focused servers, used by many of the Oregon State research groups to overcome limits when processing large data sets. To ensure developers can take full advantage of these exciting new machines, we are allowing free access to several of these AC922 setups. We believe these new machines significantly change the way we can address limits in scope and remove bias in the work we currently do. The only limit we see is that not all of the great open source tools available on other platforms have been ported yet; providing developers with access can help overcome that problem.
The systems accessible to developers are set up with two processor sockets with 20 cores each at 3.0 GHz (160 threads in total), four Tesla V100 GPUs with NVLink, 1TB of system memory, two 1.6TB CAPI-enabled NVMe SSD controllers, and 40G network cards. These are the standard setups we look at for processing data, as the high thread count on the CPU side allows us to process quickly, along with the ability to do massive deep-learning and AI processing.
## **Using GPUs to Classify Oceans of Data**
For example, we currently take video from various locations in the ocean and process that data to identify all plankton to help [manage ocean health](https://developer.ibm.com/linuxonpower/2018/09/10/using-gpus-classify-oceans-data/). These AC922 machines can do all of the video processing with FFmpeg, threaded on the CPU side, to generate images, and then send the data directly to the GPUs over NVLink, where a Convolutional Neural Network (CNN) identifies the plankton.
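
As a rough illustration of the CPU-side step, the sketch below builds an FFmpeg command that decodes a video with a chosen thread count and writes sampled frames out as images. The file names, the sampling rate, and the helper `build_ffmpeg_cmd` are hypothetical; the exact options used in our pipeline are not covered in this post.

```python
import shlex


def build_ffmpeg_cmd(video_path, out_dir, fps=1, threads=40):
    """Return an FFmpeg argument list that extracts frames as PNG images.

    Hypothetical helper: the defaults are illustrative only and are not
    the settings used by the pipeline described in this post.
    """
    return [
        "ffmpeg",
        "-threads", str(threads),      # decoder threads for the input that follows
        "-i", video_path,              # input video file
        "-vf", f"fps={fps}",           # keep `fps` frames per second of video
        f"{out_dir}/frame_%06d.png",   # numbered output images
    ]


# Example file names are made up for illustration.
cmd = build_ffmpeg_cmd("plankton_dive.mp4", "frames", fps=2, threads=40)
print(shlex.join(cmd))
```

On a machine with FFmpeg installed, the list can be passed to `subprocess.run(cmd, check=True)`; the resulting images would then be batched and handed to the CNN on the GPUs.
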
This is only one example where we can treat this machine as a cluster in a box and do all the work starting with video files and ending with CSV output with counts. We have found that the higher the threading the better the return when using the Power9 (as well as the Power8) processors.
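
To get a feel for how a workload scales with the number of workers on a given machine, a small timing harness along these lines can help. This is a minimal sketch, not the benchmark behind the numbers in this post: the trial-division `factorize`, the helper `time_factorize`, and the list of numbers are all assumptions made for illustration.

```python
import time
from multiprocessing import Pool


def factorize(n):
    """Return the prime factors of n (with repeats) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def time_factorize(numbers, processes):
    """Factorize all numbers with a pool of worker processes.

    Returns (elapsed_seconds, list_of_factor_lists).
    """
    start = time.perf_counter()
    with Pool(processes=processes) as pool:
        results = pool.map(factorize, numbers)
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # Illustrative workload only; adjust the size to suit the machine.
    numbers = [10**10 + k for k in range(1, 33)]
    for procs in (1, 2, 8, 16, 32):
        seconds, _ = time_factorize(numbers, procs)
        print(f"{procs:>2} processes: {seconds:.3f} s")
```

Running the same script on different processors and comparing the times per process count gives a quick picture of where adding workers stops paying off.
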
Below is a list of processors we have available to test and some quick numbers showing the benefits of threading on these machines.
<table width="488"><tbody>
<tr><td width="108">&nbsp;</td><td colspan="2" width="127"><strong>EPYC 7601 32-Core 64 thread</strong></td><td colspan="2" width="125"><strong>Xeon E5-2620 8 core 16 thread</strong></td><td colspan="2" width="126"><strong>POWER9 20 core 40 thread</strong></td></tr>
<tr><td width="108">&nbsp;</td><td width="53">1200</td><td width="74">MHz</td><td width="52">3400</td><td width="73">MHz</td><td width="52">2016</td><td width="73">MHz</td></tr>
<tr><td width="108">&nbsp;</td><td width="53">seconds</td><td width="74"><strong>s * MHz</strong></td><td width="52">seconds</td><td width="73"><strong>s * MHz</strong></td><td width="52">seconds</td><td width="73"><strong>s * MHz</strong></td></tr>
<tr><td width="108"><strong>Fibonacci</strong></td><td width="53">76.4435</td><td width="74"><strong>91732.2000</strong></td><td width="52">53.8354</td><td width="73"><strong>183040.3600</strong></td><td width="52">47.7507</td><td width="73"><strong>96265.4112</strong></td></tr>
<tr><td width="108"><strong>Pi</strong></td><td width="53">154.2242</td><td width="74"><strong>185069.0400</strong></td><td width="52">105.5235</td><td width="73"><strong>358779.9000</strong></td><td width="52">129.1436</td><td width="73"><strong>260353.4976</strong></td></tr>
<tr><td width="108"><strong>Float math</strong></td><td width="53">41.2044</td><td width="74"><strong>49445.2800</strong></td><td width="52">34.5253</td><td width="73"><strong>117386.0200</strong></td><td width="52">47.7137</td><td width="73"><strong>96190.8192</strong></td></tr>
<tr><td width="108"><strong>Factorize 1 process</strong></td><td width="53">69.0709</td><td width="74"><strong>82885.0800</strong></td><td width="52">58.8655</td><td width="73"><strong>200142.7000</strong></td><td width="52">71.8679</td><td width="73"><strong>144885.6864</strong></td></tr>
<tr><td width="108"><strong>Factorize 2 process</strong></td><td width="53">71.9220</td><td width="74"><strong>86306.4000</strong></td><td width="52">48.7508</td><td width="73"><strong>165752.7200</strong></td><td width="52">52.2643</td><td width="73"><strong>105364.8288</strong></td></tr>
<tr><td width="108"><strong>Factorize 8 process</strong></td><td width="53">22.2354</td><td width="74"><strong>26682.4800</strong></td><td width="52">18.2673</td><td width="73"><strong>62108.8200</strong></td><td width="52">15.2357</td><td width="73"><strong>30715.1712</strong></td></tr>
<tr><td width="108"><strong>Factorize 16 process</strong></td><td width="53">16.4457</td><td width="74"><strong>19734.8400</strong></td><td width="52">15.1000</td><td width="73"><strong>51340.0000</strong></td><td width="52">11.3186</td><td width="73"><strong>22818.2976</strong></td></tr>
<tr><td width="108"><strong>Factorize 32 process</strong></td><td width="53">23.9592</td><td width="74"><strong>28751.0400</strong></td><td width="52">23.7475</td><td width="73"><strong>80741.5000</strong></td><td width="52">11.9565</td><td width="73"><strong>24104.3040</strong></td></tr>
<tr><td width="108"><strong>Factorize 36 process</strong></td><td width="53">24.2955</td><td width="74"><strong>29154.6000</strong></td><td width="52">25.7965</td><td width="73"><strong>87708.1000</strong></td><td width="52">11.6990</td><td width="73"><strong>23585.1840</strong></td></tr>
</tbody></table>

**Table 1:** Processing time for different calculations on each processor, in seconds and normalized by clock speed (s * MHz). The big return on this hardware is the threading: the single-process results are mixed, but as the process count rises the Power9 pulls ahead, and at 32 and 36 processes it finishes the factorization test about 2 times faster than either x86 processor. Many groups have achieved returns on the order of 4 times greater when running against the most current x86-based machines.
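
The s * MHz column is simply the measured time multiplied by the clock speed listed for each processor, which gives a rough clock-normalized figure. The short sketch below reproduces that arithmetic and the speedup ratios for a few rows copied from Table 1; the helper names `clock_normalized` and `speedup_over` are ours for illustration.

```python
# Clock speeds (MHz) and wall-clock times (seconds) copied from Table 1.
CLOCK_MHZ = {"EPYC 7601": 1200, "Xeon E5-2620": 3400, "POWER9": 2016}
SECONDS = {
    "Fibonacci": {"EPYC 7601": 76.4435, "Xeon E5-2620": 53.8354, "POWER9": 47.7507},
    "Factorize 32 process": {"EPYC 7601": 23.9592, "Xeon E5-2620": 23.7475, "POWER9": 11.9565},
    "Factorize 36 process": {"EPYC 7601": 24.2955, "Xeon E5-2620": 25.7965, "POWER9": 11.6990},
}


def clock_normalized(seconds, mhz):
    """Return the s * MHz figure used in the table."""
    return seconds * mhz


def speedup_over(baseline_seconds, seconds):
    """Return how many times faster `seconds` is than `baseline_seconds`."""
    return baseline_seconds / seconds


for test, times in SECONDS.items():
    for cpu, secs in times.items():
        normalized = clock_normalized(secs, CLOCK_MHZ[cpu])
        ratio = speedup_over(secs, times["POWER9"])
        print(f"{test:<22} {cpu:<13} {secs:>9.4f} s  {normalized:>12.4f} s*MHz  "
              f"POWER9 is {ratio:.2f}x this time")
```

A ratio above 1.00 in the last column means the Power9 finished that test faster than the processor on that line.
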
The CGRB is focused on working with processor companies that are changing the threading on CPUs and bringing GPUs into play, such as IBM with the new AC922. Right now, for workloads that take months to complete on x86 boxes, we are working with developers to move tools to Power9 so we can take advantage of these returns. Because the value of these machines is centered on threading and AI, we invite developers to come and get free access to a few Power9 and other Power8 machines to port tools and optimize performance.
To get access, simply sign up for an account at the link below and we will get back to you.

**OSUOSL GPU Access:** [https://osuosl.org/services/powerdev/request\_gpu/](https://osuosl.org/services/powerdev/request_gpu/)

**AC922 Hardware:** [https://www.ibm.com/us-en/marketplace/power-systems-ac922](https://www.ibm.com/us-en/marketplace/power-systems-ac922)