The University Research Facility in Big Data Analytics (UBDA) is the first university-level research facility in big data analytics in Hong Kong. It supports cross-disciplinary research collaboration, teaching, and learning, as well as partnerships with industry.
UBDA provides a dedicated, secure, and scalable 24/7 big data platform for building analytic solutions.
UBDA offers several benefits, including:
- GPU- and CPU-accelerated computing platform
- Consultancy platform for formulating big data research problems
- Opportunity to have joint labs, projects, and sponsorships
UBDA is a big data platform for storing and analyzing your data in order to find hidden patterns, explore unknown correlations, improve predictions, support decision making, recommend services and products, and build other analytic solutions.
The main objective of establishing UBDA is to meet the increasing demand for computing resources and expertise in big data analytics. It has significant value in promoting open innovation in all aspects of human, social, and technological development.
UBDA offers an advanced infrastructure that includes a computing platform, a data repository, and data analytics tools and libraries. It also provides a platform for cross-disciplinary collaboration among PolyU researchers and external partners to develop, support, service, and sustain research into big data analytics.
The main features of the UBDA platform are as follows:
- Run programs on multiple CPU cores across multiple computing nodes.
  - Users can run their own applications, written in languages such as C, Fortran, Python, or R, on the UBDA platform across multiple CPU cores and computing nodes via MPI implementations such as MPICH.
- Perform data analytics and machine learning with nVidia GPU support.
  - Machines with multiple GPU cards are available for data analytics and machine learning with tools such as TensorFlow, with nVidia GPU support.
- Perform data analytics with big data solutions such as Hadoop and Spark.
  - Users can perform big data analytics with the provided Apache Hadoop and Spark installation, backed by multiple computing nodes.
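To illustrate the first feature, the following is a minimal sketch of how a Python program might divide work across MPI processes. It is not UBDA's own code: the helper names `block_range` and `partial_sum` are hypothetical, and the script assumes the `mpi4py` package is available in the user's environment (the source does not say whether it is installed). Without `mpi4py` the script falls back to a single process.

```python
def block_range(n_items, rank, size):
    """Return the half-open [start, stop) range of items owned by `rank`.

    Items are split as evenly as possible; the first `n_items % size`
    ranks receive one extra item each.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partial_sum(values, rank, size):
    """Sum only the slice of `values` that belongs to `rank`."""
    start, stop = block_range(len(values), rank, size)
    return sum(values[start:stop])


def main():
    try:
        # Assumption: mpi4py is installed; it is not named in the source.
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    except ImportError:
        MPI, comm, rank, size = None, None, 0, 1

    values = list(range(1, 101))
    local = partial_sum(values, rank, size)

    # Combine the per-process results on rank 0.
    total = comm.reduce(local, op=MPI.SUM, root=0) if comm else local
    if rank == 0:
        print("total:", total)


if __name__ == "__main__":
    main()
```

Under an MPI launcher, a script like this would typically be started with something like `mpirun -n 4 python script.py`; the exact launch command and any job scheduler wrapper depend on the site configuration.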
Layered Architecture of UBDA Platform
The UBDA platform has five layers: the Storage Layer, Network Layer, Computing Cluster Layer, Application Layer, and Service Layer.
Each layer is described below.
The bottom layer is the Storage Layer, which consists of a storage system of over 400 TB with parallel file systems, allowing researchers to store and process their research data in a reliable environment.
The Network Layer contains two types of network: InfiniBand and Ethernet. The InfiniBand network is a high-speed 100 Gb/s (EDR) network used mainly for the internal interconnection of computing nodes, providing low-latency, non-blocking data transfer to support big data analysis. The Ethernet network is mainly for external connections to the campus network and the public internet.
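As a rough guide to what a 100 Gb/s link means in practice, the sketch below estimates the minimum time to move a dataset between nodes. The function name `transfer_seconds` and the `efficiency` parameter are illustrative assumptions; real throughput depends on protocol overhead, storage speed, and contention, none of which are specified in the source.

```python
def transfer_seconds(size_bytes, link_gbps=100.0, efficiency=1.0):
    """Estimate the time to move `size_bytes` over a link.

    link_gbps  -- nominal link rate in gigabits per second (10**9 bits/s)
    efficiency -- assumed fraction of the nominal rate actually achieved
                  (1.0 is the theoretical best case)
    """
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


one_terabyte = 1e12  # decimal terabyte, in bytes
print(transfer_seconds(one_terabyte))                  # best case at line rate
print(transfer_seconds(one_terabyte, efficiency=0.8))  # assumed 80% efficiency
```

At the nominal line rate, 1 TB cannot be moved in less than 80 seconds; any lower assumed efficiency lengthens that proportionally.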
Computing Cluster Layer
The Computing Cluster Layer consists of a pool of hardware server nodes of various types, including general CPU nodes, MIC (Many Integrated Core) nodes, and GPU nodes. These can be configured to form the clusters required for different kinds of big data analytics projects, supporting both data-intensive and computation-intensive processing tasks. The planned initial capacity is around 123 TFLOPS.
The Application Layer provides modeling and programming support for developing applications in different areas. It is composed of domain-specific models, languages, and algorithms, some of which are provided as software tools and libraries. Researchers can also install their own software to support their research.
The Service Layer is a common management layer that provides the interface for accessing and using the underlying big data facility. It allows users to log in to the UBDA system to manage their profiles, access their allocated resources, install and configure their applications, and manage their jobs through the job scheduler.
The UBDA platform's infrastructure is as follows:

| Component | Specification |
|---|---|
| Storage | Over 400 TB parallel file system supporting big data analytics and compute-intensive workloads |
| Network | High-speed 100G internal network with InfiniBand EDR |
| CPU | Dual-socket and quad-socket computing nodes with 1592 CPU cores and over 9 TB of memory |
| GPU | Computing nodes with nVidia P100 GPUs, over 86000 CUDA cores |
| MIC | Intel Xeon Phi computing nodes with 136 CPU cores |
| Big Data Analytics | Apache Hadoop and Spark; Machine Learning/AI: CUDA, TensorFlow with GPU support |
| Programming/Scripting tools | Intel compilers in "Intel Parallel Studio XE Cluster Edition", GNU C, C++, Fortran, Perl, Python, R |
| MPI Support | Intel MPI, OpenMPI, and MPICH2 |
| Others | OpenFOAM, ANSYS (Fluent) |
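The headline figures above can be turned into per-unit estimates with a little arithmetic. The sketch below is illustrative only: it assumes 3584 CUDA cores per P100 (NVIDIA's published figure for the Tesla P100, not stated in the source) and decimal units (1 TB = 1000 GB). The function names are hypothetical, and the results are lower bounds because the source gives "over" values.

```python
P100_CUDA_CORES = 3584  # assumption: NVIDIA's published spec for Tesla P100


def min_gpu_count(cuda_cores_lower_bound, cores_per_gpu=P100_CUDA_CORES):
    """Smallest whole number of GPUs whose combined CUDA cores
    strictly exceed the stated lower bound."""
    return cuda_cores_lower_bound // cores_per_gpu + 1


def memory_per_core_gb(total_memory_tb, cpu_cores):
    """Average memory per CPU core in GB (1 TB taken as 1000 GB)."""
    return total_memory_tb * 1000 / cpu_cores


gpus = min_gpu_count(86000)
print(gpus, gpus * P100_CUDA_CORES)          # GPU count and implied core total
print(round(memory_per_core_gb(9, 1592), 2))  # GB per core, at least
```

Under these assumptions, "over 86000 CUDA cores" is consistent with 24 P100 cards (86016 cores), and "over 9 TB" across 1592 cores works out to at least about 5.6 GB of memory per CPU core on average.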