RIPL: A Parallel Image Processing Language for FPGAs

Robert Stewart, Kirsty Duncan, Paulo Garcia, Gregory John Michaelson, Deepayan Bhowmik, Andrew Michael Wallace

Research output: Contribution to journal (Article)

Abstract

Specialised FPGA implementations can deliver higher performance and greater power efficiency than embedded CPU or GPU implementations for real time image processing. Programming challenges limit their wider use, because the implementation of FPGA architectures at the Register Transfer Level is time consuming and error prone. Existing software languages supported by High Level Synthesis, whilst providing a productivity improvement, are too general purpose to generate efficient hardware without the use of hardware specific code optimisations. Such optimisations leak hardware details into the abstractions that software languages are there to provide, and they require knowledge of FPGAs to generate efficient hardware e.g. by using language pragmas to partition data structures across memory blocks.

This paper presents a thorough account of RIPL (the Rathlin Image Processing Language), a high level image processing Domain Specific Language for FPGAs. We motivate its design, based on higher order algorithmic skeletons, with requirements from the image processing domain. RIPL’s skeletons suffice to elegantly describe image processing stencils, as well as recursive algorithms with non-local random access patterns. At its core, RIPL employs a dataflow intermediate representation. We give a formal account of the compilation scheme from RIPL skeletons to static and cyclo-static dataflow models to describe their data rates and static scheduling on FPGAs.

RIPL compares favourably to the Vivado HLS OpenCV library and to C++ compiled with Vivado HLS. RIPL achieves between 54 and 191 frames per second (FPS) at 100 MHz for four synthetic benchmarks, faster than HLS OpenCV in three cases. Two real world algorithms are implemented in RIPL: visual saliency and mean shift segmentation. For the visual saliency algorithm, RIPL achieves 71 FPS compared to optimised C++ at 28 FPS. RIPL is also concise, being 5x shorter than C++ and 111x shorter than an equivalent direct dataflow implementation. For mean shift segmentation, RIPL achieves 7 FPS compared to optimised C++ on 64 CPU cores at 1.1 FPS, and RIPL is 10x shorter than the direct dataflow FPGA implementation.
Original language: English
Article number: 7
Journal: ACM Transactions on Reconfigurable Technology and Systems
Volume: 11
Issue number: 1
DOI: 10.1145/3180481
Publication status: Published - 14 Mar 2018

Fingerprint

Field programmable gate arrays (FPGA)
Image processing
Hardware
Program processors
Visual languages
Data structures
Productivity
Scheduling
Data storage equipment

Cite this

@article{1d4c74d6dcc74127a5615e840d141016,
title = "RIPL: A Parallel Image Processing Language for FPGAs",
abstract = "Specialised FPGA implementations can deliver higher performance and greater power efficiency than embedded CPU or GPU implementations for real time image processing. Programming challenges limit their wider use, because the implementation of FPGA architectures at the Register Transfer Level is time consuming and error prone. Existing software languages supported by High Level Synthesis, whilst providing a productivity improvement, are too general purpose to generate efficient hardware without the use of hardware specific code optimisations. Such optimisations leak hardware details into the abstractions that software languages are there to provide, and they require knowledge of FPGAs to generate efficient hardware e.g. by using language pragmas to partition data structures across memory blocks. This paper presents a thorough account of RIPL (the Rathlin Image Processing Language), a high level image processing Domain Specific Language for FPGAs. We motivate its design, based on higher order algorithmic skeletons, with requirements from the image processing domain. RIPL’s skeletons suffice to elegantly describe image processing stencils, as well as recursive algorithms with non-local random access patterns. At its core, RIPL employs a dataflow intermediate representation. We give a formal account of the compilation scheme from RIPL skeletons to static and cyclo-static dataflow models to describe their data rates and static scheduling on FPGAs. RIPL compares favourably to the Vivado HLS OpenCV library and to C++ compiled with Vivado HLS. RIPL achieves between 54 and 191 frames per second (FPS) at 100 MHz for four synthetic benchmarks, faster than HLS OpenCV in three cases. Two real world algorithms are implemented in RIPL: visual saliency and mean shift segmentation. For the visual saliency algorithm, RIPL achieves 71 FPS compared to optimised C++ at 28 FPS. RIPL is also concise, being 5x shorter than C++ and 111x shorter than an equivalent direct dataflow implementation. For mean shift segmentation, RIPL achieves 7 FPS compared to optimised C++ on 64 CPU cores at 1.1 FPS, and RIPL is 10x shorter than the direct dataflow FPGA implementation.",
author = "Robert Stewart and Kirsty Duncan and Paulo Garcia and Michaelson, {Gregory John} and Deepayan Bhowmik and Wallace, {Andrew Michael}",
year = "2018",
month = "3",
day = "14",
doi = "10.1145/3180481",
language = "English",
volume = "11",
journal = "ACM Transactions on Reconfigurable Technology and Systems",
issn = "1936-7406",
publisher = "ACM",
number = "1",

}

RIPL: A Parallel Image Processing Language for FPGAs. / Stewart, Robert; Duncan, Kirsty; Garcia, Paulo; Michaelson, Gregory John; Bhowmik, Deepayan; Wallace, Andrew Michael.

In: ACM Transactions on Reconfigurable Technology and Systems, Vol. 11, No. 1, 7, 14.03.2018.


TY - JOUR

T1 - RIPL: A Parallel Image Processing Language for FPGAs

AU - Stewart, Robert

AU - Duncan, Kirsty

AU - Garcia, Paulo

AU - Michaelson, Gregory John

AU - Bhowmik, Deepayan

AU - Wallace, Andrew Michael

PY - 2018/3/14

Y1 - 2018/3/14

AB - Specialised FPGA implementations can deliver higher performance and greater power efficiency than embedded CPU or GPU implementations for real time image processing. Programming challenges limit their wider use, because the implementation of FPGA architectures at the Register Transfer Level is time consuming and error prone. Existing software languages supported by High Level Synthesis, whilst providing a productivity improvement, are too general purpose to generate efficient hardware without the use of hardware specific code optimisations. Such optimisations leak hardware details into the abstractions that software languages are there to provide, and they require knowledge of FPGAs to generate efficient hardware e.g. by using language pragmas to partition data structures across memory blocks. This paper presents a thorough account of RIPL (the Rathlin Image Processing Language), a high level image processing Domain Specific Language for FPGAs. We motivate its design, based on higher order algorithmic skeletons, with requirements from the image processing domain. RIPL’s skeletons suffice to elegantly describe image processing stencils, as well as recursive algorithms with non-local random access patterns. At its core, RIPL employs a dataflow intermediate representation. We give a formal account of the compilation scheme from RIPL skeletons to static and cyclo-static dataflow models to describe their data rates and static scheduling on FPGAs. RIPL compares favourably to the Vivado HLS OpenCV library and to C++ compiled with Vivado HLS. RIPL achieves between 54 and 191 frames per second (FPS) at 100 MHz for four synthetic benchmarks, faster than HLS OpenCV in three cases. Two real world algorithms are implemented in RIPL: visual saliency and mean shift segmentation. For the visual saliency algorithm, RIPL achieves 71 FPS compared to optimised C++ at 28 FPS. RIPL is also concise, being 5x shorter than C++ and 111x shorter than an equivalent direct dataflow implementation. For mean shift segmentation, RIPL achieves 7 FPS compared to optimised C++ on 64 CPU cores at 1.1 FPS, and RIPL is 10x shorter than the direct dataflow FPGA implementation.

U2 - 10.1145/3180481

DO - 10.1145/3180481

M3 - Article

VL - 11

JO - ACM Transactions on Reconfigurable Technology and Systems

JF - ACM Transactions on Reconfigurable Technology and Systems

SN - 1936-7406

IS - 1

M1 - 7

ER -