

Posted April 6, 2023 (author: 灯具厂)

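Design of an FPGA-Based Fast Image Processing System

Abstract

We evaluate and improve the performance of a hardware/software architecture intended to handle a wide variety of image processing tasks. The architecture is based on a Field Programmable Gate Array (FPGA) and a host computer. A LabVIEW application on the PC controls image acquisition and video capture from an industrial camera, and data are transferred over USB 2.0. The FPGA controller is based on an Altera Cyclone II chip and is designed as a system-on-a-programmable-chip (SOPC) with an embedded Nios-II core. The SOPC integrates the CPU, on-chip and external memory, the communication channel and the image data processing system. Transfer uses a standard protocol, and frames of various sizes are handled by the hardware/software logic. The system is compared with other solutions and a range of applications is discussed.

Keywords: hardware/software co-design; image processing; FPGA; embedded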

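1. Introduction

Traditional hardware implementations of image processing generally use DSPs or application-specific integrated circuits (ASICs). However, the demand for higher speed and lower cost has shifted solutions toward Field Programmable Gate Arrays (FPGAs), whose inherent parallelism yields better performance. When an application requires real-time processing, such as video or television signal processing or the control of a robotic manipulator, the requirements are very strict and are better met by an FPGA. Computationally demanding functions such as filtering, motion estimation, two-dimensional Discrete Cosine Transforms (2D DCTs) and Fast Fourier Transforms (FFTs) are better optimized on FPGAs. With more hardware multipliers, larger memory capacity and a higher degree of system integration, FPGAs can outperform conventional DSP designs. Combining a computer-based imaging application with FPGA-based parallel accelerators requires a hardware/software interface capable of high-speed transfer. The system presented here is a typical mixed hardware/software design: on one side, a host computer runs a LabVIEW imaging application and is equipped with a camera and a frame grabber; on the other side, an Altera FPGA development board runs the image filters and other system components. Image data are transferred at high speed over USB 2.0. The hardware parts and the control section of the FPGA board are coordinated by an embedded Nios-II processor, with USB 2.0 as the communication channel.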

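2. Overview of design tools

DSP system design with FPGAs often combines high-level algorithm development tools, such as MATLAB, with hardware description languages. It can also use third-party intellectual property (IP) cores that implement typical DSP functions or high-speed communication protocols. In our application we use model-based design tools such as MathWorks Simulink to build the DSP blocks. The generated HDL code is then synthesized together with the other hardware design files in Quartus II.

SOPC-Builder is a tool that resides in the Quartus environment. Its purpose is to integrate the Nios-II processor with external hardware logic or standard peripherals. SOPC-Builder provides an interface fabric that interconnects Nios-II with external memory, the filters and the host computer.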

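3. Modeling and implementation of the filter design

The main goal of this work is to evaluate the performance of a host/co-processor architecture for image processing, including the embedded Nios-II processor and the USB 2.0 link between the host computer and the FPGA board. The capabilities of the available FPGA may limit the image processing that can be implemented. To this end we built a typical image processing application targeting the FPGA co-processor, consisting of a noise filter and an edge detector. Noise reduction and edge detection are two elementary processes used in many machine vision applications, such as object recognition, medical imaging, lane detection in next-generation automotive technology, people tracking and control systems.

We modeled the noise and edge filters with the Altera DSP Builder libraries in Simulink. An example of this procedure can be found in [11]. Noise reduction uses a Gaussian 3×3 kernel, while edge detection uses typical Prewitt or Sobel filters. These functions can be combined in series so that edge detection follows noise reduction. Figure 1 shows a block diagram of the filter design.

Figure 1. Block diagram of the filter design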

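Apart from the noise and edge filter blocks, there is also a block of intermediate logic that coordinates the Nios-II data and control paths with the timing of the filter blocks. This intermediate hardware fabric follows the Avalon interface [12]. The interface cannot be simulated in the Simulink environment and is instead inserted into the system as a Verilog file. Our Avalon implementation consists of a 16-bit data input and output path, the corresponding read and write control signals, and a control interface that selects between the intermediate output of the Gaussian filter and the output of the edge detector. Data input to and output from the logic blocks pass through FIFO registers. Each received image frame is loaded into an external SDRAM buffer and converted into a 16-bit data stream suitable for Nios-II operations. The Nios-II code is discussed in Sections 5 and 6. Incoming pixels pass through a simple two-dimensional digital Finite Impulse Response (FIR) convolution filter that operates on the grayscale intensities of neighboring pixels within a 3×3 region. The buffering scheme is shown in Figure 2.

Figure 2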

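We assume an image size of 640×480 pixels. The line-buffer circuit is implemented in the same way for each filter. If the frame size changes, the design has to be modified and recompiled. The number of delay blocks depends on the size of the kernel, while the depth of each delay line depends on the number of pixels per line. Deep delay lines are implemented in dedicated RAM blocks and therefore do not consume FPGA logic elements. Figure 3 shows, from left to right, the original image, the Gaussian-filtered image and the edge-filtered image.

Figure 3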

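4. Embedded system design

The co-processor parts described above are implemented as components of a system built around the Nios-II processor, which is used here for data streaming control. This kind of design is often the basis for industrial and academic projects. Once installed within the synthesis software, Nios-II becomes a library component in Quartus.

DSP-Builder converts the model-based design into HDL code suitable for integration with the other hardware components. The synthesis software readily integrates the filter into the SOPC and combines it with Nios-II. A complete system consists of the Nios-II soft core together with other modules, including external memory controllers, DMA channels and a custom IP core for high-speed USB communication. A VGA controller can be added to output the final result to a monitor. Such functions can be obtained as open-source IP cores or as evaluation cores provided by third-party companies.

The USB 2.0 high-speed interface is added to the FPGA board by means of a daughter card. The daughter card, from System Level Solutions (SLS) Corporation, can be plugged into any Altera board through the Santa Cruz peripheral connector. It provides a USB 2.0 transceiver based on the CY7C68000 PHY, and the Nios-II system includes a USB controller IP core compliant with the UTMI specification. Section 8 evaluates the practical performance of the IP core. Figure 4 shows the FPGA design flow. Figure 5 shows the FPGA development board and the image acquisition setup.

Figure 4. FPGA design flow

Figure 5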

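5. Nios soft-core design

After the device is configured, the Nios code is downloaded. The Nios code, written in C, serves two purposes: (a) It controls hardware operations, such as DMA transfers between hardware units. It also provides a programming interface for handling data channels through API commands such as "open", "read", "write" and "close". (b) It allows the system to perform simple software operations on the input data instead of using dedicated hardware. For example, Nios instruction code can be used to convert image arrays into appropriate one-dimensional data streams.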
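The conversion of an image array into a one-dimensional stream mentioned in (b) can be illustrated with a short sketch. It is written in Python for readability rather than in the C used on Nios, and it assumes 8-bit grayscale pixels packed two per 16-bit word with the first pixel in the low byte. The packing order and the function names are illustrative assumptions, not details taken from the design.

```python
def pack_pixels_16bit(image):
    """Flatten a 2-D list of 8-bit pixels row by row into 16-bit words.

    Assumption: two pixels per word, first pixel in the low byte.
    """
    flat = [p & 0xFF for row in image for p in row]
    if len(flat) % 2:
        flat.append(0)  # pad so that the last word is complete
    return [(flat[i + 1] << 8) | flat[i] for i in range(0, len(flat), 2)]


def unpack_pixels_16bit(words, width, height):
    """Inverse of pack_pixels_16bit: rebuild the 2-D pixel array."""
    flat = []
    for word in words:
        flat.append(word & 0xFF)          # low byte: first pixel
        flat.append((word >> 8) & 0xFF)   # high byte: second pixel
    flat = flat[: width * height]         # drop any padding
    return [flat[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    frame = [[1, 2, 3], [4, 5, 6]]
    stream = pack_pixels_16bit(frame)
    print(stream)
    print(unpack_pixels_16bit(stream, 3, 2))
```

Packing two pixels per word halves the number of 16-bit transfers, which is one plausible reason for such a conversion step; the actual layout used in the design is not stated.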

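6. Activity flow

The software and hardware activity of the mixed architecture can be outlined as follows: (a) An image stream is transferred from the host computer to the FPGA board over the high-speed USB 2.0 serial bus. The application programming interface used for data input and output over USB is described in the next section. (b) Embedded DMA operations transfer data from memory to the Nios data path and then into the hardware logic through the Avalon interface. (c) The data stream is processed by the hardware accelerator. (d) After the hardware logic has filtered the image data, DMA operations transfer the result back to memory. (e) The final result is output to the VGA digital-to-analog channel, which is a peripheral of the Nios processor and supports DMA transfers. However, a digital-to-analog converter for a VGA port is not always available, so a possible alternative is to send the data back to the host computer over USB for further processing. It should be noted that this design is not merely a black box built for one specific application; it represents a design methodology that can be customized for a wide range of applications.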

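7. Interface design and application

The PC-based application software and vision system are suitable for a range of industrial applications. The setup comprises the Windows XP operating system, a Pentium 4 processor, a high-speed USB 2.0 controller and an NI 1408 PCI frame grabber. The host application is a LabVIEW virtual instrument that controls image acquisition and performs initial image processing. Figure 6 shows the LabVIEW control interface on the PC.

Figure 6. LabVIEW control interface

The frame grabber supports up to five industrial cameras performing different tasks. In our system a CCD camera captures full black-and-white frames of 640×480 pixels, which can then be reduced to 320×240 pixels. The smaller frames reduce the amount of data and make sustained transfer easier. The LabVIEW host program communicates with the USB interface through API functions and a dynamic link library. An advantage of LabVIEW is its integrated image processing platform, which allows fast manipulation or preprocessing of image data. When the FPGA board has received a complete image array, the system sends the image to the filter and, after filtering, passes the data to the buffer of the VGA controller.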

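8. Evaluation of system performance

An image capture setup was built as described above. We tested the receive and transmit performance of the USB link between the PC and the FPGA board by sending test data. The tests used a payload of 307,200 bytes between the host and the target board. Using the Nios HAL driver with evaluation version 1.2 of the IP core, the receive rate reached 65 Mbit/s and the transmit rate reached about 80 Mbit/s. In full-speed mode the transfer takes 9 seconds.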
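As a rough check on these figures, the ideal time to move one payload can be computed from the payload size and the quoted rates. The sketch below (Python, with a hypothetical helper name) ignores protocol and software overhead, so it gives only a lower bound on the real transfer time.

```python
def transfer_time_ms(payload_bytes, rate_mbit_per_s):
    """Ideal transfer time in milliseconds, ignoring all overhead."""
    bits = payload_bytes * 8
    return bits / (rate_mbit_per_s * 1e6) * 1e3


if __name__ == "__main__":
    payload = 307_200  # bytes, e.g. one 640x480 frame at 8 bits per pixel
    for rate in (65, 80):
        print(f"{rate} Mbit/s: {transfer_time_ms(payload, rate):.2f} ms")
```

For the 307,200-byte payload this gives roughly 37.8 ms at 65 Mbit/s and 30.7 ms at 80 Mbit/s.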

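9. Comparison with other systems

Here we compare the system with other image processing solutions in terms of performance and flexibility. To obtain comparison data we built alternative solutions and ran a series of experiments, designing different filters to vary the computational complexity. Figure 7 compares the results with a software implementation on a computer with a Pentium 4 processor and 512 MB of memory.

Figure 7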

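10. Conclusions

This paper presented a design that combines a host computer with an FPGA and studied its image processing performance. It also represents a design methodology that can be used for a wide range of custom applications. The system is based on a programmable FPGA device with an embedded Nios processor.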

Design and evaluation of a hardware/software FPGA-based system for fast image processing

rosa,*,asb

Abstract

We evaluate the performance of a hardware/software architecture designed to [...] The system architecture is based on hardware featuring a Field Programmable Gate Array (FPGA) [...] LabVIEW host application controlling a frame grabber and an industrial camera is used to capture and exchange video data with the hardware co-processor via a high speed USB 2.0 channel, implemented with a [...] The FPGA accelerator is based on an Altera Cyclone II chip and is designed as a system-on-a-programmable-chip (SOPC) with the help of an [...] The SOPC system integrates the CPU, external and on-chip memory, the communication channel and typical image filters [...]ed transfer rates over the communication channel and processing times for the implemented hardware/[...] A comparison with other solutions is given and a range of applications is also discussed.

Keywords: Hardware/software co-design; Image processing; FPGA; Embedded processor

1. Introduction

The traditional hardware implementation of image processing uses Digital Signal Processors (DSPs) or Application Specific Integrated Circuits (ASICs). However, the growing need for faster and cost-effective systems triggers a shift to Field Programmable Gate Arrays (FPGAs), where the inherent parallelism results in better performance [1,2]. When an application requires real-time processing, like video or television signal processing or real-time trajectory generation of a robotic manipulator, the specifications are very strict and are better met when implemented in hardware [3–5]. Computationally demanding functions like convolution filters, motion estimators, two-dimensional Discrete Cosine Transforms (2D DCTs) and Fast Fourier Transforms (FFTs) are better optimized when targeted on FPGAs [6,7]. Features like embedded hardware multipliers, increased number of memory blocks and system-on-a-chip integration enable video applications in FPGAs that can outperform conventional DSP designs [2,8].

On the other hand, solutions to a number of imaging problems are more flexible when implemented in software rather than in hardware, especially when they are not computationally demanding or when they need to be executed [...]er, some hardware components are hard to be re-designed and transferred on a FPGA board from scratch when they are [...] components are frame grabbers and multiple-camera systems already installed as part of an imaging application or other robotic control equipment.

Following the above considerations we conclude that it is often needed to integrate components from an already installed computer-based imaging application dedicated to some automation system, with FPGA-based accelerators that exploit [...] critical need arises for an embedded software/hardware interface that can allow for high-bandwidth communication between the host application and the hardware accelerators.

In this paper we apply and evaluate the performance of an example mixed hardware/software design that includes on the one side a host computer running a National Instruments (NI) LabVIEW imaging application, equipped with a camera and a frame-grabber, and on the other side an Altera FPGA board [9] running an [...] communication channel transferring image data from the host computer to the hardware board is a high-speed USB 2.0 port by means of an embedded macrocell. The various hardware parts and peripherals on the FPGA board are controlled and [...] result of this evaluation one can explore the range of applications suitable for a host/co-processor architecture including an embedded Nios-II processor and utilizing a USB 2.0 communication channel.

In the following, we first give a short account of the tools we used for system [...] present an overview of the particular image filtering application we embedded in the FPGA chip for the evaluation of the host/co-processor system [...] describe the modular interconnection of different system parts and [...]ine the speed and frame-size limits [...] Finally, we compare our mixed host/co-processor USB-based design in terms of other architectures and other communications media.

2. Design tools overview

The design of a DSP system with FPGAs often utilizes both high-level algorithm development tools and hardware description language (HDL) [...] can also make use of third-party intellectual property (IP) cores implementing typical DSP functions or high speed communication protocols [1]. In our application we use model-based design tools like The Mathworks Simulink (based on Mathworks' MATLAB) with the libraries of Altera's [...] DSP-Builder uses model design to produce and synthesize HDL code, which can then be integrated with other hardware design files within a synthesis tool, [...] present work, we designed image filter components using DSP-Builder libraries and the resulting blocks were integrated with the rest of the system in Quartus' System-On-a-Programmable-Chip (SOPC) Builder.

[...] purpose is to integrate an embedded software processor like Altera's Nios-II with [...] and custom or standard peripherals within an overall system, often called System-On-a-Programmable-Chip (SOPC). SOPC-Builder provides an interface fabric in order to interconnect the Nios-II processing path with embedded and external memory, the filter co-processors, other peripherals and the channels of [...] Nios-II applications were written in ANSI C and were compiled and downloaded to the FPGA board by means of Altera's Nios II Integrated Development Environment (IDE), a tool dedicated to assemble [...] purpose of Nios-II applications is to control processing and data streaming between the components of the system and its [...] host side one may develop a control application by means of [...] LabVIEW software by National Instruments Corporation [10], which provides a very flexible platform for image acquisition, image processing and industrial control.

3. Modeling and implementation of the filter design

The main target of this work is to evaluate the performance of a host/co-processor architecture including an embedded Nios-II processor and utilizing a communication channel between host and hardware board, like a [...] task-logic performed by the embedded accelerator can be any [...] purpose we built a typical image-processing application in order to target the FPGA [...] reduction and edge detection are two elementary processes required for most machine vision applications, like object recognition, medical imaging, lane detection in next-generation automotive technology, people tracking, control systems, [...] noise and edge filtering using the Altera DSP Builder [...] example of this procedure can be found in [11]. Noise reduction is applied with a Gaussian 3×3 kernel while edge detection is designed [...] functions can be applied combined in [...] block diagram of [...] from noise and edge filter blocks, there is also a block representing the intermediate logic between the Nios-II data and [...] intermediate hardware fabric follows a specific protocol referred to as Avalon interface [12]. This interface cannot be modeled in the Simulink environment and is rather inserted in the system as a [...] examples implementing the Avalon protocol can be found in Altera reference designs and technical reports [13]. In brief, our Avalon implementation consists of a 16-bit data-input and output path, the appropriate Read and Write control signals and a control interface that allows for selection between the intermediate output from the Gauss filter or the output from the edge [...] input and output to and from the task logic blocks is implemented [...] image frame when received by the hardware board is loaded into an external SDRAM memory buffer and is converted into an appropriate 16-bit data stream by [...] transfer between external memory buffers and the Nios-II data bus is achieved through Direct Memory Access (DMA) operations controlled by appropriate instruction code for the Nios-II soft processor. Nios-II code flow for this system is discussed in Sections 5 and 6.

(Fig.1)

Incoming pixels are processed by means of a simple 2D digital Finite Impulse Response (FIR) filter convolution kernel, working on the grayscale intensities of each pixel's neighbors in a 3×3 [...] lines are buffered through delay-lines producing primitive 3×3 [...] z^-1 delay block produces a neighboring pixel in the same scan line, while a z^-640 delay block produces the [...] image size of 640×480 [...] line-buffer circuit is implemented in the same manner for [...] resolution is incorporated in the line-buffer [...] change in frame size is required we [...] number of delay blocks depends on the size of the convolution kernel, while delay line depth depends on the number of pixels [...] incoming pixel is at the center of the mask and the line buffers [...] lines with considerable depth are implemented as dedicated RAM blocks in the FPGA chip and do not consume logical elements.

(Fig.2)
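The delay-line scheme described above can be mimicked in software. The following Python sketch is an illustration only, not the DSP-Builder model: it keeps the last 2×width+3 pixels of a raster-ordered stream in a single delay line and reads the nine taps of a 3×3 window at fixed delays (0, 1, 2, width, width+1, and so on), assuming one pixel arrives per step. The function name is hypothetical.

```python
from collections import deque


def stream_3x3_windows(pixels, width):
    """Yield ((row, col), window) for every full 3x3 neighborhood.

    pixels is a raster-ordered iterable; (row, col) is the window center
    and window is a 3x3 list with the top image row first.
    """
    depth = 2 * width + 3                  # two full lines plus three pixels
    delay = deque([0] * depth, maxlen=depth)
    for n, pixel in enumerate(pixels):
        delay.appendleft(pixel)            # delay[k] is the pixel from k steps ago
        row, col = divmod(n, width)
        if row >= 2 and col >= 2:          # window lies fully inside the image
            window = [
                [delay[(2 - r) * width + (2 - c)] for c in range(3)]
                for r in range(3)
            ]
            yield (row - 1, col - 1), window


if __name__ == "__main__":
    image = list(range(16))                # a 4x4 test image, values 0..15
    for center, win in stream_3x3_windows(image, 4):
        print(center, win)
```

With a 640-pixel line the delay line holds 1,283 samples, which matches the remark that its depth depends on the number of pixels per line while the number of taps depends on the kernel size.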

After line buffering, pipelined adders and embedded multipliers calculate the [...] Fig. 3 shows the model-design for implementation of the 3×3 [...] shown in Fig. 3 model-based design transfers the necessary arithmetic into a parallel digital [...]-consuming calculations, like multiplications are implemented using dedicated multipliers available in medium-scale Altera FPGAs, like the Cyclone II chip.

(Fig.3)

When the two filters work in combination, the output of the Gaussian kernel is input to a 3×3 [...], the kernel-pixels are [...] edge detector [...] Fig. 4 shows the model-design [...] simplicity we combine horizontal and vertical edge detection filtering by simply adding the corresponding [...] binary image is produced by thresholding the result by means of a [...] Fig. 6 shows an input and the successive outputs of the hardware co-processor for a 640×480 pixel image.

(Fig.4)

(Fig.5)

(Fig.6)
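As a software illustration of the filter chain described above, the Python sketch below smooths an image with a 3×3 Gaussian kernel, applies horizontal and vertical Sobel kernels, adds the absolute responses and thresholds the sum into a binary image. The kernel coefficients (1-2-1 weights with a divisor of 16), the use of absolute values, the border handling and the threshold value are assumptions made for the example; the text does not specify them.

```python
GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]        # assumed weights, divisor 16
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges


def filter3x3(img, kernel, divisor=1, copy_border=False):
    """Apply a 3x3 kernel (as a correlation) to the interior pixels of img.

    Border pixels are copied from the input or set to zero.
    """
    h, w = len(img), len(img[0])
    out = [row[:] if copy_border else [0] * w for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            acc = sum(kernel[i][j] * img[r - 1 + i][c - 1 + j]
                      for i in range(3) for j in range(3))
            out[r][c] = acc // divisor
    return out


def edge_map(img, threshold):
    """Gaussian smoothing, then |Gx| + |Gy|, then a binary threshold."""
    smooth = filter3x3(img, GAUSS, divisor=16, copy_border=True)
    gx = filter3x3(smooth, SOBEL_X)
    gy = filter3x3(smooth, SOBEL_Y)
    return [[1 if abs(x) + abs(y) > threshold else 0
             for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


if __name__ == "__main__":
    step = [[0, 0, 0, 200, 200, 200] for _ in range(6)]  # vertical step edge
    for row in edge_map(step, 400):
        print(row)
```

In hardware the same arithmetic would run on the streamed 3×3 windows with one result per clock; the nested loops here only reproduce the numerical result.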

4. Embedded system design

The co-processor parts described above were implemented as components of an [...] Nios-II software CPU which is used here for data streaming control, is often the basis for industrial as [...] used in its evaluation version along with the tools for assembling and downloading instruction code [14]. Once installed within the synthesis software, the Nios processor becomes integrated as a library component in Quartus' SOPC builder tool.

DSP-Builder converts the model-based design into HDL code appropriate for [...] filter is readily recognized by the synthesis software as a System-on-a-Programmable-Chip (SOPC) module and can [...] modules that are necessary for a complete system are the Nios-II soft processor, external memory controllers, DMA channels, and a custom IP peripheral for high speed [...] controller can be added in order to [...] such peripheral functions can be found as open source custom HDL Intellectual Property (IP) or as evaluation cores provided by Altera or third party companies.

USB 2.0 high speed connectivity is added to the FPGA board by means of a daughter-card by System Level Solutions (SLS) Corporation [15]. It can be added to [...] daughter-card [...] 2.0 IP core compliant with Transceiver Macrocell Interface (UTMI) [...]ed evaluation versions of the IP core and present practical transmit and receive rates in [...] FPGA chip along with the embedded Nios-II processor is always a slave device in the communication via the USB channel, while the host computer is always the master device.

A block diagram of the hardware system implemented on the FPGA board is [...] channel to the host computer is also shown.

The embedded system is assembled by means of the SOPC-Builder tool of the synthesis software, by selecting library components and defining their [...] being generated by SOPC Builder, the system can be inserted [...] additional components that are necessary are PLLs for Nios and memory [...] synthesize and simulate the design by means of the tools described in Section 2, we target a Cyclone II 2C35F672C6 FPGA chip [...] board along with the peripheral USB 2.0 extension is shown in the picture of Fig. 8. The board also features external memory and several typical peripheral circuits.

5. Nios-II programming

After device configuration, we assemble and download instruction code for the Nios-II processor, according to the specifications of Nios' Hardware Abstraction Layer (HAL). The interested reader can look for details in the reference manuals for Nios-II and its dedicated Integrated Development Environment (Nios-II IDE) [16]. Nios code is originally written in ANSI C and has a twofold purpose:

(a) It controls hardware operations, like DMA transfers between hardware [...] offers a programming interface for handling data channels, with API commands like "open", "read", "write" and "close".

(b) It allows the system to perform simple software operations on the input data instead of using dedicated hardware stages for such processing. For example, Nios instruction code can be used to convert image arrays into appropriate one-dimensional data streams.

[...] code for the particular design opens the USB port and waits to read the transmitted pixel array. It then controls pixel streaming from input to final output as described in the following section.
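The open/read/write/close sequence named in (a) follows the familiar UNIX-style file-descriptor pattern. The sketch below imitates that call sequence in Python with the standard `os` functions, using an ordinary temporary file as a stand-in for the USB device. The frame size, the processing callback and the function names are hypothetical; the real Nios-II code is written in C against the HAL and is not shown in the text.

```python
import os
import tempfile


def process_frame(path, frame_bytes, process):
    """Open a device-like file, read one frame, process it, write it back."""
    fd = os.open(path, os.O_RDWR)                 # "open"
    try:
        frame = b""
        while len(frame) < frame_bytes:           # "read" until the frame is complete
            chunk = os.read(fd, frame_bytes - len(frame))
            if not chunk:
                break                             # end of data
            frame += chunk
        result = process(frame)                   # stand-in for the filter stage
        os.lseek(fd, 0, os.SEEK_SET)              # only needed for the file stand-in
        os.write(fd, result)                      # "write"
    finally:
        os.close(fd)                              # "close"
    return result


def demo():
    """Run the sequence on a temporary file holding a 4-byte test frame."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes([0, 64, 128, 255]))
        path = tmp.name
    try:
        result = process_frame(path, 4, lambda d: bytes(255 - b for b in d))
        with open(path, "rb") as f:
            stored = f.read()
    finally:
        os.remove(path)
    return result, stored


if __name__ == "__main__":
    print(demo())
```

The read loop accumulates chunks because a single read on a channel may return fewer bytes than requested, which is also why the embedded code waits for the whole pixel array before starting the stream.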

6. Activity flow

The flow of software and hardware activity that supports the operation of a system designed according to the mixed architecture detailed in this work can be outlined as follows:

(a) An image stream is transferred from the host computer to the hardware board for processing through the high speed USB 2.0 communication [...] described in the next section the host application communicates with the USB port using Application Programming Interface (API) calls for data input and output.

(b) According to Nios processor instructions, embedded DMA hardware operations transfer data from memory to the Nios data path and into the hardware task logic by means of the Avalon interface.

(c) The data stream is processed through the hardware accelerator.

(d) DMA operations stream the filtered output result from the hardware task-logic back to external memory (see Fig. 7).

(e) The final result is output to the onboard VGA digital-to-analog channel, which is peripheral to the Nios-II processor and is supported by [...] However, a digital to analog converter for a VGA port is not always implemented on a development board, so a possible alternative is the resulting binary image to be channeled back to the host computer via the USB connection for [...]

[...] assessment of the design performance that is presented in Section 8 includes all the above [...] important to note that the above system is not merely a black-box custom design implemented for a particular application, but represents a design methodology that can be used for a wide range [...] technical details of the operations abstracted above are well documented for the user of the particular development platform, so that every aspect of the design can be tested [...] detailed analysis of the system development techniques is however out of the scope of the present article.

7. [...]-based setup and application

On the host part a vision system is implemented, appropriate for a spectrum of [...] host computer is a Windows XP Pentium IV featuring [...] host application is a LabVIEW virtual instrument (VI) that controls the frame grabber and performs initial processing of the captured images (see Fig. 9). The frame grabber can support up to five industrial cameras performing different tasks. In our system the VI application captures a full frame sized 640×480 pixels from a CCIR analog B&W CCD camera (Samsung BW-2302). It may then reduce the image resolution applying an image extraction function, down to 320×240 pixels. It can produce even smaller frames in order to trade size for transfer rate.

The LabVIEW host application communicates with the USB interface using API functions and a Dynamic Link Library (DLL) and transmits a numeric array [...] advantage of using LabVIEW as a basis for developing the host application is that it includes a VISION library able to perform fast manipulation of image data or a preprocessing of the image if it is necessary. When the reception of the image array is completed at the hardware board end, the system loads the image data to the filter coprocessor and sends the output to the VGA controller via SRAM memory (see Fig. 7). Alternatively the output can be sent back to the host application by means of a "write" [...] procedure is repeated with the next captured frame.

8. Evaluation of the system performance

The above setup was tested with various capture rates and frame resolutions. Using several test-versions of the SLS USB 2.0 megafunction we measured receive (rx) and transmit (tx) throughput between the host PC and the target hardware [...] payload of 307,200 bytes [...] that using the Nios II HAL driver, the latest evaluation version 1.2 of the IP core transfers in high speed operation 65 Mbits per second in receive mode and about 80 Mbps in [...] However, data transfer rate from the host computer to the hardware board is only one factor that affects the performance of an image processing system designed according to a host/co-processor architecture, [...] there are also software issues to be taken into account both at the host end and at the Nios-II embedded [...] example, frame capturing and serialization prior to transferring [...] other hand, the Nios-II embedded processor controls the data flow following instruction code downloaded to embedded memory, [...], the overall LabVIEW software allows for an efficient handling of array structures and also possesses image grabbing and vision tools that reduce processing time on the host [...] the above software limitations, there are also hardware issues related to an integrated System-on-a-Programmable-chip, like the time needed for Direct Memory Access (DMA) [...] performance of the hardware board is divided into the processing rates of the hardware filter co-processor and the performance of the rest of system, like external memory [...] second factor adds an overhead depending [...] evaluate the performance of the proposed architecture taking into account and measuring when possible the following delay times:

(a) Time to grab an image frame and serialize it.

(b) Transfer time over the USB 2.0 channel.

(c) Nominal time needed by the co-processor filter in order to process the image frame.

(d) Overhead time needed for data flow and control in the integrated hardware system.

Table 1 summarizes the response time of the above operations and reports [...] the system results in a practical and stable video rate of 20 frames per second at an image resolution of 320×240 [...]rly, larger frames with dimensions 640×480 pixels can be transferred [...] board is clocked at 100 MHz the hardware image filter processes a 640×480 pixels frame in a minimum of 3.1 ms while DMA transfers and other control flow add [...] frame is transferred from the host computer to the hardware board in approximately 26 ms. Other possible latencies include proper data manipulation by the Nios-II instruction code and depend on the frame size.
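The nominal filter time quoted above is consistent with a pipeline that accepts one pixel per clock cycle, which is how the later comparison describes the hardware ("per pixel and per clock cycle"). The short Python sketch below works through that arithmetic; the helper name and the one-pixel-per-clock parameter are assumptions used for illustration.

```python
def nominal_filter_time_ms(width, height, clock_hz, pixels_per_clock=1):
    """Time for a streaming pipeline to consume one frame, in milliseconds."""
    return width * height / (pixels_per_clock * clock_hz) * 1e3


if __name__ == "__main__":
    for w, h in ((640, 480), (320, 240)):
        t = nominal_filter_time_ms(w, h, 100e6)
        print(f"{w}x{h} at 100 MHz: {t:.3f} ms")
```

This yields about 3.07 ms for a 640×480 frame and about 0.77 ms for a 320×240 frame, in line with the nominal figures given in the text. Pipeline fill latency and DMA overhead are not included.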

As a conclusion, delays are divided between software and hardware procedures [...] hardware filter co-processor does not substantially [...] Nios-II processing time and transfer rates over the communication channel can be a bottleneck for large frames. Since open core USB 2.0 embedded technology is still developing one may expect [...] Table 2 summarizes the hardware requirements for the overall system we implemented in this study (see also Fig. 7). The table reports number of logical elements and memory bits needed to implement the functions presented above in a medium FPGA chip, the Altera [...] potential of this chip is 33000 logic elements [...] table also reports clock frequencies for the soft [...] were implemented by means of two Phase Locked Loops (PLLs).

(Fig.7)

9. Comparison with other systems

In the following, we present a comparison with other image processing solutions, [...] to establish some numerical comparison between the presented architecture and a purely software solution, [...] synthesized designs with variations of the basic filter operations, introducing different degrees of [...] implemented in hardware a Sum of Absolute Differences (SAD) algorithm for dense depth map calculations, which is based on correlation operations and is much more intensive computationally than simple [...] compared the results with the same algorithms implemented in software and running on a Pentium IV processor at 3 GHz with 512 MB RAM. [...] software results were attained by programming analytically the corresponding procedures in NI's LabVIEW language, [...] used pre-captured AVI video sequences and processed each frame with the same [...] resolutions of 320×240 and 640×480 pixels were assessed separately, since they need different transferring times to our hardware co-processor.

Table 3 shows frame rates for processing video files of both resolutions as a [...] first column of Table 3, the [...] computational complexity in the second column is measured as multiples of n×N, where n is the size of the convolution kernel, equal to 3×3, and N is a total of 320[...] case of the SAD algorithm complexity is considered to be the product n×N×D, where n is the size of the comparison window applied on an image of N pixels and for a total disparity range of D pixels [17]. Our hardware version of this algorithm [...] third and fourth column of Table 3 give total frame rate for each operation when implemented in pure software and in our proposed architecture, [...] note that the frames per second (fps) value in the case of the SAD algorithm refers to a single transmitted frame instead of the stereo pair in order to have a result [...] stereo.
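To make the complexity measures concrete, the Python sketch below counts operations as n×N for a convolution and n×N×D for SAD, and runs a minimal one-dimensional SAD search along a single scanline. The window size, the disparity range of 16 and the test data are hypothetical values chosen for the example; the parameters of the hardware SAD implementation are not recoverable from the text.

```python
def conv_ops(kernel_size, n_pixels):
    """Complexity measure n x N for a convolution filter."""
    return kernel_size * n_pixels


def sad_ops(window_size, n_pixels, disparity_range):
    """Complexity measure n x N x D for SAD block matching."""
    return window_size * n_pixels * disparity_range


def sad_disparity_row(left, right, half_window, max_disp):
    """For each pixel of a left scanline, pick the disparity with minimum SAD.

    Pixels too close to the borders keep disparity 0; ties go to the
    smallest disparity.
    """
    width = len(left)
    disp = [0] * width
    for x in range(half_window + max_disp, width - half_window):
        best_d, best_cost = 0, None
        for d in range(max_disp + 1):
            cost = sum(abs(left[x + k] - right[x + k - d])
                       for k in range(-half_window, half_window + 1))
            if best_cost is None or cost < best_cost:
                best_d, best_cost = d, cost
        disp[x] = best_d
    return disp


if __name__ == "__main__":
    n_pixels = 320 * 240
    print(conv_ops(3 * 3, n_pixels))        # one 3x3 convolution
    print(sad_ops(3 * 3, n_pixels, 16))     # SAD with an assumed D of 16
    right = [0, 0, 0, 5, 9, 5, 0, 0, 0, 0, 0, 0]
    left = [0, 0, 0, 0, 0, 5, 9, 5, 0, 0, 0, 0]   # same pattern shifted by 2
    print(sad_disparity_row(left, right, 1, 3))
```

The extra inner loop over candidate disparities is what multiplies the cost by D, which is why SAD is so much heavier than a single convolution on a serial processor.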

Table 3. Frame-rate comparison between PC-based implementations and host/coprocessor designs for a number of image processing operations

[...] images the whole cycle of transmitting, processing and projecting on a monitor screen is 14 fps (resolution 320×240), when working with the current version of the SLS USB 2.0 megafunction.

The last column of Table 3 gives total logic elements required for mapping [...] required hardware resources increase with increasing number of stages in the hardware pipeline. However, resources do not necessarily increase with increasing computational [...] reason is that the same hardware structure can in principle be adequate for both small and large frames, since the same parallel computations are needed per pixel and per clock cycle, as pixels stream into the system pipeline. Frames of increased resolution only need deeper line buffers, as shown in Fig. 2. Hence, [...]-chip memory requirements for the implementations of Table 3 vary between 185,000 and 230,000 memory bits out of a total of 480,[...]re implementations in Table 3 require a nominal processing time 0.77 ms for every 320×240 frame and 3.1 ms for every 640×480 frame (see also Table 1). SAD needs twice that time in [...] additional times for data transferring and control are [...] as discussed in Section 8.

[...] Fig. 10 the processing results of a video file with frame resolution 320[...] hardware/[...] Fig. 10b our hardware processing results are shown with squares and refer to frame resolution of 640[...] symbols, at the lower part of both diagrams represent frame rates achieved by pure software running on the host computer.

It can be seen that the performance of the host PC without the hardware co-processor is good in the case of simple processes but, as it is expected, it falls [...] performance of our hardware/software mixed architecture is almost constant as complexity increases, because in all cases processing is completed in a single pass, [...]t decrease is due to pre-processing steps, according to the type of operation. However, the performance of our system is dependent on frame size, as is shown [...] reduction of frame rate in the case of increased resolution is clearly attributed to increased payload during USB transfers and DMA transactions from board memory to the FPGA system.

By better optimizing the software algorithms or by adopting faster processors in the future, the performance of the software system can be enhanced, however the gap will still persist, with increasing computational demands.

The bottom line of the above analysis is clearly that the suitability of the [...] better justified in the case of algorithms with increased computational complexity, where the PC alone [...]g a moderate frame resolution helps our design to respond at a decent rate, which however is still less than true video rate.

Depending on the system requirements other approaches may also be used. Using a video input card like the Altera DC Video-TVP5146N one can input video data to the FPGA and perform a range of image functions, like simple filtering, color processing, recasting to different video formats, compression, etc. [18]. In this case one can bypass the computer-based frame grabber and avoid the use of a [...] the principal cost of losing the flexibility inherent in the software part of the system as well as losing the PC-based control over the cameras and frame grabber, but the result is an all-hardware system that excels in [...] processing rate is in principle limited by the frame grabber, which in the case of NTSC composite input signal is 30 fps while for PAL video input it is 25 fps.

Similarly, processing at video-rate may be obtained by using the ASICs approach, which usually incorporates all parts in an all-hardware custom system. Image processing implementations of medium and high complexity appear in the literature and most of them can manipulate images in real time [19,20]. They can reach a frame rate of 30 frames per second or even more in some custom systems. In some cases they can also incorporate a CCD interface [21]. However, ASIC implementations are complex and expensive systems that suffer in terms of [...] have slow design cycle and are certainly away from the plug and play approach adopted in the present article.

Pure DSP-based designs support thousands of MIPS and are comparable in performance with desktop PCs, so [...] purely software-based they are much more [...]tion, the control flow part of an [...]er, they can easily incorporate video input and output channels with little additional hardware [22]. However, just as is the case with ordinary serial computers they cannot perform computationally demanding tasks at a high video-rate [23].

The use of FPGAs in hardware implementations can partly bridge the flexibility gap because FPGAs are re-programmable and have a rapid design cycle. Compared with DSPs, systems implemented using FPGAs are more efficient, [...]r, building with FPGAs, we also have to implement in hardware protocols and interfaces for video I/O.

In the host/co-processor architecture applied in our design, the FPGA absorbs sections of the algorithm that are costly in software, while Nios-II executes flow [...]gh Nios is a medium performance processor it can greatly help by implementing the control path, as well as simple pre-processing and post [...] system, an application can further be partitioned between [...] host part interfaces with a [...] also execute parts of the algorithm that are software friendly.

Hardware task logic operates as an accelerator of any computationally demanding function and can be added in the SOPC system as a library component. It can also be re-used, shared and imported in any SOPC system that follows the [...] characteristics testify of the high level of flexibility inherent in this design.

As shown from the discussion above the presented system performs better than a general purpose computer or a pure DSP design in the case of heavy computational tasks, [...] less efficient than a pure hardware system, since such systems can perform at 30 frames per [...] it is more flexible and has a clear potential to evolve. [...]tion, this particular IP core complies with Altera's OpenCore Plus program [24] that offers [...]y the proliferation of USB 2.0 communication ports in modern computers and video equipment makes this proposed channel a suitable commodity choice when building a hardware/software architecture based on a FPGA co-processor.

10. Conclusions

The design of a general hardware/software system based on a host computer [...] system performance was studied in the case of image processing algorithms and represents a design [...] based on a hardware board featuring a medium FPGA device, which is configured as a System On a Programmable Chip, with a Nios-II software processor in the system [...] hardware components are local and on-chip memory and a UTMI-compliant macrocell by SLS Corporation, allowing fast communication with a host computer.

An integrated system controlled by a Nios-II software processor provides [...]s optimum choice of hardware/software features may accelerate the system [...] partitioning a machine vision application between a host computer and a hardware co-processor may solve a number of problems and can be appealing in an academic or industrial environment where compactness and portability of the system is not of primal importance.

The processing and transfer rates reported in Tables 1 and 3 can be sufficient [...]sly, the system is better justified in the case of very demanding image processing computations [...] tasks are point pattern matching for biometric applications [25] or block matching for finding correspondences between images of a stereo pair [17]. The latter application usually depends on many cameras and has significant computational demands. Cameras and frame grabbers are better controlled by a computer with custom software since image grabbing and transmitting protocols are not easily transferred [...] other hand, the real-time calculation of disparities from two or more cameras is better performed by hardware [26–28].

Alternatively the host/co-processor Nios-II based architecture can be used for implementing and testing in hardware a variety of image-processing academic designs, [...] can also be used to manage image processing in a number of educational applications and student machine-vision exercises [29].

Future work can include more tests of complex algorithms or comparisons of the presented architecture with future high throughput Ethernet channel or PCI channel for host/[...], it is certainly meaningful to [...] generation USB 2.0 macrocells and Nios soft processors can still increase the range of applications of the proposed design.

Acknowledgements

This work was conducted in communication with the support team of System Level Solutions Corporation, who provided successive evaluation versions of the [...] to especially thank software manager Tejas Vaghela for his constant support and advice.
