Posted on 6 April 2023 (author: 灯具厂)
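Design of a Fast Image Processing System Based on FPGA
Abstract
We evaluate the performance of a hardware/software architecture designed for a wide range of image processing tasks. The architecture combines a field-programmable gate array (FPGA) board with a host computer. A LabVIEW application on the PC controls image acquisition by a frame grabber and video capture from an industrial camera, and the data are exchanged over USB 2.0. The FPGA accelerator is based on an Altera Cyclone II chip configured as a system-on-a-programmable-chip (SOPC) with an embedded Nios II core. The SOPC integrates the CPU, on-chip and external memory, the communication channel and the image processing logic. Transfer rates over the communication channel and processing times of the hardware/software logic are measured for various frame sizes. A comparison with other solutions is given and a range of applications is discussed.
Keywords: hardware/software co-design; image processing; FPGA; embedded processor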
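1. Introduction
Hardware implementations of image processing have traditionally used digital signal processors (DSPs) or application-specific integrated circuits (ASICs). However, the demand for faster and more cost-effective systems has shifted attention to field-programmable gate arrays (FPGAs), whose inherent parallelism yields better performance. When an application requires real-time processing, such as video or television signal processing or the control of a robotic manipulator, its specifications are very strict and are better met by an FPGA. Computationally demanding functions are also better optimized on FPGAs. Examples include filtering, motion estimation, two-dimensional discrete cosine transforms (2D DCTs) and fast Fourier transforms (FFTs). With more hardware multipliers, more memory and a higher level of system integration, FPGAs can outperform conventional DSPs.

Combining a computer-based imaging application with FPGA-based parallel accelerators requires a software/hardware interface capable of high-speed transfer. The present system is a typical mixed hardware/software design. On one side, a host computer runs a LabVIEW imaging application and is equipped with a camera and a frame grabber. On the other side, an Altera FPGA development board runs the image filters and other system components. Image data are transferred at high speed over USB 2.0. The embedded Nios II processor links the hardware parts and the control section of the FPGA board, with USB 2.0 as the communication channel.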
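2. Overview of Design Tools
Designing a DSP system on an FPGA usually combines high-level algorithm development tools, such as MATLAB, with hardware description languages (HDLs). Third-party intellectual property (IP) cores can also be used to implement typical DSP functions or high-speed communication protocols. In our application we built the DSP blocks with a model-based design tool, The MathWorks Simulink. The generated HDL code is then synthesized together with the other hardware design files in Quartus II.

SOPC Builder is a tool that resides in the Quartus environment. It integrates the Nios II processor with external hardware logic or standard peripherals. It also provides an interface fabric that connects Nios II with external memory, the filters and the host computer.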
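3. Filter Modeling and Application Design
The main goal of this work is to evaluate the image processing performance of a host/co-processor architecture. This includes the performance of the embedded Nios II processor and of the USB 2.0 link between the host computer and the FPGA board. The capabilities of current FPGAs may limit the image processing that can be achieved. To this end we built a typical image processing application targeting the FPGA co-processor, consisting of a noise filter and an edge detector. Noise reduction and edge detection are two elementary processes used in many machine vision applications. Examples include object recognition, medical imaging, lane detection in next-generation vehicles, people tracking and control systems.

We modeled noise reduction and edge detection with the Altera DSP Builder libraries in Simulink; an example of this procedure can be found in [11]. Noise reduction uses a Gaussian 3×3 kernel, and edge detection uses a typical Prewitt or Sobel filter. The two functions can be applied in sequence, with noise reduction followed by edge detection. Figure 1 shows the block diagram of the filter design.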
Fig. 1 Block diagram of the filter design
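The noise filter described above can be written as a software reference model. The minimal Python sketch below implements a 3×3 Gaussian 2D FIR filter. The kernel weights (the 1-2-1 binomial kernel, which sums to 16), the 8-bit clamping and the border handling are assumptions of this sketch, not values given in the text. A model of this kind is useful for checking a hardware filter's output pixel by pixel.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 8-bit values

# Assumed integer approximation of a Gaussian 3x3 kernel. The weights sum to 16,
# so normalisation is a right shift by 4 bits (cheap in hardware).
GAUSS_3X3 = [
    [1, 2, 1],
    [2, 4, 2],
    [1, 2, 1],
]


def convolve3x3(img: Image, kernel: List[List[int]], shift: int) -> Image:
    """2D FIR filter: weighted sum over each pixel's 3x3 neighbourhood.

    The kernel is symmetric, so convolution and correlation coincide here.
    Border pixels are copied unchanged (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * img[y + ky - 1][x + kx - 1]
            out[y][x] = min(255, max(0, acc >> shift))  # normalise and clamp
    return out


def gaussian3x3(img: Image) -> Image:
    return convolve3x3(img, GAUSS_3X3, shift=4)


# Usage: a single bright pixel is spread over its neighbours.
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 160
for row in gaussian3x3(impulse):
    print(row)
```

With the impulse of 160, the centre becomes 40, its edge neighbours 20 and its corner neighbours 10. A flat image passes through unchanged.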
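Besides the noise filter and the edge detector, a block of intermediate logic coordinates the Nios II data and control paths with the timing of the filter blocks. This intermediate hardware fabric follows the Avalon interface protocol [12]. The interface cannot be modeled in Simulink; instead it is inserted into the system as a Verilog file. Our Avalon implementation consists of three parts: a 16-bit data input and output path, the corresponding read and write control signals, and a control interface. The control interface selects either the intermediate output of the Gaussian filter or the output of the edge detector. Data entering and leaving the logic blocks pass through FIFO registers. Each received image frame is stored in an external SDRAM buffer and converted into a 16-bit data stream suitable for Nios II operations. Nios II code is discussed in Sections 5 and 6.

Incoming pixels are processed by a simple two-dimensional digital finite impulse response (FIR) convolution filter. The filter operates on the grayscale intensities of each pixel's neighbors within a 3×3 region. The line-buffer schematic is shown in Fig. 2.
Fig. 2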
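We assume an image size of 640×480 pixels. The line-buffer circuit is implemented in the same way for both filters. If the frame size changes, the circuit must be redesigned and recompiled. The number of delay blocks depends on the kernel size, while the delay-line depth depends on the number of pixels per line. Deep delay lines are implemented in the dedicated RAM blocks of the FPGA chip and therefore do not consume logic elements. Figure 3 shows, from left to right, the original image, the Gaussian-filtered image and the edge-filtered image.
Fig. 3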
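The delay-line idea can be modeled in software to show how a 3×3 neighborhood emerges from a pixel stream arriving in raster order. In this sketch the function name `stream_windows` is hypothetical. Two delay lines whose depth equals the line width play the role of the z^-640 blocks for a 640-pixel line. Three short shift registers play the role of the z^-1 blocks. In this particular arrangement the window centre lags the newest pixel by one line and one pixel.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple


def stream_windows(pixels: Iterable[int], width: int) -> Iterator[Tuple[int, int, List[List[int]]]]:
    """Model of two z^-width delay lines feeding three rows of z^-1 registers.

    Pixels arrive one at a time in raster order. Whenever a complete 3x3 window
    lies inside the image, yield (row, col, window) with (row, col) its centre.
    """
    line1 = deque([0] * width, maxlen=width)  # oldest entry = pixel one line back
    line2 = deque([0] * width, maxlen=width)  # oldest entry = pixel two lines back
    window = [[0, 0, 0] for _ in range(3)]    # three taps per row (z^-1 chain)
    for i, p in enumerate(pixels):
        above2, above1 = line2[0], line1[0]   # delay-line outputs
        line2.append(above1)                  # line 1 output cascades into line 2
        line1.append(p)
        for k, v in enumerate((above2, above1, p)):
            window[k] = window[k][1:] + [v]   # shift the newest pixel in on the right
        row, col = divmod(i, width)
        if row >= 2 and col >= 2:             # window is inside the image, no wrap
            yield row - 1, col - 1, [r[:] for r in window]


# Usage: a 5-pixel-wide, 4-line test frame whose pixel values are their indices.
frame = list(range(5 * 4))
for row, col, win in stream_windows(frame, width=5):
    print(row, col, win)
```

Because only two line-deep buffers and nine registers are kept, the storage grows with the line width and not with the frame height. This matches the point that a change in frame width changes the delay-line depth.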
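4. Embedded System Design
The co-processor blocks described above were implemented as components of a Nios II system. Here the Nios II processor controls the data stream. This kind of design is often the basis of industrial and academic projects. Once installed in the synthesis software, Nios II becomes a library component in Quartus.

DSP Builder converts the model-based design into HDL code that can be combined with other hardware components. The synthesis software recognizes the filter as an SOPC module, so it can easily be integrated with Nios II. The Nios II soft core and the other modules form a complete system. The system includes external memory controllers, DMA channels and a custom IP core for high-speed USB communication. A VGA controller can output the final result to a screen. Such peripheral functions can be obtained as open-source IP cores or as evaluation IP cores from third-party companies.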
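High-speed USB 2.0 connectivity is added to the FPGA motherboard through an extension board from System Level Solutions (SLS) Corporation. The board plugs into any Altera motherboard through the Santa Cruz peripheral connector. This daughter card provides a USB 2.0 transceiver based on the CY7C68000 PHY, and a UTMI-compliant IP core integrates the USB controller function into the Nios II system. Section 8 evaluates the practical performance of the IP core. Figure 4 shows the FPGA design flow, and Fig. 5 shows the FPGA development board and the image acquisition hardware.
Fig. 4 FPGA design flow
Fig. 5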
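5. Nios Soft-Core Programming
After the Nios system is configured, its code is downloaded. The Nios code, written in C, has a twofold purpose:
(a) It controls hardware operations, such as DMA transfers between hardware units. It also provides a programming interface for handling data channels through API commands such as "open", "read", "write" and "close".
(b) It allows the system to perform simple software operations on the input data instead of using dedicated hardware. For example, Nios instruction code can convert image arrays into suitable one-dimensional data streams.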
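The image-to-stream conversion mentioned in (b) can be sketched in a few lines. The sketch is written in Python for readability; it is not Nios C code. The packing of two 8-bit pixels into each 16-bit word, low byte first, is a hypothetical layout chosen for illustration. The text states only that frames become a 16-bit data stream.

```python
from typing import List


def image_to_stream16(img: List[List[int]]) -> List[int]:
    """Flatten a grayscale image row by row and pack pixel pairs into 16-bit words.

    Hypothetical layout: even-indexed pixel in the low byte, odd-indexed in the high byte.
    """
    flat = [p & 0xFF for row in img for p in row]  # raster-scan order, 8 bits per pixel
    if len(flat) % 2:
        flat.append(0)                              # pad to a whole number of words
    return [flat[i] | (flat[i + 1] << 8) for i in range(0, len(flat), 2)]


def stream16_to_image(words: List[int], width: int, height: int) -> List[List[int]]:
    """Inverse conversion, e.g. for checking data returned by the board."""
    flat: List[int] = []
    for w in words:
        flat.extend((w & 0xFF, (w >> 8) & 0xFF))
    flat = flat[: width * height]                   # drop padding
    return [flat[r * width:(r + 1) * width] for r in range(height)]


words = image_to_stream16([[1, 2, 3], [4, 5, 6]])
print([hex(w) for w in words])
```

Under this layout a 640×480 frame becomes 153,600 16-bit words. Keeping the inverse function next to the packer makes a round-trip check easy when data come back over USB.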
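6. Activity Flow
The combined software and hardware activity of the mixed architecture can be outlined as follows:
(a) An image stream travels from the host computer to the FPGA motherboard over the high-speed USB 2.0 serial bus. The next section describes the application programming interface used for data input and output over USB.
(b) Embedded DMA operations transfer data from memory into the Nios data path and, through the Avalon interface, into the hardware logic.
(c) The data stream is processed by the hardware accelerator.
(d) After filtering, the output of the hardware logic is written back to memory by DMA.
(e) The final result is output to the VGA digital-to-analog channel, a peripheral of the Nios processor that supports DMA transfers. However, not every development board has a digital-to-analog converter for a VGA port. An alternative is to send the data back to the host computer over USB for further processing into a plain image.
Note that this design is not merely a black box for one specific application. It represents a design methodology that can be customized for a wide range of applications.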
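7. Interface Design and Application
The PC-based application software and vision system are suitable for a variety of industrial applications. The host runs Windows XP on a Pentium 4 processor and is equipped with a high-speed USB 2.0 controller and an NI 1408 PCI frame grabber. The host application is a LabVIEW virtual instrument that controls image acquisition and performs initial image processing. Figure 6 shows the LabVIEW control panel on the PC.
Fig. 6 LabVIEW control interface
The frame grabber supports up to five industrial cameras performing different tasks. In our system a CCD camera captures full 640×480 black-and-white frames, which are then reduced to 320×240. The smaller frames carry less data and are easier to transfer continuously. The LabVIEW host program communicates with the USB interface through API functions and a dynamic link library (DLL). An advantage of LabVIEW is its integrated image processing platform, which can manipulate or preprocess image data quickly. Once the FPGA board has received a complete image array, the system passes the image to the filter. After filtering, it sends the data to the buffer module of the VGA controller.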
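8. System Performance Evaluation
Using the image capture setup described above, we sent test data to measure USB receive and transmit performance between the PC and the FPGA board. The payload exchanged between the host and the target board was 307,200 bytes. With the Nios II HAL driver and evaluation version 1.2 of the USB IP core, the receive rate reached 65 Mbit/s and the transmit rate 80 Mbit/s. The full-speed transfer efficiency is stated as 9 s.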
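The measured rates translate into per-frame transfer times with simple arithmetic. The sketch below makes three assumptions: one byte per pixel (so 307,200 bytes is one 640×480 frame), 1 Mbit = 10^6 bits, and no protocol overhead. Under these assumptions a 640×480 frame needs about 37.8 ms at 65 Mbit/s and about 30.7 ms at 80 Mbit/s.

```python
def transfer_time_ms(payload_bytes: int, rate_mbit_s: float) -> float:
    """Ideal time to move a payload at a sustained rate (1 Mbit = 10**6 bits, no overhead)."""
    return payload_bytes * 8 / (rate_mbit_s * 1e6) * 1e3


FRAME_BYTES = 640 * 480  # 307,200 bytes: one 8-bit grayscale 640x480 frame

for mode, rate in (("receive", 65.0), ("transmit", 80.0)):
    t = transfer_time_ms(FRAME_BYTES, rate)
    print(f"{mode:8s} {rate:4.0f} Mbit/s: {t:5.2f} ms per frame, "
          f"at most {1000 / t:4.1f} frames/s from the link alone")
```

These figures are upper bounds set by the link alone. Frame grabbing, serialization, DMA and processing add further delay.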
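9. Comparison with Other Systems
We now compare the system with other image processing solutions in terms of performance and flexibility. To obtain comparison data, we built alternative solutions and ran a series of experiments. We designed filter variants of different computational complexity. We then compared our results with those obtained on a computer with a Pentium 4 processor and 512 MB of RAM; the results are shown in Fig. 7.
Fig. 7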
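10. Conclusion
This paper presents a design that combines a host computer with an FPGA and studies the image processing performance of the resulting system. The design represents a methodology that can be applied to a wide range of custom applications. It is based on a programmable FPGA device with an embedded Nios processor.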
Design and evaluation of a hardware/software FPGA-based system for fast image processing
rosa,*,asb
Abstract
We evaluate the performance of a hardware/software architecture designed to perform a wide range of image processing tasks. The system architecture is based on hardware featuring a Field Programmable Gate Array (FPGA) and a host computer. A LabVIEW host application controls a frame grabber and an industrial camera. It is used to capture and exchange video data with the hardware co-processor via a high-speed USB2.0 channel, implemented with an embedded macrocell. The FPGA accelerator is based on an Altera Cyclone II chip. It is designed as a system-on-a-programmable-chip (SOPC) with the help of an embedded Nios-II processor. The SOPC system integrates the CPU, external and on-chip memory, the communication channel and typical image filters. Measured transfer rates over the communication channel and processing times for the implemented hardware/software logic are reported for various frame sizes. A comparison with other solutions is given and a range of applications is also discussed.
Keywords: Hardware/software co-design; Image processing; FPGA; Embedded processor
1. Introduction
The traditional hardware implementation of image processing uses Digital Signal Processors (DSPs) or Application Specific Integrated Circuits (ASICs). However, the growing need for faster and more cost-effective systems triggers a shift to Field Programmable Gate Arrays (FPGAs), where the inherent parallelism results in better performance [1,2]. Some applications require real-time processing, like video or television signal processing or real-time trajectory generation for a robotic manipulator. Their specifications are very strict and are better met when implemented in hardware [3–5]. Computationally demanding functions like convolution filters, motion estimators, two-dimensional Discrete Cosine Transforms (2D DCTs) and Fast Fourier Transforms (FFTs) are better optimized when targeted on FPGAs [6,7]. Features like embedded hardware multipliers, an increased number of memory blocks and system-on-a-chip integration enable video applications in FPGAs that can outperform conventional DSP designs [2,8].
On the other hand, solutions to a number of imaging problems are more flexible when implemented in software rather than in hardware. This holds especially when they are not computationally demanding or when they need to be executed […]. […], some hardware components are hard to re-design and transfer onto an FPGA board from scratch when they are […] components are frame grabbers and multiple-camera systems already installed as part of an imaging application or other robotic control equipment.
Following the above considerations, we conclude that it is often necessary to integrate two kinds of components. The first are components of an already installed computer-based imaging application dedicated to some automation system. The second are FPGA-based accelerators that exploit […]. A critical need arises for an embedded software/hardware interface that allows high-bandwidth communication between the host application and the hardware accelerators.
In this paper we apply and evaluate the performance of an example mixed hardware/software design. On one side is a host computer running a National Instruments (NI) LabVIEW imaging application, equipped with a camera and a frame grabber. On the other side is an Altera FPGA board [9] running image filters and other system components. The communication channel transferring image data from the host computer to the hardware board is a high-speed USB2.0 port, implemented by means of an embedded macrocell. The various hardware parts and peripherals on the FPGA board are controlled and interconnected by an embedded Nios-II processor. As a result of this evaluation, one can explore the range of applications suitable for a host/co-processor architecture that includes an embedded Nios-II processor and uses a USB2.0 communication channel.
In the following, we first give a short account of the tools we used for system design. We then present an overview of the particular image filtering application we embedded in the FPGA chip for the evaluation of the host/co-processor system. We then describe the modular interconnection of different system parts and […] the speed and frame-size limits […]. Finally, we compare our mixed host/co-processor USB-based design with other architectures and other communication media.
2. Design tools overview
The design of a DSP system with FPGAs often uses both high-level algorithm development tools and hardware description languages (HDLs). It can also make use of third-party intellectual property (IP) cores implementing typical DSP functions or high-speed communication protocols [1]. In our application we use model-based design tools like The MathWorks Simulink (based on MathWorks' MATLAB) with the libraries of Altera's DSP Builder. DSP Builder uses model design to produce and synthesize HDL code. This code can then be integrated with other hardware design files within a synthesis tool such as Quartus II. In the present work, we designed image filter components using DSP Builder libraries. The resulting blocks were integrated with the rest of the system in Quartus' System-On-a-Programmable-Chip (SOPC) Builder.
SOPC Builder is a tool that integrates an embedded software processor like Altera's Nios-II with external hardware logic and custom or standard peripherals. The result is an overall system, often called a System-On-a-Programmable-Chip (SOPC). SOPC Builder provides an interface fabric that connects the Nios-II processing path with embedded and external memory, the filter co-processors, other peripherals and the communication channels to the host computer. Nios-II applications were written in ANSI C. They were compiled and downloaded to the FPGA board by means of Altera's Nios II Integrated Development Environment (IDE), a tool dedicated to assembling […]. The purpose of Nios-II applications is to control processing and data streaming between the components of the system and its […]. On the host side, one may develop a control application with LabVIEW software by National Instruments Corporation [10]. LabVIEW provides a very flexible platform for image acquisition, image processing and industrial control.
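3. Modeling and implementation of the filter design
The main target of this work is to evaluate the performance of a host/co-processor architecture that includes an embedded Nios-II processor. The architecture uses a communication channel between host and hardware board, such as a USB2.0 channel. The task logic performed by the embedded accelerator can be any […]. For this purpose we built a typical image-processing application targeting the FPGA co-processor, consisting of a noise filter and an edge detector. Noise reduction and edge detection are two elementary processes required for most machine vision applications. Examples include object recognition, medical imaging, lane detection in next-generation automotive technology, people tracking and control systems. We model noise and edge filtering using the Altera DSP Builder libraries in Simulink. An example of this procedure can be found in [11]. Noise reduction is applied with a Gaussian 3×3 kernel, while edge detection is designed with a typical Prewitt or Sobel filter. These functions can be applied in combination, with noise reduction followed by edge detection. Fig. 1 shows the design block diagram of the filters.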
Apart from the noise and edge filter blocks, there is also a block for the intermediate logic. This logic sits between the Nios-II data and control paths and coordinates the timing of the filter blocks. This intermediate hardware fabric follows a specific protocol referred to as the Avalon interface [12]. This interface cannot be modeled in the Simulink environment and is instead inserted in the system as a Verilog file. Examples implementing the Avalon protocol can be found in Altera reference designs and technical reports [13].

In brief, our Avalon implementation consists of three parts: a 16-bit data input and output path, the appropriate Read and Write control signals, and a control interface. The control interface selects between the intermediate output of the Gauss filter and the output of the edge detector. Data input and output to and from the task logic blocks is implemented through FIFO registers. When the hardware board receives an image frame, it loads the frame into an external SDRAM memory buffer and converts it into an appropriate 16-bit data stream […]. Data transfer between external memory buffers and the Nios-II data bus is achieved through Direct Memory Access (DMA) operations. These are controlled by appropriate instruction code for the Nios-II soft processor. The Nios-II code flow for this system is discussed in Sections 5 and 6.
(Fig. 1)
Incoming pixels are processed by means of a simple 2D digital Finite Impulse Response (FIR) filter convolution kernel. The kernel works on the grayscale intensities of each pixel's neighbors in a 3×3 region. Image lines are buffered through delay lines producing primitive 3×3 […]. A z^-1 delay block produces a neighboring pixel in the same scan line, while a z^-640 delay block produces the […]. We assume an image size of 640×480 pixels. The line-buffer circuit is implemented in the same manner for both filters. Image resolution is incorporated in the line-buffer design, so if a change in frame size is required we must redesign and recompile the circuit. The number of delay blocks depends on the size of the convolution kernel, while the delay-line depth depends on the number of pixels per line. The incoming pixel is at the center of the mask and the line buffers […]. Delay lines with considerable depth are implemented as dedicated RAM blocks in the FPGA chip and do not consume logic elements.
(Fig. 2)
After line buffering, pipelined adders and embedded multipliers calculate the […]. Fig. 3 shows the model design for the implementation of the 3×3 […]. As shown in Fig. 3, model-based design transfers the necessary arithmetic into a parallel digital […]. […]-consuming calculations, such as multiplications, are implemented using dedicated multipliers available in medium-scale Altera FPGAs, like the Cyclone II chip.
(Fig. 3)
When the two filters work in combination, the output of the Gaussian kernel is input to a 3×3 […], the kernel pixels are […] edge detector […]. Fig. 4 shows the model design […]. For simplicity, we combine horizontal and vertical edge detection filtering by simply adding the corresponding […]. A binary image is produced by thresholding the result by means of a […]. Fig. 6 shows an input and the successive outputs of the hardware co-processor for a 640×480 pixel image.
(Fig. 4)
(Fig. 5)
(Fig. 6)
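The combination of horizontal and vertical edge filtering followed by thresholding can be sketched in software as below. The sketch makes two assumptions that the text does not confirm. First, the two responses are combined as |Gx| + |Gy|. Second, the threshold is the illustrative value 128. Prewitt kernels are used; Sobel kernels differ only in the centre weights.

```python
from typing import List

Image = List[List[int]]

# Prewitt kernels; a Sobel variant would use weight 2 on the centre row/column.
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]  # responds to vertical edges
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]  # responds to horizontal edges


def response(img: Image, k: List[List[int]], y: int, x: int) -> int:
    """Weighted 3x3 sum (correlation). The sign flip of true convolution
    does not matter because only the absolute value is used below."""
    return sum(k[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))


def edge_binary(img: Image, threshold: int = 128) -> Image:
    """Add |horizontal| and |vertical| responses, then threshold to a binary image.

    threshold=128 is an arbitrary illustrative value; border pixels are set to 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            mag = abs(response(img, PREWITT_X, y, x)) + abs(response(img, PREWITT_Y, y, x))
            out[y][x] = 255 if mag >= threshold else 0
    return out


# Usage: a vertical step from 0 to 200 is marked on both sides of the step.
step = [[0, 0, 0, 200, 200, 200] for _ in range(5)]
print(edge_binary(step)[2])  # [0, 0, 255, 255, 0, 0]
```

Adding absolute values instead of computing a square root keeps the arithmetic to adders and comparators, which suits a hardware pipeline. This is a common choice, but the text does not say which combination the authors used.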
4. Embedded system design
The co-processor parts described above were implemented as components of an […]. The Nios-II software CPU, which is used here for data streaming control, is often the basis for industrial as well as academic projects. It can be used in its evaluation version along with the tools for assembling and downloading instruction code [14]. Once installed within the synthesis software, the Nios processor becomes a library component in Quartus' SOPC Builder tool.
DSP Builder converts the model-based design into HDL code appropriate for integration with other hardware components. The synthesis software readily recognizes the filter as a System-on-a-Programmable-Chip (SOPC) module, so the filter can be integrated with Nios-II. The modules necessary for a complete system are the Nios-II soft processor, external memory controllers, DMA channels, and a custom IP peripheral for high-speed USB communication. A VGA controller can be added to display the final result on a screen. Such peripheral functions are available as open-source custom HDL Intellectual Property (IP) or as evaluation cores from Altera or third-party companies.
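USB2.0 high-speed connectivity is added to the FPGA board by means of a daughter card from System Level Solutions (SLS) Corporation [15]. The card can be added to any Altera board via the Santa Cruz peripheral connector. The daughter card provides a USB 2.0 transceiver based on the CY7C68000 PHY. A USB 2.0 IP core compliant with the Transceiver Macrocell Interface (UTMI) integrates the USB controller function into the Nios-II system. We used evaluation versions of the IP core and present practical transmit and receive rates in Section 8. In the communication via the USB channel, the FPGA chip and its embedded Nios-II processor always act as the slave device. The host computer is always the master device.
A block diagram of the hardware system implemented on the FPGA board is shown in Fig. 7, together with the channel to the host computer.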
The embedded system is assembled with the SOPC Builder tool of the synthesis software, by selecting library components and defining their […]. After being generated by SOPC Builder, the system can be inserted […]. […] additional components that are necessary are PLLs for Nios and memory […]. We synthesize and simulate the design with the tools described in Section 2, and we target a Cyclone II 2C35F672C6 FPGA chip […]. The board, along with the peripheral USB2.0 extension, is shown in Fig. 8. The board also features external memory and several typical peripheral circuits.
5. Nios-II programming
After device configuration, we assemble and download instruction code for the Nios-II processor, following the specifications of the Nios Hardware Abstraction Layer (HAL). The interested reader can find details in the reference manuals for Nios-II and its dedicated Integrated Development Environment (Nios-II IDE) [16]. Nios code is originally written in ANSI C and has a twofold purpose:
(a) It controls hardware operations, like DMA transfers between hardware units. It also offers a programming interface for handling data channels, with API commands like "open", "read", "write" and "close".
(b) It allows the system to perform simple software operations on the input data instead of using dedicated hardware stages for such processing. For example, Nios instruction code can be used to convert image arrays into appropriate one-dimensional data streams.
The code for the particular design opens the USB port and waits to read the transmitted pixel array. It then controls pixel streaming from input to final output, as described in the following section.
6. Activity flow
The flow of software and hardware activity that supports a system designed according to the mixed architecture of this work can be outlined as follows:
(a) An image stream is transferred from the host computer to the hardware board for processing through the high-speed USB2.0 communication channel. As described in the next section, the host application communicates with the USB port using Application Programming Interface (API) calls for data input and output.
(b) Following Nios processor instructions, embedded DMA hardware operations transfer data from memory to the Nios data path and into the hardware task logic by means of the Avalon interface.
(c) The data stream is processed through the hardware accelerator.
(d) DMA operations stream the filtered output from the hardware task logic back to external memory (see Fig. 7).
(e) The final result is output to the on-board VGA digital-to-analog channel, which is a peripheral of the Nios-II processor and is supported by DMA transfers. However, a development board does not always include a digital-to-analog converter for a VGA port. A possible alternative is to channel the resulting binary image back to the host computer via the USB connection for further processing.
The assessment of the design performance presented in Section 8 includes all the above […]. It is important to note that the above system is not merely a black-box custom design implemented for a particular application. Rather, it represents a design methodology that can be used for a wide range of applications. The technical details of the operations abstracted above are well documented for users of the particular development platform, so that every aspect of the design can be tested […]. A detailed analysis of the system development techniques is, however, outside the scope of the present article.
7. PC-based setup and application
On the host side, a vision system is implemented that is appropriate for a spectrum of industrial applications. The host computer is a Windows XP Pentium IV machine featuring a high-speed USB 2.0 controller and an NI 1408 PCI frame grabber. The host application is a LabVIEW virtual instrument (VI) that controls the frame grabber and performs initial processing of the captured images (see Fig. 9). The frame grabber can support up to five industrial cameras performing different tasks. In our system the VI application captures a full frame of 640×480 pixels from a CCIR analog B&W CCD camera (Samsung BW-2302). It may then reduce the image resolution down to 320×240 pixels by applying an image extraction function. It can produce even smaller frames in order to trade size for transfer rate. The LabVIEW host application communicates with the USB interface using API functions and a Dynamic Link Library (DLL), and transmits a numeric array […]. An advantage of building the host application on LabVIEW is its VISION library, which can perform fast manipulation of image data or preprocessing of the image if necessary. Once the hardware board has received the complete image array, the system loads the image data into the filter co-processor. It then sends the output to the VGA controller via SRAM memory (see Fig. 7). Alternatively, the output can be sent back to the host application by means of a "write" […]. The procedure is repeated with the next captured frame.
8. Evaluation of the system performance
The above setup was tested with various capture rates and frame resolutions. Using several test versions of the SLS USB2.0 megafunction, we measured receive (rx) and transmit (tx) throughput between the host PC and the target hardware for a payload of 307,200 bytes. We found that, with the Nios II HAL driver, the latest evaluation version 1.2 of the IP core achieves 65 Mbit/s in receive mode and about 80 Mbit/s in transmit mode in high-speed operation. However, the data transfer rate from the host computer to the hardware board is only one factor that affects the performance of an image processing system built on a host/co-processor architecture. There are also software issues to take into account, both at the host end and at the Nios-II embedded processor end. For example, frame capturing and serialization prior to transferring […]. On the other hand, the Nios-II embedded processor controls the data flow following instruction code downloaded to embedded memory, […], the overall […]. LabVIEW software handles array structures efficiently and also provides image grabbing and vision tools that reduce processing time on the host. Besides the above software limitations, there are also hardware issues related to an integrated System-on-a-Programmable-Chip, such as the time needed for Direct Memory Access (DMA) […]. The performance of the hardware board is divided into two parts. The first is the processing rate of the hardware filter co-processor. The second is the performance of the rest of the system, such as external memory […]. The second factor adds an overhead depending […]. We evaluate the performance of the proposed architecture by taking into account, and measuring where possible, the following delay times:
(a) Time to grab an image frame and serialize it.
(b) Transfer time over the USB2.0 channel.
(c) Nominal time needed by the co-processor filter to process the image frame.
(d) Overhead time needed for data flow and control in the integrated hardware system.
Table 1 summarizes the response times of the above operations and reports […]. […] the system results in a practical and stable video rate of 20 frames per second at an image resolution of 320×240 […]. […], larger frames with dimensions of 640×480 pixels can be transferred […]. With the board clocked at 100 MHz, the hardware image filter processes a 640×480 pixel frame in a minimum of 3.1 ms, while DMA transfers and other control flow add […]. […] frame is transferred from the host computer to the hardware board in approximately 26 ms. Other possible latencies include the data manipulation performed by the Nios-II instruction code, which depends on the frame size.
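The 3.1 ms figure is consistent with a fully pipelined filter that accepts one pixel per clock. At 100 MHz, the 307,200 pixels of a 640×480 frame take 3.072 ms, and a 320×240 frame takes 0.768 ms. The sketch below reproduces this arithmetic under the one-pixel-per-clock assumption. It also adds a hypothetical helper, `sequential_fps`, that turns per-frame delays such as (a) to (d) into a frame rate when the stages run one after another.

```python
def filter_time_ms(width: int, height: int, clock_hz: float = 100e6,
                   pixels_per_clock: int = 1) -> float:
    """Nominal frame time of a fully pipelined filter (assumed: one pixel per clock)."""
    return width * height / (pixels_per_clock * clock_hz) * 1e3


def sequential_fps(*delays_ms: float) -> float:
    """Frame rate if the per-frame delays (a)-(d) occur strictly one after another.

    This is a conservative estimate: overlapping the stages would raise the rate.
    """
    return 1000.0 / sum(delays_ms)


print(f"{filter_time_ms(640, 480):.3f} ms")  # 3.072 ms, the ~3.1 ms quoted above
print(f"{filter_time_ms(320, 240):.3f} ms")  # 0.768 ms
```

Even the larger frame costs the filter only a few milliseconds. This supports the observation that transfer and control, rather than the hardware filter, dominate the per-frame delay.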
In conclusion, delays are divided between software and hardware procedures […]. The hardware filter co-processor does not substantially […] and Nios-II processing time and transfer rates over the communication channel can be a bottleneck for large frames. Since open-core USB2.0 embedded technology is still developing, one may expect […]. Table 2 summarizes the hardware requirements for the overall system we implemented in this study (see also Fig. 7). The table reports the number of logic elements and memory bits needed to implement the functions presented above in a medium-sized FPGA chip, the Altera Cyclone II […]. The chip's capacity is 33,000 logic elements […]. The table also reports clock frequencies for the soft […]; these clocks were implemented by means of two Phase-Locked Loops (PLLs).
(Fig. 7)
9. Comparison with other systems
In the following, we compare the system with other image processing solutions […]. To establish a numerical comparison between the presented architecture and a purely software solution, we synthesized designs with variations of the basic filter operations. These variations introduce different degrees of computational complexity. We also implemented in hardware a Sum of Absolute Differences (SAD) algorithm for dense depth map calculations. SAD is based on correlation operations and is much more computationally intensive than simple filtering. We compared the results with the same algorithms implemented in software and running on a Pentium IV processor at 3 GHz with 512 MB RAM. […] Software results were obtained by programming the corresponding procedures analytically in NI's LabVIEW language […]. We used pre-captured AVI video sequences and processed each frame with the same […]. Resolutions of 320×240 and 640×480 pixels were assessed separately, since they need different transfer times to our hardware co-processor.
Table 3 shows frame rates for processing video files of both resolutions as a […]. In the first column of Table 3, the […]. The computational complexity in the second column is measured as multiples of n·N. Here n is the size of the convolution kernel, equal to 3×3, and N is the total of 320×240 […]. For the SAD algorithm, complexity is taken to be the product n·N·D [17]. In this product, n is the size of the comparison window, N is the number of image pixels and D is the total disparity range in pixels. Our hardware version of this algorithm […]. The third and fourth columns of Table 3 give the total frame rate for each operation when implemented in pure software and in our proposed architecture […]. For the SAD algorithm, the frames per second (fps) value refers to a single transmitted frame instead of the stereo pair, in order to have a result […] stereo.
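A minimal software sketch of SAD block matching for a dense disparity map makes the n·N·D complexity measure concrete. It is a plain textbook version: the left image is the reference, the window is square, and the search covers all D disparities exhaustively. It is not the hardware design evaluated in Table 3, and the function name and parameters are illustrative.

```python
from typing import List

Image = List[List[int]]


def sad_disparity(left: Image, right: Image, max_disp: int, half: int = 1) -> Image:
    """Dense disparity map by Sum of Absolute Differences block matching.

    The left image is the reference. For every pixel, a (2*half+1)^2 window is
    compared with windows shifted d = 0 .. max_disp-1 pixels to the left in the
    right image, and the d with the lowest SAD is kept. Border pixels get 0.
    The work is roughly n * N * D: n window pixels, N image pixels,
    D = max_disp candidate disparities.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_d, best_cost = 0, None
            for d in range(max_disp):
                if x - d - half < 0:
                    break  # the shifted window would leave the right image
                cost = sum(
                    abs(left[y + j][x + i] - right[y + j][x + i - d])
                    for j in range(-half, half + 1)
                    for i in range(-half, half + 1)
                )
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp


# Usage: the left view is the right view shifted by 2 pixels, so disparities
# away from the left border come out as 2.
H, W = 5, 10
right = [[x * x + y for x in range(W)] for y in range(H)]
left = [[(x - 2) ** 2 + y if x >= 2 else 0 for x in range(W)] for y in range(H)]
print(sad_disparity(left, right, max_disp=4)[2])
```

The inner loops show why SAD is much heavier than a 3×3 filter. Every output pixel repeats a full window comparison for each of the D candidate shifts.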
Table 3 Frame-rate comparison between PC-based implementations and host/co-processor designs for a number of image processing operations
[…] images, the whole cycle of transmitting, processing and projecting on a monitor screen runs at 14 fps (resolution 320×240) with the current version of the SLS USB2.0 megafunction.
The last column of Table 3 gives the total logic elements required for mapping […]. The required hardware resources increase with the number of stages in the hardware pipeline. However, resources do not necessarily increase with computational complexity. The reason is that the same hardware structure can in principle serve both small and large frames. The same parallel computations are needed per pixel and per clock cycle as pixels stream into the system pipeline. Frames of increased resolution only need deeper line buffers, as shown in Fig. 2. Hence, […] on-chip memory requirements for the implementations of Table 3 vary between 185,000 and 230,000 memory bits out of a total of 480,[…]. The implementations in Table 3 require a nominal processing time of 0.77 ms for every 320×240 frame and 3.1 ms for every 640×480 frame (see also Table 1). SAD needs twice that time in […]. Additional times for data transfer and control are as discussed in Section 8.
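A rough estimate shows why larger frames only need deeper line buffers. The estimate assumes 8-bit pixels and two cascaded 3×3 stages (Gaussian smoothing, then the edge detector). Each stage stores two complete scan lines, so the ideal delay-line storage grows linearly with the line width. The helper below is hypothetical, and it ignores the fact that a real device allocates whole RAM blocks.

```python
def line_buffer_bits(width: int, kernel_size: int = 3, bits_per_pixel: int = 8,
                     stages: int = 1) -> int:
    """Ideal delay-line storage for `stages` cascaded k x k filters.

    Each k x k stage must hold k-1 complete scan lines. A real device allocates
    whole RAM blocks, so the synthesised figure can be higher.
    """
    return stages * (kernel_size - 1) * width * bits_per_pixel


for width in (320, 640):
    # two stages: Gaussian smoothing followed by the edge detector
    print(width, line_buffer_bits(width, stages=2))  # 320 -> 10240, 640 -> 20480
```

Under these assumptions, doubling the line width doubles only this term. The parallel arithmetic per pixel stays the same.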
In Fig. 10 the processing results of a video file with frame resolution 320×240 […] hardware/[…]. In Fig. 10b, our hardware processing results are shown with squares and refer to a frame resolution of 640×480 […]. […], at the lower part of both diagrams, represent frame rates achieved by pure software running on the host computer.
It can be seen that the performance of the host PC without the hardware co-processor is good for simple processes but, as expected, it falls […]. The performance of our mixed hardware/software architecture stays almost constant as complexity increases, because in all cases processing is completed in a single pass […] decrease is due to pre-processing steps, according to the type of operation. However, the performance of our system depends on frame size, as shown […]. The reduction of frame rate at increased resolution is clearly attributable to the increased payload. This payload affects both USB transfers and DMA transactions from board memory to the FPGA system.
The performance of the software system can be enhanced by better optimizing the software algorithms or by adopting faster processors in the future. However, the gap will persist as computational demands increase.
The bottom line of the above analysis is that the suitability of the […] is better justified for algorithms with increased computational complexity, where the PC alone […]. Using a moderate frame resolution helps our design respond at a decent rate, which is, however, still below true video rate.
Depending on the system requirements, other approaches may also be used. With a video input card like the Altera DC Video-TVP5146N, one can input video data to the FPGA directly. The FPGA can then perform a range of image functions, like simple filtering, color processing, recasting to different video formats and compression [18]. In this case one can bypass the computer-based frame grabber and avoid the use of a […]. The principal cost of this approach is the loss of the flexibility inherent in the software part of the system, together with the loss of PC-based control over the cameras and frame grabber. The result, however, is an all-hardware system that excels in […]. The processing rate is in principle limited by the frame grabber: 30 fps for an NTSC composite input signal and 25 fps for PAL video input.
Similarly, processing at video rate may be obtained with the ASIC approach, which usually incorporates all parts in an all-hardware custom system. Image processing implementations of medium and high complexity appear in the literature, and most of them can manipulate images in real time [19,20]. They can reach 30 frames per second, or even more in some custom systems. In some cases they can also incorporate a CCD interface [21]. However, ASIC implementations are complex and expensive systems that suffer in terms of flexibility. They have a slow design cycle and are certainly far from the plug-and-play approach adopted in the present article.
Pure DSP-based designs support thousands of MIPS and are comparable in performance with desktop PCs, so […]. Being purely software-based, they are much more […]. […], the control flow part of an […]. […], they can easily incorporate video input and output channels with little additional hardware [22]. However, just as with ordinary serial computers, they cannot perform computationally demanding tasks at a high video rate [23].
The use of FPGAs in hardware implementations can partly bridge the flexibility gap, because FPGAs are re-programmable and have a rapid design cycle. Compared with DSPs, systems implemented using FPGAs are more efficient […]. However, when building with FPGAs, we also have to implement protocols and interfaces for video I/O in hardware.
In the host/co-processor architecture applied in our design, the FPGA absorbs the sections of the algorithm that are costly in software, while Nios-II executes flow […]. Nios is a medium-performance processor, but it can still help greatly by implementing the control path and simple pre-processing and post-processing […]. […] system, an application can further be partitioned between […]. The host part interfaces with a […] and can also execute the parts of the algorithm that are software-friendly.
Hardware task logic operates as an accelerator for any computationally demanding function and can be added to the SOPC system as a library component. It can also be re-used, shared and imported into any SOPC system that follows the […]. These characteristics testify to the high level of flexibility inherent in this design.
As the discussion above shows, the presented system performs better than a general-purpose computer or a pure DSP design for heavy computational tasks, […] less efficient than a pure hardware system, since such systems can perform at 30 frames per […]. However, it is more flexible and has a clear potential to evolve. […], this particular IP core complies with Altera's OpenCorePlus program [24], which offers […]. […] the proliferation of USB2.0 communication ports in modern computers and video equipment makes the proposed channel a suitable commodity choice for a hardware/software architecture based on an FPGA co-processor.
10. Conclusions
We presented the design of a general hardware/software system based on a host computer and an FPGA co-processor […]. The system performance was studied for image processing algorithms. The system represents a design methodology that can be used for a wide range of custom applications. It is based on a hardware board featuring a medium-sized FPGA device, configured as a System On a Programmable Chip with a Nios-II software processor in the system […]. […] hardware components are local and on-chip memory and a UTMI-compliant macrocell by SLS Corporation, which allows fast communication with a host computer.
An integrated system controlled by a Nios-II software processor provides […]. An optimum choice of hardware/software features may accelerate the system […]. Partitioning a machine vision application between a host computer and a hardware co-processor may solve a number of problems. It can be appealing in academic or industrial environments where compactness and portability of the system are not of primary importance.
The processing and transfer rates reported in Tables 1 and 3 can be sufficient […]. The system is better justified for very demanding image processing computations […]. Examples of such tasks are point pattern matching for biometric applications [25] and block matching for finding correspondences between the images of a stereo pair [17]. The latter application usually depends on multiple cameras and has significant computational demands. Cameras and frame grabbers are better controlled by a computer with custom software, since image grabbing and transmitting protocols are not easily transferred […]. On the other hand, the real-time calculation of disparities from two or more cameras is better performed in hardware [26–28].
Alternatively, the Nios-II-based host/co-processor architecture can be used for implementing and testing a variety of academic image-processing designs in hardware […]. It can also be used to manage image processing in a number of educational applications and student machine-vision exercises [29].
Future work can include more tests of complex algorithms. It can also include comparisons of the presented architecture with future high-throughput Ethernet or PCI channels for host/[…], it is certainly meaningful to […]. Newer generations of USB2.0 macrocells and Nios soft processors can further increase the range of applications of the proposed design.
Acknowledgements
This work was conducted in communication with the support team of System Level Solutions Corporation, who provided successive evaluation versions of the USB2.0 megafunction. Special thanks go to software manager Tejas Vaghela for his constant support and advice.