Implementing the GBT data transmission protocol in FPGAs

S. Baron (a), J.P. Cachemiche (b), F. Marin (b), P. Moreira (a), C. Soos (a)

(a) CERN, 1211 Geneva 23, Switzerland
(b) CPPM, 13288 Marseille, France

Abstract

The GBT chip [1] is a radiation tolerant ASIC that can be used to implement bidirectional multipurpose 4.8 Gb/s optical links for high-energy physics experiments. It will be proposed to the LHC experiments for combined transmission of physics data, trigger, timing, fast and slow control and monitoring. Although radiation hardness is required on detectors, it is not necessary for the electronics located in the counting rooms, where the GBT functionality can be realized using Commercial Off-The-Shelf (COTS) components. This paper describes an efficient physical implementation of the GBT protocol on Altera and Xilinx FPGA devices, with source code developed in Verilog and VHDL. The current platforms are based on Altera StratixIIGX and Xilinx Virtex5.

We will start by describing the GBT protocol implementation in detail. We will then focus on practical solutions to make Stratix and Virtex transceivers match the custom encoding scheme chosen for the GBT.

Results will be presented on single channel occupancy, resource optimization when using several channels in a chip and bit error rate measurements, with the only aim to demonstrate the ability of both Altera and Xilinx FPGAs to host such a protocol with excellent performance. Finally, information will be given on how to use the available source code and how to integrate GBT functionality into custom FPGA applications.

I. GBT PROTOCOL PRESENTATION

A. Introduction

The general architecture of a high-speed optical link implemented using the GBT chipset and FPGA is represented in Figure 1.

Figure 1: GBT optical link implementation scheme

Logically the link provides three “distinct” data paths for Timing and Trigger, Data Acquisition and Slow Control. In practice, the three logical paths do not need to be physically different and are merged. The aim of such an architecture is to allow a single link to be used simultaneously for data readout, timing and trigger distribution, and experiment control. The link establishes a point-to-point optical bidirectional connection (using two optical fibers).

The GBT chipset [2] is under development to match such an architecture. It targets high-speed data transmission (3.36 Gb/s of user bandwidth) between the detectors and the counting room.

As illustrated in Figure 1, such a link is implemented by a combination of custom and Commercial Off-The-Shelf (COTS) components. In the counting room, the receivers and transmitters will be implemented using COTS components and FPGAs while, embedded on the detectors, the receivers and transmitters will be implemented by the GBT chipset and Versatile Link components [3]. This architecture clearly distinguishes between the specificities of the counting room and front-end electronics: the on-detector front-end electronics work in a hostile radiation environment requiring custom-made components, while the counting room electronics operate in a radiation-free environment allowing the use of COTS components. Moreover, the availability of FPGAs with up to 48 hard-IP serializer blocks would allow concentrating data from several front-end sources into a single module in the counting room, facilitating data merging and leading to compact systems.

The study presented below will focus on proving the usability of COTS components and FPGAs to implement the GBT protocol in counting rooms [4].

B. GBT Protocol

Due to the beam luminosity planned for SLHC, the high-speed data transmission link will be exposed to high Single Event Upset (SEU) rates. SEUs are a major impairment to error-free data transmission. To deal with this, the GBT line coding adopts a robust error correction scheme that will allow correction of bursts of errors caused by SEUs.
A significant fraction of the channel bandwidth must therefore be assigned to the transmission of a Forward Error Correction (FEC) code.

The code to be used must provide a high level of protection, since transmission errors can occur as bursts and not only as isolated events. Because of this, a double interleaved Reed-Solomon error-correcting code was chosen. The code is built by first scrambling the input data to provide DC-balancing of the frame, and then interleaving two Reed-Solomon encoded words (using 4-bit symbols), each capable of correcting a double symbol error (Figure 2). Interleaving raises the correction capability to bursts of up to 4 consecutive symbols.

Figure 2: GBT encoding scheme

In practice, this means that a sequence of up to 16 consecutive incorrectly received bits can be corrected. This correction technique requires an extra field of 32 bits in the frame to protect the 88 transmitted bits (including data, header and slow control), resulting in a code efficiency of 73%.

The frame (sketched in Figure 3) is composed of 120 bits that are transmitted during a single SLHC bunch crossing interval (25 ns), resulting in a line data rate of 4.8 Gb/s. Of these, 4 bits are used for the frame Header (H) and 32 are used for Forward Error Correction (FEC). This leaves a total of 84 bits free for data transmission, corresponding to a user bandwidth of 3.36 Gb/s. Of these 84 bits, 4 are always reserved for the Slow Control (SC) field (see ‘Slow control channel’) and 80 bits are reserved for data (D) transmission. Among the 4 bits of slow control, 2 are reserved for GBT control and 2 are user defined. The use of the ‘D’ field is not pre-assigned: it can be used indifferently for Data Acquisition (DAQ), Timing Trigger & Control (TTC) or Experiment Control (EC) applications [5][6].

This makes 2 + 80 = 82 bits of data available to the user in a frame of 120 bits, giving a payload of 68%.

Figure 3: GBT frame

II. GBT PROTOCOL IMPLEMENTATION IN FPGAS

A. FPGAs constraints

As in the GBT ASIC, the DC balance of the data transmitted over the optical fiber is ensured in the FPGA by scrambling the data contained in the SC and D fields. For forward error correction, the scrambled data and header are Reed-Solomon encoded before nibble interleaving and serialization. The line encoding/decoding process is represented in Figure 4.

No problem was encountered in configuring the hard-IP transceivers of the FPGAs, as the portability of this protocol was carefully checked during the specification phase of the GBT. In particular, the ability of Stratix and Virtex transceivers to transmit 120 bits at a frequency of 40 MHz was ensured at that time.

Figure 4: Block diagram of a full GBT link in an FPGA

However, these transceivers provide neither specific encoding schemes like the one we selected nor flexible word alignment functions. This is mainly due to the fact that they target the most common telecommunication protocols. We thus had to implement in user logic all the encoding and decoding blocks, as well as a customized pattern detection and word alignment block (see Figure 5).

Figure 5: Frame alignment procedure in FPGAs

At power on or after a loss of synchronization, the receiver starts a frame-lock acquisition cycle to find the frame boundaries, that is, to acquire frame synchronization.

The frame-lock acquisition mode operates as follows. In the StratixIIGX, the transceiver hard-IP word aligner block cannot be bypassed. It is thus configured to lock on an arbitrary pattern. Once completed, the process is not repeated, except at power on or upon a command from the pattern detection state machine. For all the other devices, we bypass the word aligner inside the transceiver.

The parallel output of the receiver feeds the custom pattern detection and word aligner blocks, which take control of the frame alignment process: for each received frame, the four bits in the header position are checked for header validity. Because the header pattern can also be found in the data, 23 consecutive frames must contain a valid header before the frame is considered locked (the probability of false boundary detection is then reduced below 10^-20, as demonstrated in [5]). Otherwise, the frame is shifted by one bit and the valid header checking procedure is repeated. After frame-lock is achieved, the
receiver switches to the frame-tracking mode, which maintains frame synchronization even in the presence of headers corrupted by noise or single event upsets.

The frame-tracking mode must thus be tolerant to a low rate of invalid headers. Provided that frame synchronization is maintained, the detection of a corrupted header will not introduce a transmission error, since the header field is also protected by the forward error correction code transmitted with the frame. A corrupted header will thus be corrected and properly identified by the Reed-Solomon decoder. The frame-tracking mode operates as follows: after a successful frame-lock acquisition cycle has been executed, the receiver enters the frame-tracking mode. In this mode the receiver strives to maintain frame synchronization. It checks the validity of the headers and counts the number of invalid headers received in 64 consecutive frames after the first invalid header has been detected. If the number of invalid headers received in these 64 consecutive frames is greater than 4, the receiver re-enters the frame-lock acquisition mode. Otherwise the receiver resets the count of invalid headers and remains in the frame-tracking mode.

B. Resource Usage

The full serializer-deserializer, as described above, was implemented both in a StratixIIGX and in a Virtex5FXT. Besides the transceivers and PLLs, which consume no logic resources since they are hard-IP blocks, a single link consumes 1542 ALMs (Adaptive Logic Modules) for the StratixII and 1481 Slices for the Virtex5.

Tables 1 and 2 show the number of links which can be implemented in a selection of StratixIIGX and Virtex5FXT devices, taking into account the available transceiver blocks and logic elements.

Table 1: Maximum GBT links for StratixIIGX

Table 2: Maximum GBT links for Virtex5FXT

Differences of occupancy between Table 1 and Table 2 emphasize the different policies used by Altera and Xilinx in terms of the ratio between the number of logic cells and the number of transceivers. However, these numbers should be used with care. The occupancy of logic cells is clearly too high if one tries to use all the available transceivers of a chip for GBT protocol implementation. This is tempered by the fact that a design using GBT links will not dedicate all its links to GBT transceivers: some links must be left to output processed data and therefore occupancy will be lower. However, as a back-end FPGA has to dedicate a significant part of its logic to other tasks, optimization of the resources used by the decoding block is a must.

C. Optimization

An analysis of the resource usage per block for a single link (see Figure 6) quickly shows that more than half of the logic elements are used by the Reed-Solomon decoder.

Figure 6: % of ALMs/Slices of one GBT link used by each functional block

It was thus natural to study optimization schemes, particularly for designs hosting several GBT links in one device. The first possibility is to share one decoder block between several links, multiplying its operating frequency by the same factor. The Reed-Solomon decoding algorithm is a large combinatorial circuit, and the maximum operating frequency achieved was 134 MHz for the StratixIIGX, applying all the timing optimization constraints available. This allowed one decoder block to be shared between 3 links.

An analysis of the resources used for 12 links implemented in a StratixIIGX type EP2SGX90 was carried out with and without optimization.

Figure 7: Effect of optimization by 3 on 12 links implemented on an EP2SGX90

As shown in Figure 7, the device occupancy dropped from 51% of ALMs to 40% thanks to the optimization. Indeed, the fraction of the resources used by the decoder blocks dropped from 28% down to 10%. However, 7% of new logic elements were added due to the resource-consuming multiplexers and de-multiplexers required to share the decoder.

This implementation was tested on a PCIe SIIGX development kit with three optimized links using loopback cables mounted on the HSMC connectors. It ran for several days without a single error being detected.
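The sharing factor and the occupancy figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical and not part of the GBT-FPGA source code, and it assumes that a time-multiplexed decoder must process one frame per link per 40 MHz frame period, so that the sharing factor is the integer part of the ratio between the decoder frequency and the frame rate.

```python
import math

FRAME_CLOCK_MHZ = 40      # one 120-bit frame every 25 ns (from the text)
DECODER_FMAX_MHZ = 134    # maximum decoder frequency quoted for the StratixIIGX


def max_links_per_decoder(decoder_fmax_mhz, frame_clock_mhz=FRAME_CLOCK_MHZ):
    """Largest number of links one time-multiplexed decoder can serve (assumed model)."""
    return math.floor(decoder_fmax_mhz / frame_clock_mhz)


def decoders_needed(n_links, links_per_decoder):
    """Number of shared decoder blocks required to serve n_links links."""
    return math.ceil(n_links / links_per_decoder)


def occupancy_after_sharing(total_pct, decoders_before_pct, decoders_after_pct, mux_pct):
    """Occupancy once dedicated decoders are replaced by shared ones plus multiplexers."""
    return total_pct - decoders_before_pct + decoders_after_pct + mux_pct


if __name__ == "__main__":
    factor = max_links_per_decoder(DECODER_FMAX_MHZ)
    print("links per decoder:", factor)
    print("decoder blocks for 12 links:", decoders_needed(12, factor))
    # Percentages of ALMs quoted for the 12-link EP2SGX90 design
    print("ALM occupancy after sharing (%):", occupancy_after_sharing(51, 28, 10, 7))
```

With the quoted numbers this gives 3 links per decoder, 4 decoder blocks for 12 links, and 51 - 28 + 10 + 7 = 40% occupancy, consistent with the figures reported above.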

The next step for optimization could be to pipeline the decoder algorithm to increase the clock frequency. The drawback of this implementation, besides its complexity, is that it increases the decoding latency.

III. MEASUREMENTS

A. Setups and equipment

Two evaluation boards were used to implement the GBT protocol on FPGAs: the ML523 (hosting a Virtex5FXT type XC5VFX100T) for Xilinx [8] and the PCIe SIIGX Development Kit (hosting a StratixIIGX type EP2SGX90) for Altera [7], both powered by the power supply provided in the kit (see Figure 8).

Figure 8: Evaluation platforms. ML523 from Xilinx (left) and PCIe SIIGX from Altera (right)

The reference clock was generated by the J-BERT 4903A from Agilent and distributed over differential SMA cables.

For all the qualitative measurements, the very same SFP 1300 nm optical transceiver module from MergeOptics was used (mounted and dismounted from one board to another). The optical patch cords were 50 cm long.

The jitter measurements were made at the optical receiver level with the LeCroy SDA100G sampling scope equipped with a 10 GHz optical sampling head.

B. Platform testing

Various platforms and technologies were tested by implementing the GBT protocol in both the Altera and Xilinx chips presented above. As described in Figure 9, a generator instantiated in the Virtex5 sent parallel data (80 bits @ 40 MHz, either constant words or flying bits) to the encoder and serializer.

Figure 9: Test setup based on two platforms

The signal (which looks like a PRBS due to the scrambling) was transmitted by an SFP to the receiver in the StratixII over a short optical fibre (A). After full decoding (and remote monitoring of the decoded values), the data were encoded again, serialized and transmitted using another SFP module and an optical fibre (B) back to the Virtex5, where they were decoded and compared to the generated words.

We let the system run for several hours without counting any error. Besides providing us with an opportunity to implement the GBT protocol on both main technologies, this test allowed us to check the compatibility between the GBT ASIC protocol and its VHDL translation: the Virtex5 had the Reed-Solomon encoder and decoder implemented in Verilog (the direct copy of the GBT protocol implementation in the ASIC), whereas the StratixII encoder and decoder were implemented in VHDL.

C. Jitter performances

Using the same setup, we measured the jitter at the output of the two optical fibres A and B in Figure 9. For each of the results below, the SFP module transmitting the optical signal was the same (it was successively mounted on the A and B fibres to test the Xilinx and Altera devices).

As presented in Figure 10, the Xilinx and Altera platforms both showed excellent performance. The eyes were wide open, with a total jitter of the order of 80 ps peak-to-peak and 5 ps RMS.

Figure 10: Eye diagrams for Xilinx Virtex5 FXT (left) and Altera StratixIIGX (right)

IV. SOURCE CODE AVAILABILITY

Reference designs of the GBT protocol will be made available before the end of 2009 for both Altera and Xilinx FPGAs. They will be presented as a firmware-based starter kit, downloadable on request via the CERN SVN repository. This starter kit will include the source code for both implementations and, as much as possible, for various types of devices (StratixII and IV GX, and Virtex5 and 6 FXT) and various flavors of optimization. It will also include documentation.

Basic support will be provided on how to use and optimize the implementation.

V. CONCLUSION

With this study, we proved that the GBT protocol can indeed be implemented successfully in both Altera and Xilinx FPGA chips. The scheme proposed in the introduction where
GBT ASICs are used in detector areas and FPGAs in counting rooms is thus a valid prospect, and the developed code will now be used as a basis to test the GBT serdes chip once it becomes available.

A firmware-based starter kit will be made available to users upon request. It will be progressively completed by several implementation flavors for StratixIV and Virtex6, and new optimization techniques such as a pipelined Reed-Solomon decoder are being considered.

VI. REFERENCES

[1] GBT project home page:
[2] P. Moreira, GBTx BTX/Specifications/gbtxSpecsV1.2.pdf
[3] F. Vasey, “Versatile Link”, ACES 2009 workshop, 3-4 March 2009, CERN, y?contribId 37&sessionId 22&confId 47853
[4] GBT-FPGA project web
[5] G. Papotti, “Architectural studies of a radiation-hard transceiver ASIC in 0.13 µm CMOS for digital optical links in high energy physics applications”, PhD thesis, University of Parma, Italy, January
[6] G. Papotti, “An Error-Correcting Line Code for a HEP Rad-Hard Multi-GigaBit Optical Link”, 12th Workshop for LHC and future Experiments (LECC 2006), Valencia, Spain, 25-29 September 2006, 30&sessionId 19&confId 574
[7] Documentation on Altera PCI express Development Kit, StratixIIGX era/kitpciexpress s2gx.html
[8] Documentation on Xilinx Virtex5 FXT ML523 RocketIO GTX characterization -V5-ML52XUNI-G.htm
