This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/ republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

# 9.5 Gbit/s 20 Channel 1:8 DEMUX for a Coherent Optical Receiver DSPU ASIC Input Interface

Vijitha R. Herath<sup>1, 3</sup>, Olaf Adamczyk<sup>2, 3</sup>, Ralf Peveling<sup>3</sup>, Christian Wödehoff<sup>3</sup>, Reinhold Noé<sup>3</sup>

1. now with University of Peradeniya, Sri Lanka 2. now with Nokia Siemens Networks, Germany 3. University of Paderborn, Germany

E-mail: vijitha@ee.pdn.ac.lk

*Abstract*-This paper presents the design of an input interface to the CMOS DSPU of a coherent optical receiver for polarization-multiplexed QPSK. The interface consists of a 20 channel 1:8 DEMUX. Source coupled FET logic (SCFL) and CMOS logic were used in the design. The interface converts a 10 Gbit/s input data rate to 1.25 Gbit/s at the output. It gives an open eye diagram at the output up to an input data rate of 9.5 Gbit/s. The system consumes 7.9 mW per channel per Gb/s. A 130 nm bulk CMOS technology was used in the design.

## I. INTRODUCTION

Coherent quadrature phase shift keying (QPSK) with polarization multiplex allows existing 10 Gb/s transmission systems to be upgraded for operation at 40 Gb/s. Coherent receivers can equalize chromatic and polarization mode dispersion in the electrical domain. Efficient implementation of this receiver concept requires new algorithms for carrier and data recovery, and polarization control schemes that allow the use of distributed feedback (DFB) lasers [1]. FPGA-based real-time implementations have achieved data rates up to 10 Gb/s [2-4]. An integrated approach that combines four analog-to-digital converters (ADCs) and a digital signal processing unit (DSPU) in a single application-specific integrated circuit (ASIC) was presented in [5]. It has 20M gates and achieves a data rate of 40 Gb/s. A modular solution, in contrast, allows the ADCs and the DSPU to be fabricated in the best-suited device technologies [6]. This approach enables maximum performance of each component and avoids the high complexity and difficult power management of a single-chip approach.

The ADCs convert the received polarization-multiplexed QPSK signal to the digital domain. The nominal data rate at each ADC output is 10 Gb/s. The 20 ADC output channels are fed to the CMOS ASIC. The DSPU in the ASIC performs carrier and data recovery and polarization control. However, the DSPU input can operate only up to 1.25 Gb/s. Furthermore, the DSPU uses CMOS logic, while the ASIC input uses source coupled FET logic (SCFL). An input interface is therefore needed that down-converts the 10 Gb/s input data rate to a 1.25 Gb/s output data rate and converts the SCFL input signals to CMOS logic signals. This paper presents this input interface of the CMOS DSPU ASIC.

The paper is organized as follows. The next section discusses the DEMUX interface architecture. After that, the clock control and clock distribution issues related to the interface are discussed. The fourth section discusses the layout and packaging of the system. Finally, test results of the interface are presented.

Fig. 1. Interface architecture

## II. INTERFACE ARCHITECTURE

As shown in Fig. 1, the interface includes four parallel 5 bit 1:8 DEMUX blocks, synchronized by a clock distribution system. The 1:8 DEMUX has a tree architecture (Fig. 2) and therefore consists of three cascaded 1:2 DEMUX stages. The 1:2 DEMUX block at the input operates at the highest speed (10 Gb/s). The second DEMUX stage operates at 5 Gb/s. To minimize power consumption, the second-stage current is designed to be lower than the first-stage current. In the tree architecture, the number of 1:2 DEMUX blocks doubles with each stage, so the cells must be made smaller to keep the layout area small. The first and second DEMUX stages use source coupled FET logic (SCFL). Fig. 3 (top) shows the schematic of an SCFL D-latch, which is the fundamental building block of the DEMUX. The input speed of the third DEMUX stage is 2.5 Gb/s. Because CMOS logic can operate at this speed and consumes less power and area, the third DEMUX stage is designed in CMOS logic. Fig. 3 (bottom) shows the schematic of a CMOS D-latch. Between stages two and three there is an SCFL-to-CMOS logic converter circuit. The system operates with a half-rate input clock. The input clock signal is fed directly to the first DEMUX stage through a sequence of buffers. The input clock frequency is then divided by two, and the resulting signal is sent through phase shifter and bit rotator circuits (clock control) before it is fed to the second DEMUX stage. The clock frequency is divided by two a second time, and the result is sent through another clock control module and converted to CMOS logic before it is fed to the third DEMUX stage.

Fig. 2. A 1:8 DEMUX architecture
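The tree structure described in this section can be illustrated with a purely behavioral sketch. It is not the circuit itself: the function names (`demux_1to2`, `demux_1to8`, `stage_rates`) and the even/odd bit assignment are assumptions made for illustration, and latch timing, SCFL/CMOS levels and the clock controllers are ignored. Only the stage data rates and the half-rate clocking are taken from the text.

```python
def demux_1to2(bits):
    """Split one serial stream into two half-rate streams (even and odd bit positions)."""
    return bits[0::2], bits[1::2]


def demux_1to8(bits):
    """Behavioral 1:8 tree DEMUX built from three cascaded 1:2 stages (1 -> 2 -> 4 -> 8)."""
    streams = [list(bits)]
    for _stage in range(3):
        next_streams = []
        for stream in streams:
            even, odd = demux_1to2(stream)
            next_streams.extend([even, odd])
        streams = next_streams
    return streams


def stage_rates(input_rate_gbps=10.0, stages=3):
    """Per-stage input data rate, half-rate clock frequency and number of 1:2 blocks."""
    table = []
    rate = input_rate_gbps
    for stage in range(1, stages + 1):
        table.append({
            "stage": stage,
            "blocks": 2 ** (stage - 1),   # number of 1:2 blocks doubles per stage
            "data_in_gbps": rate,
            "clock_ghz": rate / 2,        # half-rate clocking
        })
        rate /= 2
    return table


# Label 16 consecutive input bits by their index to see where each one ends up.
outputs = demux_1to8(list(range(16)))
rates = stage_rates()
output_rate_gbps = rates[-1]["data_in_gbps"] / 2
```

With this even/odd convention, each of the eight outputs carries every eighth input bit, and the outputs appear in bit-reversed order (0, 4, 2, 6, 1, 5, 3, 7). The rate table reproduces the 10, 5 and 2.5 Gb/s stage inputs and the 1.25 Gb/s output stated in the text.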

## III. CLOCK DISTRIBUTION SYSTEM

Clock distribution is a critical issue in large-area circuits. Even though the tree-type architecture provides excellent high-speed performance, it makes it difficult to establish the appropriate phase relationship between the different clock frequencies that time the data in each stage [7]. The main requirement of the clock distribution system is to keep every 1:2 DEMUX on the same level synchronous, and to keep each level synchronous with the following DEMUX stages, which run at reduced clock and data rates. Furthermore, the clock distribution has to drive every latch in each DEMUX block without degrading the quality of the clock signal. The first requirement is met by keeping both the clock line length and the number of buffer stages identical for all DEMUX blocks on the same level. Meander lines are used to equalize line delays wherever necessary. To maintain a clock signal that switches correctly, no single buffer drives more than five subsequent buffers (fan-out). Inter-stage synchronization is achieved by clock controllers (Fig. 4 (top)) placed after each clock divider stage.
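The fan-out rule above fixes how deep a balanced buffer tree must be for a given number of clocked endpoints. The following minimal sketch estimates that depth. The function name `buffer_levels` and the load counts used in the examples are hypothetical and are not taken from the design; only the limit of five loads per buffer comes from the text.

```python
def buffer_levels(num_loads, max_fanout=5):
    """Smallest number of buffer levels in a balanced tree such that one root
    buffer reaches num_loads endpoints and no buffer drives more than
    max_fanout inputs. Every endpoint then sees the same number of buffers."""
    if num_loads < 1 or max_fanout < 2:
        raise ValueError("need num_loads >= 1 and max_fanout >= 2")
    levels = 1
    reach = max_fanout          # endpoints reachable with a single buffer
    while reach < num_loads:
        reach *= max_fanout     # each extra level multiplies the reach
        levels += 1
    return levels


# Hypothetical load counts, for illustration only.
examples = {n: buffer_levels(n) for n in (5, 6, 25, 26)}
```

Because the tree is balanced, every endpoint sits behind the same number of buffer stages, which is the condition the text uses to keep all DEMUX blocks on one level synchronous.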

Fig. 3. SCFL D-latch (top) and CMOS D-latch (bottom) schematics

The clock controller consists of a phase shifter (Fig. 4 (middle)) followed by an EXOR bit rotator (Fig. 4 (bottom)). The phase shifter uses either the in-phase or the 90° delayed output signal of the clock divider, depending on the control signal. The EXOR circuit selects between the in-phase and the inverted clock signal. It is therefore possible to obtain 0°, 90°, 180°, and 270° phase-shifted versions of the input clock signal. DC control signals (ground/open) are applied at the control inputs [8].
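The way two binary controls yield four clock phases can be checked with a small behavioral sketch. It models the clock as an ideal square wave sampled four times per period. The names `delay`, `clock_controller` and `phase_of`, and the boolean arguments `select_quadrature` and `invert`, are illustrative assumptions and do not correspond to signal names in the design.

```python
BASE = [1, 1, 0, 0]  # one period of an ideal square clock, four samples per period


def delay(wave, quarter_periods):
    """Delay a periodic waveform by a whole number of quarter periods (90° steps)."""
    k = quarter_periods % len(wave)
    return wave[-k:] + wave[:-k] if k else list(wave)


def clock_controller(wave, select_quadrature, invert):
    """Phase shifter (0° or 90° delayed clock) followed by an EXOR stage that
    passes the clock unchanged (invert=False) or inverted (invert=True)."""
    shifted = delay(wave, 1) if select_quadrature else list(wave)
    return [bit ^ int(invert) for bit in shifted]


def phase_of(wave, reference=BASE):
    """Return the delay of wave relative to reference in degrees."""
    for quarter in range(4):
        if delay(reference, quarter) == wave:
            return 90 * quarter
    raise ValueError("waveform is not a shifted copy of the reference")


# Enumerate all four control settings and the phase each one produces.
phases = {
    (q, i): phase_of(clock_controller(BASE, q, i))
    for q in (False, True)
    for i in (False, True)
}
```

Inverting a 50 % duty-cycle clock is equivalent to a 180° shift, so the four control combinations map to 0°, 90°, 180° and 270°.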

The final clock distribution system is shown in Fig. 5. The clock signals for all three 1:2 DEMUX stages are distributed symmetrically to each polarization component (P1 and P2) of the 1:8 DEMUX block. The clock signals are routed between the in-phase (I) and quadrature (Q) DEMUX blocks of each polarization in order to keep the clock line lengths equal. There are several buffer stages before a clock signal reaches a 1:8 DEMUX block. Within each 1:8 DEMUX block, clock signals are distributed using the tree architecture. There is an SCFL-to-CMOS logic interface between the second and third DEMUX stages. Clock controllers in a given stage operate identically, with the same control signals. The last stage of the clock distribution tree uses CMOS logic. The last DEMUX stage is therefore considerably smaller than a comparable SCFL implementation, which also relaxes the loading on the clock distribution.

## IV. LAYOUT DESIGN

Fig. 6 (left) shows the layout of the DSPU, which includes the 1:8 DEMUX. A 130 nm bulk CMOS technology from STMicroelectronics (France) is used for fabrication. This technology has six copper metal layers ( $\varepsilon_r = 4.2$ ). Layer 1, the bottom layer, is used for signal interconnects. Layer 5 is used for ground and layer 6 is used for the power supply.



Fig. 4. Clock controller architecture (top), phase shifter schematic (middle), and bit rotator schematic (bottom)

Fig. 5. Clock tree architecture

The total device count of the DEMUX chip is 11838, including 9890 transistors. The chip area is 4.1 mm × 4.1 mm. The DEMUX layout could be made more compact, but this layout topology was selected because of practical issues in connecting the 4×5 differential 10 Gbit/s data paths from the ADCs to the DEMUX. The clock distribution system is symmetric with respect to the horizontal center line of the chip core. Input data and clock lines are 50 $\Omega$ microstrip transmission lines. Output lines operate with CMOS logic. The IR drop, the voltage drop across the conducting metal of the power and ground lines in the chip, is a critical design issue. Here I is the current through the supply and ground metal conductors, and R is the path resistance of these conductors. The IR drop reduces the actual voltage across the devices and thereby degrades performance. The widths of the supply (V<sub>DD</sub>) and ground metal layers are carefully calculated so that they have the necessary current carrying capacity and the IR drop stays within the accepted range. The metal area connected to the gate of a MOSFET is carefully chosen to avoid the antenna effect. To avoid the antenna effect, a diode can be connected to the metal near the gate. This antenna diode protects the gate oxide from excessive charge build-up. Antenna diodes are inserted adjacent to the gates wherever necessary [9]. Among other measures, each 1:8 DEMUX block is encircled by a guard ring (connecting the substrate to ground) in order to suppress the switching noise generated by the DSP block. The Calibre® design rule checker is used to check the layout for design rule violations. The DIVA® Layout vs. Schematic (LVS) checker is used for layout verification.

This full-custom DEMUX was connected to the input of the standard-cell DSPU, and both the DEMUXes and the DSPU are integrated on a single chip. The chip thus contains both SCFL (analog) blocks and CMOS logic (digital) blocks in a single device. The DSPU generates a lot of switching noise. Therefore the full-custom DEMUX and the standard-cell design were isolated as much as possible by placing substrate-to-ground contacts all around the full-custom circuit parts. The main circuitry of the DSPU fits into the center square in the middle of the layout. The 10 Gbit/s data inputs are routed from the top and the bottom of the chip and are passed on in demultiplexed form to the DSPU in the chip center, while the clock input comes from the side. There is an unavoidable overlap between the full-custom and standard-cell layouts. Fig. 6 (right) shows the die photograph.

Fig. 6. Layout of the DSPU (left), and the die photograph (right)

Fig. 7. Clock divider sensitivity curve and the clock output signal when the clock input frequency is 5 GHz

## V. RESULTS

The data inputs of the A/D converters were connected to the output signals generated by a 10 Gbit/s transmission system. In addition to the converted data, the A/D converters generate half-rate clock output signals. The A/D half-rate clock output and the DSPU clock input were connected through an adjustable delay line to compensate for any phase mismatch. The phase control switches (PC1 and PC2) were activated whenever necessary to achieve synchronization. The input data were generated using a $2^7-1$ pseudorandom bit sequence. The DEMUX output test terminal was connected to an oscilloscope to check the eye diagram.
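A $2^7-1$ pseudorandom bit sequence can be produced with a 7 bit linear feedback shift register. The sketch below is one possible generator, not the test equipment used here. The feedback polynomial $x^7 + x^6 + 1$ is an assumption (it is a commonly used choice for this sequence length), since the text does not state which polynomial was used, and the function name `prbs7` and the seed are likewise illustrative.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a 2^7-1 PRBS with a 7 bit Fibonacci LFSR.

    Assumed feedback polynomial: x^7 + x^6 + 1 (taps on register bits 7 and 6).
    The seed must be non-zero, otherwise the register stays stuck at zero.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of the two tap bits
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F       # shift the new bit in
    return out


# Two full periods, to check that the sequence repeats after 127 bits.
sequence = prbs7(2 * 127)
first_period = sequence[:127]
```

One period contains 127 bits, of which 64 are ones and 63 are zeros, which is the balance property of a maximal-length sequence.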

The clock frequency divider was tested using a single-ended AC-coupled signal generated by a synthesizer. The divide-by-four clock divider test output was connected to the oscilloscope. The input signal power was adjusted to obtain a proper test output. The tests showed that the divide-by-four clock divider operates for input frequencies between 0.1 GHz and 6.8 GHz, which covers the nominal operating range of the DEMUX. Fig. 7 shows the 1:4 clock divider sensitivity characteristics and the output clock signal for a 5 GHz input clock signal.

The measured current consumption of the SCFL block of the interface is about 750 mA (1.8 V supply). In simulation, the peak current drawn by the CMOS block of the interface is about 140 mA (1.2 V supply) at 9.5 Gbit/s, and the average current drawn from the 1.2 V supply is 68 mA. The total power consumption is about 1.5 W at the same data rate. This is equivalent to 75 mW per input data channel.
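The power figures above can be cross-checked with a few lines of arithmetic. The sketch uses only values quoted in the text; the variable names are illustrative, and the rounding of the total to 1.5 W follows the text rather than being derived.

```python
# Values quoted in the text.
SCFL_CURRENT_A = 0.750       # measured, 1.8 V supply
SCFL_SUPPLY_V = 1.8
CMOS_PEAK_CURRENT_A = 0.140  # simulated, 1.2 V supply
CMOS_AVG_CURRENT_A = 0.068   # simulated, 1.2 V supply
CMOS_SUPPLY_V = 1.2
CHANNELS = 20
DATA_RATE_GBPS = 9.5
QUOTED_TOTAL_W = 1.5

scfl_power_w = SCFL_CURRENT_A * SCFL_SUPPLY_V            # 1.35 W
cmos_avg_power_w = CMOS_AVG_CURRENT_A * CMOS_SUPPLY_V    # about 0.08 W
cmos_peak_power_w = CMOS_PEAK_CURRENT_A * CMOS_SUPPLY_V  # about 0.17 W

total_avg_w = scfl_power_w + cmos_avg_power_w    # about 1.43 W
total_peak_w = scfl_power_w + cmos_peak_power_w  # about 1.52 W

# Per-channel figures based on the quoted total of about 1.5 W.
per_channel_mw = QUOTED_TOTAL_W / CHANNELS * 1000
per_channel_per_gbps_mw = per_channel_mw / DATA_RATE_GBPS
```

The sum of the two supplies lies between about 1.43 W (average CMOS current) and 1.52 W (peak CMOS current), consistent with the quoted total of about 1.5 W. Dividing by 20 channels gives 75 mW per channel, and dividing again by 9.5 Gbit/s gives about 7.9 mW per channel per Gb/s, the figure given in the abstract.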

An open eye diagram is obtained at the output up to an input data rate of 9.5 Gbit/s when the DSPU is switched off. With the DSPU switched on, its switching noise significantly degrades the DEMUX performance. Fig. 8 shows the output eye diagrams for input data rates of 5 Gbit/s (top) and 9.5 Gbit/s (bottom).

Fig. 8. Output eye diagrams when the input data rate is 5 Gbit/s (top), and 9.5 Gbit/s (bottom)

## VI. DISCUSSIONS

This paper presents the design of a 9.5 Gbit/s 20 channel 1:8 DEMUX in 130 nm bulk CMOS technology. The DEMUX is used as the input interface of a CMOS DSPU. The design is optimized for power consumption and layout area while maintaining the required data rate. The total power consumption of the system is 1.5 W. The switching noise of the CMOS logic DSPU limits the performance of the DEMUX. Because the DEMUX is part of an ASIC, it is difficult to compare it with other existing DEMUXes.

## ACKNOWLEDGMENT

The authors would like to acknowledge support from the German Academic Exchange Service (DAAD) and the European Commission (EC).

## REFERENCES

- [1] R. Noé, "PLL-Free Synchronous QPSK Polarization Multiplex/Diversity Receiver Concept with Digital I&Q Baseband Processing", *IEEE Photon. Technol. Lett.*, Vol. 17, 2005, pp. 887-889.
- [2] T. Pfau, et al. "Polarization-Multiplexed 2.8 Gbit/s Synchronous QPSK Transmission with Real-Time Digital Polarization Tracking", *Proc. ECOC2007*, 8.3.3, Sept. 16-20, 2007, Berlin, Germany.
- [3] T. Pfau, et al. "Ultra-Fast Adaptive Digital Polarization Control in a Realtime Coherent Polarization-Multiplexed QPSK Receiver", *Proc. OFC/NFOEC'08*, OTuM3, Feb. 24-28, 2008, San Diego, CA, USA.
- [4] A. Leven, N. Kaneda, Y. Chen, "A real-time CMA-based 10 Gb/s polarization demultiplexing coherent receiver implemented in an FPGA", *Proc. OFC/NFOEC'08*, OTuO2, Feb. 24-28, 2008, San Diego, CA, USA.
- [5] H. Sun, K. Wu, K. Roberts, "Real-time measurements of a 40 Gb/s coherent system", *Optics Express*, Vol. 16, 2008, pp. 873-879.
- [6] V. Herath, et al. "Chipset for a Coherent Polarization Multiplexed QPSK Receiver", *Proc. OFC/NFOEC'09*, March 22-26, 2009.
- [7] E. Friedman, "High Performance Clock Distribution Networks", *Journal of VLSI Signal Processing*, Vol. 16, pp. 113-116, 1997.
- [8] Z. Lao, et al. "Si Bipolar 14 Gb/s 1:4-Demultiplexer IC for System Applications", *IEEE Journal of Solid-State Circuits*, Vol. 31, No. 1, pp. 54-59, Jan. 1996.
- [9] D-S. Cho, et al. "Efficient modeling techniques for IR drop analysis in ASIC designs", *ASIC/SOC Conference*, pp. 64-68, 1999.