<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>my tech blog &#187; Vivado</title>
	<atom:link href="http://billauer.se/blog/category/vivado/feed/" rel="self" type="application/rss+xml" />
	<link>https://billauer.se/blog</link>
	<description>Anything I found worthy to write down.</description>
	<lastBuildDate>Thu, 12 Mar 2026 11:36:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Vivado: Failed to install all user apps</title>
		<link>https://billauer.se/blog/2022/02/vivado-failed-to-install-user-apps/</link>
		<comments>https://billauer.se/blog/2022/02/vivado-failed-to-install-user-apps/#comments</comments>
		<pubDate>Tue, 22 Feb 2022 12:53:04 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Tcl]]></category>
		<category><![CDATA[Vivado]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6589</guid>
		<description><![CDATA[Every now and then Vivado whines with [Common 17-356] Failed to install all user apps. And every time I&#8217;m looking up how to solve this, I find that the command that fixes it is tclapp::reset_tclstore in the Tcl command window. And then quit Vivado, and start it again. Why? I&#8217;ll never know. [...]]]></description>
			<content:encoded><![CDATA[<p>Every now and then Vivado whines with</p>
<pre>[Common 17-356] Failed to install all user apps.
</pre>
<p>And every time I&#8217;m looking up how to solve this, I find that the command that fixes it is</p>
<pre>tclapp::reset_tclstore</pre>
<p>in the Tcl command window. And then quit Vivado, and start it again. Why? I&#8217;ll never know.</p>
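<p>For what it&#8217;s worth, this should be scriptable, so a broken GUI session doesn&#8217;t have to be involved at all. A sketch (I haven&#8217;t verified that batch mode behaves exactly like the Tcl console in this regard):</p>
<pre># reset_tclstore.tcl -- run with: vivado -mode batch -source reset_tclstore.tcl
tclapp::reset_tclstore</pre>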
<p>Just wanted it written down where I&#8217;ll be looking for it.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2022/02/vivado-failed-to-install-user-apps/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Critical Warnings after upgrading a PCIe block for Ultrascale+ on Vivado 2020.1</title>
		<link>https://billauer.se/blog/2021/06/pci-express-ultrascale-plus-vivado-upgrade/</link>
		<comments>https://billauer.se/blog/2021/06/pci-express-ultrascale-plus-vivado-upgrade/#comments</comments>
		<pubDate>Tue, 01 Jun 2021 11:06:00 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[PCI express]]></category>
		<category><![CDATA[Tcl]]></category>
		<category><![CDATA[Vivado]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6355</guid>
		<description><![CDATA[Introduction Checking Xillybus&#8217; bundle for Kintex Ultrascale+ on Vivado 2020.1, I got several critical warnings related to the PCIe block. As the bundle is intended to show how Xillybus&#8217; IP core is used for simplifying communication with the host, these warnings aren&#8217;t directly related, and yet they&#8217;re unacceptable. This bundle is designed to work with [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>Checking <a href="http://xillybus.com/pcie-download" target="_blank">Xillybus&#8217; bundle for Kintex Ultrascale+</a> on Vivado 2020.1, I got several critical warnings related to the PCIe block. As the bundle is intended to show how Xillybus&#8217; IP core is used for simplifying communication with the host, these warnings aren&#8217;t directly related, and yet they&#8217;re unacceptable.</p>
<p>This bundle is designed to work with Vivado 2017.3 and later: It sets up the project by virtue of a Tcl script, which among other things calls upgrade_ip for updating all IPs. Unfortunately, a bug in Vivado 2020.1 (and possibly other versions) causes the upgraded PCIe block to end up misconfigured.</p>
<p>This bug applies to Zynq Ultrascale+ as well, but curiously enough not to Virtex Ultrascale+. At least with my settings, there was no problem.</p>
<h3>The problem</h3>
<p>Having upgraded an UltraScale+ Integrated Block (PCIE4) for PCI Express IP from Vivado 2017.3 (or 2018.3) to Vivado 2020.1, I got several Critical Warnings. Three during synthesis:</p>
<pre>[Vivado 12-4739] create_clock:No valid object(s) found for '-objects [get_pins -filter REF_PIN_NAME=~TXOUTCLK -of_objects [get_cells -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GT*E4_CHANNEL_PRIM_INST}]]'. ["project/pcie_ip_block/source/ip_pcie4_uscale_plus_x0y0.xdc":127]
[Vivado 12-4739] get_clocks:No valid object(s) found for '--of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]'. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":63]
[Vivado 12-4739] get_clocks:No valid object(s) found for '--of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]'. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":64]</pre>
<p>and another seven during implementation:</p>
<pre>[Vivado 12-4739] create_clock:No valid object(s) found for '-objects [get_pins -filter REF_PIN_NAME=~TXOUTCLK -of_objects [get_cells -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GT*E4_CHANNEL_PRIM_INST}]]'. ["project/pcie_ip_block/source/ip_pcie4_uscale_plus_x0y0.xdc":127]
[Vivado 12-4739] set_clock_groups:No valid object(s) found for '-group [get_clocks -of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]]'. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":63]
[Vivado 12-4739] set_clock_groups:No valid object(s) found for '-group '. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":63]
[Vivado 12-4739] set_clock_groups:No valid object(s) found for '-group [get_clocks -of_objects [get_pins -hierarchical -filter {NAME =~ *gen_channel_container[1200].*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]]'. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":64]
[Vivado 12-4739] set_clock_groups:No valid object(s) found for '-group '. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":64]
[Vivado 12-5201] set_clock_groups: cannot set the clock group when only one non-empty group remains. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":63]
[Vivado 12-5201] set_clock_groups: cannot set the clock group when only one non-empty group remains. ["project/pcie_ip_block/synth/pcie_ip_block_late.xdc":64]</pre>
<p>The first warning in each group points at this line in ip_pcie4_uscale_plus_x0y0.xdc, which was automatically generated by the tools:</p>
<pre>create_clock -period 4.0 [get_pins -filter {REF_PIN_NAME=~TXOUTCLK} -of_objects [get_cells -hierarchical -filter {NAME =~ *<span style="color: #ff0000;"><strong>gen_channel_container[1200]</strong></span>.*gen_gtye4_channel_inst[3].GT*E4_CHANNEL_PRIM_INST}]]</pre>
<p>And the other at these two lines in pcie_ip_block_late.xdc, also generated by the tools:</p>
<pre>set_clock_groups -asynchronous -group [get_clocks -of_objects [get_ports sys_clk]] -group [get_clocks -of_objects [get_pins -hierarchical -filter {NAME =~ *<strong><span style="color: #ff0000;">gen_channel_container[1200]</span></strong>.*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]]
set_clock_groups -asynchronous -group [get_clocks -of_objects [get_pins -hierarchical -filter {NAME =~ *<span style="color: #ff0000;"><strong>gen_channel_container[1200]</strong></span>.*gen_gtye4_channel_inst[3].GTYE4_CHANNEL_PRIM_INST/TXOUTCLK}]] -group [get_clocks -of_objects [get_ports sys_clk]]</pre>
<p>So this is clearly about a reference to a non-existent logic cell supposedly named gen_channel_container[1200], and in particular that index, 1200, looks suspicious.</p>
<p>I would have been <em>relatively</em> fine with ignoring these warnings had it been just the set_clock_groups that failed, as these create false paths. If the design implements properly without these, it&#8217;s fine. But failing a create_clock command is serious, as this can leave paths unconstrained. I&#8217;m not sure if this is indeed the case, and it doesn&#8217;t matter all that much. One shouldn&#8217;t get used to ignoring critical warnings.</p>
<p>Looking at the .xci file for this PCIe block, it&#8217;s apparent that several changes were made to it while upgrading to 2020.1. Among those changes, these three lines were added:</p>
<pre>&lt;spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT"&gt;GTHE4_CHANNEL_<span style="color: #ff0000;"><strong>X49Y99</strong></span>&lt;/spirit:configurableElementValue&gt;
&lt;spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT_CONTAINER"&gt;<span style="color: #ff0000;"><strong>1200</strong></span>&lt;/spirit:configurableElementValue&gt;
&lt;spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT_QUAD_INX"&gt;3&lt;/spirit:configurableElementValue&gt;</pre>
<p>Also, somewhere else in the XCI file, this line was added:</p>
<pre>&lt;spirit:configurableElementValue spirit:referenceId="PARAM_VALUE.MASTER_GT"&gt;GTHE4_CHANNEL_<span style="color: #ff0000;"><strong>X49Y99</strong></span>&lt;/spirit:configurableElementValue&gt;</pre>
<p>So there&#8217;s a bug in the upgrading mechanism, which sets some internal parameter to select a nonexistent GT site.</p>
<h3>The manual fix (GUI)</h3>
<p>To rectify the wrong settings manually, open the settings of the PCIe block, and click the checkbox for &#8220;Enable GT Quad Selection&#8221; twice: once to uncheck it, and once to check it again. Make sure that the selected GT hasn&#8217;t changed.</p>
<p>Then it might be required to return some unrelated settings to their desired values. In particular, the PCI Device ID and similar attributes change to Xilinx&#8217; default as a result of this. It&#8217;s therefore recommended to make a copy of the XCI file before making this change, and then use a diff tool to compare the before and after files, looking for irrelevant changes. Given that this revert to default has been going on for so many years, it seems like Xilinx considers this a feature.</p>
<p>But this didn&#8217;t solve my problem, as the bundle needs to set itself correctly out of the box.</p>
<h3>Modifying the XCI file? (Not)</h3>
<p>The immediate thing to check was whether this problem applies to PCIe blocks that are created in Vivado 2020.1 from scratch inside a project which is set to target KCU116 (which is what the said Xillybus bundle targets). As expected, it doesn&#8217;t &#8212; this occurs just on upgraded IP blocks: With the project that was set up from scratch, the related lines in the XCI file read:</p>
<pre>&lt;spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT"&gt;<strong>GTYE4_CHANNEL_X0Y7</strong>&lt;/spirit:configurableElementValue&gt;
&lt;spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT_CONTAINER"&gt;<strong>1</strong>&lt;/spirit:configurableElementValue&gt;
&lt;spirit:configurableElementValue spirit:referenceId="MODELPARAM_VALUE.MASTER_GT_QUAD_INX"&gt;3&lt;/spirit:configurableElementValue&gt;</pre>
<p>and</p>
<pre>&lt;spirit:configurableElementValue spirit:referenceId="PARAM_VALUE.MASTER_GT"&gt;<strong>GTYE4_CHANNEL_X0Y7</strong>&lt;/spirit:configurableElementValue&gt;</pre>
<p>respectively. These are values that make sense.</p>
<p>With this information at hand, my first attempt to solve this was to add the four new lines to the old XCI file. This allowed using the XCI file with Vivado 2020.1 properly; however, synthesizing the PCIe block on older Vivado versions failed: As it turns out, all MODELPARAM_VALUE attributes become instantiation parameters for pcie_uplus_pcie4_uscale_core_top inside the PCIe block. However, looking at the source file (on 2020.1), these parameters are indeed defined (only in those generated in 2020.1), and yet they are unused, like many other instantiation parameters in this module. So apparently, Vivado&#8217;s machinery generates an instantiation parameter for each of these, even if they&#8217;re not used. Those unused parameters are most likely intended for scripting.</p>
<p>So this trick made Vivado instantiate pcie_uplus_pcie4_uscale_core_top with instantiation parameters that it doesn&#8217;t have, and hence its synthesis failed. Dead end.</p>
<p>I didn&#8217;t examine the possibility of deselecting &#8220;Enable GT Quad Selection&#8221; in the original block, because Vivado 2017.3 chooses the wrong GT for the board without this option.</p>
<h3>Workaround with Tcl</h3>
<p>Eventually, I solved the problem by adding a few lines to the Tcl script.</p>
<p>Assuming that $ip_name has been set to the name of the PCIe block IP, this Tcl snippet rectifies the bug:</p>
<pre>if {![string equal "" [get_property <strong>-quiet</strong> CONFIG.MASTER_GT [get_ips $ip_name]]]} {
  set_property -dict [list CONFIG.en_gt_selection {true} CONFIG.MASTER_GT {<span style="color: #ff0000;"><strong>GTYE4_CHANNEL_X0Y7</strong></span>}] [get_ips $ip_name]
}</pre>
<p>This snippet should of course be inserted <strong>after</strong> updating the IP core (with e.g. upgrade_ip [get_ips]). The code first checks if MASTER_GT is defined, and only if so, it sets it to the desired value. This ensures that nothing happens with older Vivado versions. Note the &#8220;-quiet&#8221; flag of get_property, which prevents it from generating an error if the property isn&#8217;t defined; instead, it returns an empty string, which is what the result is compared against.</p>
<p>Setting MASTER_GT this way also rectifies GT_CONTAINER correctly, and surprisingly enough, this <strong>doesn&#8217;t</strong> change anything it shouldn&#8217;t, and in particular, the Device IDs remain intact.</p>
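<p>For extra peace of mind, the ID-related properties can be snapshotted before the fix and compared afterwards. A sketch along these lines (CONFIG.PF0_VENDOR_ID and CONFIG.PF0_DEVICE_ID are the usual property names for this IP, but verify against your own block, e.g. with list_property):</p>
<pre>set before [list \
  [get_property -quiet CONFIG.PF0_VENDOR_ID [get_ips $ip_name]] \
  [get_property -quiet CONFIG.PF0_DEVICE_ID [get_ips $ip_name]]]

# ... the MASTER_GT workaround goes here ...

set after [list \
  [get_property -quiet CONFIG.PF0_VENDOR_ID [get_ips $ip_name]] \
  [get_property -quiet CONFIG.PF0_DEVICE_ID [get_ips $ip_name]]]

if {![string equal $before $after]} {
  puts "WARNING: Device/Vendor ID changed by the workaround"
}</pre>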
<p>However, the disadvantage of this solution is that the GT to select is hardcoded in the Tcl code. But that&#8217;s fine in my case, as the bundle targets a specific board (KCU116).</p>
<p>Another way to go, which is less recommended, is to emulate the check and uncheck of &#8220;Enable GT Quad Selection&#8221;:</p>
<pre>if {![string equal "" [get_property -quiet CONFIG.MASTER_GT [get_ips $ip_name]]]} {
  set_property CONFIG.en_gt_selection {false} [get_ips $ip_name]
  set_property CONFIG.en_gt_selection {true} [get_ips $ip_name]
}</pre>
<p>However, turning the en_gt_selection flag off and on again also resets the Device ID to its default, just as manually toggling the checkbox does. And even though it sets MASTER_GT correctly in my specific case, I&#8217;m not sure whether this can be relied upon.</p>
<h3>Vivado 2025.2 update</h3>
<p>Fast forward to late 2025: it turns out that Vivado 2025.2 suddenly reacts to the workaround by setting the GTY quad to GTY_Quad_227, and resetting the Device ID to its default value. This doesn&#8217;t happen with Vivado 2025.1 and earlier.</p>
<p>Removing the workaround code still resulted in MASTER_GT being set to GTHE4_CHANNEL_X49Y99. However, as of 2025.1, this makes no difference anymore (actually, as early as 2023.1). This property doesn&#8217;t influence anything in the PCIe block IP&#8217;s files, except for the XCI and XML files, where this property has a different value. As it has no influence on the source files used for implementation, the workaround can be removed safely for all versions starting with 2025.1 (and probably earlier too).</p>
<p>So the updated workaround looks like this:</p>
<pre>if {<span class="punch">[version -short] &lt; 2025.1 &amp;&amp;</span> \
    ![string equal "" [get_property -quiet CONFIG.MASTER_GT [get_ips $ip_name]]]} {
  set_property -dict [list CONFIG.en_gt_selection {true} CONFIG.MASTER_GT {GTYE4_CHANNEL_X0Y7}] [get_ips $ip_name]
}</pre>
<p>The obvious change is that the workaround isn&#8217;t applied on Vivado 2025.1 and later.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2021/06/pci-express-ultrascale-plus-vivado-upgrade/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Test point placement constraints for KC705</title>
		<link>https://billauer.se/blog/2020/09/test-points-kintex-7-board/</link>
		<comments>https://billauer.se/blog/2020/09/test-points-kintex-7-board/#comments</comments>
		<pubDate>Thu, 10 Sep 2020 16:08:46 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Vivado]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6147</guid>
		<description><![CDATA[The said board, which is Xilinx&#8217; official development kit for Kintex-7, has an LCD which can be taken off. Its pins can then be used as plain testpoints for logic. These are the placement constraints for this use (Vivado XDC style): set_property PACKAGE_PIN Y10 [get_ports tp[0]]; # LCD_DB7, pin 1 set_property PACKAGE_PIN AA11 [get_ports tp[1]]; [...]]]></description>
			<content:encoded><![CDATA[<p>The said board, which is Xilinx&#8217; official development kit for Kintex-7, has an LCD which can be taken off. Its pins can then be used as plain testpoints for logic. These are the placement constraints for this use (Vivado XDC style):</p>
<pre>set_property PACKAGE_PIN Y10 [get_ports tp[0]]; # LCD_DB7, pin 1
set_property PACKAGE_PIN AA11 [get_ports tp[1]]; # LCD_DB6, pin 2
set_property PACKAGE_PIN AA10 [get_ports tp[2]]; # LCD_DB5, pin 3
set_property PACKAGE_PIN AA13 [get_ports tp[3]]; # LCD_DB4, pin 4
set_property PACKAGE_PIN AB10 [get_ports tp[4]]; # LCD_E, pin 9
set_property PACKAGE_PIN AB13 [get_ports tp[5]]; # LCD_RW, pin 10
set_property PACKAGE_PIN Y11 [get_ports tp[6]]; # LCD_RS, pin 11

set_property IOSTANDARD LVCMOS15 [get_ports tp[*]]

set_false_path -to [get_ports -filter {NAME=~tp[*]}]</pre>
<p>The actual voltage on these pins is 3.3V &#8212; there&#8217;s a voltage shifter in between, which is why these pins can&#8217;t be used as inputs.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2020/09/test-points-kintex-7-board/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Xilinx Ultrascale / Ultrascale+ GTH/GTY CPLL calibration</title>
		<link>https://billauer.se/blog/2020/08/xilinx-ultrascale-cpll-calibration/</link>
		<comments>https://billauer.se/blog/2020/08/xilinx-ultrascale-cpll-calibration/#comments</comments>
		<pubDate>Sun, 23 Aug 2020 16:36:42 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[GTX]]></category>
		<category><![CDATA[Vivado]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6119</guid>
		<description><![CDATA[&#8230; or why does my GTH/GTY not come out of reset? Why are those reset_rx_done / reset_tx_done never asserted after a reset_all or a reset involving the CPLLs? What&#8217;s this CPLL calibration thing about? It turns out that some GTH/GTY&#8217;s on Ultrascale and Ultrascale+ FPGAs have problems with getting the CPLL to work reliably. I&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230; or why does my GTH/GTY not come out of reset? Why are those reset_rx_done / reset_tx_done never asserted after a reset_all or a reset involving the CPLLs?</p>
<h3>What&#8217;s this CPLL calibration thing about?</h3>
<p>It turns out that some GTH/GTYs on Ultrascale and Ultrascale+ FPGAs have problems with getting the CPLL to work reliably. I&#8217;ll leave the details on which ones to <a href="https://www.xilinx.com/support/documentation/ip_documentation/gtwizard_ultrascale/v1_7/pg182-gtwizard-ultrascale.pdf" target="_blank">PG182</a>. So CPLL calibration is a cute name for some workaround logic, based upon the well-known principle that if something doesn&#8217;t work, turn it off, turn it on, and then check again. Repeat.</p>
<p>Well, not quite that simple. There&#8217;s also some playing with a bit (they call it FBOOST, jump start or not?) in the secret-sauce CPLL_CFG0 setting.</p>
<p>One way or another, this extra piece of logic simply checks whether the CPLL is at its correct frequency, and if so, it does nothing. If the CPLL&#8217;s frequency isn&#8217;t as expected, within a certain tolerance, it powers the CPLL off and on (with CPLLPD), resets it (CPLLRESET) and also plays with that magic FBOOST bit. And then tries again, up to 15 times.</p>
<p>The need for access to the GT&#8217;s DRPs is not just for that magic register&#8217;s sake, though. One can&#8217;t just measure the CPLL&#8217;s frequency directly, as it&#8217;s a few GHz; an FPGA can only work with a divided version of this clock. As there are several possibilities for routing and division of clocks inside the GT to its clock outputs, and the clock dividers depend on the application&#8217;s configuration, there&#8217;s a need to configure one such clock output to carry the CPLL&#8217;s output divided by a known number. TXOUTCLK was chosen for this purpose.</p>
<p>So much of the calibration logic does exactly that: It sets some DRP registers to establish a certain relation between the CPLL&#8217;s output and TXOUTCLK (divided by 20, in fact), does its thing, and then returns those registers&#8217; values to what they were before.</p>
<h3>A word of warning</h3>
<p>My initial take on this CPLL calibration thing was to enable it for all targets (see below for how). Can&#8217;t hurt, can it? An extra check that the CPLL is fine before kicking off. What could possibly go wrong?</p>
<p>I did this on Vivado 2015.2, and all was fine. And then I tried on a later Vivado version. Boom. The GTH didn&#8217;t come out of reset. More precisely, the CPLL calibration clearly failed.</p>
<p>I can&#8217;t say that I know exactly why, but I caught the change that makes the difference: Somewhere between 2015.2 and 2018.3, the Wizard started to set the GTH&#8217;s CPLL_INIT_CFG0 instantiation parameter to 16&#8242;b0000001010110010. Generating the IP with this parameter set to its old value, 16&#8242;b0000000000011110, made the GTH work properly again.</p>
<p>I compared the reset logic as well as the CPLL calibration logic, and even though I found a few changes there, they were pretty minor (and I also tried to revert some of them, but that didn&#8217;t make any difference).</p>
<p>So the conclusion is that the change in CPLL_INIT_CFG0 failed the CPLL calibration. Why? I have no idea. The meaning of this parameter is unknown. And the CPLL calibration just checks that the frequency is OK. So maybe it slows down the lock, so the CPLL isn&#8217;t ready when it&#8217;s checked? Possibly, but this info wouldn&#8217;t help very much anyhow.</p>
<p>Now, CPLL calibration is supposed to be enabled only for FPGA targets that are known to need it. The question is whether the Transceiver IP&#8217;s Wizard is clever enough to set CPLL_INIT_CFG0 to a value that won&#8217;t make the calibration fail on those. I have no idea.</p>
<p>By enabling CPLL calibration for a target that doesn&#8217;t need it, I selected an exotic option, but the result should have worked nevertheless. Surely it shouldn&#8217;t break from one Vivado version to another.</p>
<p>So the bottom line is: Don&#8217;t fiddle with this option, and if your GTH/GTY doesn&#8217;t come out of reset, consider turning CPLL calibration off, and see if that changes anything. And if so, I have no clear advice what to do. But at least the mystery will be resolved.</p>
<h3>Note that&#8230;</h3>
<ul>
<li>The CPLL calibration is triggered by the GT&#8217;s reset_all assertion, as well as with reset_*_pll_and_datapath, if the CPLL is used in the relevant data path. The &#8220;reset done&#8221; signal for a data path that depends on the CPLL is asserted only if and when the CPLL calibration was successful and the CPLL is locked.</li>
<li>If cplllock_out (when exposed) is never asserted, this could indicate that the CPLL calibration failed. So it makes sense to wait indefinitely for it &#8212; better to fail loudly than to work with a wobbling clock.</li>
<li>Because the DRP clock is used to measure the period of time for counting the number of cycles of the divided CPLL clock, its frequency must be set accurately in the Wizard. Otherwise, the CPLL calibration will most certainly fail, even if the CPLL is perfectly fine.</li>
<li>The calibration state machine takes control of some GT ports (listed below) from when cpllreset_in is deasserted, until the calibration state machine has finished, with success or failure.</li>
<li>While the calibration takes place, and if the calibration ends up failing, the cplllock_out signal presented to the user logic is held low. Only when the calibration has finished successfully is the GT&#8217;s CPLLLOCK connected to the user logic (after a slight delay, and synchronized with the DRP clock).</li>
</ul>
<h3>Activating the CPLL calibration feature</h3>
<p><em>See &#8220;A word of warning&#8221; above. You probably don&#8217;t want to activate this feature for all FPGA targets.</em></p>
<p>There are three possible choices for whether the CPLL calibration module is activated in the Wizard&#8217;s transceiver. This can&#8217;t be set from the GUI, but only by editing the XCI file manually. There are two parameters in that file, PARAM_VALUE.INCLUDE_CPLL_CAL and MODELPARAM_VALUE.C_INCLUDE_CPLL_CAL, which should have the same value, as follows:</p>
<ul>
<li>0 &#8212; Don&#8217;t activate.</li>
<li>1 &#8212; Do activate.</li>
<li>2 &#8212; Activate only for devices which the Wizard deems have a problem (default).</li>
</ul>
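<p>Incidentally, the manual edit can be scripted as well. A rough sketch in plain Tcl (the filename is just the IP&#8217;s name in my case; make a backup of the XCI first, and note that this blunt string replacement assumes the XML flavor of XCI files, where the value immediately follows the quoted parameter name):</p>
<pre># Flip INCLUDE_CPLL_CAL from the default 2 to 1 in the XCI file
set f [open ultragth.xci r]
set xci [read $f]
close $f

set xci [string map {
  {PARAM_VALUE.INCLUDE_CPLL_CAL">2} {PARAM_VALUE.INCLUDE_CPLL_CAL">1}
  {C_INCLUDE_CPLL_CAL">2} {C_INCLUDE_CPLL_CAL">1}
} $xci]

set f [open ultragth.xci w]
puts -nonewline $f $xci
close $f</pre>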
<p>Changing it from the default 2 to 1 makes Vivado respond by locking the core, saying it &#8220;contains stale content&#8221;. To resolve this, &#8220;upgrade&#8221; the IP, which triggers a warning that user intervention is necessary.</p>
<p>And indeed, three new ports are added, and this addition of ports is also reflected in the XCI file (but nothing else should change): gtwiz_gthe3_cpll_cal_txoutclk_period_in, gtwiz_gthe3_cpll_cal_cnt_tol_in and gtwiz_gthe3_cpll_cal_bufg_ce_in.</p>
<p>These are three input ports, so they have to be assigned values. <a href="https://www.xilinx.com/support/documentation/ip_documentation/gtwizard_ultrascale/v1_7/pg182-gtwizard-ultrascale.pdf" target="_blank">PG182</a>&#8216;s Table 3-1 gives the formulas for that (good luck with that) and the dissection notes below explain these formulas. But the TL;DR version is:</p>
<ul>
<li>gtwiz_gthe3_cpll_cal_bufg_ce_in should be assigned with a constant 1&#8242;b1.</li>
<li>gtwiz_gthe3_cpll_cal_txoutclk_period_in should be assigned with the constant value of P_CPLL_CAL_TXOUTCLK_PERIOD, as found in the transceiver IP&#8217;s synthesis report (e.g. mytransceiver_synth_1/runme.log).</li>
<li>gtwiz_gthe3_cpll_cal_cnt_tol_in should be assigned with the constant value of P_CPLL_CAL_TXOUTCLK_PERIOD, divided by 100.</li>
</ul>
<p>The description here relates to a single transceiver in the IP.</p>
<p>The meaning of gtwiz_gthe3_cpll_cal_txoutclk_period_in is as follows: Take the CPLL clock and divide it by 80. Count the number of clock cycles in a time period corresponding to 16000 DRP clock cycles. That&#8217;s the value to assign, as this is what the CPLL calibration logic expects to get.</p>
<p>gtwiz_gthe3_cpll_cal_cnt_tol_in is the number of counts that the result can be higher or lower than expected, and still the CPLL will be considered fine. As this is taken as the number of expected counts, divided by 100, this results in a ±1% clock frequency tolerance. Which is a good idea, given that common SSC clocking (PCIe, SATA, USB 3.0) might drag down the clock frequency by -5000 ppm, i.e. -0.5%.</p>
<p>The possibly tricky thing with setting these correctly is that they depend directly on the CPLL frequency. Given the data rate, there might be more than one possibility for a CPLL frequency, however it&#8217;s not expected that the Wizard will change it from run to run unless something fundamental is changed in the parameters (e.g. changing the data rate of one of the directions or both).</p>
<p>Besides, the CPLL frequency appears in the XCI file as MODELPARAM_VALUE.C_CPLL_VCO_FREQUENCY.</p>
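<p>As a worked example of these numbers (a sketch with made-up but plausible values, runnable in any Tcl shell): with a 4.0 GHz CPLL VCO and a 100 MHz DRP clock, the divided clock is 50 MHz, so a window of 16000 DRP clock cycles (160 &#181;s) corresponds to 8000 counts, and the tolerance is 80:</p>
<pre>set cpll_vco_hz 4.0e9   ;# MODELPARAM_VALUE.C_CPLL_VCO_FREQUENCY (example value)
set drp_clk_hz  100.0e6 ;# DRP clock frequency, as set in the Wizard (example value)

# The calibration logic counts cycles of the CPLL clock divided by 80,
# over a window of 16000 DRP clock cycles
set period_in  [expr {int(16000.0 * ($cpll_vco_hz / 80.0) / $drp_clk_hz)}]
set cnt_tol_in [expr {$period_in / 100}]

puts "gtwiz_gthe3_cpll_cal_txoutclk_period_in = $period_in"  ;# 8000
puts "gtwiz_gthe3_cpll_cal_cnt_tol_in = $cnt_tol_in"         ;# 80</pre>
<p>In practice, the value of P_CPLL_CAL_TXOUTCLK_PERIOD from the synthesis report is the authoritative one; this calculation just shows where it comes from.</p>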
<p>If the CPLL calibration is activated deliberately, it&#8217;s recommended to verify that it actually takes place by setting a wrong value for gtwiz_gthe3_cpll_cal_txoutclk_period_in, and checking that the calibration fails (cplllock_out remains low).</p>
<h3>Which ports are affected?</h3>
<p>Looking at ultragth_gtwizard_gthe3.v gives the list of ports that the CPLL calibration logic fiddles with. Within the CPLL calibration generate clause they&#8217;re assigned certain values, and in the &#8220;else&#8221; clause, the plain bypass:</p>
<pre>    // Assign signals as appropriate to bypass the CPLL calibration block when it is not instantiated
    else begin : gen_no_cpll_cal
      assign txprgdivresetdone_out = txprgdivresetdone_int;
      assign cplllock_int          = cplllock_ch_int;
      assign drprdy_out            = drprdy_int;
      assign drpdo_out             = drpdo_int;
      assign cpllreset_ch_int      = cpllreset_int;
      assign cpllpd_ch_int         = cpllpd_int;
      assign txprogdivreset_ch_int = txprogdivreset_int;
      assign txoutclksel_ch_int    = txoutclksel_int;
      assign drpaddr_ch_int        = drpaddr_int;
      assign drpdi_ch_int          = drpdi_int;
      assign drpen_ch_int          = drpen_int;
      assign drpwe_ch_int          = drpwe_int;
    end</pre>
<h3>Dissection of Wizard&#8217;s output</h3>
<p>The name of the IP was ultragth in my case, which is why this name appears all over this part.</p>
<p>The impact of changing the XCI file: In the Verilog files that are produced by the Wizard, MODELPARAM_VALUE.C_INCLUDE_CPLL_CAL is used directly when instantiating ultragth_gtwizard_top, as the C_INCLUDE_CPLL_CAL instantiation parameter.</p>
<p>Also, the three new input ports are passed on to ultragth_gtwizard_top.v, rather than getting all zero assignments when they&#8217;re not exposed to the user application logic.</p>
<p>When activating the CPLL calibration (setting INCLUDE_CPLL_CAL to 1), additional constraints are also added to the constraint file for the IP, adding a few new false paths as well as making sure that the timing calculations for TXOUTCLK are set according to the requested clock source. The latter is necessary, because the calibration logic fiddles with TXOUTCLKSEL during the calibration phase.</p>
<p>In ultragth_gtwizard_top.v the instantiation parameters and the three ports are just passed on to ultragth_gtwizard_gthe3.v, where the action happens.</p>
<p>First, the following defines are made (they like short names in Xilinx):</p>
<pre>`define ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__EXCLUDE 0
`define ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__INCLUDE 1
`define ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__DEPENDENT 2</pre>
<p>and further down, we have this short and concise condition for enabling CPLL calibration:</p>
<pre>    if ((C_INCLUDE_CPLL_CAL         == `ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__INCLUDE) ||
        (((C_INCLUDE_CPLL_CAL       == `ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__DEPENDENT) &amp;&amp;
         ((C_GT_REV                 == 11) ||
          (C_GT_REV                 == 12) ||
          (C_GT_REV                 == 14))) &amp;&amp;
         (((C_TX_ENABLE             == `ultragth_gtwizard_gthe3_TX_ENABLE__ENABLED) &amp;&amp;
           (C_TX_PLL_TYPE           == `ultragth_gtwizard_gthe3_TX_PLL_TYPE__CPLL)) ||
          ((C_RX_ENABLE             == `ultragth_gtwizard_gthe3_RX_ENABLE__ENABLED) &amp;&amp;
           (C_RX_PLL_TYPE           == `ultragth_gtwizard_gthe3_RX_PLL_TYPE__CPLL)) ||
          ((C_TXPROGDIV_FREQ_ENABLE == `ultragth_gtwizard_gthe3_TXPROGDIV_FREQ_ENABLE__ENABLED) &amp;&amp;
           (C_TXPROGDIV_FREQ_SOURCE == `ultragth_gtwizard_gthe3_TXPROGDIV_FREQ_SOURCE__CPLL))))) begin : gen_cpll_cal</pre>
<p>which simply means that the CPLL calibration module should be generated if C_INCLUDE_CPLL_CAL is 1 (as I changed it to), or if it&#8217;s 2 (the default) and some conditions for enabling it automatically are met.</p>
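<p>In plainer terms, the condition boils down to something like this (a Python paraphrase, not Xilinx&#8217; code; the boolean arguments are names I made up for the respective C_* comparisons in the Verilog):</p>

```python
# Paraphrase of the gen_cpll_cal generate condition. The numeric codes
# follow the `defines above: 0 = EXCLUDE, 1 = INCLUDE, 2 = DEPENDENT.
# tx_on_cpll / rx_on_cpll / txprogdiv_on_cpll are made-up names for the
# C_TX_PLL_TYPE, C_RX_PLL_TYPE and C_TXPROGDIV_FREQ_SOURCE comparisons.
def cpll_cal_enabled(include_cpll_cal, gt_rev,
                     tx_on_cpll, rx_on_cpll, txprogdiv_on_cpll):
    if include_cpll_cal == 1:            # INCLUDE: forced on
        return True
    return (include_cpll_cal == 2 and    # DEPENDENT: only on these GT revisions,
            gt_rev in (11, 12, 14) and   # and only if a CPLL is actually in use
            (tx_on_cpll or rx_on_cpll or txprogdiv_on_cpll))
```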
<p>Further down, the hint for how to assign those three new ports is given: if CPLL calibration was enabled automatically, due to the default assignment and a specific target FPGA, the values calculated by the Wizard itself are used:</p>
<pre>      // The TXOUTCLK_PERIOD_IN and CNT_TOL_IN ports are normally driven by an internally-calculated value. When INCLUDE_CPLL_CAL is 1,
      // they are driven as inputs for PLL-switching and rate change special cases, and the BUFG_GT CE input is provided by the user.
      wire [(`ultragth_gtwizard_gthe3_N_CH* 18)-1:0] cpll_cal_txoutclk_period_int;
      wire [(`ultragth_gtwizard_gthe3_N_CH* 18)-1:0] cpll_cal_cnt_tol_int;
      wire [(`ultragth_gtwizard_gthe3_N_CH*  1)-1:0] cpll_cal_bufg_ce_int;
      if (C_INCLUDE_CPLL_CAL == `ultragth_gtwizard_gthe3_INCLUDE_CPLL_CAL__INCLUDE) begin : gen_txoutclk_pd_input
        assign cpll_cal_txoutclk_period_int = {`ultragth_gtwizard_gthe3_N_CH{gtwiz_gthe3_cpll_cal_txoutclk_period_in}};
        assign cpll_cal_cnt_tol_int         = {`ultragth_gtwizard_gthe3_N_CH{gtwiz_gthe3_cpll_cal_cnt_tol_in}};
        assign cpll_cal_bufg_ce_int         = {`ultragth_gtwizard_gthe3_N_CH{gtwiz_gthe3_cpll_cal_bufg_ce_in}};
      end
      else begin : gen_txoutclk_pd_internal
<strong>        assign cpll_cal_txoutclk_period_int = {`ultragth_gtwizard_gthe3_N_CH{p_cpll_cal_txoutclk_period_int}};
        assign cpll_cal_cnt_tol_int         = {`ultragth_gtwizard_gthe3_N_CH{p_cpll_cal_txoutclk_period_div100_int}};
        assign cpll_cal_bufg_ce_int         = {`ultragth_gtwizard_gthe3_N_CH{1'b1}};
</strong>      end</pre>
<p>These `ultragth_gtwizard_gthe3_N_CH replications just duplicate the same vector, in case there are multiple channels in the same IP.</p>
<p>First, note that in the internal branch, cpll_cal_bufg_ce is assigned a constant 1. It&#8217;s not clear why this port is exposed at all.</p>
<p>And now to the calculated values. Given that it says</p>
<pre>      wire [15:0] p_cpll_cal_freq_count_window_int      = P_CPLL_CAL_FREQ_COUNT_WINDOW;
<strong>      wire [17:0] p_cpll_cal_txoutclk_period_int        = P_CPLL_CAL_TXOUTCLK_PERIOD;
</strong>      wire [15:0] p_cpll_cal_wait_deassert_cpllpd_int   = P_CPLL_CAL_WAIT_DEASSERT_CPLLPD;
<strong>      wire [17:0] p_cpll_cal_txoutclk_period_div100_int = P_CPLL_CAL_TXOUTCLK_PERIOD_DIV100;
</strong></pre>
<p>a few rows above, and</p>
<pre>  localparam [15:0] P_CPLL_CAL_FREQ_COUNT_WINDOW      = 16'd16000;
<strong>  localparam [17:0] P_CPLL_CAL_TXOUTCLK_PERIOD        = (C_CPLL_VCO_FREQUENCY/20) * (P_CPLL_CAL_FREQ_COUNT_WINDOW/(4*C_FREERUN_FREQUENCY));
</strong>  localparam [15:0] P_CPLL_CAL_WAIT_DEASSERT_CPLLPD   = 16'd256;
<strong>  localparam [17:0] P_CPLL_CAL_TXOUTCLK_PERIOD_DIV100 = (C_CPLL_VCO_FREQUENCY/20) * (P_CPLL_CAL_FREQ_COUNT_WINDOW/(400*C_FREERUN_FREQUENCY));
</strong>  localparam [25:0] P_CDR_TIMEOUT_FREERUN_CYC         = (37000 * C_FREERUN_FREQUENCY) / C_RX_LINE_RATE;</pre>
<p>it&#8217;s not all that difficult to do the math. And looking at Table 3-1 of <a href="https://www.xilinx.com/support/documentation/ip_documentation/gtwizard_ultrascale/v1_7/pg182-gtwizard-ultrascale.pdf" target="_blank">PG182</a>, the formulas match perfectly, but I didn&#8217;t feel very reassured by those.</p>
<p>So why bother? Much easier to use the values calculated by the tools, as they appear in ultragth_synth_1/runme.log (for a 5 Gb/s rate and reference clock of 125 MHz, but YMMV as there&#8217;s more than one way to achieve a line rate):</p>
<pre>	Parameter P_CPLL_CAL_FREQ_COUNT_WINDOW bound to: 16'b0011111010000000
	Parameter P_CPLL_CAL_TXOUTCLK_PERIOD bound to: 18'b000000111110100000
	Parameter P_CPLL_CAL_WAIT_DEASSERT_CPLLPD bound to: 16'b0000000100000000
	Parameter P_CPLL_CAL_TXOUTCLK_PERIOD_DIV100 bound to: 18'b000000000000101000
	Parameter P_CDR_TIMEOUT_FREERUN_CYC bound to: 26'b00000011100001110101001000</pre>
<p>The bottom line is hence to set gtwiz_gthe3_cpll_cal_txoutclk_period_in to 18&#8242;b000000111110100000, and gtwiz_gthe3_cpll_cal_cnt_tol_in to 18&#8242;b000000000000101000. Which is 4000 and 40 in plain decimal, respectively.</p>
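<p>The localparam formulas indeed land on these numbers. A little Python sketch, assuming (as above) a 2500 MHz CPLL VCO for the 5 Gb/s line rate and a 125 MHz free-running clock, and treating the parameters as exact frequencies in MHz rather than Verilog integers:</p>

```python
from fractions import Fraction as F

# Assumed configuration: 2500 MHz CPLL VCO (5 Gb/s line rate), 125 MHz
# free-running clock. Exact arithmetic with Fraction, sidestepping the
# question of how Verilog evaluates the localparam expressions.
C_CPLL_VCO_FREQUENCY = F(2500)
C_FREERUN_FREQUENCY = F(125)
WINDOW = 16000  # P_CPLL_CAL_FREQ_COUNT_WINDOW

period = (C_CPLL_VCO_FREQUENCY / 20) * (WINDOW / (4 * C_FREERUN_FREQUENCY))
cnt_tol = (C_CPLL_VCO_FREQUENCY / 20) * (WINDOW / (400 * C_FREERUN_FREQUENCY))

assert period == 4000   # matches 18'b000000111110100000 from runme.log
assert cnt_tol == 40    # matches 18'b000000000000101000
assert int('000000111110100000', 2) == 4000
assert int('000000000000101000', 2) == 40
```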
<h3>Dissection of CPLL Calibration module (specifically)</h3>
<p>The CPLL calibrator is implemented in gtwizard_ultrascale_v1_5/hdl/verilog/gtwizard_ultrascale_v1_5_gthe3_cpll_cal.v.</p>
<p>Some basic reverse engineering follows. It may be inaccurate, as I wasn&#8217;t very careful about the gory details on this matter. Also, when I say below that a register is modified, it&#8217;s set to one of the values listed after the outline of the state machine (further below).</p>
<p>So just to get an idea:</p>
<ul>
<li>TXOUTCLKSEL starts with the value 0.</li>
<li>Using the DRP ports, it fetches the existing values of the PROGCLK_SEL and PROGDIV registers, and modifies them.</li>
<li>It changes TXOUTCLKSEL to 3&#8242;b101, i.e. TXOUTCLK is driven by TXPROGDIVCLK. The latter can have more than one clock source, but here it&#8217;s the CPLL directly, divided by PROGDIV (judging by the value assigned to PROGCLK_SEL).</li>
<li>CPLLRESET is asserted for 32 clock cycles, and then deasserted.</li>
</ul>
<div>The state machine now enters a loop as follows.</div>
<div>
<ul>
<li>The state machine waits 16384 clock cycles. This is essentially waiting for the CPLL to lock; however, the CPLL&#8217;s lock detector isn&#8217;t monitored. Rather, the state machine waits this fixed amount of time.</li>
<li>txprogdivreset is asserted for 32 clock cycles.</li>
<li>The state machine waits for the assertion of the GT&#8217;s txprogdivresetdone (possibly indefinitely).</li>
<li>The state machine checks that the frequency counter&#8217;s output (more on this below) is in the range of TXOUTCLK_PERIOD_IN  ± CNT_TOL_IN. If so, it exits this loop (think C &#8220;break&#8221; here), with the intention of declaring success. If not, and this is the 15th failed attempt, it exits the loop as well, but with the intention of declaring failure. Otherwise, it continues as follows.</li>
<li>The FBOOST DRP register is read and then modified.</li>
<li>32 clock cycles later, CPLLRESET is asserted.</li>
<li>32 clock cycles later, CPLLPD is asserted for a number of clock cycles (determined by the module&#8217;s WAIT_DEASSERT_CPLLPD_IN input), and then deasserted (the CPLL is powered down and up!).</li>
<li>32 clock cycles later, CPLLRESET is deasserted.</li>
<li>The FBOOST DRP register is restored to its original value.</li>
<li>The state machine continues at the beginning of this loop.</li>
</ul>
<div>And the final sequence, after exiting the loop:</div>
<div>
<ul>
<li>PROGDIV and PROGCLK_SEL are restored to their original values.</li>
<li>CPLLRESET is asserted for 32 clock cycles, and then deasserted.</li>
<li>The state machine waits for the assertion of the GT&#8217;s cplllock, possibly indefinitely.</li>
<li>txprogdivreset is asserted for 32 clock cycles.</li>
<li>The state machine waits for the assertion of the GT&#8217;s txprogdivresetdone (possibly indefinitely).</li>
<li>The state machine finishes. At this point one of the module&#8217;s CPLL_CAL_FAIL or CPLL_CAL_DONE is asserted, depending on the reason for exiting the loop.</li>
</ul>
</div>
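<p>The gist of the whole procedure is a retry loop, roughly modeled in Python as follows (names are mine; measure_freq() stands in for the entire reset-and-count choreography):</p>

```python
# Rough model of the calibration loop: run the frequency counter, accept
# the result if it falls within TXOUTCLK_PERIOD_IN +/- CNT_TOL_IN,
# otherwise power-cycle the CPLL (the FBOOST / CPLLRESET / CPLLPD dance)
# and retry, giving up after the 15th failed attempt.
def cpll_calibrate(measure_freq, period, cnt_tol, max_attempts=15):
    for _ in range(max_attempts):
        count = measure_freq()  # one frequency counter run
        if period - cnt_tol <= count <= period + cnt_tol:
            return True         # CPLL_CAL_DONE
        # FBOOST modified, CPLLRESET asserted, CPLLPD pulsed, FBOOST restored
    return False                # CPLL_CAL_FAIL
```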
<p>As for the values assigned when I said &#8220;modified&#8221; above, I won&#8217;t get into them in detail, but just quote a related snippet of code. Note that these values are often shifted to their correct place in the DRP registers in order to fulfill their purpose:</p>
<pre>  localparam [1:0]  MOD_PROGCLK_SEL = 2'b10;
  localparam [15:0] MOD_PROGDIV_CFG = 16'hA1A2; //divider 20
  localparam [2:0]  MOD_TXOUTCLK_SEL = 3'b101;
  localparam        MOD_FBOOST = 1'b1;</pre>
<p>Now, a word about the frequency counter: it&#8217;s a bit complicated because of clock domain issues, but what it does is divide the clock under test by 4, and then count how many cycles the divided clock makes during a period of FREQ_COUNT_WINDOW_IN DRP clocks, which is hardcoded to 16000 clocks.</p>
<p>If we trust the comment saying that PROGDIV is set to 20, the frequency counter gets the CPLL clock divided by 20. It then divides this further by 4, and counts the result for 16000 DRP clocks. Which is exactly the formula given in Table 3-1 of <a href="https://www.xilinx.com/support/documentation/ip_documentation/gtwizard_ultrascale/v1_7/pg182-gtwizard-ultrascale.pdf" target="_blank">PG182</a>.</p>
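<p>The numbers close nicely: with the assumed 2500 MHz CPLL VCO and 125 MHz DRP clock of this example, the expected counter reading is precisely the TXOUTCLK_PERIOD value from before:</p>

```python
# Expected frequency counter reading: the clock under test is the CPLL
# VCO clock divided by PROGDIV (20), divided again by 4 inside the
# counter, and counted over a window of 16000 DRP clock cycles.
# The 2500 MHz VCO and 125 MHz DRP clock are assumptions of this example.
vco_mhz, drp_mhz, window = 2500.0, 125.0, 16000
count = (vco_mhz / 20 / 4) * (window / drp_mhz)
assert count == 4000.0  # matches P_CPLL_CAL_TXOUTCLK_PERIOD above
```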
<p>Are we having fun?</p>
</div>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2020/08/xilinx-ultrascale-cpll-calibration/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Installing Vivado 2020.1 on Linux Mint 19</title>
		<link>https://billauer.se/blog/2020/08/vivado-install-unsupported-linux/</link>
		<comments>https://billauer.se/blog/2020/08/vivado-install-unsupported-linux/#comments</comments>
		<pubDate>Sat, 22 Aug 2020 15:56:37 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Vivado]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=6115</guid>
		<description><![CDATA[&#8230; or any other &#8220;unsupported&#8221; Linux distribution. &#8230; or: How to trick the installer into thinking you&#8217;re running one of the supported OSes. So I wanted to install Vivado 2020.1 on my Linux Mint 19 (Tara) machine. I downloaded the full package, and ran xsetup. A splash window appeared, and soon after it another window [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230; or any other &#8220;unsupported&#8221; Linux distribution.</p>
<p>&#8230; or: How to trick the installer into thinking you&#8217;re running one of the supported OSes.</p>
<p>So I wanted to install Vivado 2020.1 on my Linux Mint 19 (Tara) machine. I downloaded the full package, and ran xsetup. A splash window appeared, and soon after it another window popped up, saying that my distribution wasn&#8217;t supported, listing those that were, and telling me that I could click &#8220;OK&#8221; to continue nevertheless. Which I did.</p>
<p>But then nothing happened. Completely stuck. And there was an error message on the console, reading:</p>
<pre>$ ./xsetup
Exception in thread "SPLASH_LOAD_MESSAGE" java.lang.IllegalStateException: no splash screen available
	at java.desktop/java.awt.SplashScreen.checkVisible(Unknown Source)
	at java.desktop/java.awt.SplashScreen.getBounds(Unknown Source)
	at java.desktop/java.awt.SplashScreen.getSize(Unknown Source)
	at com.xilinx.installer.gui.H.run(Unknown Source)
Exception in thread "main" java.lang.IllegalStateException: no splash screen available
	at java.desktop/java.awt.SplashScreen.checkVisible(Unknown Source)
	at java.desktop/java.awt.SplashScreen.close(Unknown Source)
	at com.xilinx.installer.gui.G.b(Unknown Source)
	at com.xilinx.installer.gui.InstallerGUI.G(Unknown Source)
	at com.xilinx.installer.gui.InstallerGUI.e(Unknown Source)
	at com.xilinx.installer.api.InstallerLauncher.main(Unknown Source)</pre>
<p>This issue is discussed in <a href="https://forums.xilinx.com/t5/Installation-and-Licensing/Installation-of-Vivado-2020-1-under-Centos-7-8-fails/td-p/1115482" target="_blank">this thread</a> of Xilinx&#8217; forum. Don&#8217;t let the &#8220;Solved&#8221; title mislead you: they didn&#8217;t solve it at all. But one of the answers there gave me the direction: fool the installer into thinking my OS is supported after all. In this specific case there was no problem with the OS, just a bug in the installer that made it misbehave after that popup window.</p>
<p>It was also suggested to install Vivado in batch mode with</p>
<pre>./xsetup -b ConfigGen</pre>
<p>however this doesn&#8217;t allow for selecting what devices I want to support. And this is a matter of tons of disk space.</p>
<p>So to make it work, I made changes to some files in /etc, and kept the original files in a separate directory. I also needed to move /etc/lsb-release into that directory, so it wouldn&#8217;t mess things up.</p>
<p>I changed /etc/os-release (which is in fact a symlink to ../usr/lib/os-release on my machine, so watch it) to</p>
<pre>NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial</pre>
<p>and /etc/lsb-release</p>
<pre>DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"</pre>
<p>This might very well be overkill, but once I got the installation running, I didn&#8217;t bother to check what the minimal change is. Those who are successful might comment below on this, maybe?</p>
<p>Note that this might not work on a Red Hat based OS, because it seems that there are distribution-dependent issues. Since Linux Mint 19 is derived from Ubuntu 18.04, faking an earlier Ubuntu distro didn&#8217;t cause any problems.</p>
<p>This failed for me repeatedly at first, because I kept a copy of the original files in the /etc directory. This made the installation tool read the original files as well as the modified ones. Using strace, I found that the tool called cat with</p>
<pre>execve("/bin/cat", ["cat", "/etc/lsb-release", "/etc/os-release", "/etc/real-upstream-release"], 0x5559689a47b0 /* 55 vars */) = 0</pre>
<p>It looks like &#8220;cat&#8221; was called with some wildcard, maybe /etc/*-release? So the solution was to move the original files away to a new directory /etc/real/, and create the fake ones in their place.</p>
<p>Another problem was probably that /etc/lsb-release is a directory on my system. That made &#8220;cat&#8221; return a failure status, which can&#8217;t be good.</p>
<p>Of course I have an opinion on the entire OS support situation, but if you&#8217;re reading this, odds are that you agree with me anyhow.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2020/08/vivado-install-unsupported-linux/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Solved: Kintex-7 / KC705: MiG DDR3 fails to calibrate</title>
		<link>https://billauer.se/blog/2019/04/kc705-init-calib-complete-low/</link>
		<comments>https://billauer.se/blog/2019/04/kc705-init-calib-complete-low/#comments</comments>
		<pubDate>Sat, 13 Apr 2019 09:13:24 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Vivado]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=5730</guid>
		<description><![CDATA[As I implemented a MiG controller for KC705&#8242;s on-board SODIMM, the controller failed to calibrate at first. Despite that I&#8217;ve copied the instantiation and port connections from the example design. As for the pin placement, this was taken care of by the core itself, by virtue of memctl/memctl/user_design/constraints/memctl.xdc within MiG&#8217;s dedicated directory (which I called [...]]]></description>
			<content:encoded><![CDATA[<p>As I implemented a MiG controller for the KC705&#8217;s on-board SODIMM, the controller failed to calibrate at first, even though I had copied the instantiation and port connections from the example design. As for the pin placement, this was taken care of by the core itself, by virtue of memctl/memctl/user_design/constraints/memctl.xdc within MiG&#8217;s dedicated directory (which I called memctl).</p>
<p>And yet init_calib_complete remained low, indicating calibration had failed.</p>
<p>Actually, I had followed Xilinx&#8217; <a href="https://www.xilinx.com/support/documentation/boards_and_kits/kc705/2013_2/kc705-mig-xtp196-2013.2-c.pdf" target="_blank">XTP196</a> slides, except that I didn&#8217;t make an example design &#8212; I had my own.</p>
<p>Having ruled out holding the MiG controller in reset as well as a faulty pinout, it turned out that a constraint needed to be added to the application XDC file, namely</p>
<pre>set_property slave_banks {32 34} [get_iobanks 33]</pre>
<p>What it says (see <a href="https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_2/ug912-vivado-properties.pdf" target="_blank">UG912</a>) is that I/O banks 32 and 34 should calibrate their on-chip terminations (DCI, Digitally Controlled Impedance) based upon the reference resistors connected to the dedicated pins on bank 33. Without this constraint, the on-chip termination on banks 32 and 34 doesn&#8217;t work, and the signal integrity on the relevant I/Os goes down the toilet. No wonder it didn&#8217;t calibrate.</p>
<p>The hint for this is on page 37 of XTP196, &#8220;Modifications to Example Design&#8221;, which tells us to overwrite the example design created by Vivado with a ZIP file Xilinx supplies. On the following page it lists the changes made, among others &#8220;Added DCI Cascade constraints to XDC&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2019/04/kc705-init-calib-complete-low/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Linux: When Vivado&#8217;s GUI doesn&#8217;t start with an error on locale</title>
		<link>https://billauer.se/blog/2018/09/vivado-gui-fails-locale/</link>
		<comments>https://billauer.se/blog/2018/09/vivado-gui-fails-locale/#comments</comments>
		<pubDate>Thu, 20 Sep 2018 18:40:32 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Vivado]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=5534</guid>
		<description><![CDATA[Trying to running Vivado 2017.3 with GUI and all on a remote host with X forwarding, i.e. $ ssh -X mycomputer setting the environment with $ . /path/to/Vivado/2017.3/settings64.sh it failed with $ vivado &#38; terminate called after throwing an instance of 'std::runtime_error' what(): locale::facet::_S_create_c_locale name not valid Now here&#8217;s the odd thing: The error message [...]]]></description>
			<content:encoded><![CDATA[<p>Trying to run Vivado 2017.3, with GUI and all, on a remote host with X forwarding, i.e.</p>
<pre>$ ssh -X mycomputer</pre>
<p>setting the environment with</p>
<pre>$ . /path/to/Vivado/2017.3/settings64.sh</pre>
<p>it failed with</p>
<pre>$ vivado &amp;
terminate called after throwing an instance of 'std::runtime_error'
  what():  locale::facet::_S_create_c_locale name not valid</pre>
<p>Now here&#8217;s the odd thing: The error message is actually helpful! It <strong>is</strong> a locale problem:</p>
<pre>$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_IL
LC_CTYPE="en_IL"
LC_NUMERIC="en_IL"
LC_TIME="en_IL"
LC_COLLATE="en_IL"
LC_MONETARY="en_IL"
LC_MESSAGES="en_IL"
LC_PAPER="en_IL"
LC_NAME="en_IL"
LC_ADDRESS="en_IL"
LC_TELEPHONE="en_IL"
LC_MEASUREMENT="en_IL"
LC_IDENTIFICATION="en_IL"
LC_ALL=</pre>
<p>Checking on a non-ssh terminal, all read &#8220;en_US.UTF-8&#8221; instead. The problem seems to be that the SSH session is from a newer Linux distro to an older one. &#8220;en_IL&#8221; is indeed the locale on the newer machine, and it&#8217;s OK there. And SSH changed the locale (which I believe one can avoid, but it&#8217;s not worth the effort given the simple workaround below).</p>
<p>So the fix is surprisingly simple:</p>
<pre>$ export LC_ALL=en_US.UTF-8</pre>
<p>and then check again:</p>
<pre>$ locale
LANG=en_IL
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8</pre>
<p>OK, so LANG is still rubbish, but after this, Vivado 2017.3&#8217;s GUI is up and running.</p>
<p>I should mention that there was no problem starting Vivado 2014.4&#8217;s GUI, but it crashed somewhere in the middle of the implementation. Once again, fixing the locale solved this.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2018/09/vivado-gui-fails-locale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vivado&#8217;s timing analysis on set_input_delay and set_output_delay constraints</title>
		<link>https://billauer.se/blog/2017/04/io-timing-vivado-calculation/</link>
		<comments>https://billauer.se/blog/2017/04/io-timing-vivado-calculation/#comments</comments>
		<pubDate>Thu, 06 Apr 2017 08:41:23 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Vivado]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=5183</guid>
		<description><![CDATA[OK, what&#8217;s this? This page is the example part of another post, which explains the meaning of set_input_delay and set_output_delay in SDC timing constraints. As mentioned on the other post, the relevant timing constraints were: create_clock -name theclk -period 20 [get_ports test_clk] set_output_delay -clock theclk -max 8 [get_ports test_out] set_output_delay -clock theclk -min -3 [get_ports [...]]]></description>
			<content:encoded><![CDATA[<h3>OK, what&#8217;s this?</h3>
<p><strong>This page is the example part of <a href="https://billauer.se/blog/2017/04/io-timing-constraints-meaning/" target="_blank">another post</a></strong>, which explains the meaning of set_input_delay and set_output_delay in SDC timing constraints.</p>
<p>As mentioned on the other post, the relevant timing constraints were:</p>
<pre>create_clock -name theclk -period 20 [get_ports test_clk]
set_output_delay -clock theclk -max 8 [get_ports test_out]
set_output_delay -clock theclk -min -3 [get_ports test_out]
set_input_delay -clock theclk -max 4 [get_ports test_in]
set_input_delay -clock theclk -min 2 [get_ports test_in]</pre>
<h3>set_input_delay -max timing analysis (setup)</h3>
<pre>Slack (MET) :             15.664ns  (required time - arrival time)
  Source:                 test_in
                            (input port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_samp_reg/D
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
<span style="color: #ff0000;"><strong>  Path Type:              Setup (Max at Fast Process Corner)
</strong></span><span style="color: #ff0000;"><strong>  Requirement:            20.000ns  (theclk rise@20.000ns - theclk rise@0.000ns)</strong>
</span>  <span style="color: #ff0000;"><strong>Data Path Delay:        2.465ns</strong></span>  (logic 0.291ns (11.797%)  route 2.175ns (88.203%))
  Logic Levels:           1  (IBUF=1)
<span style="color: #ff0000;"><strong>  Input Delay:            4.000ns
</strong></span>  Clock Path Skew:        2.162ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    2.162ns = ( 22.162 - 20.000 )
    Source Clock Delay      (SCD):    0.000ns
    Clock Pessimism Removal (CPR):    0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
<span style="color: #ff0000;"><strong>                         input delay                  4.000     4.000
</strong></span>    AE20                                              0.000     4.000 r  test_in (IN)
                         net (fo=0)                   0.000     4.000    test_in
    AE20                 IBUF (Prop_ibuf_I_O)         0.291     4.291 r  test_in_IBUF_inst/O
                         net (fo=1, routed)           2.175     6.465    test_in_IBUF
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/D
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)    20.000    20.000 r
    AE23                                              0.000    20.000 r  test_clk (IN)
                         net (fo=0)                   0.000    20.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.077    20.077 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           1.278    21.355    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.026    21.381 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           0.781    22.162    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/C
                         clock pessimism              0.000    22.162
                         clock uncertainty           -0.035    22.126
    SLICE_X0Y1           FDRE (Setup_fdre_C_D)        0.003    22.129    test_samp_reg
  -------------------------------------------------------------------
                         required time                         22.129
                         arrival time                          -6.465
  -------------------------------------------------------------------
                         slack                                 15.664</pre>
<p>This analysis starts at time zero, adds the 4 ns (clock-to-output) that was specified in the <strong>max input delay</strong> constraint, and continues that data path at the <strong>fastest</strong> possible combination of process, voltage and temperature. Together with  the FPGA&#8217;s own data path delay (2.465 ns), the total data path delay  stands at 6.465 ns.</p>
<p>The clock path is then calculated, once again with the fastest possible combination, starting from the <strong>following</strong> clock edge at 20 ns. The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path ends at 22.129 ns, which is 15.664 ns after the data arrived at the flip-flop; this is the constraint&#8217;s slack.</p>
<p>It&#8217;s simple to see from this analysis that the max input delay is the clock-to-output (+ board delay), as it&#8217;s added to the data path. So it&#8217;s basically how late the data path may start. Note the &#8220;Max&#8221; part in the Path Type above.</p>
<h3>set_input_delay -min timing analysis (hold)</h3>
<pre>Min Delay Paths
--------------------------------------------------------------------------------------
Slack (<strong><span style="color: #ff0000;">VIOLATED</span></strong>) :        -0.045ns  (arrival time - required time)
  Source:                 test_in
                            (input port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_samp_reg/D
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
<span style="color: #ff0000;"><strong>  Path Type:              Hold (Min at Slow Process Corner)
  Requirement:            0.000ns  (theclk rise@0.000ns - theclk rise@0.000ns)
  Data Path Delay:        3.443ns</strong></span>  (logic 0.626ns (18.194%)  route 2.817ns (81.806%))
  Logic Levels:           1  (IBUF=1)
  <span style="color: #ff0000;"><strong>Input Delay:            2.000ns</strong></span>
  Clock Path Skew:        5.351ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    5.351ns
    Source Clock Delay      (SCD):    0.000ns
    Clock Pessimism Removal (CPR):    -0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
<span style="color: #ff0000;">              <strong>           input delay                  2.000     2.000
</strong></span><strong> </strong>   AE20                                              0.000     2.000 r  test_in (IN)
                         net (fo=0)                   0.000     2.000    test_in
    AE20                 IBUF (Prop_ibuf_I_O)         0.626     2.626 r  test_in_IBUF_inst/O
                         net (fo=1, routed)           2.817     5.443    test_in_IBUF
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/D
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)     0.000     0.000 r
    AE23                                              0.000     0.000 r  test_clk (IN)
                         net (fo=0)                   0.000     0.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.734     0.734 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           2.651     3.385    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.093     3.478 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           1.873     5.351    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_samp_reg/C
                         clock pessimism              0.000     5.351
                         clock uncertainty            0.035     5.387
    SLICE_X0Y1           FDRE (Hold_fdre_C_D)         0.101     5.488    test_samp_reg
  -------------------------------------------------------------------
                         required time                         -5.488
                         arrival time                           5.443
  -------------------------------------------------------------------
                         slack                                 -0.045</pre>
<p>This analysis starts at time zero, adds the 2 ns (clock-to-output) that was specified in the <strong>min input delay</strong> constraint, and continues that data path at the <strong>slowest</strong> possible combination of process, voltage and temperature. Together with the FPGA&#8217;s own data path delay (3.443 ns), the total data path delay stands at 5.443 ns. It should be no surprise that the FPGA&#8217;s own delay is bigger compared with the fast analysis above.</p>
<p>The clock path is then calculated, now with the <strong>slowest</strong> possible combination, starting from the <strong>same clock edge at 0 ns</strong>. After all, this is a hold calculation, so the question is whether the rug wasn&#8217;t pulled from under the sampling flip-flop&#8217;s feet before it managed to sample the data.</p>
<p>The clock travels from the input pin to the flip-flop (with no clock network delay compensation, since no PLL is involved), taking into account the calculated jitter. All in all, the clock path, including the flip-flop&#8217;s hold requirement, ends at 5.488 ns, which is 0.045 ns after the data had already switched. So the constraint was violated, with a negative slack of 0.045 ns.</p>
<p>It&#8217;s simple to see from this analysis that the min input delay is the   minimal clock-to-output, as it&#8217;s added to the data path. So  it&#8217;s  basically how early the data path may start. Note the &#8220;Min&#8221; part in  the  Path Type above.</p>
<p>It may come as a surprise that a 2 ns clock-to-output can violate a  hold constraint. This shouldn&#8217;t be taken lightly &#8212; it can cause real  problems.</p>
<p>The solution for this case would be to add a PLL to the clock path,  which locks the global network&#8217;s clock to the input clock. This  effectively means pulling it several nanoseconds earlier, which  definitely solves the problem.</p>
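<p>The bottom-line arithmetic of this report can be reproduced in a few lines (a sanity-check sketch using the report&#8217;s numbers, not anything Vivado provides):</p>

```python
# Hold slack per the report above (all values in ns).
# Note: the report rounds each intermediate sum from more precise values,
# so it shows a required time of 5.488 rather than the 5.487 obtained here.
arrival  = 2.000 + 3.443           # min input delay + FPGA data path delay
required = 5.351 + 0.035 + 0.101   # clock path + clock uncertainty + FDRE hold check
slack = arrival - required         # negative: the hold constraint is violated
print(f"arrival={arrival:.3f} required={required:.3f} slack={slack:.3f}")
```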
<h3>set_output_delay -max timing analysis (setup)</h3>
<pre>Slack (MET) :             2.983ns  (required time - arrival time)
  Source:                 test_out_reg/C
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_out
                            (output port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
<span style="color: #ff0000;"><strong>  Path Type:              Max at Slow Process Corner
  Requirement:            20.000ns  (theclk rise@20.000ns - theclk rise@0.000ns)
  Data Path Delay:        3.631ns</strong></span>  (logic 2.583ns (71.152%)  route 1.047ns (28.848%))
  Logic Levels:           1  (OBUF=1)
  <span style="color: #ff0000;"><strong>Output Delay:           8.000ns</strong></span>
  Clock Path Skew:        -5.351ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    0.000ns = ( 20.000 - 20.000 )
    Source Clock Delay      (SCD):    5.351ns
    Clock Pessimism Removal (CPR):    0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
    AE23                                              0.000     0.000 r  test_clk (IN)
                         net (fo=0)                   0.000     0.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.734     0.734 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           2.651     3.385    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.093     3.478 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           1.873     5.351    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_out_reg/C
  -------------------------------------------------------------------    -------------------
    SLICE_X0Y1           FDRE (Prop_fdre_C_Q)         0.223     5.574 r  test_out_reg/Q
                         net (fo=1, routed)           1.047     6.622    test_out_OBUF
    AK21                 OBUF (Prop_obuf_I_O)         2.360     8.982 r  test_out_OBUF_inst/O
                         net (fo=0)                   0.000     8.982    test_out
    AK21                                                              r  test_out (OUT)
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)    20.000    20.000 r
                         clock pessimism              0.000    20.000
                         clock uncertainty           -0.035    19.965
<span style="color: #ff0000;"><strong>                         output delay                -8.000    11.965
</strong></span>  -------------------------------------------------------------------
                         required time                         11.965
                         arrival time                          -8.982
  -------------------------------------------------------------------
                         slack                                  2.983</pre>
<p>Since the purpose of this analysis is to measure the output delay, it  starts off with the clock edge, follows it towards the flip-flop, and  then along the data path. That sums up to the overall delay. Note that  the &#8220;Path Type&#8221; doesn&#8217;t say it&#8217;s a setup calculation (to avoid  confusion?) even though it takes the following clock (at 20 ns) into  consideration.</p>
<p>The calculation takes place at the <strong>slowest</strong> possible  combination of process, voltage and temperature (recall that the input  setup calculation took place with the fastest one). Following the clock  path, it&#8217;s evidently very similar to the clock path of the hold analysis  for input delay, which is quite expected, as both are based upon the  slow model.</p>
<p>The data path simply continues the clock path until the physical output is stable, calculated at 8.982 ns.</p>
<p>This is compared with the time of the <strong>following</strong> clock at 20 ns, minus the output delay, minus the possible jitter (0.035 ns in the case above). Data arrived at 8.982 ns, and the moment that counts is at ~12 ns, so there&#8217;s almost 3 ns of slack.</p>
<p>This demonstrates why set_output_delay -max is the setup time of the  receiver: The output delay is reduced from the following clock&#8217;s time  position, and that&#8217;s the goal to meet. That&#8217;s exactly the definition of  setup time: How long before the following clock the data must be stable.</p>
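<p>Again, the report&#8217;s bottom line boils down to a few additions (a sketch with the report&#8217;s numbers):</p>

```python
# Setup slack of the output path, per the report above (all values in ns).
arrival  = 5.351 + 3.631            # source clock path + data path delay = 8.982
required = 20.000 - 0.035 - 8.000   # next clock edge - uncertainty - max output delay
slack = required - arrival          # positive: the constraint is met
print(f"arrival={arrival:.3f} required={required:.3f} slack={slack:.3f}")
```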
<h3>set_output_delay -min timing analysis (hold)</h3>
<pre>Slack (MET) :             0.791ns  (arrival time - required time)
  Source:                 test_out_reg/C
                            (rising edge-triggered cell FDRE clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Destination:            test_out
                            (output port clocked by theclk  {rise@0.000ns fall@10.000ns period=20.000ns})
  Path Group:             theclk
<span style="color: #ff0000;"><strong>  Path Type:              Min at Fast Process Corner
  Requirement:            0.000ns  (theclk rise@0.000ns - theclk rise@0.000ns)
  Data Path Delay:        1.665ns</strong></span>  (logic 1.384ns (83.159%)  route 0.280ns (16.841%))
  Logic Levels:           1  (OBUF=1)
<span style="color: #ff0000;"><strong>  Output Delay:           -3.000ns
</strong></span>  Clock Path Skew:        -2.162ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    0.000ns
    Source Clock Delay      (SCD):    2.162ns
    Clock Pessimism Removal (CPR):    -0.000ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock theclk rise edge)     0.000     0.000 r
    AE23                                              0.000     0.000 r  test_clk (IN)
                         net (fo=0)                   0.000     0.000    test_clk
    AE23                 IBUF (Prop_ibuf_I_O)         0.077     0.077 r  test_clk_IBUF_inst/O
                         net (fo=1, routed)           1.278     1.355    test_clk_IBUF
    BUFGCTRL_X0Y4        BUFG (Prop_bufg_I_O)         0.026     1.381 r  test_clk_IBUF_BUFG_inst/O
                         net (fo=2, routed)           0.781     2.162    test_clk_IBUF_BUFG
    SLICE_X0Y1           FDRE                                         r  test_out_reg/C
  -------------------------------------------------------------------    -------------------
    SLICE_X0Y1           FDRE (Prop_fdre_C_Q)         0.100     2.262 r  test_out_reg/Q
                         net (fo=1, routed)           0.280     2.542    test_out_OBUF
    AK21                 OBUF (Prop_obuf_I_O)         1.284     3.826 r  test_out_OBUF_inst/O
                         net (fo=0)                   0.000     3.826    test_out
    AK21                                                              r  test_out (OUT)
  -------------------------------------------------------------------    -------------------

                         (clock theclk rise edge)     0.000     0.000 r
                         clock pessimism              0.000     0.000
                         clock uncertainty            0.035     0.035
<span style="color: #ff0000;"><strong>                         output delay                 3.000     3.035
</strong></span>  -------------------------------------------------------------------
                         required time                         -3.035
                         arrival time                           3.826
  -------------------------------------------------------------------
                         slack                                  0.791</pre>
<p>This analysis is similar to the max output delay, only it&#8217;s calculated with the <strong>fastest</strong> possible combination of process, voltage and temperature, and against the same clock edge (and not the following one). So again, going from setup to hold, these are reversed. Once again, the clock path is very similar to the clock path of the setup analysis for input delay, which is quite expected, as both are based upon the fast model.</p>
<p>As before, the data path continues the clock path until the physical  output is stable, calculated at 3.826 ns (note the difference with the  slow path!).</p>
<p>This is compared with the time of the <strong>same</strong> clock at 0  ns,  minus the output delay, minus the possible jitter (0.035 ns in the  case  above, not clear why it&#8217;s counted if it&#8217;s the same clock cycle, but anyhow). Recall that the min output delay was negative (-3 ns), which is  why it appears as a positive number in the calculation.</p>
<p>Conclusion: Data was stable until 3.826 ns, and needs to be stable until 3.035 ns. That&#8217;s fine, with a 0.791 ns slack.</p>
<p>This demonstrates why set_output_delay -min is minus the hold time of  the  receiver: Jitter aside, the given output delay with reversed sign  is used as the time which the data path delay <strong>must exceed</strong>. In other words, the data must be stable for that long after the clock. This is the definition of hold time.</p>
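<p>And once more, the slack can be checked by hand (a sketch with the numbers of the report above):</p>

```python
# Hold slack of the output path, per the report above (all values in ns).
arrival  = 2.162 + 0.100 + 0.280 + 1.284  # fast clock path + FDRE C-to-Q + net + OBUF = 3.826
required = 0.000 + 0.035 + 3.000          # same clock edge + uncertainty - (min output delay of -3)
slack = arrival - required                # positive: the constraint is met
print(f"arrival={arrival:.3f} required={required:.3f} slack={slack:.3f}")
```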
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2017/04/io-timing-vivado-calculation/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Meaning of set_input_delay and set_output_delay in SDC timing constraints</title>
		<link>https://billauer.se/blog/2017/04/io-timing-constraints-meaning/</link>
		<comments>https://billauer.se/blog/2017/04/io-timing-constraints-meaning/#comments</comments>
		<pubDate>Thu, 06 Apr 2017 08:36:55 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Intel FPGA (Altera)]]></category>
		<category><![CDATA[Vivado]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=5181</guid>
<description><![CDATA[Introduction Synopsys Design Constraints (SDC) has been adopted by Xilinx (in Vivado, as .xdc files) as well as Altera (in Quartus, as .sdc files) and other FPGA vendors as well. Despite the wide use of this format, there seems to be some confusion regarding the constraints for defining I/O timing. This post defines what [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>Synopsys Design Constraints (SDC) has been adopted by Xilinx (in Vivado, as .xdc files), by Altera (in Quartus, as .sdc files), and by other FPGA vendors as well. Despite the wide use of this format, there seems to be some confusion regarding the constraints for defining I/O timing.</p>
<p>This post defines what they mean, and then shows the timing calculations made by Vivado and Quartus (on separate pages), demonstrating their meaning when implementing a very simple example design. So there&#8217;s no need to take my word for it, and this also gives a direction on how to check that your own constraints did what they were supposed to do.</p>
<p>There are several options to these constraints, but these are documented elsewhere. This post is about the basics.</p>
<p>And yes, it&#8217;s the same format with Xilinx and Altera. Compatibility. Unbelievable, but true.</p>
<h3>What they mean</h3>
<p>In short,</p>
<ul>
<li>set_input_delay -clock &#8230; -max &#8230; : The maximal clock-to-output of the driving chip + board propagation delay</li>
<li>set_input_delay -clock &#8230; -min &#8230; : The minimal clock-to-output of the driving chip. If not given,  choose  zero (maybe a future revision of the driving chip will be  manufactured  with a really fast process)</li>
<li>set_output_delay -clock &#8230; -max &#8230; : The t_setup time of the receiving chip + board propagation delay</li>
<li>set_output_delay -clock &#8230; -min &#8230; : <strong>Minus</strong> the t_hold time of the receiving chip (e.g. set to -1 if the hold time is 1 ns).</li>
</ul>
<p>Note that if neither -min nor -max is given, it&#8217;s like two assignments, one with -min and one with -max. In other words: Poor constraining.</p>
<p>The definitions are confusing: set_input_delay defines the allowed range of delays of the <strong>data</strong> toggle after a <strong>clock</strong>, but set_output_delay defines the range of delays of the <strong>clock</strong> after a <strong>data</strong> toggle. Presumably, the rationale behind this is to match datasheet figures of the device on the other end.</p>
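<p>As a worked example, here&#8217;s how the four values would be derived from a datasheet (all figures below are hypothetical, not taken from any real device):</p>

```python
# Hypothetical datasheet figures for the chip on the other side of the board (ns):
tco_max, tco_min = 4.0, 1.5   # driving chip's clock-to-output range
t_setup, t_hold  = 6.0, 1.0   # receiving chip's setup and hold times
board_delay      = 0.5        # board propagation delay of the data trace

input_delay_max  = tco_max + board_delay   # set_input_delay  -max
input_delay_min  = tco_min                 # set_input_delay  -min
output_delay_max = t_setup + board_delay   # set_output_delay -max
output_delay_min = -t_hold                 # set_output_delay -min (note the sign!)

print(input_delay_max, input_delay_min, output_delay_max, output_delay_min)
```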
<h3>Always constrain both min and max</h3>
<p>It may seem meaningless to use the min/max constraints. For  example, using a catch-both single set_output_delay sets the setup time  correctly, and the hold time to a negative value which is  incorrect, but why bother? It allows the output port to toggle <strong>before</strong> the clock, but that couldn&#8217;t happen, could it?</p>
<p>Well, actually it can. For example, it&#8217;s quite common to let an FPGA  PLL (or alike) generate the internal FPGA clock from the clock at some input  pin (the &#8220;clock on the board&#8221;). This allows the PLL to align the clock on the  FPGA&#8217;s internal clock network to the input clock, by time-shifting it  slightly to compensate for the delay of the clock distribution  network.</p>
<p>Actually, the implementation tools may feel free to shift the clock  to slightly earlier than the clock input, in order to meet timing  better: A slow path from logic to output may violate the maximal delay  allowed from clock to output. Moving the clock earlier fixes this. But  moving the internal clock to earlier than the clock on the board may  switch other outputs that depend on the same clock to <strong>before</strong> the  clock on the board toggles, leading to hold time violations on the  receiver of these outputs. Nothing prevents this from happening, except a min output delay constraint.</p>
<h3>Outline of example design</h3>
<p>We&#8217;ll assume test_clk input clock, test_in input pin, and test_out output, with the following relationship:</p>
<pre>   always @(posedge test_clk)
     begin
	test_samp &lt;= test_in;
	test_out &lt;= test_samp;
     end</pre>
<p>No PLL is used to align the internal clock with the board&#8217;s test_clk, so there&#8217;s a significant clock delay.</p>
<p>And the following timing constraints applied in the SDC/XDC file:</p>
<pre>create_clock -name theclk -period 20 [get_ports test_clk]
set_output_delay -clock theclk -max 8 [get_ports test_out]
set_output_delay -clock theclk -min -3 [get_ports test_out]
set_input_delay -clock theclk -max 4 [get_ports test_in]
set_input_delay -clock theclk -min 2 [get_ports test_in]</pre>
<p>As the tools&#8217; timing calculations are rather long, they are on separate pages:</p>
<ul>
<li><a href="https://billauer.se/blog/2017/04/io-timing-quartus-calculation/" target="_blank">Click here</a> for Quartus&#8217; timing analysis</li>
<li><a href="https://billauer.se/blog/2017/04/io-timing-vivado-calculation/" target="_blank">Click here</a> for Vivado&#8217;s timing analysis</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2017/04/io-timing-constraints-meaning/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>PCIe: Xilinx&#8217; pipe_clock module and its timing constraints</title>
		<link>https://billauer.se/blog/2017/02/pipe-clock-pcie-xilinx/</link>
		<comments>https://billauer.se/blog/2017/02/pipe-clock-pcie-xilinx/#comments</comments>
		<pubDate>Wed, 01 Feb 2017 15:27:45 +0000</pubDate>
		<dc:creator>eli</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[PCI express]]></category>
		<category><![CDATA[Vivado]]></category>

		<guid isPermaLink="false">https://billauer.se/blog/?p=5074</guid>
		<description><![CDATA[Introduction In several versions of Xilinx&#8217; wrapper for the integrated PCIe block, it&#8217;s the user application logic&#8217;s duty to instantiate the module which generates the &#8220;pipe clock&#8221;. It typically looks something like this: pcie_myblock_pipe_clock # ( .PCIE_ASYNC_EN ( "FALSE" ), // PCIe async enable .PCIE_TXBUF_EN ( "FALSE" ), // PCIe TX buffer enable for Gen1/Gen2 [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>In several versions of Xilinx&#8217; wrapper for the integrated PCIe block, it&#8217;s the user application logic&#8217;s duty to instantiate the module which generates the &#8220;pipe clock&#8221;. It typically looks something like this:</p>
<pre>pcie_myblock_pipe_clock #
      (
          .PCIE_ASYNC_EN                  ( "FALSE" ),                 // PCIe async enable
          .PCIE_TXBUF_EN                  ( "FALSE" ),                 // PCIe TX buffer enable for Gen1/Gen2 only
          .PCIE_LANE                      ( LINK_CAP_MAX_LINK_WIDTH ), // PCIe number of lanes
          // synthesis translate_off
          .PCIE_LINK_SPEED                ( 2 ),
          // synthesis translate_on
          .PCIE_REFCLK_FREQ               ( PCIE_REFCLK_FREQ ),        // PCIe reference clock frequency
          .PCIE_USERCLK1_FREQ             ( PCIE_USERCLK1_FREQ ),      // PCIe user clock 1 frequency
          .PCIE_USERCLK2_FREQ             ( PCIE_USERCLK2_FREQ ),      // PCIe user clock 2 frequency
          .PCIE_DEBUG_MODE                ( 0 )
      )
      pipe_clock_i
      (

          //---------- Input -------------------------------------
          .CLK_CLK                        ( sys_clk ),
          .CLK_TXOUTCLK                   ( pipe_txoutclk_in ),     // Reference clock from lane 0
          .CLK_RXOUTCLK_IN                ( pipe_rxoutclk_in ),
          .CLK_RST_N                      ( pipe_mmcm_rst_n ),      // Allow system reset for error_recovery
          .CLK_PCLK_SEL                   ( pipe_pclk_sel_in ),
          .CLK_PCLK_SEL_SLAVE             ( pipe_pclk_sel_slave),
          .CLK_GEN3                       ( pipe_gen3_in ),

          //---------- Output ------------------------------------
          .CLK_PCLK                       ( pipe_pclk_out),
          .CLK_PCLK_SLAVE                 ( pipe_pclk_out_slave),
          .CLK_RXUSRCLK                   ( pipe_rxusrclk_out),
          .CLK_RXOUTCLK_OUT               ( pipe_rxoutclk_out),
          .CLK_DCLK                       ( pipe_dclk_out),
          .CLK_OOBCLK                     ( pipe_oobclk_out),
          .CLK_USERCLK1                   ( pipe_userclk1_out),
          .CLK_USERCLK2                   ( pipe_userclk2_out),
          .CLK_MMCM_LOCK                  ( pipe_mmcm_lock_out)

      );</pre>
<p>Consequently, some timing constraints that are related to the PCIe block&#8217;s internal functionality aren&#8217;t added automatically by the wrapper&#8217;s own constraints, but must be given explicitly by the user of the block, typically by following an example design.</p>
<p>This post discusses the implications of this situation. Obviously, none of this applies to PCIe block wrappers which handle this instantiation internally.</p>
<h3>What is the pipe clock?</h3>
<p>For our narrow purposes, the PIPE interface is the parallel data part of the SERDES attached to the Gigabit Transceivers (MGTs), which drive the physical PCIe lanes. For example, data to a Gen1 lane, running at 2.5 GT/s, requires 2.0 Gbit/s of payload data (as it&#8217;s expanded by a 10/8 ratio with 8b/10b encoding). If the SERDES is fed with 16 bits in parallel, a 125 MHz clock yields the correct data rate (125 MHz * 16 = 2 Gbit/s).</p>
<p>By the same token, a Gen2 interface requires a 250 MHz clock to support a payload data rate of 4.0 Gbit/s per lane (expanded into 5 GT/s with 8b/10b encoding).</p>
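<p>The rate arithmetic of the last two paragraphs can be sketched as follows (a sanity check; the 16-bit SERDES width is the example&#8217;s assumption):</p>

```python
# Line rate = payload rate * 10/8, because 8b/10b encodes 8 payload bits as 10 line bits.
def line_rate_gtps(pclk_mhz, serdes_width=16):
    payload_gbps = pclk_mhz * serdes_width / 1000.0  # parallel clock * width, in Gbit/s
    return payload_gbps * 10 / 8                     # add the 8b/10b overhead

print(line_rate_gtps(125))  # Gen1: 2.5
print(line_rate_gtps(250))  # Gen2: 5.0
```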
<h3>The clock mux</h3>
<p>If a PCIe block is configured for Gen2, it&#8217;s required to support both rates: 5 GT/s,  and also be able to fall back to 2.5 GT/s if the link partner doesn&#8217;t support Gen2 or if the link doesn&#8217;t work properly at the higher rate.</p>
<p>In the most common setting (or always?), the pipe clock is muxed between two source clocks by this piece of code (in the pipe_clock module):</p>
<pre>    //---------- PCLK Mux ----------------------------------
    BUFGCTRL pclk_i1
    (
        //---------- Input ---------------------------------
        .CE0                        (1'd1),
        .CE1                        (1'd1),
        .I0                         (clk_125mhz),
        .I1                         (clk_250mhz),
        .IGNORE0                    (1'd0),
        .IGNORE1                    (1'd0),
        .S0                         (~pclk_sel),
        .S1                         ( pclk_sel),
        //---------- Output --------------------------------
        .O                          (pclk_1)
    );
    end</pre>
<p>So pclk_sel, which is a registered version of the CLK_PCLK_SEL input port, is used to switch between a 125 MHz clock (pclk_sel == 0) and a 250 MHz clock (pclk_sel == 1), both clocks generated from the same MMCM_ADV block in the pipe_clock module.</p>
<p>The BUFGCTRL&#8217;s output, pclk_1, is assigned as the pipe clock output (CLK_PCLK). It&#8217;s also used in other ways, depending on the instantiation parameters of pipe_clock.</p>
<h3>Constraints for Gen1 PCIe blocks</h3>
<p>If a PCIe block is configured for Gen1 only, there&#8217;s no question about the pipe clock&#8217;s frequency: It&#8217;s 125 MHz. As a matter of fact, if the PCIE_LINK_SPEED instantiation parameter is set to 1, one gets (by virtue of Verilog&#8217;s generate commands)</p>
<pre>    BUFG pclk_i1
    (
        //---------- Input ---------------------------------
        .I                          (clk_125mhz),
        //---------- Output --------------------------------
        .O                          (clk_125mhz_buf)
    );
    assign pclk_1 = clk_125mhz_buf;</pre>
<p>But never mind this &#8212; it&#8217;s never used: Even when the block is configured as Gen1 only,  PCIE_LINK_SPEED is set to 3 in the example design&#8217;s instantiation, and we all copy from it.</p>
<p>Instead, the clock mux is used and fed with pclk_sel=0. The constraints reflect this with the following lines appearing in the example design&#8217;s XDC file <strong>for Gen1 PCIe blocks (only!)</strong>:</p>
<pre>set_case_analysis 1 [get_pins {pcie_myblock_support_i/pipe_clock_i/pclk_i1_bufgctrl.pclk_i1/S0}]
set_case_analysis 0 [get_pins {pcie_myblock_support_i/pipe_clock_i/pclk_i1_bufgctrl.pclk_i1/S1}]
set_property DONT_TOUCH true [get_cells -of [get_nets -of [get_pins {pcie_myblock_support_i/pipe_clock_i/pclk_i1_bufgctrl.pclk_i1/S0}]]]</pre>
<p>The first two commands tell the timing analysis tools to assume that the clock mux&#8217; inputs are S0=1 and S1=0, and hence that the mux forwards the 125 MHz clock (connected to I0).</p>
<p>The DONT_TOUCH constraint works around a bug in early Vivado revisions, as explained in <a href="http://www.xilinx.com/support/answers/62296.html" target="_blank">AR #62296</a>: The S0 input is assigned ~pclk_sel, which requires a logic inverter. This inverter was optimized into the BUFGCTRL primitive by the synthesizer, flipping the meaning of the first set_case_analysis constraint. This caused the timing tools to analyze the design as if both S0 and S1 were set to zero, hence no clock output, and no constraining of the relevant paths.</p>
<p>The problem with this set of constraints is their cryptic nature: It&#8217;s not clear at all why they are there, just by reading the XDC file. If the user of the PCIe block decides, for example, to change from an 8x Gen1 configuration to a 4x Gen2 one, everything will appear to work nicely, since all clocks except the pipe clock remain the same. It takes some initiative and effort to figure out that these constraints are incorrect for a Gen2 block.</p>
<p>To make things even worse, almost all relevant paths will meet the 250 MHz (4 ns) requirement even when constrained for 125 MHz on a sparsely filled FPGA, simply because there&#8217;s little logic along these paths. So odds are that everything will work fine during the initial tests (before the useful logic is added to the design), and later on the PCIe interface may become shaky throughout the design process, as some paths accidentally exceed the 4 ns limit.</p>
<h3>Dropping the set_case_analysis constraints</h3>
<p>As these constraints are relaxing by their nature, what happens if they are dropped? One could expect that the tools would work a bit harder to ensure that all relevant paths meet timing with either 125 MHz or 250 MHz, or simply put, that the constraining would occur as if pclk_1 were always driven with a 250 MHz clock.</p>
<p>But this isn&#8217;t how timing calculations are made. The tools can&#8217;t just pick the faster clock from a clock mux and follow through, since the logic driven by the clock might interact with other clock domains. If so, a slower clock might require stricter timing due to different relations between the source and target clock&#8217;s frequencies.</p>
<p>So what actually happens is that the timing tools mark all logic driven by the pipe clock as having multiple clocks: The timing of each path going to and from any such logic element is calculated for each of the two clocks. Even the timing for paths going between logic elements that are both driven by the pipe clock are calculated four times, covering the four combinations of the 125 MHz and 250 MHz clocks, as source and destination clocks.</p>
<p>From a practical point of view, this is rather harmless, since both clocks come from the same MMCM_ADV, and are hence aligned. Making these excessive timing calculations always ends up with the equivalent for the 250 MHz clock only (some clock skew uncertainty possibly added for going between the two clocks). Since timing is met easily on these paths, this extra work adds very little to the implementation efforts (and how long it takes to finish).</p>
<p>On the other hand, this adds some dirt to the timing report. First, the multiple clocks are reported (excerpt from the Timing Report):</p>
<pre>7. checking multiple_clock
--------------------------
 There are 2598 register/latch pins with multiple clocks. (HIGH)</pre>
<p>Later on, the paths between logic driven by the pipe clock are counted as inter clock paths: Once from 125 MHz to 250 MHz, and vice versa. This adds up to a large number of bogus inter clock paths:</p>
<pre>------------------------------------------------------------------------------------------------
| Inter Clock Table
| -----------------
------------------------------------------------------------------------------------------------

From Clock    To Clock          WNS(ns)      TNS(ns)  TNS Failing Endpoints  TNS Total Endpoints      WHS(ns)      THS(ns)  THS Failing Endpoints  THS Total Endpoints
----------    --------          -------      -------  ---------------------  -------------------      -------      -------  ---------------------  -------------------
clk_250mhz    clk_125mhz          0.114        0.000                      0                 5781        0.053        0.000                      0                 5781
clk_125mhz    clk_250mhz          0.114        0.000                      0                 5764        0.053        0.000                      0                 5764</pre>
<p>Since a single endpoint might produce many paths (e.g. a block RAM), the number of endpoints doesn&#8217;t necessarily correlate with the number of paths. However, the similarity between the figures of the two directions seems to indicate that the vast majority of these paths are bogus.</p>
<p>So dropping the set_case_analysis constraints boils down to some noise in the timing report. I can think of two ways to eliminate it:</p>
<ul>
<li>Issue set_case_analysis constraints setting S0=0, S1=1, so the tools assume a 250 MHz clock. This covers the Gen2 case as well as Gen1.</li>
<li>Use the constraints of the example design for a Gen2 block (shown below).</li>
</ul>
<p>Even though both ways (in particular the second) seem OK to me, I prefer taking the dirt in the timing report and not add constraints without understanding the full implications. Being more restrictive never hurts (as long as the design meets timing).</p>
<h3>Constraints for Gen2 PCIe blocks</h3>
<p>If a PCIe block is configured for Gen2, it has to be able to work as Gen1 as well. So the set_case_analysis constraints are out of the question.</p>
<p>Instead, this is what one gets in the example design:</p>
<pre>create_generated_clock -name clk_125mhz_x0y0 [get_pins pcie_myblock_support_i/pipe_clock_i/mmcm_i/CLKOUT0]
create_generated_clock -name clk_250mhz_x0y0 [get_pins pcie_myblock_support_i/pipe_clock_i/mmcm_i/CLKOUT1]
create_generated_clock -name clk_125mhz_mux_x0y0 \
                        -source [get_pins pcie_myblock_support_i/pipe_clock_i/pclk_i1_bufgctrl.pclk_i1/I0] \
                        -divide_by 1 \
                        [get_pins pcie_myblock_support_i/pipe_clock_i/pclk_i1_bufgctrl.pclk_i1/O]
#
create_generated_clock -name clk_250mhz_mux_x0y0 \
                        -source [get_pins pcie_myblock_support_i/pipe_clock_i/pclk_i1_bufgctrl.pclk_i1/I1] \
                        -divide_by 1 -add -master_clock [get_clocks -of [get_pins pcie_myblock_support_i/pipe_clock_i/pclk_i1_bufgctrl.pclk_i1/I1]] \
                        [get_pins pcie_myblock_support_i/pipe_clock_i/pclk_i1_bufgctrl.pclk_i1/O]
#
set_clock_groups -name pcieclkmux -physically_exclusive -group clk_125mhz_mux_x0y0 -group clk_250mhz_mux_x0y0</pre>
<p>This may seem tangled, but says something quite simple: The 125 MHz and 250 MHz clocks are physically exclusive (see <a href="https://www.xilinx.com/support/answers/58961.html" target="_blank">AR #58961</a> for an elaboration on this). In other words, these constraints declare that no path exists between logic driven by one clock and logic driven by the other. If such a path is found, it&#8217;s bogus.</p>
<p>So this drops all the bogus paths mentioned above. Each path between logic driven by the pipe clock is now calculated twice (for 125 MHz and 250 MHz, but not across the clocks). This seems to yield the same practical results as without these constraints, but without complaints about multiple clocks, and of course no inter-clock paths.</p>
<p>Both clocks are still related to the pipe clock however. For example, checking a register driven by the pipe clock yields (Tcl session):</p>
<pre><strong>get_clocks -of_objects [get_pins -hier -filter {name=~*/pipe_clock_i/pclk_sel_reg1_reg[0]/C}]</strong>
clk_250mhz_mux_x0y0 clk_125mhz_mux_x0y0</pre>
<p>Not surprisingly, this register is attached to two clocks. The multiple-clock complaint disappeared thanks to the set_clock_groups constraint (even the weaker &#8220;asynchronous&#8221; flag is enough for this purpose).</p>
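<p>For reference, that weaker variant would look like this (a sketch only; physically_exclusive remains the accurate description of a clock mux):</p>
<pre>set_clock_groups -name pcieclkmux -asynchronous \
                 -group clk_125mhz_mux_x0y0 -group clk_250mhz_mux_x0y0
</pre>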
<p>So can these constraints be used for a Gen1-only block, as a safer alternative to the set_case_analysis constraints? It seems so. Is it a good bargain for getting rid of those extra notes in the timing report? It&#8217;s a matter of personal choice. Or of knowing for sure.</p>
<h3>Bonus: Meaning of some instantiation parameters of pipe_clock</h3>
<p>These meanings are based on a dissection of Kintex-7&#8242;s pipe_clock Verilog file. They are probably the same for other targets.</p>
<p><span style="text-decoration: underline;"><strong>PCIE_REFCLK_FREQ</strong></span>: The frequency of the reference clock</p>
<ul>
<li>1 =&gt; 125 MHz</li>
<li>2 =&gt; 250 MHz</li>
<li>Otherwise: 100 MHz</li>
</ul>
<p>CLKFBOUT_MULT_F is set so that the MMCM_ADV&#8217;s internal VCO always runs at 1 GHz. Hence the constant CLKOUT0_DIVIDE_F = 8 makes clk_125mhz run at 125 MHz (dividing by 8), and CLKOUT1_DIVIDE = 4 makes clk_250mhz run at 250 MHz (dividing by 4).</p>
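<p>These divider settings can be verified on a given design by inspecting the MMCM cell&#8217;s properties directly (an illustrative snippet; the instance path follows the naming used in the constraints above):</p>
<pre>set mmcm [get_cells pcie_myblock_support_i/pipe_clock_i/mmcm_i]
report_property $mmcm CLKFBOUT_MULT_F
report_property $mmcm CLKOUT0_DIVIDE_F
report_property $mmcm CLKOUT1_DIVIDE
</pre>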
<p><span style="text-decoration: underline;"><strong>PCIE_USERCLK1_FREQ</strong></span>: The frequency of the module&#8217;s CLK_USERCLK1 output, which is, among others, the clock of the user interface (a.k.a. user_clk_out or axi_clk)</p>
<ul>
<li>1 =&gt; 31.25 MHz</li>
<li>2 =&gt; 62.5 MHz</li>
<li>3 =&gt; 125 MHz</li>
<li>4 =&gt; 250 MHz</li>
<li>5 =&gt; 500 MHz</li>
<li>Otherwise: 62.5 MHz</li>
</ul>
<p><span style="text-decoration: underline;"><strong>PCIE_USERCLK2_FREQ</strong></span>: The frequency of the module&#8217;s CLK_USERCLK2 output. Not used in most applications. Same frequency mapping as PCIE_USERCLK1_FREQ.</p>
]]></content:encoded>
			<wfw:commentRss>https://billauer.se/blog/2017/02/pipe-clock-pcie-xilinx/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
