Wednesday, October 30, 2013

Desktop raised floor

It's been a while since I've posted about a project I've done rather than a tool or some of my reversing work. This one is purely mechanical too!

First, a little background. I have a lot of FPGA/CPLD/MCU dev boards on my desk. By "a lot" I don't mean two or three... more like 20. Powering this much hardware presents some interesting problems. I don't have that many USB ports (and many of them need more power than USB can provide). Wallwarts are another obvious solution, but I don't have enough outlets or wallwarts to power 20 boards either!

I made three bar-shaped USB hubs with male mini-B ports, to plug into small development boards backplane-style. This helped a bit, but as my collection of boards grew the situation got worse.

By last May, my desk looked something like this:

My desk full of cables
Despite extensive efforts to manage the cable disaster with split tubing, there was still a giant octopus. Worse yet, my power strips were full and half of my boards didn't even have power.

The first step was to replace the loose boards with a datacenter-style "raised floor". I bought a 2x3 foot sheet of clear blue acrylic from McMaster-Carr, carefully floorplanned where all of the boards would go, and then drilled holes for each board's mounting standoffs.

Drilling holes
This operation had to be done out on the kitchen table because my office was too small to work comfortably in.

Mounting USB hubs
I mounted all of the USB hubs to the underside of the board in order to save space on top for dev boards and things I was likely to need to probe. While this seemed a good idea at first, reaching underneath the sheet to run cables was a little tricky. After finishing the build I replaced the legs with ones several inches longer to provide the necessary hand clearance.

Before running cables, I attached all of the boards and brought it back to my desk to test the fit.

The apparatus on my desk
The "hostnames" on labels below each board are used as node names for my batch scheduler and unit test framework (more on that in a future post). In addition, those boards with Ethernet interfaces are assigned a constant IP address by my DHCP server, recorded in DNS with that hostname so I can write test cases using hostnames instead of raw IP addresses.

In an effort to reduce cable mess, I made custom cut-to-size USB cables out of cat5 cable and soldered on USB plugs. This was a very slow and laborious process because the connectors tended to melt very easily no matter what temperature I ran the iron at. BGA is no problem for me but these connectors gave me a hard time; I had yields somewhere around 60-70% even after rework. The rest of the time the connectors were melted beyond repair.

Despite the pain, I think the results were worth it. I was a little worried about signal quality, as USB is supposed to have a 90 ohm differential impedance and cat5e is 100 ohms, but I've noticed no problems. I did try to find 90 ohm cables but had trouble locating any.
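A quick back-of-the-envelope check of that mismatch can be sketched as below. It uses the standard reflection coefficient formula and only the nominal 90 and 100 ohm figures; it ignores the connectors and solder joints, so treat it as a rough estimate rather than a measurement. The function name is made up for illustration.

```cpp
#include <cassert>

// Voltage reflection coefficient for a wave travelling from a line of
// impedance z_from into a line of impedance z_to (both in ohms).
double ReflectionCoefficient(double z_from, double z_to)
{
    assert(z_from + z_to != 0.0);
    return (z_to - z_from) / (z_to + z_from);
}
```

With the nominal values, `ReflectionCoefficient(90.0, 100.0)` works out to 10/190, or roughly 0.053, so only about 5% of the incident voltage is reflected at each transition under these idealized assumptions.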

Custom USB cables
After running all of the cables I could, a few of the boards were still unpowered and there were wallwarts everywhere, but the data wiring was a bit neater. Definitely a step in the right direction, but more work was needed.
After initial deployment

After taking that picture, I replaced most of the red electrical tape with zip ties and stick-on mount points. This made the setup a lot neater but I don't have any photos of that handy.

In order to tidy it up properly, I needed to tackle the power problem. My solution to that is a bit of a long story so I'll save that for next post :)

Tuesday, October 1, 2013

SoC framework, part 5: JtagDebugController and nocswitch

All of the JTAG utilities I've been mentioning are quite handy if you need to load a bitstream onto a board from one of several workstations. But JTAG is capable of much more, including powerful on-chip debug features.

One of the often-overlooked hard IP blocks in Xilinx FPGAs is BSCAN. This primitive (usually described in the FPGA's configuration user guide) connects a JTAG data register for certain special instructions to FPGA fabric.

Xilinx 6 and 7 series FPGAs each contain four BSCANs, one connected to each of the four JTAG instructions USER1...USER4. These are very rarely used by user designs, but Xilinx utilities like ChipScope and the in-system SPI programming cores use them to communicate with the FPGA without needing additional connections.

The primitive is named BSCAN_SPARTAN6 in Spartan-6 and BSCANE2 in 7 series. As far as I can tell, both are functionally equivalent.


BSCAN_SPARTAN6 #(
 .JTAG_CHAIN(1)
)
user1_bscan (
 .SEL(instruction_active),
 .TCK(tck),
 .CAPTURE(state_capture_dr),
 .RESET(state_reset),
 .RUNTEST(state_runtest),
 .SHIFT(state_shift_dr),
 .UPDATE(state_update_dr),
 .DRCK(tck_gated),
 .TMS(tms),
 .TDI(tdi),
 .TDO(tdo)
);

The JTAG_CHAIN parameter specifies which of the four user instructions to use. I'll summarize the interesting ports below including some notes:
  • SEL goes high whenever USERx is loaded into the instruction register, regardless of the test state machine's current state.
  • CAPTURE, RESET, RUNTEST, SHIFT, and UPDATE are one-hot flags that go high when the corresponding DR state is active. When the state machine is in the IR shift path, all flags are held low.
  • TMS is of little practical use since the state machine is already implemented for you.
  • TCK provides direct access to the JTAG clock. (Be sure to create a timing constraint for any signals clocked by this net.) In my experience the Xilinx tools often do not recognize this signal as a clock and use high-skew local routing; manual insertion of a BUFG/BUFH is advised for optimal results.
  • TDI and TDO are connected to the corresponding JTAG pins when in the SHIFT-DR state. You can connect any fabric logic you want to them.
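
To make the roles of these ports concrete, here is a small behavioural model in C++ of a data register attached to a BSCAN primitive. It is only a sketch: the 32-bit width, the class and member names, and the simplification that everything happens on one clock edge are assumptions for illustration, not the actual RTL.

```cpp
#include <cstdint>

// Software model of a 32-bit user data register hanging off a BSCAN primitive.
// Width and names are illustrative assumptions.
class UserDataRegister
{
public:
    uint32_t capture_value = 0;   // value the fabric presents to the host
    uint32_t updated_value = 0;   // value most recently written by the host

    // Current TDO level: the LSB of the shift register goes out first
    bool Tdo() const
    { return (m_shreg & 1) != 0; }

    // Call once per TCK cycle with the current BSCAN outputs
    void Clock(bool sel, bool capture, bool shift, bool update, bool tdi)
    {
        if(!sel)            // some other instruction is loaded, so ignore
            return;
        if(capture)         // Capture-DR: load the shift register
            m_shreg = capture_value;
        else if(shift)      // Shift-DR: TDI enters at the MSB
            m_shreg = (m_shreg >> 1) | (static_cast<uint32_t>(tdi) << 31);
        else if(update)     // Update-DR: latch what the host shifted in
            updated_value = m_shreg;
    }

private:
    uint32_t m_shreg = 0;
};

// Simulate one host-side DR scan: capture, 32 shifts (LSB first), update.
// Returns the word read back from the device.
uint32_t ScanDR(UserDataRegister& dr, uint32_t host_word)
{
    dr.Clock(true, true, false, false, false);
    uint32_t readback = 0;
    for(int i = 0; i < 32; i++)
    {
        if(dr.Tdo())
            readback |= (1u << i);
        bool tdi = ((host_word >> i) & 1) != 0;
        dr.Clock(true, false, true, false, tdi);
    }
    dr.Clock(true, false, false, true, false);
    return readback;
}
```

A single scan therefore reads the captured word and writes a new one at the same time, which is what makes a DR usable as a bidirectional data channel.
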
Given this core plus libjtaghal on the PC side, we have a solid framework for building an on-chip debug system! The first step is to decide what sort of data to move over the link. Since my framework is NoC based, raw NoC frames seemed the natural choice. This would create a sort of layer-3 VPN encapsulating RPC/DMA transactions within JTAG scan operations.

After some experimenting with protocols I came up with one that seemed to work reasonably well. USER1 is the status/control register, USER2 is the RPC data register, and USER3 is the DMA data register. USER4 is left free for future expansion.

The FPGA side of the link is a module called JtagDebugController. It exposes RPC and DMA ports to the NoC; my current convention calls for addresses in subnet c000/2 to be routed to the debug bridge.
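
For readers unfamiliar with the prefix notation, c000/2 means "every address whose top two bits match those of c000". A minimal sketch of that test, assuming 16-bit NoC addresses (as suggested by the uint16_t in the example code later in this post) and using made-up function names:

```cpp
#include <cassert>
#include <cstdint>

// True if addr falls within prefix/prefix_len for 16-bit addresses.
bool InSubnet(uint16_t addr, uint16_t prefix, unsigned prefix_len)
{
    assert(prefix_len <= 16);
    if(prefix_len == 0)
        return true;
    // Keep the top prefix_len bits; e.g. a length of 2 gives a mask of 0xc000
    uint16_t mask = static_cast<uint16_t>(0xffffu << (16 - prefix_len));
    return (addr & mask) == (prefix & mask);
}

// True if addr would be routed to the debug bridge under the c000/2 convention
bool IsDebugAddress(uint16_t addr)
{
    return InSubnet(addr, 0xc000, 2);
}
```

Under this reading, c000 through ffff belong to the debug bridge and everything from 0000 through bfff stays on the chip.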

I'm deliberately not describing the actual on-wire protocol in depth because it's still in flux; when I get closer to a stable release I'll document it somewhere.

The PC side of the link is a C++ application using libjtaghal called "nocswitch". Example usage:

$ ./x86_64-linux-gnu/nocswitch --server localhost --port 50100 --lport 50101
Emulated NoC switch [SVN rev 1253:1254M] by Andrew D. Zonenberg.

License: 3-clause ("new" or "modified") BSD.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Connected to JTAG daemon at localhost:50100
Querying adapter...
    Remote JTAG adapter is a Dev board JTAG (232H) (serial number "FTWOON60", userid "FTWOON60", frequency 10.00 MHz)
Initializing chain...
Scan chain contains 1 devices
Device  0 is a Xilinx XC6SLX25 stepping 2
    Virtual TAP status register is  1000adba
    Valid NoC endpoint detected

This spawns a nocswitch listening on localhost:50101 connecting to a jtagd at localhost:50100.

Once nocswitch is running, it polls the status register on USER1 constantly waiting for the "new RPC message" or "new DMA message" bit to be set. (This causes a lot of traffic on the nocswitch-jtagd link and uses a decent amount of CPU on the host; my custom 8-port ICE will include FPGA based polling and an onboard nocswitch along with the jtagd's to avoid this problem.)
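
The polling scheme can be sketched roughly as follows. This is not the real nocswitch code: the status bit positions are hypothetical (the actual register layout is not documented here), and the three callbacks stand in for whatever scan operations read USER1, USER2, and USER3.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical bit assignments, chosen only for this example
constexpr uint32_t STATUS_RPC_READY = 0x1;
constexpr uint32_t STATUS_DMA_READY = 0x2;

struct PollHandlers
{
    std::function<uint32_t()> read_status;  // scan USER1, return the status word
    std::function<void()> read_rpc;         // scan USER2 to fetch a pending RPC message
    std::function<void()> read_dma;         // scan USER3 to fetch a pending DMA message
};

// Poll the status register max_polls times, fetching any message it reports.
// Returns the number of messages fetched.
unsigned PollLoop(const PollHandlers& h, unsigned max_polls)
{
    unsigned fetched = 0;
    for(unsigned i = 0; i < max_polls; i++)
    {
        uint32_t status = h.read_status();
        if(status & STATUS_RPC_READY)
        {
            h.read_rpc();
            fetched++;
        }
        if(status & STATUS_DMA_READY)
        {
            h.read_dma();
            fetched++;
        }
    }
    return fetched;
}
```

Every iteration costs a full scan operation even when nothing is pending, which is where the link traffic and host CPU load come from.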

Client applications can then connect to nocswitch via a TCP-based protocol. The nocswitch assigns an address in c000/2 to each client in a manner somewhat reminiscent of DHCP; client applications (on the same machine or elsewhere on the LAN) can then exchange NoC packets directly with the device under test. Multiple clients are fully supported; the nocswitch performs layer-2 switching between clients and the DUT as needed.

Nocswitch is able to switch frames from one client to another as well as just to the DUT; this permits a client to send messages to a NoC address without caring about whether it's a core in the SoC, a PC-side unit test, or even an RTL simulation (my mechanism for doing the latter will be described in a future post).
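
As a rough illustration of the address assignment and switching described above, here is a sketch of a forwarding table. Everything in it is an assumption made for the example: the integer client IDs, the lowest-free-address allocation order, and the choice to drop frames sent to an unassigned c000/2 address. The real nocswitch may well work differently.

```cpp
#include <cstdint>
#include <map>

constexpr int PORT_DUT = -1;    // forward over JTAG to the device under test
constexpr int PORT_DROP = -2;   // no such client, discard the frame

class ForwardingTable
{
public:
    // Assign the lowest free address in c000/2 to a newly connected client.
    // Returns 0 (which is outside c000/2) if the subnet is exhausted.
    uint16_t AddClient(int client_id)
    {
        for(uint32_t a = 0xc000; a <= 0xffff; a++)
        {
            uint16_t addr = static_cast<uint16_t>(a);
            if(m_clients.count(addr) == 0)
            {
                m_clients[addr] = client_id;
                return addr;
            }
        }
        return 0;
    }

    // Release an address when its client disconnects
    void RemoveClient(uint16_t addr)
    {
        m_clients.erase(addr);
    }

    // Decide where a frame with the given destination address should go
    int Lookup(uint16_t dest) const
    {
        std::map<uint16_t, int>::const_iterator it = m_clients.find(dest);
        if(it != m_clients.end())
            return it->second;          // another connected client
        if((dest & 0xc000) == 0xc000)
            return PORT_DROP;           // debug subnet, but nobody owns it
        return PORT_DUT;                // everything else lives on the chip
    }

private:
    std::map<uint16_t, int> m_clients;  // NoC address -> client ID
};
```

The sender never needs to know which of the three outcomes applies; it just addresses a frame and lets the table decide.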

From a test case author's perspective, the NOCSwitchInterface class implements the RPCAndDMAInterface class and supports the usual complement of operations.

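// Connect to the nocswitch (server, port, nameserver, and rxm are set up elsewhere)
printf("Connecting to nocswitch server...\n");
NOCSwitchInterface iface;
iface.Connect(server, port);

// Look up the NoC address of the Ethernet core by name
uint16_t eaddr = nameserver.ForwardLookup("eth0");
printf("eth0 is at %04x\n", eaddr);

// Send a reset RPC call to the Ethernet core
printf("Resetting interface...\n");
iface.RPCFunctionCall(eaddr, ETH_RESET, 0, 0, 0, rxm);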

Finally, here's a sneak peek at what's coming in future posts:
  • Hardware cosimulation, including a workaround for ISim's lack of Verilog PLI support
  • Splash, my build system inspired by Google Blaze
  • RED TIN, my internal logic analyzer (ChipScope/SignalTap replacement with lots of features useful in my work, like state machine decoding, RLE, and time-scale compression)
  • A look at both the hardware and software sides of the infrastructure for my dev board farm (batch scheduling, distributed build, automated testing, managed power distribution, and more). Hooking a single board up to a single JTAG dongle works fine if you only have one device but becomes a lot more of a pain to maintain when you have over twenty dev boards with more on the way!