**** BEGIN LOGGING AT Sun Jul 16 02:59:57 2006
Jul 16 09:29:42 vmaster: you around?
Jul 16 09:36:46 I'm trying to do some JTAG speed calculations to see if I understand the influence of latency correctly.
Jul 16 09:37:12 If anyone has some numbers I could use to verify the calculations, please let me know.
Jul 16 10:23:54 Griffon26: I'm here
Jul 16 10:24:20 I'm trying to match my calculations with what you told me about your transfer speeds
Jul 16 10:25:27 you said you could transfer 54 bytes in one go
Jul 16 10:26:01 for those you'd have to wait for an ack, so that's 2 * latency
Jul 16 10:26:34 this was USB1.x, so max 12Mbps and 1ms latency
Jul 16 10:26:57 and you got a speed of several 100s K/s?
Jul 16 10:28:19 there are two ways to transfer data on an ARM7/9
Jul 16 10:29:54 one allows me to transfer 54 bytes at a time, the other transfers only 4 bytes at a time, but i continue without waiting for an ack
Jul 16 10:31:46 i'll just run some tests to give you definite numbers
Jul 16 10:31:55 ok, great! thanks
Jul 16 10:42:33 okay: 100% safe transfers: ~25kb/s
Jul 16 10:43:09 same method, but without waiting for the target: ~50kb/s
Jul 16 10:43:23 different method, with a small handler running on the core to accept the data: ~100kb/s
Jul 16 10:43:57 with fewer processes running i get about 20% better results
Jul 16 10:46:48 and your jtag adapter is manually toggling the clock pin, right?
Jul 16 10:47:03 I mean, the clock is also coming from USB data?
Jul 16 10:52:08 assuming a JTAG overhead of a factor of 30, I come to 26K/s with 1ms latency and 51K/s with 0ms latency (both USB1.x). But I may be oversimplifying.
Jul 16 10:52:36 I'm not taking into account any bottlenecks in the other parts of the probe and a JTAG overhead of 30 seems excessive.
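[Editor's note] The latency reasoning above (54 bytes per block, an ack that costs a second latency period, 1 ms per USB1.x frame) can be sketched as a tiny model. This is only an illustration under stated assumptions: the function name `block_throughput` is hypothetical, the time to clock the bytes over the wire is ignored, and a block sent without waiting for an ack is assumed to cost exactly one frame. It is not the calculation either participant actually ran.

```python
def block_throughput(payload_bytes, frame_s, wait_for_ack=True):
    """Payload bytes per second when each block costs whole USB frames.

    Assumption: a block costs one frame to send, plus one more frame
    if the host waits for the ack before sending the next block.
    """
    frames_per_block = 2 if wait_for_ack else 1
    return payload_bytes / (frames_per_block * frame_s)


if __name__ == "__main__":
    frame = 1e-3  # 1 ms USB1.x frame, as quoted in the log
    for ack in (True, False):
        rate = block_throughput(54, frame, wait_for_ack=ack)
        print(f"wait_for_ack={ack}: {rate / 1024:.1f} KiB/s")
```

With these assumptions the model gives roughly 26 KiB/s with acks and roughly 53 KiB/s without, which is in the same range as the measured ~25 and ~50 figures.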
Jul 16 10:59:34 i'm using the MPSSE
Jul 16 10:59:46 so the clock is toggled for every bit shifted to TMS or TDI/TDO
Jul 16 11:00:30 ah, that's good
Jul 16 11:00:49 you'll also have to build large queues of commands
Jul 16 11:01:43 because every scan you send takes at least one USB frame
Jul 16 11:02:23 that means 54 bytes transferred per frame, right?
Jul 16 11:04:00 oh wait.. I found an error in my calcs
Jul 16 11:04:01 54 bytes of payload
Jul 16 11:04:18 but that's a lot more scans
Jul 16 11:04:23 yes, I know
Jul 16 11:05:30 I counted only 1 latency time
Jul 16 11:05:38 should be 2 I think, per block
Jul 16 11:07:38 mhh, you can't calculate the latency that easily, i guess
Jul 16 11:07:50 it's max you mean
Jul 16 11:08:11 yeah, it's just an upper bound
Jul 16 11:08:59 well, it's safe to say that the latency of the reply is just added in full.. for the first one I should do some rounding up
Jul 16 11:12:42 hmm.. you can always send 54 bytes within one microframe, so latency is the only thing that influences speed
Jul 16 11:13:01 that can't be right
Jul 16 11:18:11 if you have 1ms microframes, if 54 bytes fit in one frame (incl overhead) @ 12Mbps, if you wait for ack -> 54 bytes take 2 ms -> 26K/s. Huh?
Jul 16 11:21:51 http://mmd.ath.cx/usb_latency.log
Jul 16 11:22:22 that's the log output from my openocd, when writing 128kb using the 100% safe method to an ARM7TDMI-S
Jul 16 11:23:00 the TCK frequency was limited to 2 MHz
Jul 16 11:26:09 3 MHz would be possible, too, but doesn't improve the performance, and the arm7tdmi-s can't take more than 1/6th of its core frequency
Jul 16 11:26:16 which might be 14mhz at startup
Jul 16 12:01:06 the actual writing starts at line 6222, after a reset, some initializations and querying the board's flash
Jul 16 12:01:42 the Info: ...
Jul 16 12:01:53 ftd2xx_execute_queue() lines
Jul 16 12:02:04 display the USB latency
Jul 16 12:02:23 inter is right after handing the buffer to the FTDI lib
Jul 16 12:03:33 inter2 could actually be removed, the code between inter and inter2 isn't enabled on current builds
Jul 16 12:03:57 and end is right after FT_Read returned the requested data (including the ACK)
Jul 16 12:05:59 I'd say printing is incorrect
Jul 16 12:06:26 I think the fractional part is leaving out leading zeroes
Jul 16 12:06:58 mhh, it's seconds.microseconds
Jul 16 12:07:10 that's what I figured
Jul 16 12:08:51 yeah, it isn't meant to be a float, and sec: %i usec: %i might be more appropriate, but I haven't enabled that output for months
Jul 16 12:10:39 ok, it looks like 2ms on average, which would make sense
Jul 16 12:12:07 now if I assume a JTAG overhead factor of 29, I come to 53K/s if you're not waiting for acks and it also approaches the theoretical maximum of USB1.1 when using 54 bytes/block
Jul 16 12:12:53 if I look at USB2.0, the optimum bytes per block would be around 270 and the speed with acks would be about 1Mbyte/s and without acks 2Mbyte/s
Jul 16 12:15:53 writing 56 bytes takes 1065 tck cycles, so the overhead is only 2.37
Jul 16 12:16:40 how do you figure? It must be at least 8 because of bytes->bits, right?
Jul 16 12:16:54 oh well, 2.37 tck's per bit
Jul 16 12:18:34 i calculated the number of tcks a while ago
Jul 16 12:19:38 the 50-60 kb/s when not waiting for an ack seems to be caused by the FT2232C itself, not the USB comms
Jul 16 12:19:47 ok, I'm lost. I thought nothing much was done by the hardware and that you wrote one byte to set the state of the pins for one TCK cycle
Jul 16 12:20:15 did you have a look at the FT2232C MPSSE appnote?
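[Editor's note] The "2.37 tck's per bit" figure above follows from dividing the quoted 1065 TCK cycles by the 56 x 8 payload bits. A minimal sketch of that arithmetic, plus the payload rate it would imply if TCK were the only limit, follows. The function names are hypothetical, and the assumption that nothing besides TCK limits the transfer is the editor's, not the participants'.

```python
def tck_per_bit(tck_cycles, payload_bytes):
    """TCK cycles spent per payload bit (the 'overhead' discussed in the log)."""
    return tck_cycles / (payload_bytes * 8)


def tck_limited_rate(tck_hz, tck_cycles, payload_bytes):
    """Payload bytes per second if the TCK frequency were the only bottleneck."""
    return tck_hz / tck_cycles * payload_bytes


if __name__ == "__main__":
    print(f"overhead: {tck_per_bit(1065, 56):.3f} TCK per payload bit")
    for mhz in (2, 6):  # the 2 MHz limit and the 6 MHz MPSSE maximum from the log
        rate = tck_limited_rate(mhz * 1e6, 1065, 56)
        print(f"{mhz} MHz TCK: about {rate / 1000:.0f} kB/s")
```

Under that single-bottleneck assumption, 2 MHz TCK would allow about 105 kB/s, so USB latency rather than TCK has to account for the lower rates measured when waiting for acks.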
Jul 16 12:20:20 I don't think so
Jul 16 12:22:28 http://www.ftdichip.com/Documents/AppNotes/AN2232C-01_MPSSE_Cmnd.pdf
Jul 16 12:22:44 ok
Jul 16 12:26:50 the command byte specifies on which edge that should be read/written, if data is transmitted lsb/msb first, whether it's bits or bytes and what lines should be written
Jul 16 12:27:20 the next byte specifies the length
Jul 16 12:27:33 followed by data bytes
Jul 16 12:36:44 oh, 6MHz max
Jul 16 12:39:08 it's nice, the MPSSE, but it's just too slow
Jul 16 12:59:58 i guess you can only come close to the 6MHz if all you do is writes of a very large number of bytes (max is 65536)
Jul 16 13:01:15 of course the mpsse isn't as fast as i'd like it to be, but it's a very convenient solution, and quite cheap, too
Jul 16 13:02:03 we've discussed alternatives here every now and then, and AchiestDragon is even working on a PCB, but this is going to be very complex
Jul 16 13:04:58 at higher speeds, the optimistic strategy ("the core is going to be fast enough") is more likely to fail, so you'll have to wait for the ACKs
Jul 16 13:05:32 and then latency kicks in, and even USB 2.0 hi-speed with its 125us is going to set your limit
Jul 16 13:06:04 so you'll have to move the ACK detection and waiting into the device
Jul 16 13:07:46 which means either you define a generic jtag protocol (like mpsse, but more like a script language e.g. STAPL), or you make your probe target dependent
Jul 16 13:10:48 what if you use an FPGA?
Jul 16 13:11:21 that's what AchiestDragon is working on
Jul 16 13:11:26 oh, ok
Jul 16 13:11:27 iirc an ARM9 + FPGA
Jul 16 13:11:37 but it's still very complex
Jul 16 13:11:52 what is doing the USB stuff? the fpga?
Jul 16 13:12:27 sorry, don't know
**** ENDING LOGGING AT Mon Jul 17 02:59:58 2006
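[Editor's note] The MPSSE command layout described in the log (a command byte encoding clock edge, bit order, bits vs. bytes and which lines are driven, then a length, then data) can be sketched as below. The flag values and the two-byte, little-endian, length-minus-one field for byte mode are the editor's reading of the linked FTDI app note and should be checked against it; the log itself only says "the next byte specifies the length". The helper name `clock_bytes_out` is hypothetical and this is not OpenOCD's code.

```python
import struct

# Command-byte flags, as the editor reads them from the MPSSE app note
# (assumption: verify against the document before relying on them).
WRITE_NEG_EDGE = 0x01  # drive data out on the falling clock edge
BIT_MODE = 0x02        # length counts bits instead of bytes
READ_NEG_EDGE = 0x04   # sample data in on the falling clock edge
LSB_FIRST = 0x08       # shift least significant bit first
WRITE_TDI = 0x10       # drive TDI
READ_TDO = 0x20        # capture TDO
WRITE_TMS = 0x40       # drive TMS


def clock_bytes_out(data: bytes) -> bytes:
    """Build one byte-mode 'clock data out on TDI' command (no read-back)."""
    if not 1 <= len(data) <= 65536:
        raise ValueError("byte-mode commands carry 1..65536 data bytes")
    opcode = WRITE_TDI | LSB_FIRST | WRITE_NEG_EDGE
    # Length field holds (number of bytes - 1), low byte first.
    return struct.pack("<BH", opcode, len(data) - 1) + data


if __name__ == "__main__":
    print(clock_bytes_out(b"\xaa\x55").hex())
```

The 65536-byte ceiling mentioned in the log falls out of the 16-bit length-minus-one field: the largest encodable value, 0xFFFF, stands for 65536 bytes.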