|  | AMCC suggests setting the PMU bit to 0 for best performance on the | 
|  | PPC440 DDR controller. The common 440 DDR setup files (sdram.c & | 
|  | spd_sdram.c) are changed accordingly, so all 440 boards using | 
|  | these setup routines automatically receive this performance | 
|  | increase. | 
|  |  | 
|  | The following benchmarks, run by AMCC, demonstrate the | 
|  | performance change: | 
|  |  | 
|  |  | 
|  | ---------------------------------------- | 
|  | SDRAM0_CFG0[PMU] = 1 (U-Boot default for Bamboo, Yosemite and Yellowstone) | 
|  | ---------------------------------------- | 
|  | Stream benchmark results | 
|  | ------------------------------------------------------------- | 
|  | This system uses 8 bytes per DOUBLE PRECISION word. | 
|  | ------------------------------------------------------------- | 
|  | Array size = 2000000, Offset = 0 | 
|  | Total memory required = 45.8 MB. | 
|  | Each test is run 10 times, but only | 
|  | the *best* time for each is used. | 
|  | ------------------------------------------------------------- | 
|  | Your clock granularity/precision appears to be 1 microseconds. | 
|  | Each test below will take on the order of 112345 microseconds. | 
|  | (= 112345 clock ticks) | 
|  | Increase the size of the arrays if this shows that you are not getting | 
|  | at least 20 clock ticks per test. | 
|  | ------------------------------------------------------------- | 
|  | WARNING -- The above is only a rough guideline. | 
|  | For best results, please be sure you know the precision of your system | 
|  | timer. | 
|  | ------------------------------------------------------------- | 
|  | Function      Rate (MB/s)   RMS time     Min time     Max time | 
|  | Copy:         256.7683       0.1248       0.1246       0.1250 | 
|  | Scale:        246.0157       0.1302       0.1301       0.1302 | 
|  | Add:          255.0316       0.1883       0.1882       0.1885 | 
|  | Triad:        253.1245       0.1897       0.1896       0.1899 | 
|  |  | 
|  |  | 
|  | TTCP Benchmark Results | 
|  | ttcp-t: socket | 
|  | ttcp-t: connect | 
|  | ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  -> | 
|  | localhost | 
|  | ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++ | 
|  | ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57 | 
|  | ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw | 
|  |  | 
|  | ---------------------------------------- | 
|  | SDRAM0_CFG0[PMU] = 0 (Suggested modification) | 
|  | Setting PMU = 0 provides a noticeable performance improvement: | 
|  | * 2% to 5% improvement in memory performance. | 
|  | * Improves the Mbit/sec for the TTCP benchmark by almost 77%. | 
|  | ---------------------------------------- | 
|  | Stream benchmark results | 
|  | ------------------------------------------------------------- | 
|  | This system uses 8 bytes per DOUBLE PRECISION word. | 
|  | ------------------------------------------------------------- | 
|  | Array size = 2000000, Offset = 0 | 
|  | Total memory required = 45.8 MB. | 
|  | Each test is run 10 times, but only | 
|  | the *best* time for each is used. | 
|  | ------------------------------------------------------------- | 
|  | Your clock granularity/precision appears to be 1 microseconds. | 
|  | Each test below will take on the order of 120066 microseconds. | 
|  | (= 120066 clock ticks) | 
|  | Increase the size of the arrays if this shows that you are not getting | 
|  | at least 20 clock ticks per test. | 
|  | ------------------------------------------------------------- | 
|  | WARNING -- The above is only a rough guideline. | 
|  | For best results, please be sure you know the precision of your system | 
|  | timer. | 
|  | ------------------------------------------------------------- | 
|  | Function      Rate (MB/s)   RMS time     Min time     Max time | 
|  | Copy:         262.5167       0.1221       0.1219       0.1223 | 
|  | Scale:        258.4856       0.1238       0.1238       0.1240 | 
|  | Add:          262.5404       0.1829       0.1828       0.1831 | 
|  | Triad:        266.8594       0.1800       0.1799       0.1802 | 
|  |  | 
|  | TTCP Benchmark Results | 
|  | ttcp-t: socket | 
|  | ttcp-t: connect | 
|  | ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  -> | 
|  | localhost | 
|  | ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++ | 
|  | ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89 | 
|  | ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw | 
|  |  | 
|  |  | 
|  | 2006-07-28, Stefan Roese <sr@denx.de> |
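
The improvement figures quoted for PMU = 0 can be rechecked from the rates in the two benchmark logs above. The following is a minimal sketch of that arithmetic, not part of the U-Boot change itself; the helper names `gain_percent` and `print_gains` are hypothetical and exist only for this illustration. The rate values are copied from the logs.

```c
#include <stdio.h>

/* Relative gain of `after` over `before`, in percent. */
static double gain_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}

/*
 * Print the gain for each benchmark line.
 * Rates are copied from the logs above: PMU = 1 first, PMU = 0 second.
 */
static void print_gains(FILE *out)
{
	static const struct {
		const char *name;
		double pmu1;
		double pmu0;
	} results[] = {
		{ "Copy (MB/s)",     256.7683, 262.5167 },
		{ "Scale (MB/s)",    246.0157, 258.4856 },
		{ "Add (MB/s)",      255.0316, 262.5404 },
		{ "Triad (MB/s)",    253.1245, 266.8594 },
		{ "TTCP (Mbit/sec)", 454.29,   804.06   },
	};
	size_t i;

	for (i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		fprintf(out, "%-16s %+6.1f%%\n", results[i].name,
			gain_percent(results[i].pmu1, results[i].pmu0));
}
```

Calling `print_gains(stdout)` from a `main()` prints one line per benchmark. Note that the TTCP run transfers only 16 MB over localhost in a fraction of a second, so that figure is likely more sensitive to run-to-run variation than the Stream rates.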