When implementing a Cortex-M0 DesignStart core in a Spartan-3AN FPGA, choosing the right device size is crucial. The Cortex-M0 is one of ARM’s smallest and most energy-efficient processors, which makes it a practical candidate for FPGA implementation. However, its resource requirements must be checked against the target device, or the core may fail to fit or to meet timing.
FPGA Resources Needed for Cortex-M0
The Cortex-M0 DesignStart core requires a number of key FPGA resources:
- Logic Cells – The core logic utilizes approximately 3,500 4-input LUTs. This includes the Cortex-M0 CPU, AHB-Lite interface, peripherals, and memory interfaces.
- Block RAM – The core requires between 4 and 16 dual-port 18 Kbit block RAMs for instruction and data storage. Larger data and instruction footprints require more BRAM resources.
- Multipliers – Spartan-3AN devices have no DSP slices; they provide dedicated 18x18 multipliers instead. Timers such as SysTick are counters built from ordinary logic, so multiplier usage is very low and depends mainly on how the core’s multiply instruction is implemented.
- Clock Management – The core typically uses one Digital Clock Manager (DCM) to generate the system clock from the FPGA input clock.
The number of block RAMs and logic cells needed fluctuates based on the exact peripherals included and optimization techniques used. However, the above provides a general guideline of the key resources required.
Spartan-3AN Device Options
The Spartan-3AN FPGA family spans a wide range of sizes. Capacities per the family data sheet, counting two 4-input LUTs per slice, are:
- XC3S50AN – 1,408 LUTs, 3 BRAMs
- XC3S200AN – 3,584 LUTs, 16 BRAMs
- XC3S400AN – 7,168 LUTs, 20 BRAMs
- XC3S700AN – 11,776 LUTs, 20 BRAMs
- XC3S1400AN – 22,528 LUTs, 32 BRAMs
Against a core of roughly 3,500 LUTs, the XC3S50AN is too small, and the XC3S200AN leaves almost no slack for place and route. The XC3S400AN is the smallest device that fits the core with room for peripherals. The XC3S700AN and XC3S1400AN suit larger systems, although block RAM grows more slowly than logic across the family: the XC3S400AN and XC3S700AN both have 20 BRAMs.
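The device choice can be checked mechanically. The following is a minimal Python sketch, assuming the roughly 3,500-LUT estimate given earlier and device capacities taken from the Spartan-3AN family data sheet (two 4-input LUTs per slice), which should be rechecked before use. The names `Device` and `smallest_fit` and the 80% LUT ceiling are illustrative assumptions, not part of any vendor tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    name: str
    luts: int   # 4-input LUTs (2 per slice)
    brams: int  # 18 Kbit block RAMs


# Capacities as listed in the Spartan-3AN family data sheet; verify before use.
DEVICES = [
    Device("XC3S50AN", 1408, 3),
    Device("XC3S200AN", 3584, 16),
    Device("XC3S400AN", 7168, 20),
    Device("XC3S700AN", 11776, 20),
    Device("XC3S1400AN", 22528, 32),
]


def smallest_fit(luts_needed: int, brams_needed: int,
                 max_lut_util: float = 0.8) -> Optional[Device]:
    """Return the smallest device that holds the design, or None.

    max_lut_util is an assumed ceiling on LUT utilization that leaves
    the router some headroom; 0.8 is a rule of thumb, not a vendor limit.
    """
    for dev in DEVICES:  # list is ordered from smallest to largest
        if luts_needed <= dev.luts * max_lut_util and brams_needed <= dev.brams:
            return dev
    return None


if __name__ == "__main__":
    dev = smallest_fit(luts_needed=3500, brams_needed=8)
    print(dev.name if dev else "no Spartan-3AN device fits")
```

Raising `brams_needed` above 20 skips straight to the XC3S1400AN, which shows how block RAM rather than logic can drive the device choice.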
Utilization and Frequency
Setting the LUT and BRAM counts of three representative Cortex-M0 DesignStart configurations against device capacity gives the following utilization:
- Minimal configuration on XC3S400AN – 3,200 of 7,168 LUTs (45%), 8 of 20 BRAMs (40%)
- Mid-size configuration on XC3S700AN – 3,800 of 11,776 LUTs (32%), 12 of 20 BRAMs (60%)
- Large configuration on XC3S1400AN – 4,000 of 22,528 LUTs (18%), 16 of 32 BRAMs (50%)
None of these configurations fits the XC3S50AN (1,408 LUTs, 3 BRAMs), and the XC3S200AN (3,584 LUTs) would be about 89% full with the minimal configuration alone. Achievable clock frequency depends on the speed grade, the timing constraints, and how full the device is. It generally improves as utilization drops, because the router has more freedom, and should be read from the post-place-and-route timing report rather than assumed.
Peripherals and Memory
The peripherals and memory interfaces included in the Cortex-M0 design significantly impact the resource utilization and performance. Here are guidelines for what fits based on device size:
- XC3S50AN – Too small for the core itself (1,408 LUTs and 3 BRAMs, or 6KB of block RAM)
- XC3S200AN – Core with a minimal peripheral set at best; logic is nearly exhausted
- XC3S400AN – Basic system timer, UART, GPIO, additional timers, SPI, I2C, for example 8KB instruction, 8KB data RAM
- XC3S700AN and XC3S1400AN – Larger peripherals such as Ethernet or USB controllers possible; up to 40KB (XC3S700AN) or 64KB (XC3S1400AN) of block RAM in total
Below the XC3S400AN, logic capacity is the limiting factor. From the XC3S400AN upward the peripheral set can grow with the device, while block RAM caps on-chip memory at 40KB until the XC3S1400AN.
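The memory sizes above translate directly into a block RAM count. Here is a short Python sketch of that arithmetic, assuming each 18 Kbit Spartan-3AN block RAM contributes 16 Kbit (2,048 bytes) of data with its parity bits unused, and that instruction and data memories occupy separate blocks. The function name `brams_needed` is hypothetical.

```python
import math

# Usable data bytes per 18 Kbit block RAM when the parity bits are not used.
BRAM_DATA_BYTES = 2048


def brams_needed(instr_bytes: int, data_bytes: int) -> int:
    """Count the block RAMs for separate instruction and data memories.

    Each memory is rounded up to whole blocks on its own, since the two
    memories are assumed not to share a block.
    """
    instr_blocks = math.ceil(instr_bytes / BRAM_DATA_BYTES)
    data_blocks = math.ceil(data_bytes / BRAM_DATA_BYTES)
    return instr_blocks + data_blocks


if __name__ == "__main__":
    for kb in (4, 8, 16):
        n = brams_needed(kb * 1024, kb * 1024)
        print(f"{kb}KB instruction + {kb}KB data -> {n} BRAMs")
```

A 4KB plus 4KB system therefore needs 4 block RAMs, and an 8KB plus 8KB system needs 8, which can be compared with the block RAM count of the candidate device.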
Optimizing the Cortex-M0 Design
To better fit the Cortex-M0 into the Spartan-3AN FPGAs, here are some optimization techniques that can be used:
- Minimize logic cell usage by removing unneeded peripherals
- Use FPGA BRAM for instruction and data rather than logic-based memory
- Map multiply operations onto the dedicated 18x18 multipliers to reduce logic cell usage
- Generate the system clock with a DCM rather than building clock dividers from logic
- Simplify or remove debug interfaces like JTAG to save logic
Properly optimizing the Cortex-M0 design is essential to maximizing resources and performance in the Spartan-3AN FPGAs. Removing unnecessary logic, utilizing FPGA resources like BRAM and the dedicated multipliers, and simplifying interfaces can significantly improve results.
Pin Planning and Floorplanning
To integrate the Cortex-M0 into the Spartan-3AN FPGA, pin planning and floorplanning are critical. Key guidelines include:
- Plan FPGA pin connections for peripherals, memory interfaces, system clock, and debug
- Floorplan for proper register locality and timing closure
- Place memory interfaces close to FPGA BRAMs
- Route clocks on the global clock network to minimize skew
- Use dedicated FPGA resources like DCMs and multipliers to save logic
With good pin planning and floorplanning, the Cortex-M0 core can be smoothly integrated into the FPGA fabric. This helps optimize routing, timing, and resource usage.
Timing Closure and Optimization
Closing timing and meeting design frequency goals requires optimization of the Cortex-M0 implementation. Key techniques include:
- Following FPGA design guidelines for register balancing, pipelining, and multiplier usage
- Constraining timing-critical paths to guide synthesis and layout
- Using incremental layout and optimization to fix failing paths
- Reducing combinational logic levels through pipelining and retiming
- Analyzing critical paths and using guide files to optimize routing
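With proper timing closure techniques, the Cortex-M0 core can run reliably at its constrained frequency. On Spartan-3AN fabric a target in the tens of MHz is a realistic starting point; the achieved figure should be confirmed in the post-place-and-route timing report for the chosen speed grade.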
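The relationship between a target frequency, the clock period constraint, and timing slack can be sketched in a few lines of Python. This is an illustration under stated assumptions: the net name `HCLK` and the 50 MHz target are placeholders, the helper names are hypothetical, and the critical path delay would come from the timing report of an actual run. The generated text uses the UCF `TNM_NET` and `TIMESPEC ... PERIOD` form accepted by the ISE tools.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz: float, critical_path_ns: float) -> float:
    """Positive slack means the path meets the target; negative means it fails."""
    return period_ns(freq_mhz) - critical_path_ns


def fmax_mhz(critical_path_ns: float) -> float:
    """Highest clock frequency the given critical path delay supports."""
    return 1000.0 / critical_path_ns


def ucf_period(net: str, freq_mhz: float) -> str:
    """Build a UCF period constraint for the named clock net."""
    p = period_ns(freq_mhz)
    return (f'NET "{net}" TNM_NET = "{net}";\n'
            f'TIMESPEC "TS_{net}" = PERIOD "{net}" {p:g} ns HIGH 50%;')


if __name__ == "__main__":
    # Assumed values for illustration only.
    print(ucf_period("HCLK", 50))
    print(f"slack at 50 MHz with a 23.5 ns path: {slack_ns(50, 23.5):.1f} ns")
    print(f"Fmax for a 25 ns path: {fmax_mhz(25):.1f} MHz")
```

A negative slack indicates that either the target frequency must drop to the computed Fmax or the critical path must be shortened.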
Simulation, Debug, and Verification
Verifying the Cortex-M0 FPGA implementation requires robust simulation, debug, and testing. Recommended verification flows include:
- Simulating the core RTL with testbenches to verify functionality before synthesis
- Performing gate-level simulation to verify post-synthesis implementation
- Using ChipScope Pro for in-system debug of the FPGA
- Running embedded software tests to exercise core functions
- Reusing verification components from the ARM DesignStart methodology
By combining simulation with in-system verification on the FPGA, a robust Cortex-M0 FPGA implementation can be achieved.
Conclusion
Implementing the Cortex-M0 in Spartan-3AN FPGAs requires careful design planning and optimization but provides a flexible and low-cost ARM-based solution. By selecting a Spartan-3AN device with enough logic and block RAM, optimizing the core configuration, planning the SoC integration, and verifying the implementation, a high-quality embedded system can be realized.