System-on-Chip (SoC) was a term used maybe a decade ago for chips in which a general-purpose computing core was attached to various peripherals (often heavyweight ones, such as memory controllers or display controllers). The original aim was to reduce the number of pins needed to connect the processor to its peripherals. An SoC was often a chip-scale version of what had previously been built from individual chips on a board. SoCs were generally seen as the counterpart of microcontrollers (mcus): SoCs served high-end applications, mcus the low end.
Conventional microcontrollers are built tightly around the processor core, with all buses and peripherals sharing a single clock. This applies mainly to the 8-bit ones, such as the venerable 8051 and its derivatives, the 8-bit PICs, or the original AVRs, and mostly also to the 16-bit ones. This greatly simplifies understanding the mcu's timing for users. However, this approach has drawbacks too: peripherals are designed specifically for a particular mcu and are not portable to other mcus, which increases development costs. Also, once the chip crosses a certain size (physically, but also in terms of clock fanout), clock distribution becomes troublesome, and in some cases clock skew between different parts of the chip becomes a serious issue.
The 32-bit microcontrollers were created differently: they are based around slightly modified general-purpose 32-bit processor cores (ARM, MIPS), but the memories and peripherals are connected to the core through a fabric of buses and bus matrices, augmented by quite a few dedicated inter-module signals (e.g. interrupt signaling). This makes several things possible:
splitting the clocks into several domains, either interlocked or mutually independent;
adding buffers that let execution continue after a write to a slow peripheral;
accommodating slow memories and peripherals by signaling wait states;
multiplexing access to memories and peripherals among several bus masters (processor(s), DMA units).
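To make the wait-state point concrete, here is a minimal C sketch of the usual arithmetic. It assumes a memory characterized by a single maximum access frequency (`f_mem_max_hz`, a hypothetical parameter). This is a simplification: real datasheets give wait-state tables that also depend on supply voltage and other conditions. For example, a memory readable at 30 MHz behind a 168 MHz core needs 5 wait states, so each uncached access takes 6 core cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Number of wait states needed so that a memory which can be accessed at
   most at f_mem_max_hz (hypothetical, single-number characterization)
   works with a core clocked at f_cpu_hz. An access then takes
   (1 + wait states) core cycles. */
static uint32_t wait_states(uint32_t f_cpu_hz, uint32_t f_mem_max_hz)
{
    assert(f_mem_max_hz > 0);
    /* ceil(f_cpu / f_mem_max) core cycles per access; 64-bit math avoids
       overflow of the rounding term. */
    uint32_t cycles = (uint32_t)(((uint64_t)f_cpu_hz + f_mem_max_hz - 1u) / f_mem_max_hz);
    /* One cycle is spent anyway; everything beyond it is a wait state. */
    return cycles > 0u ? cycles - 1u : 0u;
}
```

In practice, caches and prefetch buffers hide much of this for sequential code. That is part of why the actual cycle count of a given routine becomes hard to predict.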
This is exactly how SoCs were created. It is all the "good stuff" that made it possible to build complex circuitry from discrete processor, memory and peripheral chips on a board. It still makes it possible to integrate very complex circuitry on a single chip while maintaining high operating speed. Not only that, it also simplifies design: because bus interfaces are standardized, individual modules (usually called IP, for "Intellectual Property") can be purchased from various vendors and "slapped together" relatively easily.
The downside of this approach is that timing becomes a complex issue, usually beyond the point where it can be described in simple terms. One practical consequence concerns programmers accustomed to writing bit-banged protocols timed by counting instruction cycles, or to calculating the performance of key routines in instruction cycles: in practice, they can't do that anymore. Even seemingly simple GPIO toggling won't necessarily run at a rate directly related to the speed at which the mcu operates.
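A toy model can show why the toggle rate stops tracking the core clock. The sketch below assumes, purely for illustration, that each GPIO write costs a few core cycles plus a fixed number of cycles on a slower peripheral bus reached through a bridge. It also assumes that back-to-back writes are limited by that bus: a write buffer can hide the latency of one write, but not the throughput of many. All names and numbers (`gpio_write_model`, 3 core cycles, 4 peripheral-bus cycles, a 50 MHz peripheral bus) are hypothetical and not taken from any particular chip.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model with hypothetical parameters (not from any datasheet). */
typedef struct {
    double   f_cpu_hz;       /* core clock */
    double   f_periph_hz;    /* clock of the peripheral bus the GPIO sits on */
    uint32_t core_cycles;    /* core cycles per write (store, loop overhead) */
    uint32_t periph_cycles;  /* peripheral-bus cycles per write, incl. bridge */
} gpio_write_model;

/* Time for one GPIO write, in seconds: the core part shrinks as the core
   clock rises, the peripheral-bus part does not. */
static double gpio_write_time_s(const gpio_write_model *m)
{
    assert(m->f_cpu_hz > 0.0 && m->f_periph_hz > 0.0);
    return m->core_cycles / m->f_cpu_hz + m->periph_cycles / m->f_periph_hz;
}

/* A square wave needs two writes (set, then clear) per period. */
static double gpio_toggle_freq_hz(const gpio_write_model *m)
{
    return 1.0 / (2.0 * gpio_write_time_s(m));
}
```

With these made-up numbers, quadrupling the core clock from 100 MHz to 400 MHz raises the toggle rate only from about 4.5 MHz to about 5.7 MHz, because the peripheral-bus term dominates.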
As an additional problem, the timing of some of those "extra" internal signals, which are not tied tightly to the bus structure, may cause surprising effects. Two of the better known ones are "the interrupt clear arrives late and causes the ISR to re-enter for no reason" and "the peripheral's clock enable takes effect later than the first read/write to that peripheral, which then fails". These usually come up when one tries to write tight, fast code, and are usually covered up by the typical bloated vendor-supplied libraries.
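One commonly used workaround for both effects is to read a register back after writing it. On typical bus fabrics, the read cannot complete before the earlier write to the same peripheral has, so it acts as a cheap barrier. On ARM Cortex-M cores a `DSB` instruction is sometimes suggested too, but depending on the system it may not wait for writes buffered outside the core, e.g. in a bus bridge. The C sketch below shows the read-back pattern. The register names, bit positions and the way the flag is cleared are hypothetical. The registers are simulated as plain `volatile` variables so the code runs on a host; on a real chip, the reference manual and errata give the exact sequence required.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical registers, simulated as plain volatile variables so the
   sketch runs on a host PC. On a real chip they would be memory-mapped,
   e.g. (*(volatile uint32_t *)ADDR) with addresses from the reference
   manual. Names and bit positions here are made up. */
static volatile uint32_t CLK_ENABLE_REG;   /* peripheral clock enables */
static volatile uint32_t TIMER_STATUS_REG; /* bit 0: update interrupt flag */

#define CLK_ENABLE_TIMER  (1u << 3)
#define TIMER_FLAG_UPDATE (1u << 0)

/* Counts how many times the handler has serviced an event (for the demo). */
static volatile uint32_t timer_events;

/* Enable the timer's clock, then read the enable register back. On typical
   bus fabrics the read cannot return until the preceding write has reached
   the register, so the timer is clocked before it is first accessed. */
static void timer_clock_enable(void)
{
    CLK_ENABLE_REG |= CLK_ENABLE_TIMER;
    (void)CLK_ENABLE_REG; /* dummy read-back, value intentionally discarded */
}

/* Interrupt handler body. The status register is read back before
   returning, so the clearing write has landed in the peripheral and the
   still-set flag cannot re-trigger the interrupt. How a flag is cleared
   (write 0, write 1, read) is chip-specific; plain bit-clearing is
   used here. */
static void timer_irq_handler(void)
{
    if (TIMER_STATUS_REG & TIMER_FLAG_UPDATE) {
        TIMER_STATUS_REG &= ~TIMER_FLAG_UPDATE;
        timer_events++;
    }
    (void)TIMER_STATUS_REG; /* read-back forces the clear to complete */
}
```

The cost is one extra peripheral read per handler invocation. That is usually far cheaper than a spurious second entry into the ISR.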
As the raw clock frequency of the processing core increases, so does the complexity discussed here, and more. High speeds are achieved through parallelism, deep pipelining, out-of-order execution, speculative execution, and heavy reliance on caches and buffers, with various strategies for how they are used. The number of interconnecting elements also increases, such as bus matrices and inter-bus/inter-clock-domain interfaces, and they are rarely documented properly. Concerns are addressed by extensive handwaving and by pointing to benchmarks that rarely, if ever, truly represent real-world applications. Users are often genuinely surprised to find that the raw clock has increased by almost two orders of magnitude compared to the 8-bitters, yet the "worst-case reaction time" or "control granularity" has remained almost the same, with significantly increased uncertainty in timing. These issues are not widely discussed, and manufacturers are, understandably, not keen to put them up front in either their marketing or their engineering materials. This leads to much frustration among users these days.