Johannes,
> I think you are referring to my anti-forensics paper
> [1].
[...]
Ha! That was a good paper. Thank you for drawing it to my attention.
Nor do I think that the criticisms offered therein were in any sense
inappropriate. You have identified a number of obvious single points of
failure which affect many common memory forensic acquisition tools. The
vulnerabilities are too simple and too obvious not to be exploited in
the wild. Indeed, in our own experience and based on anecdotal reports
by our customers, the "bad guys" are using far more sophisticated
techniques to frustrate memory analysis. Nor is it an excuse if a
forensic tool vendor has known about these problems for years and failed
to do anything about it.
Also, the basic concept which you advance as a solution to the problem,
namely acquiring memory by formatting your own page table entry
(effectively bypassing the operating system) has merit and deserves
further exploration. That having been said, I think that there are a
couple of things that need to be viewed more critically.
1. The first is the notion of scanning the PCI bus through the
PCI_CONFIG_DATA and PCI_CONFIG_ADDRESS IO ports. On Microsoft Windows,
the PCI bus driver assumes that it has exclusive access to the PCI bus.
Trying to access the PCI config space without coordinating with the
PCI bus driver is going to have some very unpleasant consequences
sooner or later. See, especially, the comments of Doron Holan in the
thread "Querying PCI config space,"
http://www.osronline.com/showthread.cfm?link=243808. You have correctly
identified a significant limitation of most contemporary memory forensic
software. The device and BIOS reserved areas which most tools discard
may contain important artifacts of sophisticated (and some
not-so-sophisticated) malware. Nevertheless, here we have a Microsoft
software engineer stating in unequivocal terms that what you are
proposing to do is going to trash the system. Nor do I care that lspci
does the same thing on Linux and gets away with it most of the time.
There are a lot of things that you can get away with most of the time
but which will some day fail suddenly in a career-ending moment.
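The hazard in point 1 comes from the two-step nature of the legacy access method: a register is first selected by writing to PCI_CONFIG_ADDRESS and only then read through PCI_CONFIG_DATA, so the selection is shared state. The following is a minimal sketch of that mechanism, not code from the paper or from any driver. The port numbers and bit layout follow the conventional x86 configuration mechanism; the `sim_*` helpers and `interleaved_access_misreads` are hypothetical names that model the shared address latch in ordinary memory instead of touching real I/O ports.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy x86 I/O ports for PCI configuration mechanism #1. */
#define PCI_CONFIG_ADDRESS 0x0CF8u
#define PCI_CONFIG_DATA    0x0CFCu

/* Build the value written to PCI_CONFIG_ADDRESS to select one register. */
static uint32_t pci_config_address(uint8_t bus, uint8_t device,
                                   uint8_t function, uint8_t offset)
{
    assert(device < 32);    /* 5-bit device number */
    assert(function < 8);   /* 3-bit function number */
    return 0x80000000u                    /* enable bit */
         | ((uint32_t)bus << 16)
         | ((uint32_t)device << 11)
         | ((uint32_t)function << 8)
         | ((uint32_t)offset & 0xFCu);    /* dword-aligned register offset */
}

/* Hypothetical model of the host bridge: a single shared address latch.
 * For illustration, a "data read" simply returns the latched address. */
static uint32_t sim_latch;
static void     sim_write_address(uint32_t value) { sim_latch = value; }
static uint32_t sim_read_data(void)               { return sim_latch; }

/* Returns 1 if a second writer slipping in between the tool's address
 * write and its data read makes the tool read the wrong register. */
static int interleaved_access_misreads(uint32_t tool_addr, uint32_t driver_addr)
{
    sim_write_address(tool_addr);     /* tool selects its register */
    sim_write_address(driver_addr);   /* bus driver runs in between */
    return sim_read_data() != tool_addr;
}
```

In this model any interleaving with a different address makes the tool read data for the wrong device. On real hardware the symmetric case applies as well: the tool's address write can redirect an access that the bus driver already had in flight.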
2. Equally problematic is the notion of allowing the user to "acquire
past the end of physical memory (yielding simply zero blocks)." Ahhh!
If only it were so simple. In many cases this will work. But I also
have encountered systems on which attempting to read beyond the end of
physical memory (i.e. to an invalid physical address and not just beyond
MmHighestPossiblePhysicalPage) reliably results in a machine check
exception (MCE). I even have a system with me at the moment which
generates an MCE when you try to read from certain BIOS
reserved physical addresses below MmHighestPossiblePhysicalPage. The
faulting addresses are marked as BIOS reserved addresses but do not
otherwise present any reason for the fault, except that this is the way
that this particular BIOS has programmed the memory controller and
processor.
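One conservative way to act on the observation in point 2 is to read a page only if it lies entirely inside a range known to be backed by RAM, and to record everything else as skipped. The sketch below illustrates that check under stated assumptions; it is not the method of either tool discussed here. `struct phys_range`, `page_is_backed`, and `SKETCH_PAGE_SIZE` are hypothetical names, and the list of runs is assumed to come from the operating system or the firmware memory map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed 4 KB pages */

/* Hypothetical description of one run of RAM-backed physical memory. */
struct phys_range {
    uint64_t base;    /* first byte of the run */
    uint64_t length;  /* size of the run in bytes */
};

/* Return 1 if the whole page starting at addr lies inside a single run,
 * 0 if any part of it falls outside every run. */
static int page_is_backed(const struct phys_range *runs, size_t count,
                          uint64_t addr)
{
    assert(runs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint64_t end = runs[i].base + runs[i].length;
        if (addr >= runs[i].base && addr + SKETCH_PAGE_SIZE <= end)
            return 1;
    }
    return 0;
}
```

A filter of this kind avoids the machine check, but it also skips the device and BIOS-reserved areas that point 1 identifies as potentially interesting, so it is a safe default and not a complete answer.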
3. The third and most important problem is your facile assumption that
the risk of processor TLB cache corruption can be dismissed simply
because you are only reading from your "rogue" page table entries and
not writing to them. As you correctly state: "Note that depending on
the caching type in the PTE that holds the original mapping to a
physical page, writing to the rogue mapping could cause cache
incoherence and is strongly discouraged." Section 11.12.4 Vol 3A of
the Intel Software Developer's Manual states:
"The PAT allows any memory type to be specified in the page tables, and
therefore it is possible to have a single physical page mapped to two or
more different linear addresses, each with different memory types. Intel
does not support this practice because it may lead to undefined
operations that can result in a system failure." Current versions of
Microsoft Windows store the cache attribute of current
virtual-to-physical address mappings as a parameter in the PFN database.
When you use the operating system to re-map a physical page, the OS
uses this stored cache attribute when it creates the new mapping. The
operating system thus attempts to avoid the cache corruption issues
against which the Intel manuals warn. When you bypass the OS and
effectively create your own "rogue" page table entry, you assume the
responsibility of doing what the OS does to prevent processor cache
corruption. Your own formulations acknowledge as much. In your paper
you assert, without supporting reference, that "this is not a problem
for the purpose of memory acquisition, as we only need to read from this
mapping." Assuming arguendo that this is so, you still haven't accounted
for the possibility that someone ELSE might corrupt the processor cache
by writing to a physical page using a virtual address with a mapping
that is incompatible with yours.
Section 11.12.4 Vol 3A of the Intel Software Developer's Manual also
provides the steps that an operating system must take before mapping a
physical page to an incompatible mapping:
<Quote>
1. Remove the previous mapping to a cacheable memory type in the page
tables; that is, make them not present.
2. Flush the TLBs of processors that may have used the mapping, even
speculatively.
3. Create a new mapping to the same physical address with a new memory
type, for instance, WC.
4. Flush the caches on all processors that may have used the mapping
previously. Note on processors that support self-snooping, CPUID feature
flag bit 27, this step is unnecessary.
</Quote>
The question I have is, how is the operating system going to undertake
these steps for a subsequent concurrent mapping (assuming that yours is
the original mapping) when the operating system does not know that your
mapping exists?
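To make the memory-type conflict in point 3 concrete: the PAT-selected memory type of a 4 KB mapping is chosen by the PWT, PCD, and PAT bits of its page table entry, which together index one of eight entries in the IA32_PAT register. The sketch below compares two entries that point at the same physical frame. It is illustrative only and rests on assumptions: it uses the architectural power-on default for IA32_PAT (the value an operating system actually programs may differ), it ignores the MTRRs that also influence the effective type, and `pte_pat_index`, `pte_memory_type`, and `mappings_conflict` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Memory type encodings used in the IA32_PAT register. */
enum mem_type {
    MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6, MT_UC_MINUS = 7
};

/* Power-on default of IA32_PAT: WB, WT, UC-, UC, then the same again. */
#define PAT_RESET_DEFAULT 0x0007040600070406ull

/* Cache-control bits in a 4 KB page table entry. */
#define PTE_PWT (1ull << 3)
#define PTE_PCD (1ull << 4)
#define PTE_PAT (1ull << 7)

/* Physical frame bits of a 4 KB page table entry (bits 12..51). */
#define PTE_FRAME_MASK 0x000FFFFFFFFFF000ull

/* Index into the PAT: PAT bit is bit 2, PCD is bit 1, PWT is bit 0. */
static unsigned pte_pat_index(uint64_t pte)
{
    return ((pte & PTE_PAT) ? 4u : 0u)
         | ((pte & PTE_PCD) ? 2u : 0u)
         | ((pte & PTE_PWT) ? 1u : 0u);
}

/* Memory type selected by the entry for the given IA32_PAT value. */
static enum mem_type pte_memory_type(uint64_t pte, uint64_t pat_msr)
{
    return (enum mem_type)((pat_msr >> (8 * pte_pat_index(pte))) & 0x7u);
}

/* Return 1 if two entries for the same frame select different types. */
static int mappings_conflict(uint64_t pte_a, uint64_t pte_b, uint64_t pat_msr)
{
    assert((pte_a & PTE_FRAME_MASK) == (pte_b & PTE_FRAME_MASK));
    return pte_memory_type(pte_a, pat_msr) != pte_memory_type(pte_b, pat_msr);
}
```

A check like this can only compare mappings that the checker knows about. The operating system cannot run it against a mapping it was never told exists, which is the gap the question above points at.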
The first two problems described above, resulting from scanning the PCI
bus and from reading beyond the end of physical memory, do not occur on
every system; however, the problems result in immediate and dramatic
feedback (in the form of a system crash) in the cases where they do
occur. Problems with processor cache corruption, on the other hand, do
occur, at least potentially, on any system with an Intel-compatible
processor. However, cache corruption typically doesn't provide
any obvious indication that it has occurred. Some random bit of memory
gets corrupted. An affected system can continue running, in some cases
for years, without any apparent indication of a problem. And then one
day someone orders a drone to abort and it fires instead because you
corrupted the memory of some mission-critical server months earlier. So
I do not think that the problem of cache corruption can be dismissed as
easily as you seem to think.
Regards,
George M. Garner Jr.
President
GMG Systems, Inc.