Atomic 1 0 3 – An Extensive System Memory Tester




Abstract

  • MemTest86 is the original, free, stand alone memory testing software for x86 and ARM computers. MemTest86 boots from a USB flash drive and tests the RAM in your computer for faults using a series of comprehensive algorithms and test patterns.
  • 3 Atomic Operations. 3.1 Is Exclusive Access Enough? 3.2 Example Application; 3.3 Do It Yourself (DIY) Assembly Language; 4 C Programming Language; 5 C11 Atomics. 5.1 Atomic Flags; 5.2 Test And Set Example; 5.3 Atomics Memory Ordering; 5.4 Cache; 5.5 A Peek Under The Hood; 6 Other Challenges To Consider. 6.1 Lock Clearing; 6.2 Exclusive Access.

Recent developments in storage class memory (SCM) such as PCM, MRAM, resistive RAM (RRAM), and spin-transfer torque (STT)-RAM have strengthened their leadership as storage media for memory-based file systems. Traditional Linux memory-based file systems such as Ramfs and Tmpfs utilize the Linux page cache as a file system. These file systems have unnecessary overheads when adopted for SCM file system. Therefore, we propose a new memory-based file system using Memory Zone Partitioning called ZonFS, by extending the Linux Ramfs. In particular, we define a storage zone for SCM, modify the Ramfs to allocate a file system page from SCM. ZonFS avoids running Linux VM kernel codes such as (i) inserting pages allocated from SCM into the LRU list for VM page replacement and (ii) checking dirty pages for write-back to disk. Our extensive evaluations indicate that ZonFS has up to 9.1 and 14.1% higher I/O throughputs than native Ramfs and Tmpfs. Moreover, we also analyze performance behavior of ZonFS under the non-uniform memory access architecture of SCMs on a 40 manycore machine with various configurations such as file sharing level and file stripping level. Our evaluations show that memory controller contention and inter-node link congestion significantly affect the file system's performance and scalability.



authors: Jason Lowe-Power

M5's new memory system (introduced in the first 2.0 beta release) was designed with the following goals:

  1. Unify timing and functional accesses in timing mode. With the old memory system the timing accesses did not have data and just accounted for the time it would take to do an operation. Then a separate functional access actually made the operation visible to the system. This method was confusing, it allowed simulated components to accidentally cheat, and prevented the memory system from returning timing-dependent values, which isn't reasonable for an execute-in-execute CPU model.
  2. Simplify the memory system code: remove the huge amount of templating and duplicate code.
  3. Make changes easier, specifically to allow other memory interconnects besides a shared bus.

For details on the new coherence protocol, introduced (along with a substantial cache model rewrite) in 2.0b4, see CoherenceProtocol.

MemObjects

All objects that connect to the memory system inherit from MemObject. This class adds the pure virtual functions getMasterPort(const std::string &name, PortID idx) and getSlavePort(const std::string &name, PortID idx), each of which returns a port corresponding to the given name and index. This interface is used to structurally connect the MemObjects together.

Ports

The next large part of the memory system is the idea of ports. Ports are used to interface memory objects to each other. They always come in pairs, with a MasterPort and a SlavePort, and we refer to the other port object as the peer. These are used to make the design more modular. With ports, a specific interface between every type of object doesn't have to be created. Every memory object has to have at least one port to be useful. A master module, such as a CPU, has one or more MasterPort instances. A slave module, such as a memory controller, has one or more SlavePorts. An interconnect component, such as a cache, bridge or bus, has both MasterPort and SlavePort instances.

There are two groups of functions in the port object. The send* functions are called on the port by the object that owns that port. For example, to send a packet in the memory system a CPU would call myPort->sendTimingReq(pkt). Each send function has a corresponding recv function that is called on the port's peer. So the implementation of the sendTimingReq() call above would simply be peer->recvTimingReq(pkt) on the slave port. Using this method we only have one virtual function call penalty but keep generic ports that can connect together any memory system objects.

Master ports can send requests and receive responses, whereas slave ports receive requests and send responses. Due to the coherence protocol, a slave port can also send snoop requests and receive snoop responses, with the master port having the mirrored interface.
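The send/recv pairing described above can be sketched with a small toy model. This is not M5 code: the class names, the snake_case method names and the bind() helper are hypothetical stand-ins, and the toy memory answers inside the same call, whereas a real timing response would arrive at a later simulated time.

```python
class Packet:
    """Hypothetical stand-in for a memory-system packet."""
    def __init__(self, addr):
        self.addr = addr
        self.is_response = False


class SlavePort:
    def __init__(self, owner):
        self.owner = owner
        self.peer = None

    def recv_timing_req(self, pkt):
        # recv* is invoked by the peer and hands the packet to the owner.
        return self.owner.handle_request(pkt)

    def send_timing_resp(self, pkt):
        # send* is called by the owner and forwards to the peer.
        return self.peer.recv_timing_resp(pkt)


class MasterPort:
    def __init__(self, owner):
        self.owner = owner
        self.peer = None

    def send_timing_req(self, pkt):
        return self.peer.recv_timing_req(pkt)

    def recv_timing_resp(self, pkt):
        return self.owner.handle_response(pkt)


def bind(master, slave):
    """Make the two ports each other's peer (assumed helper)."""
    master.peer = slave
    slave.peer = master


class ToyMemory:
    def __init__(self):
        self.port = SlavePort(self)

    def handle_request(self, pkt):
        pkt.is_response = True          # turn the request into its response
        return self.port.send_timing_resp(pkt)


class ToyCPU:
    def __init__(self):
        self.port = MasterPort(self)
        self.responses = []

    def handle_response(self, pkt):
        self.responses.append(pkt.addr)
        return True


cpu, mem = ToyCPU(), ToyMemory()
bind(cpu.port, mem.port)
accepted = cpu.port.send_timing_req(Packet(0x1000))
```

Neither ToyCPU nor ToyMemory knows the other's type; each only talks to its own port, which is the modularity the port abstraction is meant to provide.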

Connections

In Python, Ports are first-class attributes of simulation objects, much like Params. Two objects can specify that their ports should be connected using the assignment operator. Unlike a normal variable or parameter assignment, port connections are symmetric: A.port1 = B.port2 has the same meaning as B.port2 = A.port1. The notion of master and slave ports exists in the Python objects as well, and a check is done when the ports are connected together.

Objects such as busses that have a potentially unlimited number of ports use 'vector ports'. An assignment to a vector port appends the peer to a list of connections rather than overwriting a previous connection.

In C++, memory ports are connected together by the Python code after all objects are instantiated.
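The assignment semantics can be illustrated with a self-contained toy. It is only a sketch of the behaviour described above, not the simulator's actual Python classes: Port, VectorPort, SimObject, CPU and Bus here are hypothetical, and connecting two vector ports to each other is not handled.

```python
class Port:
    def __init__(self, name, role):
        self.name = name
        self.role = role              # "master" or "slave"
        self.peer = None

    def connect(self, other):
        # The role check mirrors the master/slave check done at connection time.
        if {self.role, other.role} != {"master", "slave"}:
            raise TypeError(f"cannot connect {self.role} to {other.role}")
        self.peer, other.peer = other, self


class VectorPort:
    def __init__(self, name, role):
        self.name = name
        self.role = role
        self.elements = []

    def connect(self, other):
        # Append a fresh element instead of overwriting an older connection.
        element = Port(f"{self.name}[{len(self.elements)}]", self.role)
        element.connect(other)
        self.elements.append(element)


class SimObject:
    """Toy base class: assigning a port to a port attribute connects them."""
    def __setattr__(self, attr, value):
        mine = self.__dict__.get(attr)
        if isinstance(mine, (Port, VectorPort)) and isinstance(value, (Port, VectorPort)):
            if isinstance(mine, VectorPort):
                mine.connect(value)
            elif isinstance(value, VectorPort):
                value.connect(mine)
            else:
                mine.connect(value)
        else:
            object.__setattr__(self, attr, value)


class CPU(SimObject):
    def __init__(self, name):
        self.dcache_port = Port(f"{name}.dcache_port", "master")


class Bus(SimObject):
    def __init__(self):
        self.slave = VectorPort("bus.slave", "slave")


cpu0, cpu1, bus = CPU("cpu0"), CPU("cpu1"), Bus()
cpu0.dcache_port = bus.slave    # connect from the CPU side
bus.slave = cpu1.dcache_port    # connect from the bus side: same effect
```

Both assignments leave the attribute itself in place and only record a connection, so the order of the two operands does not matter.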


Request

A request object encapsulates the original request issued by a CPU or I/O device. The parameters of this request are persistent throughout the transaction, so a request object's fields are intended to be written at most once for a given request. There are a handful of constructors and update methods that allow subsets of the object's fields to be written at different times (or not at all). Read access to all request fields is provided via accessor methods which verify that the data in the field being read is valid.

The fields in the request object are typically not available to devices in a real system, so they should normally be used only for statistics or debugging and not as architectural values.

Request object fields include:

  • Virtual address. This field may be invalid if the request was issued directly on a physical address (e.g., by a DMA I/O device).
  • Physical address.
  • Data size.
  • Time the request was created.
  • The ID of the CPU/thread that caused this request. May be invalid if the request was not issued by a CPU (e.g., a device access or a cache writeback).
  • The PC that caused this request. Also may be invalid if the request was not issued by a CPU.
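The write-once fields and validity-checking accessors can be sketched as follows. The class layout, field names and method names are assumptions made for illustration; the real request class exposes its own accessors.

```python
class Request:
    """Toy request: each field is written at most once and checked on read."""
    FIELDS = ("vaddr", "paddr", "size", "time", "cpu_id", "pc")

    def __init__(self, **fields):
        self._values = {}
        for name, value in fields.items():
            self.set_once(name, value)

    def set_once(self, name, value):
        if name not in self.FIELDS:
            raise KeyError(f"unknown request field '{name}'")
        if name in self._values:
            raise ValueError(f"request field '{name}' was already written")
        self._values[name] = value

    def has(self, name):
        return name in self._values

    def get(self, name):
        # The accessor refuses to return a field that was never set.
        if name not in self._values:
            raise ValueError(f"request field '{name}' is not valid")
        return self._values[name]


# A DMA device issues a request directly on a physical address,
# so the virtual address, CPU ID and PC are never filled in.
dma = Request(paddr=0x8000, size=64, time=1200)
```

Reading dma.get("paddr") succeeds, while dma.get("vaddr") raises because a DMA request never had a virtual address.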

Packet

A Packet is used to encapsulate a transfer between two objects in the memory system (e.g., the L1 and L2 cache). This is in contrast to a Request, where a single Request travels all the way from the requester to the ultimate destination and back, possibly being conveyed by several different Packets along the way.

Read access to many packet fields is provided via accessor methods which verify that the data in the field being read is valid.

A packet contains the following fields, all of which are read through accessors to be certain the data is valid:

  • The address. This is the address that will be used to route the packet to its target (if the destination is not explicitly set) and to process the packet at the target. It is typically derived from the request object's physical address, but may be derived from the virtual address in some situations (e.g., for accessing a fully virtual cache before address translation has been performed). It may not be identical to the original request address: for example, on a cache miss, the packet address may be the address of the block to fetch and not the request address.
  • The size. Again, this size may not be the same as that of the original request, as in the cache miss scenario.
  • A pointer to the data being manipulated.
    • Set by dataStatic(), dataDynamic(), and dataDynamicArray(), which control whether the data associated with the packet is freed when the packet is destroyed: not at all, with delete, and with delete [], respectively.
    • If the pointer was not set by one of the above methods, allocate() allocates the data, which is then freed when the packet is destroyed. (allocate() is always safe to call.)
    • A pointer can be retrieved by calling getPtr().
    • get() and set() can be used to manipulate the data in the packet. The get() method does a guest-to-host endian conversion and the set() method does a host-to-guest endian conversion.
  • A status indicating Success, BadAddress, Not Acknowledged, or Unknown.
  • A list of command attributes associated with the packet.
    • Note: There is some overlap in the data in the status field and the command attributes. This is largely so that a packet can be easily reinitialized when nacked or easily reused with atomic or functional accesses.
  • A SenderState pointer, which is a virtual base opaque structure used to hold state associated with the packet but specific to the sending device (e.g., an MSHR). A pointer to this state is returned in the packet's response so that the sender can quickly look up the state needed to process it. A specific subclass would be derived from this to carry state specific to a particular sending device.
  • A CoherenceState pointer, which is a virtual base opaque structure used to hold coherence-related state. A specific subclass would be derived from this to carry state specific to a particular coherence protocol.
  • A pointer to the request.
  • A pointer to the request.
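As a rough illustration of the get()/set() endian conversion and of allocate(), the toy packet below stores a 32-bit value in guest byte order using Python's struct module. The ToyPacket class, its constructor arguments and the fixed 32-bit width are assumptions for the sketch; Python integers have no byte order of their own, so the "host" side is simply the integer value.

```python
import struct


class ToyPacket:
    """Toy packet holding one 32-bit value in guest byte order."""
    def __init__(self, addr, guest_little_endian=True):
        self.addr = addr
        self.size = 4
        self.data = None
        self._fmt = "<I" if guest_little_endian else ">I"

    def allocate(self):
        # Safe to call repeatedly: only allocates when no buffer exists yet.
        if self.data is None:
            self.data = bytearray(self.size)

    def set(self, value):
        # Host value -> guest byte order in the packet's buffer.
        self.allocate()
        struct.pack_into(self._fmt, self.data, 0, value)

    def get(self):
        # Guest byte order in the buffer -> host value.
        if self.data is None:
            raise ValueError("packet has no data")
        return struct.unpack_from(self._fmt, self.data, 0)[0]


big = ToyPacket(0x100, guest_little_endian=False)
big.set(0x11223344)

little = ToyPacket(0x100, guest_little_endian=True)
little.set(0x11223344)
```

Both packets return the same value from get(), but their underlying buffers hold the bytes in opposite order, which is what a device inspecting the raw pointer would see.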

Access Types

There are three types of accesses supported by the ports.

  1. Timing - Timing accesses are the most detailed access. They reflect our best effort for realistic timing and include the modeling of queuing delay and resource contention. Once a timing request is successfully sent, at some point in the future the device that sent the request will either get the response or a NACK if the request could not be completed (more below). Timing and Atomic accesses cannot coexist in the memory system.
  2. Atomic - Atomic accesses are faster than detailed (timing) accesses. They are used for fast forwarding and warming up caches, and return an approximate time to complete the request without any resource contention or queuing delay. When an atomic access is sent, the response is provided when the function returns. Atomic and timing accesses cannot coexist in the memory system.
  3. Functional - Like atomic accesses, functional accesses happen instantaneously, but unlike atomic accesses they can coexist in the memory system with atomic or timing accesses. Functional accesses are used for things such as loading binaries, examining/changing variables in the simulated system, and allowing a remote debugger to be attached to the simulator. The important note is that when a functional access is received by a device that contains a queue of packets, all the packets must be searched for requests or responses that the functional access is affecting, and they must be updated as appropriate. The Packet::intersect() and fixPacket() methods can help with this.
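The requirement that functional accesses search queued packets can be sketched with a toy device that buffers timing writes. The class and method names are hypothetical, addresses are matched exactly rather than by overlapping ranges, and the update policy is a simplification of what helpers such as Packet::intersect() and fixPacket() support.

```python
class QueuedDevice:
    """Toy device that buffers timing writes before applying them."""
    def __init__(self):
        self.memory = {}
        self.pending_writes = []     # [addr, value] entries not yet applied

    def recv_timing_write(self, addr, value):
        self.pending_writes.append([addr, value])

    def drain(self):
        # Apply the buffered writes in arrival order.
        for addr, value in self.pending_writes:
            self.memory[addr] = value
        self.pending_writes.clear()

    def functional_read(self, addr):
        # The newest queued write to this address holds the current value.
        for q_addr, q_value in reversed(self.pending_writes):
            if q_addr == addr:
                return q_value
        return self.memory.get(addr, 0)

    def functional_write(self, addr, value):
        # Update backing storage and every queued packet for this address,
        # otherwise a later drain would silently undo the functional write.
        self.memory[addr] = value
        for entry in self.pending_writes:
            if entry[0] == addr:
                entry[1] = value


dev = QueuedDevice()
dev.memory[0x10] = 1
dev.recv_timing_write(0x10, 2)
seen_by_debugger = dev.functional_read(0x10)
dev.functional_write(0x10, 3)
dev.drain()
```

Without the queue search, the functional read would return the stale value 1 and the functional write would be overwritten when the buffered write finally drained.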

Packet allocation protocol

The protocol for allocation and deallocation of Packet objects varies depending on the access type. (We're talking about low-level C++ new/delete issues here, not anything related to the coherence protocol.)

  • Atomic and Functional: The Packet object is owned by the requester. The responder must overwrite the request packet with the response (typically using the Packet::makeResponse() method). There is no provision for having multiple responders to a single request. Since the response is always generated before sendAtomic() or sendFunctional() returns, the requester can allocate the Packet object statically or on the stack.
  • Timing: Timing transactions are composed of two one-way messages, a request and a response. In both cases, the Packet object must be dynamically allocated by the sender. Deallocation is the responsibility of the receiver (or, for broadcast coherence packets, the target device, typically memory). In the case where the receiver of a request is generating a response, it may choose to reuse the request packet for its response to save the overhead of calling delete and then new (and gain the convenience of using makeResponse()). However, this optimization is optional, and the requester must not rely on receiving the same Packet object back in response to a request. Note that when the responder is not the target device (as in a cache-to-cache transfer), the target device will still delete the request packet, and thus the responding cache must allocate a new Packet object for its response. Also, because the target device may delete the request packet immediately on delivery, any other memory device wishing to reference a broadcast packet past the point where the packet is delivered must make a copy of that packet, as the pointer to the packet that is delivered cannot be relied upon to stay valid.

Timing Flow control

Timing requests simulate a real memory system, so unlike functional and atomic accesses their response is not instantaneous. Because the timing requests are not instantaneous, flow control is needed. When a timing packet is sent via sendTiming(), the packet may or may not be accepted, which is signaled by returning true or false. If false is returned, the object should not attempt to send any more packets until it receives a recvRetry() call. At this time it should again try to call sendTiming(); however, the packet may again be rejected. Note: The original packet does not need to be resent; a higher priority packet can be sent instead. Once sendTiming() returns true, the packet may still not be able to make it to its destination. For packets that require a response (i.e. pkt->needsResponse() is true), any memory object can refuse to acknowledge the packet by changing its result to Nacked and sending it back to its source. However, if it is a response packet, this cannot be done. The true/false return is intended to be used for local flow control, while nacking is for global flow control. In both cases a response cannot be nacked.
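The local flow-control handshake (a false return followed later by a retry) can be sketched with a toy sender and a receiver of limited capacity. All names here are hypothetical, nacking is not modelled, and the sender simply resends its oldest pending packet on retry even though it would be free to pick a higher priority one.

```python
from collections import deque


class BusyReceiver:
    """Toy receiver that can hold only `capacity` packets at a time."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.waiting = None          # sender that was refused and awaits a retry

    def recv_timing(self, pkt, sender):
        if len(self.queue) >= self.capacity:
            self.waiting = sender
            return False             # refused: sender must wait for recv_retry()
        self.queue.append(pkt)
        return True

    def service_one(self):
        # Finishing a packet frees a slot, so tell the refused sender to retry.
        done = self.queue.popleft()
        if self.waiting is not None:
            sender, self.waiting = self.waiting, None
            sender.recv_retry()
        return done


class Sender:
    def __init__(self, receiver):
        self.receiver = receiver
        self.pending = deque()
        self.blocked = False

    def send(self, pkt):
        self.pending.append(pkt)
        if not self.blocked:         # never send while waiting for a retry
            self._try_send()

    def _try_send(self):
        while self.pending:
            if not self.receiver.recv_timing(self.pending[0], self):
                self.blocked = True
                return
            self.pending.popleft()

    def recv_retry(self):
        self.blocked = False
        self._try_send()


receiver = BusyReceiver(capacity=1)
sender = Sender(receiver)
for name in ("A", "B", "C"):
    sender.send(name)
state_after_sends = (list(receiver.queue), list(sender.pending), sender.blocked)
first_done = receiver.service_one()
state_after_retry = (list(receiver.queue), list(sender.pending), sender.blocked)
```

After the three sends, only A has been accepted and the sender is blocked; servicing A triggers the retry, which delivers B and blocks again on C.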

Response and Snoop ranges

Ranges in the memory system are handled by having devices that are sensitive to an address range provide an implementation for getAddrRanges in their slave port objects. This method returns an AddrRangeList of addresses it responds to. When these ranges change (e.g. from PCI configuration taking place), the device should call sendRangeChange() on its slave port so that the new ranges are propagated to the entire hierarchy. This is precisely what happens during init(); all memory objects call sendRangeChange(), and a flurry of range updates occurs until everyone's ranges have been propagated to all busses in the system.
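A minimal sketch of range propagation, assuming a single bus and half-open (start, end) tuples in place of an AddrRangeList. The Device and Bus classes and their method names are hypothetical; they only mimic the roles of getAddrRanges and sendRangeChange() described above.

```python
class Device:
    def __init__(self, name, ranges):
        self.name = name
        self.ranges = list(ranges)   # (start, end) pairs, end exclusive
        self.bus = None

    def get_addr_ranges(self):
        return list(self.ranges)

    def set_ranges(self, ranges):
        # E.g. after a PCI reconfiguration: update, then notify the bus.
        self.ranges = list(ranges)
        if self.bus is not None:
            self.bus.recv_range_change(self)


class Bus:
    def __init__(self):
        self.routes = {}             # device -> ranges it currently responds to

    def attach(self, device):
        device.bus = self
        self.recv_range_change(device)

    def recv_range_change(self, device):
        # Re-query the device so routing always reflects its latest ranges.
        self.routes[device] = device.get_addr_ranges()

    def find_target(self, addr):
        for device, ranges in self.routes.items():
            if any(start <= addr < end for start, end in ranges):
                return device.name
        return None


bus = Bus()
mem = Device("mem", [(0x0000, 0x8000)])
nic = Device("nic", [(0xA000, 0xB000)])
bus.attach(mem)
bus.attach(nic)
before = bus.find_target(0xA010)
nic.set_ranges([(0xC000, 0xD000)])
after = bus.find_target(0xA010)
```

Because the bus re-queries the device on every range change, an address that used to route to the NIC stops resolving as soon as the NIC moves its window.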





