17 Jul 2013

Teradata Architecture: PE, BYNET, AMP, VDISK, PDISK

Introduction to Teradata RDBMS
Teradata RDBMS is a complete relational database management system. The system is based on
off-the-shelf Symmetric Multiprocessing (SMP) technology combined with a communication
network connecting the SMP systems to form a Massively Parallel Processing (MPP) system.
BYNET is a hardware inter-processor network that links SMP nodes. All processors in the same SMP
node are connected by a virtual BYNET. The following figure shows how the components of this
DBMS work together.
PDE (Parallel Database Extensions):
This component is an interface layer on top of the operating system. Its functions
include executing vprocs (virtual processors), providing a parallel environment,
scheduling sessions, debugging, etc.

Teradata File System:
It allows Teradata RDBMS to store and retrieve data independently of the low-level operating system interface.
PE (Parsing Engine):
         Communicate with client
         Manage sessions
         Parse SQL statements
         Communicate with AMPs
         Return result to the client
AMP (Access Module Processor):
         BYNET interface
         Manage database
         Interface to disk subsystem
CLI (Call Level Interface):
         A SQL query is submitted and transferred in CLI packet format
TDP (Teradata Director Program):
         Routes the packets to the specified Teradata RDBMS server
Teradata RDBMS has the following components that support all data communication
management:
         Call Level Interface (CLI)
         WinCLI & ODBC
         Teradata Director Program (TDP, for channel-attached clients)
         Micro TDP (TDP for network-attached clients)


Node hardware and software components
CPUs are not physically associated with vprocs.  Performance is best when you use the UNIX affinity scheduler to keep a logical association between a CPU and a vproc.
Memory - Vprocs share a free memory pool within a node.  A segment of memory is allocated to a vproc for use, then returned to the memory pool for use by another vproc.
MCA - Slots in the MCA (Micro Channel Adapter) are used for the following connections:
Local Peripheral Board (LPB)
External disk arrays
LAN connections
Mainframe channel connections
MCCA - MCCA boards (Micro Channel Cable Adapter) enable communication between a channel-attached node and the Tailgate box.  MCCA boards are located in MCA slots.
Ethernet Card - Each LAN connection to a node requires an Ethernet card, which communicates with the Teradata Gateway software.  Ethernet cards are located in MCA slots.
Twisted Pair Shielded Cable - Connects the MCCA card to the Tailgate box for a mainframe channel connection.
LAN Cable - Connects the Ethernet cards in the MCA to the LAN.
Tailgate Box - An adapter between the node cabinet and the mainframe in a channel-connected system.
Bus and Tag Cables - Connect the Tailgate box to the mainframe.
Virtual Disk (vdisk) - The logical disk that is managed by an AMP.  Each AMP is associated with a single vdisk.
UNIX - The Teradata RDBMS is built on the UNIX operating system for an open environment.  NCR added MP-RAS extensions to UNIX to facilitate a multiple CPU environment.
Parallel Database Extensions (PDE) - Software that runs on UNIX MP-RAS.  It was created by NCR to support the parallel environment.
Trusted Parallel Application (TPA) - Implements virtual processors and runs on the foundation of UNIX MP-RAS and PDE.
The Teradata RDBMS for UNIX is classified as a TPA.
Access Module Processors (AMP) are vprocs that receive steps from PEs and perform database functions to return or update data.  Each AMP is associated with one vdisk.
PE - Vprocs that receive SQL requests from the client and break the requests into steps.  The PEs send the steps to the AMPs and subsequently return the answer to the client.
Teradata Gateway - Software that handles communication between the PEs on a node and applications running on LAN-attached clients.  The Teradata Gateway has a limit of 600 sessions.
Channel Driver - Software that communicates between the PEs and applications running on channel-attached clients.
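The Memory entry above describes a shared pool: a segment is lent to one vproc and later handed back so another vproc can use it. A minimal Python sketch of that idea follows. The class name, the segment counts, and the vproc labels are all invented for illustration, and the real PDE memory manager is certainly more involved.

```python
class NodeMemoryPool:
    """Toy model of a node-wide free memory pool shared by vprocs."""

    def __init__(self, total_segments):
        self.free = total_segments   # segments currently available
        self.in_use = {}             # vproc name -> segments it holds

    def allocate(self, vproc, segments):
        """Lend segments from the shared pool to one vproc."""
        if segments > self.free:
            raise MemoryError(f"only {self.free} segments free")
        self.free -= segments
        self.in_use[vproc] = self.in_use.get(vproc, 0) + segments

    def release(self, vproc):
        """Return everything a vproc holds to the shared pool."""
        self.free += self.in_use.pop(vproc, 0)


pool = NodeMemoryPool(total_segments=8)
pool.allocate("AMP-0", 5)   # an AMP borrows five segments
pool.release("AMP-0")       # ...and gives them back
pool.allocate("PE-0", 6)    # a PE can now reuse the same memory
```

Without the `release` call, the second allocation would fail, because only three segments would remain free.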
Platforms
Single Node System:  All of the node components together comprise a node.  A single node system is typically implemented on an SMP platform.  The vprocs in an SMP system communicate over the vnet.
Nodes working together create a multiple-node Teradata RDBMS system, which is implemented on an MPP platform.  The nodes and vprocs communicate over the BYNET (Banyan Network).
BYNET
The BYNET is a high-speed interconnect that is responsible for:
Sending messages
Merging data
Sorting answers
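As a rough illustration of the merging and sorting roles, the sketch below combines answer rows that several AMPs have each sorted locally into one globally ordered answer set. Python's `heapq.merge` is used as a stand-in. The AMP names and row values are invented, and this post does not describe how the BYNET actually performs the merge.

```python
import heapq

# Hypothetical per-AMP answer rows, each already sorted locally by its AMP.
amp_results = {
    "AMP-0": [3, 9, 14],
    "AMP-1": [1, 7, 20],
    "AMP-2": [5, 6, 11],
}

# Merge the sorted streams row by row instead of re-sorting everything.
final_answer = list(heapq.merge(*amp_results.values()))
print(final_answer)
```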
The BYNET messaging capability enables vprocs to send different types of messages:
Point-to-Point - A vproc can send a message to another vproc:
In the same node, using BYNET software only; the message is reassigned in memory to the target vproc.
In another node, using both BYNET hardware and software.
Multicast - A vproc can send a message to multiple vprocs by sending a broadcast message to all nodes.  The BYNET software on the receiving node determines whether a vproc on the node should receive or discard the message.
Broadcast - A vproc can broadcast a message to all the vprocs in the system.
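The three message types can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the `Node` and `Interconnect` classes, the `hardware_hops` counter, and the vproc names are hypothetical, and the counter simply records messages that leave the sender's node. It does show the point made above, that a multicast goes to every node and the receiving node decides which vprocs keep it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    vprocs: set
    inbox: dict = field(default_factory=dict)   # vproc -> received messages

    def deliver(self, message, targets=None):
        """Node-side software: keep the message only for wanted vprocs."""
        for vproc in self.vprocs:
            if targets is None or vproc in targets:
                self.inbox.setdefault(vproc, []).append(message)


class Interconnect:
    def __init__(self, nodes):
        self.nodes = nodes
        self.hardware_hops = 0   # messages that crossed a node boundary

    def _node_of(self, vproc):
        return next(n for n in self.nodes if vproc in n.vprocs)

    def point_to_point(self, sender, receiver, message):
        src, dst = self._node_of(sender), self._node_of(receiver)
        if src is not dst:       # same node: software only, no hardware hop
            self.hardware_hops += 1
        dst.deliver(message, targets={receiver})

    def _send_to_all_nodes(self, sender, message, targets):
        src = self._node_of(sender)
        if any(node is not src for node in self.nodes):
            self.hardware_hops += 1
        for node in self.nodes:
            node.deliver(message, targets)

    def multicast(self, sender, receivers, message):
        # Sent to every node; each node discards it for non-target vprocs.
        self._send_to_all_nodes(sender, message, set(receivers))

    def broadcast(self, sender, message):
        self._send_to_all_nodes(sender, message, None)


node1 = Node("node-1", {"PE-0", "AMP-0", "AMP-1"})
node2 = Node("node-2", {"AMP-2", "AMP-3"})
net = Interconnect([node1, node2])

net.point_to_point("PE-0", "AMP-0", "step-1")        # stays inside node-1
net.point_to_point("PE-0", "AMP-2", "step-2")        # crosses to node-2
net.multicast("PE-0", {"AMP-1", "AMP-3"}, "step-3")  # filtered per node
net.broadcast("PE-0", "abort")                       # every vproc gets it
```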
Each system has two BYNETs for the following reasons:
Performance
Fault Tolerance
Clique
A clique is a group of nodes that share access to the same disk arrays.  The nodes have a daisy-chain connection to each disk array controller.
Cliques provide data accessibility if a node fails for any reason (e.g., a UNIX reset).
Vprocs are distributed across all nodes in the system.  Each multi-node system has at least one clique.
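One way to picture why a clique keeps data accessible: because every node in the clique can reach the same disk arrays, the vprocs of a failed node can be restarted on a surviving node. The Python sketch below assumes exactly that. The function name, the node and vproc labels, and the "least-loaded node" placement policy are illustrative assumptions, not something this post specifies.

```python
def fail_node(clique, failed):
    """Move the vprocs of a failed node onto survivors in the same clique.

    `clique` maps node name -> list of vprocs. Every node in a clique can
    reach the same disk arrays, so any survivor can host a migrated vproc.
    """
    if len(clique) < 2:
        raise RuntimeError("no surviving node in the clique")
    orphans = clique.pop(failed)
    for vproc in orphans:
        # Assumed policy: place the vproc on the least-loaded survivor.
        target = min(clique, key=lambda node: len(clique[node]))
        clique[target].append(vproc)
    return clique


clique = {
    "node-1": ["AMP-0", "AMP-1"],
    "node-2": ["AMP-2", "AMP-3"],
    "node-3": ["AMP-4", "AMP-5"],
}
fail_node(clique, "node-2")
print(clique)
```

After the call, AMP-2 and AMP-3 still exist in the system; they simply run on node-1 and node-3, which can reach the same vdisks.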
Software Components
UNIX operating system - The Teradata RDBMS runs on UNIX SVR4 with MP-RAS.
Parallel Database Extensions (PDE) - PDE was added to the UNIX kernel by NCR to support the parallel software environment.
Trusted Parallel Application (TPA) - A TPA uses PDE to implement virtual processors.  The Teradata RDBMS is classified as a TPA.
Channel Driver - The Channel Driver software is the means of communication between the application and the PEs assigned to channel-attached clients.
Teradata Gateway - The Gateway software is the means of communication between the application and the PEs assigned to network-attached clients.  There is one Gateway per node.
AMP - The AMP is a type of vproc that has software to manage data.
AMP Worker Tasks (AWT) - Functions in the AMP that perform a number of operations, including:
Locking Tables
Executing Tables
Joining Tables
Executing end transaction steps
The file system software accesses the data on the virtual disks.  Each AMP uses the file system software to read from and write to the virtual disks.
Console Utilities - The AMP software includes utilities to perform generally sophisticated, low-level functions such as:
Configure and reconfigure the system
Rebuild tables
Reveal details about locks and space status
PE - A PE is a type of vproc that has software components to break SQL into steps and send the steps to the AMPs.
Session Control - When you log on to the Teradata RDBMS through your application, the session control software on the PE establishes that session.  Session control also manages and terminates sessions on the PE.
Parser/Optimizer - The parser interprets your Teradata SQL request and checks the syntax.  The parser decomposes the request into AMP steps, using the optimizer to determine the most efficient way to access the data on the virtual disks.  Then the parser sends the steps to the dispatcher.
Dispatcher - The dispatcher is responsible for a number of tasks, depending on the operation it is performing:
Processing Requests
Processing Responses
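The flow through the PE components (session control, parser/optimizer, dispatcher) can be sketched as a small Python pipeline. Everything here is a simplification for illustration: the function names, the toy SELECT-only parser, the three step names, and the "done" responses are assumptions, and no real optimization takes place.

```python
def session_control(sessions, user):
    """Establish a session for a user and return its id."""
    session_id = len(sessions) + 1
    sessions[session_id] = user
    return session_id


def parse(sql):
    """Toy syntax check that only understands SELECT ... FROM <table>."""
    tokens = sql.strip().rstrip(";").split()
    upper = [t.upper() for t in tokens]
    if not upper or upper[0] != "SELECT" or "FROM" not in upper:
        raise SyntaxError(f"cannot parse: {sql!r}")
    from_at = upper.index("FROM")
    if from_at + 1 >= len(tokens):
        raise SyntaxError("missing table name after FROM")
    return {"table": tokens[from_at + 1]}


def generate_steps(parsed):
    """Decompose the request into AMP steps (fixed plan, no optimizer)."""
    table = parsed["table"]
    return [("lock", table), ("retrieve", table), ("end_transaction", table)]


def dispatch(steps, amps):
    """Process the request: send each step to each AMP, collect responses."""
    responses = []
    for step_name, table in steps:
        for amp in amps:
            responses.append((amp, step_name, "done"))
    return responses


sessions = {}
sid = session_control(sessions, "alice")              # hypothetical user
steps = generate_steps(parse("SELECT * FROM orders;"))
responses = dispatch(steps, amps=["AMP-0", "AMP-1"])
```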
