File Organisation Techniques in DBMS Tutorial Study Notes with Examples
The database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields. Data is usually stored in the form of records. Records usually describe entities and their attributes. e,g. employee record represents an employee entity and each field value in the record specifies some attributes of that employee, such as Name Birth-date, Salary or Supervisor.
If every record in the file has exactly the same size (in bytes), the file is said to be made up of fixed length records. If different records in the file have different sizes, the file is said to be made up of variable length records,
Spanned versus Unspanned Records
- The records of a file must be allocated to disk blocks because a block is the unit of data transfer between disk and memory. When the block size is larger than the record size, each block will contain numerous records, although some of the files may have unusually large records that cannot fit in one block.
- Suppose that block size is B For a file of fixed length records of size R bytes, with B ≥ R , then we can fit bfr =[ B/R] records per block. The value of bfr is called
the blocking factor.
- In general, R may not divide B exactly, so we have some unused space in each block equal to B — (bfr * R)
- To utilize this unused space, we can store part of a record on one block and the rest on another. A pointer at the end of the first block points to the block containing the remainder of the record. This organisation is called spanned.
- If records are not allowed to cross block boundaries, the organisation is called
Allocating File Blocks on Disk
There are several standard techniques for allocating the blocks of a file on disk Contiguous Allocation The file blocks are allocated to consecutive disk blocks. This makes reading the whole file very fast.
Linked Allocation: In this, each file contains a pointer to the next file block
Indexed Allocation: Where one or more index blocks contain pointers to the actual file blocks.
A file header or file descriptor contains information about a file that is needed by the system programs that access the file records.
Operations on Files
Operations on files are given below.
Retrieval Operations: These do not change any data in the files but only local certain recoids so that their field value can be examined and processed.
Update operations :These change the file, by insertion or deletion of records or by modification of field values.
open: prepares the file for reading or writing. Set the file pointer to the beginning of the file.
Reset: Sets the file pointer of an open file to the beginning of the file,
Find (or Locate): Searches for the first record that satisfies a search condition. Transfers the block containing that record into a main memory buffer (if it is not already there).
Read (or less Get): Copies the current record from the buffer to a program variable in the user program.
FindNext: Searches for the next record in the file that satisfies the search condition.
Delete: Deletes the current record and updates the file on disk to reflect the deletion.
Modify: Modifies some field values for the current record and updates the file on disk to reflect the modification.
Insert: Insert a new record in the file by locating the block, where the record is to be inserted, transferring that block into the main memory buffer, writing the record into the buffer and writing the buffer to disk to reflect the insertion.
Close: Completes the file access by releasing the buffers and performing any other needed cleanup operations.
Files of Unordered Records (Heap Files)
In the. simplest type of organization records are placed in the file in the Order n which they are inserted, so new records are inserted at the end of the file, Such an or is called a heap or pile file.
This or organisation is often used with additional access paths, such as the secondary indexes.
In this-type of organisation, insert a new record is very efficient .Linear search is used to search a record.
Files of Ordered Records (Sorted Files)
- We can physically order the records of a file on disk based on the values of one of their fields called the ordering field. This leads to an ordered sequential file.
- ordering field is also a key field of the file, a field guaranteed to have a unique value in each record, then the field is called the ordering key for the file. Binary searching is used to search a record.
Another type of primary file organisation is based on hashing which provides very fast access to records under certain search conditions. This organisation is usually called a hash file. A hash file has also been called a direct file.
- The search condition must be an equality condition on a single field, called the hash field.
- Hash function is used to determine page address for storing records. This is chosen to provide most even distribution of records and tends to minimum
There are various techniques for hashing
For internal files, hashing is typically implemented as a hash table through the use of an array of records and the array index range is from 0 to M – then we have M slots in an array.
One common hash function is
h(K) = K mod M
Which returns the remainder of an integer hash field value K after division by M.
Hash function does not calculate unique address for two or more records Here, a collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. In this situation, we must insert the new record in some other position, since its hash address is occupied.The process of finding another position is called collision resolutions.
There are number of methods for collision resolution.
In this method, all the elements, where keys hash to the same hash table slot are put in linked list manner.
Collision Resolution by Open Addressing
in open addressing, the keys to be hashed is to put in the separate location of the hash table. Each location contains some key or the some other character to indicate that the particular location is free.
In this method to insert key into the table, we simply hash the key using the hash function, If the space is available, then insert the key into the hash table location, otherwise search the location in the forward direction of the table to find the slot in a systematic manner. The process of finding the slot in the hash table is called probing.
Linear probing: The linear probing use the following hash function
h(k, i) =[h’(k) + i] mod m for i = 0, 1, 2, … m-1 ,
where m is the size of the hash table, h’(k) = k mod m , the basic function, and i=The probe number.
Quadratic probing The quadratic probing uses the following hash function
h (k,i) = [h’(k) c + c2i2] mod m for i = 0,1, … m – 1
where, m = size of hash table
hi(k) = k mod in, the basic hash function
c1 and c2 # 0 are auxiliary and i = the probe number.
Double hashing In double hashing, second.hashing function H’ is use resolving a collision. Suppose a record R with key K has the hash address
H(K) = h and H’ (k)= h’ ≠ m
The we linearly search the location with addresses
h, h + h’ , h + 2h’ , h + 3h’…
If m is prime number, then the above sequence will access all the locations in the table T.
in this cyberpoint solution tutorial we are trying to solve and explain each and every concept of File Organisation Techniques in DBMS Tutorial Study Notes with Examples
External Hashing for Disk Files
Hashing for disk files is called external hashing. To suit the characteristics of disk storage, the target address space is made of buckets, each of Which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks. The hashing function maps a key into a relative bucket number, rather than assigning an absolute block address to the bucket. A table maintained in the file header converts– the bucket number into the corresponding disk block address. The collisions problem is less severe. with buckets.
Indexing Structures for Files
Indexing mechanism are used to optimize certain accesses to data (records) managed in files. e.g., the author catalog in a library is a type of index. Search key (definition) attribute or combination of attributes used to look-up records in a file.
An index file consists of records (called index entries) of the form.
Index files are typically much smaller than the original file because only the values for search key and pointer are stored. The most prevalent types of indexes are based on ordered files (single-level indexes) and tree data structures (multilevel indexes),
Types of Single Level Ordered Indexes
In an ordered index file, index eateries are stored sorted by the search key value. There are several types of ordered Indexes
Primary index is an ordered file whose records are of fixed length with two fields. The first field is of the same data type as the ordering key field called the primary key of the data file and the second field is a pointer to a disk block (a block address).
- There is one index entry in the index file for each block in the data file.
- indexes can also be characterized as dense or sparse.
- Dense index A dense index has an index entry for every search key value in the data file.
- Sparse index A sparse index (non-dense), on the other hand has index entries for only some of the search values.
- A primary index is a non-dense (sparse) index, since it includes an entry for each disk block of the data file rather than for every search value.
If fife records are physically ordered on a non-key field which does not have a distinct value for each record that field is called the clustering field. We can create a different type of index, called a clustering index, to speed up retrieval of records that have the same value for the clustering field.
- A clustering index is also an ordered file with two fields. The first field is of the same type as the clustering field of the data file.
- The record field in the clustering index is a block pointer.
- A clustering index is another example of a non-dense index.
A secondary index provides a secondary means of accessing a file for which some primary access already exists. The secondary index may be on a field which is a candidate key and has a unique value in every record or a non-key with duplicate values. The index is an ordered file with two fields. The first field is of the same data type as some non-ordering field of the data file that is an indexing field. The second field is either a block pointer or a record pointer.
A secondary time index usually needs more storage space and longer search an does a primary index.
The idea behind a multilevel index is to reduce the part of the index. A multilevel index considers the index file, which will be referred now as the first (or base) level of a multilevel index. Therefore, we can create a primary index for the first level this index to the first level is called the second level of the multilevel index and so on.
Dynamic Multilevel Indexes Using B–Trees and B”-Trees
There are two multilevel indexes.
- When data volume is large and does not fit in memory, an extension of the binary search tree to disk based environment is the B-tree.
- In fact, since the B-tree is always balanced (all leaf nodes appear at the same level); it is an extension of the balanced binary search tree.
- The problem which the B-tree aims to solve is given a large collection or objects, each having a key and a value, design a disk based index structure which efficiently supports query and update.
- A B-tree of order p, when used as an access structure on a key field to search for records in a data file, can be defined as follows.
- (i) Each internal node in the B-tree is of the form
<p1 < Ki, Pri >, P2, < K2, Pr2 >, < Kg _ 1, /3;1 >P
where, q≤ p.
Each Pi is a tree pointer to another node in the B-tree.
Each Pfj is a data pointer to the record whose search key field is equal to Ki.
- (ii) Within each node, K1 <K2 <….< K1-1
- (iii) Each node has at most p tree pointers.
- (iv) Each node, except the root and leaf nodes, has atleast [(p/2)] tree pointer.
- (v) A node within q tree pointers q < p, has q – 1 search key field value pointers,
(and hence has q – 1 data pointers).
e,g,, A B–tree of order p= 3. The values were inserted in the order 8,5,1, 7, 3, 12,9,6.
B + Trees
- It is the variation of the B-tree data structure.
- In a B-tree, every value of the search field appears once at some level in the tree, along with a data pointer. In a B+-tree, data pointers are stored only data the leaf nodes of the tree. Hence, the structure of the leaf nodes differs from the structure of internal nodes.
- The pointers in the internal nodes are tree pointers to blocks that are tree nodes whereas the pointers in leaf nodes are data pointers.
B+– Tree’s Structure
The structure of the B ± -tree of order p is as follows
- Each internal node is of the form < P1 , K1 , P2,, K2, …, Pq -1/ Kq -1 P>
- Where, q≤ p and each P1 is a tree pointer.
- Within each internal node, K1 < K2 < K3 • •• < Kq -1 tree _
- Each internal node has at most p tree pointers and except root, has at-least [(p/2)] pointers. The root node has at-least two tree pointers, if it is an internal node.
- Each leaf node is of the form.
« KI ,Pri > , < K2/ Pr2 >1 …, < Kfq–1 >, Pnext>
Where, q ≤p, each prj is a data pointer and pnext points to the next leaf node of the B+-trees.
Sorting in Design and Analysis of Algorithm Study Notes with Example
Follow Us On Cyber Point Solution Youtube Channel : Click Here
Learn More Ethical Hacking and Cyber Security click on this link. cyber security