Handbook of Computer Science (cs) and IT

operating

File Organisation

The database is stored as a collection of files. Each file is a sequence of

records. A record is a sequence of fields. Data is usually stored in the form

of records. Records usually describe entities and their attributes. e,g_. employee record represents an employee entity and each field value in the record specifies some attributes of that employee, such as Name

Birth-date, Salary or Supervisor.

If every record in the file has exactly the same size (in bytes), the file is said to be made up of fixed length records. If different records in the file have different sizes, the file is said to be made up of variable length records,

Spanned versus Unspanned Records

The records of a file must be allocated to disk blocks because a block is the unit of data transfer between disk and memory. When the block size is larger than the record size, each block will contain numerous records, although some of the files may have unusually large records that cannot fit in one block.
Suppose that block size is B For a file of fixed length records of size R bytes, with B ≥ R , then we can fit bfr =[ B/R] records per block. The value of bfr is called
the blocking factor.
In general, R may not divide B exactly, so we have some unused space in each block equal to B — (bfr * R)
To utilize this unused space, we can store part of a record on one block and the rest on another. A pointer at the end of the first block points to the block containing the remainder of the record. This organisation is called spanned.
If records are not allowed to cross block boundaries, the organisation is called

Allocating File Blocks on Disk

There are several standard techniques for allocating the blocks of a file on disk Contiguous Allocation The file blocks are allocated to consecutive disk blocks. This makes reading the whole file very fast.

Linked Allocation In this, each file contains a pointer to the next file block

Indexed Allocation Where one or more index blocks contain pointers to the actual file blocks.

File Headers

A file header or file descriptor contains information about a file that is needed by the system programs that access the file records.

Operations on Files

Operations on files are given below.

Retrieval Operations These do not change any data in the files but only local certain recoids so that their field value can be examined and processed.

Update operations These change the file, by insertion or deletion of records or by modification of field values.

open prepares the file for reading or writing. Set the file pointer to the beginning of the file.

Reset Sets the file pointer of an open file to the beginning of the file,

Find (or Locate) Searches for the first record that satisfies a search condition. Transfers the block containing that record into a main memory buffer (if it is not already there).

Read (or less Get) Copies the current record from the buffer to a program variable in the user program.

FindNext Searches for the next record in the file that satisfies the search condition.

Delete Deletes the current record and updates the file on disk to reflect the deletion.

Modify Modifies some field values for the current record and updates the file on disk to reflect the modification.

Insert Insert a new record in the file by locating the block, where the record is to be inserted, transferring that block into the main memory buffer, writing the record into the buffer and writing the buffer to disk to reflect the insertion.

Close Completes the file access by releasing the buffers and performing any other needed cleanup operations.

Files of Unordered Records (Heap Files)

In the. simplest type of organization records are placed in the file in the Order n which they are inserted, so new records are inserted at the end of ^the file, Such an or is called a heap or pile file.

This or oganisation is often used with additional access paths, such as the secondary indexes.

In this-type of orgransation, insert a new record is very efficient .Linear search is used to search a record.

Files of Ordered Records (Sorted Files)

We can physically order the records of a file on disk based on the val_ues of one of their fields called the ordering field. This leads to an ordered sequential file.
ordering field is also a key field of the file, a field guaranteed to have a unique value in each record, then the field is called the ordering key for the file. Binary searching is used to search a record.

Hashing Techniques

Another type of primary file organisation is based on hashing which
provides very fast access to records under certain search conditions. This
organisation is usually called a hash file. A hash file has also been called a

direct file.

Key Points

The search condition must be an equality condition on a single field, called the hash field.

Hash function is used to determine page address for storing records. This is chosen to provide most even distribution of records and tends to minimum

There are various techniques for hashing

Internal Hashing

For internal files, hashing is typically implemented as a hash table through the use of an array of records and the array index range is from 0 to M – then we have M slots in an array.

One common hash function is

h(K) = K mod M

Which returns the remainder of an integer hash field value K after division by M.

Collision

Hash function does not calculate unique address for two or more records Here, a collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. In this situation, we must insert the new record in some other position, since its hash address is occupied.The process of finding another position is called collision resolutions.

There are number of methods for collision resolution.

..;

Chaining

In this method, all the elements, where keys hash to the same hash table

slot are put in linked list manner.

Collision Resolution by Open Addressing

in open addressing, the keys to be hashed is to put in the separate location of the hash table. Each location contains some key or the some other character to indicate that the particular location is free.

In this method to insert key into the table, we simply hash the key using the hash function, If the space is available, then insert the key into the hash table location, otherwise search the location in the forward direction of the table to find the slot in a systematic manner. The process of finding the slot

in the hash table is called probing.

Linear probing The linear probing use the following hash function h(k, i) =[h’(k) + i] mod rn for i = 0, 1, 2, … m-1 ,

where m is the size of the hash table, h’(k) = k mod m , the basic function, and The probe number.

Quadratic probing The quadratic probing uses the following hash function

h (k,i) = [h’(k) c + c₂i²] mod m for i = 0,1, … m – 1

where, m = size of hash table

hi(k) = k mod in, the basic hash function

c₁ and c2 # 0 are auxiliary and i = the probe number.

Double hashing In doble hashing, second_.hashing function H’ is use resolving a collision. Suppose a record R with key K has the hash address

H(K) = h and H’ (k)= h’ ≠ m

The we linearly search the location with addresses

h, h + h’ , h + 2h’ , h + 3h’…

If m is prime number, then the above sequence will access all the locations in the table T.

External Hashing for Disk Files

Hashing for disk files is called external hasing. To suit the characteristics of disk storage, the target address space is made of buckets, each of Which holds multiple records. A bucket is either one disk block or a cluster of contiguous blocks. The hashing function maps a key into a relative bucket number, rather than assigning an absolute block address to the bucket. A table maintained in the file header converts^– the bucket number into the corresponding disk block address. The collisions problem is less severe. with buckets.

Indexing Structures for Files

Indexing mechanism are used to optimize certain accesses to data
(records) managed in files. e.g., the author catalog in a library is a type of

index. Search key (definition) attribute or combination of attributes used to look-up records in a file.

An index file consists of records (called index entries) of the form.

Index files are typically much smaller than the original file because only the
values for search key and pointer are stored. The most prevalent types of

indexes are based on ordered files (single-level indexes) and tree data structures (multilevel indexes),

Types of Single Level Ordered Indexes

In an ordered index file, index enteries are stored sorted by the search key value. There are several types of ordered Indexes

Primary Index

Primary index is an ordered file whose records are of fixed length with two fields. The first field is of the same data type as the ordering key field called the primary key of the data file and the second field is a pointer to a disk block (a block address).

Key points

There is one index entry in the index file for each block in the data file.
indexes can also be characterised as dense or sparse.
Dense index A dense index has an index entry for every search key value in

the data file.

Sparse index A sparse index (non-dense), on the other hand has index entries for only some of the search values.
A primary index is a non-dense (sparse) index, since it includes an entry for each disk block of the data file rather than for every search value.

Clustering Index

If fife records are physically ordered on a non-key field which does not have

a distinct value for each record that field is called the clustering field. We can create a different type of index, called a clustering index, to speed up retrieval of records that have the same value for the clustering field.

Key Points

A clustering index is also an ordered file with two fields. The first field is of the same type as the clustering field of the data file.
The record field in the clustering index is a block pointer.
A clustering index is another example of a non-dense index.

Secondary Index

A secondary index provides a secondary means of accessing a file for
which some primary access already exists. The secondary index may be
on a field which is a candidate key and has a unique value in every record
or a non-key with duplicate values. The index is an ordered file with two fields. The first field is of the same data type as some non-ordering field of

the data file that is an indexing field. The second field is either a block pointer or a record pointer.

A secondary time index usually needs more storage space and longer search an does a primary index.

Multilevel Indexes

The idea behind a multilevel index is to reduce the part of the index. A multilevel index considers the index file, which will be referred now as the first (or base) level of a multilevel index. Therefore, we can create a primary index for the first level this index to the first level is called the second level of the multilevel index and so on.

Dynamic Multilevel Indexes Using B–Trees and B”-Trees

There are two multilevel indexes.

B-Trees

When data volume is large and does not fit in memory, an extension of the binary search tree to disk based environment is the B-tree.
In fact, since the B-tree is always balanced (all leaf nodes appear at the same level)_; it is an extension of the balanced binary search tree.
The problem which the B-tree aims to solve is given a large collection or objects, each having a key and a value, design a disk based index structure which efficiently supports query and update.
A B-tree of order p, when used as an access structure on a key field to search for records in a data file, can be defined as follows
Each internal node in the B-tree is of the form

<p₁ < K_i, P_ri >, P2, < K2, P_r2 >, < K_g _ 1, /³_;1 >P

where, q≤ p.

Each P_i is a tree pointer to another node in the B-tree.

Each P_fj is a data pointer to the record whose search key field is equal to K,.

Within each node, K₁ <K2 <….< K_1-₁
Each node has at most p tree pointers.
Each node, except the root and leaf nodes, has atleast [(p/2)] tree pointer.
A node within q tree pointers q < p, has q – 1 search key field value pointers,

(and hence has q – 1 data pointers).

e,g,, A B–tree of order p= 3. The values were inserted in the or-

1_, 7, 3, 12,9,6

B ⁺ Trees

It is the variation of the B-tree data structure.
In a B-tree, every value of the search field appears once at some level in

the tree, along with a data pointer. In a B⁺-tree, data pointers are stored
only data the leaf nodes of the tree. Hence, the structure of the leaf nodes differs from the structure of internal nodes.

The pointers in the internal nodes are tree pointers to blocks that are tree

nodes whereas the pointers in leaf nodes are data pointers.

B⁺– Tree’s Structure

The structure of the B ± -tree of order p is as follows

Each internal node is of the form < P₁ , K₁ , P₂^,, K₂, …, P_q -1/ Kq -1 P>
- Where, q≤ p and each P₁ is a tree pointer.
- Within each internal node, K₁ < K2 < K3 • •• < K_{q -1} tree _

Each internal node has at most p tree pointers and except root, has atleast [(p/2)] pointers. The root node has atleast two tree pointers, if it is an internal node.

Each leaf node is of the form.

« K_I ,P_ri > , < K_2/ Pr2 >₁ …, < K_fq–₁>, P_next>

Where, q ≤p, each p_rj is a data pointer and p_next points to the next leaf node of the B⁺-trees.

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Handbook of Computer Science(cs) and IT

Leave a Reply Cancel reply