Monday, 22 April 2013

EVERYTHING YOU NEED TO KNOW ABOUT DATABASES.

DATA

        -The term data is derived from the Latin word "datum". By data we mean raw facts that can be recorded and that have some implicit meaning.

DATABASE

        -A database is a collection of logically related data with some inherent meaning. Data means raw facts that can be recorded and that have some implicit meaning. For example, consider the names, phone numbers and addresses of some people. We may record this data in an indexed address book, or we may store it on a diskette, using a personal computer and some application software such as DBASE, MS-ACCESS or EXCEL. This is a collection of related data with some implicit meaning and hence it is a database.

IMPLICIT PROPERTIES OF A DATABASE

        -1) A database represents some aspect of the real world, sometimes called the mini-world. Changes to the mini-world are reflected in the database.
          2) A database is a logically coherent collection of data with some inherent meaning. A random assortment of data cannot correctly be referred to as a database.
          3) A database is designed, built and populated with data for a specific purpose.

DBMS(DATABASE MANAGEMENT SYSTEM)

        -A database management system is a collection of programs that enables users to create and maintain a database. A DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, manipulating and sharing databases for various applications.
        Defining a database involves specifying the data types, structures and constraints for the data to be stored in the database.
        Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS.
        Manipulating a database includes functions such as querying the database to retrieve specific data, updating the database to reflect changes in the mini-world, and generating reports from the data.
        Sharing the database allows multiple users and programs to access the database concurrently. The four activities are sketched below.
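These four activities can be seen in miniature with Python's built-in sqlite3 module; the person table and its columns below are illustrative, not taken from any particular system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway database under DBMS control

# Defining: specify the data types, structures and constraints
conn.execute("CREATE TABLE person (name TEXT, phone TEXT, address TEXT)")

# Constructing: store the data on the storage medium controlled by the DBMS
conn.execute("INSERT INTO person VALUES ('Asha', '98300-00000', 'Kolkata')")

# Manipulating: query the database to retrieve specific data
for row in conn.execute("SELECT name, phone FROM person WHERE address = 'Kolkata'"):
    print(row)

# Sharing: a real multi-user DBMS would let other programs connect concurrently
conn.commit()
conn.close()
```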

ROLE OF A DATABASE MANAGER

        -The database manager is also called the database administrator (DBA). Briefly, the functions of the DBA are:-
    a) To identify the entities of interest to the organisation and to determine the attributes of those entities to be recorded.
    b) To outline the structure of the database to be formed to store the data.
    c) To decide upon the policy for retrieval of data.
    d) To resolve conflicts among users by reallocating the available resources to achieve maximum organisational benefit.
    e) To provide authentication for the users.
    f) To maintain the security of the database.
    g) To report from time to time to upper-level administrators about any needs regarding the database.
    h) To define the strategy for backup and recovery.

ADVANTAGES OF DBMS:-

1) To create, delete or update a database.
2) To retrieve records according to our requirements.
3) To create various reports from the database for higher authorities.
4) To index the data for better performance.
5) There are various built-in functions for our convenience.
6) It helps us in reducing redundancy.
7) The data will be consistent.
8) Security can be applied.

DISADVANTAGES OF DBMS:-

1) Danger of Overkill: For small and simple applications for single users, a database system is often not advisable.

2)Complexity: A database system creates additional complexity and requirements. The supply and operation of a database management system with several users and databases is quite costly and demanding.

3)Qualified Personnel: The professional operation of a database system requires appropriately trained staff. Without a qualified database administrator nothing will work for long.

4) Costs: The use of a database system generates new costs for the system itself, but also for additional hardware and for the more complex handling of the system.

5)Lower Efficiency: A database system is a multi-use software which is often less efficient than specialised software which is produced and optimised exactly for one problem.

1) What is a catalog?
        -A database system contains not only the database itself but also a complete definition or description of the database structure and constraints. This definition is stored in the system catalog, which contains information such as the structure of each file and the type and storage format of each data item. The information stored in the catalog describes the structure of the primary database. (Data about data is called metadata.)
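As a concrete illustration, SQLite exposes its catalog as a table named sqlite_master; a minimal sketch (the person table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT)")

# The catalog holds metadata: the name and stored definition of each object
for name, sql in conn.execute("SELECT name, sql FROM sqlite_master"):
    print(name, "->", sql)
```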

THREE SCHEMA ARCHITECTURE

Here we specify an architecture for database systems called the three-schema architecture. The goal of the three-schema architecture is to separate the user applications from the physical database.

 

1. The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.

 

2. The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships and constraints.

 

3. The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group.

 

The three-schema architecture is a convenient tool with which the user can visualise the schema levels in a database system.


"DATA MODEL"

 

        -A data model is a collection of concepts that can be used to describe and manipulate the structure of a database. By "structure of a database" we mean the data types, relationships and constraints that should hold for the data. A data model can be thought of as comprising three components:-

 

1. A structural part, consisting of a set of rules according to which databases can be constructed.

2. A manipulative part, that defines the types of operations that are allowed on the data.

3. A set of integrity rules, which ensures that the data is accurate. In other words, a data model is a set of techniques that describes how to create, insert, modify and delete the data in a database.


TYPES OF DATA MODEL

 

A) HIERARCHICAL DATA MODEL:-It is the oldest data model and is based on the concept of a tree data structure. The hierarchical data model represents data in the form of a tree: it describes the data in the form of parent-child relationships, and the tree structure consists of a hierarchy of parent and child data segments. A small illustration follows.
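A minimal sketch in Python of the parent-child idea, using nested dictionaries; the department and employee segments are illustrative:

```python
# Each parent segment holds its child segments, so every child has exactly
# one parent: the data forms a tree.
company = {
    "Sales":    {"employees": ["Asha", "Ravi"]},   # parent -> children
    "Accounts": {"employees": ["Meena"]},
}

# Access proceeds from the root down the tree (cf. GET / GET NEXT)
for dept, segment in company.items():
    for emp in segment["employees"]:
        print(dept, "->", emp)
```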

 

ADVANTAGES:-

 

        1.It is simple to construct and operate on data in this model.

        2.Easy for anyone to understand.

        3.It fits organisational data well.

        4.It has a simple language whose constructs are very simple, like GET, GET NEXT, etc.

 

DISADVANTAGES:-

 

        1.If we delete a parent node then all the children associated with it are automatically deleted.

        2.It uses a programming approach for each operation, which is a tedious and time-consuming process.

        3.It requires in-depth knowledge of data structures.

        4.It uses difficult mechanisms such as pre-order, in-order or post-order tree-traversal algorithms to access or retrieve data.

 

B) NETWORK DATA MODEL:-The network data model was proposed by the Database Task Group, known as DBTG/CODASYL. The network data model overcomes the major disadvantage of the hierarchical model, in which a child segment can have only one parent, so many-to-many relationships cannot be described. The network model uses two different data structures, record types and set types, to represent data and the relationships between them.

 

ADVANTAGES:-

 

        1.It can represent (1:1), (1:N), (M:N) and (M:1) relationships.

        2.It can represent complex relationships between various record types.

        3.It supports easy navigation through constructs like FIND, FIND OWNER, FIND MEMBER, etc.

       

DISADVANTAGES:-

 

        1.It is much more complex and difficult to design and maintain.

        2.It requires the use of pointers for every access to data.

        3.It offers very little scope for speeding up processing.

 

C) RELATIONAL MODEL:-This model represents data in the form of relations, where a relation is a collection of inter-related records. The relation is the basic concept of the relational model and has the following properties:-

       1.Each column should be homogeneous, i.e., a column should contain values of the same type.

       2.All the rows of a relation must be unique, i.e., each row should be different from every other.

       3.The ordering of rows and columns is immaterial.

 

ADVANTAGES:-

 

        1.Data is represented in a very simple form of a relation.

        2.Data can be easily accessed.

        3.It reduces data redundancy by using the concept of normalisation.

        4.Due to the integration of data, data is consistent and accurate in this model.

        5.Security can be easily added in this model.

 

DISADVANTAGES:-

 

        1.The cost of storing data in relational model is much more than that of earlier models.

        2.It consumes more memory as compared to traditional approaches.

        3.Backup and recovery is complex due to the centralised approach.

 

DATABASE SCHEMA

 

        -In any data model, it is important to distinguish between the description of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently.

DATABASE STATE

 

        -The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database.

 

DATA INDEPENDENCE

 

        -Data independence can be defined as the capacity to change the schema at one level of the database system without having to change the schema at the next higher level.

 

a) Logical data independence

        -It is the capacity to change the conceptual schema without having to change the external schemas or application programs. We may change the conceptual schema to expand the database (by adding a data item) or to reduce the database (by removing data items). In this case, external schemas that refer only to the remaining data should not be affected.

 

b)Physical data independence

        -It is the capacity to change the internal schema without having to change the conceptual schema. Hence the external schemas need not be changed either.
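Logical data independence (case a) can be made concrete with a SQL view standing in for an external schema; a minimal sketch in SQLite, with illustrative table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

# External schema: a view exposing only what one user group needs
conn.execute("CREATE VIEW emp_names AS SELECT name FROM employee")

# Expand the conceptual schema by adding a data item...
conn.execute("ALTER TABLE employee ADD COLUMN salary REAL")

# ...the external schema is unaffected and still works unchanged
conn.execute("INSERT INTO employee (name, salary) VALUES ('Asha', 50000)")
print(conn.execute("SELECT * FROM emp_names").fetchall())  # [('Asha',)]
```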

 

 

ENTITY-RELATIONAL DIAGRAM/MODEL(ER)

 

The Entity-Relationship model is a database modelling approach used to model a system and its requirements in a top-down fashion. This approach is commonly used in RDBMS design. The diagrams created using this model are called ER diagrams.

        An Entity-Relationship Model (ERM) is an abstract conceptual representation of data; entity-relationship modelling arranges this structured data into models. ER diagrams were originally proposed in 1976 by Dr. Pin-Shan (Peter) Chen, and many variants have appeared since then.


3. What are entities and attributes?

        -Entity: The basic object that the ER model represents is an entity, which is a thing in the real world with an independent existence. An entity may be an object with a physical existence (e.g., a person, car, house or employee) or it may be an object with a conceptual existence (e.g., a company, a university course, etc.).

        -Attribute: Attributes are the particular properties that describe an entity. For example, an employee entity may be described by the attributes employee name, age, address and salary.

a) Composite Vs Simple attribute

            -A composite attribute can be divided into smaller subparts which represent more basic attributes with independent meanings. E.g., the Address attribute of a student can be further divided into street address, city, state and PIN code.

            Attributes that are indivisible are called simple or atomic attributes. E.g., PIN-712104.

 

b) Single-valued Vs Multi-valued attribute

            -Most attributes have a single value for a particular entity. Such attributes are called single-valued attributes; for example, age is a single-valued attribute of a student entity. In some cases an attribute can have a set of values for the same entity; such attributes are called multi-valued. For example, the colour attribute of a car or the course attribute of a college.

 

c) Stored Vs Derived Attribute

                        -In some cases two or more attribute values are related, e.g., the age and birth date of a person. For a particular person, the value of age can be determined from that person's birth date. The age attribute is therefore called a derived attribute and is said to be derivable from the birth date attribute, which is called a stored attribute.
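For instance, age can be computed at query time from the stored birth date rather than stored itself; a rough sketch in SQLite (the year subtraction ignores whether the birthday has passed this year):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, birth_date TEXT)")  # stored attribute
conn.execute("INSERT INTO person VALUES ('Asha', '1990-06-15')")

# age is derived, not stored: it is computed from birth_date when queried
query = """
SELECT name,
       CAST(strftime('%Y', 'now') AS INTEGER)
     - CAST(strftime('%Y', birth_date) AS INTEGER) AS age
FROM person
"""
print(conn.execute(query).fetchall())
```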

 

ENTITY TYPE

 

        -An entity type defines a collection of entities that have the same attributes. Each entity type in the database is described by its name and attributes.

 

ENTITY SET

 

        -The collection of all entities of a particular entity type in the database at any point in time is called an entity set. The entity set is usually referred to by using the same name as the entity type.

 

KEY ATTRIBUTES

 

        -An entity type usually has an attribute whose values are distinct for each individual entity in the collection. Such an attribute is called a key attribute, and its value can be used to identify each entity uniquely.

 

WEAK ENTITY

 

        -Entity types that do not have any key attribute of their own are called weak entity types. In contrast, regular entity types that do have a key attribute are sometimes called strong entity types. Entities belonging to a weak entity type are identified by specific entities from another entity type in combination with some of their attribute values. We call these other entity types the identifying or owner entity types.

 

CARDINALITY RATIO

 

        -The cardinality ratio for a binary relationship specifies the maximum number of relationship instances that an entity can participate in.

For example, in the WORKS FOR relationship, department to employee has the cardinality ratio 1:N, meaning that a particular employee works for only one department while a department can have many employees. The possible cardinality ratios for binary relationship types are 1:1, 1:N, N:1 and M:N.

 

PARTICIPATION CONSTRAINTS

 

        -The participation constraint specifies whether the existence of an entity depends on its being related to another entity via the relationship type. In other words, participation constraints represent how two different entities participate with each other. There are two types of participation constraints:-

 a) Total

 b) Partial

     E.g., in the WORKS FOR relationship all the employee entities participate, so the participation of employee is total; but not all the department entities participate (e.g. D3), so the participation of department is partial.

 

DATABASE DESIGNER

 

        -The database designer is responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. It is the responsibility of the database designer to communicate with all database users in order to understand their requirements and to come up with a design that meets these requirements.

 

 

Data dictionary

A data dictionary is a collection of descriptions of the data objects or items in a data model for the benefit of programmers and others who need to refer to them. A first step in analyzing a system of objects with which users interact is to identify each object and its relationship to other objects. This process is called data modeling and results in a picture of object relationships. After each data object or item is given a descriptive name, its relationship is described (or it becomes part of some structure that implicitly describes relationship), the type of data (such as text or image or binary value) is described, possible predefined values are listed, and a brief textual description is provided. This collection can be organized for reference into a book called a data dictionary.

When developing programs that use the data model, a data dictionary can be consulted to understand where a data item fits in the structure, what values it may contain, and basically what the data item means in real-world terms. For example, a bank or group of banks could model the data objects involved in consumer banking. They could then provide a data dictionary for a bank's programmers. The data dictionary would describe each of the data items in its data model for consumer banking (for example, "Account holder" and ""Available credit").

A data dictionary is thus a database about a database: it defines the structure of the database itself (not that of the data held in the database) and is used in the control and maintenance of large databases. Among other items of information, it records what data is stored; the name, description and characteristics of each data element; the types of relationships between data elements; and access rights and frequency of access. It is also called a system dictionary when used in the context of system design.


What is Normalization ? Why should we use it?

Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data.

It divides larger tables into smaller tables and links them using relationships.

The inventor of the relational model, Edgar Codd, proposed the theory of normalization with the introduction of First Normal Form, and he continued to extend the theory with Second and Third Normal Form. Later he joined with Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.

 

The theory of normalization is still being developed further. For example, there are discussions even on a 6th Normal Form. But in most practical applications, normalization achieves its best in 3rd Normal Form. The main normal forms are described below.

 


First Normal Form (1NF)

First Normal Form (1NF) sets the very basic rules for an organized database as follows:

  • Eliminate duplicate columns from the same table.

  • Create separate tables for each group of related data and identify each row by using a unique column or set of columns (i.e., primary key).

Second Normal Form (2NF)

Second Normal Form (2NF) further addresses the concept of removing duplicate data as follows:

  • Meet all the requirements of the First Normal Form.

  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

  • Create relationships between these new tables and their predecessors through the use of foreign keys

Third Normal Form (3NF)

Third normal form (3NF) goes one step further as follows:

  • Meet all the requirements of the Second Normal Form.

  • Remove columns that are not dependent upon the primary key (a sketch of such a decomposition follows).
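A minimal sketch in SQLite of the kind of decomposition 2NF and 3NF call for, assuming a hypothetical orders table that begins with customer details repeated in every row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: orders(order_id, customer_name, customer_city, product)
# repeats the customer's name and city in every order row.

# After normalization: customer attributes depend only on the customer key,
# so they move into their own table, linked back by a foreign key.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    city        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    product     TEXT
);
""")
```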

Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomaly not dealt with by 3NF as originally defined.[1]

Chris Date has pointed out that a definition of what we now know as BCNF appeared in a paper by Ian Heath in 1971.[2] Date writes:

"Since that definition predated Boyce and Codd's own definition by some three years, it seems to me that BCNF ought by rights to be called Heath normal form. But it isn't."[3]

If a relational scheme is in BCNF then all redundancy based on functional dependency has been removed, although other types of redundancy may still exist. A relational schema R is in Boyce–Codd normal form if and only if for every one of its dependencies X → Y, at least one of the following conditions hold:[4]

  • X → Y is a trivial functional dependency (Y ⊆ X)

  • X is a superkey for schema R

  • BCNF: Boyce-Codd Normal Form

  • Boyce-Codd normal form (BCNF) is a structure and algorithm for the normalization of relational databases. BCNF requires that there be no non-trivial functional dependencies of attributes on anything other than a superset of a candidate key (called a superkey). At this stage, all attributes are dependent on a key, a whole key and nothing but a key (excluding trivial dependencies, like A->A). A table is said to be in BCNF if and only if it is in 3NF and every non-trivial, left-irreducible functional dependency has a candidate key as its determinant. In more informal terms, a table is in BCNF if it is in 3NF and the only determinants are the candidate keys.

  • Fourth Normal Form (4NF) and Fifth Normal Form (5NF)

  • Fourth normal form requires that a table be in BCNF and contain no multi-valued dependencies. 4NF solves the problem of multivalued dependencies, which we would encounter with a Sales table when a customer purchased multiple products with a single order - we would have numerous rows that were dependent on one another without any clear definition of that dependency in the table.

  • Fifth normal form, also known as join-projection normal form (JPNF), states that no non-trivial join dependencies exist. 5NF states that any fact should be able to be reconstructed without any anomalous results in any case, regardless of the number of tables being joined. A 5NF table should have only candidate keys and its primary key should consist of only a single column.

  • The problem with these forms is the size of the joins that are required to reconstruct any non-trivial data. We can rely on views and procedures to simplify them, but the underlying data still ends up very complex. There are also performance issues to consider - which is why 4NF and 5NF are often academic. In most cases, 3NF (or BCNF) is ideal.

  • Domain/key normal form (DKNF) requires that a table should not be subject to any constraints other than domain constraints and key constraints.

  • Sixth normal form (6NF): A table is in 6NF if and only if it satisfies no non-trivial join dependencies at all. This obviously means that 5NF is also satisfied. 6NF was only defined when extending the relational model to take the temporal dimension into account.

Keys are, as their name suggests, a key part of a relational database and a vital part of the structure of a table. They ensure each record within a table can be uniquely identified by one field or a combination of fields within the table. They help enforce integrity and help identify the relationships between tables. There are three main types of keys: candidate keys, primary keys and foreign keys. There is also an alternate key or secondary key that can be used, as the name suggests, as a secondary or alternative key to the primary key.

Super Key

A Super key is any combination of fields within a table that uniquely identifies each record within that table.

Candidate Key

A candidate key is a subset of a super key: a single field or the least combination of fields that uniquely identifies each record in the table. The least combination of fields distinguishes a candidate key from a super key. Every table must have at least one candidate key but may have several. A candidate key has the following properties:

  • It must contain unique values

  • It must not contain null values

  • It contains the minimum number of fields to ensure uniqueness

  • It must uniquely identify each record in the table

Primary Key

A primary key is the candidate key that is most appropriate to be the main reference key for the table. As its name suggests, it is the primary key of reference for the table and is used throughout the database to help establish relationships with other tables. As with any candidate key, the primary key must contain unique values, must never be null and must uniquely identify each record in the table.

Primary keys are mandatory: every table must have one, and each record must have a value for its primary key. When choosing a primary key from the pool of candidate keys, always choose a single simple key over a composite key. A sketch follows.
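A minimal sketch in SQLite: employee_id is chosen as the primary key, while the equally unique national_id stays behind as a candidate (alternate) key declared UNIQUE; all names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,  -- chosen candidate key: simple, never NULL
    national_id TEXT UNIQUE,          -- alternate key: also unique, but not chosen
    name        TEXT
)
""")
```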

FOREIGN KEY

A foreign key is a field in a relational table that matches a candidate key of another table. The foreign key can be used to cross-reference tables.

For example, say we have two tables, a CUSTOMER table that includes all customer data, and an ORDER table that includes all customer orders. The intention here is that all orders must be associated with a customer that is already in the CUSTOMER table. To do this, we will place a foreign key in the ORDER table and have it relate to the primary key of the CUSTOMER table.

The foreign key identifies a column or set of columns in one (referencing or child) table that refers to a column or set of columns in another (referenced or parent) table. The columns in the child table must reference the columns of the primary key or other superkey in the parent table. The values in one row of the referencing columns must occur in a single row in the parent table. Thus, a row in the child table cannot contain values that don't exist in the parent table (except potentially NULL). This way references can be made to link information together and it is an essential part of database normalization. Multiple rows in the child table may refer to the same row in the parent table. Most of the time, it reflects the one (parent table or referenced table) to many (child table, or referencing table) relationship
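A sketch of the CUSTOMER/ORDER example above in SQLite (column names are illustrative; note that SQLite only enforces foreign keys when the pragma is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT
);
CREATE TABLE "order" (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute('INSERT INTO "order" VALUES (100, 1)')       # OK: customer 1 exists

try:
    conn.execute('INSERT INTO "order" VALUES (101, 99)')  # no such customer
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # a child row cannot reference a missing parent
```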

UNIQUE KEY

In an entity relationship diagram of a data model, one or more unique keys may be declared for each data entity. Each unique key is composed from one or more data attributes of that data entity. The set of unique keys declared for a data entity is often referred to as the candidate keys for that data entity. From the set of candidate keys, a single unique key is selected and declared the primary key for that data entity. In an entity relationship diagram, each entity relationship uses a unique key, most often the primary key, of one data entity and copies the unique key data attributes to another data entity to which it relates

Alternate Keys

In the relational model, an alternate key is, like a primary key, a column or group of columns of a table whose values are unique at all times. An alternate key is a candidate key that is not chosen to be the primary key. There are two important distinctions between primary and alternate keys. First, a table may have many alternate keys but only one primary key. Second, according to the theory of the relational model, primary keys cannot contain NULL values, whereas alternate keys can (unless it is explicitly forbidden with a NOT NULL integrity rule). SQL likewise allows alternate keys to contain NULL values unless a NOT NULL constraint is declared.

1.    In computer science, relational algebra is an offshoot of first-order logic and of algebra of sets concerned with operations over finitary relations, usually made more convenient to work with by identifying the components of a tuple by a name (called attribute) rather than by a numeric column index, which is what is called a relation in database terminology.

2.    The main application of relational algebra is providing a theoretical foundation for relational databases, particularly query languages for such databases, chief among which is SQL

TRIVIAL DEPENDENCY

 

    -A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} -> {Employee Address} is trivial, as is {Employee Address} -> {Employee Address}.

 

FULL FUNCTIONAL DEPENDENCY

 

    -An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X and not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, for it is also dependent on {Employee ID}.

 

TRANSITIVE DEPENDENCY

 

    -A transitive dependency is an indirect functional dependency, one in which X -> Z holds only by virtue of X -> Y and Y -> Z.

 

JOIN DEPENDENCY

 

    -A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

 

FILE ORGANISATION

 

-File organisation means, precisely, the organisation of records in a file. Some important types of file organisation are as follows:-

 

a) Heap (or pile) file organisation

 

        -In this type of organisation, records are collected in the order they arrive. The blocks used in a heap are connected by pointers. When a new record is inserted, it is placed in the last block, provided that block has space. If the last block cannot hold the record, a new block is allocated and the new record is kept there.

 

b) Sequential file organisation

 

        -In this type of organisation, records are placed sequentially one after another. This ordering is achieved by some key value; the key (usually the primary key) is used to identify the records uniquely. If a new attribute is added to a record, then the entire file needs re-organisation. To avoid this, extra space is allocated in the sequential file. In general, fixed-length records are preferred for storage. Several sorting routines like bubble sort, heap sort, quick sort, etc. can be used to sort the records before the file is created.

 

c) Indexed sequential file organisation

 

        -The indexed sequential method of file organisation reduces the access problem inherent in sequential file organisation without losing the benefits of a sequential file. In a batched mode, the sequential processing of records can be done as efficiently as in sequential file organisation. An indexed sequential file has three major components:-

 1) the prime area

 2) the overflow area

 3) the index.
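A minimal sketch of the idea in plain Python: records kept in key order play the prime area, a sparse in-memory list plays the index, and lookup combines a binary search on the index with a short sequential scan (the overflow area is omitted):

```python
import bisect

# Prime area: records stored sequentially in key order
records = [(10, "A"), (20, "B"), (30, "C"), (40, "D"), (50, "E")]

# Sparse index: one (key, position) entry per block of two records
index = [(records[i][0], i) for i in range(0, len(records), 2)]
index_keys = [k for k, _ in index]

def lookup(key):
    # binary-search the index for the last entry whose key <= search key...
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None
    # ...then scan sequentially within the indicated block
    start = index[i][1]
    for k, value in records[start:start + 2]:
        if k == key:
            return value
    return None

print(lookup(30))  # -> 'C'
```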

 

SQL AND DATA TYPES

 

SQL, commonly expanded as Structured Query Language, is a computer language designed for the retrieval and management of data in relational database management systems, for database schema creation and modification, and for database object access control management. SQL is a standard interactive and programming language for getting information from and updating a database.

 

-The major parts of SQL are as follows:-

a) SQL Data Definition Language (DDL)

        -The DDL part of SQL permits database tables to be created or deleted. We can also define indexes (keys), specify links between tables, and impose constraints between database tables. Important DDL statements are:- CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX, DROP INDEX.

 

b) SQL Data Manipulation Language (DML)

        -SQL has a syntax for executing queries, but the SQL language also includes a syntax to update, insert and delete records. The important update and query commands are:- SELECT, UPDATE, DELETE, INSERT INTO. There are 2 types of DML, namely:-

     1.Procedural DML

     2.Non-procedural DML

 

c) SQL Data Control Language (DCL)

        -The Data Control Language part of SQL permits control over access to the database. Some important commands are:- GRANT, COMMIT, REVOKE, ROLLBACK (COMMIT and ROLLBACK are often classed separately as transaction control). A combined sketch of the three parts follows.
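A combined sketch using SQLite, whose Python module supports the DDL and DML shown; the DCL statements are left as comments because SQLite has no user accounts to grant rights to:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define structure
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_name ON student(name)")

# DML: manipulate records
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("UPDATE student SET name = 'Asha R' WHERE roll = 1")
print(conn.execute("SELECT * FROM student").fetchall())
conn.execute("DELETE FROM student WHERE roll = 1")

# DCL (needs a multi-user DBMS such as PostgreSQL or MySQL):
#   GRANT SELECT ON student TO some_user;
#   REVOKE SELECT ON student FROM some_user;
```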

 

TRANSACTION

 

A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:

  1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.

  2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.

A database transaction, by definition, must be atomic, consistent, isolated and durable.[1] Database practitioners often refer to these properties of database transactions using the acronym ACID.

Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a database must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database, and transactions that complete successfully must get written to durable storage.

Transactional databases

A transactional database is a DBMS where write transactions on the database are able to be rolled back if they are not completed properly (e.g. due to power or connectivity loss).

Most modern relational database management systems fall into the category of databases that support transactions.

In a database system a transaction might consist of one or more data-manipulation statements and queries, each reading and/or writing information in the database. Users of database systems consider consistency and integrity of data as highly important. A simple transaction is usually issued to the database system in a language like SQL wrapped in a transaction, using a pattern similar to the following:

  1. Begin the transaction

  2. Execute a set of data manipulations and/or queries

  3. If no errors occur then commit the transaction and end it

  4. If errors occur then rollback the transaction and end it

If no errors occurred during the execution of the transaction then the system commits the transaction. A transaction commit operation applies all data manipulations within the scope of the transaction and persists the results to the database. If an error occurs during the transaction, or if the user specifies a rollback operation, the data manipulations within the transaction are not persisted to the database. In no case can a partial transaction be committed to the database since that would leave the database in an inconsistent state.
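The commit-or-rollback pattern above, sketched with Python's sqlite3 module; the bank-transfer table is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

try:
    # Begin: both updates below form a single unit of work
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
    conn.commit()    # no errors: persist all manipulations
except sqlite3.Error:
    conn.rollback()  # error: undo everything; no partial transaction remains

print(conn.execute("SELECT * FROM account").fetchall())  # [(1, 70), (2, 80)]
```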

Internally, multi-user databases store and process transactions, often by using a transaction ID or XID.

There are multiple varying ways for transactions to be implemented other than the simple way documented above. Nested transactions, for example, are transactions which contain statements within them that start new transactions (i.e. sub-transactions). Multi-level transactions are similar but have a few extra properties. Another type of transaction is the compensating transaction.

CONCURRENCY

   -Concurrency control in a DBMS ensures that database transactions are performed concurrently without violating the data integrity of the database. Transactions should be executed safely and follow the ACID rules. The DBMS must guarantee that only serializable, recoverable schedules are generated, and also that no committed actions are lost while undoing aborted transactions.

Concurrency control mechanisms:-

The main categories of concurrency control mechanisms are:-

1. Optimistic:- Delay the synchronization checks for a transaction until its end without blocking its operations, and then abort transactions that violate the desired synchronization rules.

2. Pessimistic:- Block operations of transaction that would cause violation of synchronization rules.

There are several methods for concurrency control. These are:-

  • Two-phase locking

  • Strict two-phase locking

  • Conservative two-phase locking

  • Index locking

  • Multiple granularity locking
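A minimal sketch in Python of the conservative two-phase locking idea listed above: a transaction first acquires every lock it needs (the growing phase) and only afterwards releases them (the shrinking phase); real lock managers and deadlock detection are omitted:

```python
import threading

locks = {"A": threading.Lock(), "B": threading.Lock()}

def transaction(items):
    # Growing phase: acquire every lock before touching any data item.
    # Acquiring in a fixed (sorted) order also prevents deadlock here.
    for name in sorted(items):
        locks[name].acquire()
    try:
        print("reading/writing", items)  # the transaction's actual work
    finally:
        # Shrinking phase: release locks; no new lock may be acquired now
        for name in sorted(items):
            locks[name].release()

t1 = threading.Thread(target=transaction, args=(["A", "B"],))
t2 = threading.Thread(target=transaction, args=(["B", "A"],))
t1.start(); t2.start(); t1.join(); t2.join()
```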

HASHING

Hashing involves computing the address of a data item by applying a function to its search key value. A hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.

1. We choose a number of buckets to correspond to the search key values we will have stored in the database.

2. To perform a lookup on a search key value Ki, we compute h(Ki) and search the bucket with that address.

3. If two keys Ki and Kj map to the same address, because h(Ki)=h(Kj), then the bucket at that address will contain records with both search key values.

4. In this case we will have to check the search key value of every record in the bucket to get the ones we want.

5. Insertion and deletion are simple.
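A minimal sketch of the bucket scheme in Python, with a deliberately tiny hash function h so that a collision occurs:

```python
B = 4  # number of buckets
buckets = [[] for _ in range(B)]

def h(key):
    return key % B  # hash function from search-key values K to bucket addresses B

def insert(key, record):
    buckets[h(key)].append((key, record))

def lookup(key):
    # two keys may map to the same bucket, so check every record in it
    return [rec for k, rec in buckets[h(key)] if k == key]

insert(3, "Asha")
insert(7, "Ravi")   # h(7) == h(3): a collision, both land in bucket 3
print(lookup(7))    # -> ['Ravi']
```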
