Import Function” in SQL Developer and import the provided datasets for evaluation

computer science

Description

Analysis and description of supplied dataset

We have used an “Import Function” in SQL Developer and import the provided datasets for evaluation. The dataset we import is now in excel spreadsheet format comprising tables with fields.  Now we are going to analyze the techniques for dataset analysis in better understanding of the data records in the table. For the given dataset in Excel (e.g., Order Table form), we identified the succeeding things from the given dataset.

Duplicate data records: The columns that comprise of duplicate records, for example, are Order ID, Product ID, Quantity, and Customer ID.

Quantitative Data: The dataset holds discrete records in the columns/fields like Quantity, Sales, and Profit.

Necessary tables and fields (Possible database design solution)

From the supplied dataset (order), we derived important tables along with fields that define the relations in the tables. The tables are the following

Order table (with fields)

Order ID

(primary key/Not Null)

Int

Order Date

Varchar

Ship Date

Varchar

Ship Mode

Varchar

Customer ID (Foreign key)

Int

Product ID (Foreign key)

Int

 

Customer table (with fields)

Customer ID (Primary Key/Not Null)

Int

Customer Name

Varchar

Segment

Varchar

Country

Varchar

City

Varchar

State

Varchar

Postal Code

Varchar

Region

Varchar

 

Product table (with fields)

Product ID (primary key/Not Null)

Int

Product Name

Varchar

Sales

Varchar

Quantity

Varchar

Discount

Varchar

Profit

Varchar

Category Name (Foreign Key)

Varchar

 

Product_Category table (with fields)

Category ID (primary key)

Int

Category Name (Not Null)

Varchar

Sub-Category (Unique)

Varchar

Product ID (foreign key) Int

 

Business rules and relationships

These are the certain identified rules and relationships from the possible database solution of the given dataset. The database comprises of tables that have recognized relationships and business rules to identify their relationships with each other and respective fields.

-          When a new kind of order is placed, the Sales field for the product must be increased by the amount of order of the products.

-          Removing an order or altering the order quantity/product price must likewise cause the Sales field/column to be changed/adjusted.

-          If the consumer order’s product total is $1,000 or more than that in a year, then provide a minimum 10% discount on the order sale products.

-          No consumer is permitted to place product orders that would surpass client’s credit boundary.

-          When a novel order is placed by a customer for any product, the quantity column for the particular product being placed must be reduced by the quantity of the items ordered.  Similarly, deleting a particular order, altering the product ordered, or altering the quantity ordered from the database must likewise cause corresponding modifications to the quantity column (Groff, 2009).

Data Issues- Handling the data issues in database design

Enforcing or implementing the business rules and relationships amid the different entities originates some issues with the data in the database derived from the given dataset. Some of them include:

-          Redundancy- Firstly, as we derived the database from the specified Order dataset, there are several records we have seen are redundant and duplicate entries. There are repeated values in several records of the Order datasets or database.

-          Inconsistency and Absence of Normalization: The Order database comprises of tables that are present in 1NF (First Normal Form). Therefore, its data records are not unique in the fields. It makes the whole database inconsistent and redundant (Mahal, 2010).

Primary and foreign keys

Each table has its defined primary and foreign keys. For example;

1. Order Table has the following:

-          Primary key: Order ID

-          Foreign Keys: Customer ID and Product ID

2. Product Table has the following:

-          Primary Key: Product ID

-          Foreign Key: Category Name

3. Customer Table has the following:

-          Primary Key: Customer ID

-          Foreign Key: Not Existed

4. Product_Category Table has the following:

-          Primary Key: Category ID

-          Foreign Key: Product ID

CHECK, NOT NULL and UNIQUE Constraints

Each table has its specified constraints put on the database design. The tables consist of fields that are defined UNIQUE, NOT NULL and CHECK constraints.

1. Order Table has the following:

Order ID Not Null

2. Product Table has the following:

Product ID Not Null

3. Customer Table has the following:

Customer ID Not Null

4. Product_Category Table has the following:

UNIQUE Sub-Category

Derived or Virtual columns of the database are Discount and Profit as we analyzed that profits column values are derived from the values in Discounts column of the database respectively.

Questions asked to extract additional information from dataset

For extracting the additional data from the dataset, we need to ask the frequent questions like;

1. What are the types and number of attributes and statistical procedures pertinent to them?

2. Which values are optimized and unique in the dataset?

3. Is there any missing values in the data records in the given dataset? (Groff, 2009)

Assumptions for the dataset

We do assume certain things about the given dataset while analyzing and normalizing it into tables of the database design solution. The dataset is used to analyze in the form of relational database. We aim to create a database (relational database) for utilization in future forecast/study of consumer demographics and purchases. Therefore, we assume the following things for the dataset to resolve into a relational database for the business scenario/condition.

1. The main assumption for the dataset is that it is discrete. It means finite values are likely to exist that can’t be further divided.

2. The analysis and measurement structure is precise and accurate.

3. Additional assumptions are independence, random samples, stability and equivalent variance (Mahal, 2010).

Risks in converting the dataset to a relational database

-          The important risk in transforming the dataset into the database (relational) is the lack of proper series of records in the new database.

-          The proper series or sequence might not be retained after changing into tables.

-          The dataset has records within fields that maps to tables might not be consistent as analyzed in the dataset.

-          Last but not the lease, data variety might be changed after changing into a relational database (Powell, 2006).

Benefits of relational database versus dataset

There are several benefits of investigating the dataset through a relational database as opposed to leaving it as an Excel Spreadsheet.

-          Data arranged in different tables with fields and columns would be more efficient to analyze rather than just a single dataset in Excel Spreadsheet comprising fields in a single table format/matrix.

-          Each field/column defined in the relational tables of database describes an entity with more details as compared to Excel Spreadsheet.

-          It specifies more attributes of a record or data in the simple and efficient form in comparison to Excel spreadsheet.

-          Dataset holds values of records with inconsistency, redundancy, and inaccuracy that will not be present in the relational database.

-          It becomes easy and effective to examine records in the relational database as tables comprise of updated database records, defined primary key restrictions, etc. functionalities.

-          The database tables are accurately arranged, and adjusted databases continuously perform and accomplish best.

-          Relational databases contain simplified records with defined constraints and relationships between different fields/columns are stated properly. These things are not present in Excel Spreadsheet (Powell, 2006).

 

 

 

 

References

Groff, J. (2009). Sql The Complete Reference. Tata McGraw-Hill Education.

Mahal, A. (2010). How Work Gets Done: Business Process Management, Basics & Beyond. Technics Publications.

Powell, G. (2006). Beginning Database Design. John Wiley & Sons.

 


Related Questions in computer science category