Analysis
and description of supplied dataset
We
have used an “Import Function” in SQL Developer and import the provided
datasets for evaluation. The dataset we import is now in excel spreadsheet
format comprising tables with fields. Now we are going to analyze the techniques for
dataset analysis in better understanding of the data records in the table. For
the given dataset in Excel (e.g., Order
Table form), we identified the succeeding
things from the given dataset.
Duplicate
data records: The columns that comprise of duplicate
records, for example, are Order ID,
Product ID, Quantity, and Customer ID.
Quantitative
Data: The dataset holds discrete records in the columns/fields
like Quantity, Sales, and Profit.
Necessary
tables and fields (Possible database design solution)
From
the supplied dataset (order), we derived important
tables along with fields that define the relations in the tables. The tables are the following
Order
table (with fields)
Order
ID (primary
key/Not Null) Int |
Order
Date Varchar |
Ship
Date Varchar |
Ship
Mode Varchar |
Customer
ID (Foreign key) Int |
Product
ID (Foreign key) Int |
Customer
table (with fields)
Customer
ID (Primary Key/Not Null) Int |
Customer
Name Varchar |
Segment Varchar |
Country Varchar |
City Varchar |
State
Varchar |
Postal
Code Varchar |
Region Varchar |
Product
table (with fields)
Product
ID (primary key/Not Null) Int |
Product
Name Varchar |
Sales Varchar |
Quantity Varchar |
Discount
Varchar |
Profit Varchar |
Category
Name (Foreign Key) Varchar |
Product_Category
table (with fields)
Category
ID (primary key) Int |
Category
Name (Not Null) Varchar |
Sub-Category
(Unique) Varchar |
Product
ID (foreign key) Int |
Business
rules and relationships
These
are the certain identified rules and
relationships from the possible database solution of the given dataset. The
database comprises of tables that have
recognized relationships and business rules to identify
their relationships with each other and respective fields.
-
When a new kind of order is placed, the Sales field for the product must
be increased by the amount of order of the products.
-
Removing an order or altering the order
quantity/product price must likewise cause the Sales field/column to be
changed/adjusted.
-
If the consumer order’s product total is
$1,000 or more than that in a year, then provide a minimum 10% discount on the
order sale products.
-
No consumer is permitted to place
product orders that would surpass client’s credit boundary.
-
When a
novel order is placed by a customer for any product, the quantity column
for the particular product being placed
must be reduced by the quantity of the items ordered. Similarly, deleting a particular order,
altering the product ordered, or altering the quantity
ordered from the database must likewise cause corresponding modifications to
the quantity column
Data
Issues- Handling the data issues in database design
Enforcing
or implementing the business rules and relationships amid the different
entities originates some issues with the
data in the database derived from the given dataset. Some of them include:
-
Redundancy-
Firstly, as we derived the database from the specified Order dataset, there are
several records we have seen are redundant and duplicate entries. There are
repeated values in several records of the
Order datasets or database.
-
Inconsistency
and Absence of Normalization: The Order database comprises of tables that are present in 1NF
(First Normal Form). Therefore, its data
records are not unique in the fields. It makes the whole database inconsistent
and redundant
Primary
and foreign keys
Each
table has its defined primary and foreign
keys. For example;
1.
Order Table has the following:
-
Primary
key:
Order ID
-
Foreign
Keys: Customer ID and Product ID
2.
Product Table has the following:
-
Primary
Key:
Product ID
-
Foreign
Key:
Category Name
3.
Customer Table has the following:
-
Primary
Key:
Customer ID
-
Foreign
Key:
Not Existed
4.
Product_Category Table has the following:
-
Primary
Key:
Category ID
-
Foreign
Key:
Product ID
CHECK,
NOT NULL and UNIQUE Constraints
Each
table has its specified constraints put on the database design. The tables consist of fields that are defined UNIQUE, NOT
NULL and CHECK constraints.
1.
Order Table has the following:
Order ID Not Null
2.
Product Table has the following:
Product ID Not Null
3.
Customer Table has the following:
Customer ID Not Null
4.
Product_Category Table has the following:
UNIQUE Sub-Category
Derived
or Virtual columns of the database are Discount and Profit
as we analyzed that profits column values
are derived from the values in Discounts column of the database respectively.
Questions
asked to extract additional information from dataset
For extracting the
additional data from the dataset, we need to ask the frequent questions like;
1. What are the types
and number of attributes and statistical procedures pertinent to them?
2. Which values are
optimized and unique in the dataset?
3. Is there any missing values in the data records in the given dataset?
Assumptions
for the dataset
We
do assume certain things about the given dataset while analyzing and
normalizing it into tables of the database design solution. The dataset is used
to analyze in the form of relational
database. We aim to create a database (relational database) for utilization in
future forecast/study of consumer demographics and purchases. Therefore, we assume
the following things for the dataset to resolve into a relational database for
the business scenario/condition.
1. The main assumption for
the dataset is that it is discrete. It means finite values are likely to exist that can’t be further
divided.
2. The analysis and
measurement structure is precise and accurate.
3. Additional assumptions
are independence, random samples, stability and equivalent
variance
Risks
in converting the dataset to a relational database
-
The important risk in transforming the dataset into the database (relational) is the lack of proper series of records in
the new database.
-
The proper series or sequence might not be retained after changing into tables.
-
The dataset has records within fields
that maps to tables might not be consistent as analyzed in the dataset.
-
Last but not the lease, data variety
might be changed after changing into a relational
database
Benefits
of relational database versus dataset
There
are several benefits of investigating the dataset through a relational database
as opposed to leaving it as an Excel
Spreadsheet.
-
Data arranged in different tables with
fields and columns would be more efficient to analyze rather than just a single
dataset in Excel Spreadsheet comprising fields
in a single table format/matrix.
-
Each field/column
defined in the relational tables of database describes an entity with more
details as compared to Excel Spreadsheet.
-
It specifies more attributes of a record
or data in the simple and efficient form
in comparison to Excel spreadsheet.
-
Dataset holds values of records with
inconsistency, redundancy, and inaccuracy
that will not be present in the relational database.
-
It becomes easy and effective to examine records in the relational database as tables comprise of
updated database records, defined primary key restrictions,
etc. functionalities.
-
The database tables are accurately arranged, and adjusted databases
continuously perform and accomplish best.
-
Relational databases contain simplified
records with defined constraints and relationships between different
fields/columns are stated properly. These
things are not present in Excel Spreadsheet
Groff, J. (2009). Sql The Complete Reference.
Tata McGraw-Hill Education.
Mahal, A. (2010). How Work
Gets Done: Business Process Management, Basics & Beyond. Technics
Publications.
Powell, G. (2006). Beginning
Database Design. John Wiley & Sons.
Get Free Quote!
313 Experts Online