When You Can't Change the Design
- Item 18: Use Views to Simplify What Cannot Be Changed
- Item 19: Use ETL to Turn Nonrelational Data into Information
- Item 20: Create Summary Tables and Maintain Them
- Item 21: Use UNION Statements to "Unpivot" Non-normalized Data
Sometimes, despite their best efforts, SQL database developers are forced to deal with external data outside of their control. World-class database consultants and instructors present effective ways to help you deal with such situations.
You have spent considerable time ensuring that you have a proper logical data model for your situation. You have worked hard to ensure that it has been implemented as an appropriate physical model. Unfortunately, you find that some of your data must come from a source outside your control.
This does not mean that you are doomed to have SQL queries that will not perform well. The items in this chapter are intended to help you understand some options you have to be able to work with that inappropriately designed data from other sources. We will consider both the case when you can create objects to hold the transformations and the case when you must perform the transformation as part of the query itself.
Because you do not have control over the external data, there is nothing you can do to change the design. However, you can use the information in the items in this chapter to work with the DBAs and still end up with effective SQL.
Item 18: Use Views to Simplify What Cannot Be Changed
Views are simply a composition of a table in the form of a predefined SQL query on one or many tables or other views. Although they are simple, there is much merit to their use.
You can use views to ameliorate some denormalization issues. You have already seen the denormalized CustomerSales table in Item 2, “Eliminate redundant storage of data items,” and how it should have been modeled as four separate tables (Customers, AutomobileModels, SalesTransactions, and Employees). You’ve also seen the Assignments table with repeating groups in Item 3, “Get rid of repeating groups,” that should have been modeled as two separate tables (Drawings and Predecessors). While working to fix such problems, you could use views to represent how the data should appear.
You can create different views of CustomerSales as shown in Listing 3.1.
Listing 3.1 Views to normalize a denormalized table
CREATE VIEW vCustomers AS SELECT DISTINCT cs.CustFirstName, cs.CustLastName, cs.Address, cs.City, cs.Phone FROM CustomerSales AS cs; CREATE VIEW vAutomobileModels AS SELECT DISTINCT cs.ModelYear, cs.Model FROM CustomerSales AS cs; CREATE VIEW vEmployees AS SELECT DISTINCT cs.SalesPerson FROM CustomerSales AS cs;
As Figure 3.1 shows, vCustomers would still include two entries for Tom Frank because two different addresses were listed in the original table. However, you have a smaller set of data to work with. By sorting the data on CustFirstName and CustLastName, you should be able to see the duplicate entry, and you can correct the data in the CustomerSales table.
Figure 3.1 Data for view vCustomers
You saw in Item 3 how to use a UNION query to “normalize” a table that contains repeating groups. You can use views to do the same thing, as shown in Listing 3.2.
Listing 3.2 Views to normalize a table with repeating groups
CREATE VIEW vDrawings AS SELECT a.ID AS DrawingID, a.DrawingNumber FROM Assignments AS a; CREATE VIEW vPredecessors AS SELECT 1 AS PredecessorID, a.ID AS DrawingID, a.Predecessor_1 AS Predecessor FROM Assignments AS a WHERE a.Predecessor_1 IS NOT NULL UNION SELECT 2, a.ID, a.Predecessor_2 FROM Assignments AS a WHERE a.Predecessor_2 IS NOT NULL UNION SELECT 3, a.ID, a.Predecessor_3 FROM Assignments AS a WHERE a.Predecessor_3 IS NOT NULL UNION SELECT 4, a.ID, a.Predecessor_4 FROM Assignments AS a WHERE a.Predecessor_4 IS NOT NULL UNION SELECT 5, a.ID, a.Predecessor_5 FROM Assignments AS a WHERE a.Predecessor_5 IS NOT NULL;
One point that needs to be mentioned is that although all the views shown previously mimic what the proper table design should be, they can be used only for reporting purposes. Because of the use of SELECT DISTINCT in the views in Listing 3.1, and the use of UNION in Listing 3.2, the views are not updatable. Some vendors allow you to work around this limitation by defining triggers on views (also known as INSTEAD OF triggers) so that you can write the logic for applying modifications made via the view to the underlying base table yourself.
Some other reasons to use views include the following:
To focus on specific data: You can use views to focus on specific data and on specific tasks. The view can return all rows of a table or tables, or a WHERE clause can be included to limit the rows returned. The view can also return only a subset of the columns in one or more tables.
To simplify or clarify column names: You can use views to provide aliases on column names so that they are more meaningful.
To bring data together from different tables: You can use views to combine multiple tables into a single logical record.
To simplify data manipulation: Views can simplify how users work with data. For example, assume you have a complex query that is used for reporting purposes. Rather than make each user define the subqueries, outer joins, and aggregation to retrieve data from a group of tables, create a view. Not only does the view simplify access to the data (because the underlying query does not have to be written each time a report is being produced), but it ensures consistency by not forcing each user to create the query. You can also create inline user-defined functions that logically operate as parameterized views, or views that have parameters in WHERE clause search conditions or other parts of the query. Note that inline table-valued functions are not the same as scalar functions!
To protect sensitive data: When the table contains sensitive data, that data can be left out of the view. For instance, rather than reveal customer credit card information, you can create a view that uses a function to “munge” the credit card numbers so that users are not aware of the actual numbers. Depending on the DBMS, only the view would be made accessible to users, and the underlying tables need not be directly accessible. Views can be used to provide both column-level and row-level security. Note that a WITH CHECK OPTION clause is necessary to protect the data integrity by preventing users from performing updates or deletes that go beyond the constraints imposed by the view.
To provide backward compatibility: Should changes be required to the schemas for one or more of the tables, you can create views that are the same as the old table schemas. Applications that used to query the old tables can now use the views, so that the application does not have to be changed, especially if it is only reading data. Even applications that update data can sometimes still use a view if INSTEAD OF triggers are added to the new view to map INSERT, DELETE, and UPDATE operations on the view to the underlying tables.
To customize data: You can create views so that different users can see the same data in different ways, even when they are using the same data at the same time. For example, you can create a view that retrieves only the data for those customers of interest to a specific user based on that user’s login ID.
To provide summarizations: Views can use aggregate functions (SUM(), AVERAGE(), etc.) and present the calculated results as part of the data.
To export and import data: You can use views to export data to other applications. You can create a view that gives you only the desired data, and then use an appropriate data utility to export just that data. You can also use views for import purposes when the source data does not contain all columns in the underlying table.
Things to Remember
Use views to structure data in a way that users will find natural or intuitive.
Use views to restrict access to the data such that users can see (and sometimes modify) exactly what they need and no more. Remember to use WITH CHECK OPTION when necessary.
Use views to hide and reuse complex queries.
Use views to summarize data from various tables that can be used to generate reports.
Use views to implement and enforce naming and coding standards, especially when working with legacy database designs that need to be updated.