Refactoring has proven its value in a wide range of development projects—helping software professionals improve system designs, maintainability, extensibility, and performance. Now, for the first time, leading agile methodologist Scott Ambler and renowned consultant Pramodkumar Sadalage introduce powerful refactoring techniques specifically designed for database systems.
Ambler and Sadalage demonstrate how small changes to table structures, data, stored procedures, and triggers can significantly enhance virtually any database design—without changing semantics. You’ll learn how to evolve database schemas in step with source code—and become far more effective in projects relying on iterative, agile methodologies.
This comprehensive guide and reference helps you overcome the practical obstacles to refactoring real-world databases by covering every fundamental concept underlying database refactoring. Using start-to-finish examples, the authors walk you through refactoring simple standalone database applications as well as sophisticated multi-application scenarios. You’ll master every task involved in refactoring database schemas, and discover best practices for deploying refactorings in even the most complex production environments.
The second half of this book systematically covers five major categories of database refactorings. You’ll learn how to use refactoring to enhance database structure, data quality, and referential integrity; and how to refactor both architectures and methods. This book provides an extensive set of examples built with Oracle and Java and easily adaptable for other languages, such as C#, C++, or VB.NET, and other databases, such as DB2, SQL Server, MySQL, and Sybase.
Using this book’s techniques and examples, you can reduce waste, rework, risk, and cost—and build database systems capable of evolving smoothly, far into the future.
Download the Sample
Chapter related to this title.
About the Authors xv
Chapter 1: Evolutionary Database Development 1
Chapter 2: Database Refactoring 13
Chapter 3: The Process of Database Refactoring 29
Chapter 4: Deploying into Production 49
Chapter 5: Database Refactoring Strategies 59
Chapter 6: Structural Refactorings 69
Chapter 7: Data Quality Refactorings 151
Chapter 8: Referential Integrity Refactorings 203
Chapter 9: Architectural Refactorings 231
Chapter 10: Method Refactorings 277
Chapter 11: Transformations 295
Appendix: The UML Data Modeling Notation 315
References and Recommended Reading 327
© Copyright Pearson Education. All rights reserved.
Evolutionary, and often agile, software development methodologies, such as Extreme Programming (XP), Scrum, the Rational Unified Process (RUP), the Agile Unified Process (AUP), and Feature-Driven Development (FDD), have taken the information technology (IT) industry by storm over the past few years. For the sake of definition, an evolutionary method is one that is both iterative and incremental in nature, and an agile method is evolutionary and highly collaborative in nature. Furthermore, agile techniques such as refactoring, pair programming, Test-Driven Development (TDD), and Agile Model-Driven Development (AMDD) are also making headway into IT organizations. These methods and techniques have been developed and have evolved in a grassroots manner over the years, being honed in the software trenches, as it were, instead of formulated in ivory towers. In short, this evolutionary and agile stuff seems to work incredibly well in practice.
In the seminal book Refactoring, Martin Fowler describes a refactoring as a small change to your source code that improves its design without changing its semantics. In other words, you improve the quality of your work without breaking or adding anything. In the book, Martin discusses the idea that just as it is possible to refactor your application source code, it is also possible to refactor your database schema. However, he states that database refactoring is quite hard because of the significant levels of coupling associated with databases, and therefore he chose to leave it out of his book.
Since 1999 when Refactoring was published, the two of us have found ways to refactor database schemas. Initially, we worked separately, running into each other at conferences such as Software Development (http://www.sdexpo.com) and on mailing lists (http://www.agiledata.org/feedback.html). We discussed ideas, attended each other's conference tutorials and presentations, and quickly discovered that our ideas and techniques overlapped and were highly compatible with one another. So we joined forces to write this book, to share our experiences and techniques at evolving database schemas via refactoring.
The examples throughout the book are written in Java, Hibernate, and Oracle code. Virtually every database refactoring description includes code to modify the database schema itself, and for some of the more interesting refactorings, we show the effects they would have on Java application code. Because all databases are not created alike, we include discussions of alternative implementation strategies when important nuances exist between database products. In some instances we discuss alternative implementations of an aspect of a refactoring using Oracle-specific features such as the SE,T UNUSED or RENAME TO commands, and many of our code examples take advantage of Oracle's COMMENT ON feature. Other database products include other features that make database refactoring easier, and a good DBA will know how to take advantage of these things. Better yet, in the future database refactoring tools will do this for us. Furthermore, we have kept the Java code simple enough so that you should be able to convert it to C#, C++, or even Visual Basic with little problem at all.
Evolutionary database development is a concept whose time has come. Instead of trying to design your database schema up front early in the project, you instead build it up throughout the life of a project to reflect the changing requirements defined by your stakeholders. Like it or not, requirements change as your project progresses. Traditional approaches have denied this fundamental reality and have tried to "manage change," a euphemism for preventing change, through various means. Practitioners of modern development techniques instead choose to embrace change and follow techniques that enable them to evolve their work in step with evolving requirements. Programmers have adopted techniques such as TDD, refactoring, and AMDD and have built new development tools to make this easy. As we have done this, we have realized that we also need techniques and tools to support evolutionary database development.
Advantages to an evolutionary approach to database development include the following:
You minimize waste. An evolutionary, just-in-time (JIT) approach enables you to avoid the inevitable wastage inherent in serial techniques when requirements change. Any early investment in detailed requirements, architecture, and design artifacts is lost when a requirement is later found to be no longer needed. If you have the skills to do the work up front, clearly you must have the skills to do the same work JIT.
You avoid significant rework. As you will see in Chapter 1, "Evolutionary Database Development," you should still do some initial modeling up front to think major issues through, issues that could potentially lead to significant rework if identified late in the project; you just do not need to investigate the details early.
You always know that your system works. With an evolutionary approach, you regularly produce working software, even if it is just deployed into a demo environment, which works. When you have a new, working version of the system every week or two, you dramatically reduce your project's risk.
You always know that your database design is the highest quality possible. This is exactly what database refactoring is all about: improving your schema design a little bit at a time.
You work in a compatible manner with developers. Developers work in an evolutionary manner, and if data professionals want to be effective members of modern development teams, they also need to choose to work in an evolutionary manner.
You reduce the overall effort. By working in an evolutionary manner, you only do the work that you actually need today and no more.
There are also several disadvantages to evolutionary database development:
Cultural impediments exist. Many data professionals prefer to follow a serial approach to software development, often insisting that some form of detailed logical and physical data models be created and baselined before programming begins. Modern methodologies have abandoned this approach as being too inefficient and risky, thereby leaving many data professionals in the cold. Worse yet, many of the "thought leaders" in the data community are people who cut their teeth in the 1970s and 1980s but who missed the object revolution of the 1990s, and thereby missed gaining experience in evolutionary development. The world changed, but they did not seem to change with it. As you will learn in this book, it is not only possible for data professionals to work in an evolutionary, if not agile, manner, it is in fact a preferable way to work.
Learning curve. It takes time to learn these new techniques, and even longer if you also need to change a serial mindset into an evolutionary one.
Tool support is still evolving. When Refactoring was published in 1999, no tools supported the technique. Just a few years later, every single integrated development environment (IDE) has code-refactoring features built right in to it. At the time of this writing, there are no database refactoring tools in existence, although we do include all the code that you need to implement the refactorings by hand. Luckily, the Eclipse Data Tools Project (DTP) has indicated in their project prospectus the need to develop database-refactoring functionality in Eclipse, so it is only a matter of time before the tool vendors catch up.
Although this is not specifically a book about agile software development, the fact is that database refactoring is a primary technique for agile developers. A process is considered agile when it conforms to the four values of the Agile Alliance (http://www.agilealliance.org). The values define preferences, not alternatives, encouraging a focus on certain areas but not eliminating others. In other words, whereas you should value the concepts on the right side, you should value the things on the left side even more. For example, processes and tools are important, but individuals and interactions are more important. The four agile values are as follows:
Individuals and interactions OVER processes and tools. The most important factors that you need to consider are the people and how they work together; if you do not get that right, the best tools and processes will not be of any use.
Working software OVER comprehensive documentation. The primary goal of software development is to create working software that meets the needs of its stakeholders. Documentation still has its place; written properly, it describes how and why a system is built, and how to work with the system.
Customer collaboration OVER contract negotiation. Only your customer can tell you what they want. Unfortunately, they are not good at thisthey likely do not have the skills to exactly specify the system, nor will they get it right at first, and worse yet they will likely change their minds as time goes on. Having a contract with your customers is important, but a contract is not a substitute for effective communication. Successful IT professionals work closely with their customers, they invest the effort to discover what their customers need, and they educate their customers along the way.
Responding to change OVER following a plan. As work progresses on your system, your stakeholders' understanding of what they want changes, the business environment changes, and so does the underlying technology. Change is a reality of software development, and as a result, your project plan and overall approach must reflect your changing environment if it is to be effective.
The majority of this book, Chapters 6 through 11, consists of reference material that describes each refactoring in detail. The first five chapters describe the fundamental ideas and techniques of evolutionary database development, and in particular, database refactoring. You should read these chapters in order:
Chapter 1, "Evolutionary Database Development," overviews the fundamentals of evolutionary development and the techniques that support it. It summarizes refactoring, database refactoring, database regression testing, evolutionary data modeling via an AMDD approach, configuration management of database assets, and the need for separate developer sandboxes.
Chapter 2, "Database Refactoring," explores in detail the concepts behind database refactoring and why it can be so hard to do in practice. It also works through a database-refactoring example in both a "simple" single-application environment as well as in a complex, multi-application environment.
Chapter 3, "The Process of Database Refactoring," describes in detail the steps required to refactor your database schema in both simple and complex environments. With single-application databases, you have much greater control over your environment, and as a result need to do far less work to refactor your schema. In multi-application environments, you need to support a transition period in which your database supports both the old and new schemas in parallel, enabling the application teams to update and deploy their code into production.
Chapter 4, "Deploying into Production," describes the process behind deploying database refactorings into production. This can prove particularly challenging in a multi-application environment because the changes of several teams must be merged and tested.
Chapter 5, "Database Refactoring Strategies," summarizes some of the "best practices" that we have discovered over the years when it comes to refactoring database schemas. We also float a couple of ideas that we have been meaning to try out but have not yet been able to do so.
Each book in the Martin Fowler Signature Series has a picture of a bridge on the front cover. This tradition reflects the fact that Martin's wife is a civil engineer, who at the time the book series started worked on horizontal projects such as bridges and tunnels. This bridge is the Burlington Bay James N. Allan Skyway in Southern Ontario, which crosses the mouth of Hamilton Harbor. At this site are three bridges: the two in the picture and the Eastport Drive lift bridge, not shown. This bridge system is significant for two reasons. Most importantly it shows an incremental approach to delivery. The lift bridge originally bore the traffic through the area, as did another bridge that collapsed in 1952 after being hit by a ship. The first span of the Skyway, the portion in the front with the metal supports above the roadway, opened in 1958 to replace the lost bridge. Because the Skyway is a major thoroughfare between Toronto to the north and Niagara Falls to the south, traffic soon exceeded capacity. The second span, the one without metal supports, opened in 1985 to support the new load. Incremental delivery makes good economic sense in both civil engineering and in software development. The second reason we used this picture is that Scott was raised in Burlington Ontarioin fact, he was born in Joseph Brant hospital, which is near the northern footing of the Skyway. Scott took the cover picture with a Nikon D70S.
© Copyright Pearson Education. All rights reserved.
Download the Foreword
file related to this title.