Introduction to Viral Data in SOA
In the book of Genesis, Adam and Eve both consume fruit from the tree of the knowledge of good and evil.1 Although the type of fruit is not explicitly mentioned, for many centuries the apple has survived as a popular guess.
Potentially, the inference to an apple is sustained by the term given to the noticeable protrusion on the neck of many (human) males. The laryngeal prominence, commonly known as an Adam’s apple, is the result of thyroid cartilage surrounding the larynx. The associative anatomical term, Adam’s apple, gives further credence to the popular belief that the forbidden fruit was, in fact, an apple.
In the centuries that followed the death of Jesus Christ, the Old and New Testaments were translated into Latin from their original languages (Hebrew for the Old Testament and Greek for the New Testament). Then, in the early 1380s, Oxford Professor John Wycliffe (1330–1384) led a translation of the Old and New Testaments into English.2
Until that time, Latin had been the predominant language for these Testaments in the Christian world. In Latin, the word for evil (as in the tree of the knowledge of good and evil) is malum.3 As a Latin word, malum is a homonym and has other meanings besides evil. An alternative translation or meaning of the word malum is apple.4
Hebrew texts translated into Latin and then from Latin into English might explain how an unnamed fruit obtained an identity. Could that be true? Was the apple chosen as the result of a homonym? What is the actual provenance5 of the apple in the Adam and Eve story? How has the apple persisted6 over time?
To venture into understanding meanings in communication or to delve into linguistics is to study semantics. What one person means (to say) and another person takes away (as being said) can represent a significant challenge with both verbal and written styles of communication. The discourse of viral data is concerned not only with verbal and written communication, but also with the communication and linguistics involved in the sharing of data between software programs, especially those deployed as part of a service-oriented architecture (SOA).
The title of this book relies on you, the reader, forming a connection between terms more commonly used in physiology or medicine with those found in information technology (IT). The use of analogies and other explanatory devices such as metaphors and allegories can often assist in making a point or clarifying a point—even if only to provide a generality (a gist).
IT is no stranger to the concept of analogical association; many IT terms have been adopted because of their analogical ties, including the following:
- Software engineer
- Technical architect
- Chief technology officer
- Data governance
- Service-oriented architecture
The terms engineer, architect, chief, officer, deployment, governance, service, and architecture are all borrowed from other previously existing disciplines. The business side of an organization does the same thing—terms such as business strategy, tactics, and mission statement all carry nuances from their original intent into a different paradigm. In this case, the terms strategy, tactics, and mission are military words redeployed for a seemingly more peaceful use in commerce.
Regarding analogies, law professor Cass Sunstein (b. 1954) wrote,7 “Ordinary people make sense of the world by discerning patterns rooted in analogical thinking.” As we are starting to see, analogies and other similar devices can assist the art of communication by helping to correlate what is already known or understood with other new or more complex concepts.
In 1966, the cognitive psychologist Peter Wason (1924–2003) devised a logic puzzle known as a Wason test.8 The Wason test consists of four cards with a letter of the alphabet on one side and a number on the other. When the cards are placed on a table, the participant initially sees only one side of each card. If the visible sides read L, A, 6, and 11, what is the minimum set of cards you must turn over to prove the following rule true or false: When a card shows an L, the flipside is always a 6?
Before the answer is revealed, suppose the letters and numbers are replaced with something else (for example, DB2®, Java™, database, and network) and the question is restated as follows: When a card shows DB2, the flipside is always a database. Deducing that the correct answer is DB2 and network is easier in this version, assuming that you are already familiar with the terms DB2, Java, database, and network.
Without some type of supporting information, a recognizable concept or term is often easier to deal with than an abstract concept or term. In both cases, the underlying problem remained the same; fundamentally, the only change that occurred was the potential familiarity or association with what was depicted on the cards and the way the question was phrased.
The answer to the first question is L and 11.
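The reasoning behind both answers can be sketched in a few lines of Python. This is only an illustration: the function name cards_to_flip and the labels "letter", "number", "product", and "category" are assumptions introduced here, not part of the Wason test itself. A card must be turned over if it shows the rule's first term (the hidden side might break the rule) or if it shows the wrong value of the second term's kind (the hidden side might be the first term).

```python
def cards_to_flip(visible, antecedent, consequent):
    """Return the visible faces that must be turned over to test the rule
    'if one side shows `antecedent`, the other side shows `consequent`'.

    Each face is a (kind, value) pair; the kinds are illustrative labels.
    """
    must_flip = []
    for kind, value in visible:
        if (kind, value) == antecedent:
            # The hidden side might not be the consequent.
            must_flip.append(value)
        elif kind == consequent[0] and value != consequent[1]:
            # The hidden side might be the antecedent, which would break the rule.
            must_flip.append(value)
    return must_flip


letters_and_numbers = [("letter", "L"), ("letter", "A"), ("number", 6), ("number", 11)]
print(cards_to_flip(letters_and_numbers, ("letter", "L"), ("number", 6)))
# ['L', 11]

it_terms = [("product", "DB2"), ("product", "Java"),
            ("category", "database"), ("category", "network")]
print(cards_to_flip(it_terms, ("product", "DB2"), ("category", "database")))
# ['DB2', 'network']
```

The same function handles both versions of the puzzle, which is the point of the exercise: only the familiarity of the labels changed, not the underlying problem.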
Associative linguistic devices such as analogies, metaphors, and allegories open a porthole by which an unknown or unfamiliar concept can readily become familiar by establishing a connection to something that is, hopefully, more familiar. Other forms of aids to help clarify meaning include innuendos, proverbs, and similes.
In a legal setting, innuendos can be used to provide an interpretation of words where the meaning is not too obvious. The use of proverbs and similes often provides just one side of the analogy or metaphor, requiring the reader to complete the association. The proverb, you cannot make an omelet without breaking eggs, speaks to the fact that progress often requires change of some manner, possibly irrevocable. What is or needs to change is left up to the reader to complete.
Viral data is a metaphor used to indicate that business-oriented data can exhibit qualities of a specific type of human pathogen: the virus.
In humans, the classification scheme for disease-causing organisms comprises five categories: viruses, bacteria, protozoa, fungi, and worms. Interestingly, the virus is the simplest of the pathogens and parasites. In our human world, viruses and bacteria are the germs that cause disease. Viruses and bacteria have been responsible for the great plagues and, yes, pandemics.
The term virus was coined by the Dutch botanist Martinus Beijerinck (1851–1931).9 The word virus is also Latin, literally meaning poison or poisonous slime. Now, data by itself is inert. Data requires software (or people) for the data to appear alive (or actionable) and cause a positive, neutral, or negative effect. The inert and actionable properties of data are covered throughout this book.
In biology, all living things are cellular with the exception of the virus. Absent in the virus are any of the structures within a cell necessary to perform the life (action) activities such as eating, energy production, and growth. Like data, a virus is inert.10 A virus is an inert particle—tiny and lifeless.
If a piece of data is never used by a computer program or seen by a person, the data cannot have a negative or infectious effect. When data gets inside a computer program or appears on a screen that is scanned by a person, that data has entered an actionable world. The inert biological virus is skilled at getting into a cell—the virus’ version of an actionable world. Once the virus is inside a cell, the game of infection begins.
A virus outside of a cell is not dangerous; likewise, digital data outside of a computer program is not dangerous, and neither is printed data that is not viewed or referenced in any manner.
To be a pandemic requires more than just a widespread disease, even if the disease causes many unfortunate deaths. Cancer is a deadly, widespread disease, but cancer is not a pandemic because the disease is generally believed not to be infectious. To be classified as a pandemic, the disease (or condition) needs to be widespread and infectious.
According to the World Health Organization, “Once a pandemic begins it will be too late to accomplish the many key activities required to minimize the impact.”11 Within the boundary of an enterprise, viral data can be pandemic when a service-oriented architecture achieves high degrees of interoperability throughout the corporate value chain while leveraging synchronous data stores.
To paraphrase science writer Bernard Dixon,12 directionally set magnetized ferromagnetic material is oh so small (the analogical 0s and 1s on a hard drive). By comparison, each Forbes Global 2000 company is absolutely huge, employing thousands of people, across numerous countries in numerous locations. The magnetized ferromagnetic material can kill a Forbes Global 2000 company.
A generation for humans is measured approximately by a 30-year interval. For some viruses, the equivalent measure is less than 5 days. Furthermore, for some bacteria, the time span from one generation to the next can be as little as 10 minutes. As for data traveling through the real-time enterprise, the generational time from one data store to the next can probably be measured in milliseconds.
Stock exchanges around the world now permit brokerages to co-locate their servers within an exchange’s data center to help reduce latencies—if only by nanoseconds. “And latency affects more than execution; it also impacts prices and the distribution of data inside an enterprise.”13
The Spanish proverb “the beginning of health is to know the disease” implies that one must stand on a strong foundation if one is to move forward with success. A necessity for gaining the upper hand on viral data is to scrap some of the data-oriented dogma that has existed since the late 1960s and rethink many of the precepts associated with data.
Master of the modern short story, Anton Chekhov (1860–1904) wrote, “When you’re offered a thousand remedies you can be sure the disease is incurable.”14 Solutions for viral data in SOA may not be unilaterally preventative in all business situations, but the more one understands about the causes and what symptoms to look for, the better chance the enterprise has to control its pathogens.
Edward Tenner (b. 1943) coined the phrase revenge theory and has observed, in a very Jerry Seinfeld (b. 1954) manner, that automobiles move through London’s congested streets at the same pace as horse-drawn carriages from the mid-1850s and that computers have no real effect on productivity because people learn to complicate and repeat tasks that have been made easier. On decision making, Tenner has remarked:
From 1981 to the present, American businesses have spent hundreds of millions of dollars on software to help improve decision-making. Software producers are now disbursing millions of dollars on research to make their programs even better and easier to use. Spreadsheets and other decision-making aids are now ubiquitous in academia and the professions as well as in business. Thus it is all the more remarkable how little research has been devoted to the effects of computers on the quality of decision-making.15
The fact that Tenner’s reference to the present was the year 1996 makes it all the more impressive that the argument remains valid. In part, the progress of decision making has been hampered by inconsistent, inaccessible, and nonexistent data. Folklore has it that there are three degrees of falsehood: First there is the fib, then the lie, and finally, there is statistics.
Ordinarily, knowledge workers in a corporation do not seek to record or manufacture information that deviates from the truth. However, misinformation can systemically happen in a business: viral data can enter through sheer carelessness, a business rule, or a host of other ways.
What is believed to be true can be true, or false, or both true and false at the same time. Again, provenance is a concept that can prove very helpful in establishing truthfulness in data—trusted information.
The modern world of sports is big business, from American collegiate football stadiums that are able to accommodate well over one hundred thousand sitting fanatics;16 to the payment of $894 million by NBC Universal to acquire the American television rights for the 2008 Olympic Games held in Beijing, China;17 and to the professional sports player, where contracts for high-caliber individuals like David Beckham (b. 1975) can extend into the hundreds of millions of dollars.18
Nearly everyone involved in sports is drawn to statistics—from the soccer mom hoping her child receives enough playing time, to workplace discussions around the water cooler that deliberate the minutest fractions of time that determined who stood on the winner’s podium, to coaches, players, and fans.
Baseball is a competitive sport that is played professionally in a number of countries and on occasion has been a medal competition at the summer Olympic games.19 In a U.S. Major League Baseball (MLB) game held on April 28, 2008, the Baltimore Orioles beat the Chicago White Sox 4 to 3.20 The winning run finally came in the fourteenth inning, and the Orioles pitcher, Alberto Castillo (b. 1975), earned his first Major League victory. In all MLB games, for historical and statistical purposes, one of the pitchers on the winning side is given the victory.
Of particular note during that Orioles and White Sox game was that Castillo had yet to start his MLB career. In fact, Castillo and a handful of other players were not on either team’s roster on that date. Before the game could be concluded, rain halted play, and the teams had to wait nearly four months for the game to be restarted. The game was restarted and concluded on August 25 in Baltimore, Maryland. Officially, all the statistics for the game have been attributed to April 28 in Chicago, Illinois. You can extrapolate a lesson from this: Even in business, statistics can, on occasion, distort the truth.
Without the provenance to go along with the statistics, decisioning21 on the truth might become problematic. Statistics or other derived or aggregated data become part of the semantic landscape for communicating (whether verbally, in writing, or as a message exchanged between services).
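One way to picture a statistic that carries its provenance is the following minimal Python sketch. The class names, field names, and the matches_official_record method are assumptions made for illustration; the dates and cities come from the suspended game described above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class PlaySession:
    """One stretch of actual play: when and where it happened."""
    played_on: date
    city: str


@dataclass
class GameStatistic:
    """A statistic plus the provenance needed to interpret it."""
    description: str
    official_date: date
    official_city: str
    sessions: list = field(default_factory=list)

    def matches_official_record(self):
        """True only if every session of play agrees with the official date and city."""
        return all(
            s.played_on == self.official_date and s.city == self.official_city
            for s in self.sessions
        )


win = GameStatistic(
    description="Winning pitcher: Alberto Castillo",
    official_date=date(2008, 4, 28),
    official_city="Chicago",
    sessions=[
        PlaySession(date(2008, 4, 28), "Chicago"),    # play halted by rain
        PlaySession(date(2008, 8, 25), "Baltimore"),  # game restarted and concluded
    ],
)
print(win.matches_official_record())  # False
```

A consumer that sees only the official date and city has no way to detect the discrepancy; a consumer that also receives the sessions can decide how much weight to give the statistic.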
Service-oriented architectures build on the concepts of distributed computing and modular programming. Services are often grouped as interoperable packages. In a general sense, interoperability in software is a “capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units.”22 Essentially, interoperability refers to a capability of two or more programs to exchange and use exchanged data.
During each interchange, data can be further manipulated, persisted, or decisioned on like data in any traditional programming environment. Services that are orchestrated to interoperate in a real-time mode are, on the one hand, extending the capabilities that IT departments can bring to the business and, on the other hand, capable of causing unimpeded havoc.
The ability to turn disparate or stand-alone functionality into seamless-string business processes or distribute the same data from a common data store to disparate processes creates for viral data a perfect storm,23 a perfect opportunity to miscommunicate with ubiquity and simultaneity—a service-oriented pandemic reaching all corners of the enterprise.
A starting point in the search to control viral data in SOA is to begin with the semantics associated with the lingua franca. A subject matter expert from the business may know that preferred customers are offered a standard 15 percent discount, or that Federal Express delivers all plastic kumquats from suppliers located in Tennessee, or that excessive humidity can cause quality problems on the production line, without knowing exactly why these facts are true.
From an epistemological point, a subject matter expert can know that a certain something is true without fully understanding why that certain something is true in the first place. The functioning enterprise naturally allows conflicts in its vocabulary and is generally capable of putting into place exception criteria to manage undesirable behavior on a whim without needing to fully comprehend the ramifications. But then, sometimes, what we believe to be true turns out to be false.
Consider, for example, the history surrounding the creation of the Federal Reserve System. The Federal Reserve is an integral part of monetary policy and the money system in the United States. Popular economics textbooks suggest that Congress created the Federal Reserve in 1913 as the central bank and monetary authority of the United States.24
The economic crisis of 1907 had led the U.S. Congress to study the shortcomings of the American banking system and, eventually, to establish the Federal Reserve System.25 The panic of 1907, with its more-than-usual epidemic of bank failures, was the straw that broke the camel’s back: The country was fed up once and for all with the anarchy of unstable private banking.26
Despite these descriptions, the Federal Reserve System is not federal, and there are no reserves. Further, the Federal Reserve Banks are not even banks. The Federal Reserve System is a privately owned corporation, and its ownership, in the United States, has control over all things that deal with money.
The initial plan for the Federal Reserve was drafted at a secret meeting held in November 1910 at the private resort of J. P. Morgan (1837–1913) on Jekyll Island, located off the coast of Georgia. The seven meeting attendees represented an estimated one-fourth of the world’s total financial wealth and included high-powered banking moguls, the leader of the Republican Party in the U.S. Senate, and the assistant secretary of the Treasury.
In attendance at Jekyll Island were Nelson Aldrich (1841–1915), Abram Piatt Andrew, Jr. (1873–1936), Frank Vanderlip (1864–1937), Henry Davison (1867–1922), Charles Norton (1870–1923), Benjamin Strong (1872–1928), and Paul Warburg (1868–1932).
At Jekyll Island, the meeting attendees conspired to:27
- Stop the growing influence of small, rival banks, and to ensure that control over the nation’s financial resources remained in the hands of those present
- Make the money supply more elastic (available) to reverse the trend of private capital formation and to recapture the industrial loan market
- Pool the meager reserves of the nation’s banks into one large reserve so that all banks would be motivated to follow the same loan-to-deposit ratios (to protect at least some of the banks from currency drains and bank runs)
- Should this cartelization approach lead ultimately to collapse of the whole banking system, shift the losses from the owners of the banks to the taxpayers
Finally, in December 1913, President Woodrow Wilson (1856–1924) signed into law the Federal Reserve Act. The act that came into law did not include everything that the gang of seven had hoped, but in due course, with the inclusion of more than a hundred amendments, the cartel, known as the Federal Reserve System, achieved all of its initial objectives.
In business, some truths, for one reason or another, are hidden. Other truths are concealed because of a lack of information or because of the way the information is presented. For example, the Jekyll Island story, with its factual information about the formation of the Federal Reserve, is not as well known as some fictional (perhaps even conspiratorial) tales about its formation.
Implicit information hiding is why some people suggest that Microsoft® PowerPoint® has destroyed our ability to adequately communicate. For example, Edward Tufte (b. 1942), a professor at Yale University, believes that PowerPoint played a significant role in the Columbia space shuttle tragedy in 2003 because a Boeing representative did not, during a PowerPoint presentation to NASA,28 fully convey the risks associated with the shuttle’s reentry.
From school, to government, to work, speaking in abbreviated sound bites—bullet points—has become a natural way for us to communicate both simple and complex issues. At times, our speech, and not just our writing, becomes all too terse, as well. Over time, news about the Iraq war has been whittled down to small phrases: It is a civil war, it is a quagmire, America cannot be victorious, America needs to leave, America is winning, and America is fighting terrorism over there.
After repetitive use of what can be described as PowerPoint bullet points, some individuals discuss and debate the war by repeating bullet points and without exploring the topics further. Unexplored bullet points can become ingrained thoughts, beliefs, and accepted facts.
In a story about the Iraq war, General Tommy Franks (b. 1945) remarked that, “It’s quite frustrating the way this works, but the way we do things nowadays is combatant commanders brief their products in PowerPoint up in Washington to the Office of the Secretary of Defense (OSD) and the Secretary of Defense... In lieu of an order, or a fragmentary order, or plan, you get a set of PowerPoint slides...That is frustrating, because nobody wants to plan against PowerPoint slides.”29
Relying on PowerPoint slides rather than formal written orders has been thought by some military professionals to capture the amateurish approach to war planning. Franks has also been quoted as saying, “Here may be the clearest manifestation of OSD’s contempt for the accumulated wisdom of the military profession and of the assumption among forward thinkers that technology—above all information technology—has rendered obsolete the conventions traditionally governing the preparation and conduct of war.”
Furthermore, Colonel Andrew Bacevich (b. 1947) has quipped, “To imagine that PowerPoint slides can substitute for such means is really the height of recklessness.” As an analogy, imagine an automobile mechanic who uses a manufacturer’s glossy sales brochure to figure out how to repair an engine.
Consider the trend in business intelligence that leverages knowledge through a series of dashboards. One wonders whether the brevity and semantic stumbling blocks of PowerPoint will be repeated in the use of management dashboards.
To quote physicist Richard Feynman (1918–1988), “You can know the name of a bird in all the languages of the world, but when you’re finished, you’ll know absolutely nothing whatever about the bird... I learned very early the difference between knowing the name of something and knowing something.”
What does this mean? What does that mean? Trying to determine what information is important and what information possibly represents excessive noise30 exemplifies a culture facing information overload. “Everyday living is too fast, too busy, too complicated. More than at any time in history, it’s important to have good information on just about every aspect of life, and there’s more information available than ever. Too much, in fact. There’s simply no time for people to gather and absorb the information they need.”31
The quote about everyday living being too fast is not recent. That quote is attributed to Briton Hadden (1898–1929). Hadden’s remark about excessive information was made around 1922, a year before he started Time magazine with Henry Luce (1898–1967).
Semantically, we can pose a simple question: What is a book? On the surface, this benign question may seem like it has a simple answer. However, some people might limit their definition of a book to editions that are presented in hardcover or paperback. Others may argue that the story itself is the book. In turn, are abridged books real books? What about books found on other media such as audiobooks?
Perhaps we can define a book as something that has an assigned International Standard Book Number (ISBN). Consider the popular 2003 novel The Da Vinci Code written by Dan Brown (b. 1964). Brown’s book, like the majority of books sold commercially, was given an ISBN, but not just one. The following is a small sampling of the ISBNs assigned to Brown’s The Da Vinci Code:
- ISBN-10: 0375432302
- ISBN-10: 0385504209
- ISBN-10: 0385504217
- ISBN-10: 0385513224
- ISBN-10: 0385513755
- ISBN-10: 0552154016
- ISBN-10: 1400079179
- ISBN-13: 978-0375432309
- ISBN-13: 978-0385504201
- ISBN-13: 978-0385504218
- ISBN-13: 978-0385513227
- ISBN-13: 978-0385513753
- ISBN-13: 978-0552154017
- ISBN-13: 978-1400079179
What royalties are due to Mr. Brown? Answering this question can pose a difficult semantic question depending on how the question is asked, how and where the information is stored, who has access to what information, and what, if any, assumptions were implied in the way the question was initially phrased.
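Part of the difficulty is that the same edition can appear under two identifiers. The following Python sketch uses the published ISBN check-digit rules to show that the ISBN-13 values listed above are the ISBN-10 values re-expressed with a 978 prefix and a recomputed check digit. The function and variable names are assumptions chosen for illustration; a royalty system that stores one form in one data store and the other form elsewhere would need this kind of normalization before totals could be compared.

```python
def isbn10_is_valid(isbn10):
    """Check an ISBN-10: digits weighted 10 down to 1 must sum to a multiple of 11."""
    if len(isbn10) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn10):
        if char in "xX" and position == 9:
            value = 10  # 'X' stands for 10 and is allowed only as the check digit
        elif char.isdigit():
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def isbn10_to_isbn13(isbn10):
    """Prefix 978, drop the old check digit, and compute the ISBN-13 check digit."""
    core = "978" + isbn10[:9]
    # ISBN-13 weights alternate 1, 3, 1, 3, ... across the first twelve digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


# Values taken from the sampling above (hyphens removed from the ISBN-13 forms).
ISBN_10 = ["0375432302", "0385504209", "0385504217", "0385513224",
           "0385513755", "0552154016", "1400079179"]
ISBN_13 = {"9780375432309", "9780385504201", "9780385504218", "9780385513227",
           "9780385513753", "9780552154017", "9781400079179"}

print(all(isbn10_is_valid(i) for i in ISBN_10))        # True
print({isbn10_to_isbn13(i) for i in ISBN_10} == ISBN_13)  # True
```

Fifteen or sixteen listed identifiers therefore describe seven distinct editions of one title, and whether "the book" means one edition, all seven, or the work as a whole is exactly the semantic question the royalty calculation has to settle.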
If establishing meaning can be difficult even when the information present is correct, the latency involved in uncovering incorrect information can be outright painful. The town of Valparaiso, Indiana, lies about 40 miles southeast of Chicago, Illinois. The town has a population approaching 30,000, and prior to 2006 had a township budget of $21 million.
In 2004, Dennis Charnetzky (b. 1974) and Daelyn Charnetzky (b. 1974) started a series of repairs to their two-bedroom home located in Valparaiso. The Charnetzkys renovated their bathroom, refinished their hardwood floors, added a splash of paint, and put up some new wallpaper. These improvements caused the property value to shoot up nearly $400 million.32
Because the property value increased, the county’s computers automatically increased the Charnetzkys’ property tax liability. With an increase in the tax base, municipal budgets were raised accordingly. By the time the typo that was entered into the county computer system was discovered, the Valparaiso school district and government agencies faced a financial shortfall and were forced to cut budgets by $3.1 million.
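A plausibility check placed before the automatic recalculation is one way such a value might have been held for review. The Python sketch below is hypothetical throughout: the function name, the threshold of three times the previous value, and the previous assessment of $150,000 are invented for illustration and do not describe the county's actual system or the home's actual prior valuation.

```python
def reassessment_is_plausible(previous_value, new_value, max_ratio=3.0):
    """Return False when a new assessment exceeds the previous one by more than max_ratio.

    max_ratio is an arbitrary, hypothetical threshold; a real system would
    choose it from policy or historical data.
    """
    if previous_value <= 0:
        raise ValueError("previous_value must be positive")
    return new_value / previous_value <= max_ratio


# Hypothetical prior assessment; the new figure echoes the magnitude in the story.
print(reassessment_is_plausible(150_000, 165_000))      # True
print(reassessment_is_plausible(150_000, 400_000_000))  # False
```

A failed check would not prove the value wrong; it would only stop the figure from flowing into tax bills and budgets until a person confirmed it.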
Another type of semantic dilemma involves the ISO 4217 standard that is used to govern currency codes. Over time, the ISO standard has transitioned beyond its intended scope. Examples of an ISO currency code include USD for United States dollars, BRL for the Brazilian real, and DKK for the Danish krone.
The code XXX is also part of the standard but does not represent the currency of any nation. Code XTS is reserved for testing purposes, while XAU and XAG are codes for precious metals. CLF, USN, USS, and a number of other codes are for bonds or other fund types. At the semantic level, ISO 4217 covers more than currencies.
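A program that treats every ISO 4217 code as a national currency inherits this ambiguity. The following Python sketch tags only the codes mentioned above with the kind of thing each represents; the enum, the table, and the helper function are illustrative assumptions, not an official classification or a complete list.

```python
from enum import Enum


class CodeKind(Enum):
    NATIONAL_CURRENCY = "national currency"
    NO_CURRENCY = "no currency involved"
    TESTING = "reserved for testing"
    PRECIOUS_METAL = "precious metal"
    FUND = "bond or other fund type"


# Only the codes named in the text; the real standard contains many more.
ISO_4217_SAMPLE = {
    "USD": CodeKind.NATIONAL_CURRENCY,  # United States dollar
    "BRL": CodeKind.NATIONAL_CURRENCY,  # Brazilian real
    "DKK": CodeKind.NATIONAL_CURRENCY,  # Danish krone
    "XXX": CodeKind.NO_CURRENCY,
    "XTS": CodeKind.TESTING,
    "XAU": CodeKind.PRECIOUS_METAL,     # gold
    "XAG": CodeKind.PRECIOUS_METAL,     # silver
    "CLF": CodeKind.FUND,
    "USN": CodeKind.FUND,
    "USS": CodeKind.FUND,
}


def national_currencies(codes):
    """Keep only codes the sample table marks as national currencies.

    Raises KeyError for a code that is not in the sample table.
    """
    return [c for c in codes if ISO_4217_SAMPLE[c] is CodeKind.NATIONAL_CURRENCY]


print(national_currencies(["USD", "XXX", "XAU", "DKK", "USN"]))
# ['USD', 'DKK']
```

Making the kind explicit forces each consuming service to decide what it means by "currency" instead of assuming that every valid code is money issued by a nation.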
Political boundaries provide an interesting dynamic in semantics. For example, many companies do business in England. However, England is not a sovereign country. The sovereign country of which England is a part is the United Kingdom of Great Britain and Northern Ireland. However, England as well as Scotland, Wales, and Northern Ireland are regarded as countries when participating in sporting events such as soccer, rugby, and cricket. This is not the case in all sporting events. For example, for the Olympics, the United Kingdom enters a team under the moniker GB (Great Britain).
The latitude afforded the United Kingdom would be equivalent to Germany entering teams from East Germany, West Germany, and Bavaria, or the United States of America entering teams from the Union, Confederacy, and Texas.
By way of a final example regarding political boundaries, since 1997, Hong Kong has been part of China. When analyzing trade with China, however, one has to know whether or not Hong Kong is included. Hong Kong operates with a degree of autonomy, at least until 2047 (50 years after the official transfer back to China from the United Kingdom).
The viral effects of data are without bounds and can, under certain conditions, reach a pandemic state. For example, corporations can be affected by laws in other countries as well as the actions of other corporations, regardless of geographic distance. Terra Gruppen is an umbrella company located in Norway and consists of almost 80 local savings banks.
Four of the communities served by Terra Gruppen are located in the Arctic Circle: Rana, Hemnes, Hattfjelldal, and Narvik. Each community faced bankruptcy as a ripple effect of the subprime fiasco that started in the United States.33 One of Terra Gruppen’s companies knowingly violated a code of conduct put in place by Norwegian financial regulators, resulting in many lives being seriously and negatively impacted.