Data Model: World Cup 2018 by Viji Kumar

Chapter 1 - Explanatory Framework

The aim of this book is to explain, using a data model, how FIFA World Cup tournament data can be used to produce a set of searchable records documenting the outcomes of the games played. This is done by linking individual items of information (data) about an entity, e.g. the names of teams, with other items of information, e.g. the number of games played, goals scored etc. The data of interest are those that enable the result of each game to be deduced. Here the word database refers to any collection of linked data in any structured format, e.g. in rows and columns. To explain how databases may be constructed in accordance with the data model, I shall work through the creation and population of a database that can be used to document the FIFA World Cup 2018 tournament to be held in Russia. The intention is to use a method that is simple, rigorous and repeatable. The techniques employed and the models discussed are independent of any proprietary database management system. The models can also be extended.

The initial analysis should produce data models and schemas or, in the demotic, the metadata. The outcome of that analysis is the specification of a catalogue of entity types (types of entities that will be linked) about which data will have to be collected. These entity types are defined as distinct constructs with a separate existence, e.g. group games, final game, group goals etc. Entities can also be separate, definable and distinct constructs that have no material existence, e.g. competitions, tournaments, groups etc. In this model all abstract entities have links, either directly or indirectly, to material entities, e.g. all groups during a group phase of a tournament are associated with group games.

The first step, listing the different types of entities necessary to describe this model of a football tournament, is uncontroversial. The entity types required are competition, tournament, team, venue, group phase, single-leg (knock-out) phase, group, 3 types of games and 5 types of goals including penalty shoot-outs. Essentially, you count the number of goals scored in each game by each team and deduce the result. In any type of game, the two participating teams are labelled team one and team two. Round robin contests, e.g. groups, are decided by the accumulation of points, e.g. 3 points for a victory and a point for a draw. To report effectively on a group, a table summarising the results and ranking the teams by points total is required. This explanation will follow all the stages, from creating the data model representing the relationships between the different entity types listed above to providing sample scoring data to test the model.
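The counting and ranking described above can be illustrated with a minimal Python sketch. It is not part of the data model itself: the team names, the sample scores and the function names result and group_table are hypothetical, and the ranking uses points only, ignoring any tie-breakers a real tournament would apply.

```python
from collections import defaultdict

# Hypothetical sample data: (team one, goals for team one, team two, goals for team two).
GAMES = [
    ("A", 2, "B", 1),
    ("C", 0, "D", 0),
    ("A", 1, "C", 1),
    ("B", 3, "D", 0),
]


def result(goals_one, goals_two):
    """Deduce the result of a game from the goals counted for each team."""
    if goals_one > goals_two:
        return "team one"
    if goals_two > goals_one:
        return "team two"
    return "draw"


def group_table(games, win_points=3, draw_points=1):
    """Accumulate points per team and rank the teams by points total."""
    points = defaultdict(int)
    for team_one, goals_one, team_two, goals_two in games:
        # Make sure both teams appear in the table even with zero points.
        points[team_one] += 0
        points[team_two] += 0
        outcome = result(goals_one, goals_two)
        if outcome == "team one":
            points[team_one] += win_points
        elif outcome == "team two":
            points[team_two] += win_points
        else:
            points[team_one] += draw_points
            points[team_two] += draw_points
    # Assumption: rank on points only; equal totals are ordered by team name.
    return sorted(points.items(), key=lambda row: (-row[1], row[0]))


for team, total in group_table(GAMES):
    print(team, total)
```

With the sample data, team A finishes on 4 points (a win and a draw), B on 3, C on 2 and D on 1.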

The words of Carolus Linnaeus, the father of taxonomy, quoted below are an excellent starting point.

The first step in wisdom is to know the things themselves; this notion consists in having a true idea of the objects; objects are distinguished and known by classifying them methodically and giving them appropriate names. Therefore, classification and name-giving will be the foundation of our science.

Systema Naturae (1735), trans. M. S. J. Engel-Ledeboer and H. Engel (1964)

Image

A tournament, as defined here, starts with four or more teams and then whittles the teams down with each successive phase, e.g. the 2014 FIFA World Cup in Brazil started with a group phase (32 teams), followed by 3 single-leg knock-out phases (16, 8 and 4 teams), ending with the final and third-place games. Tournament prizes are decided by the final and, optionally, the third-place game and/or the plate final, collectively known here as apex games. (A plate tournament is a parallel tournament for teams eliminated at an early phase of a tournament.) Apex games differ from other types of games because a tournament can have only one final, one third-place decider and one plate final.

The relationship, or link, between any two entities is represented by a line starting at entity type 1 and ending with a nought and a crow’s foot at entity type 2. This type of link specifies that there may be none, one or many type 2 entities associated with a single type 1 entity in a specific role, and that each type 2 entity must be linked to one and only one type 1 entity in that role. An alternative depiction of the link (MS Access) is also provided below, additionally showing the mechanism that enforces the relationship.
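As an illustration only, the same kind of link can be sketched with Python's built-in sqlite3 module, using a foreign key as the enforcing mechanism. SQLite is used here merely because it ships with Python; the table and column names (group_phase, phase_group and so on) and the sample rows are hypothetical and are not taken from the book's model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE group_phase (              -- entity type 1
    phase_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE phase_group (              -- entity type 2
    group_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    -- NOT NULL + REFERENCES: each group belongs to one and only one phase.
    phase_id INTEGER NOT NULL REFERENCES group_phase(phase_id)
);
""")

conn.execute("INSERT INTO group_phase VALUES (1, 'Phase with groups')")
conn.execute("INSERT INTO group_phase VALUES (2, 'Phase without groups')")  # 'none' is allowed
conn.execute("INSERT INTO phase_group VALUES (1, 'Group A', 1)")
conn.execute("INSERT INTO phase_group VALUES (2, 'Group B', 1)")


def try_orphan_group(connection):
    """Try to add a group whose phase does not exist; return True if accepted."""
    try:
        connection.execute("INSERT INTO phase_group VALUES (3, 'Group X', 99)")
    except sqlite3.IntegrityError:
        return False
    return True


# Count the type 2 entities linked to each type 1 entity (none, one or many).
counts = conn.execute("""
    SELECT p.name, COUNT(g.group_id)
    FROM group_phase AS p
    LEFT JOIN phase_group AS g ON g.phase_id = p.phase_id
    GROUP BY p.phase_id
    ORDER BY p.phase_id
""").fetchall()

print(counts)
print(try_orphan_group(conn))
```

The nought corresponds to the phase with no groups, the crow's foot to the phase with several, and the rejected insert shows that a group cannot exist without exactly one parent phase.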

Image

Image

The tournament model used here has 19 entity types: the 15 listed above, plus 3 documenting the qualification paths to subsequent phases and 1 documenting the teams in a group. While this exercise covers a tournament, competitions may also be decided by league seasons and head-to-head encounters. Head-to-head competitions include Test series, e.g. the Ashes, and competitions where the two participants qualify by winning other competitions, e.g. the English Community Shield and the Spanish Super Cup. The model needs to be extensible in this way because there are teams that play in league competitions and in knock-out tournaments or head-to-head competitions during the same season. Some of the teams that play in the Football Association Challenge Cup (FA Cup) competition simultaneously play in the English Premier League competition. So, it may be necessary to collect data from different types of competitions to report comprehensively on a team's performance over any given period. Furthermore, a competition may have both a league season and a subsequent tournament to decide the prizes for the season. In recent years the Football League Championship has had a (play-off) tournament, in addition to the league season, to decide the 3 teams that are promoted to the Premier League. In England, the top tier of professional Rugby League also employs play-offs.

Image

The data model and the sample reports should enable a non-database specialist to understand what data are being collected and how those data will be managed. A portion of the data model, showing some of the relationships of the apex game entity type, is also specified as a schema using a markup language, XML. The XML schema can be safely ignored if the data model makes sense. It is provided in the same helpful spirit in which the Rosetta Stone provides Ptolemy V's decree in 3 different scripts (Ancient Egyptian hieroglyphs, Demotic and Ancient Greek). That thoughtful act subsequently helped scholars such as Thomas Young and Jean-François Champollion decipher Egyptian hieroglyphs (thank you Wikipedia for this and many other snippets in this document).

There is a large amount of historical data available to test models purporting to explain how competitions work. The raw data necessary to test the model can be found in existing databases or in documents such as newspapers and books. Constructing and populating databases in accordance with the model in this book should produce databases that can be queried to obtain the results of games and to deduce the qualifiers from each phase and the eventual winning team. The detailed model will define each type of entity by listing the attributes that are of interest, e.g. the name of a competition or team, the date, time and venue of a game etc.

The definitions of the organisational units, e.g. league seasons, tournaments, group phases, groups, single-leg phases, will be discussed, highlighting their common ancestry. The derivation of the different variants of the other types of entities, e.g. the different types of games, goals and qualification paths will also be explained. For every goal scored (excluding penalty shoot-out goals), the type of goal (open play, penalty or own goal) is noted in addition to the number of elapsed minutes (rounded up) from the start of the game. Each goal is treated as a separate entity to allow extensions to record the scorers if required. For the sake of simplicity, the venerable convention of recording all stoppage/injury time goals as being scored in the last minute of the relevant half is observed, enabling half time, full time and extra time scores to be generated separately. The disadvantage of observing this convention is that the duration of injury time and the accurate timings of injury time goals are not recorded. Other data not discussed here include the starting line-up for games, disciplinary events, goal scorers, missed penalties and substitutions but the model can be extended. That sums up the remit of this tournament model.
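The goal-recording convention described above can be sketched in a few lines of Python. This is a hedged illustration and not the book's schema: the Goal class, its field names, the score_at function and the sample goals are hypothetical, and it is assumed that each goal record states which team (one or two) is credited with it, so that an own goal is credited to the team that benefits.

```python
from dataclasses import dataclass

# Last minute of each period; stoppage time goals are recorded at these minutes.
HALF_TIME, FULL_TIME, EXTRA_TIME = 45, 90, 120


@dataclass(frozen=True)
class Goal:
    credited_to: int  # 1 or 2: the team credited with the goal (assumption)
    minute: int       # elapsed minutes from the start of the game, rounded up
    kind: str         # "open play", "penalty" or "own goal"


def score_at(goals, minute):
    """Return (team one goals, team two goals) up to and including a minute."""
    one = sum(1 for g in goals if g.credited_to == 1 and g.minute <= minute)
    two = sum(1 for g in goals if g.credited_to == 2 and g.minute <= minute)
    return one, two


# Hypothetical game that goes to extra time.
goals = [
    Goal(1, 23, "open play"),
    Goal(2, 45, "penalty"),     # first-half stoppage time, recorded as minute 45
    Goal(2, 67, "own goal"),    # credited to team two, the beneficiary
    Goal(1, 90, "open play"),   # second-half stoppage time, recorded as minute 90
    Goal(1, 112, "open play"),  # scored during extra time
]

print(score_at(goals, HALF_TIME))
print(score_at(goals, FULL_TIME))
print(score_at(goals, EXTRA_TIME))
```

Because stoppage time goals are pinned to the last minute of the half, a simple comparison against minutes 45, 90 and 120 is enough to produce the cumulative half time, full time and extra time scores. The cost, as noted above, is that the true timing of those goals is lost.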

Image

In keeping with the imperative expressed by Occam’s razor, a principle advocating parsimony, an attempt has been made to keep to a minimum the number of entity types required for the construction of the models.