Inventory Domain Name Extension Proposal (draft)
Introduction
IST is changing the schema of its IP management system (currently "Maintain"). Historically, campus hostname and SOA domain information has been stored as a "dotted hostname" within the "uwaterloo.ca" domain. For example, "myhost.cs.uwaterloo.ca" would be stored as "myhost.cs" in the domain "uwaterloo.ca".
Elsewhere in the world, and in the DNS RFCs, the dot is a separator between hostname, subdomain, domain, etc. Under that convention, "myhost.cs.uwaterloo.ca" would be stored as hostname "myhost", domain "cs.uwaterloo.ca".
IST is moving to the traditional DNS format for campus-wide tools. So the question arises whether or not we want or need our inventory schema to match the IST schema.
Functional requirement
Ultimately, we need our inventory system to be able to provide hostname and domain information for its host object records. In particular, we would like it to be able to produce information about the hostname and domain that correspond to what will be stored in the IST system.
- it is a stated goal to be able to update the IST database with changes in Inventory (i.e., to treat our inventory system as authoritative). The goal is not to replicate all Maintain functionality within Inventory; our users will still need to do some DNS operations in Maintain.
- our DHCP server (via our post-processing program) also requires hostname and domain information.
Two discussed alternatives
We have discussed the following basic alternatives for deriving hostname and domain for a host:
- store the information explicitly
- compute the information from existing inventory fields
Further characterizing the options
Inventory can usefully be seen as having subsystems:
- web UI
- database/programmatic UI
- output to external systems: dipaas, ST, Maintain, [any others?]
Each subsystem has an input and an output, and the subsystems are
logically chained: the web UI outputs to the database, and the database
outputs to the external systems.
Changes concerning hosts and domains present the following design options:
Subsystem will:
- ACCEPT status-quo format "host.domain"
- ACCEPT separate host and domain
- ACCEPT both
Subsystem will:
- OFFER status-quo format "host.domain"
- OFFER separate host and domain
- OFFER both
Each subsystem therefore has 3*3 = 9 design choices, giving
at least 9*9*9 = 729 combinations across the three subsystems.
It seems an obvious good idea to reduce
these choices to a smaller set. A reasonable design goal is a
solution that requires the lowest cost of production and development time,
minimal amount of confusion for users, and greatest number of useful features.
The proposals we have discussed so far correspond to the following choices:
Option A, "store the information explicitly":
- Web UI: ACCEPT both, OFFER separate
- DB: ACCEPT both, OFFER separate
- External: ACCEPT separate, OFFER both.
Option B, "compute the information from existing inventory fields":
- Web UI: ACCEPT status-quo, OFFER status-quo
- DB: ACCEPT status-quo, OFFER status-quo
- External: ACCEPT status-quo, OFFER both.
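To make the size of the design space concrete, the following minimal sketch enumerates the ACCEPT/OFFER pairings per subsystem and locates options A and B within them. The variable names are illustrative only and are not part of Inventory.

```python
from itertools import product

FORMATS = ("status-quo", "separate", "both")
SUBSYSTEMS = ("web UI", "database", "external")

# One subsystem: every (ACCEPT, OFFER) pairing.
per_subsystem = list(product(FORMATS, FORMATS))

# Whole system: one (ACCEPT, OFFER) pairing for each subsystem.
all_designs = list(product(per_subsystem, repeat=len(SUBSYSTEMS)))

print(len(per_subsystem))  # 9
print(len(all_designs))    # 729

# Options A and B as characterized above, in SUBSYSTEMS order.
option_a = (("both", "separate"), ("both", "separate"), ("separate", "both"))
option_b = (("status-quo", "status-quo"),
            ("status-quo", "status-quo"),
            ("status-quo", "both"))

print(option_a in all_designs, option_b in all_designs)  # True True
```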
Store the information explicitly (option A)
With this technique, the existing "hostname" field would be replaced with a pair of fields, name and domain. Name would be a text field, constrained to conform to the legal syntax of a DNS "label" (RFC1123). The domain field would be an enumeration of the set of known domains.
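As an illustration of the constraint on the name field, here is a minimal sketch of a label check. It assumes the usual host name rules (RFC 952 as relaxed by RFC 1123): 1 to 63 characters, letters, digits and hyphens only, with no leading or trailing hyphen. The function name is_valid_label is hypothetical.

```python
import re

# A single DNS host name label: letters, digits and hyphens, 1 to 63
# characters long, neither starting nor ending with a hyphen.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_label(name: str) -> bool:
    """Return True if name is a syntactically legal single label."""
    return LABEL_RE.fullmatch(name) is not None


print(is_valid_label("myhost"))     # True
print(is_valid_label("myhost.cs"))  # False: a dot separates labels
print(is_valid_label("-myhost"))    # False: leading hyphen
```

Note that a status-quo value such as "myhost.cs" fails this check, since it contains two labels.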
Additional changes to the inventory database schema might be required, in particular to the uniqueness constraint that presently exists for the "hostname" field -- the name will no longer be unique. Instead, the concatenation of the name and domain will need to be constrained uniquely. Defining a "FQDN" view field in the database as the concatenation of the two base fields might be a trivial way to implement this.
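The schema change could look roughly like the following sketch. It uses an in-memory SQLite database purely so that the example is self-contained; the production database is MySQL, and the table, column and view names here (host, name, domain, host_fqdn) are assumptions, as is the example domain "math.uwaterloo.ca".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE host (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        domain TEXT NOT NULL,
        UNIQUE (name, domain)          -- the pair is unique, not the name
    );
    CREATE VIEW host_fqdn AS
        SELECT id, name, domain, name || '.' || domain AS fqdn
        FROM host;
""")

conn.execute("INSERT INTO host (name, domain) VALUES ('myhost', 'cs.uwaterloo.ca')")
# The same name in a different domain is allowed.
conn.execute("INSERT INTO host (name, domain) VALUES ('myhost', 'math.uwaterloo.ca')")

# The same (name, domain) pair a second time is rejected.
try:
    conn.execute("INSERT INTO host (name, domain) VALUES ('myhost', 'cs.uwaterloo.ca')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

fqdns = [row[0] for row in conn.execute("SELECT fqdn FROM host_fqdn ORDER BY fqdn")]
print(duplicate_rejected)
print(fqdns)
```

In this sketch the uniqueness is enforced by a composite constraint on the two base fields, and the view only exposes the concatenation for display and searching. The sketch also leaves out the enumeration of known domains.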
This method would require changes to the inventory web application in the following areas:
- data entry of the new name field
- data entry of the domain -- likely a drop-down/selection mechanism would be appropriate
- the search functionality for names would need to be expanded. We would want to be able to continue the current practice of searching for "myhost.cs" (for example), which would require matching against the concatenation of the name and domain (e.g., using the proposed "FQDN" view field).
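The search requirement in the last item can be sketched as follows. This assumes that a search term should match whole leading labels of the concatenated name and domain; the current search semantics in Inventory may differ, and the function name matches_search is hypothetical.

```python
def matches_search(term: str, name: str, domain: str) -> bool:
    """Match a search term against the concatenation of name and domain.

    The term matches if it equals the FQDN or is a prefix of it that
    ends on a label boundary, so "myhost.cs" finds
    "myhost.cs.uwaterloo.ca" but "myhost.c" does not.
    """
    fqdn = f"{name}.{domain}".lower()
    term = term.strip().rstrip(".").lower()
    if not term:
        return False
    return fqdn == term or fqdn.startswith(term + ".")


print(matches_search("myhost.cs", "myhost", "cs.uwaterloo.ca"))  # True
print(matches_search("myhost.c", "myhost", "cs.uwaterloo.ca"))   # False
```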
Providing the domain for a host is trivial with this method: it is a selectable database field.
Explicitly storing the domain separately from the hostname would require a data migration for the existing data. It is proposed to do this by creating the domain field and populating it with "uwaterloo.ca" for all relevant records in the inventory database.
Compute the information (option B)
With this technique, the existing hostname field will be unchanged and no other schema changes are proposed. Instead, an API or similar wrapper within the inventory system will be created that knows how to synthesize the name and domain for a given hostname.
The basic heuristic for extracting the relevant information is straightforward:
- compose a FQDN by concatenating the hostname field with the constant string ".uwaterloo.ca"
- parse the resulting string, taking everything to the left of the first "." character as the name and everything following the first "." as the domain.
- verify the correctness of the result by making a DNS query to confirm that the purported domain name is in fact a legitimate SOA (i.e., that it has an SOA record).
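The three steps above can be sketched as a small function. The DNS check is passed in as a callable so that the example runs without network access; split_hostname, has_soa and the stand-in fake_has_soa are hypothetical names, and a real wrapper would replace the stand-in with an actual SOA query.

```python
from typing import Callable, Tuple

CAMPUS_SUFFIX = ".uwaterloo.ca"


def split_hostname(hostname: str,
                   has_soa: Callable[[str], bool]) -> Tuple[str, str]:
    """Derive (name, domain) from a status-quo dotted hostname."""
    # Step 1: compose the FQDN.
    fqdn = hostname + CAMPUS_SUFFIX
    # Step 2: split at the first dot.
    name, _, domain = fqdn.partition(".")
    # Step 3: confirm that the purported domain really is an SOA.
    if not has_soa(domain):
        raise ValueError(f"{domain!r} is not a known SOA domain")
    return name, domain


# Stand-in for a real DNS SOA query (assumption, for illustration only).
KNOWN_SOA = {"uwaterloo.ca", "cs.uwaterloo.ca"}


def fake_has_soa(domain: str) -> bool:
    return domain in KNOWN_SOA


print(split_hostname("myhost.cs", fake_has_soa))  # ('myhost', 'cs.uwaterloo.ca')
print(split_hostname("myhost", fake_has_soa))     # ('myhost', 'uwaterloo.ca')
```

A hostname whose derived domain fails the check raises an error rather than returning a guess, which leaves the handling policy to the caller.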
No data-entry changes are required in this method, and no changes to the search semantics would be required.
Discussions
Features
- A useful feature would be supporting domains that aren't in "uwaterloo.ca" such as iqc.ca, ingimp.org, inkpotinc.ca. This is more straightforward with separate host/domain names at the web UI layer and at the external layer.
Technical tradeoffs
- the database layer can trivially store one column "host.domain", two column "host" "domain", or both. Conversion is a simple operation; development costs for the database layer are minimal.
- space cost (DB size) is not important
- computation time is not important, but the DNS lookup could be significant for applications such as the Maintain exporter, if we bulk-update numerous domains at once.
- It is fairly clear we need to offer both formats to external programs, because our DHCP hosts will need to operate in the new IST DNS system.
- dipaas will require data in the separate format.
- Maintain currently accepts data in either format, though IST has a timeline for requiring DNS data in the separate format that option A produces (see ST#72405)
- Process of updating Maintain via inventory, under option B, will require the same wrapper as for dipaas.
User-community impact
- Changing the web UI requires user education; the UI can make reasonable guesses (e.g., for user-supplied hostnames including dots, in data-entry and in searches) and inform the user of the change. This does not seem challenging.
- Leaving the web UI unchanged does not require immediate user education. However, shortly there will be a difference between Inventory and Maintain requirements, requiring education about the Maintain change.
Data integrity considerations
- Currently, data is duplicated manually between Inventory and Maintain; neither is authoritative.
- The project to link Inventory/Maintain makes it easier to suppose that our inventory is authoritative, as the Maintain data is (mostly) a subset (excluding such aspects as sub-records, which will continue to be edited in Maintain).
- We will need to offer Maintain the separated data.
Development and code maintenance costs
UI change to accept both requires development costs, but these appear to be minor:
- _Estimated development cost: 3 days work_
- accepting separate host/domain would require adding a dropdown menu and adding validation code. It also needs an interface to update the list of known domains (which could be handled inexpensively via manual phpMyAdmin updates).
- supplying reasonable guesses for users including dots in hostnames is no more than a few if/thens with regular expressions;
- searches for separate host/domain would involve adding a regular expression and a term to the search SQL.
- the bulk tools to add and update hosts would need some recoding.
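The "reasonable guesses" item above might look like the following sketch. It uses plain string splitting against a list of known domains rather than regular expressions, and it assumes that an entry which is not already fully qualified is in the status-quo format relative to "uwaterloo.ca". The function name and the sample domain list are illustrative.

```python
from typing import Optional, Sequence, Tuple


def guess_name_and_domain(entry: str,
                          known_domains: Sequence[str],
                          default_domain: str = "uwaterloo.ca",
                          ) -> Optional[Tuple[str, str]]:
    """Guess (name, domain) from user input, or None if no guess fits."""
    entry = entry.strip().rstrip(".").lower()
    if not entry:
        return None
    # Try the entry as a full FQDN first, then as a status-quo dotted
    # hostname relative to the default domain.
    for candidate in (entry, f"{entry}.{default_domain}"):
        name, _, domain = candidate.partition(".")
        if domain in known_domains:
            return name, domain
    return None


KNOWN = ["uwaterloo.ca", "cs.uwaterloo.ca", "iqc.ca"]
print(guess_name_and_domain("myhost.cs", KNOWN))      # ('myhost', 'cs.uwaterloo.ca')
print(guess_name_and_domain("www.iqc.ca", KNOWN))     # ('www', 'iqc.ca')
print(guess_name_and_domain("myhost.nosuch", KNOWN))  # None
```

Returning None for an unrecognized domain lets the UI ask the user instead of storing a wrong guess.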
External output change to offer both:
- as long as the data is stored in the database, converting combined hostnames to host and domain names would be minor extra work; converting split to combined would be essentially no work.
- _Estimated development cost: 1 hour work_
- if the database remains in status-quo format, providing split output involves doing a dig lookup and various data-tests before providing data.
- _Estimated development cost: 1 day work_
Database changes:
- Any of the changes (separate, or both) involve development cost to update existing records.
- _Estimated development cost: 1/2 day work_
- A change to use MySQL stored procedures and/or triggers, which would solve the problem of verifying data integrity, is not available unless we switch to MySQL 5.0.
Conclusion
- we cannot change the semantics of the database's externally visible schema if there are clients with write-access that we don't know about (and that cannot be relied upon to do checking).
- we can add two new fields to the database, which are specified NOT NULL, so clients with write-access will generate errors rather than add incorrect data.
- Retaining the status quo or adding functionality is partially a policy decision
- Changes to allow accepting and outputting both formats are technically feasible with minor cost.
- The cost to produce verified domain-names for external programs is high if we do not store the domain name explicitly. (Real time verification during the query increases the runtime of the query).
- We can store verified domain names on web UI entry; the risk of incorrect data (inserted through other means) is low.
- The optimal solution seems to maximize function by making low-cost changes to the web UI and external interface.