Term Goals Spring 2016 - Research Support Group
Master ST#104569
Hardware Acquisition, Deployment, and Lifecycle Management
Objective: timely deployment of desktop/laptop equipment to incoming graduate students
- S16 (INF/RSG) acquire and deploy equipment for incoming F16 students by 31 August, for equipment choices/locations identified by the grad office before 10 August - ST#106458
- order in July based on grad office estimates and historical data
Machine Rooms
Machine rooms for School and research equipment.
- Objective: redundant machine rooms to host CS on-line services
- S16 (INF): improved bandwidth between CS machine rooms to support replicated services - ST#
File storage
- Implement Distributed File Service - Lawrence/Lori w/Dave/Anthony (ST#98307)
Database services
- Review the use of PostgreSQL vs MySQL for ongoing development projects
- not a priority. Maybe decide which to favour for future development?
- look at other applications that are not on MySQL cluster and decide if they need to move
- understand Odyssey/PostgreSQL infrastructure - have Isaac provide overview to Ken/cscf-mgmt/Daniel
Machine Room Interconnect
- Overall: ST#88134
- Finish group consultations (send email to all groups) - Lawrence / RSG staff
- move all research and client networks behind the firewall
CSCF Special Projects
Exam Management System - Isaac
- have a designated backup
- software/development/database - Daniel
- operation/PoC - Nick
- have Nick be the primary Point of Contact for EMS work for ISG - ST#104199
- Isaac to work with other groups on campus to find operational Points of Contact
Grad TA Evaluations - Isaac
Inventory - Daniel
- Inventory web application deployed 2009 for CSCF - ST#76671 - Daniel
- Project to review desired features/changes and decide on priority, e.g.:
- Desirable feature-add: plan to integrate ONA (likely won't happen this term)
- Explore Machine Room mapping / inventory (RackTables / MachineRoomMap) - will need input from Dave, Dan and MFCF
- need concept of "reserved" space for future planning of space (rack, room, etc.)
- in progress by Evan/Devon - ST#95545
- See also #Nagios below
- SALT-stack integration
- new equipment types
- "parent" records
OAT - Isaac
- Overhaul data import to use new data source and provide admissions data - ST#92317
- still waiting on IST Enterprise Architecture to provide all of the required data
OGSAS - Isaac
- Overhaul to use new datasource and deal with requests from Associate Director Grad Studies (Urs) - ST#?
- blocked on waiting for the OAT data (see above)
Research Subscription System - Daniel
- by end of W16: have decision about the future of ST / job-tracking in CSCF - ST#103960
- met with MFCF, received RT requirements spreadsheet
- updated spreadsheet to V2
CSCF internal services
Machine Rooms
DC 3556 (Research Machine Room)
DC 3558 (CS Infrastructure Room)
- Report when server hits predefined temperature limit - Gordon
- ST#97155
- Note: confirm with PLG that they are OK with paying for it (confirmed - see ST#97155, original email from Peter asking about these sensors)
Research computing
Graduate student workstations
- Streamline the AD join process - Mike / Clayton
- ST#101333
- Milestone: revised script that works with current image for Winter 2016 post-install steps
- Future: (Winter 2016) Update for Spring/Fall 2016 image
- Mike has done some work on improving the AD script; this needs to be documented in the ST above, or the related ST noted there
- ensure that these scripts are in a system directory (not under ~ctucker or ~magore)
generic cluster ("paper")
- Rack hardware we have in DC 3558 A1 - Lawrence/Lori/Mike/CSCF Coop
- ST#99112
- Rack existing hardware into A1, start building cluster - Lori
- Report time spent on this project
- Rack A1 has been emptied and we're now adding hardware to it
- snowballs and squall can be taken - coordinate with Ronaldo
Ganglia Portal - Wishlist - ST#98196
- VM to be created
- build initial ganglia system - Lori
Research Storage capacity
- catalogue existing storage systems, capacity and current use "swing space" - Lori/RSG
- part of the 10Gb initiative (see above)
Visitor loaner equipment / Researcher-owned machines
- Workstation imaging: Streamline via Clonezilla - Click'N'Go - Mike
- Milestone: coordinate with Phil (and Dave and Lawrence) to ensure this process is available to CSI
- Spring 2016 bills by end of May - ST#105574
- need to determine how to handle faculty refund based on labour refund ($46k)
CSCF Retreat
- Planning lead - Lawrence - ST#103743
- follow-up session May/June 2016 - discuss "meetings"
Research Groups
AI - Mike
BIF - Mike
- rebuild m160 File storage from 40TB and 60TB xfs to 100TB zfs - W16 - Mike / Lori - ST#104138
Boutaba - Ronaldo
- update Documentation of CN cluster - ST#93800
- remove reference to old servers/setup and replace with the new
- decommission the insurance-replaced servers - Lawrence/Ronaldo
Brecht - Lori
- Migrate rocket to new server - Ronaldo / Lori
Cabernet - Mike/Lori
- Upgrade cluster OS to 14.04 (priority)
- Upgrade GPUs? Lori/Mike
- Goals for W16:
- have Justin decide whether he will get updated hardware
- upgrade the OS to 14.04
- Status:
- GPU and OS updated on Node 16
- query sent to Justin
- cabernet has a bad drive and a bad PSU - is it worth spending time/money? (Lori)
Daytona - Lori
DB/DSG - Gordon
Games Institute - Lori
HCI - Ronaldo
- migrate hci-web to a newer machine (formerly snap-host) - ST#95233
- purchase and install new equipment - ST#102594
HI - Gordon
Himrod - Lori
ISS4E (Keshav) - Ronaldo
- Documentation - Ronaldo
- update to reference new systems and remove old (including MachineNotes)
- re-organize file space / backup of NAS, repurpose the backup server (tsunami & flood)
NPSG - Gordon
PLG - Gordon
Ripple - Lori
SWAG - Mike
Watform - Ronaldo
- migrate repository to new server - ST#90710
- Documentation - update to remove reference to old servers - ST#78634
Ideas for RSG for Spring Term
- rebuild m160 File storage from 40TB and 60TB xfs to 100TB zfs - W16 - Mike / Lori - ST#104138
- Service Catalogue - Daniel / Lawrence - ST#104292
- CSCF Client survey - Lawrence/Omar/Dave
- From the retreat
- Create an index of our current ST items based on ST categories and keywords
- Develop / execute survey of faculty, staff, students about what services they would like
- document inventory testing
- analyse web logs for client searches/requests
- link index to documentation
- other possible service offerings
- direction of development tools / DBS
- Accounts management
- Isaac
- roll out Grad TA evaluations
- CrowdMark / Learn integration
- plans to be functional by the end of this term
- just needs to be pushed to CrowdMark for IMS-created exams, rather than created directly out of Learn
- CrowdMark just needs to read the data that Isaac is pushing out and then it can feed back into Learn
- Daniel
- Service Catalogue - Daniel / Lawrence - ST#104292 - investigative?
- it would be great to be ready by the end of Spring term to start implementing the catalogue in the Fall (or Winter) term
- important: this implies reaching common agreement on what our list of "services" is
- document inventory testing - well underway; committing to having it well documented for Spring could be a good goal
- what does "success" look like?
- an automated testing plan that tests each release, stays in sync with the manual testing plan, and documents how to add new tests
- discussion of development/DB direction for future - Isaac/Daniel/Ken
- ST -> RT - start developing a plan for what needs to be developed - how would we work with IST on co-development?
- migrating CS web site - coordination with IST and MFCF
- uwaterloo.ca/computer-science vs cs.uwaterloo.ca reverse-proxy (?)
- see ST# ?? for the costs outlined by Daniel of the implications of various options, and who runs what servers
1) cs.uwaterloo.ca/whatever -> cs.uwaterloo.ca/whatever (on UW Drupal server) - good
2) uwaterloo.ca/computer-science/whatever -> uwaterloo.ca/computer-science/whatever - bad
3) uwaterloo.ca/search -> [search result on cs.uwaterloo.ca] -> cs.uwaterloo.ca/whatever (on UW Drupal server) - good
- accounts management - better UI tools (CSCF/MFCF/IST?) - investigation/planning only for Spring
- thinking about AssetMan - implications for our future ...
Other ideas for Spring 2016 - not necessarily RSG
- Course Master's lab - move/refresh - ST#104443
- linux.cs - two servers @ 14.04
- migrate CS web site
- migrate to IST absence management - ST#104369
- create a 3-year plan for all of CSCF services
- Unified job description - managers