Term Goals Spring 2016 - Research Support Group

Master ST#104569

Hardware Acquisition, Deployment, and Lifecycle Management

Objective: timely deployment of desktop/laptop equipment to incoming graduate students

  • S16 (INF/RSG) acquire and deploy equipment for incoming F16 students by 31 August, for equipment choices/locations identified by the grad office before 10 August - ST#106458
    • order in July based on grad office estimates and historical data

Machine Rooms

Machine rooms for School and research equipment.
  • Objective: redundant machine rooms to host CS on-line services
    • S16 (INF): improved bandwidth between CS machine rooms to support replicated services - ST#

File storage

  • Implement Distributed File Service - Lawrence/Lori w/Dave/Anthony (ST#98307)

Database services

  • Review the use of PostgreSQL vs MySQL for ongoing development projects
    • not a priority. Maybe decide which to favour for future development?
  • look at other applications that are not on the MySQL cluster and decide if they need to move
  • understand Odyssey/PostgreSQL infrastructure - have Isaac provide overview to Ken/cscf-mgmt/Daniel

Networking

Machine Room Interconnect

  • 40 Gb networking

Firewall

  • Overall: 88134
    • Finish group consultations (send email to all groups) - Lawrence / RSG staff
      • move all research and client networks behind the firewall

CSCF Special Projects

Exam Management System - Isaac

  • have a designated backup
    • software/development/database - Daniel
    • operation/PoC - Nick
      • have Nick be the primary Point of Contact for EMS work for ISG - ST#104199
      • Isaac to work with other groups on campus to find operational Points of Contact

Grad TA Evaluations - Isaac

Inventory - Daniel

  • Inventory web application deployed 2009 for CSCF - ST#76671 - Daniel
    • Project to review desired features/changes and decide on priority, eg:
      • Desirable feature-add: plan to integrate ONA (likely won't happen this term)
      • Explore Machine Room mapping / inventory (RackTables / MachineRoomMap) - will need input from Dave, Dan and MFCF
        • need concept of "reserved" space for future planning of space (rack, room, etc.)
        • in progress by Evan/Devon - ST#95545
      • See also #Nagios below
      • SALT-stack integration
      • new equipment types
      • "parent" records

OAT - Isaac

  • Overhaul data import to use new data source and provide admissions data - ST#92317
    • still waiting on IST Enterprise Architecture to provide all of the required data

OGSAS - Isaac

  • Overhaul to use new data source and deal with requests from Associate Director Grad Studies (Urs) - ST#?
    • blocked waiting for the OAT data (see above)

Research Subscription System - Daniel

ST

  • by end of W16: have decision about the future of ST / job-tracking in CSCF - ST#103960
    • met with MFCF, received RT requirements spreadsheet
    • updated spreadsheet to V2

CSCF internal services

Machine Rooms

DC 3556 (Research Machine Room)

DC 3558 (CS Infrastructure Room)

Monitoring

  • Report when server hits predefined temperature limit - Gordon
    • ST#97155
    • Note: confirm with PLG that they are OK with paying for it (confirmed; see ST#97155, original email from Peter asking about these sensors)
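
The temperature-limit report above could be sketched roughly as follows. This is only an illustration under stated assumptions: the host names, the 30.0 C limit, and the `Reading`, `over_limit`, and `format_report` names are hypothetical, and nothing here reflects how the actual sensors are read or how reports are delivered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    host: str        # hypothetical server name
    celsius: float   # most recent sensor temperature


def over_limit(readings: List[Reading], limit_celsius: float) -> List[Reading]:
    """Return the readings at or above the predefined limit."""
    return [r for r in readings if r.celsius >= limit_celsius]


def format_report(readings: List[Reading], limit_celsius: float) -> List[str]:
    """Build one report line per server that has hit the limit."""
    return [
        f"{r.host}: {r.celsius:.1f} C (limit {limit_celsius:.1f} C)"
        for r in over_limit(readings, limit_celsius)
    ]


if __name__ == "__main__":
    # Made-up sample data; the limit is an assumed value, not a site policy.
    sample = [Reading("server-a", 24.5), Reading("server-b", 31.0)]
    for line in format_report(sample, limit_celsius=30.0):
        print(line)
```

Keeping the threshold check separate from the report formatting means the limit can be changed, or the delivery method swapped (email, ST item, monitoring system), without touching the comparison logic.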

Research computing

Graduate student workstations

  • Streamline the AD join process - Mike / Clayton
    • ST#101333
    • Milestone: revised script that works with current image for Winter 2016 post-install steps
      • Future: (Winter 2016) Update for Spring/Fall 2016 image
      • Mike says that he has done some work on improving the AD script. Need to document it in the above ST or note the related ST
      • ensure that these scripts are in a system directory (not under ~ctucker or ~magore)

HPC

generic cluster ("paper")

  • Rack hardware we have in DC 3558 A1 - Lawrence/Lori/Mike/CSCF Coop
    • ST#99112
    • Rack existing hardware into A1, start building cluster - Lori
    • Report time spent on this project
      • Rack A1 has been emptied and we're now adding hardware to it
      • snowballs and squall can be taken - coordinate with Ronaldo

Ganglia Portal - Wishlist - ST#98196

  • VM to be created
  • build initial ganglia system - Lori

Research Storage capacity

  • catalogue existing storage systems, capacity and current use "swing space" - Lori/RSG
    • part of the 10Gb initiative (see above)

Visitor loaner equipment / Researcher-owned machines

  • Workstation imaging: Streamline via Clonezilla - Click'N'Go - Mike
  • Milestone: coordinate with Phil (and Dave and Lawrence) to ensure this process is available to CSI

Administration

Billing

  • Spring 2016 bills by end of May - ST#105574
    • need to determine how to handle faculty refund based on labour refund ($46k)

CSCF Retreat

  • Planning lead - Lawrence - ST#103743
    • follow-up session May/June 2016 - discuss "meetings"

Research Groups

AI - Mike

BIF - Mike

  • rebuild m160 File storage from 40TB and 60TB xfs to 100TB zfs - W16 - Mike / Lori - ST#104138

Boutaba - Ronaldo

  • update Documentation of CN cluster - ST#93800
    • remove reference to old servers/setup and replace with the new
  • decommission the insurance-replaced servers - Lawrence/Ronaldo

Brecht - Lori

  • Migrate rocket to new server - Ronaldo / Lori
    • Lori - ST#101572

Cabernet - Mike/Lori

  • Upgrade cluster OS to 14.04 (priority)
  • Upgrade GPUs? Lori/Mike
  • Goals for W16:
    • have Justin decide whether he will get updated hardware
    • upgrade the OS to 14.04
  • Status:
    • GPU and OS updated on Node 16
    • query sent to Justin
    • cabernet has a bad drive and a bad PSU - is it worth spending time/money? (Lori)

Daytona - Lori

DB/DSG - Gordon

Games Institute - Lori

HCI - Ronaldo

  • migrate hci-web to a newer machine (formerly snap-host) - ST#95233
  • purchase and install new equipment - ST#102594

HI - Gordon

Himrod - Lori

ISS4E (Keshav) - Ronaldo

  • Documentation - Ronaldo
    • update to reference new systems and remove old (including MachineNotes)
  • re-organize file space / backup of NAS, repurpose the backup server (tsunami&flood)

NPSG - Gordon

PLG - Gordon

Ripple - Lori

SciCom - Mike

  • Upgrade elora.cs to 14.04 - Mike - ST#103776

SWAG - Mike

  • Install in DC 3558

Watform - Ronaldo

  • migrate repository to new server - ST#90710
  • Documentation - update to remove reference to old servers - ST#78634

ideas for RSG for Spring Term

  • rebuild m160 File storage from 40TB and 60TB xfs to 100TB zfs - W16 - Mike / Lori - ST#104138
  • Service Catalogue - Daniel / Lawrence - ST#104292
  • CSCF Client survey - Lawrence/Omar/Dave
  • From the retreat
    • Create an index of our current ST items based on ST categories and keywords
    • Develop / execute survey of faculty, staff, students about what services they would like
    • document inventory testing
    • analyse web logs for client searches/requests
    • link index to documentation
    • other possible service offerings
    • direction of development tools / DBS
    • Accounts management
  • Isaac
    • roll out Grad TA evaluations
    • CrowdMark / Learn integration
      • plans to be functional by the end of this term
      • just needs to be pushed to CrowdMark for IMS-created exams, rather than created directly out of Learn
      • CrowdMark just needs to read the data that Isaac is pushing out and then it can feed back into Learn
  • Daniel
    • Service Catalogue - Daniel / Lawrence - ST#104292 - investigative?
      • would be great to be ready at the end of Spring term to start implementing the catalogue for the Fall (or Winter) term
      • important - this implies getting a common agreement on what is our list of "services"
    • document inventory testing - well underway, commit to having it well documented for Spring could be a good thing
      • what does "success" look like?
      • automated testing plan that will test each release, and will be in-sync with the manual testing plan, and how to add new tests
    • discussion of development/DB direction for future - Isaac/Daniel/Ken
    • ST -> RT - start developing a plan for what needs to be developed - how would we work with IST on co-development?
    • migrating CS web site - coordination with IST and MFCF
      • uwaterloo.ca/computer-science vs cs.uwaterloo.ca reverse-proxy (?)
      • see ST# ?? for the costs outlined by Daniel of the implications of various options, and who runs what servers
        1) cs.uwaterloo.ca/whatever -> cs.uwaterloo.ca/whatever (on UW drupal server) - good
        2) uwaterloo.ca/computer-science/whatever -> uwaterloo.ca/computer-science/whatever - bad
        3) uwaterloo.ca/search -> [search result on cs.uwaterloo.ca] -> cs.uwaterloo.ca/whatever (on UW drupal server) - good
    • accounts management - better UI tools (CSCF/MFCF/IST?) - investigation/planning only for Spring
    • thinking about AssetMan - implications for our future ...

Other ideas for Spring 2016 - not necessarily RSG

  • Course Master's lab - move/refresh - ST#104443
  • linux.cs - two servers @ 14.04
  • migrate CS web site
  • migrate to IST absence management - ST#104369
  • create a 3-year plan for all of CSCF services
  • Unified job description - managers
Topic revision: r4 - 2016-08-15 - LawrenceFolland
 