Technical Operations Status Report

Project Operations from 2017-10-01 to 2017-10-23

Network Operations 178608 Disable Jenkins autodiscovery system Elaborated Done None
Network Operations 169643 codfw: rack frack refresh equipment In-Scope Done None
Network Operations 174397 Tracking task for network syslog messages In-Scope Done None
Network Operations 165584 Deploy pybal with BGP MED support (for primary/backup) in production In-Scope Done None
Network Operations 177332 vcp port down on fasw-c-codfw Screep Done None
Network Operations 170144 Evaluate NetBox as a Racktables replacement & IPAM In-Scope Open None
Network Operations 173698 Backfill librenms data in graphite with historical RRDs In-Scope Open None
Network Operations 83992 Juniper monitoring In-Scope Open None
Network Operations 167840 Merge AS14907 with AS43821 In-Scope Open None
Network Operations 98006 Anycast (Auth)DNS In-Scope Open None
Network Operations 150264 Icinga check for VRRP In-Scope Open None
Network Operations 173489 pmacct should be upgraded to 1.6.2 on Stretch In-Scope Open None
Network Operations 157435 Review ACLs for the Analytics VLAN In-Scope Open None
Network Operations 150256 Re-setup lvs1007-lvs1012, replace lvs1001-lvs1006 In-Scope Open None
Network Operations 120425 dumps.wikimedia.org seems to have poor throughput towards some destinations In-Scope Open None
Network Operations 167842 Find a new PIM RP IP In-Scope Open None
Network Operations 176175 connect second ethernet interface for fundraising codfw hosts In-Scope Open None
Network Operations 176337 esams: networking audit for support contract renewal In-Scope Open None
Network Operations 86541 setup wifi in codfw In-Scope Open None
Network Operations 176338 eqiad: networking audit for support contract renewal In-Scope Open None
Network Operations 133387 Enabling IGMP snooping on QFX switches breaks IPv6 (HTCP purges flood across codfw) In-Scope Open None
Network Operations 174637 Setup esams atlas anchor In-Scope Open None
Network Operations 167299 Upgrade BIOS/RBSU/etc on lvs1007 In-Scope Open None
Network Operations 172459 eqiad row D switch upgrade In-Scope Open None
Network Operations 174616 set up cr3-esams In-Scope Open None
Network Operations 167306 ospf link-protection In-Scope Open None
Network Operations 169644 eqiad: rack frack refresh equipment In-Scope Open None
Network Operations 171032 Investigate lvs IP pages during codfw row C switch upgrade In-Scope Open None
Network Operations 176975 connect second interface for each frack to opposite switch for each eqiad host In-Scope Open None
Network Operations 176427 unrack/decom pfw1-codfw and pfw2-codfw In-Scope Open None
Network Operations 163674 Frequent RST returned by appservers to LVS hosts In-Scope Open None
Network Operations 122406 Consider renumbering Labs to separate address spaces In-Scope Open None
Network Operations 166171 rack/setup/wire/deploy msw2-c1-eqiad In-Scope Open None
Network Operations 82038 create a test for multicast relay In-Scope Open None
Traffic 175803 Text eqiad varnish 503 spikes In-Scope Done None
Traffic 145661 varnish backends start returning 503s after ~6 days uptime In-Scope Done None
Traffic 133791 check_dns needs to be rewritten In-Scope Done None
Traffic 177233 Upgrade cache_misc to Varnish 5 Elaborated Done None
Traffic 177815 Alerts on LVS services with one single realserver Screep Done None
Traffic 176386 upload@ulsfo strange ethernet / power / switch issues, etc... In-Scope Done None
Traffic 166758 cp3032 ethernet link down (bnx2x dump in the dmesg) In-Scope Done None
Traffic 174932 Recurrent 'mailbox lag' critical alerts and 500s In-Scope Done None
Traffic 178149 RunCommandMonitoringProtocol throws an exception if runcommand.arguments is not specified Screep Done None
Traffic 171710 pybal: add prometheus metrics In-Scope Done None
Traffic 178078 RESTBase logs disappeared from logstash Screep Done None
Traffic 127387 Split slash decoding from general percent normalization in Varnish VCL In-Scope Open None
Traffic 144508 Point wikipedia.in to 205.147.101.160 instead of URL forward In-Scope Open None
Traffic 97051 adding new languages to DNS langs.tmpl doesn't work until zone template is edited as well In-Scope Open None
Traffic 156320 $wgServer with initial https:// does not force HTTPS (wgSecureLogin) In-Scope Open None
Traffic 167060 en.wiki domain owned by us, but isn't hosted by us?? In-Scope Open None
Traffic 129682 Look into solutions for replaying traffic to testing environment(s) In-Scope Open None
Traffic 102178 Fix RESTBase support for wikitech.wikimedia.org In-Scope Open None
Traffic 123854 Set up action API latency / error rate metrics & alerts In-Scope Open None
Traffic 162683 Network hardware purchasing for Asia Cache DC In-Scope Open None
Traffic 146832 Clarify caching to enable direct Wikidata Query Service access by <mapframe/link> In-Scope Open None
Traffic 166782 wikimediafoundation.org's language selector is confusing to most visitors who don't have accounts there In-Scope Open None
Traffic 94125 Central login notice appears on unencrypted API format=*fm pages, where reloading does not affect login status In-Scope Open None
Traffic 149873 CentralNotice: Review and update Varnish caching for Special:BannerLoader In-Scope Open 2.0
Traffic 120121 Improve Varnish XFF processing for trusted proxies In-Scope Open None
Traffic 128559 store.wikimedia.org HTTPS issues In-Scope Open None
Traffic 109325 Outbound HTTPS for varnish backend instances In-Scope Open None
Traffic 119038 Image cache issue when 'over-writing' an image on commons In-Scope Open None
Traffic 152622 Wikipedia.cz and other domains owned by WMCZ have invalid certificate In-Scope Open None
Traffic 167906 Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting In-Scope Open None
Traffic 82849 lvs servers report 'Memory allocation problem' on bootup In-Scope Open None
Traffic 127482 Enable VCL source-DC switching via confd In-Scope Open None
Traffic 150673 Thumb API: Varnish / CDN questions In-Scope Open None
Traffic 176875 Allow access to wdqs.svc.eqiad.wmnet on port 8888 In-Scope Open None
Traffic 144194 Varnish-triggered CN campaign about browser security In-Scope Open None
Traffic 164259 Add VSL error counters to Varnishkafka stats In-Scope Open None
Traffic 155314 Varnish does not cache Action API responses when logged in In-Scope Open None
Traffic 122867 Evaluate the feasibility of cache invalidation for the action API In-Scope Open None
Traffic 131894 Collect Backend-Timing in Graphite (or Prometheus) In-Scope Open None
Traffic 117826 TEST: redirect small portion of unauthenticated desktop users to mobile web In-Scope Open None
Traffic 159137 certspotter: Error retrieving STH from log In-Scope Open None
Traffic 163141 dbtree: make wasat a working backend and become active-active In-Scope Open None
Traffic 176366 Decom cp4005-8,13-16 (8 nodes) In-Scope Open None
Traffic 134404 Varnish support for active:active backend services In-Scope Open None
Traffic 78963 Support ESI for ResourceLoader In-Scope Open None
Traffic 111588 RFC: API-driven web front-end In-Scope Open None
Traffic 163251 Communicate dropping IE8-on-XP support (a security change) to affected editors and other community members In-Scope Open None
Traffic 171470 Monitor DNS delegations In-Scope Open None
Traffic 176905 Evaluate requesting a rate limit change from Letsencrypt In-Scope Open None
Traffic 137252 Redirect phabricator.mediawiki.org to phabricator.wikimedia.org In-Scope Open None
Traffic 171168 cp1050 apparently stuck while "Initializing firmware interfaces..." In-Scope Open None
Traffic 133410 Deploy TemplateStyles to WMF production In-Scope Open None
Traffic 74186 Varnish: Mobile site redirect interferes with OAuth authorization process In-Scope Open None
Traffic 154017 compile number of http uses for http://www.wikidata.org/entity In-Scope Open None
Traffic 118468 point wikilovesmonuments.org ns to wmf In-Scope Open None
Traffic 96844 Update TLS/HTTP documentation on wikitech In-Scope Open None
Traffic 133717 Letsencrypt all the prod things we can - planning In-Scope Open None
Traffic 165764 Fully-redundant LVS clusters using Pybal per-service MED feature In-Scope Open None
Traffic 54253 Protocol-relative URLs are poorly supported or unsupported by a number of HTTP clients In-Scope Open None
Traffic 109331 Deleted files sometimes remain visible to non-privileged users if permanently linked In-Scope Open None
Traffic 128358 Uploading 1.2GB ogv results in 503 In-Scope Open None
Traffic 152091 Block hotlinking In-Scope Open None
Traffic 144187 Better handling for one-hit-wonder objects In-Scope Open None
Traffic 177199 Add Prometheus client support for varnish/statsd metrics daemons Screep Open None
Traffic 164609 Merge cache_misc into cache_text functionally In-Scope Open None
Traffic 125170 Internal DNS resolver responds with NXDOMAIN for localhost AAAA In-Scope Open None
Traffic 178535 decommission lvs400[1-4].ulsfo.wmnet Screep Open None
Traffic 156256 Allocate address space for Singapore (APNIC) In-Scope Open None
Traffic 105657 Expires header for load.php should be relative to request time instead of cache time In-Scope Open None
Traffic 146332 Create short link for outreachdashboard.wmflabs.org In-Scope Open None
Traffic 164456 Build nginx without image filter support In-Scope Open None
Traffic 178173 Renew unified certificates 2017 Screep Open None
Traffic 153563 Consider switching to HTTPS for Wikidata query service links In-Scope Open None
Traffic 170518 Non zero rated LVS IPs In-Scope Open None
Traffic 172116 Improve OCSP fetching and monitoring strategies In-Scope Open None
Traffic 119396 Create globally-unique varnish cache cluster port/instancename mappings In-Scope Open None
Traffic 175203 Implement stateless TCP balancing in our LVS servers In-Scope Open None
Traffic 174891 cp4024 kernel errors In-Scope Open None
Traffic 174342 Missing IP addresses for Maroc Telecom In-Scope Open None
Traffic 159056 cp2017 froze and stopped serving traffic In-Scope Open None
Traffic 91820 Create HTTP verb and sticky cookie DC routing in VCL In-Scope Open None
Traffic 177228 Multiple systems in esams OE10 showing PSU failures Screep Open None
Traffic 66214 Define an official thumb API In-Scope Open None
Traffic 127573 wikiknihy.cz - transfer to Wikimedia Czech Republic? In-Scope Open None
Traffic 171966 setup/install cp402[34] In-Scope Open None
Traffic 101525 Set up LVS for current AuthDNS In-Scope Open None
Traffic 112765 Phabricator needs to expose notification daemon (websocket) In-Scope Open None
Traffic 155806 Add CAA records to our domains In-Scope Open None
Traffic 86915 nan and minnan subdomain redirects are a mess In-Scope Open None
Traffic 98165 Figure out an etcd deploy strategy that includes multi DC failure scenarios. In-Scope Open None
Traffic 161360 404 loading images from Virgin Media In-Scope Open None
Traffic 147202 Removing support for AES128-SHA TLS cipher In-Scope Open None
Traffic 171850 Backport ipvsadm In-Scope Open None
Traffic 120631 Security: Is it safe to enable Zero spoofing In-Scope Open None
Traffic 165560 Artificial spike in offset of unique devices from November to February 6th on wikidata In-Scope Open None
Traffic 178423 rack/setup/install cp40(29|3[012]).ulsfo.wmnet Elaborated Open None
Traffic 156032 Server hardware installation for Asia Cache DC In-Scope Open None
Traffic 137979 Support brotli compression In-Scope Open None
Traffic 164327 replace ulsfo aging servers In-Scope Open None
Traffic 89838 Move proxy IP lists to META for Varnish XFF decoding In-Scope Open None
Traffic 88861 wikipedia.lol In-Scope Open None
Traffic 128409 Detect tools.wmflabs.org tools which are HTTP-only In-Scope Open None
Traffic 127485 Enable VCL applayer datacenter-switch via confd In-Scope Open None
Traffic 170567 Support TLSv1.3 In-Scope Open None
Traffic 117435 Spike: CentralNotice: Verify that our Special:HideBanners cookie storm works as efficiently as possible In-Scope Open 2.0
Traffic 121561 Encrypt Kafka traffic, and restrict access via ACLs In-Scope Open 0.0
Traffic 104681 HTTPS Plans (tracking / high-level info) In-Scope Open None
Traffic 45250 Redo /beacon/impression system (formerly Special:RecordImpression) to remove extra round trips on all FR impressions (title was: S:RI should pyroperish) In-Scope Open None
Traffic 138093 Investigate query parameter normalization for MW/services In-Scope Open None
Traffic 178436 rack/setup/install lvs400[567].ulsfo.wmnet Elaborated Open None
Traffic 161517 Allow anonymous users to change interface language on Commons with ULS In-Scope Open None
Traffic 118181 Planning for phasing out non-Forward-Secret TLS ciphers In-Scope Open None
Traffic 107236 Switch port 80 to nginx on primary clusters In-Scope Open None
Traffic 178151 Add UDP monitor for pybal Screep Open None
Traffic 132629 Data passed to HHVM ($_SERVER variables) is a mixed bag of already-decoded and non-decoded nonsense In-Scope Open None
Traffic 78421 m.{project}.org portal/redirect consistency In-Scope Open None
Traffic 63782 Add varnish logs to logstash In-Scope Open None
Traffic 141480 mixed-content issues on planet.wikimedia.org In-Scope Open None
Traffic 172198 setup/install cp402[5-8].ulsfo.wmnet In-Scope Open None
Traffic 120486 add a https-only option to dynamicproxy In-Scope Open None
Traffic 152882 Many misc wikis lack mobile domains In-Scope Open None
Traffic 128374 Sort out analytics service dependency issues for cp* cache hosts In-Scope Open None
Traffic 154702 Fix broken referer categorization for visits from Safari browsers In-Scope Open None
Traffic 129839 restrict upload cache access for private wikis In-Scope Open None
Traffic 168699 Verify that the codfw lvs is configured correctly for Phabricator In-Scope Open None
Traffic 177961 Upgrade LVS servers to stretch Screep Open None
Traffic 162099 lvs2002 random shut down In-Scope Open None
Traffic 56783 Respect X-Forwarded-For only from trustworthy sources In-Scope Open None
Traffic 84543 more robust certificate chain creation in puppet In-Scope Open None
Traffic 131930 Set SPF (... -all) for toolserver.org In-Scope Open None
Traffic 102848 Split GeoIP into a new component In-Scope Open None
Traffic 99216 Please set up a CNAME for videoserver.wikimedia.org to Video Editing Server In-Scope Open None
Traffic 150022 thumb_handler.php should not set CC:no-cache on renderer 404 responses? In-Scope Open None
Traffic 119366 Disable caching on the main page for anonymous users In-Scope Open None
Traffic 99531 [Task] move wikiba.se webhosting to wikimedia misc-cluster In-Scope Open None
Traffic 159346 convert mail servers from GS to LE certificates In-Scope Open None
Traffic 159429 Allow setting varnish connection timeouts in puppet In-Scope Open None
Traffic 23027 Requests with utf-8 in the URL return an outdated page revision In-Scope Open None
Traffic 125938 PHP fatal errors causing Varnish to return 503 - "Junk after gzip data" In-Scope Open None
Traffic 109776 Tilerator should purge Varnish cache In-Scope Open None
Traffic 79730 Add pybal check to ensure service IP is bound In-Scope Open None
Traffic 143562 High number of failed inbound TFO connections in esams Mon-Fri In-Scope Open None
Traffic 134893 Unhandled pybal error causing services to be depooled in etcd but not in lvs In-Scope Open None
Traffic 178567 Server error (500) while trying to download files from Commons from PAWS Screep Open None
Traffic 106517 upload.wikimedia.org returns HTTP status code 503 for truncated urls, not 404 In-Scope Open None
Traffic 170605 ERR_RESPONSE_HEADERS_MULTIPLE_CONTENT_DISPOSITION In-Scope Open None
Traffic 133001 Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames In-Scope Open 0.0
Traffic 135762 A/B Testing solid framework In-Scope Open None
Traffic 133895 Varnish configuration for mobile domains should be coherent with Apache configuration In-Scope Open None
Traffic 96499 dbtree loads third party resources (from jquery.com and google.com) In-Scope Open None
Traffic 170847 Icinga check for pybal HTTP connections to etcd In-Scope Open None
Traffic 154801 Investigate varnishd child crashes when multiple nodes get depooled/pooled concurrently In-Scope Open None
Traffic 172103 IPVS issues with UDP services, pybal depooling strategy In-Scope Open None
Traffic 141266 letsencrypt puppetization: add parallel rsa+ecdsa cert support In-Scope Open None
Traffic 114104 pybal doesn't fully manage LVS table leaving stale services (on IP change) In-Scope Open None
Traffic 148422 cp3009: memory scrubbing error In-Scope Open None
Traffic 112316 Configure varnish to use "Unconfigured domain" page for 404 Not Served (instead of generic error) In-Scope Open None
Traffic 171498 Implement machine-local forwarding DNS caches In-Scope Open None
Traffic 147199 Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) In-Scope Open None
Traffic 108580 HTTPS for internal service traffic In-Scope Open None
Traffic 91372 $wgMFAnonymousEditing = true is sometimes not respected: cache? In-Scope Open None
Traffic 136737 Fix lvs1001-6 storage In-Scope Open None
Traffic 120509 Cache education dashboard pages In-Scope Open None
Traffic 104442 Investigate better DNS cache/lookup solutions In-Scope Open None
Traffic 83467 LVS testing needs to include internal services testing In-Scope Open None
Traffic 153468 Ferm/DNS library weirdness on deployment-mediawiki boxes In-Scope Open None
Traffic 162818 icinga alerts on nodejs services when a recdns server is depooled In-Scope Open None
Traffic 165765 Refactor pybal/LVS config for shared failover In-Scope Open None
Traffic 172148 Determine URL paths for Zim files In-Scope Open None
Traffic 172123 Determine how to upload Zim files to Swift infrastructure In-Scope Open None
Traffic 163541 cache hosts should auto-repool iff OCSP files are sane In-Scope Open None
Traffic 138546 Backend naming in VCL needs to use fqdn+port In-Scope Open None
Traffic 172124 PyBal Feature: progressive depooling strategy for monitored failures In-Scope Open None
Traffic 166965 Degraded RAID on lvs3001 In-Scope Open None
Traffic 133548 Create a secure redirect service for large count of non-canonical / junk domains In-Scope Open None
Traffic 124954 Decrease max object TTL in varnishes In-Scope Open None
Traffic 102367 Migrate tools.wmflabs.org to https only (and set HSTS) In-Scope Open None
Traffic 159411 Uniform cluster nomenclature across puppet In-Scope Open None
Traffic 177927 Refactor kafka_config.rb and kafka_cluster_name.rb in puppet to avoid explicit hiera calls Screep Open None
Traffic 144626 Strong cipher preference ordering for cache terminators In-Scope Open None
Traffic 134447 letsencrypt puppetization: upgrade for scalability In-Scope Open None
Traffic 81305 Make PyBal respect advertised BGP capabilities In-Scope Open None
Traffic 174640 Invalid "wikimedia" family in unique devices data due to misplaced WMF-Last-Access-Global cookie In-Scope Open 3.0
Traffic 158599 Samsung Internet's desktop mode getting redirected to mobile site In-Scope Open None
Traffic 133149 Move californium to an internal host? In-Scope Open None
Traffic 128188 Make CI run Varnish VCL tests In-Scope Open None
Traffic 36670 Check all wikis for inclusions of http resources on https In-Scope Open None
Traffic 148976 Strongswan Icinga check: do not report issues about depooled hosts In-Scope Open None
Traffic 175636 prometheus -> grafana stats for per-numa-node meminfo In-Scope Open None
Traffic 171967 setup/install cp4022 In-Scope Open None
Traffic 159412 Convert all of our site.pp/roles to the role/profile paradigm In-Scope Open None
Traffic 141373 Age header reset to 0 after 24 hours on varnish frontends In-Scope Open None
Traffic 148134 OCSP Stapling for Intermediates In-Scope Open None
Traffic 173966 Like nan.wikipedia.org, redirect other nan.*.org to the proper zh-min-nan.*.org domains In-Scope Open None
Traffic 130904 Host rewrite for /static/ not applied to purges In-Scope Open None
Traffic 119372 Pybal IdleConnectionMonitor with TCP KeepAlive shows random fails if more than 100 servers are involved. In-Scope Open None
Traffic 161256 multi-component wmflabs.org subdomains doesn't work under simple wildcard TLS cert In-Scope Open None
Traffic 150479 Prometheus varnish metric churn due to VCL reloads In-Scope Open None
Traffic 164768 Explicitly limit varnishd transient storage In-Scope Open None
Traffic 178592 decommission/replace bast4001.wikimedia.org Elaborated Open None
Traffic 164868 SSL error for https://wikispecies.org/ In-Scope Open None
Traffic 128182 Server certificate is classified as invalid on government computers In-Scope Open None
Traffic 137990 Zero: Investigate removing the limit on carrier tagging to m-dot and zero-dot requests In-Scope Open None
Traffic 146619 DNS domains registered to WMF no longer redirecting In-Scope Open None
Traffic 133821 Content purges are unreliable In-Scope Open None
Traffic 167513 Redirect lzh.wikipedia to zh-classical.wikipedia In-Scope Open None
Traffic 168529 Upgrade to Varnish 5 In-Scope Open None
Traffic 148131 Deploy redundant unified certs In-Scope Open None
Traffic 149847 Use content hash based image / thumb URLs In-Scope Open None
Traffic 178778 Parsoid, VisualEditor not working with SSL / HTTPS Screep Open None
Traffic 124418 Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan In-Scope Open None
Traffic 75944 Monitor Varnish caches on beta cluster have two varnishd process running In-Scope Open None
DBA 174764 db2044 HW RAID failure In-Scope Done None
DBA 163778 Decommission db1022 (Was: db1022 broke while changing topology on s6- evaluate if to fix or directly decommission) In-Scope Done None
DBA 166486 Decommission db1023 In-Scope Done None
DBA 177720 db2038 two disks with predictive failure Screep Done None
DBA 164702 Decommission db1024 In-Scope Done None
DBA 177627 check db1052 power supply redundancy Screep Done None
DBA 175970 Lost access to x1-analytics-slave In-Scope Done None
DBA 177628 check db1054 power supply redundancy Screep Done None
DBA 174054 Test reliability of RAID configuration/database hosts on single disk failure In-Scope Done None
DBA 177171 Degraded RAID on db1056 Screep Done None
DBA 177630 check db1080 power supply redundancy Screep Done None
DBA 177264 Degraded RAID on db1092 Screep Done None
DBA 160731 Decom db1048 (BBU Faulty - slave lagging) In-Scope Open None
DBA 164834 In some database hosts, performance schema loses digest statistics In-Scope Open None
DBA 157702 Followup for TLS MariaDB server roll-out In-Scope Open None
DBA 161754 eqiad: (2) hardware access request for labsdb1004 & 5 refresh In-Scope Open None
DBA 135851 Preserve InnoDB table auto_increment on restart In-Scope Open None
DBA 148955 Puppetize tendril web user creation In-Scope Open None
DBA 165677 Create a backend check for pybal to monitor the MySQL protocol being up In-Scope Open None
DBA 107610 Setup separate logical External Store for Flow in production In-Scope Open None
DBA 152427 Create a check/calendar alert for MariaDB TLS certs In-Scope Open None
DBA 177779 Generate instance list of database hosts to be monitored automatically from exported resources Elaborated Open None
DBA 173570 Decommission db1015 In-Scope Open None
DBA 176311 decommission db1036 In-Scope Open None
DBA 174076 Decommission db1033 and db1028 In-Scope Open None
DBA 54932 Drop *_old database tables from Wikimedia wikis In-Scope Open None
DBA 119626 Eliminate SPOF at the main database infrastructure In-Scope Open None
DBA 119154 Move echo tables from local wiki databases onto extension1 cluster for mediawikiwiki, metawiki, and officewiki In-Scope Open None
DBA 175672 Make apache/maintenance hosts TLS connections to mariadb work In-Scope Open None
DBA 176931 Decommission db1035 In-Scope Open None
DBA 141547 Setup automatic failover for misc database servers In-Scope Open None
DBA 176754 Regularly purge expired temporary userrights from DB tables In-Scope Open 3.0
DBA 109179 Migrate MySQLs to use ROW-based replication In-Scope Open None
DBA 161755 eqiad: (2) hardware access request for labsdb1006 & 7 refresh In-Scope Open None
DBA 151491 Icinga MariaDB disk space check on silver checks the wrong partition In-Scope Open None
DBA 174370 Create elections committee private wiki In-Scope Open None
DBA 127570 Rename be_x_oldwiki database to be_taraskwiki In-Scope Open None
DBA 176532 Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working In-Scope Open None
DBA 171071 Perform testing for TLS effect on connection rate In-Scope Open None
DBA 175685 Decommission db2010 and move m1 codfw to db2078 In-Scope Open None
DBA 112282 Multiple pages with no revisions In-Scope Open None
DBA 134809 Apache <=> mariadb SSL/TLS for cross-datacenter writes In-Scope Open None
DBA 162070 Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases In-Scope Open None
DBA 174806 Decommission db1045 In-Scope Open None
DBA 149643 Review Icinga alarms with disabled notifications In-Scope Open None
DBA 141968 Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master In-Scope Open None
DBA 169501 Move some masters away from B6 In-Scope Open None
DBA 134476 Decommission old coredb machines (<=db1050) In-Scope Open None
DBA 112473 Better mysql monitoring for number of connections and processlist strange patterns In-Scope Open None
DBA 176215 decommission db1018 In-Scope Open None
DBA 163339 pdu phase imbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw In-Scope Open None
DBA 157359 labsdb1006/1007 (postgresql) maintenance In-Scope Open None
DBA 50930 Database replication problems - production and labs (tracking) In-Scope Open None
DBA 145072 Create a script to regenerate prometheus mysqld exporter listing that works with puppetdb In-Scope Open None
DBA 178383 db1101 crashed - memory errors Screep Open None
DBA 126252 Populate the wikishared db on all dbstores In-Scope Open None
DBA 148078 Decommission db1015, db1035, db1044 and db1038 In-Scope Open None
DBA 166108 x1 master db1031: Faulty BBU In-Scope Open None
DBA 174763 Decommission db1026 In-Scope Open None
DBA 165674 Investigate slow servermon updating queries on db1016 In-Scope Open None
DBA 141255 Separate host lookup from the sql shell script In-Scope Open None
DBA 151999 Create script to monitor db dumps for backups are successful (and if not, old backups are not deleted) In-Scope Open None
DBA 133523 [RFC] improve parsercache replication and sharding handling In-Scope Open None
DBA 143896 MySQL metrics monitoring In-Scope Open None
DBA 100501 mysql user and group should be a system user/group In-Scope Open None
DBA 175264 Decommission db1049 In-Scope Open None
DBA 165625 Evaluate future of wmf puppet module "mysql" In-Scope Open None
DBA 156844 Prep to decommission old dbstore hosts (db1046, db1047) In-Scope Open None
DBA 178460 db1082 storage crashed Screep Open None
DBA 173915 Decommission db1041 In-Scope Open None
DBA 178128 Access to raw database tables on labsdb* for wmcs-admin users Screep Open None
DBA 162789 Create less overhead on bacula jobs when dumping production databases In-Scope Open None
DBA 174902 Decommission db1037 In-Scope Open None
DBA 162699 Decommission old s2 eqiad hosts (db1018, db1021, db1024, db1036) In-Scope Open None
DBA 172498 Switch databases to the future parser In-Scope Open None
DBA 104699 Firewall configurations for database hosts In-Scope Open None
DBA 175679 Decommission db1048 (was Move m3 slave to db1059) In-Scope Open None
DBA 153440 Create a full backup of all external storage records that would be easy to restore/setup a temporary delayed slave In-Scope Open None
DBA 83609 script & docs to rename wiki databases In-Scope Open None
Software Development 157133 Consider adding a --skip-conftool option to puppet-merge In-Scope Open None
Software Development 155705 confctl: log to SAL even if the selection doesn't match any host In-Scope Open None
Software Development 154776 Puppet compiler: order resources for easy comparison between hosts In-Scope Open None
Software Development 167504 New tool to track package updates/status for hosts and images (debmonitor) In-Scope Open None
Software Development 150560 More verbose messages from service-checker-swagger In-Scope Open None
Software Development 144169 Flake8 for python files without extension in puppet repo In-Scope Open None
Software Development 177385 Upgrade Cumin masters to stretch Screep Open None
Software Development 152950 E901 SyntaxError: invalid syntax is wrongly raised on using python's abc by jenkins python CI linter In-Scope Open None
Software Development 159045 Update Puppet repo code that uses maniphest.update and maniphest.createtask conduit api In-Scope Open None
Software Development 164587 cumin could use randomization/splay options In-Scope Open None
Software Development 157002 Puppet compiler: re-add the concurrency option NUM_THREADS In-Scope Open None
Software Development 157001 Puppet compiler: abort on git rebase conflict In-Scope Open None
Software Development 148494 Add shell scripts CI validations In-Scope Open None
Hardware Requests 171179 Decommission restbase-dev100[1-3] In-Scope Done None
Hardware Requests 171018 decom netmon1001 In-Scope Done None
Hardware Requests 175595 decommission wdqs100[12] In-Scope Open None
Hardware Requests 172323 Decommission WMF3248 (old R510) In-Scope Open None
Hardware Requests 166341 SSDs for main Kafka clusters In-Scope Open None
Hardware Requests 170441 Decommission mw1196 In-Scope Open None
Hardware Requests 167377 Decommission cp4011, cp4012, cp4019, cp4020 In-Scope Open None
Hardware Requests 168559 decom silver (was silver has trouble rebooting) In-Scope Open None
Hardware Requests 130883 decom cp3011-22 (12 machines) In-Scope Open None
Hardware Requests 160986 Decommission ms-fe100[1-4] In-Scope Open None
Hardware Requests 95742 Decommission amssq31-62 (32 hosts) In-Scope Open None
Hardware Requests 159996 decom fluorine In-Scope Open None
Hardware Requests 167376 Decommission cp300[3456] In-Scope Open None
Hardware Requests 166489 Decommission ms-be1001 - ms-be1012 In-Scope Open None
Hardware Requests 175089 decommission mw1163 In-Scope Open None
Hardware Requests 175093 Decommission osmium.eqiad.wmnet In-Scope Open None
Hardware Requests 178392 Replacement hardware for cumin masters Screep Open None
Hardware Requests 173097 Decommission stat1002.eqiad.wmnet In-Scope Open None
Hardware Requests 159480 Decommission bast3001 In-Scope Open None
Hardware Requests 169020 Decommission cp400[1-4] In-Scope Open None
Hardware Requests 172487 decom iridium In-Scope Open None
Hardware Requests 177958 Decommission ocg1001-3 Screep Open None
Hardware Requests 170157 decommission rcs100[12] In-Scope Open 3.0
Other Operations 123147 Wikipedia.com warns about bad certificate Unknown None
Other Operations 177962 Upgrade Jenkins to 2.73.2 (security release) Screep Done None
Other Operations 177833 rsyslog on mw1180 seems to not use the logstash LVS endpoint Screep Done None
Other Operations 167992 rack/setup/install new kafka nodes kafka-jumbo100[1-6] In-Scope Done None
Other Operations 177852 mintaka disk space warning Screep Done None
Other Operations 175980 Upgrade grafana to 4.5.2 In-Scope Done None
Other Operations 177227 Multiple servers in eqiad D8 showing PSU failures Screep Done None
Other Operations 176430 api feature logs should be sent to both eqiad and codfw clusters In-Scope Done None
Other Operations 84148 fix disk space check on dataset1001 Screep Done None
Other Operations 177631 check elastic1022 power supply redundancy Screep Done None
Other Operations 178085 Check cp4026 power supply redundancy Screep Done None
Other Operations 176671 Logic problem in puppet.git tests In-Scope Done None
Other Operations 177854 alnitak disk space warning Screep Done None
Other Operations 165348 Check long-running screen/tmux sessions In-Scope Done None
Other Operations 176507 relabel WMF3083 as frdb1003 In-Scope Done None
Other Operations 167820 rack/setup/install labweb100[12].wikimedia.org In-Scope Done None
Other Operations 174081 mail.wikimedia.org SSL cert expiring Mon 23 Oct 2017 In-Scope Done None
Other Operations 155869 Fix permissions for systemd file In-Scope Done None
Other Operations 177493 IRC operator request for Freenode #wikimedia-operations for @Dereckson Screep Done None
Other Operations 168044 jobrunner / jobchron systemd services are in error state after a stop In-Scope Done None
Other Operations 173427 Review check_puppetrun frequency In-Scope Done None
Other Operations 169658 Improve database backups' coverage, monitoring and data recovery time (part 1) (tracking) In-Scope Done None
Other Operations 168644 Upgrade jenkins to 2.73.1 (new lts release) Screep Done None
Other Operations 176841 Create wikibase/wikiba.se-deploy repo In-Scope Done None
Other Operations 177843 Set up octocatalog-diff on host with access to puppetmasters and puppetdb Screep Done None
Other Operations 169939 End of August milestone: Cassandra 3 cluster in production In-Scope Done None
Other Operations 176090 wikitech-static sync failing In-Scope Done None
Other Operations 177834 Wikimedia Cloud (labs) dns is intermittently failing Screep Done None
Other Operations 175689 Remove X-Content-Dimensions for multipage originals In-Scope Done None
Other Operations 178039 scap should not pull in HHVM on stretch hosts using PHP7 Screep Done None
Other Operations 177131 adjust flerovium power draw In-Scope Done None
Other Operations 173493 Tune Kafka logs to register clients connected In-Scope Done 8.0
Other Operations 177633 check kafka1022 power supply status Screep Done None
Other Operations 177963 Stretch installer "No kernel modules were found" error Screep Done None
Other Operations 167845 Migrate zuul-server behind systemd service In-Scope Done None
Other Operations 177889 Request public key change for a research fellow Screep Done None
Other Operations 175341 Review and fix PDU settings for syslog/ntp/email servers In-Scope Done None
Other Operations 177635 check mw1200 power supply redundancy Screep Done None
Other Operations 173311 Review check_raid_hpssacli frequency In-Scope Done None
Other Operations 171453 Integrate stretch 9.1 point release In-Scope Done None
Other Operations 177602 analytics-privatedata-users access for Jeff Green Screep Done None
Other Operations 109903 Add PDU redundancy server/router/switch checks in Icinga In-Scope Done None
Other Operations 176221 Upgrade OTRS to 5.0.23 In-Scope Done None
Other Operations 178406 Requesting access to analytics servers for cwdent Screep Done None
Other Operations 177637 check mw1203 power supply redundancy Screep Done None
Other Operations 175736 Give ores admins read access to /srv/log/ores/main.log* In-Scope Done None
Other Operations 178401 Add cobalt to grafana dashboard Elaborated Done None
Other Operations 175057 operations-puppet-tests-docker console output lacks color In-Scope Done None
Other Operations 177214 install2002 free disk space warning Screep Done None
Other Operations 176637 Diff page consistently produces 503 on beta cluster on first visit In-Scope Done None
Other Operations 177599 Request to be added to the ldap/wmde group Screep Done None
Other Operations 175296 Install Blubber on contint1001 In-Scope Done None
Other Operations 152791 Improvements to Ganglia-equivalent Prometheus dashboards In-Scope Done None
Other Operations 145659 Port application-specific metrics from ganglia to prometheus In-Scope Done None
Other Operations 177443 Missing .deb dependencies for appserver on Stretch Elaborated Done None
Other Operations 175381 Remove references to non-existent mfLazyLoadReferences cookies In-Scope Done None
Other Operations 164030 setup releases1001.eqiad.wmnet (was: setup mwreleases1001) In-Scope Done None
Other Operations 176506 rack/setup/install furud.codfw.wmnet In-Scope Done None
Other Operations 176505 rack/setup/install flerovium.eqiad.wmnet In-Scope Done None
Other Operations 94277 Convert snapshot hosts to use HHVM and trusty In-Scope Open None
Other Operations 163033 Create grafana dashboard for video scaler job runners In-Scope Open None
Other Operations 91404 Setup backups of elasticsearch indices In-Scope Open None
Other Operations 147923 Extract metrics from logs In-Scope Open None
Other Operations 174269 Two cases of local-multiwrite storage backend failure In-Scope Open None
Other Operations 163698 Add flood protection to the ircecho bot (icinga-wm) In-Scope Open None
Other Operations 118829 Automate the provisioning and management of MediaWiki clusters In-Scope Open None
Other Operations 165345 decommission indium In-Scope Open None
Other Operations 150672 Provide a /parsoid directory on releases.wikimedia.org In-Scope Open None
Other Operations 148017 lvs2002 repeated usb connect/disconnect message In-Scope Open None
Other Operations 177099 Large number of "A page you created was linked on Wikidata" emails to one recipient in short period of time In-Scope Open None
Other Operations 151048 Icinga monitoring for Yubikey components In-Scope Open None
Other Operations 142002 Clean up puppet & configs for ORES In-Scope Open None
Other Operations 117508 Make ops-l a list for humans again (no cheating) In-Scope Open None
Other Operations 105780 Create a doc explaining the SLA between services and the monitoring tool In-Scope Open None
Other Operations 125411 Diamond load averages do not contain scaled versions In-Scope Open None
Other Operations 148986 Firewall sets not being loaded post-reboot due to a @resolve race on jessie In-Scope Open None
Other Operations 170150 Evaluate Grafana's LDAP group options and deprecate grafana-admin if possible In-Scope Open None
Other Operations 116951 Reprepro should bail if it can't read and sign using the root keys In-Scope Open None
Other Operations 136312 Encrypt syslog traffic In-Scope Open None
Other Operations 164248 HTTP responses from app servers sometimes stall for >1s In-Scope Open None
Other Operations 133744 Epic: switch Maps to production status In-Scope Open None
Other Operations 129621 "internal_api_error_MWException: [dbf916b7] Exception Caught: Could not acquire lock for" for some uploads (during upload with Pywikibot OAuth) In-Scope Open None
Other Operations 177393 Implement authentication/authorization in Kubernetes clusters Screep Open None
Other Operations 176153 Create affcom-staff email account In-Scope Open None
Other Operations 156570 Investigate issues with wikitech-static.wikimedia.org In-Scope Open None
Other Operations 174916 electron/pdfrender hangs In-Scope Open None
Other Operations 111595 Do not apply spam headers on email assessed NOT to be spam In-Scope Open None
Other Operations 150466 publish kartotherian / tilerator metrics by cluster In-Scope Open None
Other Operations 123560 investigate rsync between dcs with encryption In-Scope Open None
Other Operations 171851 Reimage ores* hosts with Debian Stretch In-Scope Open None
Other Operations 152445 Move prometheus entry point off port 80 In-Scope Open None
Other Operations 119274 Check incoming requests to secure.wm.o In-Scope Open None
Other Operations 131326 smokeping config puppetization issue? In-Scope Open None
Other Operations 154026 On mobile, http://wikipedia.org/wiki/Foo redirects to https://www.m.wikipedia.org/wiki/Foo which does not exist In-Scope Open None
Other Operations 170817 Upgrade Thumbor servers to Stretch In-Scope Open None
Other Operations 129963 Update memcached package and configuration options In-Scope Open None
Other Operations 123276 URL parameters do not work with pages that have "?" in their names In-Scope Open None
Other Operations 168403 Aggregate prometheus functions yielding different results in grafana vs. prometheus console In-Scope Open None
Other Operations 177396 Design pod-level monitoring and service-level alerting Screep Open None
Other Operations 177394 Experiment with a TLS proxy/router for pods Screep Open None
Other Operations 126619 cassandra slow streaming during (de)commission In-Scope Open None
Other Operations 160941 Improve SSH access information in onboarding documentation In-Scope Open None
Other Operations 119719 Enforce a minimum refresh period for grafana dashboards hitting graphite In-Scope Open None
Other Operations 125015 Requests to (hard) redirect pages return their target's contents but are counted as pageviews to the redirect page In-Scope Open None
Other Operations 149589 Puppet tab in Horizon unusably slow In-Scope Open None
Other Operations 178457 nutcracker fails to start due to lack of /var/run/nutcracker (ex: deployment-videoscaler01 has memcached failures) Screep Open None
Other Operations 170740 PuppetDB misbehaving on 2017-07-15 In-Scope Open None
Other Operations 167091 Elasticsearch errors about BulkShardRequest In-Scope Open None
Other Operations 163288 Decide on /var/lib vs /home as locations of homedir for l10nupdate In-Scope Open None
Other Operations 94951 Enable the usage of `hhvm -m debug --debug-host ::1` from mw1017 so developers can step through code (think gdb) in production to see what is going wrong. In-Scope Open None
Other Operations 151310 create-dbusers service failing on labstore1004 In-Scope Open None
Other Operations 172479 Collect error logs from jobchron/jobrunner services in Logstash In-Scope Open None
Other Operations 138685 notebook1001 shown as DOWN in icinga, due to firewall rules In-Scope Open None
Other Operations 163667 Fix UIDs for deployment server users In-Scope Open None
Other Operations 172409 Copper root (/) 95% full In-Scope Open None
Other Operations 134458 status.wikimedia.org should use some Wikimedia favicon if possible In-Scope Open None
Other Operations 177395 Improve monitoring of the Kubernetes clusters Screep Open None
Other Operations 140442 reinstall rdb100[56] with RAID In-Scope Open None
Other Operations 159242 Segmentation fault creating thumbnail In-Scope Open None
Other Operations 150872 Replace OCG in collection extension with Electron In-Scope Open None
Other Operations 146627 Make deployment-prep puppetmaster more similar to Production puppetmaster In-Scope Open None
Other Operations 108985 Monitor MediaWiki sessions In-Scope Open None
Other Operations 165631 move gerrit.wm.org SSH service to private/behind LVS like phab-vcs In-Scope Open None
Other Operations 135991 Automated service restarts for common low-level system services In-Scope Open None
Other Operations 116750 2FA for SSH access to the production cluster In-Scope Open None
Other Operations 156924 Allow integration of data from etcd into the MediaWiki configuration In-Scope Open None
Other Operations 151317 stat user crontab on stat hosts for old file removal In-Scope Open None
Other Operations 157972 Puppet fails only once when restarting ferm is not successful In-Scope Open None
Other Operations 133164 Document eqiad/codfw transition plan for OCG In-Scope Open None
Other Operations 69015 m.wikipedia.org incorrectly redirects to en.m.wikipedia.org In-Scope Open None
Other Operations 116288 Install mailman-api for internal use In-Scope Open None
Other Operations 177521 Permissions to upload data to the analytics cluster from a machine at Drexel Screep Open None
Other Operations 161835 Convert labstore cluster configuration to hiera and profiles In-Scope Open None
Other Operations 149885 Investigate Swift as a storage backend for maps tiles In-Scope Open None
Other Operations 156955 Standardizing our partman recipes In-Scope Open None
Other Operations 141128 determine/process/document bios firmware tracking/updating policies In-Scope Open None
Other Operations 173056 Import Wiki Loves Monuments photos from Flickr to Commons In-Scope Open None
Other Operations 132856 Write documentation on how / when to use custom Diamond metrics collectors In-Scope Open None
Other Operations 160158 Make disabled accounts visible in the corp mirror LDAP replica In-Scope Open None
Other Operations 162857 Some Core availability Catchpoint tests might be more expensive than they need to be In-Scope Open None
Other Operations 151046 Fully puppetise yubikey-val In-Scope Open None
Other Operations 131966 Default gateway unreachable on baham.wikimedia.org after reboot In-Scope Open None
Other Operations 149543 Setup PAWS internal experimentally on notebook* nodes In-Scope Open None
Other Operations 174475 update firmware on scs consoles In-Scope Open None
Other Operations 134237 Graphoid returns a 400 on MW API time-out In-Scope Open None
Other Operations 55457 setup a DB backed parser cache In-Scope Open None
Other Operations 148693 Deploy IDS rendering engine to production In-Scope Open None
Other Operations 161004 Remove disabled users from internal mailing lists In-Scope Open None
Other Operations 95054 Move ircecho config file to be YAML In-Scope Open None
Other Operations 163393 Determine appropriate proxy_read_timeout setting for Tools Proxy In-Scope Open None
Other Operations 165618 Audit / document reasons for not enabling HT? In-Scope Open None
Other Operations 168619 Degraded RAID on lvs3001 In-Scope Open None
Other Operations 143931 Update ICU version to 55.1 In-Scope Open None
Other Operations 148843 GPU upgrade for stats machine In-Scope Open None
Other Operations 168767 Monitor PostgreSQL connection slots In-Scope Open None
Other Operations 140141 Install mscorefonts on scaling servers for SVG rendering In-Scope Open None
Other Operations 178690 Better organization for ops grafana dashboards Screep Open None
Other Operations 133091 Highest SSTables / read thresholds In-Scope Open None
Other Operations 165779 rack/setup/install labnet100[34] In-Scope Open None
Other Operations 133656 Have a paging check for Nova API accessible In-Scope Open None
Other Operations 158757 Puppet certificate missing subjectAltName In-Scope Open None
Other Operations 148061 Feasibility of hosting podcast setup on Wikimedia servers In-Scope Open None
Other Operations 172628 conf2002 etcdmirror-conftool-eqiad-wmnet died In-Scope Open None
Other Operations 150185 Deploy ElectronPdfService Extension to production In-Scope Open None
Other Operations 142984 Review lists of config/sysctl recommendations by "kernel self-protection project" In-Scope Open None
Other Operations 151047 Integrate Yubikey into data.yaml In-Scope Open None
Other Operations 141704 Storage backend errors on commons when deleting/restoring pages In-Scope Open None
Other Operations 158429 Switch to predictable network interface names? In-Scope Open None
Other Operations 127054 pinentry-gtk2 pulls in a lot of unneeded Gnome/GTK libs In-Scope Open None
Other Operations 169035 bast3002 sdb broken In-Scope Open None
Other Operations 148048 Store Wikimedia unified account name (SUL) in LDAP directory In-Scope Open None
Other Operations 126158 [RFC] Alert about *when* partitions will run out of space, not a percentage/absolute number In-Scope Open None
Other Operations 122917 Provide a good download service of dumps from Wikimedia In-Scope Open None
Other Operations 100777 expose hosts in maintenance state so we can prevent scap from running on them In-Scope Open None
Other Operations 153816 apache::static_site is not working In-Scope Open None
Other Operations 137176 catch-all apache vhost on the cluster should return 404 for non-existing sites In-Scope Open None
Other Operations 167292 Collate jessie-wikimedia/backports into jessie-wikimedia/main In-Scope Open None
Other Operations 152782 Kibana functionality missing after upgrade: histograms In-Scope Open None
Other Operations 170456 FY2017/18 Program 6 - Outcome 2 - Objective 3: Integrated, container-based development environment In-Scope Open None
Other Operations 170480 FY2017/18 Program 6 - Outcome 2: Developers are able to develop and test their applications through a unified pipeline towards production deployment. In-Scope Open None
Other Operations 85451 scale graphite deployment (tracking) In-Scope Open None
Other Operations 128590 Cassandra uses default ip address for outbound packets while bootstrapping In-Scope Open None
Other Operations 126989 MediaWiki logging & encryption In-Scope Open None
Other Operations 123106 PNG thumbnail preview of SVG misses some text In-Scope Open None
Other Operations 176445 Systematically test load speeds of Watchlist and Recent Changes In-Scope Open None
Other Operations 177397 Create scaffolding of services templates for deployment in production/staging Screep Open None
Other Operations 116747 Meta task "Revamp user authentication" In-Scope Open None
Other Operations 119718 Make it easier to ban misbehaving dashboards from graphite In-Scope Open None
Other Operations 120532 Use user-specific passwords for accessing EventLogging database In-Scope Open None
Other Operations 167225 Upload hhvm to stretch apt repo in apt.wikimedia.org In-Scope Open None
Other Operations 170108 Operations Q1 goal: Streamlined Service Delivery In-Scope Open None
Other Operations 167245 prometheus-node-exporter - invalid group: ‘prometheus:prometheus’ In-Scope Open None
Other Operations 163336 kube-proxy pulls in docker and starts service even when it isn't needed In-Scope Open None
Other Operations 159536 Puppet constantly trying to stop the already stopped puppetmaster process on Trusty In-Scope Open None
Other Operations 133179 Redis monitoring needs to be improved In-Scope Open None
Other Operations 167422 Monitoring: add link to graph for Icinga timeseries alarms In-Scope Open None
Other Operations 151486 Silver anomalies In-Scope Open None
Other Operations 118746 Goal: Strengthen Incident monitoring infrastructure In-Scope Open None
Other Operations 87790 decom amslvs1-4 (dc work) In-Scope Open None
Other Operations 89829 bond eth interfaces on ms1001 In-Scope Open None
Other Operations 154619 Export ipsec counters as Prometheus metrics In-Scope Open None
Other Operations 136562 Audit/fix hosts with no RAID configured In-Scope Open None
Other Operations 122825 Service Ownership and Maintenance In-Scope Open None
Other Operations 116580 monitor postgresql replication status In-Scope Open None
Other Operations 158562 Manage apt sources via puppet? In-Scope Open None
Other Operations 174431 Migration of mw* servers to stretch In-Scope Open None
Other Operations 160677 Effects on adjusting Prometheus retention In-Scope Open None
Other Operations 170453 FY2017/18 Program 6: Streamlined Service delivery In-Scope Open None
Other Operations 86552 Monitor and alarm on SMART attributes In-Scope Open None
Other Operations 133392 save grafana dashboards in revision control / puppet In-Scope Open None
Other Operations 168407 rack/setup/install labnodepool1002.eqiad.wmnet In-Scope Open None
Other Operations 89808 wikitech instances list is blank In-Scope Open None
Other Operations 176437 puppet ca_server confusion In-Scope Open None
Other Operations 158915 Make sure replying to emails in gerrit 2.14 works In-Scope Open None
Other Operations 177498 Provide a forward port of ICU 52 for stretch / Investigate best ICU update strategy Elaborated Open None
Other Operations 160529 Sender email spoofing In-Scope Open None
Other Operations 146664 Limit resources used by ORES In-Scope Open None
Other Operations 151009 Provide authenticated access to Prometheus native web interface In-Scope Open None
Other Operations 124101 Specific revisions of multiple files missing from Swift - 404 Not Found returned In-Scope Open None
Other Operations 143552 Make elasticsearch configuration more robust to loss of network connectivity In-Scope Open None
Other Operations 173721 Track down the source of periodic increases in requests to swift eqiad In-Scope Open None
Other Operations 167412 host-vmem.erb is doing operations that make no sense In-Scope Open None
Other Operations 146285 Switch mwscript from Zend PHP5 to default php alternative (e.g. HHVM or PHP7) In-Scope Open None
Other Operations 106937 Monitor [[Special:ListFiles]] for non 200 HTTP statuses in thumbnails In-Scope Open None
Other Operations 158434 Phabricator: Make sure phabricator works properly including our puppet roles on jessie In-Scope Open None
Other Operations 101585 document redis upgrade/restart procedures In-Scope Open None
Other Operations 175799 port elasticsearch diamond collector to prometheus In-Scope Open None
Other Operations 177197 Export Prometheus-compatible JVM metrics from JVMs in production Screep Open None
Other Operations 137217 Clean up apt:pin of python modules used for Nodepool In-Scope Open None
Other Operations 89887 Clean up permissions for privatedata files on stat1002 - they should be group readable by statistics-privatedata-users In-Scope Open None
Other Operations 150875 Confirm attribution needs In-Scope Open None
Other Operations 165885 Create a cron to clean clientbucket every day or hour In-Scope Open None
Other Operations 161899 End self-service new Trusty instance creation in Cloud VPS; standardize on Debian base images In-Scope Open None
Other Operations 152078 Load balancing "external" traffic to the Kubernetes cluster in production In-Scope Open None
Other Operations 114801 operations-apache-config-lint replacement doesn't check syntax In-Scope Open None
Other Operations 150822 Internal PKI for secure communication - Barcelona Ops offsite 2016 In-Scope Open None
Other Operations 135385 investigate carbon-c-relay stalls/drops towards graphite2002 In-Scope Open None
Other Operations 175206 2017/18 Annual Plan Program 8: Multi-datacenter support In-Scope Open None
Other Operations 148567 Restrict outgoing network connections from Electron render service In-Scope Open None
Other Operations 160060 Icinga check for sysctl settings In-Scope Open None
Other Operations 158196 Reimage labstore1001 and labstore1002 for DRBD storage setup In-Scope Open None
Other Operations 130617 Collect metrics on pool counter usage In-Scope Open None
Other Operations 161528 incident 20170323-wikibase did not trigger Icinga paging In-Scope Open None
Other Operations 141897 Review new service 'pre-deployment to production' checklist In-Scope Open None
Other Operations 122144 Move most (all?) exim personal aliases to OIT In-Scope Open None
Other Operations 147872 Rename rhodium to puppetmaster1003 In-Scope Open None
Other Operations 163673 Some swift disks wrongly mounted on 5 ms-be hosts In-Scope Open None
Other Operations 136311 Monitor the BMC's event log for hardware errors In-Scope Open None
Other Operations 142205 use granularity (g=) restrictions for wikimedia.org fundraising DKIM records In-Scope Open None
Other Operations 150912 Class 'Memcached' not found when running mwscript eval.php on debug servers In-Scope Open None
Other Operations 136403 Move cp3030+ from OE14 to OE13 in racktables In-Scope Open None
Other Operations 116063 Hardware Automation Workflow - Overall Tracking In-Scope Open None
Other Operations 132104 Consider moving policy.wikimedia.org away from WordPress.com In-Scope Open None
Other Operations 130593 investigate slapd memory leak In-Scope Open None
Other Operations 161918 videoscalers (mw1168, mw1169) - high load / overheating In-Scope Open None
Other Operations 154915 Get rid of "import realm.pp" in manifests/site.pp In-Scope Open None
Other Operations 116805 DomainKeys Identified Mail (DKIM) for phabricator.wikimedia.org In-Scope Open None
Other Operations 150871 [EPIC] (Proposal) Replicate core OCG features and sunset OCG service In-Scope Open None
Other Operations 159830 Sanity check global-multiwrite logs for ConfirmEdit usage In-Scope Open None
Other Operations 140879 503 error raises again while trying to load a Wikidata page In-Scope Open None
Other Operations 176666 Qualtrics email-LDAP issue In-Scope Open None
Other Operations 161598 Monitor HHVM bytecode cache depletion on mediawiki app servers In-Scope Open None
Other Operations 169884 Jobrunners generate mediawiki exceptions upon calling Closure$RecentChange::save In-Scope Open None
Other Operations 163402 Ensure we can survive a loss of labservices1001 In-Scope Open None
Other Operations 167966 Look into feasibility of disabling sha-1 host keys on our ssh daemons In-Scope Open None
Other Operations 150356 Wikidata Query Service is overly verbose toward logstash In-Scope Open None
Other Operations 177225 Uninstall ganglia from the fleet Screep Open None
Other Operations 95801 Allow customizing the alert message from graphite In-Scope Open None
Other Operations 168460 Update certificates on production replicas of corp.wikimedia.org LDAP In-Scope Open None
Other Operations 169287 etcd config depends on puppet certs, but puppet doesn't know In-Scope Open None
Other Operations 171758 Support git-lfs files in gerrit In-Scope Open None
Other Operations 178613 add missing asset tag and correct location in rack for cr1-eqdfw Screep Open None
Other Operations 172538 rack/setup/install labvirt10(19|20).eqiad.wmnet In-Scope Open None
Other Operations 113785 Make the Shinken IRC alert and icinga-wm bots use colors In-Scope Open None
Other Operations 175213 2017/18 Annual Plan Program 8: Multi-datacenter support, Q2 goals In-Scope Open None
Other Operations 151273 lvs4002 power supply failure In-Scope Open None
Other Operations 177371 Phase out DSA keys for SSH access (ssh-dss) Screep Open None
Other Operations 177826 Upgrade ci ssh key to ecdsa Screep Open None
Other Operations 169246 Stress/capacity test new ores* cluster In-Scope Open None
Other Operations 160229 Back up of Commons files In-Scope Open None
Other Operations 152767 Missing Labs hiera entry in labs-private repo In-Scope Open None
Other Operations 161145 Fix the general problem of randomly-bad puppet agent cron timings within redundant clusters In-Scope Open None
Other Operations 134811 Consider REST with SSL (HyperSwitch/Cassandra) for session storage In-Scope Open None
Other Operations 177821 Allow syslog-tls in analytics towards wezen/lithium Screep Open None
Other Operations 140594 svn.wikimedia.org redirects to Diffusion main page, hence hard to find e.g. "flexbisonparse" In-Scope Open None
Other Operations 149617 Integrating MediaWiki (and other services) with dynamic configuration In-Scope Open None
Other Operations 127797 document all puppet classes / defined types!? In-Scope Open None
Other Operations 67394 [EPIC] Performance testing environment In-Scope Open None
Other Operations 127825 Re-add intel-microcode In-Scope Open None
Other Operations 132256 Analytics hosts showed high temperature alarms In-Scope Open None
Other Operations 148614 Icinga check for Tor In-Scope Open None
Other Operations 126295 Spike: What do we have to package to run the Programs and Events dashboard on production? In-Scope Open None
Other Operations 175798 Port non-deprecated Diamond collectors to Prometheus In-Scope Open None
Other Operations 150020 Refactor puppet-postgresql module to use custom types In-Scope Open None
Other Operations 135128 Turn on etcd TLS for intra-cluster communications In-Scope Open None
Other Operations 164341 Decommission old memcached hosts - mc1001->mc1018 In-Scope Open None
Other Operations 161003 Cross-check disabled accounts from corp LDAP against data.yaml In-Scope Open None
Other Operations 169249 /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam In-Scope Open None
Other Operations 136735 create endowment.wm.org microsite In-Scope Open None
Other Operations 142815 Enhance account handling (meta bug) In-Scope Open None
Other Operations 162955 rebuild tools-grid-master as a large instance In-Scope Open None
Other Operations 157306 Fix config file handling for /etc/hhvm/php.ini In-Scope Open None
Other Operations 128716 Make icinga-wm report Tools homepage check at #wikimedia-labs, too In-Scope Open None
Other Operations 169286 labstore1005 A PCIe link training failure error on boot In-Scope Open None
Other Operations 175361 Upgrade mx1001/mx2001 to stretch In-Scope Open None
Other Operations 165781 rack/setup/install labcontrol100[34] In-Scope Open None
Other Operations 178570 [subtask] How should we get Chromium for use in puppeteer? Screep Open None
Other Operations 171923 thorium - failed git clone of geowiki-data-private In-Scope Open None
Other Operations 169318 Use multiple puppetdbs on puppet masters In-Scope Open None
Other Operations 142821 Synchronise groups defined in data.yaml to LDAP In-Scope Open None
Other Operations 170481 FY2017/18 Program 6 - Outcome 2 - Objective 2: Set up a continuous integration and deployment pipeline In-Scope Open None
Other Operations 127549 move travel related aliases to OIT In-Scope Open None
Other Operations 151314 logrotate failing with $FILE.1.gz: File exists In-Scope Open None
Other Operations 173374 Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." In-Scope Open None
Other Operations 158837 Consolidate performance website and related software In-Scope Open None
Other Operations 110240 [Discussion] Consider validating JSON schemas when running x-ample tests? In-Scope Open None
Other Operations 176774 Reimage cobalt as stretch In-Scope Open None
Other Operations 171157 Monitor internal CA expirations In-Scope Open None
Other Operations 119846 Redirect revisions from svn.wikimedia.org to https://phabricator.wikimedia.org/rSVN In-Scope Open None
Other Operations 162029 Migrate all jessie hosts to Linux 4.9 In-Scope Open None
Other Operations 123237 Provide production jessie image with node 4.2; use this for service-runner build command In-Scope Open None
Other Operations 150771 Secondary production Jenkins for CI In-Scope Open None
Other Operations 171452 Integrate jessie 8.9 point release In-Scope Open None
Other Operations 156136 Increase swift replication factor for accounts In-Scope Open None
Other Operations 86546 graphite-web logs are not rotated In-Scope Open None
Other Operations 135318 Document how to handle 'inconsistent state within the internal storage backends' issues In-Scope Open None
Other Operations 152073 Check concurrency/retry/timeout limits and synchronize those between services In-Scope Open None
Other Operations 124991 evaluate possibility for nscd use with useldap In-Scope Open None
Other Operations 120856 Remove all out of warranty unused cp10xx's from A2 In-Scope Open None
Other Operations 177055 Update docker image docker-registry.wikimedia.org/wikimedia-jessie In-Scope Open None
Other Operations 155761 DNS repo: add Jenkins job to ensure there are no duplicates In-Scope Open None
Other Operations 166937 Broken /a/refinery-source/guard/run_all_guards.sh script on stat1002 In-Scope Open None
Other Operations 177875 Degraded RAID on bast3002 Screep Open None
Other Operations 175210 Select candidate jobs for transferring to the new infrastructure In-Scope Open None
Other Operations 136094 Race condition in setting net.netfilter.nf_conntrack_tcp_timeout_time_wait In-Scope Open None
Other Operations 162245 Enable GC for HHVM CLI (at least for dump runners) In-Scope Open None
Other Operations 135124 Deploy etcddump (or another etcd dump & load tool) to production In-Scope Open None
Other Operations 177881 Degraded RAID on lvs3001 Screep Open None
Other Operations 150823 Puppet CA rollover In-Scope Open None
Other Operations 138136 MB Lateefi Fonts for Sindhi Wikipedia. In-Scope Open None
Other Operations 141520 "MediaWiki exceptions and fatals per minute" alarm is too slow (half an hour delay!) In-Scope Open None
Other Operations 133844 Improve Elasticsearch icinga alerting In-Scope Open None
Other Operations 175710 Add profiling for Varnish and VCL In-Scope Open None
Other Operations 129847 conftool-merge should report which node is setting attributes for In-Scope Open None
Other Operations 169518 Decommission esams ms-fe / ms-be In-Scope Open None
Other Operations 170995 Setup a mirror for R language dependencies (CRAN) In-Scope Open 10.0
Other Operations 148647 refresh swift hardware in codfw/eqiad In-Scope Open None
Other Operations 169290 New anti-stackclash (4.9.25-1~bpo8+3 ) kernel super bad for NFS In-Scope Open None
Other Operations 151702 API cluster failure / OOM In-Scope Open None
Other Operations 165784 rack/setup/install labmon1002 In-Scope Open None
Other Operations 137397 revisit swift (sys)logging In-Scope Open None
Other Operations 165511 Change automatic shortlink in blog theme In-Scope Open None
Other Operations 174172 unused grafana-dashboard indices on elasticsearch / logstash In-Scope Open None
Other Operations 84163 Fix CirrusSearch monitoring In-Scope Open None
Other Operations 175850 Spike: Enumerate remaining unported stats In-Scope Open None
Other Operations 119660 Set up LVS for labs dns recursors In-Scope Open None
Other Operations 163068 More missing 'original' files on Commons In-Scope Open None
Other Operations 84700 Setup management switch in OE12 In-Scope Open None
Other Operations 123918 'swift' user/group IDs should be consistent across the fleet In-Scope Open None
Other Operations 164819 reprepro: Support for buildinfo files / dbgsym packages In-Scope Open None
Other Operations 162013 etcd cluster in codfw has raft consensus issues In-Scope Open None
Other Operations 146657 create notifications about user accounts that have not been used for a long time In-Scope Open None
Other Operations 166233 Update redis puppet class to support stretch In-Scope Open None
Other Operations 145065 Decrease time required to fully restart the Cirrus elasticsearch clusters In-Scope Open None
Other Operations 118154 determine hardware needs for dumps in eqiad and codfw In-Scope Open None
Other Operations 177405 rack and setup db1107 and db1108 Elaborated Open None
Other Operations 162612 codfw/eqiad hosts occasionally spend > 3 minutes starting networking.service with linux 4.9 In-Scope Open None
Other Operations 164042 Racktables: clearly show when hosts are decommissioned In-Scope Open None
Other Operations 152632 Explore hosting the multimedia commons use case In-Scope Open None
Other Operations 177931 Decommission OCG from production Elaborated Open None
Other Operations 176364 Request access to logstash (nda group) for @framawiki In-Scope Open None
Other Operations 138799 Create a simple puppet role for setting up a singlenode kubernetes install In-Scope Open None
Other Operations 141524 eventbus should send statsd in batches In-Scope Open None
Other Operations 177196 Port non-deprecated Diamond collectors to Prometheus Screep Open None
Other Operations 76306 Set warning thresholds for average cluster utilization In-Scope Open None
Other Operations 117673 labs precise and jessie instance not accessible after provisioning In-Scope Open None
Other Operations 178454 Icinga disk space alert when a Docker container is running on an host Screep Open None
Other Operations 158288 Unclean stop of jobrunner service via puppet In-Scope Open None
Other Operations 98831 Honor DNT header for access logs & varnish logs In-Scope Open None
Other Operations 132324 Tracking and Reducing cron-spam from root@ In-Scope Open None
Other Operations 165170 rack/setup/install ores2001-2009 In-Scope Open None
Other Operations 148637 Port redis statistics to Prometheus In-Scope Open None
Other Operations 101141 udp rcvbuferrors and inerrors on graphite1001 In-Scope Open None
Other Operations 124413 confctl should provide tags information after writing data In-Scope Open None
Other Operations 95052 Make ircecho much better In-Scope Open None
Other Operations 151489 silver: /dev/md2 mounted twice In-Scope Open None
Other Operations 95053 ircecho should accept input via unix sockets In-Scope Open None
Other Operations 158022 make apt.wikimedia.org HA In-Scope Open None
Other Operations 159687 etcd switchover/enhancements In-Scope Open None
Other Operations 142991 Enable "upload by url" feature at zhwiki In-Scope Open None
Other Operations 138017 Improve automation around Maps servers In-Scope Open None
Other Operations 156143 High CPU usage from swift-proxy on frontend machines In-Scope Open None
Other Operations 138866 Update & standardize Platform-specific_documentation for HP servers In-Scope Open None
Other Operations 158583 Restructure our internal repositories further In-Scope Open None
Other Operations 151050 Proper documentation for Yubico 2FA for production use In-Scope Open None
Other Operations 177208 Provide dedicated database resources for wikidata Screep Open None
Other Operations 171188 Move the main WMCS puppetmaster into the Labs realm In-Scope Open None
Other Operations 177387 Decommission mw1161-69 Elaborated Open None
Other Operations 164290 Set up external DNS record for wikitech-static In-Scope Open None
Other Operations 175637 End of September milestone: Migrate first production use case In-Scope Open None
Other Operations 87220 Minimize differences between beta and production (Tracking) In-Scope Open None
Other Operations 166291 Exim panics when spamd reaches maxchildren In-Scope Open None
Other Operations 155401 Integrate jessie 8.7 point release In-Scope Open None
Other Operations 177623 check lvs4002 power supply redundancy Screep Open None
Other Operations 98984 Check power supply balance settings on cp3030+ In-Scope Open None
Other Operations 172815 Improve stability and maintainability of our browser-based PDF render service In-Scope Open None
Other Operations 171191 Should puppet auto-restart slapd? In-Scope Open None
Other Operations 131748 Refresh the appservers puppet code/configs In-Scope Open None
Other Operations 132216 Setting up bulk proxies pointing to a multiwiki mediawiki-vagrant setup running on a labs vm In-Scope Open None
Other Operations 111934 Nutcracker stats monitoring should only listen on localhost In-Scope Open None
Other Operations 151493 silver: / partition low on space In-Scope Open None
Other Operations 166066 Integrate the puppet compiler in the puppet CI pipeline In-Scope Open None
Other Operations 162122 Swiftrepl was stuck in an infinite loop for days In-Scope Open None
Other Operations 161834 Undo special tools-home and tools-project share definitions for NFS In-Scope Open None
Other Operations 165105 Wiley requests for DOI and some other publishers don't work in production In-Scope Open None
Other Operations 151045 Extending Yubico 2FA for production use (meta bug) In-Scope Open None
Other Operations 149421 Long running mediawiki web requests impact service availability, especially databases In-Scope Open None
Other Operations 177914 Switch labstore servers to default SSH configuration Screep Open None
Other Operations 171745 nscd does not cache localhost causing high CPU usage when localhost is often resolved In-Scope Open None
Other Operations 152724 Current state and next steps for RESTBase storage In-Scope Open None
Other Operations 156232 confctl SubjectAltNameWarning after python-urllib3 upgrade In-Scope Open None
Other Operations 146841 Reach out to Google about @yahoo.com emails not reaching gmail inboxes (when sent to mailing lists) In-Scope Open None
Other Operations 150486 Deploy federation for Prometheus In-Scope Open None
Other Operations 123818 setup YubiHSM and laptop at office In-Scope Open None
Other Operations 134271 Replace ircd-ratbox with something newer/maintained In-Scope Open None
Other Operations 178742 Possibly faulty BBU on analytics1029 Screep Open None
Other Operations 88997 Improve graphite failover In-Scope Open None
Other Operations 94215 decommission cp3001 & cp3002 In-Scope Open None
Other Operations 137616 Epic: cultivating the Maps garden In-Scope Open None
Other Operations 178628 Improve puppet alerting Screep Open None
Other Operations 170628 Jessie rsvg/cairo can't render specific SVG file on Commons In-Scope Open None
Other Operations 171482 Programmatic generation of grafana dashboards In-Scope Open None
Other Operations 167035 stretch acct monthly cron will spam when /var/log/wtmp.1 doesn't exist In-Scope Open None
Other Operations 165136 Ferm rules for labstore NFS hosts In-Scope Open None
Other Operations 142827 Enforce reference to Phabricator task for all commits to modules/admin/data/data.yaml In-Scope Open None
Other Operations 106346 setup an alertable threshold for Cassandra heap dumps In-Scope Open None
Other Operations 150396 Phabricator leaving old files in /tmp In-Scope Open None
Other Operations 97204 RFC: Request timeouts and retries In-Scope Open None
Other Operations 169570 nfs-manage failover script needs to be tested with real load and fixed In-Scope Open None
Other Operations 174720 letsencrypt::cert::integrated and non-http servers In-Scope Open None
Other Operations 128715 Add other Tools administrators to the Icinga notification group In-Scope Open None
Other Operations 176957 Deprecate host copper.eqiad.wmnet In-Scope Open None
Other Operations 166322 spam from phabricator in labs In-Scope Open None
Other Operations 177195 Reduce technical debt in metrics monitoring Screep Open None
Other Operations 152100 should we make privatewiki list available to puppet without maintaining two lists? In-Scope Open None
Other Operations 162123 Running swiftrepl is not puppetized In-Scope Open None
Other Operations 163823 During labservices1001 failover fqdn changed from foo.project.eqiad.wmflabs to foo.eqiad.wmflabs In-Scope Open None
Other Operations 146968 OTRS spam classification methods and systems In-Scope Open None
Other Operations 147040 Two recently uploaded files have disappeared (404) In-Scope Open None
Other Operations 136702 Increase time before alert for elasticsearch disk space issues In-Scope Open None
Other Operations 128615 Get rid of Tool Labs home page check from shinken In-Scope Open None
Other Operations 156398 Decommission or repair old asw-c2-eqiad In-Scope Open None
Other Operations 169548 Prepare for Puppet 4 In-Scope Open None
Other Operations 137791 libcglib3-java replaces libcglib-java in Jessie In-Scope Open None
Other Operations 175885 Toolforge's static webserver broken by Puppet changes and stale nginx packages In-Scope Open None
Other Operations 150532 Upgrade qemu on ganeti clusters to 2.7 In-Scope Open None
Other Operations 177622 Multiple systems in ulsfo 1.22 showing PSU failures Elaborated Open None
Other Operations 84845 improve cron spam visibility Screep Open None
Other Operations 115757 document debian packaging guidelines In-Scope Open None
Other Operations 132325 Weak digest algorithm (SHA1) used to sign InRelease on apt.wikimedia.org In-Scope Open None
Other Operations 161920 logrotate for ruthenium In-Scope Open None
Other Operations 151049 Run systematic availability tests In-Scope Open None
Other Operations 112774 solve mtp panel issue for row uplinks In-Scope Open None
Other Operations 178008 ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) Screep Open None
Other Operations 159661 Improve Terbium (and wasat) userland to process server side uploads In-Scope Open None
Other Operations 171619 ORES should use a git large file plugin for storing serialized binaries In-Scope Open None
Other Operations 141783 Add monitoring for detecting when logstash services are down In-Scope Open None
Other Operations 160071 Add slabinfo prometheus exporter In-Scope Open None
Other Operations 159354 Move coal from graphite machine(s) In-Scope Open None
Other Operations 162037 Use SSL certificates with discovery entry for elasticsearch In-Scope Open None
Other Operations 160644 Eventstreams graphite disk usage In-Scope Open None
Other Operations 115899 Move scap target configuration to etcd In-Scope Open None
Other Operations 124179 Improve access to and control over incident and metrics monitoring infrastructure In-Scope Open None
Other Operations 149057 Designate seems very slow to delete records? In-Scope Open None
Other Operations 107108 Flow notification links on mobile point to desktop In-Scope Open None
Other Operations 125442 es2009 degraded RAID In-Scope Open None
Other Operations 126574 puppet should try to mount all mountable swift filesystems In-Scope Open None
Other Operations 36947 Incorrect text positioning in SVG rasterization (scale/transform; font-size; kerning) In-Scope Open 0.0
Other Operations 173436 Delete graphite metrics for old CFs In-Scope Open None
Other Operations 122210 Security audit for tftp on Carbon In-Scope Open None
Other Operations 153416 docker-engine pulled into our repositories only keeps the latest version In-Scope Open None
Other Operations 163507 Intermittent DB connectivity problem on phabricator, needs investigation In-Scope Open None
Other Operations 163362 audit all codfw pdu tower draws In-Scope Open None
Other Operations 160412 Add lock_wait_timeout to maintain_views and maintain-meta_p In-Scope Open None
Other Operations 137939 Increase frequency of OSM replication In-Scope Open None
Other Operations 168445 Reboots of cloud servers In-Scope Open None
Other Operations 111540 Clean up labs graphite datapoints In-Scope Open None
Other Operations 144006 Move the MW Beta appservers to Debian In-Scope Open None
Other Operations 116742 Track amount of package updates on systems In-Scope Open None
Other Operations 160101 Upgrade php5-json .deb to at least 1.3.8 In-Scope Open None
Other Operations 109089 EPIC: Cultivating the Elasticsearch garden (operational lessons from 1.7.1 upgrade) In-Scope Open None
Other Operations 161864 404 error while accessing some images files e.g. djvu and jpg In-Scope Open None
Other Operations 166081 rack/setup/install conf1004-conf1006 In-Scope Open None
Other Operations 144539 Remove /srv/deployment/wdqs/wdqs/rules.log symlink In-Scope Open None
Other Operations 164993 archiva artifact links point to 127.0.0.1 In-Scope Open None
Other Operations 177624 check cp4007 power supply redundancy Screep Open None
Other Operations 170298 sshd stretch puppet support In-Scope Open None
Other Operations 153940 Logrotate fails for: "$FILE No such file or directory" In-Scope Open None
Other Operations 177501 Prometheus cluster attribute for new RESTBase Cassandra cluster Screep Open None
Other Operations 120159 Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module In-Scope Open None
Other Operations 133674 HHVM is leaking memory on the API appservers In-Scope Open None
Other Operations 138821 extend existing graphite whisper files retention to five years In-Scope Open None
Other Operations 92471 enable authenticated access to Cassandra JMX In-Scope Open None
Other Operations 102575 document graphite failover/backfill procedures In-Scope Open None
Other Operations 135338 On Trusty and Jessie PHP yields: PHP Deprecated: Comments starting with '#' are deprecated in /etc/php5/cli/conf.d/20-xhprof.ini on line 2 In-Scope Open None
Other Operations 178491 Make WDQS throttling more aggressive Screep Open None
Other Operations 164460 Use DNS discovery record for deployment CNAME In-Scope Open None
Other Operations 164238 move icinga contacts file to public repo In-Scope Open None
Other Operations 159750 E-mail for people in different OIT LDAP object unit In-Scope Open None
Other Operations 50029 crackling at start of OGG renditions of MIDI files (fixed in TiMidity++ 2.14.0) In-Scope Open None
Other Operations 110169 Monitor redis memory/disk usage In-Scope Open None
Other Operations 165519 rack and setup mw1307-1348 In-Scope Open None
Other Operations 170474 Decommission and store old row D network gear. In-Scope Open None
Other Operations 143556 Setting up grafana should also setup Anonymous read-only access for the default org In-Scope Open None
Other Operations 138496 bring swift eqiad to one zone per row In-Scope Open None
Other Operations 175362 Split MXes into inbound and outbound In-Scope Open None
Other Operations 112257 rename cassandra cluster In-Scope Open None
Other Operations 165323 Add Prometheus machine metric to track core dumps In-Scope Open None
Other Operations 169680 NFS on dataset1001 overloaded, high load on the hosts that mount it In-Scope Open None
Other Operations 118812 Investigate mysterious_sysctl settings and figure out what to do with them In-Scope Open None
Other Operations 169849 Architecture and puppetize setup for dumpsdata boxes In-Scope Open None
Other Operations 130512 Grafana: Job Queue Health: Panel is displayed incorrectly In-Scope Open None
Other Operations 151275 cp4008 and cp4012 running on single PSU In-Scope Open None
Other Operations 135113 Rationalize our jobqueues redis topology In-Scope Open None
Other Operations 104352 Make scap able to depool/repool servers via the conftool API In-Scope Open None
Other Operations 120585 Make l10nupdate user a system user In-Scope Open None
Other Operations 131832 Unable to restore file that has a very large file size In-Scope Open None
Other Operations 132632 puppetize turning off reserved space for cassandra /srv In-Scope Open None
Other Operations 129180 Preserve SSH host key when re-imaging hosts In-Scope Open None
Other Operations 161904 decommission backup4001 In-Scope Open None
Other Operations 153279 labnet/ labtestnet2001 - disk space - nova-api.log needs rotation In-Scope Open None
Other Operations 133913 Completely port l10nupdate to scap In-Scope Open None
Other Operations 137229 Tune thread for osm2pgsql / postgres max connections for Maps In-Scope Open None
Other Operations 133093 Investigate idle appservers in codfw In-Scope Open None
Other Operations 148478 Investigate seemingly random Gerrit slow-downs In-Scope Open None
Other Operations 140942 Tracking: Monitoring and alerts for "business" metrics In-Scope Open None
Other Operations 114446 move human users out of UID range for system accounts In-Scope Open None
Other Operations 150811 Evaluate ScyllaDB as a near-term replacement to Cassandra In-Scope Open None
Other Operations 104671 Rename 'restricted' group? In-Scope Open None
Other Operations 177639 check mw2176 power supply redundancy Screep Open None
Other Operations 139971 access_new_install role vs. Labs vs. the future In-Scope Open None
Other Operations 167689 Add RIPE atlas data to Prometheus In-Scope Open None
Other Operations 178538 Bump PHP requirement to 5.6 in 1.31 Screep Open None
Other Operations 177276 Unify production and CI docker image build process Screep Open None
Other Operations 141756 audit / test / upgrade hp smartarray P840 firmware In-Scope Open None
Other Operations 146355 Replace etcd internal auth mechanism with a frontend proxy In-Scope Open None
Other Operations 177254 Upgrade to puppet 4 (4.8 or newer) Screep Open None
Other Operations 40860 security@mediawiki.org : Create a public key and publish it on the public key servers In-Scope Open None
Other Operations 153068 Consider mounting labs NFS labstore1003.eqiad.wmnet:/scratch for server-side uploads In-Scope Open None
Other Operations 176335 logs sent to logstash are lost when the elasticsearch cirrus cluster is unavailable In-Scope Open None
Other Operations 169322 Research whether it makes sense to have OTRS installation in an HA setup In-Scope Open None
Other Operations 150651 Information missing from racktables In-Scope Open None
Other Operations 152439 cronspam from labtestservices2001 /etc/dns-floating-ip-updater.py > /dev/null In-Scope Open None
Other Operations 93531 secure.wikimedia.org entries still showing up in Google search results In-Scope Open None
Other Operations 133476 Proposal: Centralize OTRS login methodology In-Scope Open None
Other Operations 46016 SVG fails to render properly due to several issues In-Scope Open None
Other Operations 164703 Integrate jessie 8.8 point release In-Scope Open None
Other Operations 175625 scs-c1-eqiad unresponsive In-Scope Open None
Other Operations 32716 Run our own Tor client for Tor block In-Scope Open None
Other Operations 153703 Option: Consider switching back to leveled compaction (LCS) In-Scope Open None
Other Operations 169937 Services Q1 2017/18 goal: Begin migrating job queue processing to multi-DC enabled eventbus infrastructure. In-Scope Open None
Other Operations 130209 Collect threaddumps from elasticsearch at regular intervals In-Scope Open None
Other Operations 149287 Heating alerts for mw servers in eqiad In-Scope Open None
Other Operations 147366 Setup automated topk wide row reporting In-Scope Open None
Other Operations 93138 Procure hardware for Sentry In-Scope Open None
Other Operations 155929 Create /community-beacon alternative entry point In-Scope Open None
Other Operations 169969 Regularly purge old ores graphite metrics In-Scope Open None
Other Operations 109090 Investigate the need for master only (non data nodes) in our ES cluster In-Scope Open None
Other Operations 147204 Update confd package In-Scope Open None
Other Operations 162850 CPU throttling on DELL PowerEdge R320 In-Scope Open None
Other Operations 177742 Investigate Chrony as a replacement for ISC ntpd Screep Open None
Other Operations 174449 tin has a failing hdd In-Scope Open None
Other Operations 41785 Create a labs SMTP smarthost In-Scope Open None
Other Operations 67270 Default license for operations/puppet In-Scope Open None
Other Operations 109606 Re-evaluate Limesurvey In-Scope Open None
Other Operations 173710 Job queue is increasing non-stop In-Scope Open None
Other Operations 135125 Install a second etcd cluster in codfw In-Scope Open None
Other Operations 177739 Integrate stretch 9.2 point release Screep Open None
Other Operations 106664 Set up role accounts and feedback loops (FBL) with all providers In-Scope Open None
Other Operations 46791 [[wikitech:Server_admin_log]] should not rely on freenode irc for logmsgbot entries In-Scope Open None
Other Operations 178575 Add require_package() variant with repository component to wmflib Screep Open None
Other Operations 114849 Log lines on fluorine overflow at 8092 bytes. In-Scope Open None
Other Operations 17000 Special:Import error: "Import failed: Could not open import file" In-Scope Open None
Other Operations 121610 system users with UIDs > 500 In-Scope Open None
Other Operations 116767 limit the impact of heavy/large graphite queries In-Scope Open None
Other Operations 167549 Create Icinga alert when OSM replication lags on maps In-Scope Open None
Other Operations 119401 Untangle labs/production roles from labs/instance roles In-Scope Open None
Other Operations 177891 Update and use php-wikidiff2 to 1.5 in production Screep Open None
Other Operations 147905 investigate lead hardware issue In-Scope Open None
Other Operations 119679 Rewrite http://download.wikimedia.org/mediawiki/ -> https://releases.wikimedia.org/mediawiki in less than 3 redirects In-Scope Open None
Other Operations 134875 udpmxircecho spam/not working if unable to connect to irc server In-Scope Open None
Other Operations 121105 Mails from MediaWiki seem to get (partially) lost In-Scope Open None
Other Operations 149845 Something is wrong with installer root disk stuff In-Scope Open None
Other Operations 76203 Make ircecho run as its own user In-Scope Open None
Other Operations 154665 Look into behaviour of /etc/exim4/update-exim4.conf.conf related to updates In-Scope Open None
Other Operations 178625 puppetdb failures Screep Open None
Other Operations 163354 Find a way to verify mediawiki-config IPs ahead of datacenter switchovers In-Scope Open None
Other Operations 170152 mc2023 / mc2025 fail to mount root partition within 90 seconds using Linux 4.9 In-Scope Open None
Other Operations 140270 Determine a core set or a checklist of permissions for deployment purpose In-Scope Open None
Other Operations 118331 Alert when used_memory gets too high for redis queues In-Scope Open None
Other Operations 130590 Have dedicated master nodes for elasticsearch In-Scope Open None
Other Operations 159524 backup space is used unwisely In-Scope Open None
Other Operations 140536 Thumbnails of some specific images show unwanted black lines In-Scope Open None
Other Operations 175242 all log producers need to use the logstash LVS endpoint In-Scope Open None
Other Operations 150460 Configure maps cluster to send statsd metrics to the statsd endpoint in the same datacenter In-Scope Open None
Other Operations 125085 Split the API MediaWiki appserver pool into two external/internal pools In-Scope Open None
Other Operations 177253 Upgrade puppetDB to version 3.2 or newer Screep Open None
Other Operations 56713 Non-NDA users cannot access graphite.wikimedia.org In-Scope Open None
Other Operations 129188 mw2212 unresponsive In-Scope Open None
Other Operations 135122 Reduce etcd technical debt In-Scope Open None
Other Operations 175876 document all scs connections In-Scope Open None
Other Operations 97524 ocg alarm ocg_job_status_queue 'flapping' In-Scope Open None
Other Operations 166368 Wipe of spare/replacement disks In-Scope Open None
Other Operations 133318 High levels of PoolCounter errors should trigger alerts In-Scope Open None
Other Operations 157038 Make it possible to run the mediawiki testsuite against a staging repo of apt.wikimedia.org In-Scope Open None
Other Operations 156937 Provide cross-dc redundancy (active-active or active-passive) to all important misc services In-Scope Open None
Other Operations 167239 Redirect status.wikipedia.org to status.wikimedia.org In-Scope Open None
Other Operations 131928 Upgrade jessie systems from Linux 3.19 to 4.4 In-Scope Open None
Other Operations 161296 Upgrade mysqld_exporter to 0.10.0 In-Scope Open None
Other Operations 82350 update exim::listserve::private::mailing_lists value in puppet In-Scope Open None
Other Operations 144431 RESTBase k-r-v as Cassandra anti-pattern In-Scope Open None
Other Operations 168562 Reimage gerrit2001 as stretch In-Scope Open None
Other Operations 156475 Investigate spike in 500s during asw-c2-eqiad replacement In-Scope Open None
Other Operations 175150 Decommission stat1003.eqiad.wmnet In-Scope Open None
Other Operations 175830 decommission logstash100[1-3] In-Scope Open None
Other Operations 134326 udpmxircecho should write stats of messages processed and we should alert when that drops to zero In-Scope Open None
Other Operations 124185 Evaluate alternative web interfaces to icinga 1 core In-Scope Open None
Other Operations 144933 Cleanup debconf handling in mailman puppet setup In-Scope Open None
Other Operations 88730 Nutcracker needs to automatically recover from MC failure - rebalancing issues In-Scope Open None
Other Operations 136603 Update limit.sh to support systemd-based cgroup management In-Scope Open None
Other Operations 171048 Eventbus does not handle gracefully changes in DNS recursors In-Scope Open None
Other Operations 114337 Assign 3 more servers to video scaler duty In-Scope Open None
Other Operations 156544 Create backups of Wikimedia content in diverse geographic places In-Scope Open None
Other Operations 97909 Upgrade jobrunners to redis 2.8 In-Scope Open None
Other Operations 176370 Migrate to PHP 7 in WMF production In-Scope Open None
Other Operations 121240 Network isolation for production and semi-production services In-Scope Open None
Other Operations 103886 Translation cache exhaustion caused by changes to PHP code in file scope In-Scope Open None
Other Operations 113104 Set up a service IP for logstash In-Scope Open None
Other Operations 160146 jobrunner/jobchron services fail in codfw In-Scope Open None
Other Operations 133643 Upstream our Diamond PowerDNSRecursorCollector In-Scope Open None
Other Operations 123809 Module uwsgi doesn't allow passing multiple config params of same name In-Scope Open None
Other Operations 148968 Build Kubernetes for production use In-Scope Open None
Other Operations 155129 Create prometheus nutcracker exporter In-Scope Open None
Other Operations 164490 maintain-meta_p hangs on connecting to wikimedia.org.uk In-Scope Open None
Other Operations 168891 rack/setup/install labtestmetal2001.codfw.wmnet In-Scope Open None
Other Operations 126083 overhaul labstore setup [tracking] In-Scope Open None
Other Operations 130709 authoritative copy of 'root' files for upload.wikimedia.org is only in swift In-Scope Open None
Other Operations 163996 Icinga check for ipv6 host reachability In-Scope Open None
Other Operations 120377 labmon1001 graphite instance archiver keeps archiving the same instances In-Scope Open None
Other Operations 156140 Lots of hosts with hyperthreading disabled In-Scope Open None
Other Operations 120165 Implement role based hiera lookups for labs In-Scope Open None
Other Operations 113792 Change LDAP cn to something more useful (was Rename "Dzahn" to "Daniel Zahn" in Gerrit) In-Scope Open None
Other Operations 146090 High failure rate of account creation should trigger an alarm / page people In-Scope Open None
Other Operations 177374 decom wtp1001-wtp1024 Screep Open None
Other Operations 161096 confctl no longer logs a non-changing state change In-Scope Open None
Other Operations 175288 setup/install/deploy deploy1001 as deployment server In-Scope Open None
Other Operations 164123 tools-k8s-master-01 has two floating IPs In-Scope Open None
Other Operations 170365 move legal-tm-vio alias to OIT In-Scope Open None
Other Operations 150300 icinga notification if elevated writing to badpass.log In-Scope Open None
Other Operations 132532 rsync module doesn't work on trusty In-Scope Open None
Other Operations 151304 tmpreaper possible race condition In-Scope Open None
Other Operations 141038 implement icinga paging for non-ops teams In-Scope Open None
Other Operations 126281 [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") In-Scope Open None
Other Operations 94819 Audit racktables In-Scope Open None
Other Operations 140075 investigate swift used space spikes since June 2016 In-Scope Open None
Other Operations 171122 librenms: consider using Distributed Poller with multiple netmon servers In-Scope Open None
Other Operations 111838 Some files had disappeared from Commons after renaming In-Scope Open None
Other Operations 56515 Apply editing rate limits for all users In-Scope Open None
Other Operations 174465 Puppet admin module should support adding system users to managed groups In-Scope Open None
Other Operations 64987 librsvg misinterprets quoted font family names that contain whitespace In-Scope Open None
Other Operations 154627 Production error message (when servers are down) points users to donate link which is likely to produce the same error message In-Scope Open None
Other Operations 174959 swift-recon-cron on ms-be203[34]: [Errno 17] File exists: '/var/lock/swift-recon-object-cron' In-Scope Open None
Other Operations 116627 Include 5xx numbers in fluorine fatalmonitor In-Scope Open None
Other Operations 173492 Tune Varnishkafka delivery errors to be more sensitive In-Scope Open None
Other Operations 94329 secure Cassandra/RESTBase cluster In-Scope Open None
Other Operations 135595 mod_deflate + mod_uwsgi causing mangled apache responses In-Scope Open None
Other Operations 168967 Upload shiny-server .deb to our Jessie apt repository In-Scope Open None
Other Operations 153246 Puppet failures with "Attempt to assign to a reserved variable name: 'trusted'" In-Scope Open None
Other Operations 149804 Review of ferm services without srange In-Scope Open None
Other Operations 83729 Fix monitoring of poolcounter service In-Scope Open None
Other Operations 178325 Operations 2017-18 Q2 Program 6 umbrella task Screep Open None
Other Operations 169564 MD RAID: remove mdadm daily check In-Scope Open None
Other Operations 178445 flapping monitoring for recommendation_api on scb Screep Open None
Other Operations 141959 Moving network::external to hiera broke much of labs In-Scope Open None
Other Operations 170353 Icinga: timeseries checks should have the link to a graph with the data In-Scope Open None
Other Operations 140316 Add granularity limiter (g=) to wikimedia.org DKIM record(s) In-Scope Open None
Other Operations 155209 Increase $wgHTTPImportTimeout to a higher value on WMF wikis In-Scope Open None
Other Operations 168490 upgrade planet instances to stretch In-Scope Open None
Other Operations 177638 check mw2160 power supply redundancy Screep Open None
Other Operations 150917 Remove deprecated features from book creator UI In-Scope Open None
Other Operations 140813 Protect sensitive user-related information with a UserData / auth / session service In-Scope Open None
Other Operations 110171 Alert when ES indexes are frozen for more than 30 minutes In-Scope Open None
Other Operations 177625 check cp4008 power supply redundancy Screep Open None
Other Operations 162090 Investigate alternative RAID strategies for labstore1001/2 In-Scope Open None
Other Operations 166038 Sync internal nutcracker package with Debian package In-Scope Open None
Other Operations 134551 Create functional cluster checks for all services (and have them page!) In-Scope Open None
Other Operations 165173 rack/setup/install dumpsdata100[12] In-Scope Open None
Other Operations 170640 reports.frdev.wm.o -- still in use? In-Scope Open None