TechnicalOperations Status Report

Project Operations from 2017-04-01 to 2017-07-01

Network Operations 129136 Deploy libreNMS with scap3 In-Scope Done None
Network Operations 164911 LibreNMS improvements Screep Done None
Network Operations 159464 asw-a1-codfw spontaneous reboot In-Scope Done None
Network Operations 162152 esams higher than usual temperature Screep Done None
Network Operations 167281 Some BGP sessions to the SF Office down Screep Done None
Network Operations 164594 codfw: ganeti2007-ganeti2008 switch port configuration Screep Done None
Network Operations 167326 codfw: labtestneutron2002 switch port configuration Screep Done None
Network Operations 167322 codfw: labtestnet2002 switch port configuration Screep Done None
Network Operations 158647 cr2-knams<->asw-esams GBLX fiber down In-Scope Done None
Network Operations 159757 netmon1002 networking setup In-Scope Done None
Network Operations 162944 db20[7-9][0-9] switch ports configuration Screep Done None
Network Operations 163326 switchover oresrdb.svc.eqiad.wmnet from oresrdb1001 to oresrdb1002 and back (after T148506) Screep Done None
Network Operations 163324 switchover icinga.wikimedia.org from einsteinium to tegmen Screep Done None
Network Operations 163323 Interface errors on asw-c-codfw:xe-7/0/46 Screep Done None
Network Operations 163312 lvs2001: intermittent packet loss from Icinga checks Screep Done None
Network Operations 163243 ripe-atlas-eqiad is down Screep Done None
Network Operations 163002 Spread eqiad analytics Kafka nodes to multiple racks and rows Screep Done None
Network Operations 164554 pfw-eqiad.wikimedia.org - 3 interfaces down (fundraising hosts) Screep Done None
Network Operations 167274 codfw row D switch upgrade Screep Done None
Network Operations 167261 Faulty link between cr2-codfw and cr1-eqdfw Screep Done None
Network Operations 154507 Slight packet loss observed on the network starting Nov 2016 In-Scope Done None
Network Operations 166564 codfw: labtestvirt2002 switch port configuration Screep Done None
Network Operations 80273 nagios monitor transit/peering links and alert on low/high traffic In-Scope Done None
Network Operations 166156 ores200[1-9] switch port configuration Screep Done None
Network Operations 162601 knams equipment move Screep Done None
Network Operations 165614 LLDP on cache hosts Screep Done None
Network Operations 167288 Rancid improvements Screep Done None
Network Operations 165288 Report of esams unreachable from Fastweb/Init7 Screep Done None
Network Operations 164495 Replace old Prometheus VMs addresses with baremetal in firewall configuration Screep Done None
Network Operations 168667 Temperature increase in esams Screep Done None
Network Operations 165008 Interface errors on asw-c-eqiad:xe-8/0/38 Screep Done None
Network Operations 164988 codfw: kubernetes200[1-4] switch port configuration Screep Done None
Network Operations 162239 cr2-esams FPC 0 is dead Screep Done None
Network Operations 168462 codfw row A switch upgrade Screep Done None
Network Operations 164970 mr1-ulsfo crashed Screep Done None
Network Operations 133852 analytics hosts frequently tripping 'port utilization threshold' librenms alerts In-Scope Done None
Network Operations 82038 create a test for multicast relay In-Scope Open None
Network Operations 120425 dumps.wikimedia.org seems to have poor throughput towards some destinations In-Scope Open None
Network Operations 169345 codfw row B switch upgrade Screep Open None
Network Operations 86541 setup wifi in codfw In-Scope Open None
Network Operations 167321 codfw: labtestpuppetmaster2001 switch port configuration Screep Open None
Network Operations 167299 Upgrade BIOS/RBSU/etc on lvs1007 Screep Open None
Network Operations 166171 rack/setup/wire/deploy msw2-c1-eqiad Screep Open None
Network Operations 133387 Enabling IGMP snooping on QFX switches breaks IPv6 (HTCP purges flood across codfw) In-Scope Open None
Network Operations 146391 eeden ethernet outage In-Scope Open None
Network Operations 98006 Anycast (Auth)DNS In-Scope Open None
Network Operations 83992 Juniper monitoring In-Scope Open None
Network Operations 150256 Re-setup lvs1007-lvs1012, replace lvs1001-lvs1006 In-Scope Open None
Network Operations 165584 Deploy pybal with BGP MED support (for primary/backup) in production Screep Open None
Network Operations 163674 Frequent RST returned by appservers to LVS hosts Screep Open None
Network Operations 122406 Consider renumbering Labs to separate address spaces In-Scope Open None
Network Operations 150264 Icinga check for VRRP In-Scope Open None
Network Operations 167306 ospf link-protection Screep Open None
Network Operations 157435 Review ACLs for the Analytics VLAN In-Scope Open None
Network Operations 167840 Merge AS14907 with AS43281 Screep Open None
Network Operations 148506 Rack and setup new eqiad row D switch stack (EX4300/QFX5100) In-Scope Open None
Traffic 118365 Increase request limits for GETs to /api/rest_v1/ In-Scope Done None
Traffic 92438 Changing the URL for the Wikimedia Shop Screep Done None
Traffic 147569 Evaluate/Deploy TCP BBR when available (kernel 4.9+) In-Scope Done None
Traffic 162976 dbtree broken (for some users?) Screep Done None
Traffic 162348 swift-object-server 1.13.1: Wrong Content-Type returned on 304 Not Modified responses Screep Done None
Traffic 162129 Someone is parsing all enwiki pages using the action api at a rate of ~2M pages/hour Screep Done None
Traffic 162111 Make WDQS active / active Screep Done None
Traffic 162073 Ops Onboarding for Arzhel Younsi Screep Done None
Traffic 151444 Huge increase in cache_upload 404s due to buggy client-side code from graphiq.com In-Scope Done None
Traffic 162035 Some PNG thumbnails and JPEG originals delivered as [text/html] content-type and hence not rendered in browser Screep Done None
Traffic 161819 Investigate 502 errors from nginx when backend returns 302 In-Scope Done None
Traffic 143078 Unhandled pybal ValueError: need more than 1 value to unpack In-Scope Done None
Traffic 126321 lvs2002 Embedded Flash/SD-CARD iLO errors In-Scope Done None
Traffic 74072 Disable SSL 3.0 on Wikimedia sites to mitigate POODLE attack (CVE-2014-3566) Screep Done None
Traffic 160156 Add node_exporter ipvs ipv6 support In-Scope Done None
Traffic 159870 baham (ns1) CPU-related issues In-Scope Done None
Traffic 160109 What happened 2017-03-09 04:00 - 06:00 UTC In-Scope Done None
Traffic 126206 Upgrade to Varnish 4: things to remember In-Scope Done None
Traffic 159956 Prepare pywikibot for http -> https switch in entity uri In-Scope Done None
Traffic 154558 Periodic 500s from piwik.wikimedia.org In-Scope Done None
Traffic 168019 runUpdate.sh script in wikidata stand-alone has abruptly started incurring numerous 429 errors. Screep Done None
Traffic 166229 Mediawiki replies with 500 on wrongly formatted CSP report Screep Done None
Traffic 166120 phab.wmfusercontent.org "homepage" yields a 500 Screep Done None
Traffic 131131 Canonical URL in Store points to HTTP address, should be HTTPS In-Scope Done None
Traffic 149217 cp1066.mgmt.eqiad.wmnet is unreachable In-Scope Done None
Traffic 165324 Can't upload large files with X-Wikimedia-Debug turned on Screep Done None
Traffic 92709 wikitech.wikimedia.org SSL certificate considered "outdated security" in Chrome Screep Done None
Traffic 118557 Replace Analytics XFF/client.ip data with X-Client-IP In-Scope Done 5.0
Traffic 165063 varnish frontend transient memory usage keeps growing Screep Done None
Traffic 70861 Performance review #2 of Hovercards (Popups extension) In-Scope Done None
Traffic 167920 Impending load test Screep Done None
Traffic 140658 Make sure we're not relying on HTTP_PROXY headers In-Scope Done None
Traffic 164610 Unprovision cache_misc @ ulsfo Screep Done None
Traffic 164608 Merge cache_maps into cache_upload functionally Screep Done None
Traffic 89838 Move proxy IP lists to META for Varnish XFF decoding In-Scope Open None
Traffic 153468 Ferm/DNS library weirdness on deployment-mediawiki boxes In-Scope Open None
Traffic 153563 Consider switching to HTTPS for Wikidata query service links In-Scope Open None
Traffic 104225 enwiki Main_Page timeouts In-Scope Open None
Traffic 136737 Fix lvs1001-6 storage In-Scope Open None
Traffic 137161 Fix nits in HTTPS/HSTS configs in externally-hosted fundraising domains In-Scope Open None
Traffic 154017 compile number of http uses for http://www.wikidata.org/entity In-Scope Open None
Traffic 54253 Protocol-relative URLs are poorly supported or unsupported by a number of HTTP clients In-Scope Open None
Traffic 70528 stream.wikimedia.org - redirect http(s) to docs In-Scope Open None
Traffic 154702 Fix broken referer categorization for visits from Safari browsers In-Scope Open None
Traffic 154759 Pybal not happy with DNS delays In-Scope Open None
Traffic 154801 Investigate varnishd child crashes when multiple nodes get depooled/pooled concurrently In-Scope Open None
Traffic 122867 Evaluate the feasibility of cache invalidation for the action API In-Scope Open None
Traffic 154954 Purge Varnish cache when a banner is saved In-Scope Open 2.0
Traffic 137252 Redirect phabricator.mediawiki.org to phabricator.wikimedia.org In-Scope Open None
Traffic 155314 Varnish does not cache Action API responses when logged in In-Scope Open None
Traffic 104442 Investigate better DNS cache/lookup solutions In-Scope Open None
Traffic 104681 HTTPS Plans (tracking / high-level info) In-Scope Open None
Traffic 155806 Add CAA records to our domains In-Scope Open None
Traffic 137979 Support brotli compression In-Scope Open None
Traffic 156028 Name Asia Cache DC site In-Scope Open None
Traffic 156030 Select site vendor for Asia Cache Datacenter In-Scope Open None
Traffic 156032 Server hardware installation for Asia Cache DC In-Scope Open None
Traffic 137990 Zero: Investigate removing the limit on carrier tagging to m-dot and zero-dot requests In-Scope Open None
Traffic 105657 Expires header for load.php should be relative to request time instead of cache time In-Scope Open None
Traffic 138093 Investigate query parameter normalization for MW/services In-Scope Open None
Traffic 156320 $wgServer with initial https:// does not force HTTPS (wgSecureLogin) In-Scope Open None
Traffic 138546 Backend naming in VCL needs to use fqdn+port In-Scope Open None
Traffic 123854 Set up action API latency / error rate metrics & alerts In-Scope Open None
Traffic 106517 upload.wikimedia.org returns HTTP status code 503 for truncated urls, not 404 In-Scope Open None
Traffic 107236 Switch port 80 to nginx on primary clusters In-Scope Open None
Traffic 124418 Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan In-Scope Open None
Traffic 124954 Decrease max object TTL in varnishes In-Scope Open None
Traffic 108580 HTTPS for internal service traffic In-Scope Open None
Traffic 86915 nan and minnan subdomain redirects are a mess In-Scope Open None
Traffic 125170 Internal DNS resolver responds with NXDOMAIN for localhost AAAA In-Scope Open None
Traffic 109325 Outbound HTTPS for varnish backend instances In-Scope Open None
Traffic 109331 Deleted files sometimes remain visible to non-privileged users if permanently linked In-Scope Open None
Traffic 158599 Samsung Internet's desktop mode getting redirected to mobile site In-Scope Open None
Traffic 141266 letsencrypt puppetization: add parallel rsa+ecdsa cert support In-Scope Open None
Traffic 159056 cp2017 froze and stopped serving traffic In-Scope Open None
Traffic 159137 certspotter: Error retrieving STH from log In-Scope Open None
Traffic 141373 Age header reset to 0 after 24 hours on varnish frontends In-Scope Open None
Traffic 159346 convert mail servers from GS to LE certificates In-Scope Open None
Traffic 159411 Uniform cluster nomenclature across puppet In-Scope Open None
Traffic 159412 Convert all of our site.pp/roles to the role/profile paradigm In-Scope Open None
Traffic 159429 Allow setting varnish connection timeouts in puppet In-Scope Open None
Traffic 141480 mixed-content issues on planet.wikimedia.org In-Scope Open None
Traffic 109776 Tilerator should purge Varnish cache In-Scope Open None
Traffic 125938 PHP fatal errors causing Varnish to return 503 - "Junk after gzip data" In-Scope Open None
Traffic 88861 wikipedia.lol In-Scope Open None
Traffic 160616 Enable HTTPS for swift clients In-Scope Open None
Traffic 111588 RFC: API-driven web front-end In-Scope Open None
Traffic 74186 Varnish: Mobile site redirect interferes with OAuth authorization process In-Scope Open None
Traffic 102848 Split GeoIP into a new component In-Scope Open None
Traffic 161101 Performance impact evaluation of enabling nginx-lua and nginx-lua-prometheus on tlsproxy In-Scope Open None
Traffic 161256 multi-component wmflabs.org subdomains don't work under simple wildcard TLS cert In-Scope Open None
Traffic 161360 404 loading images from Virgin Media In-Scope Open None
Traffic 161517 Allow anonymous users to change interface language on Commons with ULS In-Scope Open None
Traffic 127387 Split slash decoding from general percent normalization in Varnish VCL In-Scope Open None
Traffic 143562 High number of failed inbound TFO connections in esams Mon-Fri In-Scope Open None
Traffic 127482 Enable VCL source-DC switching via confd In-Scope Open None
Traffic 127485 Enable VCL applayer datacenter-switch via confd In-Scope Open None
Traffic 144187 Better handling for one-hit-wonder objects In-Scope Open None
Traffic 144194 Varnish-triggered CN campaign about browser security In-Scope Open None
Traffic 112316 Configure varnish to use "Unconfigured domain" page for 404 Not Served (instead of generic error) In-Scope Open None
Traffic 144508 Point wikipedia.in to 205.147.101.160 instead of URL forward In-Scope Open None
Traffic 144626 Strong cipher preference ordering for cache terminators In-Scope Open None
Traffic 127573 wikiknihy.cz - transfer to Wikimedia Czech Republic? In-Scope Open None
Traffic 162099 lvs2002 random shut down Screep Open None
Traffic 112765 Phabricator needs to expose notification daemon (websocket) In-Scope Open None
Traffic 145661 varnish backends start returning 503s after ~6 days uptime In-Scope Open None
Traffic 162362 Make maps active / active Screep Open None
Traffic 162683 Network hardware purchasing for Asia Cache DC Screep Open None
Traffic 128182 Server certificate is classified as invalid on government computers In-Scope Open None
Traffic 128188 Make CI run Varnish VCL tests In-Scope Open None
Traffic 128358 Uploading 1.2GB ogv results in 503 In-Scope Open None
Traffic 128374 Sort out analytics service dependency issues for cp* cache hosts In-Scope Open None
Traffic 146332 Create short link for outreachdashboard.wmflabs.org In-Scope Open None
Traffic 162818 icinga alerts on nodejs services when a recdns server is depooled Screep Open None
Traffic 128409 Detect tools.wmflabs.org tools which are HTTP-only In-Scope Open None
Traffic 146619 DNS domains registered to WMF no longer redirecting In-Scope Open None
Traffic 128559 store.wikimedia.org HTTPS issues In-Scope Open None
Traffic 91372 $wgMFAnonymousEditing = true is sometimes not respected: cache? In-Scope Open None
Traffic 75944 Monitor Varnish caches on beta cluster have two varnishd process running In-Scope Open None
Traffic 163141 dbtree: make wasat a working backend and become active-active Screep Open None
Traffic 163233 Implement Varnish-level rough ratelimiting Screep Open None
Traffic 163251 Communicate dropping IE8-on-XP support (a security change) to affected editors and other community members Screep Open None
Traffic 146832 Clarify caching to enable direct Wikidata Query Service access by <mapframe/link> In-Scope Open None
Traffic 91820 Create HTTP verb and sticky cookie DC routing in VCL In-Scope Open None
Traffic 147199 Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) In-Scope Open None
Traffic 163541 cache hosts should auto-repool iff OCSP files are sane Screep Open None
Traffic 147202 Removing support for AES128-SHA TLS cipher In-Scope Open None
Traffic 114104 pybal doesn't fully manage LVS table leaving stale services (on IP change) In-Scope Open None
Traffic 164173 Cache invalidations coming from the JobQueue are causing lag on several wikis Screep Open None
Traffic 129682 Look into solutions for replaying traffic to testing environment(s) In-Scope Open None
Traffic 164259 Add VSL error counters to Varnishkafka stats Screep Open None
Traffic 148131 Deploy redundant unified certs In-Scope Open None
Traffic 164327 replace ulsfo aging servers Screep Open None
Traffic 148134 OCSP Stapling for Intermediates In-Scope Open None
Traffic 164456 Build nginx without image filter support Screep Open None
Traffic 164579 Investigate nginx reload behavior Screep Open None
Traffic 164609 Merge cache_misc into cache_text functionally Screep Open None
Traffic 148422 cp3009: memory scrubbing error In-Scope Open None
Traffic 164768 Explicitly limit varnishd transient storage Screep Open None
Traffic 164868 SSL error for https://wikispecies.org/ Screep Open None
Traffic 129839 restrict upload cache access for private wikis In-Scope Open None
Traffic 45250 Redo /beacon/impression system (formerly Special:RecordImpression) to remove extra round trips on all FR impressions (title was: S:RI should pyroperish) In-Scope Open None
Traffic 78421 m.{project}.org portal/redirect consistency In-Scope Open None
Traffic 148976 Strongswan Icinga check: do not report issues about depooled hosts In-Scope Open None
Traffic 148983 cp3021 failed disk sdb In-Scope Open None
Traffic 165560 Artificial spike in offset of unique devices from November to February 6th on wikidata Screep Open None
Traffic 94125 Central login notice appears on unencrypted API format=*fm pages, where reloading does not affect login status In-Scope Open None
Traffic 165765 Refactor pybal/LVS config for shared failover Screep Open None
Traffic 78963 Support ESI for ResourceLoader In-Scope Open None
Traffic 130904 Host rewrite for /static/ not applied to purges In-Scope Open None
Traffic 79730 Add pybal check to ensure service IP is bound In-Scope Open None
Traffic 81305 Make PyBal respect advertised BGP capabilities In-Scope Open None
Traffic 131930 Set SPF (... -all) for toolserver.org In-Scope Open None
Traffic 166695 Substantive HTTP and mediawiki/database traffic coming from a single ip Screep Open None
Traffic 166758 cp3032 ethernet link down (bnx2x dump in the dmesg) Screep Open None
Traffic 149847 Use content hash based image / thumb URLs In-Scope Open None
Traffic 149873 CentralNotice: Review and update Varnish caching for Special:BannerLoader In-Scope Open 2.0
Traffic 166965 Degraded RAID on lvs3001 Screep Open None
Traffic 167060 en.wiki domain owned by us, but isn't hosted by us?? Screep Open None
Traffic 82747 pybal health checks are ipv4 even for ipv6 vips In-Scope Open None
Traffic 150022 thumb_handler.php should not set CC:no-cache on renderer 404 responses? In-Scope Open None
Traffic 117435 Spike: CentralNotice: Verify that our Special:HideBanners cookie storm works as efficiently as possible In-Scope Open 2.0
Traffic 82849 lvs servers report 'Memory allocation problem' on bootup In-Scope Open None
Traffic 132521 Enforce HTTPS+HSTS on remaining one-off sites in wikimedia.org that don't use standard cache cluster termination In-Scope Open None
Traffic 132629 Data passed to HHVM ($_SERVER variables) is a mixed bag of already-decoded and non-decoded nonsense In-Scope Open None
Traffic 118181 Planning for phasing out non-Forward-Secret TLS ciphers In-Scope Open None
Traffic 133001 Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames In-Scope Open 0.0
Traffic 167513 Add a redirect from lzh.wikipedia to zh-classical.wikipedia Screep Open None
Traffic 83467 LVS testing needs to include internal services testing In-Scope Open None
Traffic 150479 Prometheus varnish metric churn due to VCL reloads In-Scope Open None
Traffic 96499 dbtree loads third party resources (from jquery.com and google.com) In-Scope Open None
Traffic 133149 Move californium to an internal host? In-Scope Open None
Traffic 118468 point wikilovesmonuments.org ns to wmf In-Scope Open None
Traffic 167906 Make API usage limits easier to understand, implement, and more adaptive to varying request costs / concurrency limiting Screep Open None
Traffic 150673 Thumb API: Varnish / CDN questions In-Scope Open None
Traffic 168013 Remove disableImages handling from VCL Screep Open None
Traffic 96844 Update TLS/HTTP documentation on wikitech In-Scope Open None
Traffic 97051 adding new languages to DNS langs.tmpl doesn't work until zone template is edited as well In-Scope Open None
Traffic 133410 Deploy TemplateStyles to WMF production In-Scope Open None
Traffic 119038 Image cache issue when 'over-writing' an image on commons In-Scope Open None
Traffic 133548 Create a secure redirect service for large count of non-canonical / junk domains In-Scope Open None
Traffic 119366 Disable caching on the main page for anonymous users In-Scope Open None
Traffic 119372 Pybal IdleConnectionMonitor with TCP KeepAlive shows random fails if more than 100 servers are involved. In-Scope Open None
Traffic 133717 Letsencrypt all the prod things we can - planning In-Scope Open None
Traffic 119396 Create globally-unique varnish cache cluster port/instancename mappings In-Scope Open None
Traffic 133791 check_dns needs to be rewritten In-Scope Open None
Traffic 36670 Check all wikis for inclusions of http resources on https In-Scope Open None
Traffic 168699 Verify that the codfw lvs is configured correctly for Phabricator Screep Open None
Traffic 133895 Varnish configuration for mobile domains should be coherent with Apache configuration In-Scope Open None
Traffic 56783 Respect X-Forwarded-For only from trustworthy sources In-Scope Open None
Traffic 134404 Varnish support for active:active backend services In-Scope Open None
Traffic 168919 stream.wikimedia.org: remove legacy rcstream/socket.io HTTPS redirect hole punches Screep Open None
Traffic 134447 letsencrypt puppetization: upgrade for scalability In-Scope Open None
Traffic 98165 Figure out an etcd deploy strategy that includes multi DC failure scenarios. In-Scope Open None
Traffic 84543 more robust certificate chain creation in puppet In-Scope Open None
Traffic 151643 python-varnishapi daemons seeing "Log overrun" constantly In-Scope Open None
Traffic 120121 Improve Varnish XFF processing for trusted proxies In-Scope Open None
Traffic 63782 Add varnish logs to logstash In-Scope Open None
Traffic 134893 Unhandled pybal error causing services to be depooled in etcd but not in lvs In-Scope Open None
Traffic 99216 Please set up a CNAME for videoserver.wikimedia.org to Video Editing Server In-Scope Open None
Traffic 152091 Block hotlinking In-Scope Open None
Traffic 120486 add a https-only option to dynamicproxy In-Scope Open None
Traffic 120509 Cache education dashboard pages In-Scope Open None
Traffic 66214 Define an official thumb API In-Scope Open None
Traffic 23027 Requests with utf-8 in the URL return an outdated page revision In-Scope Open None
Traffic 152622 Wikipedia.cz and other domains owned by WMCZ have invalid certificate In-Scope Open None
Traffic 101525 Set up LVS for current AuthDNS In-Scope Open None
Traffic 135762 A/B Testing solid framework In-Scope Open None
Traffic 152882 Many misc wikis lack mobile domains In-Scope Open None
Traffic 102178 Fix RESTBase support for wikitech.wikimedia.org In-Scope Open None
Traffic 102367 Migrate tools.wmflabs.org to https only (and set HSTS) In-Scope Open None
Traffic 121561 Encrypt Kafka traffic, and restrict access via ACLs In-Scope Open 0.0
DBA 121207 Implement slave_run_triggers_for_rbr at sanitarium for labs filtering In-Scope Cut None
DBA 166568 Error (Wikimedia\Rdbms\DBQueryError) when creating a SecurePoll poll on testwiki Screep Done None
DBA 164107 db1063 thermal issues (was: db1063 io (s5 master eqiad) performance is bad) Screep Done None
DBA 159266 db1047 BBU RAID issues (was: Investigate db1047 replication lag) In-Scope Done None
DBA 164092 db1062 (s7 master eqiad) in a reboot cycle Screep Done None
DBA 166267 dbtree: dbtree.wikimedia.org currently returns a 500 error Screep Done None
DBA 167667 Degraded RAID on db2070 Screep Done None
DBA 157714 Better mysql command prompt info for Beta In-Scope Done None
DBA 169294 File space alert for db1028 Screep Done None
DBA 166935 db1089: update RAID controller firmware Screep Done None
DBA 164968 remove mira wikitech grants Screep Done None
DBA 116903 Adapt wmf-mariadb101 package for stretch and adapt its service to systemd In-Scope Done None
DBA 168356 Prepare mysql hosts for stretch Screep Done None
DBA 167961 Terbium cronjobs attempting to connect to labstestweb2001 Screep Done None
DBA 163351 codfw API slaves overloaded during the 2017-04-19 codfw switch Screep Done None
DBA 161712 codfw: (1) spare pool system for temp allocation as database failover In-Scope Done None
DBA 166518 Degraded BBU on db1094 (was: Degraded RAID on db1094) Screep Done None
DBA 160731 db1048 BBU Faulty - slave lagging In-Scope Done None
DBA 167597 Renaming Neoalpha: supervision needed Screep Done None
DBA 151746 fstrim: Operation not supported on Labs DBs In-Scope Done None
DBA 166141 db1046 BBU looks faulty Screep Done None
DBA 127219 Drop wikilove_image_log table from Wikimedia wikis In-Scope Done None
DBA 162290 setup tempdb2001(WMF6407) Screep Done None
DBA 169355 Degraded RAID on db1052 Screep Done None
DBA 166028 Rename user "Mlpearc" to "FlightTime" on Central Auth Screep Done None
DBA 162212 dbstore1002 in bad shape Screep Done None
DBA 124307 Improve eventlogging replication procedure In-Scope Done 5.0
DBA 123509 "db1047/eventlogging_sync processes" icinga alert is flaky since at least early January In-Scope Done None
DBA 162183 tendril cert expiry alerts on dbmonitor hosts Screep Done None
DBA 162159 codfw rack/setup 22 DB servers Screep Done None
DBA 166853 Degraded RAID on db2049 Screep Done None
DBA 168109 Global rename of Smuconlaw → Sgconlaw: supervision needed Screep Done None
DBA 168776 Global rename of Green Cardamom → GreenC: supervision needed Screep Done None
DBA 166933 db1089 - possibly RAID controller broken Screep Done None
DBA 155769 Spurious completely empty `image` table row on commonswiki In-Scope Done None
DBA 165629 Degraded RAID on db2058 Screep Done None
DBA 162135 Decommission db1057 Screep Done None
DBA 130702 Several es20XX servers keep crashing (es2017, es2019, es2015, es2014) since 23 March In-Scope Done None
DBA 166422 Degraded RAID on db1046 Screep Done None
DBA 163413 Pool new server db2071 Screep Done None
DBA 163895 Move masters away from D1 in eqiad Screep Done None
DBA 162789 Create less overhead on bacula jobs when dumping production databases Screep Open None
DBA 167031 Global rename of Idh0854 → Garam: supervision needed Screep Open None
DBA 141255 Separate host lookup from the sql shell script In-Scope Open None
DBA 141252 icinga hp raid check timeout on busy ms-be and db machines In-Scope Open None
DBA 119626 Eliminate SPOF at the main database infrastructure In-Scope Open None
DBA 135851 Preserve InnoDB table auto_increment on restart In-Scope Open None
DBA 148955 Puppetize tendril web user creation In-Scope Open None
DBA 100501 mysql user and group should be a system user/group In-Scope Open None
DBA 134476 Decommission old coredb machines (<=db1050) In-Scope Open None
DBA 153440 Create a full backup of all external storage records that would be easy to restore/setup a temporary delayed slave In-Scope Open None
DBA 168584 Labsdb* servers need to be rebooted Screep Open None
DBA 165625 Evaluate future of wmf puppet module "mysql" Screep Open None
DBA 104699 Firewall configurations for database hosts In-Scope Open None
DBA 155764 Gerrit: Schedule downtime to migrate db to utf8mb4 In-Scope Open None
DBA 165674 Investigate slow servermon updating queries on db1016 Screep Open None
DBA 141968 Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master In-Scope Open None
DBA 165677 Create a backend check for pybal to monitor the MySQL protocol being up Screep Open None
DBA 126252 Populate the wikishared db on all dbstores In-Scope Open None
DBA 160392 Reset db1070 idrac In-Scope Open None
DBA 133523 [RFC] improve parsercache replication and sharding handling In-Scope Open None
DBA 163339 pdu phase imbalances: ps1-a3-codfw, ps1-c6-codfw, & ps1-d6-codfw Screep Open None
DBA 164702 Decommission db1024 Screep Open None
DBA 151999 Create script to monitor db dumps for backups are successful (and if not, old backups are not deleted) In-Scope Open None
DBA 165977 Create CoC committee private wiki Screep Open None
DBA 54932 Drop *_old database tables from Wikimedia wikis In-Scope Open None
DBA 166108 x1 master db1031: Faulty BBU Screep Open None
DBA 164834 In some database hosts, performance schema loses digest statistics Screep Open None
DBA 143896 MySQL monitoring with prometheus In-Scope Open None
DBA 161754 eqiad: (2) hardware access request for labsdb1004 & 5 refresh In-Scope Open None
DBA 161755 eqiad: (2) hardware access request for labsdb1006 & 7 refresh In-Scope Open None
DBA 119154 Move echo tables from local wiki databases onto extension1 cluster for mediawikiwiki, metawiki, and officewiki In-Scope Open None
DBA 50930 Database replication problems - production and labs (tracking) In-Scope Open None
DBA 164073 Puppetize Piwik's Database and set up periodical backups Screep Open None
DBA 107610 Setup separate logical External Store for Flow in production In-Scope Open None
DBA 112282 Multiple pages with no revisions In-Scope Open None
DBA 141547 Setup automatic failover for misc database servers In-Scope Open None
DBA 127570 Rename be_x_oldwiki database to be_taraskwiki In-Scope Open None
DBA 149643 Review Icinga alarms with disabled notifications In-Scope Open None
DBA 157702 Followup for TLS MariaDB server roll-out In-Scope Open None
DBA 162070 Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases Screep Open None
DBA 109179 Migrate MySQLs to use ROW-based replication In-Scope Open None
DBA 112473 Better mysql monitoring for number of connections and processlist strange patterns In-Scope Open None
DBA 163143 dbtree: don't return 200 on error pages Screep Open None
DBA 145072 Create a script to regenerate prometheus mysqld exporter listing that works with puppetdb In-Scope Open None
DBA 166486 Decommission db1023 Screep Open None
DBA 162233 eqiad rack/setup 11 new DB servers Screep Open None
DBA 152427 Create a check/calendar alert for MariaDB TLS certs In-Scope Open None
DBA 134809 Apache <=> mariadb SSL/TLS for cross-datacenter writes In-Scope Open None
DBA 156844 Prep to decommission old dbstore hosts (db1046, db1047) In-Scope Open None
DBA 148078 Decommission db1015, db1035, db1044 and db1038 In-Scope Open None
DBA 115982 Drop the tables old_growth, hitcounter, click_tracking, click_tracking_user_properties from enwiki, maybe other schemas In-Scope Open None
DBA 163303 Increase timeout for mariadb replication check Screep Open None
DBA 158893 dbstore1001 troubleshoot IPMI issue In-Scope Open None
DBA 168764 Reopen Wikinews Dutch Screep Open None
DBA 163778 Decommission db1022 (Was: db1022 broke while changing topology on s6 - evaluate whether to fix or directly decommission) Screep Open None
DBA 162699 Decommission old s2 eqiad hosts (db1018, db1021, db1024, db1036) Screep Open None
DBA 157359 labsdb1006/1007 (postgresql) maintenance In-Scope Open None
DBA 145885 Gerrit shows HTTP 500 error when pasting extended unicode characters In-Scope Open None
DBA 151491 Icinga MariaDB disk space check on silver checks the wrong partition In-Scope Open None
Software Development 166371 Monitoring: create an alert for daemonized puppet Screep Done None
Software Development 165583 Puppet compiler: sync facts from all workers Screep Done None
Software Development 157052 Puppet compiler: sync newest facts only In-Scope Done None
Software Development 164177 switchdc: Improve wgReadOnly message Screep Done None
Software Development 157133 Consider adding a --skip-conftool option to puppet-merge In-Scope Open None
Software Development 144264 wmf-reimage and handling of "-n" option In-Scope Open None
Software Development 148494 Add shell scripts CI validations In-Scope Open None
Software Development 159045 Update Puppet repo code that uses maniphest.update and maniphest.createtask conduit api In-Scope Open None
Software Development 86556 monitor SSD wear levels In-Scope Open None
Software Development 166300 Remove Salt from wmf-auto-reimage / wmf-reimage Screep Open None
Software Development 150560 More verbose messages from service-checker-swagger In-Scope Open None
Software Development 152950 E901 SyntaxError: invalid syntax is wrongly raised on using python's abc by jenkins python CI linter In-Scope Open None
Software Development 164587 cumin could use randomization/splay options Screep Open None
Software Development 143536 Upgrade all mw* servers to debian jessie In-Scope Open None
Software Development 154776 Puppet compiler: order resources for easy comparison between hosts In-Scope Open None
Software Development 157002 Puppet compiler: re-add the concurrency option NUM_THREADS In-Scope Open None
Software Development 157001 Puppet compiler: abort on git rebase conflict In-Scope Open None
Software Development 167504 New tool to track package updates/status for hosts and images (debmonitor) Screep Open None
Software Development 164780 Sunset our use of Salt Screep Open None
Software Development 164817 Migrate debdeploy to cumin Screep Open None
Software Development 155705 confctl: log to SAL even if the selection doesn't match any host In-Scope Open None
Software Development 144169 Flake8 for python files without extension in puppet repo In-Scope Open None
Hardware Requests 164959 Decommission mw2098 Screep Done None
Hardware Requests 83340 decom arsenic: (was: reclaim arsenic as spare) Screep Done None
Hardware Requests 162928 decommission nembus Screep Done None
Hardware Requests 147309 Decommission db1019 In-Scope Done None
Hardware Requests 164036 decommission gadolinium Screep Done None
Hardware Requests 162897 spare pool allocation of WMF6406 to replace mira Screep Done None
Hardware Requests 161606 Additional ram quote for Prometheus baremetal In-Scope Done None
Hardware Requests 164515 codfw: (1) labtest puppetmaster Screep Done None
Hardware Requests 158204 Eqiad: (2) hardware access request for labnet1003/1004 In-Scope Done None
Hardware Requests 158207 Eqiad: (2) hardware access request for labcontrol1003/1004 In-Scope Done None
Hardware Requests 161700 CODFW: (4) hardware access request for kubernetes In-Scope Done None
Hardware Requests 156970 replacement hardware for iridium (phabricator) In-Scope Done None
Hardware Requests 154664 codfw: (2) hardware access request for labtest In-Scope Done None
Hardware Requests 159839 EQIAD: stat1003 replacement In-Scope Done None
Hardware Requests 141415 decom ytterbium (datacenter) In-Scope Done None
Hardware Requests 161750 eqiad: (1) hardware access request for dedicated labmon1002 In-Scope Done None
Hardware Requests 142578 codfw/eqiad:(9+9) hardware access request for ORES In-Scope Done None
Hardware Requests 164513 reclaim tempdb2001(WMF6407) to spares Screep Done None
Hardware Requests 161753 eqiad: (1) hardware access request for labnodepool1002 In-Scope Done None
Hardware Requests 162257 EQIAD: 2 hardware access request for kubernetes-staging Screep Done None
Hardware Requests 147053 eqiad: (2) hardware access request for dedicated Labs puppetmasters In-Scope Done None
Hardware Requests 154706 Codfw: (1) hardware access request for labtestneutron refresh In-Scope Done None
Hardware Requests 161764 Codfw: (1) hardware access request for labtestnet2003 [region 2] In-Scope Done None
Hardware Requests 159838 EQIAD: stat1002 replacement In-Scope Done None
Hardware Requests 161765 Codfw: (1) hardware access request for labtestvirt2003 [region 2] In-Scope Done None
Hardware Requests 161766 Codfw: (2) hardware access request for labtest [region 2] In-Scope Done None
Hardware Requests 159413 Decommission ms-fe2001-4 In-Scope Done None
Hardware Requests 162785 Decommission ms-be2001 - ms-be2012 Screep Open None
Hardware Requests 166341 SSDs for main Kafka clusters Screep Open None
Hardware Requests 95742 Decommission amssq31-62 (32 hosts) In-Scope Open None
Hardware Requests 167376 Decommission cp300[3456] Screep Open None
Hardware Requests 159996 decom fluorine In-Scope Open None
Hardware Requests 159480 Decommission bast3001 In-Scope Open None
Hardware Requests 167377 Decommission cp4011, cp4012, cp4019, cp4020 Screep Open None
Hardware Requests 130883 decom cp3011-22 (12 machines) In-Scope Open None
Hardware Requests 164588 decom mira Screep Open None
Hardware Requests 166489 Decommission ms-be1001 - ms-be1012 Screep Open None
Hardware Requests 160986 Decommission ms-fe100[1-4] In-Scope Open None
Hardware Requests 169020 Decommission cp400[1-4] Screep Open None
Hardware Requests 168271 Decommission mw1170-mw1179 Screep Open None
Other Operations 123147 Wikipedia.com warns about bad certificate Unknown None
Other Operations 161529 Create Wikipedia Doteli In-Scope Cut None
Other Operations 167048 Services need external monitoring Screep Done None
Other Operations 154473 Update logstash on wikimedia to 5.x In-Scope Done None
Other Operations 167116 Request access to analytics-privatedata-users for GoranSMilovanovic Screep Done None
Other Operations 153918 decom europium (was: reclaim europium) In-Scope Done None
Other Operations 98084 transient failures of wiki page saves Screep Done None
Other Operations 167131 Upgrade OTRS to 5.0.20 Screep Done None
Other Operations 167159 rack/setup/install labtestnet2002 Screep Done None
Other Operations 167160 rack/setup/install labtestneutron2002 Screep Done None
Other Operations 86633 Unify ::production / ::beta roles for *oid Screep Done None
Other Operations 156284 decommission the old pay-lvs1001/pay-lvs1002 boxes In-Scope Done None
Other Operations 167264 Degraded RAID on ms-be1016 Screep Done None
Other Operations 167266 upgrade kibana to v5.3.3 Screep Done None
Other Operations 150316 decom gallium (data center) In-Scope Done None
Other Operations 156100 DNS: dynamically generate entries for service discovery In-Scope Done None
Other Operations 167286 Package latest version of Thumbor and deploy it Screep Done None
Other Operations 167287 Backport python-schedule and add it to jessie-wikimedia Screep Done None
Other Operations 156023 Check the size of every cluster in codfw to see if it matches eqiad's capacity In-Scope Done None
Other Operations 103499 puppetmaster self: Could not find dependency Class[Role::Access_new_install] for File[/usr/local/sbin/install-console] at /etc/puppet/modules/puppetmaster/manifests/scripts.pp:62 Screep Done None
Other Operations 73922 HHVM and PCRE v8.31 gives incorrect results for certain PCRE patterns In-Scope Done None
Other Operations 141815 Define tile usage policy In-Scope Done None
Other Operations 159887 rack and cable frdev1001 In-Scope Done None
Other Operations 88761 Miscellaneous servers to track in eqiad for possible inclusion in codfw misc virt cluster Screep Done None
Other Operations 159886 rack and cable frdb1002 In-Scope Done None
Other Operations 160122 Remove user gerrit2 from ldap In-Scope Done None
Other Operations 160178 MediaWiki Datacenter Switchover automation In-Scope Done None
Other Operations 160333 Reimage the Hadoop Cluster to Debian Jessie In-Scope Done 8.0
Other Operations 159835 Labvirt1001 has insanely slow IO In-Scope Done None
Other Operations 160581 track the ops juniper kit in OIT den In-Scope Done None
Other Operations 160597 Import new kibana and logstash .debs to wikimedia experimental repository In-Scope Done None
Other Operations 160640 Rack and Setup ms-be1028-ms-1039 In-Scope Done None
Other Operations 95781 EL graphite data missing from 24/3 to 7/4 Screep Done None
Other Operations 160886 Investigate recent Kafka Burrow alarms for EventLogging In-Scope Done None
Other Operations 160908 Instance creation fails before first puppet run around 1% of the time In-Scope Done None
Other Operations 75516 Puppet failure on instance udplog due to udp2log class depends on ganglia/gmond Screep Done None
Other Operations 88183 Replag on labsdb Screep Done None
Other Operations 161095 Uninitialized string offset warnings with HHVM 3.18 in LanguageAz.php and LanguageKk.php In-Scope Done None
Other Operations 151324 [epic] System level upgrade for cirrus / elasticsearch In-Scope Done None
Other Operations 161158 Degraded RAID on ocg1001 In-Scope Done None
Other Operations 151328 move data to /srv for the cirrus / elasticsearch clusters In-Scope Done None
Other Operations 161343 mediawiki.org DNS TXT entry for Google Search Console site verification In-Scope Done None
Other Operations 167493 Switch CI tests back to HHVM 3.18 Screep Done None
Other Operations 161488 Reclaim/Decommission mw2090->mw2096 (OOW) In-Scope Done None
Other Operations 161520 HHVM 3.18 crashes when Cirrus tries to fetch another wiki config via maint script In-Scope Done None
Other Operations 136670 rack/setup/deploy mw22[1-5][0-9] switch configuration In-Scope Done None
Other Operations 161538 Plug in ex-graphite2001 SSDs to recover coal data In-Scope Done None
Other Operations 161597 Decom/Reclaim analytics1027 In-Scope Done 3.0
Other Operations 161684 HHVM 3.18 deadlocks after 4-6 hours (stuck in HPHP::Treadmill::getAgeOldestRequest() ) In-Scope Done None
Other Operations 161703 Add performance-team contact group to private.git In-Scope Done None
Other Operations 159771 import_waterlines is broken In-Scope Done None
Other Operations 167638 scb2005 eth0 interface gets renamed to eth2 Screep Done None
Other Operations 89912 HHVM lock-ups In-Scope Done None
Other Operations 161836 404 error while accessing some images files (e.g. djvu, jpg, png, webm) on Commons and other sites In-Scope Done None
Other Operations 161870 Recurrent Postgres replication lag for codfw maps hosts In-Scope Done None
Other Operations 161893 Puppet fails with "Could not find init script for 'postgresql@9.4-main'" on maps / labs server In-Scope Done None
Other Operations 167643 Reboot irc.wikimedia.org for kernel upgrades Screep Done None
Other Operations 135647 Configure monitoring / alerting of Postgresql / redis / ... cluster for maps In-Scope Done None
Other Operations 167685 performance regression after upgrading elasticsearch to v5.3.2 Screep Done None
Other Operations 146125 MediaWiki load time regression should trigger an alarm / page people In-Scope Done None
Other Operations 162040 Eliminate SPOFs in the existing eqiad kubernetes infrastructure Screep Done None
Other Operations 162041 Expand the infrastructure to codfw Screep Done None
Other Operations 162042 Prepare and maintain base container images Screep Done None
Other Operations 167705 Rack/setup codfw spare systems Screep Done None
Other Operations 162045 Design and implement a Kubernetes-based staging environment. (stretch) Screep Done None
Other Operations 162046 analytics1030 stuck in console while booting Screep Done None
Other Operations 141522 Physically decommission mw1001-mw1148 (except mw1017 and mw1099) In-Scope Done None
Other Operations 162080 Reset admin password for wikimania-program mailing list Screep Done None
Other Operations 162085 labvirt-star.eqiad.wmnet.crt expiring soon Screep Done None
Other Operations 155692 ocg1001.eqiad.wmnet ipmi error In-Scope Done None
Other Operations 159615 [spec] Active-active setup for ORES across datacenters (eqiad, codfw) In-Scope Done None
Other Operations 90466 Values from mwyaml backend don't override values from ops/puppet yaml files in hieradata/labs Screep Done None
Other Operations 72181 Setup a dedicated mediawiki host in Beta Cluster that we can use for security scanning Screep Done None
Other Operations 162200 Upload of Parsoid deb package 0.7.0 failed Screep Done None
Other Operations 162216 rack and set up analytics1058-1069 Screep Done None
Other Operations 162298 rack and setup boron replacement frpm1001 Screep Done None
Other Operations 167797 analytics1067: Broken BBU Screep Done None
Other Operations 162345 Reduce number of false positive alerts on postgresql lag for maps Screep Done None
Other Operations 162347 Degraded RAID on ms-be1006 Screep Done None
Other Operations 162354 Frequent TCP RST on connections between HHVM and Redis Screep Done None
Other Operations 167801 Deploy thumbor in codfw Screep Done None
Other Operations 162404 create a 'root' group with bdavis strictly for labs/cloud services infrastructure Screep Done None
Other Operations 162462 Standalone puppet masters are broken (uninstallable packages) Screep Done None
Other Operations 162469 rack/setup frbackup2001 Screep Done None
Other Operations 162586 HHVM segfault in memory cleanup Screep Done None
Other Operations 162609 Swift version and distro upgrade Screep Done None
Other Operations 162612 codfw/eqiad hosts occasionally spend > 3 minutes starting networking.service with linux 4.9 Screep Done None
Other Operations 162614 RAID-0 volume not mounted on restbase-dev1001.eqiad.wmnet Screep Done None
Other Operations 167809 New analytic hosts with BBU learning cycle enabled Screep Done 3.0
Other Operations 162745 allow paging to work properly in ldap Screep Done None
Other Operations 155691 es1019.eqiad.wmnet drac unresponsive In-Scope Done None
Other Operations 167704 Make donate.wikimedia.org SPF more strict Screep Done None
Other Operations 162793 Rate limit swift operations Screep Done None
Other Operations 167871 Refactor maps puppet code to the role / profile paradigm Screep Done None
Other Operations 167878 mwlog1001.eqiad.wmnet ‘/srv/mw-log/hhvm.log’ for reading: No such file or directory Screep Done None
Other Operations 162822 remove/fix jenkins icinga monitoring on contint2001 Screep Done None
Other Operations 162854 ms-be2023 freeze Screep Done None
Other Operations 159163 PuppetDB is auto-deactivating hosts In-Scope Done None
Other Operations 146464 hhvm root:adm owned log files cause failures for logrotate In-Scope Done None
Other Operations 162900 setup naos/WMF6406 as new codfw deployment server Screep Done None
Other Operations 155689 ms-be2002.codfw.wmnet has drac issues In-Scope Done None
Other Operations 162952 decom barium.frack.eqiad.wmnet Screep Done None
Other Operations 167952 analytics-privatedata-users access for ema Screep Done None
Other Operations 159004 Set up Logstash behind LVS In-Scope Done None
Other Operations 158982 Decide what to do with maps-test cluster In-Scope Done None
Other Operations 163087 Degraded RAID on heze Screep Done None
Other Operations 163127 rack and cable frlog1001 Screep Done None
Other Operations 167975 mc2036 eth0 link down Screep Done None
Other Operations 163158 acpi_pad consuming 100% CPU on tin Screep Done None
Other Operations 163194 Backfill restored coal whisper files with current data Screep Done None
Other Operations 163196 Puppet facts around the primary network interface and IPv4/IPv6 address Screep Done None
Other Operations 163209 Degraded RAID on ms-be1002 Screep Done None
Other Operations 163220 Add Icinga check for CPU frequency on Dell R320 Screep Done None
Other Operations 155411 Reimage achernar and acamar to jessie In-Scope Done None
Other Operations 141277 Investigate failover failure of LDAP servers In-Scope Done None
Other Operations 96618 just in case: set up a new oauth consumer on mediawiki.org that has oauth callback url checkbox enabled Screep Done None
Other Operations 163278 Four different PHP/HHVM versions on the cluster Screep Done None
Other Operations 163286 Tegmen: process spawn loop + failed icinga + failing puppet Screep Done None
Other Operations 146746 investigate shared inbox options In-Scope Done None
Other Operations 163292 Failed disk / degraded RAID arrays: restbase1018.eqiad.wmnet Screep Done None
Other Operations 158664 elasticsearch logs are duplicated in journald In-Scope Done None
Other Operations 76086 Create labs ldap user and group "apache" to be used for NFS4 access in deployment-prep project Screep Done None
Other Operations 137345 Rack/Setup new memcache servers mc1019-36 In-Scope Done None
Other Operations 158649 firejail for mediawiki converter leaks to stderr: "Reading profile /etc/firejail/mediawiki-converters.profile" In-Scope Done None
Other Operations 163368 Revisit paging strategy for frack servers Screep Done None
Other Operations 163385 upgrade memory in prometheus100[34] Screep Done None
Other Operations 163386 upgrade memory in prometheus200[34] Screep Done None
Other Operations 163390 Update documentation for Tools Proxy failover Screep Done None
Other Operations 163391 Ensure kubelet is stopped on Tools Proxy hosts Screep Done None
Other Operations 129152 Deploy servermon with scap3 In-Scope Done None
Other Operations 163405 appserver fatals - intermittent failed connections to rdb2005 Screep Done None
Other Operations 163408 (icinga/grafana) webpagetest-alerts: slow page rendering for Internet Explorer Screep Done None
Other Operations 163432 Access request to Icinga control panel to acknowledge Performance alerts Screep Done None
Other Operations 163555 Logrotate fails on mediawiki maintenance servers on jessie Screep Done None
Other Operations 163565 Install conftool on deployment masters Screep Done None
Other Operations 163568 Production Shell access denied (update SSH key for jmorgan) Screep Done None
Other Operations 154934 Package the next LTS kernel (4.9) In-Scope Done None
Other Operations 92239 cron's puppet-run fails silently if apt-get update fails Screep Done None
Other Operations 163687 Re-enable ORES data in action API Screep Done None
Other Operations 163690 Degraded RAID on ms-be1039 Screep Done None
Other Operations 147211 Tons of OCG jobs caused a massive increase in queue length In-Scope Done None
Other Operations 163721 Update wikitech-static and develop procedures to keep it maintained Screep Done None
Other Operations 163736 2fa disable on Wikitech Account for Zppix Screep Done None
Other Operations 163743 New ganeti VM for MW release pipeline work Screep Done None
Other Operations 163777 Debug HP raid cache disabled errors on ms-be1019/20/21 Screep Done None
Other Operations 163795 Nutcracker doesn't start at boot Screep Done None
Other Operations 163889 cp1066 troubleshoot IPMI issue Screep Done None
Other Operations 163892 codfw: VM request for poolcounter2001, poolcounter2002 Screep Done None
Other Operations 71604 acct (process and login accounting) fill up instances /var/ partition Screep Done None
Other Operations 163939 Enable keyholder for ORES deployments Screep Done None
Other Operations 163940 apply new hostname label for wmf4747/phab1001 Screep Done None
Other Operations 163942 Ops pwstore currently read-only Screep Done None
Other Operations 163960 phab1001 hdd port a failure Screep Done None
Other Operations 147860 Decommission wmf3096 In-Scope Done None
Other Operations 164011 codfw: ganeti2007-ganeti2008 racking and onsite setup task Screep Done None
Other Operations 92389 Give mobrovac production access for citoid Screep Done 0.0
Other Operations 164060 Request to add phuedx to "researchers" group Screep Done None
Other Operations 148016 ms-be1017 repeated usb connect/disconnect message In-Scope Done None
Other Operations 164145 Investigate why firejails break PdfHandler Screep Done None
Other Operations 168348 logmsgbot needs restarting Screep Done None
Other Operations 164202 Degraded RAID on restbase1018 Screep Done None
Other Operations 158337 codfw: ms-be2028-ms-be2039 rack/setup In-Scope Done None
Other Operations 1295 Make ::mediawiki::syslog and ::mediawiki::php logging destinations configurable via hiera Screep Done None
Other Operations 168360 Gerrit constantly throws HTTP 500 error when reviewing patches (due to "Too many open files") Screep Done None
Other Operations 168378 IPMI console not working on ms-be1014 / ms-be1015 Screep Done None
Other Operations 152525 setup/install gerrit2001/WMF6408 In-Scope Done None
Other Operations 151676 gerrit jgit gc caused mediawiki/core repo problems In-Scope Done None
Other Operations 164395 (don't) decom promethium Screep Done None
Other Operations 164398 decommission lutetium Screep Done None
Other Operations 164444 Installer assumes eth0 is the used interface Screep Done None
Other Operations 168439 Access request for Daniel Worley to analytics / hadoop Screep Done None
Other Operations 148186 Build warm slave for Gerrit in Dallas In-Scope Done None
Other Operations 148391 ms-be1025 network down In-Scope Done None
Other Operations 54717 libvips-tools, libtiff etc install Screep Done None
Other Operations 98962 git fat/git deploy doesn't always unstub files [Trebuchet] In-Scope Done None
Other Operations 158176 Build / migrate to HHVM 3.18 In-Scope Done None
Other Operations 158131 hard-reset DRAC gadolinium.mgmt.eqiad.wmnet In-Scope Done None
Other Operations 148408 Put prometheus baremetal servers in service In-Scope Done None
Other Operations 164725 mw1264 inaccessible after reboot Screep Done None
Other Operations 164748 configure RAID on frlog1001 Screep Done None
Other Operations 148478 Investigate seemingly random Gerrit slow-downs In-Scope Done None
Other Operations 164851 codfw: kubernetes200[1-4] racking and onsite setup task Screep Done None
Other Operations 157975 decommission ms1003 In-Scope Done None
Other Operations 164915 ruthenium is going to run out of space on /srv and stop working Screep Done None
Other Operations 165024 Upgrade calico to 2.2, document build process. Screep Done None
Other Operations 165043 HHVM 3.18 crashes in realloc() as exposed by luasandbox Screep Done None
Other Operations 165051 HHVM 3.18 segfault on jobrunner / string handling Screep Done None
Other Operations 165054 add Arzhel Younsi to datacenter access lists Screep Done None
Other Operations 157537 Broken IPMI/drac on cp3038 and cp3045 In-Scope Done None
Other Operations 165074 CI tests failing with segfault Screep Done None
Other Operations 165101 hhvm-gdb on mediawiki vagrant gives ImportError: No module named unwinder Screep Done None
Other Operations 157526 Upgrade bootstrap-vz version for tools docker builder Screep Done None
Other Operations 168516 Reboot snapshot hosts Screep Done None
Other Operations 157429 Upload Jenkins LTS v2.46.2 to jessie-wikimedia/third-party In-Scope Done None
Other Operations 97192 HHVM request timeouts not working; support lowering the API request timeout per request In-Scope Done None
Other Operations 165211 Disable keystone admin_token usage Screep Done None
Other Operations 165220 Degraded RAID on labstore1003 Screep Done None
Other Operations 165284 Upgrade OTRS to 5.0.19 Screep Done None
Other Operations 165291 Set up kubernetes masters for codfw cluster Screep Done None
Other Operations 93428 Streamline our service development and deployment process In-Scope Done None
Other Operations 130317 setup automatic deletion of old l10nupdate In-Scope Done None
Other Operations 93530 Crontabs broken on toolslabs Screep Done None
Other Operations 165467 Create an etcd cluster in codfw for kubernetes usage Screep Done None
Other Operations 168540 Understand APC size increase after HHVM upgrade/restart Screep Done None
Other Operations 122134 url-downloader should be set up more redundantly In-Scope Done None
Other Operations 116335 Deploy RESTBase with scap3 Screep Done None
Other Operations 165529 analytics1030 failed bbu Screep Done None
Other Operations 149006 elastic2020 is powered off and does not want to restart In-Scope Done None
Other Operations 165586 Upload the scap package to stretch-wikimedia Screep Done None
Other Operations 165620 Upload gerrit package to stretch apt.wm.org repo Screep Done None
Other Operations 165669 HHVM: Crash in server worker Screep Done None
Other Operations 165732 Assigning IP space for kubernetes IPs Screep Done None
Other Operations 165733 Supersede RT tickets references Screep Done None
Other Operations 165739 db2049 cannot install jessie - let's try upgrading the firmware first Screep Done None
Other Operations 151971 Move logstash ingestion behind LVS In-Scope Done None
Other Operations 149185 Decommission mw1152 In-Scope Done None
Other Operations 86674 Migrate leftover old svn content to a readonly repo somewhere Screep Done None
Other Operations 149228 Rename stat100x machines to have misc element names In-Scope Done None
Other Operations 165934 Degraded RAID on db1024 Screep Done None
Other Operations 165943 Access to search logs for Jan Dittrich Screep Done None
Other Operations 166021 Degraded RAID on ms-be2029 Screep Done None
Other Operations 166076 rack/setup/install ganeti1005-ganeti1008 Screep Done None
Other Operations 166140 Review Megacli Analytics Hadoop workers settings Screep Done 5.0
Other Operations 166154 replace es-tool with elasticsearch-curator for standard elasticsearch operations Screep Done None
Other Operations 86668 Make all ldap users have a sane shell (/bin/bash) In-Scope Done None
Other Operations 166165 Requesting access to stat1002 for kaldari Screep Done None
Other Operations 166177 Degraded RAID on ms-be1008 Screep Done None
Other Operations 149451 Get 5xx logs into kibana/logstash In-Scope Done None
Other Operations 168670 Setup maintenance date to reindex gerrit (offline reindex) Screep Done None
Other Operations 166203 Upgrade facter to version 2.4.6 Screep Done None
Other Operations 86655 Decommission svn.wikimedia.org server (import SVN into Phabricator) Screep Done None
Other Operations 138810 Set databases as read-only or switchover to secondary datacenter In-Scope Done None
Other Operations 166237 rack/setup/install labtestvirt2003 Screep Done None
Other Operations 152024 remove/fix "Check for gridmaster host resolution" Icinga check for "labtest" In-Scope Done None
Other Operations 166264 rack/setup/install kubestage100[12] Screep Done None
Other Operations 166310 Grant root access for Bryan Davis on labstore* and admin for maintain scripts for labsdb* Screep Done None
Other Operations 94926 vanadium failed disk /dev/sda Screep Done None
Other Operations 166345 wmf/1.30.0-wmf.2 performance issue for Wikipedias Screep Done None
Other Operations 166372 Puppet: test non stringified facts across the fleet Screep Done None
Other Operations 166389 Error while enabling symlinked units on stretch systemd Screep Done None
Other Operations 166391 Analytics access request Screep Done None
Other Operations 166482 Persistent failure of TMH to transcode videos at specific resolutions Screep Done None
Other Operations 166490 Commons UploadWizard fails after repeated attempt to upload a .FLAC audio file Screep Done None
Other Operations 166524 high replication lag on wdqs1002 Screep Done None
Other Operations 166550 HHVM segfault with mediawiki/core / Scribunto Screep Done None
Other Operations 166552 Switch etcd back to eqiad, document switchover procedure Screep Done None
Other Operations 106477 Reduce rpcbind use Screep Done None
Other Operations 166598 correct label on labtestvirt2002/WMF3810 Screep Done None
Other Operations 166653 ferm broken in stretch Screep Done None
Other Operations 151065 Implement DC-local cache failure limiter in Thumbor In-Scope Done None
Other Operations 166776 Geoshape and geoline subservices need monitoring Screep Done None
Other Operations 166806 Server side upload for Yann Screep Done None
Other Operations 101377 LDAP TLS failing on some instances due to inconsistent state Screep Done None
Other Operations 166962 Degraded RAID on terbium Screep Done None
Other Operations 149993 sync bohrium and apt.wikimedia.org piwik versions In-Scope Done None
Other Operations 169360 Unresponsive/misconfigured iDRACs over the host-BMC interface Screep Open None
Other Operations 10217 Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) In-Scope Open None
Other Operations 17000 Special:Import error: "Import failed: Could not open import file" In-Scope Open None
Other Operations 32716 Run our own Tor client for Tor block In-Scope Open None
Other Operations 36947 Incorrect text positioning in SVG rasterization (scale/transform; font-size; kerning) Screep Open 0.0
Other Operations 40860 security@mediawiki.org : Create a public key and publish it on the public key servers In-Scope Open None
Other Operations 46016 SVG fails to render properly due to several issues In-Scope Open None
Other Operations 46791 [[wikitech:Server_admin_log]] should not rely on freenode irc for logmsgbot entries In-Scope Open None
Other Operations 55457 setup a DB backed parser cache In-Scope Open None
Other Operations 56515 Apply editing rate limits for all users In-Scope Open None
Other Operations 56713 Non-NDA users cannot access graphite.wikimedia.org In-Scope Open None
Other Operations 67270 Default license for operations/puppet In-Scope Open None
Other Operations 67394 [EPIC] Performance testing environment In-Scope Open None
Other Operations 69015 m.wikipedia.org incorrectly redirects to en.m.wikipedia.org In-Scope Open None
Other Operations 71336 VIPS scaled thumbnails don't have a comment with a link to the file description page In-Scope Open None
Other Operations 76203 Make ircecho run as its own user In-Scope Open None
Other Operations 76306 Set warning thresholds for average cluster utilization In-Scope Open None
Other Operations 78342 Create a basic RSpec unit test for operations/puppet In-Scope Open None
Other Operations 83811 ms-be3003 sdk (bay 11) broken In-Scope Open None
Other Operations 84163 Fix CirrusSearch monitoring In-Scope Open None
Other Operations 84700 Setup management switch in OE12 In-Scope Open None
Other Operations 85451 scale graphite deployment (tracking) In-Scope Open None
Other Operations 85459 Kill manifests/realm.pp In-Scope Open None
Other Operations 86081 Complete the use of HHVM over Zend PHP on the Wikimedia cluster In-Scope Open None
Other Operations 86546 graphite-web logs are not rotated In-Scope Open None
Other Operations 86552 monitor and alarm on SMART attributes In-Scope Open None
Other Operations 86784 setup the 2 new esams ms-be systems In-Scope Open None
Other Operations 86971 Decide on /var/lib vs /home as locations of homedir for mwdeploy In-Scope Open None
Other Operations 87220 Minimize differences between beta and production (Tracking) In-Scope Open None
Other Operations 87790 decom amslvs1-4 (dc work) In-Scope Open None
Other Operations 88730 Nutcracker needs to automatically recover from MC failure - rebalancing issues In-Scope Open None
Other Operations 88997 Improve graphite failover In-Scope Open None
Other Operations 89808 wikitech instances list is blank Screep Open None
Other Operations 89829 bond eth interfaces on ms1001 In-Scope Open None
Other Operations 91404 Setup backups of elasticsearch indices In-Scope Open None
Other Operations 91637 Rack ms-be3006 and ms-be3007 In-Scope Open None
Other Operations 92304 Separate citoid service for beta that runs off master instead of deploy In-Scope Open 0.0
Other Operations 92471 enable authenticated access to Cassandra JMX In-Scope Open None
Other Operations 93138 Procure hardware for Sentry In-Scope Open None
Other Operations 93531 secure.wikimedia.org entries still showing up in Google search results In-Scope Open None
Other Operations 94215 decommission cp3001 & cp3002 In-Scope Open None
Other Operations 94277 Convert snapshot hosts to use HHVM and trusty In-Scope Open None
Other Operations 94329 secure Cassandra/RESTBase cluster In-Scope Open None
Other Operations 94819 Audit racktables In-Scope Open None
Other Operations 94951 Enable the usage of `hhvm -m debug --debug-host ::1` from mw1017 so developers can step through code (think gdb) in production to see what is going wrong. In-Scope Open None
Other Operations 95052 Make ircecho much better In-Scope Open None
Other Operations 95053 ircecho should accept input via unix sockets In-Scope Open None
Other Operations 95054 Move ircecho config file to be YAML In-Scope Open None
Other Operations 95801 Allow customizing the alert message from graphite In-Scope Open None
Other Operations 97204 RFC: Request timeouts and retries In-Scope Open None
Other Operations 97524 ocg alarm ocg_job_status_queue 'flapping' In-Scope Open None
Other Operations 97909 Upgrade jobrunners to redis 2.8 In-Scope Open None
Other Operations 98831 Honor DNT header for access logs & varnish logs In-Scope Open None
Other Operations 98984 Check power supply balance settings on cp3030+ In-Scope Open None
Other Operations 100777 expose hosts in maintenance state so we can prevent scap from running on them In-Scope Open None
Other Operations 101141 udp rcvbuferrors and inerrors on graphite1001 In-Scope Open None
Other Operations 101585 document redis upgrade/restart procedures In-Scope Open None
Other Operations 102575 document graphite failover/backfill procedures In-Scope Open None
Other Operations 104258 Create instrumentation to monitor load on geoiplookup.wikimedia.org In-Scope Open None
Other Operations 104352 Make scap able to depool/repool servers via the conftool API In-Scope Open None
Other Operations 105780 Create a doc explaining the SLA between services and the monitoring tool In-Scope Open None
Other Operations 106346 setup an alertable threshold for Cassandra heap dumps In-Scope Open None
Other Operations 106664 Set up role accounts and feedback loops (FBL) with all providers In-Scope Open None
Other Operations 106937 Monitor [[Special:ListFiles]] for non 200 HTTP statuses in thumbnails In-Scope Open None
Other Operations 107108 Flow notification links on mobile point to desktop In-Scope Open None
Other Operations 107267 Backport python-shade from debian/testing to jessie-wikimedia In-Scope Open None
Other Operations 108985 Monitor MediaWiki sessions In-Scope Open None
Other Operations 109089 EPIC: Cultivating the Elasticsearch garden (operational lessons from 1.7.1 upgrade) In-Scope Open None
Other Operations 109090 Investigate the need for master only (non data nodes) in our ES cluster In-Scope Open None
Other Operations 109606 Re-evaluate Limesurvey In-Scope Open None
Other Operations 110169 Monitor redis memory/disk usage In-Scope Open None
Other Operations 110171 Alert when ES indexes are frozen for more than 30 minutes In-Scope Open None
Other Operations 110240 [Discussion] Consider validating JSON schemas when running x-ample tests? In-Scope Open None
Other Operations 111301 pmtpa remnants in trebuchet redis In-Scope Open None
Other Operations 111540 Clean up labs graphite datapoints In-Scope Open None
Other Operations 111595 Do not apply spam headers on email assessed NOT to be spam In-Scope Open None
Other Operations 111838 Some files had disappeared from Commons after renaming In-Scope Open None
Other Operations 111934 Nutcracker stats monitoring should only listen on localhost In-Scope Open None
Other Operations 112257 rename cassandra cluster In-Scope Open None
Other Operations 112648 enable restbase syslog/file logging In-Scope Open None
Other Operations 112774 solve mtp panel issue for row uplinks In-Scope Open None
Other Operations 113104 Set up a service IP for logstash Screep Open None
Other Operations 113733 column family cassandra metrics size In-Scope Open None
Other Operations 113785 Make the Shinken IRC alert and icinga-wm bots use colors In-Scope Open None
Other Operations 113792 Change LDAP cn to something more useful (was Rename "Dzahn" to "Daniel Zahn" in Gerrit) In-Scope Open None
Other Operations 114337 Assign 3 more servers to video scaler duty In-Scope Open None
Other Operations 114446 move human users out of UID range for system accounts In-Scope Open None
Other Operations 114801 operations-apache-config-lint replacement doesn't check syntax In-Scope Open None
Other Operations 114849 Log lines on fluorine overflow at 8092 bytes. Screep Open None
Other Operations 115757 document debian packaging guidelines In-Scope Open None
Other Operations 115899 Move scap target configuration to etcd In-Scope Open None
Other Operations 115983 Old salt grains not removed if a role changes In-Scope Open None
Other Operations 116063 Hardware Automation Workflow - Overall Tracking In-Scope Open None
Other Operations 116288 Install mailman-api for internal use In-Scope Open None
Other Operations 116580 monitor postgresql replication status In-Scope Open None
Other Operations 116627 Include 5xx numbers in fluorine fatalmonitor In-Scope Open None
Other Operations 116742 Track amount of package updates on systems In-Scope Open None
Other Operations 116747 Meta task "Revamp user authentication" In-Scope Open None
Other Operations 116750 2FA for SSH access to the production cluster In-Scope Open None
Other Operations 116767 limit the impact of heavy/large graphite queries In-Scope Open None
Other Operations 116805 DomainKeys Identified Mail (DKIM) for phabricator.wikimedia.org In-Scope Open None
Other Operations 116951 Reprepro should bail if it can't read and sign using the root keys In-Scope Open None
Other Operations 117508 Make ops-l a list for humans again (no cheating) In-Scope Open None
Other Operations 117673 labs precise and jessie instance not accessible after provisioning In-Scope Open None
Other Operations 117826 TEST: redirect small portion of unauthenticated desktop users to mobile web In-Scope Open None
Other Operations 118154 determine hardware needs for dumps in eqiad and codfw In-Scope Open None
Other Operations 118331 Alert when used_memory gets too high for redis queues In-Scope Open None
Other Operations 118380 slow salt-call invocation on minions In-Scope Open None
Other Operations 118746 Goal: Strengthen Incident monitoring infrastructure In-Scope Open None
Other Operations 118812 Investigate mysterious_sysctl settings and figure out what to do with them In-Scope Open None
Other Operations 118829 Automate the provisioning and management of MediaWiki clusters In-Scope Open None
Other Operations 119274 Check incoming requests to secure.wm.o In-Scope Open None
Other Operations 119401 Untangle labs/production roles from labs/instance roles In-Scope Open None
Other Operations 119541 Self hosted puppetmaster is broken In-Scope Open None
Other Operations 119660 Set up LVS for labs dns recursors In-Scope Open None
Other Operations 119679 Rewrite http://download.wikimedia.org/mediawiki/ -> https://releases.wikimedia.org/mediawiki in less than 3 redirects In-Scope Open None
Other Operations 119718 Make it easier to ban misbehaving dashboards from graphite In-Scope Open None
Other Operations 119719 Enforce a minimum refresh period for grafana dashboards hitting graphite In-Scope Open None
Other Operations 119846 Redirect revisions from svn.wikimedia.org to https://phabricator.wikimedia.org/rSVN In-Scope Open None
Other Operations 120159 Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module In-Scope Open None
Other Operations 120165 Implement role based hiera lookups for labs In-Scope Open None
Other Operations 120377 labmon1001 graphite instance archiver keeps archiving the same instances In-Scope Open None
Other Operations 120379 restbase unable to start after machine reimage In-Scope Open None
Other Operations 120532 Use user-specific passwords for accessing EventLogging database In-Scope Open None
Other Operations 120585 Make l10nupdate user a system user In-Scope Open None
Other Operations 120631 Security: Is it safe to enable Zero spoofing In-Scope Open None
Other Operations 120683 logrotate/disk space on silver for nutcracker log In-Scope Open None
Other Operations 120856 Remove all out of warranty unused cp10xx's from A2 In-Scope Open None
Other Operations 121105 Mails from MediaWiki seem to get (partially) lost In-Scope Open None
Other Operations 121240 Network isolation for production and semi-production services In-Scope Open None
Other Operations 121583 setup/deploy tegmen/WMF6381 as monitoring host In-Scope Open None
Other Operations 121610 system users with UIDs > 500 In-Scope Open None
Other Operations 122069 jobrunner memory leaks In-Scope Open None
Other Operations 122144 Move most (all?) exim personal aliases to OIT In-Scope Open None
Other Operations 122210 Security audit for tftp on Carbon In-Scope Open None
Other Operations 122825 Service Ownership and Maintenance In-Scope Open None
Other Operations 122917 Provide a good download service of dumps from Wikimedia In-Scope Open None
Other Operations 123106 PNG thumbnail preview of SVG misses some text In-Scope Open None
Other Operations 123237 Provide production jessie image with node 4.2; use this for service-runner build command In-Scope Open None
Other Operations 123276 https://test.wikipedia.org/wiki/Bug%3F?action=history doesn't show the history page, unlike https://test.wikipedia.org/w/index.php?title=Bug%3F&action=history Screep Open None
Other Operations 123560 investigate rsync between dcs with encryption In-Scope Open None
Other Operations 123809 Module uwsgi doesn't allow passing multiple config params of same name In-Scope Open None
Other Operations 123818 setup YubiHSM and laptop at office In-Scope Open None
Other Operations 123918 'swift' user/group IDs should be consistent across the fleet In-Scope Open None
Other Operations 124101 Specific revisions of multiple files missing from Swift - 404 Not Found returned In-Scope Open None
Other Operations 124179 Improve access to and control over incident and metrics monitoring infrastructure In-Scope Open None
Other Operations 124185 Evaluate alternative web interfaces to icinga 1 core In-Scope Open None
Other Operations 124413 confctl should provide tags information after writing data In-Scope Open None
Other Operations 124646 Salt minions randomly crashing when the deployment server grain gets changed In-Scope Open None
Other Operations 124991 evaluate possibility for nscd use with useldap In-Scope Open None
Other Operations 125015 Requests to (hard) redirect pages return their target's contents but are counted as pageviews to the redirect page In-Scope Open None
Other Operations 125085 Split the API MediaWiki appserver pool into two external/internal pools In-Scope Open None
Other Operations 125205 Monitor hardware thermal issues In-Scope Open None
Other Operations 125411 Diamond load averages do not contain scaled versions In-Scope Open None
Other Operations 125442 es2009 degraded RAID In-Scope Open None
Other Operations 125629 Depool proxies temporarily while scap is ongoing to avoid taxing those nodes In-Scope Open None
Other Operations 125735 Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out In-Scope Open None
Other Operations 125752 setup/deploy sarin(WMF5851) as a salt master in codfw In-Scope Open None
Other Operations 125906 Evaluate Brotli compression for Cassandra In-Scope Open None
Other Operations 126083 overhaul labstore setup [tracking] In-Scope Open None
Other Operations 126158 [RFC] Alert about *when* partitions will run out of space, not a percentage/absolute number In-Scope Open None
Other Operations 126221 Evaluate efficacy of DateTieredCompactionStrategy In-Scope Open None
Other Operations 126281 [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") In-Scope Open None
Other Operations 126295 Spike: What do we have to package to run the Programs and Events dashboard on production? In-Scope Open None
Other Operations 126574 puppet should try to mount all mountable swift filesystems In-Scope Open None
Other Operations 126619 cassandra slow streaming during (de)commission In-Scope Open None
Other Operations 126989 MediaWiki logging & encryption In-Scope Open None
Other Operations 127054 pinentry-gtk2 pulls in a lot of unneeded Gnome/GTK libs In-Scope Open None
Other Operations 127488 (re)move problemsdonating aliases In-Scope Open None
Other Operations 127549 move travel related aliases to OIT In-Scope Open None
Other Operations 127550 status of studentgroups@ and studentclubs@ mail aliases? In-Scope Open None
Other Operations 127602 Reboot during puppet run causes /var/lib/puppet/state/agent_catalog_run.lock to be left and puppet to not start running again In-Scope Open None
Other Operations 127797 document all puppet classes / defined types!? In-Scope Open None
Other Operations 127825 Re-add intel-microcode In-Scope Open None
Other Operations 128590 Cassandra uses default ip address for outbound packets while bootstrapping In-Scope Open None
Other Operations 128615 Get rid of Tool Labs home page check from shinken In-Scope Open None
Other Operations 128715 Add other Tools administrators to the Icinga notification group In-Scope Open None
Other Operations 128716 Make icinga-wm report Tools homepage check at #wikimedia-labs, too In-Scope Open None
Other Operations 129180 Preserve SSH host key when re-imaging hosts In-Scope Open None
Other Operations 129188 mw2212 unresponsive In-Scope Open None
Other Operations 129222 Icinga disk space check should also check inode usage In-Scope Open None
Other Operations 129224 on labcontrol1001, /var/cache/salt has too many files! In-Scope Open None
Other Operations 129621 "internal_api_error_MWException: [dbf916b7] Exception Caught: Could not acquire lock for" for some uploads (during upload with Pywikibot OAuth) In-Scope Open None
Other Operations 129645 Remove sca100x from the list of Mathoid's minions In-Scope Open None
Other Operations 129841 Many minions fail to connect to salt master since 10:39 In-Scope Open None
Other Operations 129847 conftool-merge should report which node is setting attributes for In-Scope Open None
Other Operations 129963 Update memcached package and configuration options In-Scope Open None
Other Operations 130209 Collect threaddumps from elasticsearch at regular intervals In-Scope Open None
Other Operations 130512 Grafana: Job Queue Health: Panel is displayed incorrectly In-Scope Open None
Other Operations 130590 Have dedicated master nodes for elasticsearch In-Scope Open None
Other Operations 130593 investigate slapd memory leak In-Scope Open None
Other Operations 130617 Collect metrics on pool counter usage In-Scope Open None
Other Operations 130709 authoritative copy of 'root' files for upload.wikimedia.org is only in swift In-Scope Open None
Other Operations 131326 smokeping config puppetization issue? In-Scope Open None
Other Operations 131748 Refresh the appservers puppet code/configs In-Scope Open None
Other Operations 131832 Unable to restore file that has a very large file size In-Scope Open None
Other Operations 131928 Upgrade jessie systems from Linux 3.19 to 4.4 In-Scope Open None
Other Operations 131966 Default gateway unreachable on baham.wikimedia.org after reboot In-Scope Open None
Other Operations 132104 Consider moving policy.wikimedia.org away from WordPress.com In-Scope Open None
Other Operations 132216 Setting up bulk proxies pointing to a multiwiki mediawiki-vagrant setup running on a labs vm In-Scope Open None
Other Operations 132256 Analytics hosts showed high temperature alarms In-Scope Open None
Other Operations 132324 Tracking and Reducing cron-spam from root@ In-Scope Open None
Other Operations 132325 Weak digest algorithm (SHA1) used to sign InRelease on apt.wikimedia.org In-Scope Open None
Other Operations 132532 rsync module doesn't work on trusty In-Scope Open None
Other Operations 132856 Write documentation on how / when to use custom Diamond metrics collectors In-Scope Open None
Other Operations 132921 Unable to delete file pages on commons: MWException/LocalFileLockError: "Could not acquire lock" In-Scope Open None
Other Operations 133091 Highest SSTables / read thresholds In-Scope Open None
Other Operations 133093 Investigate idle appservers in codfw In-Scope Open None
Other Operations 133110 Check for an oversized exim4 queue indicating mail delivery failures In-Scope Open None
Other Operations 133164 Document eqiad/codfw transition plan for OCG In-Scope Open None
Other Operations 133179 Redis monitoring needs to be improved In-Scope Open None
Other Operations 133191 Make SPF for wikimedia.org more strict In-Scope Open None
Other Operations 133318 High levels of PoolCounter errors should trigger alerts In-Scope Open None
Other Operations 133392 save grafana dashboards in revision control / puppet In-Scope Open None
Other Operations 133476 Proposal: Centralize OTRS login methodology In-Scope Open None
Other Operations 133643 Upstream our Diamond PowerDNSRecursorCollector In-Scope Open None
Other Operations 133656 Have a paging check for Nova API accessible In-Scope Open None
Other Operations 133674 HHVM is leaking memory on the API appservers In-Scope Open None
Other Operations 133696 NIH db misbehaviour causing problems to Citoid In-Scope Open 1.0
Other Operations 133744 Epic: switch Maps to production status In-Scope Open None
Other Operations 133844 Improve Elasticsearch icinga alerting In-Scope Open None
Other Operations 133913 Completely port l10nupdate to scap In-Scope Open None
Other Operations 133979 puppet compiler error on catalog with non-ascii output In-Scope Open None
Other Operations 134237 Graphoid returns a 400 on MW API time-out In-Scope Open None
Other Operations 134271 Replace ircd-ratbox with something newer/maintained In-Scope Open None
Other Operations 134326 udpmxircecho should write stats of messages processed and we should alert when that drops to zero In-Scope Open None
Other Operations 134458 status.wikimedia.org should use some Wikimedia favicon if possible In-Scope Open None
Other Operations 134551 Create functional cluster checks for all services (and have them page!) In-Scope Open None
Other Operations 134811 Consider REST with SSL (HyperSwitch/Cassandra) for session storage In-Scope Open None
Other Operations 134875 udpmxircecho spam/not working if unable to connect to irc server In-Scope Open None
Other Operations 134890 check status of multiple systemd units In-Scope Open None
Other Operations 135113 Rationalize our jobqueues redis topology In-Scope Open None
Other Operations 135122 Reduce etcd technical debt In-Scope Open None
Other Operations 135124 Deploy etcddump (or another etcd dump & load tool) to production In-Scope Open None
Other Operations 135125 Install a second etcd cluster in codfw In-Scope Open None
Other Operations 135128 Turn on etcd TLS for intra-cluster communications In-Scope Open None
Other Operations 135318 Document how to handle 'inconsistent state within the internal storage backends' issues In-Scope Open None
Other Operations 135338 On Trusty and Jessie PHP yields: PHP Deprecated: Comments starting with '#' are deprecated in /etc/php5/cli/conf.d/20-xhprof.ini on line 2 In-Scope Open None
Other Operations 135385 investigate carbon-c-relay stalls/drops towards graphite2002 In-Scope Open None
Other Operations 135595 mod_deflate + mod_uwsgi causing mangled apache responses In-Scope Open None
Other Operations 135723 Restarts of ganglia-monitor are unreliable In-Scope Open None
Other Operations 135991 Automated service restarts for common low-level system services In-Scope Open None
Other Operations 136094 Race condition in setting net.netfilter.nf_conntrack_tcp_timeout_time_wait In-Scope Open None
Other Operations 136311 Monitor the BMC's event log for hardware errors In-Scope Open None
Other Operations 136312 encrypt syslog traffic In-Scope Open None
Other Operations 136403 Move cp3030+ from OE14 to OE13 in racktables In-Scope Open None
Other Operations 136429 [EPIC] Migrate base image to Debian Jessie In-Scope Open None
Other Operations 136562 Audit/fix hosts with no RAID configured In-Scope Open None
Other Operations 136603 Update limit.sh to support systemd-based cgroup management In-Scope Open None
Other Operations 136702 Increase time before alert for elasticsearch disk space issues In-Scope Open None
Other Operations 137176 catch-all apache vhost on the cluster should return 404 for non-existing sites In-Scope Open None
Other Operations 137181 Update restbase catchpoint metric In-Scope Open None
Other Operations 137217 Clean up apt:pin of python modules used for Nodepool In-Scope Open None
Other Operations 137229 Tune thread for osm2pgsql / postgres max connections for Maps In-Scope Open None
Other Operations 137397 revisit swift (sys)logging In-Scope Open None
Other Operations 137616 Epic: cultivating the Maps garden In-Scope Open None
Other Operations 137791 libcglib3-java replaces libcglib-java in Jessie In-Scope Open None
Other Operations 137939 Increase frequency of OSM replication In-Scope Open None
Other Operations 138017 Improve automation around Maps servers In-Scope Open None
Other Operations 138136 MB Lateefi Fonts for Sindhi Wikipedia. In-Scope Open None
Other Operations 138314 mobileapps 500s following reboot of restbase1007 In-Scope Open None
Other Operations 138496 bring swift eqiad to one zone per row In-Scope Open None
Other Operations 138685 notebook1001 shown as DOWN in icinga, due to firewall rules In-Scope Open None
Other Operations 138758 diamond: certain counters always calculated as 0 In-Scope Open None
Other Operations 138799 Create a simple puppet role for setting up a singlenode kubernetes install In-Scope Open None
Other Operations 138821 extend existing graphite whisper files retention to five years In-Scope Open None
Other Operations 138866 Update & standardize Platform-specific_documentation for HP servers In-Scope Open None
Other Operations 139971 access_new_install role vs. Labs vs. the future In-Scope Open None
Other Operations 140075 investigate swift used space spikes since June 2016 In-Scope Open None
Other Operations 140141 Install mscorefonts on scaling servers for SVG rendering In-Scope Open None
Other Operations 140270 Determine a core set or a checklist of permissions for deployment purpose In-Scope Open None
Other Operations 140316 Add granularity limiter (g=) to wikimedia.org DKIM record(s) In-Scope Open None
Other Operations 140441 reinstall rcs100[12] with RAID In-Scope Open None
Other Operations 140442 reinstall rdb100[56] with RAID In-Scope Open None
Other Operations 140536 Thumbnails of some specific images show unwanted black lines In-Scope Open None
Other Operations 140594 svn.wikimedia.org redirects to Diffusion main page, hence hard to find e.g. "flexbisonparse" In-Scope Open None
Other Operations 140813 Protect sensitive user-related information with a UserData / auth / session service In-Scope Open None
Other Operations 140879 503 error raises again while trying to load a Wikidata page In-Scope Open None
Other Operations 140942 Tracking: Monitoring and alerts for "business" metrics In-Scope Open None
Other Operations 141038 implement icinga paging for non-ops teams In-Scope Open None
Other Operations 141128 determine/process/document bios firmware tracking/updating policies In-Scope Open None
Other Operations 141186 Map caches metrics look broken In-Scope Open None
Other Operations 141520 "MediaWiki exceptions and fatals per minute" alarm is too slow (half an hour delay!) In-Scope Open None
Other Operations 141524 eventbus should send statsd in batches In-Scope Open None
Other Operations 141704 Storage backend errors on commons when deleting/restoring pages In-Scope Open None
Other Operations 141756 audit / test / upgrade hp smartarray P840 firmware In-Scope Open None
Other Operations 141783 Add monitoring for detecting when logstash services are down In-Scope Open None
Other Operations 141897 Review new service 'pre-deployment to production' checklist In-Scope Open None
Other Operations 141959 Moving network::external to hiera broke much of labs In-Scope Open None
Other Operations 142002 Clean up puppet & configs for ORES In-Scope Open None
Other Operations 142205 use granularity (g=) restrictions for wikimedia.org fundraising DKIM records In-Scope Open None
Other Operations 142815 Enhance account handling (meta bug) In-Scope Open None
Other Operations 142821 Synchronise groups defined in data.yaml to LDAP In-Scope Open None
Other Operations 142827 Enforce reference to Phabricator task for all commits to modules/admin/data/data.yaml In-Scope Open None
Other Operations 142984 Review lists of config/sysctl recommendations by "kernel self-protection project" In-Scope Open None
Other Operations 142991 Enable "upload by url" feature at zhwiki In-Scope Open None
Other Operations 143405 Move labs 'instances' data to graphite labs In-Scope Open None
Other Operations 143552 Make elasticsearch configuration more robust to loss of network connectivity In-Scope Open None
Other Operations 143556 Setting up grafana should also setup Anonymous read-only access for the default org In-Scope Open None
Other Operations 143695 Upgrade mc1* cluster to Linux 4.4 In-Scope Open None
Other Operations 143931 Update ICU version to 55.1 In-Scope Open None
Other Operations 144006 Move the MW Beta appservers to Debian In-Scope Open None
Other Operations 144431 RESTBase k-r-v as Cassandra anti-pattern In-Scope Open None
Other Operations 144479 Ensure thumbor container access is preserved by mw filebackend setzoneaccess In-Scope Open None
Other Operations 144539 Remove /srv/deployment/wdqs/wdqs/rules.log symlink In-Scope Open None
Other Operations 144933 Cleanup debconf handling in mailman puppet setup In-Scope Open None
Other Operations 145065 Decrease time required to fully restart the Cirrus elasticsearch clusters In-Scope Open None
Other Operations 145293 E-mails not being received by OTRS In-Scope Open None
Other Operations 145659 Port application-specific metrics from ganglia to prometheus In-Scope Open None
Other Operations 145742 Migrate video scalers to jessie In-Scope Open None
Other Operations 146090 High failure rate of account creation should trigger an alarm / page people In-Scope Open None
Other Operations 146113 wtp2019 - hardware (RAM) check In-Scope Open None
Other Operations 146285 Switch mwscript from Zend PHP5 to default php alternative (e.g. HHVM) In-Scope Open None
Other Operations 146355 Replace etcd internal auth mechanism with a frontend proxy In-Scope Open None
Other Operations 146627 Make deployment-prep puppetmaster more similar to Production puppetmaster In-Scope Open None
Other Operations 146657 create notifications about user accounts that have not been used for a long time In-Scope Open None
Other Operations 146664 Limit resources used by ORES In-Scope Open None
Other Operations 146841 Reach out to Google about @yahoo.com emails not reaching gmail inboxes (when sent to mailing lists) In-Scope Open None
Other Operations 146914 grain-ensure erroneous mismatch with (bool)True vs (str)true In-Scope Open None
Other Operations 146968 OTRS spam classification methods and systems In-Scope Open None
Other Operations 147040 Two recently uploaded files have disappeared (404) In-Scope Open None
Other Operations 147204 Update confd package In-Scope Open None
Other Operations 147872 Rename rhodium to puppetmaster1003 In-Scope Open None
Other Operations 147905 investigate lead hardware issue In-Scope Open None
Other Operations 147923 Extract metrics from logs In-Scope Open None
Other Operations 148017 lvs2002 repeated usb connect/disconnect message In-Scope Open None
Other Operations 148048 Store Wikimedia unified account name (SUL) in LDAP directory In-Scope Open None
Other Operations 148061 Feasibility of hosting podcast setup on Wikimedia servers In-Scope Open None
Other Operations 148541 Replace Torrus with Prometheus snmp_exporter for PDUs monitoring In-Scope Open None
Other Operations 148567 Restrict outgoing network connections from Electron render service In-Scope Open None
Other Operations 148614 Icinga check for Tor In-Scope Open None
Other Operations 148637 Port redis statistics from ganglia to prometheus In-Scope Open None
Other Operations 148647 refresh swift hardware in codfw/eqiad In-Scope Open None
Other Operations 148693 Deploy IDS rendering engine to production In-Scope Open None
Other Operations 148843 GPU upgrade for stats machine In-Scope Open None
Other Operations 148968 Build Kubernetes for production use In-Scope Open None
Other Operations 148986 Firewall sets not being loaded post-reboot due to a @resolve race on jessie In-Scope Open None
Other Operations 149057 Designate seems very slow to delete records? In-Scope Open None
Other Operations 149180 Trebuchet targets for test/testrepo are out of date In-Scope Open None
Other Operations 149287 Heating alerts for mw servers in eqiad In-Scope Open None
Other Operations 149298 Reimage/rename codfw pool counters In-Scope Open None
Other Operations 149421 Long running mediawiki web requests impact service availability, especially databases In-Scope Open None
Other Operations 149432 puppet compiler claims "no change" when catalogs are actually different In-Scope Open None
Other Operations 149543 Setup PAWS internal experimentally on notebook* nodes In-Scope Open None
Other Operations 149589 Puppet tab in Horizon unusably slow In-Scope Open None
Other Operations 149617 Integrating MediaWiki (and other services) with dynamic configuration In-Scope Open None
Other Operations 149804 Review of ferm services without srange In-Scope Open None
Other Operations 149812 Set up docker building environment for production In-Scope Open None
Other Operations 149845 Something is wrong with installer root disk stuff In-Scope Open None
Other Operations 149885 Investigate Swift as a storage backend for maps tiles In-Scope Open None
Other Operations 150020 Refactor puppet-postgresql module to use custom types In-Scope Open None
Other Operations 150108 fix partition scheme for logstash ingester hosts In-Scope Open None
Other Operations 150160 Remote IPMI doesn't work for ~2% of the fleet In-Scope Open None
Other Operations 150185 Deploy ElectronPdfService Extension to production In-Scope Open None
Other Operations 150206 ms-be1016 controller cache failure In-Scope Open None
Other Operations 150300 icinga notification if elevated writing to badpass.log In-Scope Open None
Other Operations 150356 Wikidata Query Service is overly verbose toward logstash In-Scope Open None
Other Operations 150396 Phabricator leaving old files in /tmp In-Scope Open None
Other Operations 150456 puppet compiler fails with modules using puppetdb In-Scope Open None
Other Operations 150460 Configure maps cluster to send statsd metrics to the statsd endpoint in the same datacenter In-Scope Open None
Other Operations 150466 publish kartotherian / tilerator metrics by cluster In-Scope Open None
Other Operations 150486 Deploy federation for Prometheus In-Scope Open None
Other Operations 150532 Upgrade qemu on ganeti clusters to 2.7 In-Scope Open None
Other Operations 150651 Information missing from racktables In-Scope Open None
Other Operations 150672 Provide a /parsoid directory on releases.wikimedia.org In-Scope Open None
Other Operations 150771 Secondary production Jenkins for CI In-Scope Open None
Other Operations 150811 Evaluate ScyllaDB as a near-term replacement to Cassandra In-Scope Open None
Other Operations 150822 Internal PKI for secure communication - Barcelona Ops offsite 2016 In-Scope Open None
Other Operations 150823 Puppet CA rollover In-Scope Open None
Other Operations 150871 [EPIC] (Proposal) Replicate core OCG features and sunset OCG service In-Scope Open None
Other Operations 150872 Replace OCG in collection extension with Electron In-Scope Open None
Other Operations 150874 Collate wikimedia pages into a single html wikimedia page that can then be rendered into a single pdf In-Scope Open None
Other Operations 150875 Confirm attribution needs In-Scope Open None
Other Operations 150912 Class 'Memcached' not found when running mwscript eval.php on debug servers In-Scope Open None
Other Operations 150917 Remove deprecated features from book creator UI In-Scope Open None
Other Operations 151009 Provide authenticated access to Prometheus native web interface In-Scope Open None
Other Operations 151045 Extending Yubico 2FA for production use (meta bug) In-Scope Open None
Other Operations 151046 Fully puppetise yubikey-val In-Scope Open None
Other Operations 151047 Integrate Yubikey into data.yaml In-Scope Open None
Other Operations 151048 Icinga monitoring for Yubikey components In-Scope Open None
Other Operations 151049 Run systematic availability tests In-Scope Open None
Other Operations 151050 Proper documentation for Yubico 2FA for production use In-Scope Open None
Other Operations 151075 setup/install restbase-dev100[123] In-Scope Open None
Other Operations 151273 lvs4002 power supply failure In-Scope Open None
Other Operations 151275 cp4008 and cp4012 running on single PSU In-Scope Open None
Other Operations 151304 tmpreaper possible race condition In-Scope Open None
Other Operations 151310 create-dbusers service failing on labstore1004 In-Scope Open None
Other Operations 151314 logrotate failing with $FILE.1.gz: File exists In-Scope Open None
Other Operations 151317 stat user crontab on stat hosts for old file removal In-Scope Open None
Other Operations 151322 labstore systemd state Icinga alarms In-Scope Open None
Other Operations 151486 Silver anomalies In-Scope Open None
Other Operations 151489 silver: /dev/md2 mounted twice In-Scope Open None
Other Operations 151493 silver: / partition low on space In-Scope Open None
Other Operations 151554 Track incoming HTTP request count on the Thumbor boxes In-Scope Open None
Other Operations 151632 Fix Icinga checks for test/decom servers In-Scope Open None
Other Operations 151648 Implement storage policies for swift In-Scope Open None
Other Operations 151702 API cluster failure / OOM In-Scope Open None
Other Operations 151748 Cron conflict for kafkatee logrotate on oxygen In-Scope Open None
Other Operations 152073 Check concurrency/retry/timeout limits and synchronize those between services In-Scope Open None
Other Operations 152078 Load balancing "external" traffic to the Kubernetes cluster in production In-Scope Open None
Other Operations 152100 should we make privatewiki list available to puppet without maintaining two lists? In-Scope Open None
Other Operations 152129 reinstall iridium (phabricator) as phab1001 with jessie In-Scope Open None
Other Operations 152340 labservices1001 down, suspected overheating In-Scope Open None
Other Operations 152439 cronspam from labtestservices2001 /etc/dns-floating-ip-updater.py > /dev/null In-Scope Open None
Other Operations 152445 Move prometheus entry point off port 80 In-Scope Open None
Other Operations 152562 Port fundraising stats off Ganglia In-Scope Open None
Other Operations 152632 Explore hosting the multimedia commons use case In-Scope Open None
Other Operations 152724 Current state and next steps for RESTBase storage In-Scope Open None
Other Operations 152767 Missing Labs hiera entry in labs-private repo In-Scope Open None
Other Operations 152782 Kibana functionality missing after upgrade: histograms In-Scope Open None
Other Operations 152791 Improvements to Ganglia-equivalent Prometheus dashboards In-Scope Open None
Other Operations 153068 Consider mounting labs NFS labstore1003.eqiad.wmnet:/scratch for server-side uploads In-Scope Open None
Other Operations 153083 Investigate I/O limits on elasticsearch servers In-Scope Open None
Other Operations 153099 Initial OpenStack Neutron PoC deployment in Labtest In-Scope Open None
Other Operations 153246 Puppet failures with "Attempt to assign to a reserved variable name: 'trusted'" In-Scope Open None
Other Operations 153248 restbase-test100[13] lost power redundancy In-Scope Open None
Other Operations 153279 labnet/ labtestnet2001 - disk space - nova-api.log needs rotation In-Scope Open None
Other Operations 153416 docker-engine pulled into our repositories only keeps the latest version In-Scope Open None
Other Operations 153703 Option: Consider switching back to leveled compaction (LCS) In-Scope Open None
Other Operations 153770 Error on RCStream server startup for the "flash policy server" In-Scope Open None
Other Operations 153772 Upstream prematurely closed connection In-Scope Open None
Other Operations 153773 rcstream service - gevent dependency incompatibility In-Scope Open None
Other Operations 153816 apache::static_site is not working In-Scope Open None
Other Operations 153940 Logrotate fails for: "$FILE No such file or directory" In-Scope Open None
Other Operations 154627 Production error message (when servers are down) points users to donate link which is likely to produce the same error message In-Scope Open None
Other Operations 154658 Prepare and improve the datacenter switchover procedure In-Scope Open None
Other Operations 154665 Look into behaviour of /etc/exim4/update-exim4.conf.conf related to updates In-Scope Open None
Other Operations 154915 Get rid of "import realm.pp" in manifests/site.pp In-Scope Open None
Other Operations 155129 Create prometheus nutcracker exporter In-Scope Open None
Other Operations 155209 Increase $wgHTTPImportTimeout to a higher value on WMF wikis In-Scope Open None
Other Operations 155401 Integrate jessie 8.7 point release In-Scope Open None
Other Operations 155690 troubleshoot drac on ms-be2010.codfw.wmnet In-Scope Open None
Other Operations 155761 DNS repo: add Jenkins job to ensure there are no duplicates In-Scope Open None
Other Operations 155869 Fix permissions for systemd file In-Scope Open None
Other Operations 155929 Create /community-beacon alternative entry point In-Scope Open None
Other Operations 156136 Increase swift replication factor for accounts In-Scope Open None
Other Operations 156140 Lots of hosts with hyperthreading disabled In-Scope Open None
Other Operations 156143 High CPU usage from swift-proxy on frontend machines In-Scope Open None
Other Operations 156232 confctl SubjectAltNameWarning after python-urllib3 upgrade In-Scope Open None
Other Operations 156398 Decommission or repair old asw-c2-eqiad In-Scope Open None
Other Operations 156475 Investigate spike in 500s during asw-c2-eqiad replacement In-Scope Open None
Other Operations 156495 Upgrade nginx on notebook* servers In-Scope Open None
Other Operations 156544 Create backups of Wikimedia content in diverse geographic places In-Scope Open None
Other Operations 156570 Investigate issues with wikitech-static.wikimedia.org In-Scope Open None
Other Operations 156924 Allow integration of data from etcd into the MediaWiki configuration In-Scope Open None
Other Operations 156937 Provide cross-dc redundancy (active-active or active-passive) to all important misc services In-Scope Open None
Other Operations 156955 Standardizing our partman recipes In-Scope Open None
Other Operations 156982 Cleanup tools nfs share on labstore1004/5 In-Scope Open None
Other Operations 157038 Make it possible to run the mediawiki testsuite against a staging repo of apt.wikimedia.org In-Scope Open None
Other Operations 157089 Add storage to Change-Prop for deduplication In-Scope Open None
Other Operations 157306 Fix config file handling for /etc/hhvm/php.ini In-Scope Open None
Other Operations 157496 A few hosts never get clean puppet compiler runs In-Scope Open None
Other Operations 157761 use htpasswd instead of htdigest for arbcom archive passwords In-Scope Open None
Other Operations 157807 Reinstall Analytics Hadoop Cluster with Debian Jessie In-Scope Open 0.0
Other Operations 157853 Replace nrpe 2.15 (& evaluate alternatives) In-Scope Open None
Other Operations 157972 Puppet fails only once when restarting ferm is not successful In-Scope Open None
Other Operations 158022 make apt.wikimedia.org HA In-Scope Open None
Other Operations 158029 luasandbox profiler doesn't sort results due to an HHVM bug In-Scope Open None
Other Operations 158196 Reimage labstore1001 and labstore1002 for DRBD storage setup In-Scope Open None
Other Operations 158288 Unclean stop of jobrunner service via puppet In-Scope Open None
Other Operations 158429 Switch to predictable network interface names? In-Scope Open None
Other Operations 158434 Phabricator: Make sure phabricator works properly including our puppet roles on jessie In-Scope Open None
Other Operations 158562 Manage apt sources via puppet? In-Scope Open None
Other Operations 158583 Restructure our internal repositories further In-Scope Open None
Other Operations 158757 Puppet certificate missing subjectAltName In-Scope Open None
Other Operations 158837 Consolidate performance website and related software In-Scope Open None
Other Operations 158913 Move labstore1002 and labstore1002-array1 and labstore1002-array2 to different rack (currently in C3) In-Scope Open None
Other Operations 158915 Make sure replying to emails in gerrit 2.14 works In-Scope Open None
Other Operations 159242 Segmentation fault creating thumbnail In-Scope Open None
Other Operations 159354 Move coal from graphite machine(s) In-Scope Open None
Other Operations 159524 backup space is used unwisely In-Scope Open None
Other Operations 159536 Puppet constantly trying to stop the already stopped puppetmaster process on Trusty In-Scope Open None
Other Operations 159661 Improve Terbium (and wasat) userland to process server side uploads In-Scope Open None
Other Operations 159687 etcd switchover/enhancements In-Scope Open None
Other Operations 159750 E-mail for people in different OIT LDAP object unit In-Scope Open None
Other Operations 159756 setup netmon1002.wikimedia.org In-Scope Open None
Other Operations 159830 Sanity check global-multiwrite logs for ConfirmEdit usage In-Scope Open None
Other Operations 159922 pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 In-Scope Open None
Other Operations 160060 Icinga check for sysctl settings In-Scope Open None
Other Operations 160071 Add slabinfo prometheus exporter In-Scope Open None
Other Operations 160097 audit spare disk levels for codfw & eqiad utilized storage in servers In-Scope Open None
Other Operations 160101 Upgrade php5-json .deb to at least 1.3.8 In-Scope Open None
Other Operations 160146 jobrunner/jobchron services fail in codfw In-Scope Open None
Other Operations 160158 Make disabled accounts visible in the corp mirror LDAP replica In-Scope Open None
Other Operations 160229 Back up of Commons files In-Scope Open None
Other Operations 160412 Add lock_wait_timeout to maintain_views and maintain-meta_p In-Scope Open None
Other Operations 160529 Sender email spoofing In-Scope Open None
Other Operations 160677 Effects on adjusting Prometheus retention In-Scope Open None
Other Operations 160941 Improve SSH access information in onboarding documentation In-Scope Open None
Other Operations 161003 Cross-check disabled accounts from corp LDAP against data.yaml In-Scope Open None
Other Operations 161004 Remove disabled users from internal mailing lists In-Scope Open None
Other Operations 161096 confctl no longer logs a non-changing state change In-Scope Open None
Other Operations 161145 Fix the general problem of randomly-bad puppet agent cron timings within redundant clusters In-Scope Open None
Other Operations 161296 Upgrade mysqld_exporter to 0.10.0 In-Scope Open None
Other Operations 161528 incident 20170323-wikibase did not trigger Icinga paging In-Scope Open None
Other Operations 161566 add support to offboard-user to support mailman list removal In-Scope Open None
Other Operations 161598 Monitor HHVM bytecode cache depletion on mediawiki app servers In-Scope Open None
Other Operations 161717 Point swiftrepl to swift HTTPS In-Scope Open None
Other Operations 161834 Undo special tools-home and tools-project share definitions for NFS In-Scope Open None
Other Operations 161835 Convert labstore cluster configuration to hiera and profiles In-Scope Open None
Other Operations 161864 404 error while accessing some image files, e.g. djvu and jpg In-Scope Open None
Other Operations 161899 Investigate ceasing self-service new Trusty instance creation in Labs In-Scope Open None
Other Operations 161904 decommission backup4001 In-Scope Open None
Other Operations 161918 videoscalers (mw1168, mw1169) - high load / overheating In-Scope Open None
Other Operations 161920 logrotate for ruthenium In-Scope Open None
Other Operations 162013 etcd cluster in codfw has raft consensus issues Screep Open None
Other Operations 162029 Migrate all jessie hosts to Linux 4.9 Screep Open None
Other Operations 162037 Use SSL certificates with discovery entry for elasticsearch Screep Open None
Other Operations 162039 Prepare to service applications from kubernetes Screep Open None
Other Operations 162043 Define a process to keep images up-to-date on similar standards as the rest of production Screep Open None
Other Operations 162090 Investigate alternative RAID strategies for labstore1001/2 Screep Open None
Other Operations 162122 Swiftrepl was stuck in an infinite loop for days Screep Open None
Other Operations 162123 Running swiftrepl is not puppetized Screep Open None
Other Operations 162245 Enable GC for HHVM CLI (at least for dump runners) Screep Open None
Other Operations 162327 certspotter on einsteinium has issues talking to external Screep Open None
Other Operations 162735 Hyperthreading disabled on restbase2002.codfw.wmnet & restbase1015.codfw.wmnet Screep Open None
Other Operations 162765 Set up grafana alerting for services Screep Open None
Other Operations 162770 SATA errors for stat1004 in the dmesg Screep Open None
Other Operations 162780 ocg1003 partitions are severely misconfigured Screep Open None
Other Operations 162792 Reduce Swift technical debt Screep Open None
Other Operations 162796 Delete non-used and/or non-requested thumbnail sizes periodically Screep Open None
Other Operations 162815 Videoscalers overloaded once in a while triggering alarms Screep Open None
Other Operations 162850 CPU throttling on DELL PowerEdge R320 Screep Open None
Other Operations 162857 Some Core availability Catchpoint tests might be more expensive than they need to be Screep Open None
Other Operations 162875 Write graceful rolling restart script for Thumbor Screep Open None
Other Operations 162949 hosts with puppet compiler failures on every run Screep Open None
Other Operations 162955 rebuild tools-grid-master as a large instance Screep Open None
Other Operations 163033 Create grafana dashboard for video scaler job runners Screep Open None
Other Operations 163068 More missing 'original' files on Commons Screep Open None
Other Operations 163288 Decide on /var/lib vs /home as locations of homedir for l10nupdate Screep Open None
Other Operations 163336 kube-proxy pulls in docker and starts service even when it isn't needed Screep Open None
Other Operations 163346 mw2256 - hardware issue Screep Open None
Other Operations 163354 Find a way to verify mediawiki-config IPs ahead of datacenter switchovers Screep Open None
Other Operations 163362 audit all codfw pdu tower draws Screep Open None
Other Operations 163393 Determine appropriate proxy_read_timeout setting for Tools Proxy Screep Open None
Other Operations 163402 Ensure we can survive a loss of labservices1001 Screep Open None
Other Operations 163507 Intermittent DB connectivity problem on phabricator, needs investigation Screep Open None
Other Operations 163667 Fix UIDs for deployment server users Screep Open None
Other Operations 163673 Some swift disks wrongly mounted on 5 ms-be hosts Screep Open None
Other Operations 163698 Add flood protection to the ircecho bot (icinga-wm) Screep Open None
Other Operations 163823 During labservices1001 failover fqdn changed from foo.project.eqiad.wmflabs to foo.eqiad.wmflabs Screep Open None
Other Operations 163938 setup/install phab1001.eqiad.wmnet Screep Open None
Other Operations 163996 Icinga check for ipv6 host reachability Screep Open None
Other Operations 163998 check_hpssacli should report on battery failures and cache disabled Screep Open None
Other Operations 164030 setup releases1001.eqiad.wmnet (was: setup mwreleases1001) Screep Open None
Other Operations 164042 Racktables: clearly show when hosts are decommissioned Screep Open None
Other Operations 164123 tools-k8s-master-01 has two floating IPs Screep Open None
Other Operations 164206 Icinga loses downtime entries, causing alert and page spam Screep Open None
Other Operations 164209 webpagetest-alerts: Difference in size authenticated Screep Open None
Other Operations 164238 move icinga contacts file to public repo Screep Open None
Other Operations 164248 HTTP responses from app servers sometimes stall for >1s Screep Open None
Other Operations 164290 Set up external DNS record for wikitech-static Screep Open None
Other Operations 164341 Decommission old memcached hosts - mc1001->mc1018 Screep Open None
Other Operations 164460 Use DNS discovery record for deployment CNAME Screep Open None
Other Operations 164490 maintain-meta_p hangs on connecting to wikimedia.org.uk Screep Open None
Other Operations 164703 Integrate jessie 8.8 point release Screep Open None
Other Operations 164738 Investigate a simplified replication model for the Redis Job Queues Screep Open None
Other Operations 164819 reprepro: Support for buildinfo files / dbgsym packages Screep Open None
Other Operations 164993 archiva artifact links point to 127.0.0.1 Screep Open None
Other Operations 165105 Wiley requests for DOI and some other publishers don't work in production Screep Open None
Other Operations 165136 Ferm rules for labstore NFS hosts Screep Open None
Other Operations 165170 rack/setup/install ores2001-2009 Screep Open None
Other Operations 165171 rack/setup/install ores1001-1009 Screep Open None
Other Operations 165173 rack/setup/install dumpsdata100[12] Screep Open None
Other Operations 165323 Add Prometheus machine metric to track core dumps Screep Open None
Other Operations 165345 decommission indium Screep Open None
Other Operations 165348 Check long-running screen/tmux sessions Screep Open None
Other Operations 165366 rack/setup/install replacement stat1006 (stat1003 replacement) Screep Open None
Other Operations 165368 rack/setup/install replacement to stat1005 (stat1002 replacement) Screep Open None
Other Operations 165511 Change automatic shortlink in blog theme Screep Open None
Other Operations 165519 rack and setup mw1307-1348 Screep Open None
Other Operations 165520 rack and setup wtp1025-1048 Screep Open None
Other Operations 165531 rack/setup/install labvirt101[5-8] Screep Open None
Other Operations 165618 Audit / document reasons for not enabling HT? Screep Open None
Other Operations 165631 move gerrit.wm.org SSH service to private/behind LVS like phab-vcs Screep Open None
Other Operations 165779 rack/setup/install labnet100[34] Screep Open None
Other Operations 165781 rack/setup/install labcontrol100[34] Screep Open None
Other Operations 165784 rack/setup/install labmon1002 Screep Open None
Other Operations 165885 Create a cron to clean clientbucket every day or hour Screep Open None
Other Operations 166038 Sync internal nutcracker package with Debian package Screep Open None
Other Operations 166066 Integrate the puppet compiler in the puppet CI pipeline Screep Open None
Other Operations 166081 rack/setup/install conf1004-conf1006 Screep Open None
Other Operations 166180 rack/setup/install netmon2001 Screep Open None
Other Operations 166181 rack/setup/install restbase-dev100[456] Screep Open None
Other Operations 166233 Update redis puppet class to support stretch Screep Open None
Other Operations 166240 Determine if benefactorevents.wikimedia.org should be hosted on the production cluster or still on Microsoft Azure Screep Open None
Other Operations 166291 Exim panics when spamd reaches maxchildren Screep Open None
Other Operations 166322 spam from phabricator in labs Screep Open None
Other Operations 166368 Wipe of spare/replacement disks Screep Open None
Other Operations 166587 Ops Onboarding for Keith Herron Screep Open None
Other Operations 166782 wikimediafoundation.org's language selector is confusing to most visitors who don't have accounts there Screep Open None
Other Operations 166888 CI for operations/puppet is taking too long Screep Open None
Other Operations 166937 Broken /a/refinery-source/guard/run_all_guards.sh script on stat1002 Screep Open None
Other Operations 167035 stretch acct monthly cron will spam when /var/log/wtmp.1 doesn't exist Screep Open None
Other Operations 167091 Elasticsearch errors about BulkShardRequest Screep Open None
Other Operations 167104 Figure out how to disable starting of jobrunner/jobchron in the non-active DC Screep Open None
Other Operations 167121 Several hosts return "internal IPMI error" in the check_ipmi_temp check Screep Open None
Other Operations 167130 Decom mw1170-mw1179, and replace them with new systems. Screep Open None
Other Operations 167138 labvirt1008/labsdb1001: FreeIPMI returned an empty header map Screep Open None
Other Operations 167157 rack/setup/install labtestpuppetmaster2001 Screep Open None
Other Operations 167225 Upload hhvm to stretch apt repo in apt.wikimedia.org Screep Open None
Other Operations 167239 Redirect status.wikipedia.org to status.wikimedia.org Screep Open None
Other Operations 167245 prometheus-node-exporter - invalid group: ‘prometheus:prometheus’ Screep Open None
Other Operations 167269 Make security updates of docker images manageable Screep Open None
Other Operations 167292 Collate jessie-wikimedia/backports into jessie-wikimedia/main Screep Open None
Other Operations 167333 Increase email log retention period for the main email relays Screep Open None
Other Operations 167412 host-vmem.erb is doing operations that make no sense Screep Open None
Other Operations 167422 Monitoring: add link to graph for Icinga timeseries alarms Screep Open None
Other Operations 167549 Create Icinga alert when OSM replication lags on maps Screep Open None
Other Operations 167664 New Service Request: recommendation-api Screep Open None
Other Operations 167714 Create Atikamekw Wikipedia Screep Open None
Other Operations 167763 Troubleshoot scb2005 NICs Screep Open None
Other Operations 167820 rack/setup/install labweb100[12].wikimedia.org Screep Open None
Other Operations 167845 Migrate zuul-server behind systemd service Screep Open None
Other Operations 167905 rack/setup/install labpuppetmaster100[12].wikimedia.org Screep Open None
Other Operations 167966 Look into feasibility of disabling sha-1 host keys on our ssh daemons Screep Open None
Other Operations 167984 rack/setup/install labstore100[67].wikimedia.org Screep Open None
Other Operations 167992 rack/setup/install new kafka nodes kafka-jumbo100[1-6] Screep Open None
Other Operations 168044 jobrunner / jobchron systemd services are in error state after a stop Screep Open None
Other Operations 168110 Puppet CA: 'virt1000.wikimedia.org' will expire on 2017-08-15 Screep Open None
Other Operations 168403 Aggregate prometheus functions yielding different results in grafana vs. prometheus console Screep Open None
Other Operations 168407 rack/setup/install labnodepool1002.eqiad.wmnet Screep Open None
Other Operations 168442 Grant AWight accounts on ores production clusters Screep Open None
Other Operations 168445 Reboots of cloud servers Screep Open None
Other Operations 168460 Update certificates on production replicas of corp.wikimedia.org LDAP Screep Open None
Other Operations 168490 upgrade planet instances to stretch Screep Open None
Other Operations 168518 Create Dinka Wikipedia Screep Open None
Other Operations 168534 scb1003 unresponsive after reboot Screep Open None
Other Operations 168559 decom silver (was silver has trouble rebooting) Screep Open None
Other Operations 168562 Reimage gerrit2001 as stretch Screep Open None
Other Operations 168613 Broken disk on mw1228 Screep Open None
Other Operations 168619 Degraded RAID on lvs3001 Screep Open None
Other Operations 168683 Upgrade pandoc package to at least 1.12.3 Screep Open None
Other Operations 168705 nutcracker test config in puppet doesn't work Screep Open None
Other Operations 168765 Create Wikiversity Hindi Screep Open None
Other Operations 168767 Monitor PostgreSQL connection slots Screep Open None
Other Operations 168816 some elasticsearch servers in eqiad have CPU overheating Screep Open None
Other Operations 168881 Rename mw2148 / mw2149 / mw2259 / mw2260 to thumbor200[1234] Screep Open None
Other Operations 168891 rack/setup/install labtestmetal2001.codfw.wmnet Screep Open None
Other Operations 168892 rack/setup/install labtestservices2002.wikimedia.org Screep Open None
Other Operations 168893 rack/setup/install labtestservices2003.wikimedia.org Screep Open None
Other Operations 168894 rack/setup/install labtestcontrol2003.wikimedia.org Screep Open None
Other Operations 168927 Smartctl errors for one kafka1012 disk Screep Open 3.0
Other Operations 168962 wikitech-static sync check shouldn't happen so often Screep Open None
Other Operations 168967 Upload shiny-server .deb to our Jessie apt repository Screep Open None
Other Operations 169023 Remove left-over alias for wikidata.org/ontology (doesn't work) Screep Open None
Other Operations 169035 bast3002 sdb broken Screep Open None
Other Operations 169249 /usr/local/bin/xenon-generate-svgs and flamegraph.pl cronspam Screep Open None
Other Operations 169286 labstore1005 A PCIe link training failure error on boot Screep Open None
Other Operations 169287 etcd config depends on puppet certs, but puppet doesn't know Screep Open None
Other Operations 169290 New anti-stackclash (4.9.25-1~bpo8+3) kernel super bad for NFS Screep Open None
Other Operations 169299 Mobileapps swagger spec is broken (no pronunciation for `page/mobile-sections-lead` endpoints) Screep Open None
Other Operations 169312 Implement poolcounter failover in Thumbor Screep Open None
Other Operations 169313 Investigate poolcounter failure leading to thumbor failing to generate thumbs Screep Open None
Other Operations 169318 Use multiple puppetdbs on puppet masters Screep Open None
Other Operations 169321 Monitor all management interfaces Screep Open None
Other Operations 169322 Research whether it makes sense to have OTRS installation in an HA setup Screep Open None
Other Operations 1075 Audit groups of metrics in Graphite that allocate a lot of disk space Screep Open None