MediaWiki-File-management (Component)
Active · Public

Details

Description

Management of multimedia files in the core MediaWiki software itself, including:

  • File backend (see also SRE-swift-storage for tasks about the actual servers serving files in Wikimedia projects, upload.wikimedia.org)
  • File repositories.
  • Media handlers.
  • Thumbnail generation (but not on Wikimedia wikis; see Thumbor instead)
  • Misc display handling (File description page, Special:ListFiles).

Parent project: MediaWiki-General

For uploading, see MediaWiki-Uploading.

Recent Activity

Today

simon04 closed T60169: UploadWizard: Should extract both camera and object location from EXIF and add location and object location template, a subtask of T67681: EXIF support (tracking), as Resolved.
Tue, Jun 10, 4:06 PM · Commons, Multimedia, Tracking-Neverending, MediaWiki-File-management
Pigsonthewing added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.
Tue, Jun 10, 11:49 AM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
tstarling added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

There is some guff about retail use cases in its introduction, but it is not explicitly required. Indeed, it also gives the use case of "a manufacturer site that's not commerce enabled, strictly informational", where the optional inclusion of a value for price is to be "assumed that the price attribute represents the manufacturer's suggested retail price."

Tue, Jun 10, 12:21 AM · Commons, SEO, MediaWiki-File-management, MediaWiki-General

Yesterday

tstarling added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

46% of Googlebot requests use the smartphone User-Agent, and these mostly get a 302 redirect to commons.m.wikimedia.org. But the mobile site gives a link rel=canonical pointing back to commons.wikimedia.org, i.e. the redirect it just crawled. This is probably the biggest issue in terms of crawl budget. Mobile domain sunsetting (T214998) is already in progress and presumably will fix it; the SEO aspect of that change was discussed on mediawiki.org.

Mon, Jun 9, 11:42 PM · Commons, SEO, MediaWiki-File-management, MediaWiki-General

Sun, Jun 8

Pppery added a project to T396307: Add the missing link from LUA interpreter to SVG rasterizer: MediaWiki-File-management.
Sun, Jun 8, 7:10 PM · MediaWiki-File-management, SVG, Scribunto, Commons

Fri, Jun 6

tstarling added a comment to T396168: Commons videos not indexed by Google.

they abuse the hProduct microformat

Do they? Or does Google fail to understand it?

Fri, Jun 6, 11:19 PM · TimedMediaHandler, MediaWiki-File-management, Commons, SEO
sgrabarczuk updated subscribers of T370188: mw-datatable tables overlap UI.
Fri, Jun 6, 9:20 PM · MediaWiki-Change-tagging, Commons, MediaWiki-File-management, Trust and Safety Product Team, MediaWiki-Special-pages, Design
Pigsonthewing added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

Wikipedia is also using microformats incorrectly.

Fri, Jun 6, 2:46 PM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
Pigsonthewing added a comment to T396168: Commons videos not indexed by Google.

they abuse the hProduct microformat

Fri, Jun 6, 2:33 PM · TimedMediaHandler, MediaWiki-File-management, Commons, SEO
TheDJ added a comment to T396168: Commons videos not indexed by Google.

The CommonsMetadata extension outputs ImageObject JSON-LD for images, but not VideoObject for our video pages. Adding that might be an improvement. Additionally, the Open Graph metadata for all our pages designates type:website, which, I'm guessing, might also reduce the chance that Google recognizes this as a video watch page.
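
To make the suggestion concrete, here is a minimal sketch of what VideoObject JSON-LD for a video page could look like. It is not what CommonsMetadata emits today. The function name, the example.org URLs, and the choice of properties are all assumptions made for illustration; the property names themselves (name, description, contentUrl, thumbnailUrl, uploadDate) are standard schema.org VideoObject properties.

```python
import json


def video_object_jsonld(name, description, content_url, thumbnail_url, upload_date):
    """Build a schema.org VideoObject as a JSON-LD string (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "contentUrl": content_url,
        "thumbnailUrl": thumbnail_url,
        "uploadDate": upload_date,
    }
    return json.dumps(data, indent=2)


# Placeholder values only; none of these refer to a real file.
markup = video_object_jsonld(
    name="Example.webm",
    description="A placeholder video description.",
    content_url="https://example.org/media/Example.webm",
    thumbnail_url="https://example.org/media/Example.webm.jpg",
    upload_date="2025-06-06",
)
print(markup)
```

The resulting string would typically be embedded in a script element of type application/ld+json on the file description page.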

Fri, Jun 6, 8:38 AM · TimedMediaHandler, MediaWiki-File-management, Commons, SEO
TheDJ updated the task description for T396168: Commons videos not indexed by Google.
Fri, Jun 6, 7:58 AM · TimedMediaHandler, MediaWiki-File-management, Commons, SEO
tstarling added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

I documented the history of sitemaps on Wikimedia websites.

Fri, Jun 6, 1:22 AM · Commons, SEO, MediaWiki-File-management, MediaWiki-General

Thu, Jun 5

tstarling added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

I split out the video issue to T396168: Commons videos not indexed by Google

Thu, Jun 5, 11:37 PM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
tstarling created T396168: Commons videos not indexed by Google.
Thu, Jun 5, 11:32 PM · TimedMediaHandler, MediaWiki-File-management, Commons, SEO
Ladsgroup added a comment to T328872: Commons: UploadChunkFileException: Error storing file: backend-fail-internal; local-swift-codfw.

I understand the need to have multi-write backends and to do all write operations in both DCs, but it is a massive leap from that to requiring practically every operation to be replicated in both DCs. For example, why do consistency checks in both DCs? Let's say something is corrupted between these two Swift instances: the chance of the Swift reconciliation script actually finding it or overwriting the secondary DC is much higher than the chance of someone accidentally deciding to upload a new version and then having it break. That is, I think MediaWiki should at the moment be responsible for double uploads (and other write operations), but it shouldn't try to do integrity checks of two Swift clusters (doubly so during upload). To me it's like MediaWiki checking the primary database and a replica for data integrity during page reads, or even worse than that: it tries to do so while the replica is thousands of kilometers away. The file backend shouldn't do the work of the infrastructure at run time.

Thu, Jun 5, 11:31 PM · API Platform, MediaWiki-File-management, MW-1.41-notes (1.41.0-wmf.25; 2023-09-05), Unstewarded-production-error, MediaWiki-Uploading, Wikimedia-production-error, SRE-swift-storage, Commons
tstarling added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

The class "vevent" is part of the vCalendar microformat [1]. The class is widely used in Wikimedia projects, so please be careful—and check whether it was intended or "blindly copied"—before removing it elsewhere.

Thu, Jun 5, 10:53 PM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
Maintenance_bot removed a project from T370188: mw-datatable tables overlap UI: Patch-For-Review.
Thu, Jun 5, 10:32 PM · MediaWiki-Change-tagging, Commons, MediaWiki-File-management, Trust and Safety Product Team, MediaWiki-Special-pages, Design
gerritbot added a comment to T370188: mw-datatable tables overlap UI.

Change #1114082 abandoned by Jdlrobson:

[mediawiki/core@master] DataTable: Fit data tables to content to avoid overflowing sidebar

Reason:

Abandoning. Might come back to this in September-time.

https://gerrit.wikimedia.org/r/1114082

Thu, Jun 5, 10:21 PM · MediaWiki-Change-tagging, Commons, MediaWiki-File-management, Trust and Safety Product Team, MediaWiki-Special-pages, Design
Maintenance_bot added a project to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text: Commons.
Thu, Jun 5, 4:30 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
sbassett moved T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text from Incoming to Our Part Is Done on the Security-Team board.
Thu, Jun 5, 4:27 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
sbassett triaged T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text as Medium priority.
Thu, Jun 5, 4:26 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
matmarex updated subscribers of T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

@sbassett Could you make this public? Thanks.

Thu, Jun 5, 4:21 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
matmarex closed T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text as Resolved.
Thu, Jun 5, 4:19 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
Pigsonthewing added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

One other thing I realised today: Google hates duplicate content, and because of how Commons description pages are proxied to the 'local' wikis, they might be downlinked for being duplicate content?

Thu, Jun 5, 12:42 PM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
Pigsonthewing added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.
Thu, Jun 5, 12:31 PM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
tstarling added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

One of the problems with relying on category pages for discovery (assuming that's what's happening) is that the "next page" link on paginated categories is blocked by robots.txt. For example https://commons.wikimedia.org/w/index.php?title=Category:Photographs_taken_on_2017-07-18&from=Z
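
The effect described above can be checked with a small sketch using Python's standard robots.txt parser. The rule set below is an assumed, simplified stand-in (a single "Disallow: /w/" rule), not a copy of the live robots.txt, and the helper name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed, simplified rule set: everything under /w/ is blocked for all agents.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the assumed rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ASSUMED_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The paginated "next page" link goes through /w/index.php ...
next_page = (
    "https://commons.wikimedia.org/w/index.php"
    "?title=Category:Photographs_taken_on_2017-07-18&from=Z"
)
# ... while the first page of the category uses the /wiki/ path.
first_page = "https://commons.wikimedia.org/wiki/Category:Photographs_taken_on_2017-07-18"

print("next page crawlable:", is_crawlable(next_page))
print("first page crawlable:", is_crawlable(first_page))
```

Under that assumed rule, only the first page of a category is reachable by a crawler, so files listed on later pages cannot be discovered through category pagination.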

Thu, Jun 5, 11:21 AM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
tstarling added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

74 videos indexed. 482K videos not indexed, reason "Video isn't on a watch page".

Thu, Jun 5, 10:46 AM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
tstarling added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

Can you try this on https://commons.wikimedia.org/wiki/Category:M%C3%B6nchenholzhausen again? The warnings should have been fixed by this and that edit.

Thu, Jun 5, 10:28 AM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
LennardHofmann added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

I have Google Search Console access for Commons. Here is a result for a random URL:

Thu, Jun 5, 9:07 AM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
tstarling added a comment to T54647: MediaWiki images and image pages are not being indexed properly by external search engines.

I have Google Search Console access for Commons. Here is a result for a random URL:

Thu, Jun 5, 7:29 AM · Commons, SEO, MediaWiki-File-management, MediaWiki-General
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1152791 merged by jenkins-bot:

[mediawiki/extensions/TimedMediaHandler@master] Improve HTML escaping in getLongDesc(), getShortDesc() methods

https://gerrit.wikimedia.org/r/1152791

Thu, Jun 5, 5:24 AM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1152790 merged by jenkins-bot:

[mediawiki/core@master] Improve HTML escaping in getLongDesc(), getShortDesc() methods

https://gerrit.wikimedia.org/r/1152790

Thu, Jun 5, 5:15 AM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
aaron added a comment to T328872: Commons: UploadChunkFileException: Error storing file: backend-fail-internal; local-swift-codfw.

The idea of preloadFileStat() was to allow concurrent HEAD requests to a list of objects after any relevant locks were acquired. If no locks are acquired, and "latest" is not set, maybe reusing previously loaded state entries is OK. From the perspective of FileBackend, the assumption is mostly that you call doOperations or doQuickOperations, which is supposed to do one preload (within any locking) and then be done. The FileBackendMultiWrite class (itself a hack due to not having a proper regional Swift cluster, with swift-repl only able to do periodic reconciliation) also has to write to the remote backend and has consistency checks turned on, doing preloads of both the local and remote backends. It also has to repeat the write operation on the remote backend, requiring another preload from the remote. Since FileBackendMultiWrite does its own locking, it seems like a lot of these 'stat' entries could be reused instead of reloaded.
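
The reuse idea in the last sentence can be sketched roughly as follows. This is an illustration only and not MediaWiki code: the StatCache class, its methods, and the head_request callable are all hypothetical, and the sketch issues requests sequentially rather than concurrently.

```python
class StatCache:
    """Hypothetical cache of per-object 'stat' results, shared across preloads."""

    def __init__(self, head_request):
        self._head = head_request  # callable: path -> stat dict (stand-in for a HEAD request)
        self._entries = {}
        self.requests_made = 0

    def preload(self, paths, latest=False):
        """Return stat info for paths, reusing cached entries unless latest=True."""
        for path in paths:
            if latest or path not in self._entries:
                self._entries[path] = self._head(path)
                self.requests_made += 1
        return {path: self._entries[path] for path in paths}


def fake_head(path):
    # Stand-in for a real HEAD request to the object store.
    return {"path": path, "size": len(path)}


cache = StatCache(fake_head)
paths = ["container/a.jpg", "container/b.jpg"]

cache.preload(paths)               # first preload: one request per path
cache.preload(paths)               # second preload: served from the cache
cache.preload(paths, latest=True)  # latest=True forces a reload
print(cache.requests_made)
```

Whether reuse is safe depends on the locking described in the comment: cached entries are only trustworthy while the caller holds the locks that prevent the objects from changing.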

Thu, Jun 5, 3:08 AM · API Platform, MediaWiki-File-management, MW-1.41-notes (1.41.0-wmf.25; 2023-09-05), Unstewarded-production-error, MediaWiki-Uploading, Wikimedia-production-error, SRE-swift-storage, Commons
matmarex moved T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text from Inbox, needs triage to In progress on the MediaWiki-Platform-Team board.
Thu, Jun 5, 2:25 AM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
matmarex added a project to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text: MediaWiki-Platform-Team.
Thu, Jun 5, 2:24 AM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
Maintenance_bot added a project to T293701: Special:Undelete for a file on betacommons fails (502, Next Hop Connection Failed): MW-Interfaces-Team.
Thu, Jun 5, 1:30 AM · MW-Interfaces-Team, Commons, MediaWiki-Page-deletion, MediaWiki-File-management, Beta-Cluster-reproducible
Krinkle updated the task description for T293701: Special:Undelete for a file on betacommons fails (502, Next Hop Connection Failed).
Thu, Jun 5, 1:14 AM · MW-Interfaces-Team, Commons, MediaWiki-Page-deletion, MediaWiki-File-management, Beta-Cluster-reproducible
Ladsgroup added a comment to T328872: Commons: UploadChunkFileException: Error storing file: backend-fail-internal; local-swift-codfw.

Okay, I did a deeper investigation. I uploaded a random file in verbose mode and here is the result:

Thu, Jun 5, 1:14 AM · API Platform, MediaWiki-File-management, MW-1.41-notes (1.41.0-wmf.25; 2023-09-05), Unstewarded-production-error, MediaWiki-Uploading, Wikimedia-production-error, SRE-swift-storage, Commons
matmarex added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Can you update the doc comments of the File and MediaHandler methods to indicate that it is unsafe HTML and Sanitizer::removeSomeTags() should be called on the result?
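
For readers unfamiliar with this class of bug, the following sketch shows why a description documented as HTML but containing unescaped plain text is unsafe to output. It uses Python's html.escape as a stand-in and is not MediaWiki's Sanitizer: Sanitizer::removeSomeTags() filters tags rather than escaping everything. The function name and the sample string are hypothetical.

```python
import html


def safe_description(handler_output):
    """Treat a handler's description as plain text and escape it for HTML output."""
    return html.escape(handler_output)


# A description taken from file metadata might contain markup-like text.
untrusted = '<img src=x onerror=alert(1)>'

# Embedding it as-is would let the browser interpret it as a tag.
unsafe_html = "<td>" + untrusted + "</td>"

# Escaping first turns the angle brackets into harmless entities.
safe_html = "<td>" + safe_description(untrusted) + "</td>"
print(safe_html)
```

Escaping is the right choice when the value is known to be plain text; tag filtering is the fallback when the caller cannot know whether a handler returned text or HTML.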

Thu, Jun 5, 12:01 AM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security

Wed, Jun 4

gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153710 merged by jenkins-bot:

[mediawiki/core@REL1_44] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153710

Wed, Jun 4, 10:47 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153711 merged by jenkins-bot:

[mediawiki/core@REL1_43] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153711

Wed, Jun 4, 10:46 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153712 merged by jenkins-bot:

[mediawiki/core@REL1_42] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153712

Wed, Jun 4, 10:44 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153713 merged by jenkins-bot:

[mediawiki/core@REL1_39] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153713

Wed, Jun 4, 10:42 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153713 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@REL1_39] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153713

Wed, Jun 4, 9:20 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153712 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@REL1_42] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153712

Wed, Jun 4, 9:18 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153711 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@REL1_43] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153711

Wed, Jun 4, 9:18 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153710 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@REL1_44] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153710

Wed, Jun 4, 9:18 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153687 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.4] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153687

Wed, Jun 4, 8:22 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153686 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.3] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153686

Wed, Jun 4, 8:22 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security
gerritbot added a comment to T395834: File::getLongDesc()/getShortDesc() is documented to return HTML, but some handlers return unescaped plain text.

Change #1153687 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@wmf/1.45.0-wmf.4] Treat File::getShortDesc() as possibly unsafe HTML

https://gerrit.wikimedia.org/r/1153687

Wed, Jun 4, 7:49 PM · Commons, SecTeam-Processed, Vuln-XSS, MediaWiki-Platform-Team, Security-Team, TimedMediaHandler, MediaWiki-File-management, Security