In the initial work that got Ladybird running on Windows, there were
some DLLs that WebContent implicitly depended on which were causing
runtime errors at launch because they didn't exist in libexec.
So the workaround was to explicitly link the targets that had issues to
WebContent and use lagom_copy_runtime_dlls() to ensure they got copied
to libexec.
But given libexec is not a standard Windows convention, a later review
made sure Services got output to the bin folder. However, those initial
workarounds were not removed, even though they had become unnecessary.
The function currently has 2 purposes: (1) To copy dependent DLLs for
executables to the output binary directory. This ensures that these
helper processes can be run after a build, given not all DLLs from
vcpkg libs get implicitly copied to the bin folder. (2) To allow fully
background and/or GUI processes to use the Windows subsystem. This
prevents unnecessarily launching a console for the process, as we
either require no user interaction or the user interaction is all
handled in the GUI.
The Win32 API equivalent to pipe2() is CreatePipe(), which creates read
and write anonymous pipe handles that we can set to non-blocking via
SetNamedPipeHandleState(). However, this initial approach caused
issues, as our Windows infrastructure assumes socket-based handles/fds
and that we don't use Windows pipes at all; see
Core::System::is_socket() in SystemWindows.cpp. So we use socketpair()
to keep our current assumptions true.
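
Roughly, the resulting helper looks something like this (the function
name and the direct POSIX calls are illustrative; on Windows these
would go through our Core::System wrappers rather than raw libc):

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Illustrative sketch: build a non-blocking fd pair from socketpair()
// instead of pipe2()/CreatePipe(), so the fds stay socket-backed and
// checks like Core::System::is_socket() keep working.
static int create_nonblocking_fd_pair(int fds[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;
    for (int i = 0; i < 2; ++i) {
        int flags = fcntl(fds[i], F_GETFL, 0);
        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
}
```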
Given that Windows uses socketpair() and Unix uses pipe2(), this
RequestPipe abstraction avoids ifdef soup by hiding the details of how
the read/write fd pair is created and how response data is written to
the client.
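
A rough sketch of the shape of that abstraction (the class layout,
member names, and Core::System calls are assumptions for illustration);
the single platform #ifdef lives inside the factory, so callers never
see it:

```cpp
#include <AK/Error.h>
#include <LibCore/System.h>

class RequestPipe {
public:
    static ErrorOr<RequestPipe> create()
    {
#if defined(AK_OS_WINDOWS)
        // Windows: a socketpair, so the fds stay socket-backed.
        auto fds = TRY(Core::System::socketpair(AF_UNIX, SOCK_STREAM, 0));
#else
        // Unix: a regular pipe.
        auto fds = TRY(Core::System::pipe2(O_NONBLOCK));
#endif
        return RequestPipe { fds[0], fds[1] };
    }

    int read_fd() const { return m_read_fd; }
    int write_fd() const { return m_write_fd; }

private:
    RequestPipe(int read_fd, int write_fd)
        : m_read_fd(read_fd)
        , m_write_fd(write_fd)
    {
    }

    int m_read_fd { -1 };
    int m_write_fd { -1 };
};
```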
Instead of having ExecutionContext track function names separately,
we give FunctionObject a virtual function that returns an appropriate
name string for use in call stacks.
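
Something along these lines (the method and member names are
illustrative, not the exact signatures):

```cpp
class FunctionObject : public Object {
public:
    // Each function kind reports a name suitable for call stacks, so
    // ExecutionContext no longer needs to track it separately.
    virtual String name_for_call_stack() const = 0;
};

class ECMAScriptFunctionObject final : public FunctionObject {
public:
    virtual String name_for_call_stack() const override
    {
        // Fall back to something recognizable for anonymous functions.
        return m_name.is_empty() ? "(anonymous)"_string : m_name;
    }

private:
    String m_name;
};
```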
We previously had no protection against the same URL being requested
multiple times at the same time. For example, if a URL did not have any
cache entry and was requested twice, we would open two cache writers
concurrently. This would result in both writers piping the response to
disk, and we'd end up with a corrupt cache file.
We now hold back requests in certain scenarios until existing cache
entries have completed (a sketch of the resulting decision logic
follows the list):
* If we are opening a cache entry for reading:
- If there is an existing reader entry, carry on as normal. We can
have multiple readers.
- If there is an existing writer entry, defer the request until it is
complete.
* If we are opening a cache entry for writing:
- If there is an existing reader or writer entry, defer the request
until it is complete.
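
The deferral rules above boil down to a decision like this (names and
types are illustrative):

```cpp
enum class CacheAccess { Read, Write };
enum class CacheDecision { Proceed, Defer };

// Readers can share an entry, but any in-flight writer (or, for a new
// writer, any in-flight entry at all) makes the incoming request wait
// until the existing entry has completed.
static CacheDecision decide(CacheAccess access, bool has_active_reader, bool has_active_writer)
{
    if (access == CacheAccess::Read)
        return has_active_writer ? CacheDecision::Defer : CacheDecision::Proceed;
    return (has_active_reader || has_active_writer) ? CacheDecision::Defer : CacheDecision::Proceed;
}
```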
This object will be needed in a future commit to store requests that
are awaiting other requests to finish. Doing this in a separate commit
just makes that commit less noisy.
We previously waited until we received all response headers before we
would create the cache entry. We now create one immediately, and handle
writing the headers in its own function. This will allow us to know if
a cache entry writer already exists for a given cache key, and thus
prevent creating a second writer at the same time.
We currently manage request lifetime as both an ActiveRequest structure
and a series of lambda callbacks. In an upcoming patch, we will want to
"pause" a request to de-duplicate equivalent requests, such that only
one request goes over the network and saves its response to the disk
cache.
To make that easier to reason about, this adds a Request class to manage
the lifetime of a request via a state machine. We will now be able to
add a "waiting for disk cache" state to stop the request.
This allows these to be used more easily from other files. It also lets
us hide the Windows.h header necessity in a single location, instead of
needing to remember to include it everywhere we would otherwise include
<curl/curl.h>.
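
That single location could look roughly like this (the file name and
the exact Windows headers/defines are assumptions; the point is that
only one header has to know about them):

```cpp
// CurlIncludes.h (hypothetical name): include this instead of <curl/curl.h>.
#pragma once

#if defined(AK_OS_WINDOWS)
// Pull in the Windows socket types that curl's headers expect, and keep
// the usual macro pollution (min/max, etc.) out of the rest of the code.
#    define WIN32_LEAN_AND_MEAN
#    define NOMINMAX
#    include <WinSock2.h>
#    include <Windows.h>
#endif

#include <curl/curl.h>
```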
This is used by tests to set the default time zone to UTC.
This is because certain tests create JavaScript Date objects, which are
in the current time zone.
If transferring a cached response body fails for any reason, we will now
issue a network request instead of failing the request outright.
The catch here is that we will have already transferred the response
code and headers to the client, and potentially some of the body. So we
attempt to only request the remaining data over the network using a
range request. This feels a bit sketchy, but this is also how Chromium
behaves.
However, the server may or may not support range requests. If it does,
we can expect an HTTP 206 response with the bytes we need. If not, we
will receive an HTTP 200 (assuming the request succeeded), along with
the entire object's body. In this case, we also behave like Chromium,
and internally drop the number of bytes we had already transferred.
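
Roughly the shape of that fallback (the helper and its members are
illustrative, not the actual implementation):

```cpp
#include <AK/ByteString.h>
#include <AK/Span.h>
#include <AK/Types.h>

// After a cached-body transfer fails having already delivered
// `bytes_already_sent` bytes to the client, we retry over the network
// with a Range header. An HTTP 206 gives us exactly the remaining
// bytes; an HTTP 200 means the server ignored the range, so we drop the
// prefix we already sent.
struct CacheToNetworkFallback {
    u64 bytes_already_sent { 0 };
    u64 bytes_to_skip { 0 };

    ByteString range_header_value() const
    {
        return ByteString::formatted("bytes={}-", bytes_already_sent);
    }

    void on_status_code(u32 status_code)
    {
        if (status_code != 206)
            bytes_to_skip = bytes_already_sent;
    }

    // Called per chunk received from the network; skips the already-sent
    // prefix when the server returned the full body.
    ReadonlyBytes filter_chunk(ReadonlyBytes chunk)
    {
        if (bytes_to_skip == 0)
            return chunk;
        auto skipped = min(bytes_to_skip, static_cast<u64>(chunk.size()));
        bytes_to_skip -= skipped;
        return chunk.slice(static_cast<size_t>(skipped));
    }
};
```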
If we are unable to pipe the response body from a cache file to the
client, let's take the extra safe approach of deleting the cache file
for now. We already remove the file if we weren't able to read its
metadata during initialization.
This is a bit of a blunt hammer, but this hooks an action to clear the
HTTP disk cache into the existing Clear Cache action. Upon invocation,
it stops all existing cache entries from making further progress, and
then deletes the entire cache index and all cache files.
In the future, we will of course want more fine-grained control over
cache deletion, e.g. via an about:history page.
This adds a disk cache for HTTP responses received from the network. For
now, we take a rather conservative approach to caching. We don't cache a
response until we're 100% sure it is cacheable (there are heuristics we
can implement in the future based on the absence of specific headers).
The cache is broken into 2 categories of files:
1. An index file. This is a SQL database containing metadata about each
cache entry (URL, timestamps, etc.).
2. Cache files. Each cached response is in its own file. The file is an
amalgamation of all info needed to reconstruct an HTTP response. This
includes the status code, headers, body, etc.
A cache entry is created once we receive the headers for a response. The
index, however, is not updated at this point. We stream the body into
the cache entry as it is received. Once we've successfully cached the
entire body, we create an index entry in the database. If any of these
steps fails along the way, the cache entry is removed and the index is
left untouched.
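
The write path is roughly the following (types and function names here
are illustrative):

```cpp
#include <AK/Error.h>

// Entry file is created as soon as headers arrive; the body is streamed
// into it; the index row is only written once the full body is on disk.
// On any error, the caller removes the partial entry file and the index
// is left untouched.
ErrorOr<void> cache_response(CacheEntryWriter& writer, Response& response)
{
    TRY(writer.write_status_and_headers(response.status_code(), response.headers()));

    while (auto chunk = TRY(response.read_some_body_data()))
        TRY(writer.write_body_data(*chunk));

    TRY(writer.insert_index_entry());
    return {};
}
```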
Subsequent requests are checked for cache hits from the index. If a hit
is found, we read just enough of the cache entry to inform WebContent of
the status code and headers. The body of the response is piped to WC via
syscalls, such that the transfer happens entirely in the kernel; no need
to allocate the memory for the body in userspace (WC still allocates a
buffer to hold the data, of course). If an error occurs while piping the
body, we currently error out the request. There is a FIXME to switch to
a network request.
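
On Linux, that kernel-side transfer can be expressed with sendfile(2);
whether we use sendfile directly or an equivalent wrapper is an
assumption in this sketch:

```cpp
#include <sys/sendfile.h>
#include <sys/types.h>

// Move the cached body straight from the cache file to the client's
// socket without staging it in our own userspace buffers.
static int pipe_cached_body_to_client(int cache_file_fd, int client_socket_fd, off_t body_offset, size_t body_size)
{
    off_t offset = body_offset;
    size_t remaining = body_size;
    while (remaining > 0) {
        ssize_t sent = sendfile(client_socket_fd, cache_file_fd, &offset, remaining);
        if (sent <= 0)
            return -1; // currently errors out the request (FIXME: fall back to the network)
        remaining -= static_cast<size_t>(sent);
    }
    return 0;
}
```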
Cache hits are also validated for freshness before they are used. If a
response has expired, we remove it and its index entry, and proceed with
a network request.
I noticed the existing code would end up calling
`computed_properties->property(PropertyID::Custom)`
so let's actually ask for the custom property instead.
Global Privacy Control aims to be a replacement for Do Not Track. DNT
ended up not being a great solution, as it wasn't enforced by law. This
actually resulted in the DNT header serving as an extra fingerprinting
data point.
GPC is becoming enforced by law in US states such as California and
Colorado. California is further working on a bill which would require
that browsers implement such an opt-out preference signal (OOPS):
https://cppa.ca.gov/announcements/2025/20250911.html
This patch replaces DNT with GPC and hooks up the associated settings.
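
The signal itself is just a request header; a minimal sketch of hooking
it up (the HeaderMap usage and function name here are illustrative):

```cpp
#include <LibHTTP/HeaderMap.h>

// When the user has enabled GPC in the settings, every outgoing request
// gets the "Sec-GPC: 1" header (where we previously would have sent
// "DNT: 1").
static void apply_privacy_signal(HTTP::HeaderMap& headers, bool gpc_enabled)
{
    if (gpc_enabled)
        headers.set("Sec-GPC", "1");
}
```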
Allows formulas to update on Google Sheets, which uses a Worker to
update them and makes cookie-authenticated requests; these requests
were failing before this commit.
This has the limitation that the requests have to proxy through the
WebContent process, but that's how the current infrastructure works,
and changing it is outside the scope of this commit.
And make it a DOM::Node, not a DOM::Element. This makes everything flow
much better, such as spec text that explicitly mentions "focused area",
since we don't necessarily need to traverse a tree of elements; a Node
can be focusable as well.
Eventually this will need to be a struct with a separate "focused area"
and "DOM anchor", but this change will make it easier to achieve that.