There is currently a [bug][0] in the Python Playwright API that causes
_asyncio_ to raise an `InvalidStateError` occasionally when the
`PlaywrightContextManager` exits. This causes the program to exit
with a nonzero return code, even though it actually completed
successfully, which will cause the Job to be retried. To avoid this,
we can catch and ignore the spurious exception.
I've reorganized the code a bit here because we have to wrap the whole
`with` block in the `try`/`except`; moving the contents of the block
into a function keeps the indentation level from getting out of control.
[0]: https://github.com/microsoft/playwright-python/issues/2238
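A minimal sketch of the pattern, not the actual `xactfetch` code. Here `flaky_context` is a hypothetical stand-in for the Playwright context manager that fails on exit the way the bug does, and `fetch_all` is a hypothetical name for the function that now holds the body of the `with` block:

```python
import asyncio
import contextlib
import logging

log = logging.getLogger("xactfetch")


@contextlib.contextmanager
def flaky_context():
    """Hypothetical stand-in for the Playwright context manager.

    The body of the with block runs normally; the error is raised
    only while the context manager exits.
    """
    yield "playwright"
    raise asyncio.InvalidStateError("invalid state")


def fetch_all(pw) -> int:
    """Hypothetical function holding the former body of the with block."""
    return 0


def main() -> int:
    rc = 1
    try:
        with flaky_context() as pw:
            rc = fetch_all(pw)
    except asyncio.InvalidStateError as e:
        # Raised on exit, after the work is done: log it and keep the
        # return code that was computed inside the block.
        log.debug("Ignoring spurious exception: %s", e)
    return rc
```

One caveat of this approach is that an `InvalidStateError` raised inside the block itself would also be swallowed, although `rc` would then keep its failure value.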
While the original intent of the `secretsocket` script was to have `rbw`
run outside the `xactfetch` container, that is only useful during
development; both processes need to run in the container in Kubernetes.
The `secretsocket` server will now create its IPC socket at the location
specified by the `SECRET_SOCKET_PATH` environment variable, if set.
This way, both `secretsocket` and `xactfetch` can be pointed to the
same location with this single variable.
With the addition of ancillary scripts like `entrypoint.sh`, the `COPY .`
instruction in the build stage results in a full rebuild of the final
image for every change. To avoid this, we now only copy the files that
are actually required to build the wheel. The other scripts are copied
later, using an intermediate layer. This avoids needing a `COPY`
instruction, and therefore a new layer in the final image, for each
script. Hypothetically, we could use `RUN --mount=bind` and copy the
files with the `install` command, but bind-mounting the build context
doesn't actually work; SELinux prevents the container builder from
accessing the source directory directly.
If the `SECRET_SOCKET_PATH` environment variable is not set, or refers
to a non-existent path, then we assume we need to manage the
`secretsocket` server ourselves.
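The check described above could look roughly like this. It is only a sketch: the function name `must_manage_server` is an assumption, and the real check may well live in a shell script rather than in Python.

```python
import os
from pathlib import Path


def must_manage_server(environ=None) -> bool:
    """Return True if we have to start the secretsocket server ourselves."""
    env = os.environ if environ is None else environ
    path = env.get("SECRET_SOCKET_PATH")
    # Unset, empty, or pointing at a path that does not exist yet.
    return not path or not Path(path).exists()
```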
Playwright has a nifty feature called the [Trace Viewer][0], which you
can use to observe the state of the page at any given point during the
browsing session. This should make troubleshooting failures a lot
easier.
[0]: https://playwright.dev/python/docs/trace-viewer-intro
Earlier this week, `xactfetch` stopped being able to log in to the Chase
website. After logging in, the website just popped up a message that
said "It looks like this part of our website isn't working right now,"
with a hint that I should try a different browser. I suspect they have
enhanced their bot detection/scraping resistance, because the error
only occurs when `xactfetch` is run from inside a container. It happens
every time in that case, but never when I run it on my computer
directly.
After several hours of messing with this, the only way I was able to
get it to work is to use full-blown headed Chromium. Neither headless
nor headed Firefox works, nor does headless Chromium. This is a bit
cumbersome, but not really a big deal. Headed Chromium works fine in
an Xvfb session.
When logging in to the Chase website with a fresh browser profile, or
otherwise without any cookies, the user will be required to "validate
the device" using a one-time code delivered via SMS. Previously, I
handled this by running the `xactfetch` script with a headed browser,
manually entering the verification code when the prompt came up. Then,
I would copy the `cookies.json` file, now containing a cookie indicating
the device had been verified, to the Kubernetes volume, where it would
be used by the production pod.
Now that `xactfetch` uses asyncio, it is possible for the Chase `login`
method to wait for one of multiple conditions: either login succeeds,
or SMS 2FA is required. In the case of the latter, we can get the
2FA code from the secret server and enter it into the form to complete
the login process.
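Waiting for one of several conditions can be sketched with plain `asyncio`. The helper name `first_of` and the two demo coroutines are assumptions for illustration; in real use the awaitables would presumably be Playwright waits rather than sleeps.

```python
import asyncio


async def first_of(**conditions):
    """Wait for whichever named awaitable finishes first; cancel the rest."""
    tasks = {asyncio.ensure_future(aw): name for name, aw in conditions.items()}
    done, pending = await asyncio.wait(
        tasks.keys(), return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    # Let the cancelled tasks finish so nothing is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    winner = done.pop()
    return tasks[winner], winner.result()


async def demo():
    async def login_succeeded():
        await asyncio.sleep(1.0)  # stand-in for "account page loaded"

    async def sms_2fa_required():
        await asyncio.sleep(0.01)  # stand-in for "2FA prompt appeared"

    name, _ = await first_of(
        logged_in=login_succeeded(), sms_2fa=sms_2fa_required()
    )
    return name
```

The caller can then branch on the returned name, for example fetching the code from the secret server only when the 2FA condition won.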
The real magic here is how we're getting the 2FA code from the SMS
message. There are two components to this. First, I've installed [SMS
to URL Forwarder][0] on my phone. This app does what it says on the
tin: it relays SMS messages to an HTTP(S) server. I have configured it
to forward messages from the Chase SMS 2FA short code to an _ntfy_
topic. The second component is the `chase2fa` script, which is called
by the secret server. This script listens for notifications on the
_ntfy_ topic where the SMS messages are forwarded. When a message
arrives, it extracts the verification code using a simple regular
expression that identifies a several-digit number.
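A sketch of the extraction step, assuming the code is a run of 4 to 10 digits; the actual pattern used by `chase2fa` and the wording of the SMS message are not shown here, so both are assumptions:

```python
import re
from typing import Optional

# Assumed pattern: a standalone run of 4 to 10 digits.
CODE_RE = re.compile(r"\b\d{4,10}\b")


def extract_code(message: str) -> Optional[str]:
    """Return the first several-digit number in the message, if any."""
    match = CODE_RE.search(message)
    return match.group(0) if match else None
```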
With all these pieces in place, the `xactfetch` script is no longer
thwarted by the SMS 2FA barrier!
[0]: https://github.com/bogkonstantin/android_income_sms_gateway_webhook
Using the Playwright async API is the only way to wait for one of
multiple conditions. We will need this capability in order to detect
certain abnormal conditions, such as spurious 2FA auth or interstitial
ads.
`xactfetch` has three different ways of reading secret values:
* From environment variables
* By reading the contents of a file (specified by environment variables)
* By looking them up in the Bitwarden vault
This is very cumbersome to work with, especially when trying to
troubleshoot using the container image locally.
To make this easier, I've factored out all secret lookup functionality
into a separate process. This process listens on a UNIX socket and
implements a very simple secret lookup protocol. The client
(`xactfetch` itself in this case) sends a string key, identifying the
secret it wants to look up, terminated by a single line feed character.
The `secretsocket` server looks up the secret associated with that key,
using the method defined in a TOML configuration file. There are four
supported methods:
* Environment variables
* External programs
* File contents
* Static strings
The value returned by the corresponding method is then sent back to the
client via the socket connection, again as a string terminated with a
line feed.
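The client side of this protocol is small enough to sketch in a few lines. The function names `exchange` and `lookup_secret` are assumptions, as is the use of UTF-8 for encoding keys and values:

```python
import socket


def exchange(sock: socket.socket, key: str) -> str:
    """Send one key and read one value, each terminated by a line feed."""
    sock.sendall(key.encode("utf-8") + b"\n")
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection early")
        buf += chunk
    return buf[:-1].decode("utf-8")


def lookup_secret(key: str, path: str) -> str:
    """Connect to the UNIX socket at path and look up a single secret."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return exchange(sock, key)
```

Because both directions are line-delimited, the same exchange can be tried by hand with any tool that can talk to a UNIX stream socket.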
Moving the secret handling into a separate process simplifies the
environment configuration needed in order to run `xactfetch`. Notably,
when running it in a container, only the `secretsocket` socket needs to
be mounted into the container. Since `rbw` is executed by the server
process now, rather than `xactfetch` directly, the vault does not need
to be present in the `xactfetch` container. Indeed, none of the secret
values need to be present in the container.
When debugging a failure for one bank's website, I often want to run
the fetch for just that bank. To date, I've been commenting out the
other bank, but that is silly. Now, `xactfetch` can target a subset
of banks by specifying their name slug(s) as CLI arguments.
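A sketch of how the bank selection might be parsed. The slugs `chase` and `commerce` and the function name `select_banks` are assumptions; with no arguments, every known bank is selected.

```python
import argparse

# Assumed slugs for the two banks mentioned in this log.
KNOWN_BANKS = ["chase", "commerce"]


def select_banks(argv=None):
    """Return the list of bank slugs to fetch, defaulting to all of them."""
    parser = argparse.ArgumentParser(prog="xactfetch")
    parser.add_argument(
        "banks", nargs="*", metavar="BANK",
        help="bank slug(s) to fetch; default is all banks",
    )
    args = parser.parse_args(argv)
    unknown = sorted(set(args.banks) - set(KNOWN_BANKS))
    if unknown:
        parser.error("unknown bank(s): " + ", ".join(unknown))
    return args.banks or list(KNOWN_BANKS)
```

The slugs are validated by hand here instead of with `choices=`, since combining `choices` with `nargs="*"` on a positional argument has historically rejected the empty default.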
Chase changed the name of my credit card from *CREDIT CARD* to *Amazon
Visa*. Just in case they change it again or something, let's match only
on the card number.
By default, the transaction list for the Chase credit card shows
transactions that have posted since the last statement. This list can
sometimes be empty, particularly on the day the statement is issued.
When this is the case, clicking the _Download Account Activity_ button
does not work; it simply displays a message stating "There's no account
activity showing to download." Since we are going to adjust the date
range on the download form anyway, it doesn't matter what's showing;
we just need the button to work. Thus, we now set the page to show all
transactions and then click the button.
Playwright needs to be updated frequently in order to update its Firefox
build. The Chase website has a very strict browser support policy, and
frequently drops support for old Firefox versions.
I've moved the bank website credentials to a shared collection in
Bitwarden and made them accessible to an account dedicated to
`xactfetch`. Using the `pinentry-stub` script, `rbw` can now
auto-unlock the vault, using the password in the file referred to by the
`PINENTRY_PASSWORD_FILE` environment variable. This means that
`xactfetch` can now run completely automatically, without any input from
me.
While debugging `xactfetch`, I do not need it to send me notifications
about failures, etc., since I am sitting at my computer. To suppress
them, I can now set the `DEBUG_NTFY` environment variable to `0`.
Sometimes transactions show up in the export with the previous day's
date. When this happens, these transactions may get skipped, since they
might have the same date as the most recent transaction in Firefly. To
help avoid skipping transactions, we need the start date to be the same
as the most recent transaction, rather than the next day. This can
cause duplicate imports, but fortunately, the Firefly Data Importer
handles this fairly well.
If the latest transaction was recent enough to skip importing
transactions, we don't even need to log in to the bank websites. Thus,
we should delay the login step until after we've checked this.
Since I ultimately want to run `xactfetch` in Kubernetes, running the
importer in a container as a child process doesn't make much sense.
While running `podman` in a Kubernetes container is possible, getting it
to work is nontrivial. Rather than go through all that effort, I think
it makes more sense to just use HTTP to communicate with the importer I
already have running.
I had originally chosen not to use the web importer because of how I
have it configured to use Authelia for authentication. The importer
itself does not have any authentication beyond the "secret" parameter
(which is not secret at all, given that it is passed in the query string
and thus visible to anyone and stored in access logs), so I was hesitant
to add an access control rule to bypass authentication for the
`/autoupload` path. Fortunately, I discovered that Authelia will use
the value of the `Proxy-Authorization` header to authenticate the
request without redirecting to the login screen. With just a couple of
lines in the Ingress configuration, I got it to work using the regular
`Authorization` header as well:
```yaml
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/auth-snippet: |
      proxy_set_header Proxy-Authorization $http_authorization;
      proxy_set_header X-Forwarded-Method $request_method;
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Authorization "";
```
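On the client side, this means the upload request only needs an ordinary `Authorization` header. The sketch below assumes HTTP Basic credentials; the function name, URL, and credentials are placeholders, and the request is only built, not sent:

```python
import base64
import urllib.request


def build_upload_request(url: str, username: str, password: str, body: bytes):
    """Build a POST request carrying Basic credentials (assumed scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    req = urllib.request.Request(url, data=body, method="POST")
    # The Ingress copies this header into Proxy-Authorization for the
    # auth subrequest and blanks it before proxying to the importer.
    req.add_header("Authorization", "Basic " + token.decode("ascii"))
    return req
```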
Apparently, Chase has switched back to the CSV schema without the Card
column at the beginning. Just in case they decide to flip-flop on that
field forever, we better try to handle both cases.
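One way to tolerate both layouts is to drop the leading column whenever the header row starts with `Card`. This is a sketch only: the function name is an assumption, and the other column names used when trying it out are made up.

```python
import csv
import io


def read_transactions(text: str):
    """Parse an export, dropping the leading Card column when present."""
    rows = list(csv.reader(io.StringIO(text)))
    if rows and rows[0] and rows[0][0].strip().lower() == "card":
        rows = [row[1:] for row in rows]
    return rows
```

After this step, the rest of the import code sees the same columns regardless of which schema Chase served.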
Chase made some minor updates to their site recently which affected some
of the element locators. The propaganda in the right-hand column of the
landing page has changed, and the Download Account Activity form is still
really terrible, and now behaves even more strangely.
Commerce likes to occasionally inject ads and other propaganda after the
login page, before loading the account summary page. To handle this, we
may need to specifically navigate to the account summary page after
logging in.
The Commerce Bank website no longer allows navigating directly to
`Download.ashx`; doing so just returns a generic "we're sorry" error.
They appear to have added some CSRF protection or something that makes
this not work. As a result, we have to go fill out the form on the
*Download Transactions* modal dialog in order to get the download to
work correctly.
In order to set the message for a notification with an attachment, the
text must be specified in the `Message` request header. Unfortunately,
HTTP header values are limited to the Latin-1 character set, so Unicode
characters cannot be included. As of *ntfy* 2.4.0, however, the server
can decode base64-encoded headers using the RFC 2047 scheme.
To maintain compatibility with older *ntfy* servers, the `ntfy` function
will only encode message contents this way if the string cannot be
encoded as ASCII.
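A sketch of that fallback logic, using the RFC 2047 "encoded-word" form with base64 (`=?UTF-8?B?...?=`). The helper name `encode_message_header` is an assumption, not the name used in `xactfetch`:

```python
import base64


def encode_message_header(message: str) -> str:
    """Return a header-safe form of message, encoding only when needed."""
    try:
        message.encode("ascii")
    except UnicodeEncodeError:
        # RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
        b64 = base64.b64encode(message.encode("utf-8")).decode("ascii")
        return f"=?UTF-8?B?{b64}?="
    # Plain ASCII is sent as-is, so older servers keep working.
    return message
```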
When there are multiple accounts associated with a Chase online banking
user, the dashboard page layout changes. Detailed account history is no
longer shown, so the elements we were waiting for in the "Waiting for
page to load completely" step never appear. Since we're navigating
directly to the download account transactions page now, anyway, we do
not even need to wait for this button to appear.
Although it is undocumented, *ntfy* accepts a `Message` header along
with a file upload, which sets the message content of the notification
when a file is attached. Since HTTP headers cannot contain multiple
lines, the newline character has to be escaped. The *ntfy* server
performs unescaping automatically.
When there are no transactions in the default display, the *Download
account activity* button is disabled. To avoid failing in this case, we
now navigate directly to the download page. This requires explicitly
selecting the credit card account from the dropdown list, as it is not
pre-filled when the page is loaded directly.
The `ntfyerror` context manager replaces `screenshot_failure` for
handling online banking interaction failures. It has several
advantages, notably:
* takes a screenshot of the browser page *before* logging out
* cleaner suppression of exceptions, with success tracking
* sends an `ntfy` message, with the screenshot attached
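The general shape of such a context manager can be sketched as follows. The signature is an assumption: `take_screenshot` and `notify` are hypothetical callables standing in for the browser page and the `ntfy` function, and `Result` is a made-up holder for the success flag.

```python
import contextlib
from dataclasses import dataclass


@dataclass
class Result:
    success: bool = True


@contextlib.contextmanager
def ntfyerror(name, take_screenshot, notify):
    """Suppress failures in the block, recording and reporting them."""
    result = Result()
    try:
        yield result
    except Exception as exc:
        result.success = False
        # The screenshot is taken here, while the page still shows the
        # failure, before any enclosing block gets a chance to log out.
        screenshot = take_screenshot()
        notify(f"Error fetching {name}: {exc}", screenshot)
```

Because the exception is caught inside the generator and not re-raised, the `with` statement completes normally and the caller can inspect `result.success` afterwards.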