[conspire] Web spam and yandex forms

Wed Dec 8 08:45:10 PST 2021

Ivan Sergio Borgonovo writes:
> Rick wrote something that it's worth a bit more explanation.
> Generally they try to use forms as mail reflector... but also as link
> factories. Because sometimes what you put in forms end up being published.

Yes, makes sense. In this case, it wasn't published, just sent to
one email address. But the link to their yandex form was in the email.

> I'd check the whole form validation (or lack of...) since they were able to
> send a really long bunch of text in the user name field.

Yes, definitely. I had validation for the email field, but the
only check on username was that it be non-null. I've since added
a few other heuristics: a max length and that it not contain "://".
I'll probably add others as I think of them.
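
A minimal sketch of those username heuristics in Python. The
function name and the 64-character limit are assumptions made for
illustration; the actual limit isn't stated above.

```python
MAX_USERNAME_LEN = 64  # assumed limit, chosen only for this sketch


def username_looks_ok(username):
    """Return True if the username passes the basic anti-spam heuristics."""
    if username is None:
        return False
    name = username.strip()
    if not name:                      # must be non-empty
        return False
    if len(name) > MAX_USERNAME_LEN:  # reject long blobs of text
        return False
    if "://" in name:                 # reject anything carrying a URL
        return False
    return True
```

A form handler could call this before sending any mail and return
an error page when it fails.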

Of course this has me thinking about other forms on the site,
but I think that's the only one that sends out email, and I
think the only others are for bill numbers, so if the input
doesn't match a bill in the given session it should just give an
error message.

> > And I hate recaptcha (the number of hours of my life I've wasted
> > clicking on traffic signal photos over and over, because Google
> > *never* agrees with me about traffic signals -- I have no idea what
> > they think a traffic signal is) and was very resistant to that idea,
> 
> Newer versions of recaptcha may not require any effort from the user and
> just check some other signal to guess if you're a legit user (IP, browser
> signature, possibly even how you interact with mouse/touch screen.

I think so; in the past few months, sometimes (maybe half to 2/3 of
the time) I can even get past "traffic signal" ones. I still hate
them. For years they penalized non-Google users: if you're
signed in to a Google account in that browser, it's easy to pass,
but if you prefer not to do all your browsing signed in with Google
cookies, sometimes it would be a gauntlet of eight or ten different
screens, no matter how accurate your clicks might be. But that seems
to have gotten better recently; perhaps something to do with EU or
CA privacy laws.

> The most important advantage of a not very popular captcha is there won't be
> many bot around that know how to solve it but the downside is that it could
> be algorithmically very easy to solve (as the one with the dices).

Reading Rick's description of his English questions and Bruce
Schneier's question, I got to thinking how I could ask NM-specific
questions, like "What's the Governor's first name?" or "What shape
is the state capitol?" or "red or green?" (The real answer to that
last one is of course green, but I'll probably allow red or Christmas
for nonbelievers.) And I figure I'll put the questions and their
allowable answers in a local file that the Flask site reads, not
in the checked-in code where anyone could see the answers on GitHub.
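
One way the local question file could work, as a rough sketch. The
JSON layout (question mapped to a list of allowed answers) and all
the function names here are assumptions, not anything the site
already does.

```python
import json
import random


def load_questions(path):
    """Load {question: [allowed answers]} from a JSON file kept out of git."""
    with open(path, encoding="utf-8") as fp:
        raw = json.load(fp)
    # Normalize answers once so later comparisons ignore case and spacing.
    return {q: {a.strip().lower() for a in answers}
            for q, answers in raw.items()}


def pick_question(questions):
    """Choose a random question to show on the form."""
    return random.choice(list(questions))


def answer_ok(questions, question, answer):
    """True if the answer is one of the allowed ones for that question."""
    allowed = questions.get(question, set())
    return answer.strip().lower() in allowed
```

With a file containing {"red or green?": ["green", "red",
"Christmas"]}, the answer " Green " would be accepted and "blue"
would not.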

> If I were a spammer I'd go to the list Rick mentioned in a previous post and
> see which one I can easily circumvent... and dice captcha would be on the
> top of my list.

Yes, I thought that too: I loved the idea initially, but did think
that it wouldn't be that hard to solve programmatically. The chatty
questions, by contrast, are ones most people will already know how
to answer, and if not, a quick web search will tell them. Hopefully
that's more than a typical spammer wants to invest.

Meanwhile I've figured out how to ban IPs in Apache. In the past
day the requests have all come from two IPs, though I know that will
change and I'll have to add more over time, and I'm noting when IPs
are added in case I want to move them off the blacklist after a
while. I think I'm going to need some intelligent log-monitoring
scripts, keeping track of which IPs are seen when.
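
A starting point for that kind of log monitoring might look like
the sketch below. It assumes the access log uses Apache's common or
combined log format (client address first, timestamp in square
brackets); the function name is made up for illustration.

```python
import re
from datetime import datetime

# Start of a common/combined log format line:
# client address, two more fields, then [timestamp].
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"


def summarize_ips(lines):
    """Return {ip: {"first": dt, "last": dt, "count": n}} for the log lines."""
    seen = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue                  # skip lines that don't parse
        ip, stamp = m.groups()
        try:
            when = datetime.strptime(stamp, TIME_FMT)
        except ValueError:
            continue                  # skip unexpected timestamp formats
        rec = seen.setdefault(ip, {"first": when, "last": when, "count": 0})
        rec["first"] = min(rec["first"], when)
        rec["last"] = max(rec["last"], when)
        rec["count"] += 1
    return seen
```

Feeding it an open log file (it iterates line by line) gives, for
each IP, when it was first and last seen and how many requests it
made, which is enough to decide when an address could come off the
ban list.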

        ...Akkana