[Imc-tech] [Imc-web] Spam Is...Until a Solution Is Found (fwd)

Arun Bhalla bhalla at uiuc.edu
Thu Jun 23 01:57:07 CDT 2005


[imc at ucimc.org taken off the cc list]

I poked around the ucimc-fullaccess logs.  Only got through a bit of June 19
(the beginning of the log) before I stopped.  I was looking for requests
containing "newswire/update" in the URL, focusing on the requests with 
older article IDs in the referer URL.
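
For anyone who wants to repeat the search, here is a rough Python sketch
of the filter described above.  It is not the command I actually ran.  It
assumes the log is in Apache "combined" format and that article IDs appear
in the referer as /newswire/display/<id>; both are guesses, and the names
(suspicious_requests, max_article_id) are made up for illustration.

```python
import re

# Assumption: Apache "combined" log format. The real ucimc-fullaccess
# layout may differ, in which case this pattern needs adjusting.
LOG_LINE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: hypothetical shape of an article ID inside a referer URL.
ARTICLE_ID = re.compile(r'/newswire/display/(\d+)')


def suspicious_requests(lines, max_article_id):
    """Yield (client, article_id) for requests to newswire/update whose
    referer points at an article with ID <= max_article_id."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "newswire/update" not in m.group("url"):
            continue  # unparseable line, or not a newswire update request
        ref = ARTICLE_ID.search(m.group("referer"))
        if ref and int(ref.group(1)) <= max_article_id:
            yield m.group("client"), int(ref.group(1))
```

Passing an open log file as `lines` works, since the function only
iterates over it.  The cutoff for what counts as an "older" article is
left to the caller.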

Here are some sample client addresses:
98.101.196.200.in-addr.arpa domain name pointer 98-101-196-200.linkexpress.com.br.
Host 33.110.8.81.in-addr.arpa not found: 3(NXDOMAIN)
198.250.128.62.in-addr.arpa domain name pointer dsl-250-198.monet.no.
58.150.244.148.in-addr.arpa domain name pointer host-148-244-150-58.block.alestra.net.mx.
118.240.248.207.in-addr.arpa domain name pointer host-207-248-240-118.block.alestra.net.mx.
118.240.248.207.in-addr.arpa domain name pointer host-207-248-240-118.block.alestra.net.mx.
8.211.201.64.in-addr.arpa domain name pointer 64-201-211-8.regn.hssx.sasknet.sk.ca.
Host 63.118.227.128.in-addr.arpa not found: 3(NXDOMAIN)

Not much of a pattern except for the repeated 207.248.240.118, and that
the alestra.net.mx domain was the home for 3 of these 8.  More analysis
might reveal that only a small handful of client IPs are responsible for most
of the spam, but I imagine there is some sort of distributed spam job going
on.  I might find more time tomorrow or later to check out more of the
access log, but right now I don't think that blocking any of these IPs would
have a meaningful effect on stopping spam.
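
The "more analysis" could be as simple as tallying requests per client
address.  A minimal sketch, with top_clients as a made-up helper name; the
input is any list of client IPs pulled from the log:

```python
from collections import Counter


def top_clients(clients, n=10):
    """Return (ip, hits, share_of_total) for the n most frequent clients."""
    counts = Counter(clients)
    total = sum(counts.values())
    return [(ip, hits, hits / total) for ip, hits in counts.most_common(n)]


# The eight sample addresses listed above, in forward notation.
sample = [
    "200.196.101.98", "81.8.110.33", "62.128.250.198", "148.244.150.58",
    "207.248.240.118", "207.248.240.118", "64.201.211.8", "128.227.118.63",
]
print(top_clients(sample, n=1))  # [('207.248.240.118', 2, 0.25)]
```

If the top few addresses account for a large share over a full day of
logs, blocking them might be worth it; if the shares stay small and flat,
that points to a distributed job.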

If it's possible to reject newswire posts/comments that contain keywords
like "poker," it seems that would stop most of the spam.
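
To make the idea concrete, here is a sketch of such a check in Python.
It is not Dada code, and I don't know where Dada would let us hook it in;
looks_like_spam and SPAM_KEYWORDS are names invented for this example, and
"poker" is the only keyword taken from what we've actually seen.

```python
import re

# Illustrative list; only "poker" comes from the spam discussed here.
SPAM_KEYWORDS = ["poker"]

# Match any keyword as a whole word, ignoring case.
_SPAM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in SPAM_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def looks_like_spam(text):
    """Return True if the post or comment text contains a blocked keyword."""
    return _SPAM_RE.search(text) is not None
```

The obvious cost is false positives: a legitimate article that mentions a
blocked word would be rejected too, so the list should stay short.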

In response to Mike's email, I imagine Dada does not have a hard limit
on the number of entries it can handle.  If it does, it's probably on the
order of billions (or larger).  
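
As a sanity check on the figures in Mike's email (quoted below), the
arithmetic can be written out as follows.  The inputs are his estimates;
the function names are made up, and the calculation assumes every spam
comment lands on a different article, which is the fastest possible
spread.

```python
def original_articles(total_entries, article_fraction):
    """Estimate how many entries are original articles (not comments)."""
    return round(total_entries * article_fraction)


def days_to_cover(articles, spam_per_day):
    """Days until every article could carry at least one spam comment,
    assuming each spam comment hits a distinct article."""
    return articles / spam_per_day


articles = original_articles(48_000, 1 / 5)   # 9600
print(articles, days_to_cover(articles, 1_000))  # 9600 9.6
```

So "approximately 10 days" holds up as a lower bound on how long full
coverage would take at the current rate.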

If it's technically feasible to grant comment hiding privileges to many
people without granting full access rights to Dada, it might be worth
sharing the load of web editor among many people.  Further, I think we
could mainly concentrate on keeping recent articles (i.e., articles
directly linked off the main page) free of spam.  The spammers (people
or scripts) seem to focus on the older articles, probably because the
URLs are established rather than dynamic.  In addition, I doubt the
older pages are read as much -- I know I rarely read them -- so there isn't as
much harm in not addressing spam on older articles.  I know it's not
great to house spam, but if we can't address it socially or technically,
well, that's life on the Web; people are used to it in email and on open
forums.

Arun

"Zachary C. Miller" writes:
> I have not had the free time to deal with this at all. Is there anyone
> else out there that can look through the logs and figure out what IP
> addresses the spammers are coming from? Should be easy enough but I'm
> SWAMPED. If there is anyone that I can give access to the server logs
> to who wants to volunteer to figure this out just let me know. 
> 
> Mike Lehman wrote:
> >
> > Due to the fact that spam advertisements posted as comments to 
> > legitimate articles have become a major problem on the UC IMC website, I 
> > will be suspending dealing with such material at this point. When a 
> > technical solution is achieved that can effectively deal with this 
> > problem, it really needs to be addressed. Providing a venue for sleazy 
> > phishing scams and poker ads is certainly not part of our mission.
> > 
> > The rate of such posts now approaches 1,000 a day. This is quite an 
> > increase from even last week, when the number was typically 200-300 a 
> > day, making it impossible to consider devoting the time that is needed 
> > to manually deleting such posts.
> > 
> > If there were some way to facilitate mass deletions within our current 
> > version of Dada, such as being able to change the editor's page display 
> > to 100 entries at a time, then it might be feasible to continue manual 
> > deletions, assuming there is no further increase in the rate of this 
> > shit. When you can only display 12 entries on a page it simply takes too 
> > long to deal with the current mess.
> > 
> > We have approximately 48,000 entries at this time in the website's 
> > database. On the assumption that approximately 1 in 5 of those is an 
> > original article, then there are about 9,600 original articles for the 
> > spam to be attached to as comments. This means that in approximately 10 
> > days, nearly every original article we have will likely be proudly 
> > displaying advertising from these creeps at the current rate.
> > 
> > I am uncertain how many entries the current version of Dada will 
> > sustain, but this might be something to check into. By the end of July, 
> > we will likely be looking at over 100,000 database entries, assuming we 
> > don't turn off comments or take other such drastic steps in the meantime.
> > 
> > We may reach the point at this rate where the database will no longer 
> > accommodate all the spam and our legitimate posts as well, but not being 
> > a techie myself, I don't know what this limit is, other than it is 
> > finite from what I understand. Those with access to such info on Dada 
> > might want to give us an estimate of when we might start expecting 
> > trouble with the website if such a scenario will occur in the near future.
> > 
> > Sorry for this bad news, but it increasingly is a waste of time to 
> > attempt to manually deal with this.
> > Mike Lehman
> > _______________________________________________
> > Imc-web mailing list
> > Imc-web at lists.ucimc.org
> > http://lists.chambana.net/cgi-bin/listinfo/imc-web
> > 
> 
> -- 
> Zachary C. Miller - @= - http://zach.chambana.net/
> IMSA 1995 - UIUC 2000 - Just Another Leftist Muppet - Ya Basta!
>  Social Justice, Community, Nonviolence, Decentralization, Feminism,
>  Sustainability, Responsibility, Diversity, Democracy, Ecology
> _______________________________________________
> Imc-tech mailing list
> Imc-tech at lists.ucimc.org
> http://lists.chambana.net/cgi-bin/listinfo/imc-tech
> 

