Scraping the forum - anyone?

02-28-2014, 12:14 PM
Anyone know about scraping or screen-scraping the forum? How to do it, or what the best program is for a forum such as ours? I have a bee in my bonnet to see everything collected here presented in a data format so newbies like me can see very clearly the answers to the many questions asked over and over.

I think it's a very exciting prospect. The idea of seeing the answers put into a spreadsheet or graph-like presentation could be pretty bloody ace!

My brother, who is the CEO of a data analysis company, said once I get going he could lend an hour. He has little time, so I want to make sure I learn everything I need to first. HOWEVER, I figured me learning scraping is not as good as asking someone here who may already know it or be able to help us more :p He said the following, so that's where I shall start.

What you're trying to do is scrape the forum. There are various scraping tools although I've not tried them. For example - Site Sucker - via http://ask.metafilter.com/196755/How-to-scrape-a-web-forum.

Googling for an OSX Forum scraper might get you a solution also. And if you use Google Spreadsheets, this might be useful.

Let me know how you get on, I might be able to find an hour to sort it out if you get stuck.

Anyway, I have two avenues to pursue. One is a new data collection technique (lots of formats to do that). The other is seeing all the data collected here over the years, which would be so ace, and now I know that's done by scraping.

So before I leap in and reinvent the wheel, i.e. figure out how to do this and make every mistake along the way, is there anyone out there who knows what the hell they are doing?? ;)
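
For anyone wondering what "presented as a data format" could look like in practice, here is a minimal sketch in Python. It assumes you already have a page of the thread saved as plain text, and that each post starts with a timestamp line like "02-28-2014, 12:14 PM" on its own. The function names `split_posts` and `posts_to_csv` are made up for illustration, it does not download anything from the forum, and it keeps no member names.

```python
import csv
import io
import re

# Assumption: each post begins with a line such as "02-28-2014, 12:14 PM".
STAMP = re.compile(r"^\d{2}-\d{2}-\d{4}, \d{2}:\d{2} [AP]M$")

def split_posts(archive_text):
    """Split saved thread text into a list of {"posted", "body"} dicts."""
    posts = []
    current = None
    for line in archive_text.splitlines():
        line = line.strip()
        if STAMP.match(line):
            # A timestamp line starts a new post.
            current = {"posted": line, "body": []}
            posts.append(current)
        elif current is not None and line:
            # Any non-blank line after a timestamp belongs to that post.
            current["body"].append(line)
    return [{"posted": p["posted"], "body": " ".join(p["body"])} for p in posts]

def posts_to_csv(posts):
    """Return the posts as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["posted", "body"])
    writer.writeheader()
    writer.writerows(posts)
    return out.getvalue()

# Tiny hand-written sample standing in for a saved thread page.
sample = """Scraping the forum - anyone?

02-28-2014, 12:14 PM
Anyone know about scraping the forum?

02-28-2014, 04:27 PM
Love your energy
"""

posts = split_posts(sample)
print(posts_to_csv(posts))
```

The resulting text can be saved as a .csv file and opened in a spreadsheet. Whether it is OK to collect the pages in the first place is a separate question for the forum owner.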

02-28-2014, 12:46 PM
i have a bee in my bonnet to see everything collected here presented as a data format so newbies like me can see very clearly the answers to many questions asked over and over.
You Go GIRL! :)

02-28-2014, 04:27 PM
You must eat Wheaties for Breakfast!!! Love your energy

02-28-2014, 10:45 PM
lol guys :o

03-01-2014, 11:12 AM
My brother got back to me. It's a few days' work for someone as qualified as he is to do the scrape, which, as you hinted, means it's a big task, BUT... arg, I want it. So I asked him to quote me a couple of different ways. I said we could perhaps make it into a downloadable book for newbies (objective answers to every question asked by a discus newbie for years and years now!!) to sell at a small price to benefit you and the forum and to cover his work...

Just turning my wheels and seeing what could happen. He seems to know what he is talking about and only does high quality work. He is kind of coaching me and just chatting at this point, but he wanted to make sure I have your permission to export data, Al.

I was like, he trusts me! But here: I, Marnie George, do swear to only collect or organize data for the benefit of the Simply Discus forum members, with the express permission of Al Brewmaster. All of my work belongs to him, not myself, and will not be shared with anyone outside the forum by myself. After that, Al is free to use the data as he sees fit, to best benefit the interests of the forum or himself. My contributions are voluntary only.

Think that covers it

Also, I started a new data collection survey for the disease section, in case this doesn't work, to get new posts into a clean data sample and analysis format. Will send it to you when done, and if you like it or can help me tweak it, we can use it. The graphs and charts on answers are all done for me so I don't need mad skills :angel:

03-01-2014, 11:21 AM
Let's take this to PMs at this point. I'm really cautious when it comes to board software access and member privacy here... I'll need a lot more info from your brother on what kind of access he needs and what data he will generate. Let's talk further by PM.


03-01-2014, 11:31 AM
Interesting, why not do a test run in just one subsection of the forum if Al will allow it :) Maybe the breeding section?


03-01-2014, 12:08 PM
I need to know much, much more... There are potentially a lot of problems with this and much to consider.


03-01-2014, 12:11 PM
I will be the first to admit I have no clue about the process ;)

03-01-2014, 12:17 PM
I will be the first to admit I have no clue about the process ;)
and I'll be the second. :)

03-01-2014, 08:04 PM
Lol, sent. The reason I approached my brother is I really trust him to have integrity, and I can't say that of something I download via the internet. So if we did do it that way, and it might not be reasonable to do it that way, I wanted to talk to him. He, for example, raised the point of forum privacy before I did. That didn't even enter my head, because I have zero interest in anything but the numbers and the responses to the questions.

I did pick diseases to work on first, as aspects of husbandry are, I think, the most asked questions in the entire forum, and the most repeated AND the most irritating (if the answers are to be judged) questions, repeated over and over. So the data could show several years of answers and the best version of the results we can show. I.e. question 6 of the disease questionnaire had x many responses; x many had a BB, x many had substrate; the severity of the disease issue in the two was 75% vs 22%, if you get what I mean? And Al would SEE the data results first and decide if they should be made public or if they are too clouded or not useful enough. Not me. Al and only Al.
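
To make the "x many had a BB, x many had substrate" comparison concrete, here is a rough Python sketch of the tally. Everything in it is assumed for illustration: the rows are made up (not real survey answers), "BB" is taken to mean a bare-bottom tank, and `severe_rate_by_group` is just a name picked for this example.

```python
from collections import defaultdict

# Made-up survey rows for illustration only: (tank setup, was the disease severe?)
responses = [
    ("bare bottom", False),
    ("bare bottom", True),
    ("bare bottom", False),
    ("bare bottom", False),
    ("substrate", True),
    ("substrate", True),
    ("substrate", False),
    ("substrate", True),
]

def severe_rate_by_group(rows):
    """Return {group: (severe_count, total_count, percent_severe)}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [severe, total]
    for group, severe in rows:
        counts[group][1] += 1
        if severe:
            counts[group][0] += 1
    return {g: (s, t, 100.0 * s / t) for g, (s, t) in counts.items()}

summary = severe_rate_by_group(responses)
for group, (severe, total, pct) in summary.items():
    print(f"{group}: {total} responses, {severe} severe ({pct:.0f}%)")
```

Note that this only counts answers and needs no member info at all, just the setup and the severity flag for each response.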

I just want to see if the data is useful, that's all. I could do it manually without worrying about people's info but... wow, oh boy. I could also, and have, started a NEW collection point, anonymous in that the data gatherer does not take any member info, only the data flagged for collection, to show graphs and charts to compare answers given.

So we will chat in PM, but at no point is any of this worth any worry to you, so this is all only discussion. I would absolutely NOT do anything, and that's in writing, without permission and privacy for all members concerned. It's why I asked my brother: I can't guarantee any tool I used was safe enough. HE CAN. He runs a great and very professional company in the UK for multi-million dollar businesses, so he deals with this daily. Which also means I probably can't afford to have him do it, but we sure can get some free advice, and he is replying to me on a daily basis on this, which is rare. We don't get enough time to chat, so it's a rare chance to get his time, and it must have sparked his interest.