Author Topic: The End of Shrinemaiden As We Know It  (Read 70195 times)

Re: The End of Shrinemaiden As We Know It
« Reply #60 on: February 18, 2020, 01:58:24 AM »
As for me, I downloaded a copy of all (accessible to my account) thread HTML pages in paginated and printable forms, forum attachments linked in them, on-site images included in them in <img> tags, and some other random on-site files linked in them.

So I think I have everything important; now it's just a matter of hammering it into a more usable form. Should I do that? Just upload the raw data? None of the above? I'm asking because we probably don't need multiple people doing the same work independently.

Also, in case the site dies, you can e-mail me at <my username>@tlen.pl.

Re: The End of Shrinemaiden As We Know It
« Reply #61 on: February 18, 2020, 04:01:22 AM »
I spent the past few days scraping the site.
Here is every post on shrinemaiden.org that's accessible through an account.

It's all in a .csv file of about 800 MB. I hope someone else can figure out a way to make it all easy to access.

Wow.
That csv file is extremely useful. It is the entire MoTK corpus in an easy-to-parse format, which is amazing.
Well done. I was hoping someone would make a crawler that would construct a database like this.  :o :o
Cirno the Ice Fairy~

Re: The End of Shrinemaiden As We Know It
« Reply #62 on: February 18, 2020, 09:30:07 AM »
Since it looks like at least one scraper has finished and a copy has been made available in csv format, I will speed up my script (which had an intentional time delay and is a little more than half completed atm) and get it over with.

On the topic of omitting topics: assuming the other scrapers work similarly to mine, the forum topics are structured in a way that's easy to look up (topic=12345 in the URL, followed by a second number that increases by 30 [posts per page] for every new page).  That said, the scrapers are indiscriminate, and anyone who wants topics omitted from the archive would need to know which topic number represents each topic.  Topics are numbered in chronological order (by the date the topic was created), so the numbers for a category that holds many topics will likely jump around (they're non-sequential).  This would make it difficult to omit any one section of topics.
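As a rough illustration of the URL scheme described above, here is a minimal Python sketch that lists the page URLs for one topic. Only the topic number and the step of 30 come from the post; the base URL and the exact `?topic=ID.OFFSET` layout are assumptions (it is how SMF forums are commonly laid out), and `topic_page_urls` is a made-up helper name.

```python
POSTS_PER_PAGE = 30  # from the post: the second number grows by 30 per page

# Hypothetical base URL; replace with the real forum address.
BASE_URL = "https://example.org/forum/index.php"


def topic_page_urls(topic_id: int, post_count: int) -> list[str]:
    """Return one URL per page of a topic that holds `post_count` posts."""
    if post_count < 1:
        return []
    # Ceiling division: 61 posts at 30 per page need 3 pages.
    pages = (post_count + POSTS_PER_PAGE - 1) // POSTS_PER_PAGE
    # Assumed layout: "?topic=<id>.<offset>", offset = page index * 30.
    return [
        f"{BASE_URL}?topic={topic_id}.{page * POSTS_PER_PAGE}"
        for page in range(pages)
    ]
```

A scraper would normally not know the post count in advance, so in practice it might keep requesting the next offset until a page comes back empty or repeats.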

With some more analysis, it's possible to re-categorize topics into the section they came from, then pull an entire section of topics, but only after everything's already been retrieved.
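One way to picture that after-the-fact re-categorization, as a sketch only: once every post has been retrieved along with the section it came from, the topic IDs can be grouped per section and a whole section dropped in one go. The record shape (dicts with `topic_id` and `section` keys) and both function names are assumptions made for this example, not the format any of the scrapers actually use.

```python
from collections import defaultdict


def group_topics_by_section(posts):
    """Map each section name to the set of topic IDs seen in that section."""
    sections = defaultdict(set)
    for post in posts:
        sections[post["section"]].add(post["topic_id"])
    return sections


def drop_section(posts, section):
    """Return the posts whose topic does not belong to `section`."""
    excluded = group_topics_by_section(posts).get(section, set())
    return [post for post in posts if post["topic_id"] not in excluded]
```

Because topic numbers within a section are non-sequential, the lookup has to go through a set of IDs like this rather than a numeric range.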

Suwako Moriya

  • Hey you with the pretty face
  • Welcome to the human race!
Re: The End of Shrinemaiden As We Know It
« Reply #63 on: February 18, 2020, 01:08:44 PM »
I mean, the only people that have any real control over what gets archived are the archivers. (In retrospect, I suppose we could have closed LJ viewership when archival was first suggested, but what's done is done.) Since that .csv is now provided, all I can do at this point is ask people to please try to respect the privacy of those who may have posted personal information in that subforum. I'm thinking in particular of one thread from years ago about potential legal name and career path changes of a former staff member (because not a day goes by where I don't think about that thread, for better or worse <_<), but that's merely one example; I'm sure there are plenty of others I can't recall or didn't read in the first place.
The only thing this dimension does well is show its ass. We might as well applaud it! - Albert Burneko

Well, this game happened.

Alcoraiden

  • Do not go gentle into that good night
Re: The End of Shrinemaiden As We Know It
« Reply #64 on: February 18, 2020, 04:12:26 PM »
Well, I'm glad we were able to pull out a copy of the forum. Dang. End of an era going on here.

Re: The End of Shrinemaiden As We Know It
« Reply #65 on: February 19, 2020, 03:39:56 AM »
Well. For years, I've always been told that digital storage media can hold the information of several dozen libraries. Always taken that fact for granted, but now it really hits home.
That shrinemaiden.csv is on this flash drive, in front of this Alice fumo. The concentrated text of the entire past 11 years completely fits on something smaller than a fumo with QUITE A LOT of room to spare. 11 years of ups, downs, forum games, fanfiction, banter between 2009-2020.



It's mind blowing to say the least.
Cirno the Ice Fairy~

ふねん1

  • Scientific editor
  • If you're alive, you can always keep moving.
Re: The End of Shrinemaiden As We Know It
« Reply #66 on: February 19, 2020, 05:25:58 AM »
I'm afraid I'm not familiar with .csv files, is the one that was posted supposed to be basically every post from this site backed up (minus certain exceptions, as mentioned), or is it much more limited than that? I also don't really know how to use it myself, as Excel isn't opening it right lol. I might sound selfish by saying I'm primarily interested in my own post history, but I have a personal project I'd want to legitimately use that for, so I'm curious if I'd have to take other measures to preserve it beyond that .csv file.
"Science is more than a body of knowledge. It's a way of thinking." - Carl Sagan

NEW AND IMPROVED YOUTUBE, now with 60 fps Touhou videos! Latest video update: WBaWC Lunatic/Extra no-miss no-bomb no-Roars no-Spirit-Strikes compilation.

Re: The End of Shrinemaiden As We Know It
« Reply #67 on: February 19, 2020, 06:49:44 AM »
Tom has already put up a mirror of the public part of the forum at archive.lunarcast.net/shrinemaiden.org.
The .csv file posted contains the text content of (I assume) every post, its title, thread ID, name of the author, date of posting, and subforum it was in. It's not easy to use as is because of its huge size.
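For anyone who only wants their own posts out of a file that size, a streaming read avoids loading everything at once (which is likely why Excel struggles). This is a minimal sketch with Python's standard `csv` module; the column name `author`, the UTF-8 encoding, and the presence of a header row are all assumptions about the file, so check the first line of the real .csv before relying on them.

```python
import csv


def posts_by_author(csv_path, author, author_column="author"):
    """Yield the rows written by `author`, reading one row at a time."""
    # newline="" lets the csv module handle line breaks inside quoted posts.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # author_column is an assumed header name, not a confirmed one.
            if row.get(author_column) == author:
                yield row
```

Since it is a generator, something like `for row in posts_by_author("shrinemaiden.csv", "some name"): ...` only ever holds one post in memory at a time.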

If you tell me exactly what you need I can probably help you.
« Last Edit: February 19, 2020, 06:51:16 AM by niektory »

ふねん1

  • Scientific editor
  • If you're alive, you can always keep moving.
Re: The End of Shrinemaiden As We Know It
« Reply #68 on: February 19, 2020, 07:40:55 AM »
Well I can't really describe "exact" things I'd want, it's more having the post history to observe my general thought progression through certain periods of my Touhou career. I might be able to get what I want written down sooner rather than later, who knows. Don't worry about extracting anything right now, though thanks for the offer.

But on that note, I imagine we wouldn't be able to log in to Tom's archived version once the original goes down? In terms of my post history, the stuff I'd want to look up for reference is spread across many forum threads, so going to my post history from my profile would be the best way for me to do that on my own.

nav'

  • nothing to see here
  • definitely not a Ditto
Re: The End of Shrinemaiden As We Know It
« Reply #69 on: February 19, 2020, 11:56:46 AM »
Tom has already put up a mirror of the public part of the forum at archive.lunarcast.net/shrinemaiden.org.
The .csv file posted contains the text content of (I assume) every post, its title, thread ID, name of the author, date of posting, and subforum it was in. It's not easy to use as is because of its huge size.

If you tell me exactly what you need I can probably help you.
Unfortunately this one still doesn't seem to work properly.

I'm probably going to prepare a small half-functional archive myself, but it's not going to be much more than a curio.
Rabinovich looks at a poster: "Lenin has died, but his cause lives on!"
"I'd rather he were the one still living!"

Re: The End of Shrinemaiden As We Know It
« Reply #70 on: February 19, 2020, 02:59:28 PM »
Well I can't really describe "exact" things I'd want, it's more having the post history to observe my general thought progression through certain periods of my Touhou career. I might be able to get what I want written down sooner rather than later, who knows. Don't worry about extracting anything right now, though thanks for the offer.

But on that note, I imagine we wouldn't be able to log in to Tom's archived version once the original goes down? In terms of my post history, the stuff I'd want to look up for reference is spread across many forum threads, so going to my post history from my profile would be the best way for me to do that on my own.

From what I can tell, all of your posts are in the CSV file and Tom's archive. The only issue is that your name is in Japanese characters, and the CSV doesn't seem to have used a Unicode encoding such as UTF-16 for the author names, opting instead to store them in U+ notation. For reference, I'm pretty sure your name is <U+3075><U+306D><U+3093>1 in the csv file.
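Converting between the two forms is mechanical. Here is a small Python sketch, assuming the escapes always look like `<U+XXXX>` with 4 to 6 hex digits, as in the name above; which characters the scraper chose to escape is a guess (this version escapes everything outside ASCII), and both function names are made up for the example.

```python
import re

# Matches escapes such as <U+3075>; the 4-6 hex digit range is an assumption.
_U_PLUS = re.compile(r"<U\+([0-9A-Fa-f]{4,6})>")


def _to_char(match):
    code_point = int(match.group(1), 16)
    if code_point > 0x10FFFF:  # not a valid code point, leave the text alone
        return match.group(0)
    return chr(code_point)


def decode_u_plus(text):
    """Turn '<U+3075>' style escapes back into the characters they name."""
    return _U_PLUS.sub(_to_char, text)


def encode_u_plus(text):
    """Escape every non-ASCII character, e.g. to build a search key."""
    return "".join(
        ch if ord(ch) < 128 else f"<U+{ord(ch):04X}>" for ch in text
    )
```

With this, `encode_u_plus` gives the string to search for in the author column, and `decode_u_plus` restores readable names when displaying results.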

The main benefit of having the CSV file with everything in it is that most of the hard work of crawling the website and getting all the posts into an easily convertible format has already been done. All that's left is to figure out a way to search the contents and present them in an easy-to-use fashion.  The way I'm thinking of going about it is to make a converter that turns the CSV into an SQL database, then let SQL handle the searching once it's been converted over. Theoretically, since topic IDs were preserved in the CSV file, it should be possible to search the database and link to a specific page in the static HTML archive. It would also make it possible to list all the posts made by a user, or show an entire forum thread on a single page.
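To give an idea of what such a converter could look like, here is a minimal sketch using SQLite through Python's standard library. It is not the converter described above, just one possible shape for it. The column names in `COLUMNS` are assumptions standing in for whatever headers the real CSV uses, and the file is assumed to be UTF-8 with a header row.

```python
import csv
import sqlite3

# Assumed CSV headers; rename these to match the real file.
COLUMNS = ("topic_id", "title", "author", "posted_at", "subforum", "body")


def csv_to_sqlite(csv_path, db_path):
    """Load the CSV into a 'posts' table and index the common lookups."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS posts (topic_id INTEGER, title TEXT, "
            "author TEXT, posted_at TEXT, subforum TEXT, body TEXT)"
        )
        with open(csv_path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Generator keeps memory use flat even for a very large file.
            rows = (tuple(row[name] for name in COLUMNS) for row in reader)
            conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?, ?, ?)", rows)
        conn.execute("CREATE INDEX IF NOT EXISTS idx_author ON posts(author)")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_topic ON posts(topic_id)")
    return conn


def posts_by(conn, author):
    """List (topic_id, title, posted_at) for every post by one author."""
    query = (
        "SELECT topic_id, title, posted_at FROM posts "
        "WHERE author = ? ORDER BY posted_at"
    )
    return conn.execute(query, (author,)).fetchall()
```

The `topic_id` column is what would let a search result link back to a page in a static HTML archive, and a query on `topic_id` alone would return a whole thread for single-page display. Sorting by `posted_at` only gives chronological order if the dates are stored in a sortable form such as ISO 8601.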

Might take a while on my end tho, life is busy.
Cirno the Ice Fairy~

Re: The End of Shrinemaiden As We Know It
« Reply #71 on: February 20, 2020, 01:55:04 AM »
hello. i am excited to have any really old bad posts i made be wiped off the internet. dont reply to this saying thats all my posts.

Arcorann

  • hey catgirl
  • Surprisingly unkempt
Re: The End of Shrinemaiden As We Know It
« Reply #72 on: February 20, 2020, 03:13:43 AM »
It's too late - how many archives are we up to now? Infy's archive, Tom's archive, ArchiveBot's archive (which will be going onto the Wayback Machine once it's done) makes three public archives so far by my count.

Re: The End of Shrinemaiden As We Know It
« Reply #73 on: February 20, 2020, 04:09:33 AM »
I don't really have a problem with archival, but I think it's better to have the archives hard to find but circulated among the members, instead of publicly indexed on the Wayback Machine. That way the people who it's emotionally significant to have it, but not the random internet weirdos digging for a person's online history.

Lebon14

  • Momiji-chan, heeeeelp meeeee!
Re: The End of Shrinemaiden As We Know It
« Reply #74 on: February 20, 2020, 05:01:21 AM »
Don't worry, LettyJournal, while it'll probably be backed up, won't be made available, per request.

Sapz

  • There's no escape.
  • It's time to burn!
Re: The End of Shrinemaiden As We Know It
« Reply #75 on: February 20, 2020, 02:56:01 PM »
Edit: Never mind, seems to be a bit late for that. :V
« Last Edit: February 21, 2020, 01:50:51 PM by Sapz »
Let's fight.

Hieda no Mukyu

  • Eastern philology in limbo
Re: The End of Shrinemaiden As We Know It
« Reply #76 on: February 21, 2020, 03:11:19 PM »
I had been away for months.

While I knew the community was facing some unspecified major difficulties, I had no idea the situation was anywhere near this grave. Yet I was feeling kind of skeptical regarding the future of the Kaleidoscope forums.

Having skimmed through this thread last night and read many parts of it with thought, I am deeply impressed by the community spirit that has surfaced and the self-sacrificing efforts of several members.

Under similar circumstances, many an online community would have just abandoned the ruins even if they had proper root access to the database. Sure, there would have been individuals who would have saved dozens or even hundreds of threads as mementos. Most likely there would have been some sort of continuation, possibly as a number of different Discord groups or whatever the people would have found convenient enough.

As I told one of my friends last night, no social medium that I have discovered so far can serve me like web discussion forums, as a cross between a chatting circle, a study group, a publication platform, and a searchable online archive.

While I am sad for what has led us here, I am most grateful to those who have helped salvage, save, and revive the discussion forum incarnation of this community. Whether you have contributed with work, money, ideas, or encouraging words, you have been a part of it.

Teetoku

  • You should...
  • Take it easy.
Re: The End of Shrinemaiden As We Know It
« Reply #77 on: August 28, 2020, 08:52:07 AM »
Shame that this forum's days are numbered, RIP to the owner.
I haven't been here for the past few years, just decided to check this place out, and then this happens.