Posted

Thanks to headkaze's excellent work we now have a set of XMLs with all the data from AllGames.com, so I thought I'd get a topic going for Step 3 of the database project... replacing the Categories in the current databases with the categories from AllGames.com

I wrote a quick and dirty app that'll go through each XML grabbing the game name and category which I then output to a txt file

So now I have a txt file for every XML
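A minimal sketch of that kind of extraction, for anyone who wants to reproduce it. The tag names (`game`, `name`, `genre`) and the sample data are assumptions for illustration only; the real AllGames XMLs may be laid out differently, and this is not the app described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the real XMLs may use different tag names.
SAMPLE_XML = """<games>
  <game><name>Super Mario World</name><genre>Action</genre></game>
  <game><name>F-Zero</name><genre>Racing</genre></game>
  <game><name>Untitled Prototype</name></game>
</games>"""

def extract_name_category(xml_text, game_tag="game", name_tag="name", category_tag="genre"):
    """Return (name, category) pairs for every game that has both fields."""
    root = ET.fromstring(xml_text)
    pairs = []
    for game in root.iter(game_tag):
        name = (game.findtext(name_tag) or "").strip()
        category = (game.findtext(category_tag) or "").strip()
        if name and category:  # skip entries with a missing name or category
            pairs.append((name, category))
    return pairs

def to_txt_lines(pairs):
    """One tab-separated 'name<TAB>category' line per game."""
    return [f"{name}\t{category}" for name, category in pairs]

pairs = extract_name_category(SAMPLE_XML)
lines = to_txt_lines(pairs)
```

Writing `lines` to a file per XML gives one txt file per system.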

Now is where the fun begins :)

I'm not really too sure what to do now... my initial thought was to merge all the txt files into 1 big list of games and categories. Doing this would mean that even if a game wasn't covered in AllGames for 1 system, it may be in their database for another system, and if it's the same game name it should be the same category. What do you think?

I could then write a tool similar to my FuzzyTextMatch to match up the names in the reference file with the names in the DB's
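A rough sketch of name matching against a reference list. FuzzyTextMatch's own algorithm isn't described in this topic, so this swaps in Python's standard `difflib` as a stand-in; `normalise`, `best_match` and the 0.8 cutoff are illustrative assumptions.

```python
import difflib

def normalise(name):
    """Lower-case, treat ':' and '-' as spaces, and collapse whitespace."""
    return " ".join(name.lower().replace(":", " ").replace("-", " ").split())

def best_match(db_name, reference, cutoff=0.8):
    """reference maps normalised name -> category.

    Returns (matched_name, category, score), or None if nothing is close enough.
    """
    target = normalise(db_name)
    if target in reference:  # exact hit after normalising
        return target, reference[target], 1.0
    candidates = difflib.get_close_matches(target, list(reference), n=1, cutoff=cutoff)
    if not candidates:
        return None
    score = difflib.SequenceMatcher(None, target, candidates[0]).ratio()
    return candidates[0], reference[candidates[0]], score

reference = {
    normalise("NHL 2005"): "Sports",
    normalise("Super Mario World"): "Action",
}
typo_hit = best_match("Super Mario Wrld", reference)
no_hit = best_match("Tetris", reference)
```

Anything that comes back with a score below 1.0 is only a suggestion and would still want a human check.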

This seemed all well and good, but the txt file, after removing duplicates, has 26000 entries :D which on the fastest machine in my work (2x 3GHz dual core and 8 gigs of RAM) takes almost an hour to process with FuzzyTextMatch, though FuzzyTextMatch is only using 1 of the 4 available cores

I'm now wondering if maybe I should write something to grab all the names from all the databases and run that through FuzzyTextMatch against the 1 big list of games and categories. I'm sure it will take quite a long time, but at least we wouldn't have to do it for every mdb.

I could divide up the results so that a few of us can check them and finally have another tool that will then take the confirmed results and make the changes to the MDBs for us?

How does this sound Nologic?

I haven't bothered uploading the individual txt files as I don't see them being of any use to anyone, so there's no point in wasting Tom's bandwidth, but if you want them PM me and I'll send you them

Stu

Posted

I made a few more changes to the databases that I had posted in the step 2 topic so here is the complete WIP package as it stands at the moment 7zip'd

MDB Package Beta3

I feel that this set, even as it stands at the moment, far surpasses the original set supplied with GameEx

Stu

Posted

Thanks Stubie

:)

Posted

Interesting questions...

My concern with using the name of the game to get the info from another system is duplicate game names with different games. What I mean is, take for example the game name "Baseball". How many systems had a game with that name? My guess is, especially early systems, they all had one and they were all from different software companies. In this case, you can safely assume that they were going to be a sports game where the sport is baseball. But, what if the game is called "Stunner" or whatever.

I like the idea, I'm just concerned that we may have false matches. Maybe there could be a way to flag the ones that aren't matched for manual verification? So, your fuzzy match creates the suggested info and someone has to manually verify. Of course, that might be difficult if you don't have the rom to run...

Posted

I'd be happy to help with whatever I can, although I would need detailed directions on what to do.

If you have 4 cores to work with then run 4 instances of your program.

Make 4 copies of the text file if you need to.

What processor(s) is it?

Posted

Yeah, this is why I wanted more input before I battered on... I did a simple check and there are 400-odd games out of the 26000 that have the same name but a different genre. I don't know the names yet, I just quickly removed the genre column in Excel and then sorted and removed duplicates in the name column, so I'll need to write something to find those games first...
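Finding those games could look something like this: group the merged list by name and keep every name that ends up with more than one genre. The sample rows are made up, and matching names case-insensitively is an assumption.

```python
from collections import defaultdict

def find_genre_conflicts(pairs):
    """pairs: iterable of (name, genre). Returns {name: [genres]} for names with 2+ genres."""
    genres_by_name = defaultdict(set)
    for name, genre in pairs:
        genres_by_name[name.strip().lower()].add(genre.strip())
    return {
        name: sorted(genres)
        for name, genres in genres_by_name.items()
        if len(genres) > 1
    }

# Made-up rows standing in for the merged txt file.
merged = [
    ("Baseball", "Sports"),
    ("Baseball", "Sports"),
    ("Stunner", "Action"),
    ("Stunner", "Puzzle"),
    ("Football", "Sports"),
]
conflicts = find_genre_conflicts(merged)
```

Only the names in `conflicts` would need checking on a per-system level; everything else has one genre across all systems.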

The problem is that if there were 2 games called "Stunner" on 2 different systems they would always return a 100% match, so without checking on a per-system level first we wouldn't know if it was the same game or not, which is the bit I'm trying to avoid :) If we definitely have a few people willing to pitch in it's not so bad, I just don't fancy going through all the databases again myself... The other option I'm just thinking of is merging same-era sets, e.g. SNES and Genesis, Master System and NES and so forth. This should give us better results and minimise mixups

@LB11

The problem is that to get the best matches the program needs to be comparing against the whole data set each time... I think I could write my tool to use multi-threading for the bulk of the processing. I haven't really done much with multi-threading but I've been looking for a project to implement it on as a test :) It's not that much use to me though, as my home PC doesn't have dual core or multiple CPUs and I'll not be able to sit in work sorting games all day :) The work machine has 2 Xeon 5160 chips, which are dual core themselves
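One possible way to use all 4 cores without giving up the whole-data-set comparison: split only the names being looked up, and hand every worker the complete reference list. This is a sketch using Python's `concurrent.futures` and `difflib`, not how FuzzyTextMatch works; the function names and cutoff are assumptions.

```python
import difflib
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def split_into_chunks(names, parts):
    """Deal names into `parts` roughly equal lists (round-robin)."""
    return [names[i::parts] for i in range(parts)]

def match_chunk(chunk, reference_names, cutoff=0.8):
    """Match each name in the chunk against the WHOLE reference list."""
    results = {}
    for name in chunk:
        hits = difflib.get_close_matches(name, reference_names, n=1, cutoff=cutoff)
        results[name] = hits[0] if hits else None
    return results

def match_parallel(names, reference_names, workers=4):
    """Run match_chunk in separate processes and merge the partial results."""
    chunks = split_into_chunks(names, workers)
    worker = partial(match_chunk, reference_names=reference_names)
    merged = {}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial_result in pool.map(worker, chunks):
            merged.update(partial_result)
    return merged
```

On platforms that start worker processes by re-importing the script, `match_parallel` should be called from under an `if __name__ == "__main__":` guard.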

Stu

Posted

*Hurries to Install Office*

Okay, for the time being this is the feedback I have...Since the data we currently have is based off Moby's...if games have the same name...they should be able to be tested to see if they match or not by simply comparing the descriptions from one or more...I know some games have little extras added to the descriptions...which will throw things off a little even though the games match...but it should prevent false positives...however it may induce false negatives.

The false negatives should be few...but it seems like the only way to safely match games.
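A small sketch of that description test, using `difflib` from the Python standard library. The 0.7 threshold is a guess that would need tuning against real descriptions, and treating a missing description as "no match" is an assumption that deliberately favours false negatives over false positives.

```python
import difflib

def descriptions_agree(desc_a, desc_b, threshold=0.7):
    """True when two descriptions are similar enough to call it the same game."""
    a = " ".join(desc_a.lower().split())
    b = " ".join(desc_b.lower().split())
    if not a or not b:
        return False  # no description to compare: leave it for manual review
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

base = "A baseball simulation for one or two players."
with_extras = base + " Includes a season mode."
same = descriptions_agree(base, with_extras)
different = descriptions_agree("qqqq", "zzzz")
```

The second description has a little extra added on the end and still passes; the more that gets added, the closer it drifts to a false negative.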

*Office Installed*

I'll leave more feedback in a sec.

*Update*

Okay now what I see right now is that we currently have:

Category\Perspective\Sport\Non-Sport\Misc

Making up our primary ways of qualifying games...based off Moby's:

Genre\Perspective\Sport\Non-Sport\Misc

So first off I'd suggest we rename our current Category to match its original title of Genre as provided by Moby. This keeps everything in its original context.

Next we create a new field aptly named Category which will store the Cat. info from AllGames.

Now I did also notice that we snagged some very useful info from Moby...namely the AlsoFor field...which point blank tells us what other systems the game was also released on...we could additionally check Description as mentioned prior...but we could also either\or use the Publisher & Developer fields to qualify positive matches throughout the db's....though I do know Pubs and Devs sometimes change from platform to platform...so it's not a constant.

Posted

I'm following this closely, as the DBs are an awesome addition to my GameEx setup, and I would love to help out in any way that I can.

I noticed that the AllGames data has a "Genre" field and a "Style" field that work quite well together. The style field seems to work like a subcategory, which would be awesome if and when Tom decides to include subcategory support in GameEx. Will both these fields be supported in the proposed changes to the DBs?

Ideally (at least for me) the current category field would be replaced with AllGames' much narrower, and hence more usable, "genre" field, and a subcategory field could be added to the DBs using AllGames' "style" information (which would be used later if we can talk Tom into subcategory support).

At least then it would actually be useful, and at least remotely accurate, to search for games by category in GameEx, whereas at the moment it returns a huge list of multi-category nonsense. Honestly I think the current categories (Moby's genre info) aren't very usable for sorting games, and at best prove mildly informative. There are just far too many cryptic multi-categories in it (i.e. Action-strategy-sport-RPG as a single category). Perhaps I'm missing something... I dunno.

that's my 2 cents anyway :)

Posted

I was thinking on the order of nologic on this one. Ideally, we would be able to use date of release, publisher, title, and developer for filling in, but if a title doesn't have all of this (or worse, there's a difference) things get complicated. I would think that if there was a Moby field for AlsoFor and a similar date (or one other field match), it would have to be a match even if it wasn't necessarily perfect.
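The "AlsoFor plus one other field" rule could be sketched like this. Only AlsoFor is a field name taken from this topic; the `System`, `Publisher`, `Developer` and `Year` keys, the one-year tolerance and the sample entries are assumptions for illustration.

```python
def same_game_evidence(entry_a, entry_b):
    """List the independent hints that two same-named entries are the same game."""
    def norm(value):
        return (value or "").strip().lower()

    evidence = []
    also_for_a = {norm(s) for s in entry_a.get("AlsoFor", []) if norm(s)}
    also_for_b = {norm(s) for s in entry_b.get("AlsoFor", []) if norm(s)}
    # Either entry listing the other entry's system counts as an AlsoFor hit.
    if norm(entry_b.get("System")) in also_for_a or norm(entry_a.get("System")) in also_for_b:
        evidence.append("AlsoFor")
    for field in ("Publisher", "Developer"):
        if norm(entry_a.get(field)) and norm(entry_a.get(field)) == norm(entry_b.get(field)):
            evidence.append(field)
    year_a, year_b = entry_a.get("Year"), entry_b.get("Year")
    if year_a and year_b and abs(year_a - year_b) <= 1:  # "similar date"
        evidence.append("Year")
    return evidence

def is_probable_match(entry_a, entry_b, required=2):
    """Accept when at least `required` hints agree, even if the rest differ."""
    return len(same_game_evidence(entry_a, entry_b)) >= required

# Made-up entries: different publishers per platform, same year.
snes = {"System": "SNES", "AlsoFor": ["Genesis"], "Publisher": "Acclaim", "Year": 1993}
genesis = {"System": "Genesis", "AlsoFor": ["SNES"], "Publisher": "Arena", "Year": 1993}
hints = same_game_evidence(snes, genesis)
```

Here the publisher differs between platforms, but AlsoFor and the year still agree, so the pair is accepted.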

Posted
The problem is that if there were 2 games called "Stunner" on 2 different systems they would always return a 100% match, so without checking on a per-system level first we wouldn't know if it was the same game or not, which is the bit I'm trying to avoid :) If we definitely have a few people willing to pitch in it's not so bad, I just don't fancy going through all the databases again myself... The other option I'm just thinking of is merging same-era sets, e.g. SNES and Genesis, Master System and NES and so forth. This should give us better results and minimise mixups

Wouldn't this be copyright infringement? Unless they just happened to make the same game different between systems. Seriously though, how many games would end up like that? And how many people would actually notice that the description is wrong? I bet most people don't even play half of the games they have.

Either way, that's why they are beta. :D Maybe have a sticky feedback thread where people could say hey! this game is not correct. Then that game could be changed with the correct info that is already available, and a new DB released. I don't know, just a thought.

Posted

Especially early on, game writers weren't exactly creative with titles. My example of "Baseball" wasn't an exaggeration! I had a TI99/4A that had a game called "Football". I guarantee there is at least one more title of the same name. There wouldn't be a copyright thing anyway since a title like football is simply the name of the sport. You can't copyright the name of a sport! Either way, it's unlikely we would run into too many of these problems (which I stated above...sorta). I still think these should be matched as best they can and just flagged for review.

Posted

One way to solve this would be to rename the ROM to its CRC. This will be unique.

Then have a MAP file to title the game correctly in GameEx. No one wants to have a huge list of CRCs.

Then the game database, instead of going by name, can go by CRC as well.

Now you can choose Football, or say Street Fighter II, which has about a hundred versions across multiple platforms, and have it pull up the game from all platforms along with showing the correct info.

This would be a lot of work and I am not talented enough to do it.

But it would complement the search feature if Tom implements it.
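For reference, computing a CRC and building that kind of map takes only a few lines with Python's `zlib`. The map layout (CRC to title) and the function names are assumptions; nothing here reflects a format GameEx actually reads.

```python
import zlib

def crc32_hex(data):
    """CRC-32 of a bytes object as 8 upper-case hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")

def crc32_of_file(path, chunk_size=65536):
    """Same checksum, computed in chunks so big ROMs need not fit in memory."""
    crc = 0
    with open(path, "rb") as rom:
        for chunk in iter(lambda: rom.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08X")

def build_crc_map(roms):
    """roms: title -> file contents. Returns CRC -> title for display lookups."""
    return {crc32_hex(data): title for title, data in roms.items()}

# Stand-in bytes, not a real ROM.
crc_map = build_crc_map({"Football": b"123456789"})
```

Two different dumps of the same game will produce two different keys in the map.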

Posted

@ Brian

Well CRC\MD5\SHA1\... checks are only good for release group releases....anyone dumping their own may use different settings or software that may give different checksums. :(

Plus...all those checksums may become outdated by the same release groups...as they find or create better dumps....really it's a no-win situation.

The databases are always going to be off in some fashion...there is simply no getting around it...all that can be done is to try and limit where things get off base...while still making these maintainable.

hehe these have been massively automated in their creation (thank god)...maintaining old & new checksums would be a major pain in the ass...and seriously blow up the size of the db's...and since we are dealing with Access rather than SQL...size is a major issue.

So while you are correct that it would greatly limit errors...maintaining the db's would be a pain in the ass that I would wish on no one...and the size would cause great slow downs.

====

Well I'm heading out of town tomorrow for two weeks give or take...so whatever the outcome is, I'll be tickled pink about it...if only for the reason that things are usable. :)

Posted

You are completely right.. I wouldn't want to do it either.. :)

I hope you enjoy your time off..

Take care my friend

Brian Hoffman

Posted

I spent a bit of time checking the SNES mdb names against the names I extracted from the SNES xml, and my FuzzyTool found 540 exact matches and 192 possible matches, of which 180-odd turned out to be valid after checking... There are only about 730 or so entries in the mdb anyway, so for that set & probably most of the rest I reckon we'll be fine just working a system at a time. That way there should be no chance of mixups. I'll need to modify my fuzzy tool to add the new categories for us, then we can deal with any leftovers afterwards

I'll hopefully get something done for us in a few days... though every time I go near that tool my life seems to slip away as I try to improve the match function :)

Stu

Posted

I have a little something for us

You'll need headkaze's xmls pack from his post here

Here are the instructions

01. Open the MDB you're working on in Access

02. Rename the current column called 'Category' to 'Multi Category'

03. Add a new column called 'Category'

04. Close the MDB

05. Run my modified FuzzyTool

06. Drag and drop the MDB onto the List 1 box

07. Drag and drop the matching XML onto the List 2 box

08. Leave all other options as they are and click Match

09. Amuse yourself until the Results window appears... may take a while :)

10. Check down through the 2 columns and put a tick in the box if you DISAGREE with the match

11. Click 'Update MDB'

That should be you, check the MDB to make sure the Category column in the MDB now has some data

If you drop the MDB into my tool again it will only load the games that don't have a category

This is handy because if you let it run again it may find some other useful results. Now that all the exact matches have been removed, it gives my tool more options to play with, e.g. you may have previously matched 'NHL 05' with 'NHL 2005' but now it might match 'NHL 05' with 'NHL 06', which in this case is a valid match as both games are going to have the same category
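The general idea of running in passes can be sketched as below: exact names first, then a fuzzy pass over only the games that still have no category. This uses `difflib` as a stand-in and is not the modified FuzzyTool's actual logic; the names and cutoff are assumptions.

```python
import difflib

def match_in_passes(db_names, reference, cutoff=0.8):
    """reference: name -> category. Returns (assigned, leftovers)."""
    assigned = {}
    # Pass 1: exact name matches.
    for name in db_names:
        if name in reference:
            assigned[name] = reference[name]
    # Pass 2: fuzzy match only the games that still have no category.
    remaining = [name for name in db_names if name not in assigned]
    ref_names = list(reference)
    leftovers = []
    for name in remaining:
        hits = difflib.get_close_matches(name, ref_names, n=1, cutoff=cutoff)
        if hits:
            assigned[name] = reference[hits[0]]  # a suggestion, still needs a human check
        else:
            leftovers.append(name)
    return assigned, leftovers

assigned, leftovers = match_in_passes(
    ["NHL 06", "NHL 05", "Zork"],
    {"NHL 06": "Sports"},
)
```

In this example 'NHL 05' picks up its category from 'NHL 06', and 'Zork' is left over to be done manually.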

I've tested this with the SNES database and it's matched everything apart from 25 which I plan to do manually... but it's bed time now :)

Stu

FuzzyTextMatch135.zip

Posted

I am having a problem....

I open the database in Access 2003. Change Category to Multi Category. Close Access, which asks me to save the file. I do.

Now when I load it into your tool, a .ldb file appears on the desktop and the tool tells me Unsuported format Category.

I don't know why the LDB file is appearing. I think it is locking the database somehow, because it won't let me change anything in the database afterwards. If I try without changing anything in the database it won't give me any error and won't do anything.

Posted

The LDB file goes away when I close the fuzzytext tool. It reappears every time I try and load the database again. It only does it after modifying the database to Multi Category.

Posted

The LDB is normal, it appears every time an MDB is opened... even when you open it in Access the LDB should appear

That error means that it's not finding the Category column. The only way I can make that happen is by spelling Category wrong in the MDB. I wonder, is it something to do with Access 2003? I'm using Office XP but I tested it with 2007 yesterday

Try loading the database I've attached in and see if it works

Stu

_Console__Atari_5200.zip

Posted

These are all completely matched, I've manually matched the last few that my tool couldn't find

[Console] Atari 7800 - Complete

[Console] Atari Jaguar - Complete

[Console] Coleco Vision - Complete

[Console] Fairchild Channel F - Complete

[Console] GCE Vectrex - Complete

[Console] Mattel Intellivision - Complete

_Console__Atari_7800.zip

_Console__Atari_Jaguar.zip

_Console__Coleco_Vision.zip

_Console__Fairchild_Channel_F.zip

_Console__GCE_Vectrex.zip

_Console__Mattel_Intellivision.zip

Posted

That database worked, but when I pressed update MDB at the end it said "problem saving changes".

Posted

I forgot this happens with some databases but I can't work out why. The actual error is "No value given for one or more required parameters" but I've no idea how to fix it. I have a feeling it's because some of the cells in the mdb are being left blank, but previously I've tried making sure all the cells have some sort of data and it doesn't fix it... maybe somebody that knows a bit more about OleDb or MDBs in general could help me out

I think this only happens with 2 or 3 of the MDBs so I've just been ignoring it :)

Have you tried any of the other MDBs by adding the Category header yourself? Just wondering if your first problem is Access 2003 or not
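One thing that may be worth checking, offered as a guess rather than a diagnosis: Jet commonly reports "No value given for one or more required parameters" when the SQL text contains a name it can't resolve to a column, because it then treats that name as a missing parameter. Column names with spaces (such as 'Multi Category') need square brackets, and values are safer passed as positional `?` parameters than pasted into the SQL. The sketch below only builds the statement text; the table name `Games` and key column `Game Name` are hypothetical.

```python
def bracket(identifier):
    """Quote a table or column name the way Access (Jet) SQL expects."""
    if "]" in identifier:
        raise ValueError("identifier must not contain ']'")
    return f"[{identifier}]"

def build_update(table, set_column, key_column):
    """UPDATE text with positional '?' placeholders for the two values."""
    return (
        f"UPDATE {bracket(table)} "
        f"SET {bracket(set_column)} = ? "
        f"WHERE {bracket(key_column)} = ?"
    )

# 'Games' and 'Game Name' are made-up names for illustration.
sql = build_update("Games", "Category", "Game Name")
params = ("Platform", "Yoshi's Island")  # the apostrophe never touches the SQL text
```

If the MDBs that fail turn out to have a column header that is spelled or spaced differently from the rest, that would fit this explanation.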

Stu
