Session Start: Thu Feb 01 15:10:16 2001 [Intervening dreck excerpted by GrokFusebox] Much later: What time will we start? Whenever fusebox picks up the gavel should be very soon :-) (picking up the gavel!) * Maladryne gasps (gavel raised high over head) hehe alright folks we're talking about verity. but honestly i haven't used verity in a long time, so someone else should probably lead this discussion anyone want the gavel? are we specifically on verity or rather searches as they apply to fusebox? btw, who was working on that fusebox-verity app and talked about it on the list? is he here tonight? (I vote for both) * jpriest is back (gone 00:43:36) *** Joins: ChrisM (~monty@cs162186-105.satx.rr.com) *** firefox sets mode: +v ChrisM *** ChanServ sets mode: +v ChrisM That would be Russ Johnson He was wrote on 1/15 about his verity/fusebox app I have my own methods with dealing with this problem, but its a little klunky It involves keeping a list of fuseactions in my database (which I do anyway) and then using CFHTTP to get each page (in its entirety) and then create a "Fake" query and indexing it *** Quits: Fusebox (Ping timeout) *** Quits: Eron (Quit: ) *** Joins: Guest25536 (~ebc@user-2ivenu4.dialup.mindspring.com) *** firefox sets mode: +v Guest25536 *** ChanServ sets mode: +v Guest25536 *** Guest25536 is now known as eron ? whoa what happened did russ ever post his code that he was working on anywhere - he mentioned this last week sometime he told he was going to share it today i think I don't think he ever did. Netsplit. I guess. *** Joins: Fusebox (m@216.12.13.30) *** firefox sets mode: +v Fusebox *** ChanServ sets mode: +v Fusebox Who pulled the plug? dang, system crash. what did i miss? *** sT|Owned sets mode: +o Fusebox heh not too much were kinda stalled Seems like we're in need of a Verity guru at this point... :) let me see if i understand the problem. when someone does a verity search on a directory(s) it returns a bunch of dsp_ files which is fine... 
but those cannot be viewed without the index.cfm file is that the main issue here? or is there something else too? Sort of on the right track. The main problem is that you have to "Reconstitute" your pages before they can be searched well from all what I have read is Verity even worth the discussion as it stands in cf 4.5? Alternatives? huh? english for us dummies here :) They are supposed to be improving it in 5.0, right? yea There are LOTS of issues with verity and its an old version of the verity engine to begin with *** Quits: ChrisM (Quit: ) i don't think the version really matters though people are still using it the way it is The alternatives aren't so good either (except for the SQL Server fulltext search) because that's all there is to offer fulltext is cool I do use it, but I do alot of reindexing and alot of index dumping Very flakey let me dig up the draft whitepaper on fusebox and verity As I was saying earlier I use CFHTTP to index fusebox sites It works fine, but you almost need a verity indexing server because it pulls lots of resources that sounds like more of a CF problem than a fusebox problem is it? for you windows people, ever consider using MS index server? You can get at it via ole db and com... do you do the indexing during a certain time or whenever someone searches. Well Verity is very CPU intensive and so can CFHTTP be. with win2k, the index service is standard (even with pro) (Phillyd if you mean me, I do it offhours--I don't do any "live" indexing...) I've had problems with the index server over a period of time; these days, I generally don't even install it anymore. Hi Guys, Thanks for your time. I have a possible solution to the fusebox problem in regard to search engine spiders. I've built an application that dynamically creates several static version of a site. *** Quits: Fusebox (Excess Flood) Whoops. eron, that's what i thought. 
oop *** Joins: Fusebox (m@secretagent2.cv.mvl.intelos.net) *** firefox sets mode: +v Fusebox he did a big o' paste *** ChanServ sets mode: +v Fusebox hehe *** sT|Owned sets mode: +o Fusebox i didn't think that would work... :-) few lines at a time For non-fusebox sites (or for indexing databases) I do index live with Verity, but it doesn't make how much went through? 3 sentences sense to do it live with my fusebox method doh I have a possible solution to the fusebox problem in regard to search engine spiders. I've built an application that dynamically creates several static version of a site. *** Quits: Fusebox (Excess Flood) i'll post it on a server *** Joins: Spider-Man (~marshj@mundo-swsa.providence.org) give me a sec *** firefox sets mode: +v Spider-Man *** ChanServ sets mode: +v Spider-Man I wonder if you could use Steve's method to dynamically build a version of the site that verity could index? While we wait do you all want to know some undocumented VERITY features? like sure ah crap... this is for search engines... not verity. sorry no dice That's a big topic for me at the moment though Steve I just joined. What are we waiting for? You will want to be aware of the file "vdk20.stp" in your CFUSION\VERITY\COMMON\ENGLISH\ directory. It contains a list of words that Verity will ignore when a search is performed. These are generally "human garbage" words that you'd normally want tossed out of a search. For example "A," "An," "Another," "is," "are," and "my," etc. You can add your own words to the list or subtract some of the Verity defaults. Does anyone have any specific fusebox questions? (To do with Verity) (or any verity questions to do with fusebox?) Is anyone else still here? Yes. yes how many "fields" can verity index? 
Title, Description, Custom1, Custom2 database searches - not file searches I don't think there is a limit There is a 100 megabyte limit to index sizes though Is there a way to use WDDX to store text data say for your static pages and then use that to index them more efficiently? (according to the Allaire KB) You have to be able to match it back to the "live" page. As long as you can do that, you're golden. someone mentioned ms index service as a substitute for verity under windows, but has anyone had experience with the popular unix search engine called ht://Dig (http://htdig.sourceforge.net/)? could we use that (under cf-linux i mean)? I ended up populating Custom2 with many pipe-delimited fields. When Verity returned search results, I had to do more processing/filtering before displaying the results. I've been messing with Atomz search engine and Fusebox and have gotten nowhere - I have used it successfully in the past on non-fusebox sites i have not had experience with htdig (i am aware of it), but to offer up another indexing solution for *nix, the excite engine is free for *nix as well it seems like when the engine goes out to spider the site it gets caught in an ugly loop going through the index.cfm file (not that i have experience with it either...) Regardless of what index server you do use though you'll always come up with this problem of matching dsp page content to index.cfm page links... That is a general problem with search engines I mean it seems kinda intractable. 
(hopefully search engine safe URLs do something to avoid this trouble) here is that email about search engine spiders http://www.fusebox.org/development/emails/possible_verity_solution.htm i haven't tried it yet though i think Russ's main problem is the biggie - how to get the info in the file that will allow it to point back to the index.cfm - he didn't like his solution of embedding some code in each dsp_ file
Here is my 4 step process:
1) Keep a list of fuseactions in a database table (this is something I do by default)
2) Loop through the fuseactions and use CFHTTP to grab the pages that result from each fuseaction
3) Make a "fake" query using QueryNew, QuerySetCell, and QueryAddRow.
4) Use CFINDEX to index the fake query; the key or a custom field is going to be the fuseaction so that when someone searches, they can be dumped back to the correct fuseaction.
Because this can be a slow process, you might want to use some kind of timestamp to keep track of which pages have been updated and which have not. Only index the ones that have not been updated "recently" but that might be unavoidable... hopefully he'll pop in tonight eron, any chance you could write a short article on your verity solution? eron - do you have any sample code you could share? It sounds interesting - though I have never messed with CFHTTP before eron, what if instead of making a query, you place the information in a db table? I do have the basic code if anyone wants it. email it to me, i'll put it up on fusebox.org The thing is that most of the sites I am doing this with are "content managed" so most of the text of the site is already in a database. I'd like the code - jbmarsh@providence.org so I am assuming then that there are a number of qry_ or act_ files in each fuseaction of your app You could put the resulting HTTP into a database table, but since it's temporary anyway, you might as well do "fake" queries. 
that would affect the data in each fuseaction I think that was one thing Russ was hoping to do is to index both dynamic and static pages - my current fusebox project is a mix of both i think it really depends on how often a page's "dynamic" content changes. Folks, I have pinpointed the code in question. I'd like to edit it a bit and comment it better before I share it. darn, i gotta run... if you guys come up with a solution to the problem, send me an email, i'm creating a new /development section on fusebox.org i'll post new possible solutions there Here's a brief synopsis of the problem with verity and fusebox: The issue is that you have a directory filled with dsp_pages which need to be indexed. When the verity text engine goes into the directory it records which dsp page has what content. Once it's done that you're faced with the problem that any search against that index will result in a link to a dsp page rather than the index.cfm?fuseaction=etc page that INCLUDES that dsp page * mcknzm a bit OT but still: from beta.allaire.com "ColdFusion 5.0 Beta Cycle begins 02/02/01!!" that's tomorrow! ok later Steve later *** Quits: Spider-Man (Quit: ) Are any of you using Search Engine Safe URLS? yes I am. so what really needs to be done is a way to link the dsp_ search results with the proper fuseactions in the search results displayed, right? not yet - wish I had coded them in from the start however :\ http://www.fusebox.org/development/ i'll start posting possible/real solutions to fusebox problems there that's correct phillyd later everyone. let me know how this turns out! *** Quits: Fusebox (Quit: ) unless you have something in the dsp page that lets the verity engine know to use a custom link when displaying results you're gonna get links to your dsp pages in any search results okay, so then a little combination of eron's method of a table with fuseactions with maybe a table of dsp_files and some link between the two. 
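[Editor's sketch of the "table of fuseactions plus table of dsp_ files" idea just raised. It is written in Python rather than CFML purely so it can be run and checked; the function names, the sample file names and the `index.cfm?fuseaction=` link shape are assumptions for illustration, not anyone's posted code. It only shows how a file path returned by a file-based index could be mapped back to the fuseaction(s) that include that file.]

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_lookup(pairs):
    """Build a dsp-file -> [fuseactions] lookup from (fuseaction, dsp_file) pairs.

    The pairs stand in for the two linked database tables discussed in the
    chat; a list is kept per file because several fuseactions may include
    the same dsp_ file.
    """
    lookup = defaultdict(list)
    for fuseaction, dsp_file in pairs:
        lookup[dsp_file.lower()].append(fuseaction)
    return lookup


def links_for_hit(path, lookup, base="index.cfm"):
    """Turn one search hit (a file path) into zero or more index.cfm links."""
    # Normalise Windows separators so only the file name is compared.
    name = PurePosixPath(path.replace("\\", "/")).name.lower()
    return [f"{base}?fuseaction={fa}" for fa in lookup.get(name, [])]
```

[Returning a list rather than a single link is a deliberate choice: a hit on a shared dsp_ file is ambiguous, and a hit on an unlisted file yields no link at all rather than a link to the bare dsp_ page.]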
yeah but how are you gonna get verity to put that combination together well, that's my question on verity because i haven't worked with it that much, how do results get sent back? straight url, depends on what you want? I think this could work Generally for static page indexing you get back a path to the page. I guess you could use that to look up the appropriate page in the database, but it becomes a big maintenance chore. Whenever you want to add new pages you'd have to update two tables in the DB. One for fuseactions and one for corresponding dsp_pages right, so the question would then become automating the linking of the dsp and fuseaction yes. And god forbid if the dsp page takes additional query parameters.. to display properly. Then you're really screwed That's why I use CFHTTP and just index the rendered pages. yeah..I see why. yes, i do too. *** Joins: GrokFusebox (jeff@66-44-86-48.s556.tnt1.lnhdc.md.dialup.rcn.com) *** firefox sets mode: +v GrokFusebox *** ChanServ sets mode: +v GrokFusebox *** Quits: july9th (Quit: ) * GrokFusebox wonders how quiet a room can be... the sound of stumpness It's 'cuz we saw you coming hehehe not unusual... ;> I know I'm late, but come ON! How do other architectures handle this problem? so eron you take the CFHTTP.FileContent and put it into a custom query using QuerySetCell() and CFINDEX it? Yes URL = "http://www.mysite.com/index.cfm?fuseaction=#fuseaction#&i_am_the_search_engine=yes" resolveurl = 1 throwonerror = Yes columns="title"> ]*>", "", "All") > hmmm.. what if we called our different fusebox app pages in a loop with surrounded by and add that to a query and CFINDEX that? would that work? is someone archiving this? * mcknzm this is a little OT, but is there any reference to regular expressions in ColdFusion on the web etc.? I just emailed my not-so-neatly-written code to jbmarsh and steve nelson As always, we have plans to post on GrokFusebox.com afterwards. 
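[Editor's sketch of the render-then-index approach eron describes: loop over fuseactions, fetch each rendered page, strip the markup, and build one record per fuseaction. This is not eron's code. It is Python rather than CFML so that it can be run; `fetch_page` is a hypothetical stand-in for the CFHTTP call, the list of dicts stands in for the "fake" query, and the tag-stripping pattern `<[^>]*>` is an assumption. The `last_indexed` timestamps are one possible reading of the suggestion to skip pages that were handled recently.]

```python
import re
from datetime import datetime, timedelta

# Assumed pattern: anything between angle brackets is treated as markup.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(html):
    """Replace tags with spaces, then collapse runs of whitespace."""
    return " ".join(TAG_RE.sub(" ", html).split())


def rows_to_index(fuseactions, fetch_page, last_indexed,
                  now=None, max_age=timedelta(days=1)):
    """Return one {"key", "body"} record per fuseaction that needs indexing.

    fetch_page(fuseaction) must return the rendered HTML for that fuseaction.
    last_indexed maps fuseaction -> datetime of its previous indexing run;
    entries younger than max_age are skipped.
    """
    now = now or datetime.now()
    rows = []
    for fa in fuseactions:
        stamp = last_indexed.get(fa)
        if stamp is not None and now - stamp < max_age:
            continue  # indexed recently enough; do not fetch again
        rows.append({"key": fa, "body": strip_tags(fetch_page(fa))})
    return rows
```

[Because the key of every record is the fuseaction itself, a search hit can be sent straight back to the matching index.cfm?fuseaction=... page, which is the point of the approach. Passing the fetcher in as a function keeps the slow HTTP step swappable, for example for an in-process render.]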
okay, how about eron's code as well * firefox remember your friend, dcc eron, send to me for posting on GrokFusebox.com: jeff@grokfusebox.com * firefox remember your friend, dcc http://www.cfug-md.org/cfugDec99.ppt (that's the URL to the REGEX reference) thanks eron... Jeff, I just emailed you the code, but I really need to spend some time commenting my rationale. It will give you the idea though. (And what I sent will work as written) well gotta run, thanks for the edifying discussion and links.... see ya np; send whatever you like whenever--I'm always ready to add to GrokFusebox *** Parts: chowdhue (~ed@216.112.226.121) Okay. Let me know if you have any questions on what I sent in the meanwhile. (Assuming, of course, it contributes to the fullness...) Will do. You can either do x number of fuseactions at once or do them one at a time. could something like this work then? fuseaction="#fa#" headerfile="" footerfile=""> and CFINDEX too at the end That IS basically what I am doing. The trouble is that Verity can be weird and slow... i know it's basically what you're doing but this removes the CFHTTP bottleneck *** Joins: Nat (nat@24-130-191-136.san.rr.com) *** firefox sets mode: +v Nat *** ChanServ sets mode: +v Nat *** Joins: Warcry (~Russ@surf102-48-153.jacksonville.net) *** firefox sets mode: +v Warcry *** ChanServ sets mode: +v Warcry Good point. Actually great point. Very nice approach, Erki. I'll have to try this when I get a chance. Wait, is MCKNZM erki? yes it is eron, have you kept performance stats on your approach? If so, then I agree. sorry; yes Sorry Im late, what did I miss? * GrokFusebox reminds himself to stick to nick! No. :-( you people and all your anti-name names... we could use custom1 for generated page content and custom2 for query string a la fuseaction=myFA&object_id=123 etc. :P Nat? Hey, Stan^H^H^H^H Nat, what've you got against nicks? ;> could we then use that data to make up the correct link again? 
with all the right and needed URL variables..? nothing against nicks, just that we all aready know each other's real names from the list, and I don't know who the hell anyone is in here. WarCry=Russ aooooogah! erki, maybe, but would the resources needed be too much? *** mcknzm is now known as Erki *** GrokFusebox is now known as Jeff Now that I learned everyone's names, I am going to leave :-( dunno, haven't done anything with verity to be honest :) but we could try this idea out... seems like you still gotta get the dsp page pointed back to the index.cfm thats the problem I am having with my app... right, there has to be some way to link the two I use a custom tag right now it sucks you have to add it into every dsp page My final comment is: verity can be weird// Verity indexes tend to get bloated with time. Plan ahead for this and ensure that it's no problem to regularly drop and reindex your collections. If it's not possible to delete and recreate your index, you should at least OPTIMIZE it on a regular basis. This can be done automatically by using the ColdFusion scheduler with the CFCOLLECTION tag to establish a regular time. but dudes - if you have more than one FA call a dsp page, what happens? If all you're allowing Verity to search is the constructed query, you've got it locked. Bye *** Quits: eron (Quit: ) good point Nat I try to avoid this though Nat, the approach in question is to auto-copy all interpreted page content to a query, then let Verity go along on the query instead of the actual pages. ha! oh well, throw that whole re-useability thing out the window That sound better than what I have done so far jeff Why, Nat? jeff - your explanation sounds good how do you change your nick? but what's the difference between that and cffileing all pages to pure html, then verity-ing those pages? Not much; the question I think is one of performance (and how do you redirect back to index.cfm with fa instead of to html?) 
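[Editor's sketch for the question of redirecting back to index.cfm: if the query string for each indexed page (for instance fuseaction=myFA&object_id=123) is stored in a custom field alongside the page text, the result link can be rebuilt from that field alone. Python is used so the sketch can be run; `result_link` and its `base` default are hypothetical names, and whether a given Verity setup returns the custom field in this form is an assumption.]

```python
from urllib.parse import parse_qsl, urlencode


def result_link(stored_query, base="index.cfm"):
    """Rebuild a framework link from the query string stored with a search hit.

    Raises ValueError when the stored string carries no fuseaction, since
    such a link could not be routed back through index.cfm.
    """
    params = parse_qsl(stored_query, keep_blank_values=True)
    if not any(name == "fuseaction" for name, _ in params):
        raise ValueError("stored query string has no fuseaction")
    # urlencode re-escapes values, so spaces and similar characters stay safe.
    return f"{base}?{urlencode(params)}"
```

[Parsing and re-encoding, instead of gluing the stored string onto the base URL, keeps any extra URL variables intact and escaped, which matters for pages that need more than the fuseaction to display properly.]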
You have more options when you index a query * Jeff says change nick with /nick NewNick all the links could stay as index.cfm?fuseaction=blah, so the links would still work btw what do you get back from a search on verity collection? *** Warcry is now known as Russ russ! Its me!! *** phillyD is now known as Phil Im late Im late oh dear Im late run away... *** cfoam is now known as anonymous he he he ;> whos on first? What? * Jeff thinks we digress... Erki, I'm not sure about Verity results on queries. But it might be interesting to experiment. The code you offered a couple of minutes ago looked like the makings of an interesting custom tag. verity results on queries are resultsets The app I have worked up indexes the dsp pages fine... right, should we maybe set up a few experiments on similar files to see which of the methods we've discussed work, and gather stats on performance the problem is redirecting to the index page Yeah, Russ. This is why I think the query approach might be the better option. (Note the presence of weasel word "might"...) ;> You have a point, indexing queries gives you more options with the fields you can create and search against *** Joins: Spider-Man (~marshj@mundo-swsa.providence.org) *** firefox sets mode: +v Spider-Man *** ChanServ sets mode: +v Spider-Man why can't anything that needs to be indexed just use html files? Maybe I'm not understanding the scenario where one would need to index .cfm files... Ok... i am indexing all dsp pages * Jeff thinks rather like Nat however, the dsp pages are part of circuits that are called by index pages your search results are limited in verity so you cant add url strings to the return urls thus no fuseactions I have successfully indexed a whole site that I knew nothing about with my app... * database searches but why would you have content in .cfm files in the first place? 
OK, but let's go back to the fundamental question Nat raised again I just havent found an easy way to include the index page and the fuseaction to call to the relevant pages ;> Why should we care about search engines within our apps? Don't we just want them to find us & then we control the content, searching and otherwise? ...more importantly, why should we care about file indexing in our apps? ...since all our content is in our magnificent dbs? because some of us work for corps that "have to have it" AH! right The truth comes out... PHBs are driving these requirements. all of the static content is not in the db... Are we talking about search engines finding fusebox apps or using searches within fusebox apps? yeah - that is the 'perfect world'; in reality it is a big mess of static and dynamic content Yes. ;> within Got it. We ain't talkin' nice, architected Fusebox apps. We're talking legacy crap that we have to deal with! my static content lives in dsp files that have to be indexed in order to search against them well damn. Pay someone else bigger bucks to deal with legacy crap Im not talking legacy crap... move your content to html files we dont have the option of putting static content in the db have your dsp files write html files then index those spidey - I mentioned that before. I'm still trying to understand why Russel needs to index his .cfm files. we're on the same page... but then how do you 'wrap' the html files in the fusebox framework (header/footer) indexed of course Russ - entire content-only sites store their content in DBs ssi? that's how spectra works I get the point, it goes against better judgement... 
FWIW, I've never done a CF site that didn't have all its content in the DB but some of us are forced to do it this way :) *** Joins: Kekawaka (Kekawaka@syr-24-169-80-234.twcny.rr.com) *** firefox sets mode: +v Kekawaka *** ChanServ sets mode: +v Kekawaka lets all take a moment and pity russ yeah, but can you do intelligent searching efficiently on the data in your database? like AND and OR keywords and wildcards that verity lets you use? I put my static content in a circuit called "content" they are dsp files that are controled with an index.cfm what about large paragraphs of text? are those in the DB too? btw, spectra uses a LOT of verity collections for finding the objects.. *** Joins: fuseNYC (~just@user-2ive7hi.dialup.mindspring.com) *** firefox sets mode: +v fuseNYC *** ChanServ sets mode: +v fuseNYC Wouldn't it be better to have those in html files where the UI guys could update them anytime without bothering you or the DB guys? anyway. i'll try to put together an example app for my loop-bodycontent-module idea tomorrow. No. Put 'em in the DB with an interface that lets the text guys update 'em anytime without bothering the DB guys OR the UI guys. gotto go and get some sleep now, though.. it's 2:22 AM here... :( i'm just finishing up redesigning redstorm.com a game site - and while 'most' of the site is db driven (all the games) I still have quite a few pages that are static - I could probably dump that in a db but the current budget isn't going to allow it Bye, Erki. Sleep, man. bye Erki later, look forward to hearing how your example works! is anyone logging this? i have logged from about 5:50 EST.. but can someone continue? jpreist, thats exactly why we need a way to index the static content and wrap it up in our fusebox i'm trying to grab most of it Erki, Mal's logging for me. from the beginning? yep. not much happened till about 5:30 or so.. :) ok.. put it to grokfusebox.com then.. erki - spidey and my full-text searching solution KICKS ASS! 
it does all the cool verity stuff, including a thesaurus Count on it, Erki. nat: mind sharing it? well, i have to go too. i'll be looking at the archive on GrokFusebox.com *** Parts: Phil (~nogoals4u@cg425181-a.adubn1.nj.home.com) spidey has the storedprocs to do it... i reformatted recently and, well... in sql server? yep how hard would it be to convert something like that to oracle the hardest part is finding a smart oracle person heh we have lots of those... * Erki off now... to see beautiful dreams about CF5.0 and beta1 available tomorrow :) *** Erki is now known as mcknzm *** Quits: mcknzm (Quit: ) thats about the only group that knows what the heck they are doing at our corp. sorry to shoot your app full of holes, russ, and then run like I gotta - back in an hour if anyone is that much of a nerd If I stored all of my static content in the db I wouldn't give a crap about verity!! *** Quits: fuseNYC (Quit: ) that's exactly why no one else gives a crap :) I should be around later... bye :) i'll be around tonight - unless i get sucked into a UT game I hereby declare the weekly session ended (for Maladryne's sake) Will be posted on GrokFusebox.com tomorrow.