2012年2月16日星期四

Any Full-Text Guru's - Need help with a query

I'm not sure how I can best phrase this, but I need help writing a query that
is inflectional, thesaurus and non word ordered specific. In other words, say
a user is looking for a "car wash" but types in "wash auto" or "auto wash" or
"vehicle wash" or "cars washed" or "washing cars" or "car and auto wash" or
"automobile car wash", etc. I still want a result produced of the entry in
the database "car wash", how can I produce this query. Another example, say I
want to produce a result of a record in the db "server administrators - sql"
and a user searches for "sql server administrator" or "sql administrator",
how can I produce the result with the same query as that of the above. The
same query should be able to produce both results given any quantity of words
searched for or words in the table.
I also have a concern of spelling. Say a user enters "sql srver
aministrater", is there any SQL Server 2005 CTP query that can help with
this? Will I need a third party tool? If so, how can I implement this.
The UI is an asp.net web based app and I want to target SQL Server 2005.
Thanks in advance everyone! I appreciate what you folks do!
the FreeText predicate does exactly what you are looking for. You will have
to edit the thesaurus file. You can find it in C:\Program Files\Microsoft
SQL Server\MSSQL.1\MSSQL\FTData.
Make sure you remove the comment tags, and you will have to use the
expansion option.
ie
<expansion>
<sub>car</sub>
<sub>cars</sub>
<sub>autor</sub>
</expansion>
For spell checking you will need to use something like
http://www.google.com/search?hl=en&l...r+web+serv er
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"wkbia" <wkbia@.discussions.microsoft.com> wrote in message
news:C0946A94-4836-4BAC-AD91-B7B191 E39FC3@.microsoft.com...
> I'm not sure how I can best phrase this, but I need help writing a query
that
> is inflectional, thesaurus and non word ordered specific. In other words,
say
> a user is looking for a "car wash" but types in "wash auto" or "auto wash"
or
> "vehicle wash" or "cars washed" or "washing cars" or "car and auto wash"
or
> "automobile car wash", etc. I still want a result produced of the entry
in
> the database "car wash", how can I produce this query. Another example,
say I
> want to produce a result of a record in the db "server administrators -
sql"
> and a user searches for "sql server administrator" or "sql administrator",
> how can I produce the result with the same query as that of the above. The
> same query should be able to produce both results given any quantity of
words
> searched for or words in the table.
> I also have a concern of spelling. Say a user enters "sql srver
> aministrater", is there any SQL Server 2005 CTP query that can help with
> this? Will I need a third party tool? If so, how can I implement this.
> The UI is an asp.net web based app and I want to target SQL Server 2005.
> Thanks in advance everyone! I appreciate what you folks do!
|||Thanks for the reply, however neither solution works well for me. Neither is
intuitive enough...or maybe that's me.
FREETEXT is too "fuzzy". It will produce a result of "car wash" from "auto
wash" for example, but the result will also include results of all records
that the thesaurus has an expansion on. Every record with car, auto, wash,
etc will be included in the result for example.
CONTAINS is more precise however it doesn't work well since you have to
provide a THESAURUS of not only synonyms, but INFELCETIONAL words as well,
ie., "cars", "automobiles", "servers", "administrators", etc. Creating a
THESAURUS is a daunting task by itself let alone having to expand on all
inflectionals as well and I "ain't" no english major if you know what I mean.
Is there an existing way to get an inflectional list for a word and then use
this result in a query? Same question for synonyms?
If big brother MS has their ears on, would be nice to have functions that
will do this. If you need help, let me know$$$. I don't have time to do
develop this and it's killing me. Infelections should be programable and not
hard coded. Should also be able to define different instances of these as
required in the query, database, or app. Same holds true for the Thesaurus.
Should have functions that will pull either synonyms or antonyms. I know the
question is why, but that's proprietary. Also, give us a Thesaurus and
Dictionaries that we can hack as needed.
Also, if someone could tell me how the heck rank and weight are ranked and
weighed, I'd appreciate it. The results I'm getting don't seem to be ranked
and weighed as I would have thought.
Can't program myself out of a paper bag sometimes but still trying.
Thanks in advance and again.
"Hilary Cotter" wrote:

> the FreeText predicate does exactly what you are looking for. You will have
> to edit the thesaurus file. You can find it in C:\Program Files\Microsoft
> SQL Server\MSSQL.1\MSSQL\FTData.
> Make sure you remove the comment tags, and you will have to use the
> expansion option.
> ie
> <expansion>
> <sub>car</sub>
> <sub>cars</sub>
> <sub>autor</sub>
> </expansion>
> For spell checking you will need to use something like
> http://www.google.com/search?hl=en&l...r+web+serv er
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
> "wkbia" <wkbia@.discussions.microsoft.com> wrote in message
> news:C0946A94-4836-4BAC-AD91-B7B191 E39FC3@.microsoft.com...
> that
> say
> or
> or
> in
> say I
> sql"
> words
>
>
|||You may be able to get away with Inflectional and a wildcard (is car*, or
auto*). This may also be problematic as you may get too many hits and
irrelevant hits like with FreeText.
There is no way to spill the thesaurus or inflectional terms in your query.
Lrtest does show you how the words are stemmed. You might also want to look
at Porter Stemming Algorithm to get an idea of how stemming works.
FreeText rank uses Okapi BM 25, Contains uses a variant of Gerrad Salton's
SMART algorithm.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"wkbia" <wkbia@.discussions.microsoft.com> wrote in message
news:822EAAB9-DBFB-47C9-BC06-C9CAC6FE95F4@.microsoft.com...
> Thanks for the reply, however neither solution works well for me. Neither
is
> intuitive enough...or maybe that's me.
> FREETEXT is too "fuzzy". It will produce a result of "car wash" from "auto
> wash" for example, but the result will also include results of all records
> that the thesaurus has an expansion on. Every record with car, auto, wash,
> etc will be included in the result for example.
> CONTAINS is more precise however it doesn't work well since you have to
> provide a THESAURUS of not only synonyms, but INFELCETIONAL words as well,
> ie., "cars", "automobiles", "servers", "administrators", etc. Creating a
> THESAURUS is a daunting task by itself let alone having to expand on all
> inflectionals as well and I "ain't" no english major if you know what I
mean.
> Is there an existing way to get an inflectional list for a word and then
use
> this result in a query? Same question for synonyms?
> If big brother MS has their ears on, would be nice to have functions that
> will do this. If you need help, let me know$$$. I don't have time to do
> develop this and it's killing me. Infelections should be programable and
not
> hard coded. Should also be able to define different instances of these as
> required in the query, database, or app. Same holds true for the
Thesaurus.
> Should have functions that will pull either synonyms or antonyms. I know
the
> question is why, but that's proprietary. Also, give us a Thesaurus and
> Dictionaries that we can hack as needed.
> Also, if someone could tell me how the heck rank and weight are ranked and
> weighed, I'd appreciate it. The results I'm getting don't seem to be
ranked[vbcol=seagreen]
> and weighed as I would have thought.
> Can't program myself out of a paper bag sometimes but still trying.
> Thanks in advance and again.
>
> "Hilary Cotter" wrote:
have[vbcol=seagreen]
Files\Microsoft[vbcol=seagreen]
http://www.google.com/search?hl=en&l...r+web+serv er[vbcol=seagreen]
query[vbcol=seagreen]
words,[vbcol=seagreen]
wash"[vbcol=seagreen]
wash"[vbcol=seagreen]
ry[vbcol=seagreen]
example,[vbcol=seagreen]
administrators -[vbcol=seagreen]
administrator",[vbcol=seagreen]
The[vbcol=seagreen]
of[vbcol=seagreen]
with[vbcol=seagreen]
2005.[vbcol=seagreen]
|||Thanks again Hilary but it will produce too many hits. There has to be a way
to do this since I have seen this basic functionality in production...just
wish I could ask them how they're doing it hence the reason for my post. Hate
to say it, but it sounds like their not doing it via SQL Server.
These ranking algorithms aren't working like I need them to. For example, a
freetext search with an expanded thesaurus as aforementioned will produce a
greater weight of one expanded word found twice in the record then just the
two words searched with one of which was expanded from in the thesaurus or
not and the other being an inflection etc.
Your knowledge and trenchancy however does impress me very much Hilary; it
appears to be tremendous in these regards.
Hope I can find an answer to this as it is really throwing a wrench in the
works.
Thanks anyway and again!
"Hilary Cotter" wrote:

> You may be able to get away with Inflectional and a wildcard (is car*, or
> auto*). This may also be problematic as you may get too many hits and
> irrelevant hits like with FreeText.
> There is no way to spill the thesaurus or inflectional terms in your query.
> Lrtest does show you how the words are stemmed. You might also want to look
> at Porter Stemming Algorithm to get an idea of how stemming works.
> FreeText rank uses Okapi BM 25, Contains uses a variant of Gerrad Salton's
> SMART algorithm.
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
> Looking for a FAQ on Indexing Services/SQL FTS
> http://www.indexserverfaq.com
> "wkbia" <wkbia@.discussions.microsoft.com> wrote in message
> news:822EAAB9-DBFB-47C9-BC06-C9CAC6FE95F4@.microsoft.com...
> is
> mean.
> use
> not
> Thesaurus.
> the
> ranked
> have
> Files\Microsoft
> http://www.google.com/search?hl=en&l...r+web+serv er
> query
> words,
> wash"
> wash"
> ry
> example,
> administrators -
> administrator",
> The
> of
> with
> 2005.
>
>

没有评论:

发表评论