<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Hans Engler's Weblog</title>
	<atom:link href="http://hansengler.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://hansengler.wordpress.com</link>
	<description>Mostly Math.</description>
	<lastBuildDate>Fri, 17 Apr 2009 12:52:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='hansengler.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Hans Engler's Weblog</title>
		<link>http://hansengler.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://hansengler.wordpress.com/osd.xml" title="Hans Engler&#039;s Weblog" />
	<atom:link rel='hub' href='http://hansengler.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Data Mining (Georgetown University, 25.04.2009)</title>
		<link>http://hansengler.wordpress.com/2009/04/16/data-mining-25409/</link>
		<comments>http://hansengler.wordpress.com/2009/04/16/data-mining-25409/#comments</comments>
		<pubDate>Fri, 17 Apr 2009 03:05:34 +0000</pubDate>
		<dc:creator>hansengler</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://hansengler.wordpress.com/?p=5</guid>
		<description><![CDATA[Plan for Saturday, April 25 What is Data Mining (DM)? (30 minutes) The data flood Areas of applications What can DM do? And how? And how well? The Netflix Prize Two technical examples (45 minutes) Wine classification Direct mail response prediction DM and law enforcement (45 minutes) Promises, including false ones Total Information Awareness (2002-2003) [...]]]></description>
			<content:encoded><![CDATA[<h3><strong>Plan for Saturday, April 25</strong></h3>
<ul>
<li>What is Data Mining (DM)? (30 minutes)
<ul>
<li>The data flood</li>
<li>Areas of application</li>
<li>What can DM do? And how? And how well?</li>
<li>The Netflix Prize</li>
</ul>
</li>
<li>Two technical examples (45 minutes)
<ul>
<li>Wine classification</li>
<li>Direct mail response prediction</li>
</ul>
</li>
<li>DM and law enforcement (45 minutes)
<ul>
<li>Promises, including false ones</li>
<li>Total Information Awareness (2002-2003)</li>
<li>DM = Profiling?</li>
</ul>
</li>
<li>DM and Privacy (30 minutes)
<ul>
<li>Is DM a threat to privacy?</li>
<li>How can this threat be controlled?</li>
<li>Privacy preserving DM</li>
</ul>
</li>
</ul>
<p><strong><span style="color:#ff0000;"><em>Post your questions, observations, suggestions in the Comments area!</em></span></strong></p>
<h3><strong>Definitions</strong></h3>
<ul>
<li>Wikipedia: <a href="http://en.wikipedia.org/wiki/Data_mining"><strong>Data Mining</strong></a> is the process of extracting hidden patterns from data.</li>
<li><span style="color:#000000;">Federation of American Scientists: <strong><a href="http://www.fas.org/irp/crs/RL31798.pdf">Data Mining</a></strong> involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets.</span></li>
<li><span style="color:#000000;"><span style="font-size:small;">From a blog I found with Google: <strong><a href="http://knowledgeworks.wordpress.com/2008/02/26/data-mining-definition/">Data Mining</a></strong> is a process of</span><span style="font-size:medium;"> </span><span style="font-size:small;">extraction of non-trivial patterns from massive datasets which either provides descriptive insights of the data (not perceived without this extraction) or provides actionable intelligence (in the form of reusable patterns which the process extracted).</span></span></li>
<li><span style="font-size:small;">Gregory Piatetsky-Shapiro: </span><a href="http://www.kdnuggets.com/faq/what-is-data-mining.html"><strong>Data Mining</strong></a> is the process of finding new and potentially useful knowledge from data. Data Mining is the art and science of finding interesting and useful patterns in data.</li>
</ul>
<h3><strong>Synonyms</strong></h3>
<ul>
<li>Knowledge discovery (KD)</li>
<li>Machine learning</li>
<li>Statistical learning</li>
</ul>
<h3><strong>Websites</strong></h3>
<ul>
<li><a href="http://www.kdnuggets.com/">KDNuggets</a></li>
<li><a href="http://www.sigkdd.org/">ACM special interest group on KD and DM</a></li>
<li><a href="http://www.netflixprize.com/">Netflix Prize</a></li>
</ul>
<h3><strong>Intro Reading</strong></h3>
<ul>
<li>Herb Edelstein&#8217;s &#8220;<a href="http://www.twocrows.com/intro-dm.pdf">Introduction to Data Mining and Knowledge Discovery</a>&#8221;
<ul>
<li>Read pages 1-10.</li>
<li>Read anything that interests you from pages 11-21.</li>
</ul>
</li>
</ul>
<h3><strong>DM, Terrorism, Crime</strong></h3>
<ul>
<li>The Electronic Privacy Information Center on <a href="http://epic.org/privacy/profiling/tia/">TIA</a> (Total Information Awareness). <em>This program seems to be dead &#8230; but <a href="http://www.schneier.com/blog/archives/2006/10/total_informati.html">maybe not</a>.</em></li>
<li>Bruce Schneier on <a href="http://www.schneier.com/blog/archives/2006/03/data_mining_for.html">Mining for terrorists</a></li>
<li>Herb Edelstein&#8217;s <a href="http://www.information-management.com/issues/20030401/6512-1.html">views on TIA</a></li>
<li>Bruce Schneier on <a href="http://www.schneier.com/blog/archives/2007/08/police_data_min.html">DM use by the police</a> of Richmond, VA &#8230; with comments</li>
<li>Fraud detection for <a href="http://cs.fit.edu/~pkc/papers/ieee-is99.pdf">credit cards</a>, for <a href="http://www.stc.org/edu/47thConf/files/Data-Mining.pdf">electronic commerce</a></li>
</ul>
<h3><strong>DM and Privacy</strong></h3>
<ul>
<li>Counterterrorism programs using DM should be evaluated for effectiveness <span style="text-decoration:underline;">and</span> privacy impact &#8211; <a href="http://www8.nationalacademies.org/onpinews/newsitem.aspx?RecordID=10072008A">press release and summary</a> of the NAS report (2008)</li>
<li>A discussion of DM and privacy was initiated by <a href="http://www.dhs.gov/xinfoshare/committees/editorial_0699.shtm">DHS</a></li>
<li>The view of the <a href="http://www.aclu.org/privacy/gen/37088prs20081008.html">ACLU</a></li>
<li>A <a href="http://www.sigkdd.org/civil-liberties.pdf">letter from ACM</a> on DM and privacy</li>
<li><a href="http://www.sigmod.org/record/issues/0403/B1.bertion-sigmod-record2.pdf">Privacy preserving</a> data mining</li>
<li><a href="http://www.sigkdd.org/explorations/issues/10-2-2008-12/SocialNetworkAnonymization_survey.pdf">Privacy preserving publishing</a> of social network data</li>
</ul>
<h3><strong>Software and Datasets</strong></h3>
<ul>
<li><a href="http://www.cs.waikato.ac.nz/ml/weka/">WEKA</a> &#8211; free</li>
<li><a href="http://rapid-i.com/">RapidMiner</a> &#8211; free</li>
<li><a href="http://www.sas.com/technologies/analytics/index.html">SAS</a> Analytics</li>
<li><a href="http://www.salford-systems.com/">Salford Systems</a></li>
<li><a href="http://archive.ics.uci.edu/ml/">UCI</a> datasets</li>
<li><a href="http://www.sigkdd.org/kddcup/index.php">KDD Cup</a> annual competition</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://hansengler.wordpress.com/2009/04/16/data-mining-25409/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/170196f6d00157a89aaad01b5f7e9c16?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">hansengler</media:title>
		</media:content>
	</item>
	</channel>
</rss>
