<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>alfven.org</title>
	<atom:link href="http://alfven.org/wp/feed/" rel="self" type="application/rss+xml" />
	<link>http://alfven.org/wp</link>
	<description></description>
	<lastBuildDate>Wed, 23 Nov 2011 22:10:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>PSA: Why using Dataset.value is discouraged in h5py</title>
		<link>http://alfven.org/wp/2011/11/psa-why-using-dataset-value-is-discouraged-in-h5py/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=psa-why-using-dataset-value-is-discouraged-in-h5py</link>
		<comments>http://alfven.org/wp/2011/11/psa-why-using-dataset-value-is-discouraged-in-h5py/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 05:17:09 +0000</pubDate>
		<dc:creator>ac</dc:creator>
				<category><![CDATA[h5py]]></category>

		<guid isPermaLink="false">http://alfven.org/wp/?p=50</guid>
		<description><![CDATA[Developing h5py has been one of the most rewarding programming experiences I&#8217;ve had.  H5py is a Python library which lets you read and write HDF5 files, which can be used to store all kinds of numerical data from bathymetry information &#8230; <a href="http://alfven.org/wp/2011/11/psa-why-using-dataset-value-is-discouraged-in-h5py/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Developing <a title="HDF5 for Python" href="http://alfven.org/wp/hdf5-for-python/" target="_blank">h5py</a> has been one of the most rewarding programming experiences I&#8217;ve had.  H5py is a Python library which lets you read and write <a title="HDF Group" href="http://hdfgroup.org" target="_blank">HDF5</a> files, which can be used to store all kinds of numerical data from <a href="http://vislab-ccom.unh.edu/~schwehr/rt/18-bag-hdf-xml.html" target="_blank">bathymetry information</a> to <a href="http://hdfeos.org/" target="_blank">NASA images of Earth</a>.</p>
<p>It&#8217;s also provided a number of lessons about the differences between how I perceive the software and how users actually interact with it.</p>
<p>One of the objects that h5py provides is a Dataset object.  It represents an HDF5 dataset, which is like a big multidimensional array that lives on disk.  One of the most useful aspects of h5py is that you can perform slicing, or partial I/O, on datasets; in other words, you can read just the parts you want instead of trying to fit the whole thing in memory.  If you have a dataset which is 1000 x 1000 in shape, and you want a little 10 x 10 square somewhere in the middle, you could write:</p>
<pre>data_out = mydataset[300:310,400:410]</pre>
<p>H5py will communicate your selection to the HDF5 machinery and only read the 10 x 10 slice off disk.  Experienced Python/NumPy users will recognize this as the standard NumPy slicing syntax, which h5py borrows for this purpose.  A large amount of code in the h5py &#8220;high level&#8221; interface is dedicated to supporting this feature.</p>
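<p>Here is a minimal sketch of that partial read, assuming a reasonably recent h5py and NumPy.  The file name, the dataset name and the fill data are made up for illustration, and the file is kept in memory with the &#8220;core&#8221; driver so nothing is written to disk:</p>

```python
import numpy as np
import h5py

# Hypothetical in-memory file: driver="core" with backing_store=False
# means the HDF5 file image is never saved to disk.
with h5py.File("slicing_example.h5", "w", driver="core", backing_store=False) as f:
    # Fill a 1000 x 1000 dataset with the numbers 0..999999 so that
    # element [i, j] holds the value i * 1000 + j.
    source = np.arange(1000 * 1000).reshape(1000, 1000)
    mydataset = f.create_dataset("mydataset", data=source)

    # Partial read: only this 10 x 10 selection is requested from HDF5.
    data_out = mydataset[300:310, 400:410]

    # Shape and dtype are metadata; looking at them reads no data.
    full_shape = mydataset.shape

print(full_shape)       # shape of the dataset as stored
print(data_out.shape)   # shape of the slice that was read
```

<p>The slice comes back as an ordinary NumPy array, so everything downstream of the read works exactly as it would on an array you had built in memory.</p>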
<p>For historical reasons, Dataset objects in h5py also have a little wart/property named &#8220;value&#8221;, which when accessed simply reads the entire dataset from disk and dumps it into an array.</p>
<p>Guess which one I generally see people using.</p>
<p>At first I was a little irritated (especially when I got some bug reports relating to poor performance using the &#8220;value&#8221; property!), but I&#8217;m beginning to understand how some of the decisions I made encouraged people to interact with h5py like this.  First, some of the earliest example code I released used &#8220;.value&#8221; to read in data.  Other people developed their own code based on the h5py documentation (including the &#8220;.value&#8221; example), and posted their code online.  Second, it&#8217;s not obvious to someone sitting at the IPython prompt that Dataset objects support slicing.  It is obvious that they have a &#8220;.value&#8221; property.  Third, using .value generally works just fine.  Until, that is, you put it into a loop, where every access re-reads the whole dataset, or try to use it on a 200GB dataset.</p>
<p>Fourth, and this goes back to the distant past and the reason .value was added in the first place: nobody knew how to read data from a scalar dataset!  Do you?  Create a NumPy scalar array (shape &#8220;()&#8221;) and try to get something out of it without using item().  It turns out the right way in the NumPy slicing syntax is (and this still looks bizarre to me):</p>
<pre>data_out = mydataset[()]</pre>
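<p>A quick sketch of the scalar case, again with made-up names and an in-memory file, and again assuming a reasonably recent h5py and NumPy.  It tries the empty-tuple index first on a plain NumPy scalar array and then on a scalar dataset:</p>

```python
import numpy as np
import h5py

# A NumPy scalar array: zero dimensions, shape ().
scalar_array = np.array(42.0)
value_from_array = scalar_array[()]   # index with an empty tuple

# Hypothetical in-memory file; nothing is written to disk.
with h5py.File("scalar_example.h5", "w", driver="core", backing_store=False) as f:
    # Passing a bare Python float creates a scalar dataset.
    mydataset = f.create_dataset("answer", data=42.0)
    dataset_shape = mydataset.shape

    # The same empty-tuple index reads the single stored value.
    data_out = mydataset[()]

print(dataset_shape, data_out)
```

<p>The same index works on datasets that do have dimensions, where it reads everything, which makes it a drop-in spelling for what .value used to do.</p>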
<p>So get the word out!  You can slice up Dataset objects just like arrays!  Free yourself from .value and the chains of the past!</p>
]]></content:encoded>
			<wfw:commentRss>http://alfven.org/wp/2011/11/psa-why-using-dataset-value-is-discouraged-in-h5py/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Website redesign</title>
		<link>http://alfven.org/wp/2011/05/website-redesign/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=website-redesign</link>
		<comments>http://alfven.org/wp/2011/05/website-redesign/#comments</comments>
		<pubDate>Sat, 07 May 2011 22:36:22 +0000</pubDate>
		<dc:creator>ac</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://alfven.org/wp/?p=31</guid>
		<description><![CDATA[Alfven.org is now hosted via WordPress!  I have also updated the research page with more info about my past and current work. Also re-hosted is my side project HDF5 for Python.  The canonical URL for h5py remains http://h5py.alfven.org.]]></description>
			<content:encoded><![CDATA[<p>Alfven.org is now hosted via WordPress!  I have also updated the <a title="Research" href="http://alfven.org/wp/?page_id=2">research page</a> with more info about my past and current work.</p>
<p>Also re-hosted is my side project HDF5 for Python.  The canonical URL for h5py remains http://h5py.alfven.org.</p>
]]></content:encoded>
			<wfw:commentRss>http://alfven.org/wp/2011/05/website-redesign/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

